Added v2 reference architecture details #7397

cwarnermm · 2024-09-13T17:39:53Z

PR changes include:

Returned v1 guidance for 100, 15K, 50K, 79K, & 88K. All v1 guidance includes c6i and r6g AWS instance sizes.
Added v2 guidance for 2K, 40K, 80K, 93K, & 100K that includes c7i and r7g AWS instance sizes.
Landing page & nav pane include all v1 & v2 concurrent user counts ranging from 100 through 100K.
Updated all guidance tables to include vCPU & Memory. Used AWS Pricing on Demand to calculate values based on AWS Instance size, and used the r6g.xlarge.search page to document ES node values.

Outstanding for @nab-77:

Review individual deploy pages for technical accuracy against v2 testing results and update where needed.
Review & update v1 guidance tables where an X indicates a TBD value.
Reduce number of individual deploy pages, if/where preferred. Once page reductions are complete, @cwarnermm to add page redirects where needed.
Review & update testing methodology.
Review @agarciamontoro's feedback and provide guidance on next steps.
Provide guidance on where to add additional load balancer guidance, if needed.
Help identify where to link to fixed database dump with 100 million posts.

github-actions · 2024-09-13T17:42:50Z

Newest code from mattermost has been published to preview environment for Git SHA d4fa702

github-actions · 2024-09-13T19:22:56Z

Newest code from mattermost has been published to preview environment for Git SHA 4793a58

github-actions · 2024-09-16T15:11:54Z

Newest code from mattermost has been published to preview environment for Git SHA 6ce29ee

github-actions · 2024-09-16T17:35:52Z

Newest code from mattermost has been published to preview environment for Git SHA c4dc03b

agarciamontoro

Great work on this, @cwarnermm! Thank you so much! I've left some comments inline, but I have some general questions (and comments on the issues you mentioned in the PR description):

What do you mean by "the load balancer guidance"?
On the 100M posts DB, here's the link to the gzipped version: https://lt-public-data.s3.amazonaws.com/100M_710_withmemberships.sql.gz
Not sure if we want to directly link to the full report in the testing methodology section: https://github.com/mattermost/performance-reports/tree/main/ceiling-tests/v2#readme
Nit: should we round the 93k-user scale down to simply 90k users? It seems somehow cleaner.
Elasticsearch: there are some configurations (80k, 93k) for which we don't list Elasticsearch support, even though we have some data backing that up (it was shown, however, that Elasticsearch started to become clogged with ~30k users). Do we want to add that or not?

Again, thank you, massive job here! :)
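
Before linking the 100M posts dump from the docs, it can help to peek at it without decompressing the whole file. The following is a minimal Python sketch, assuming the file is gzip-compressed SQL text (as the `.sql.gz` name suggests). The function name `first_lines` and the local file name in the usage note are hypothetical and not part of any Mattermost tooling.

```python
import gzip
from itertools import islice
from typing import BinaryIO, List


def first_lines(compressed: BinaryIO, limit: int = 20) -> List[str]:
    """Return up to `limit` decoded lines from a gzip-compressed text stream.

    The stream is decompressed lazily, so only the beginning of the dump is
    read, which matters for a file holding 100 million posts.
    """
    # gzip.open accepts an already-open binary file object as well as a path.
    with gzip.open(compressed, mode="rt", encoding="utf-8", errors="replace") as text:
        return [line.rstrip("\n") for line in islice(text, limit)]
```

Usage, assuming the dump was already downloaded to a local file named `dump.sql.gz` (a hypothetical name): `with open("dump.sql.gz", "rb") as f: print("\n".join(first_lines(f, 10)))`.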

agarciamontoro · 2024-09-20T15:52:13Z

source/scale/scale-to-100-users.rst

+| RDS Reader             | 0              | db.r7g.large      |
+------------------------+----------------+-------------------+
+| Elasticsearch Node     | 0              | r6g.xlarge.search |


Should we remove the rows with 0 nodes? Or do we prefer the consistency over all docs?

agarciamontoro · 2024-09-20T15:54:51Z

source/scale/scale-to-50000-users.rst

+Without Elasticsearch
+~~~~~~~~~~~~~~~~~~~~~
+
+------------------------+----------------+-------------------+
+| **Resource Type**      | **# of Nodes** | **AWS Instance**  |
+========================+================+===================+
+| Mattermost Application | 4              | c7i.2xlarge       |
+------------------------+----------------+-------------------+
+| RDS Writer             | 1              | db.r7g.2xlarge    |
+------------------------+----------------+-------------------+
+| RDS Reader             | 3              | db.r7g.2xlarge    |
+------------------------+----------------+-------------------+
+| Elasticsearch Node     | 0              | r6g.xlarge.search |
+------------------------+----------------+-------------------+
+| Proxy                  | 1              | m7i.4xlarge       |
+------------------------+----------------+-------------------+
+
+With Elasticsearch
+~~~~~~~~~~~~~~~~~~~
+
+------------------------+----------------+-------------------+
+| **Resource Type**      | **# of Nodes** | **AWS Instance**  |
+========================+================+===================+
+| Mattermost Application | 5              | c7i.2xlarge       |
+------------------------+----------------+-------------------+
+| RDS Writer             | 1              | db.r7g.2xlarge    |
+------------------------+----------------+-------------------+
+| RDS Reader             | 4              | db.r7g.2xlarge    |
+------------------------+----------------+-------------------+
+| Elasticsearch Node     | 2              | r6g.xlarge.search |
+------------------------+----------------+-------------------+
+| Proxy                  | 1              | m7i.4xlarge       |
+------------------------+----------------+-------------------+


The data that we have is exactly this. But should we repeat the 4 app nodes and 4 DB nodes test with Elasticsearch so that we don't need to bump the number of app and DB nodes in the Elasticsearch section?

@nab-77 - Without ES section was removed ^
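
To make the cost of bumping the app and DB node counts concrete, the two tables quoted in this thread can be totalled as in the sketch below. This is an illustration only: the `r6g.xlarge.search` value (4 vCPU / 32 GiB) matches the tables elsewhere in this PR, while the other per-instance values are assumptions taken from public AWS instance specifications and should be re-checked. The names `INSTANCE_SPECS`, `Row`, and `totals` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

# Per-node (vCPU, memory in GiB). Only r6g.xlarge.search is confirmed by the
# PR tables; the remaining entries are assumed from public AWS specs.
INSTANCE_SPECS = {
    "c7i.2xlarge": (8, 16),
    "db.r7g.2xlarge": (8, 64),
    "r6g.xlarge.search": (4, 32),
    "m7i.4xlarge": (16, 64),
}


@dataclass(frozen=True)
class Row:
    resource: str
    nodes: int
    instance: str


def totals(rows: Iterable[Row]) -> Tuple[int, int]:
    """Return (total vCPU, total memory in GiB) across all rows."""
    rows = list(rows)
    vcpu = sum(r.nodes * INSTANCE_SPECS[r.instance][0] for r in rows)
    mem = sum(r.nodes * INSTANCE_SPECS[r.instance][1] for r in rows)
    return vcpu, mem


# Node counts copied from the two 50K tables quoted above.
WITHOUT_ES = [
    Row("Mattermost Application", 4, "c7i.2xlarge"),
    Row("RDS Writer", 1, "db.r7g.2xlarge"),
    Row("RDS Reader", 3, "db.r7g.2xlarge"),
    Row("Elasticsearch Node", 0, "r6g.xlarge.search"),
    Row("Proxy", 1, "m7i.4xlarge"),
]
WITH_ES = [
    Row("Mattermost Application", 5, "c7i.2xlarge"),
    Row("RDS Writer", 1, "db.r7g.2xlarge"),
    Row("RDS Reader", 4, "db.r7g.2xlarge"),
    Row("Elasticsearch Node", 2, "r6g.xlarge.search"),
    Row("Proxy", 1, "m7i.4xlarge"),
]

if __name__ == "__main__":
    print("without ES:", totals(WITHOUT_ES))
    print("with ES:   ", totals(WITH_ES))
```

Under the assumed specs, the Elasticsearch variant comes to 104 vCPU / 528 GiB against 80 vCPU / 384 GiB without it, so the extra app and DB nodes account for most of the difference rather than the two Elasticsearch nodes themselves.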

agarciamontoro · 2024-09-20T15:57:27Z

source/scale/scaling-for-enterprise.rst

-Tests were defined by configuration of the actions executed by each simulated user (and the frequency of these actions) where the coordinator metrics define a health system under load. Tests were performed using the Mattermost v8.1 extended support release (ESR). Elasticsearch and job servers weren't used. All tests wtih more than a single app node had an NGINX proxy running in front of them.
+Tests were defined by configuration of the actions executed by each simulated user (and the frequency of these actions) where the coordinator metrics define a health system under load. Tests were performed using the Mattermost v8.1 extended support release (ESR). Elasticsearch and job servers weren't used. All tests had an NGINX proxy running in front of them.


Why did we remove this? It is still true (as in, we have one test with a single app node that doesn't have any proxy in front of it).

^ @nab-77 - This wasn't removed intentionally. Perhaps you have more context?

nab-77

I think the X placeholders are due to these being from v1 testing.

@agarciamontoro can you help unblock this? I know we won't have ES data for these deployments, but we will have NGINX, right?

nab-77 · 2024-10-03T12:48:37Z

source/scale/scale-to-93000-users.rst

+------------------------+-----------+----------------+-------------------+
+| RDS Reader             | 4         | 16/128         | db.r7g.4xlarge    |
+------------------------+-----------+----------------+-------------------+
+| Elasticsearch Node     | 0         | 4/32           | r6g.xlarge.search |


@nab-77 Yes, assuming that Test 4XL.550 is the correct set of 4xl test results.

nab-77 · 2024-10-03T12:49:05Z

source/scale/scale-to-88000-users.rst

+------------------------+-----------+----------------+-------------------+
+| RDS Reader             | 4         | 16/128         | db.r6g.4xlarge    |
+------------------------+-----------+----------------+-------------------+
+| Elasticsearch Node     | X         | 4/32           | r6g.xlarge.search |


X? @cwarnermm can I help here?

@nab-77 - you're correct that any X indicates a TBD value I couldn't clearly identify via the v2 results.

nab-77 · 2024-10-03T12:49:25Z

source/scale/scale-to-79000-users.rst

+------------------------+-----------+----------------+-------------------+
+| RDS Reader             | 4         | 16/128         | db.r6g.4xlarge    |
+------------------------+-----------+----------------+-------------------+
+| Elasticsearch Node     | X         | 4/32           | r6g.xlarge.search |


X? @cwarnermm can I help here?

github-actions · 2024-10-03T13:10:35Z

Newest code from mattermost has been published to preview environment for Git SHA 1e812f5

agarciamontoro · 2024-10-03T13:49:38Z

Why are we mixing results from v1 and v2? I don't think we should do that; having different instance families in different tests will be quite confusing.

cwarnermm · 2024-10-03T14:02:02Z

@agarciamontoro - Existing v1 pages have been moved to the new table format, and v2 pages have been added. Looking to @nab-77 for guidance on how to move forward in a unified way.

Added v2 ref arch data

d4fa702

cwarnermm added the 1: PM Review Requires review by a product manager label Sep 13, 2024

cwarnermm added this to the v10.0.0 milestone Sep 13, 2024

cwarnermm requested a review from nab-77 September 13, 2024 17:39

cwarnermm requested a review from agarciamontoro September 13, 2024 19:18

cwarnermm added the 1: Dev Review Requires review by a core commiter label Sep 13, 2024

Merge branch 'master' into reference-architectures-v2

4793a58

Merge branch 'master' into reference-architectures-v2

6ce29ee

Merge branch 'master' into reference-architectures-v2

c4dc03b

agarciamontoro mentioned this pull request Sep 17, 2024

Ceiling tests V2 mattermost/performance-reports#22

Merged

agarciamontoro requested changes Sep 20, 2024


Rolled back changes to v1 guidance; added CPU/MEM for all

d2a40f9

nab-77 requested changes Oct 3, 2024


Merge branch 'master' into reference-architectures-v2

1e812f5
