Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added v2 reference architecture details #7397

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

cwarnermm
Copy link
Member

@cwarnermm cwarnermm commented Sep 13, 2024

PR changes include:

  • Returned v1 guidance for 100, 15K, 50K, 79K, & 88K. All v1 guidance includes c6i and r6g AWS instance sizes.
  • Added v2 guidance for 2K, 40K, 80K, 93K, & 100K that includes c7i and r7g AWS instance sizes.
  • Landing page & nav pane includes all v1 & v2 concurrent user counts ranging from 100 through 100K.
  • Updated all guidance tables to include vCPU & Memory. Used AWS Pricing on Demand to calculate values based on AWS Instance size, and used r6g.xlarge.search page to doc ES node values.

Outstanding for @nab-77:

  • Review individual deploy pages for technical accuracy against v2 testing results and update where needed.
  • Review & update v1 guidance tables with X present indicating TBD value.
  • Reduce number of individual deploy pages, if/where preferred. Once page reductions complete, @cwarnermm to add page redirects where needed.
  • Review & update testing methodology
  • Review @agarciamontoro's feedback and provide guidance on next steps
  • Provide guidance on where to add additional load balancer guidance, if needed
  • Help identify where to link to fixed database dump with 100 million posts

@cwarnermm cwarnermm added the 1: PM Review Requires review by a product manager label Sep 13, 2024
@cwarnermm cwarnermm added this to the v10.0.0 milestone Sep 13, 2024
Copy link

Newest code from mattermost has been published to preview environment for Git SHA d4fa702

@cwarnermm cwarnermm added the 1: Dev Review Requires review by a core commiter label Sep 13, 2024
Copy link

Newest code from mattermost has been published to preview environment for Git SHA 4793a58

Copy link

Newest code from mattermost has been published to preview environment for Git SHA 6ce29ee

Copy link

Newest code from mattermost has been published to preview environment for Git SHA c4dc03b

Copy link
Member

@agarciamontoro agarciamontoro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work on this, @cwarnermm! Thank you so much! I've left some comments inline, but I have some general questions (and comments on the issues you mentioned in the PR description):

  1. What do you mean with "the load balancer guidance"?
  2. On the 100M posts DB, here's the link to the gzipped version: https://lt-public-data.s3.amazonaws.com/100M_710_withmemberships.sql.gz
  3. Not sure if we want to directly link to the full report in the testing methodology section: https://github.com/mattermost/performance-reports/tree/main/ceiling-tests/v2#readme
  4. Nit: should we tweak the scale to 93k users to simply 90k users? It seems somehow cleaner.
  5. Elasticsearch: there are some configurations (80k, 93k) for which we don't list Elasticsearch support, although we have some data backing that up (although it was proved that Elasticsearch started to become clogged with ~30k users). Do we want to add that or not?

Again, thank you, massive job here! :)

Comment on lines 22 to 24
| RDS Reader | 0 | db.r7g.large |
+------------------------+----------------+-------------------+
| Elasticsearch Node | 0 | r6g.xlarge.search |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove the rows with 0 nodes? Or do we prefer the consistency over all docs?

Comment on lines 18 to 50
Without Elasticsearch
~~~~~~~~~~~~~~~~~~~~~

+------------------------+----------------+-------------------+
| **Resource Type** | **# of Nodes** | **AWS Instance** |
+========================+================+===================+
| Mattermost Application | 4 | c7i.2xlarge |
+------------------------+----------------+-------------------+
| RDS Writer | 1 | db.r7g.2xlarge |
+------------------------+----------------+-------------------+
| RDS Reader | 3 | db.r7g.2xlarge |
+------------------------+----------------+-------------------+
| Elasticsearch Node | 0 | r6g.xlarge.search |
+------------------------+----------------+-------------------+
| Proxy | 1 | m7i.4xlarge |
+------------------------+----------------+-------------------+

With Elasticsearch
~~~~~~~~~~~~~~~~~~~

+------------------------+----------------+-------------------+
| **Resource Type** | **# of Nodes** | **AWS Instance** |
+========================+================+===================+
| Mattermost Application | 5 | c7i.2xlarge |
+------------------------+----------------+-------------------+
| RDS Writer | 1 | db.r7g.2xlarge |
+------------------------+----------------+-------------------+
| RDS Reader | 4 | db.r7g.2xlarge |
+------------------------+----------------+-------------------+
| Elasticsearch Node | 2 | r6g.xlarge.search |
+------------------------+----------------+-------------------+
| Proxy | 1 | m7i.4xlarge |
+------------------------+----------------+-------------------+
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The data that we have is exactly this. But should we repeat the 4 app nodes and 4 DB nodes test with Elasticsearch so that we don't need to bump the number of app and DB nodes in the Elasticsearch section?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nab-77 - Without ES section was removed ^

Tests were defined by configuration of the actions executed by each simulated user (and the frequency of these actions) where the coordinator metrics define a health system under load. Tests were performed using the Mattermost v8.1 extended support release (ESR). Elasticsearch and job servers weren't used. All tests wtih more than a single app node had an NGINX proxy running in front of them.
Tests were defined by configuration of the actions executed by each simulated user (and the frequency of these actions) where the coordinator metrics define a health system under load. Tests were performed using the Mattermost v8.1 extended support release (ESR). Elasticsearch and job servers weren't used. All tests had an NGINX proxy running in front of them.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we remove this? It is still true (as in we have one test with a single app node that doesn't have any proxy in front of it)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ @nab-77 - This wasn't removed intentionally. Perhaps you have more context?

Copy link
Contributor

@nab-77 nab-77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the X placeholder are due to these being from v1 testing.

@agarciamontoro can you help unblock this? I know we won't have ES data for these deployments but we will have Nginx right?

+------------------------+-----------+----------------+-------------------+
| RDS Reader | 4 | 16/128 | db.r7g.4xlarge |
+------------------------+-----------+----------------+-------------------+
| Elasticsearch Node | 0 | 4/32 | r6g.xlarge.search |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zero?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nab-77 Yes, assuming that Test 4XL.550 is the correct 4xl test results

+------------------------+-----------+----------------+-------------------+
| RDS Reader | 4 | 16/128 | db.r6g.4xlarge |
+------------------------+-----------+----------------+-------------------+
| Elasticsearch Node | X | 4/32 | r6g.xlarge.search |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

X? @cwarnermm can I help here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nab-77 - you're correct that any X indicates a TBD value I couldn't clearly identify via the v2 results.

+------------------------+-----------+----------------+-------------------+
| RDS Reader | 4 | 16/128 | db.r6g.4xlarge |
+------------------------+-----------+----------------+-------------------+
| Elasticsearch Node | X | 4/32 | r6g.xlarge.search |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

X? @cwarnermm can I help here?

Copy link

github-actions bot commented Oct 3, 2024

Newest code from mattermost has been published to preview environment for Git SHA 1e812f5

@agarciamontoro
Copy link
Member

agarciamontoro commented Oct 3, 2024

Why are we mixing results from v1 and v2? I don't think we should do that, it will be quite confusing having different instance families in different tests.

@cwarnermm
Copy link
Member Author

@agarciamontoro - Existing v1 pages have been moved to the new table format, and v2 pages have been added. Looking to @nab-77 for guidance on how to move forward in a unified way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1: Dev Review Requires review by a core commiter 1: PM Review Requires review by a product manager
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants