Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ch4: default MPIR_CVAR_CH4_ROOTS_ONLY_PMI on #7110

Merged
merged 4 commits into from
Aug 30, 2024
Merged

Conversation

hzhou
Copy link
Contributor

@hzhou hzhou commented Aug 21, 2024

Pull Request Description

Using ROOTS_ONLY_PMI significantly improves the large-scale init time. It is about time to make it default.

[skip warnings]

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your companies PR approval manager.

@hzhou
Copy link
Contributor Author

hzhou commented Aug 26, 2024

test:mpich/ch4/all

@hzhou
Copy link
Contributor Author

hzhou commented Aug 28, 2024

test:mpich/ch4/ofi

@hzhou hzhou requested a review from raffenet August 28, 2024 19:33
if (MPIDI_OFI_ENABLE_AV_TABLE) {
for (int r = 0; r < size; r++) {
MPIDI_OFI_addr_t *av ATTRIBUTE((unused)) = &MPIDI_OFI_AV(&MPIDIU_get_av(0, r));
MPIR_Assert(av->dest[0][0] == get_root_av_table_index(r));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what exactly is this verifying? I'm confused why the index into av->dest doesn't change with the rank r.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The av is the entry associated with the rank r.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason we do this checking is explained in the comments at the front of the file. We compressed the dimensions of the av table based on the assumption that all the remote av are inserted exactly with the same order on each local context (vci, nic), thus they should have exactly the same fi_addr. The check confirms this assumption is still correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. av->dest[0][0] has type fi_addr_t if I'm not mistaken? And we are comparing that with an integer rank?

Copy link
Contributor Author

@hzhou hzhou Aug 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is an integer under MPIDI_OFI_ENABLE_AV_TABLE. We are not comparing it to rank, it is the index to the av table.

Use ROOTS_ONLY_PMI significantly improves the large scale init time.
It is about time to make it default.
Since we moved multi vci init to MPIDI_world_post_init, we should dump
the debug info for multi vcis to MPIDI_OFI_post_init as well.
The logic in calculating index for the MPIR_CVAR_CH4_ROOTS_ONLY_PMI was
incorrect. It leaked through CI testing because the check was only done
in MPIDI_OFI_addr_exchange_all_ctx, which only runs with multiple vci,
and we never tested the combination.

* split get_av_table_index and get_root_av_table_index and add assertion
check in MPIDI_OFI_addr_exchange_root_ctx.

* fix the av index calculation not assuming processes ranks are
clustered in each node.
* In MPIDU_bc_allgather when we overwrite bc_len due to not of same_len,
we need fillout the bc_table according to the new max bc_len.

* The bc_table after MPIDU_bc_allgather is not in the rank-order, but in
the node-consecutive order. We need use rank_map to look up the table
slot.
@hzhou
Copy link
Contributor Author

hzhou commented Aug 30, 2024

Thanks for the review @raffenet . The tests are not clean, but all of them are testing congestion issue that are not related to this PR. I'll merge now. If there are recurring testing issues, we'll address them afterward.

@hzhou hzhou merged commit d372187 into pmodels:main Aug 30, 2024
4 checks passed
@hzhou hzhou deleted the 2408_roots_only branch August 30, 2024 20:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants