Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initialize hwloc topology with disallowed nodes to accurately know available resources on system #1251

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

msimberg
Copy link
Contributor

When cgroups are used (e.g. by srun) to constrain allowed PUs for a process, hwloc will by default only report the PUs and other resources that are allowed to be used with the cgroups restrictions. This can lead to pika thinking that the system has only n PUs when in reality it has m > n PUs. In itself this isn't a problem, but currently on pika main we derive the number of available PUs from the resources that hwloc reports, and thus also the CPU mask sizes may be smaller than the whole system. If using cgroups and --pika:process-mask/PIKA_PROCESS_MASK with a mask that doesn't contain the first n PUs pika's initialization will currently access CPU masks out of bounds or report an error.

This PR proposes to initialize the hwloc topology with the flag HWLOC_TOPOLOGY_FLAG_INCLUDE_DISALLOWED, which will include resources that may not be usable because of cgroups. This allows us to accurately get the maximum size of CPU masks.

Even though we request all resources internally, pika will not be able to bind threads to PUs that are outside of the cgroups restrictions. A warning will be printed if a mask is set outside of the restrictions.

I've relaxed an error to a warning when pika's expected CPU mask for a thread does not match what is retrieved with hwloc. This happens if the thread affinity can't be set.

I've also cleaned up some of the logging in scheduled_thread_pool_impl.hpp.

One downside of this approach is that, when using cgroups, hwloc logical PU indices will not correspond to indices reported by e.g. lstopo. Physical PU indices are exactly the same as before. I think this is a small inconsistency that's worth the price.

A possible alternative approach is to create a separate topology object with HWLOC_TOPOLOGY_FLAG_INCLUDE_DISALLOWED only for counting the number of PUs etc. to correctly size CPU masks, but then use the a topology object without disallowed resources afterwards.

@msimberg msimberg added this to the 0.29.0 milestone Sep 16, 2024
@msimberg msimberg self-assigned this Sep 16, 2024
Copy link

codacy-production bot commented Sep 16, 2024

Coverage summary from Codacy

See diff coverage on Codacy

Coverage variation Diff coverage
+0.03% (target: -1.00%) 70.59% (target: 90.00%)
Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (d766280) 18346 13763 75.02%
Head commit (9369fac) 18351 (+5) 13772 (+9) 75.05% (+0.03%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#1251) 17 12 70.59%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

See your quality gate settings    Change summary preferences

Codacy stopped sending the deprecated coverage status on June 5th, 2024. Learn more

Copy link
Contributor

@aurianer aurianer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks! Do you think it's worth adding a comment in the code about logical PU indices not matching lstopo's ones?

@msimberg
Copy link
Contributor Author

Do you think it's worth adding a comment in the code about logical PU indices not matching lstopo's ones?

Maybe, I'll have a look to see where it could fit. Note: The indices will match if no cgroups restrictions are in places. It's only when restrictions are in place that they may not match. Process masks (from hwloc-bind, taskset, slurm etc.) are anyway always using physical indices, so no risk of ambiguity there.

This allows accurately knowing how many PUs etc. are available on the
system, even if all of them can't be used. When using e.g. cgroups, PUs
can be restricted so that hwloc does not report them by default in the
topology. Including all PUs with HWLOC_TOPOLOGY_FLAG_INCLUDE_DISALLOWED
means that masks can be correctly sized for all physical and logical
indices.
…icate function name and explicitly call out pool name
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

2 participants