
DEVOPS-424 fix: add explicit GPU resource limit #63

Merged · 1 commit into master · Oct 25, 2023
Conversation

tplessas (Contributor)

The NVIDIA plugin is already updated on both clusters. I tested that these changes work on the review environment by deploying several coref-resolution replicas under the release coref-multitest, which I'll leave up until this PR is merged.

[screenshot: coref-multitest replicas deployed]

Two replicas ended up on the same node, which also allows us to confirm that time-slicing works fine (barring any changes related to memory such as those discussed this morning).

[screenshot: two replicas scheduled on the same node]
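For reference, the explicit limit this PR adds looks roughly like the following deployment fragment. This is a minimal sketch; the actual Helm values and the requested count in the chart may differ:

```yaml
# Hypothetical pod spec fragment illustrating an explicit GPU resource limit.
# With time-slicing enabled, the value must be a whole number (typically 1).
resources:
  limits:
    nvidia.com/gpu: 1  # claim one shared GPU slice from the device plugin
```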

@tplessas tplessas requested a review from a team as a code owner October 25, 2023 10:33
@tplessas tplessas changed the title fix: add explicit GPU resource limit DEVOPS-424 fix: add explicit GPU resource limit Oct 25, 2023
@tplessas tplessas merged commit ef34ae7 into master Oct 25, 2023
1 check passed
@tplessas tplessas deleted the add-gpu-limit branch October 25, 2023 13:15
@anton-delphai (Contributor)

Why do we need that if we set the capacity to a very big number? Isn't that equivalent to not requesting GPUs at all?

@tplessas (Contributor, Author) commented Oct 27, 2023

Time-slicing does not actually do what its name implies: the resource request is only used to claim a generic stake on the GPU, not a proportional share of its compute. This is why the resource limit cannot have a fractional part or be more than 1.

From https://github.com/NVIDIA/k8s-device-plugin#shared-access-to-gpus-with-cuda-time-slicing:

Note: Unlike with "normal" GPU requests, requesting more than one shared GPU does not imply that you will get guaranteed access to a proportional amount of compute power. It only implies that you will get access to a GPU that is shared by other clients (each of which has the freedom to run as many processes on the underlying GPU as they want). Under the hood CUDA will simply give an equal share of time to all of the GPU processes across all of the clients. The failRequestsGreaterThanOne flag is meant to help users understand this subtlety, by treating a request of 1 as an access request rather than an exclusive resource request.

Everything works just as it did before this PR and the cluster update; the only addition is some more k8s bureaucracy to achieve it.
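For context, the device-plugin side of this is configured with a time-slicing config of roughly the following shape, per the NVIDIA k8s-device-plugin README linked above. The replica count here is illustrative, not necessarily what our clusters use:

```yaml
# Illustrative NVIDIA k8s-device-plugin time-slicing config (values are assumptions).
version: v1
sharing:
  timeSlicing:
    failRequestsGreaterThanOne: true  # reject requests > 1, since a larger request grants no extra compute
    resources:
    - name: nvidia.com/gpu
      replicas: 4  # advertise each physical GPU as 4 schedulable slices
```

With this in place, a pod requesting `nvidia.com/gpu: 1` is granted access to a shared GPU rather than exclusive use of one.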
