
CUDA and NVENC load balancing #520

Closed

totaam opened this issue Feb 18, 2014 · 11 comments

Comments

@totaam
Collaborator

totaam commented Feb 18, 2014

Issue migrated from trac ticket #520

component: server | priority: major | resolution: worksforme

2014-02-18 03:40:24: totaam created the issue


Related to #504 and #466.

When we have multiple cards and/or multiple virtual cards (GRID K1, K2 and others) in the same server, we want to ensure that the load is fairly evenly distributed amongst all the (v)GPUs.

With CUDA, this isn't a problem. But with NVENC, we have no way of knowing how many contexts are still free: once we reach the limit, creating a new context simply fails...
We cannot assume that we are the only user of the device on the system, especially with proxy encoding (#504) where each proxy instance runs in its own process space.

The code added in r5488 moves the CUDA device selection (amongst other things) to a utility module and uses the percentage of free memory to choose which device to use. Since there are normally up to 32 contexts per GPU, this should work as a cheap load balancing solution: even with 4 vGPUs per PCIe slot, things will even out before we reach 20% capacity. This does not take into account the size of the encoding contexts, but since we reserve large context buffers in all cases (see r5442 - done to support #410) and since the sizes should be randomly distributed anyway, this should not be too much of a problem.
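
For illustration, here is a minimal sketch of free-memory based device selection, assuming pycuda is available; the function is a stand-in and does not mirror the actual code in the utility module:

```python
# Hedged sketch only: pick the CUDA device with the highest percentage of
# free memory. Assumes pycuda; names are illustrative, not xpra's real API.
import pycuda.driver as cuda

def select_device_by_free_memory():
    cuda.init()
    best_index, best_ratio = -1, -1.0
    for i in range(cuda.Device.count()):
        context = cuda.Device(i).make_context()
        try:
            free_bytes, total_bytes = cuda.mem_get_info()
        finally:
            context.pop()
            context.detach()
        ratio = float(free_bytes) / float(total_bytes)
        if ratio > best_ratio:
            best_index, best_ratio = i, ratio
    return best_index
```
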
We lower the NVENC codec score as we create more contexts, and we also keep track of context failures to lower the score further (taking into account how recent the failure was). This should ensure that as we get closer to the limit we become less likely to try NVENC, and that when we do hit the hard limit there is a gradual grace period before we try it again.
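
A hedged sketch of the kind of score discount described above - the constants, names and linear scaling are assumptions for illustration, not the actual scoring code:

```python
# Illustrative only: discount the NVENC codec score as contexts pile up,
# and further after a recent context failure (fading as the failure ages).
import time

MAX_CONTEXTS = 32        # nominal NVENC context limit per GPU (assumption)
FAILURE_GRACE = 60.0     # seconds over which a failure stops mattering (assumption)

def nvenc_score(base_score, active_contexts, last_failure_time=0.0):
    # scale down linearly as we approach the context limit:
    score = base_score * (1.0 - float(active_contexts) / MAX_CONTEXTS)
    # a recent failure pushes the score down further, fading with time:
    if last_failure_time:
        age = time.time() - last_failure_time
        if age < FAILURE_GRACE:
            score *= age / FAILURE_GRACE
    return max(0, int(score))
```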

What remains to be done:

  • link NVENC context failures to the CUDA device they occurred on: other devices may still have free contexts, and we should try those first when asked to create a new NVENC context (see the sketch after this list)
  • maybe time out the contexts: a context that has not been used for N seconds could probably be put to better use (this may depend on the current load - which is difficult to estimate from a proxy encoder context..)
  • in the context of proxy encoding, as we lower the NVENC codec score we will still receive RGB frames from the server being proxied, so we need to fall back to x264 or another encoding. At the moment, we fail hard if we cannot find a fallback video encoder..
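
A minimal sketch of the device ordering implied by the first item, assuming a hypothetical per-device record of free-memory ratio and last failure time; this is not the actual implementation:

```python
# Illustrative only: devices without a recent failure come first, and within
# each group the device with the most free memory wins.
import time

FAILURE_WINDOW = 60.0    # how long a failure keeps a device de-prioritized (assumption)

def order_devices(devices, now=None):
    # devices: list of (device_id, free_ratio, last_failure_time) tuples
    now = now if now is not None else time.time()
    def sort_key(entry):
        _, free_ratio, last_failure = entry
        recently_failed = bool(last_failure) and (now - last_failure) < FAILURE_WINDOW
        # False sorts before True, so failure-free devices come first;
        # within each group, prefer the highest free-memory ratio:
        return (recently_failed, -free_ratio)
    return sorted(devices, key=sort_key)
```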

Notes:

@totaam
Collaborator Author

totaam commented Feb 18, 2014

2014-02-18 04:05:49: totaam changed status from new to assigned

@totaam
Collaborator Author

totaam commented Feb 18, 2014

2014-02-18 04:05:49: totaam changed owner from antoine to totaam

@totaam
Collaborator Author

totaam commented Feb 18, 2014

2014-02-18 04:05:49: totaam edited the issue description

@totaam
Collaborator Author

totaam commented Feb 18, 2014

2014-02-18 04:05:49: totaam changed title from cuda and nvenc load balancing to CUDA and NVENC load balancing

@totaam
Collaborator Author

totaam commented Feb 18, 2014

2014-02-18 09:42:52: totaam changed status from assigned to new

@totaam
Collaborator Author

totaam commented Feb 18, 2014

2014-02-18 09:42:52: totaam changed owner from totaam to smo

@totaam
Collaborator Author

totaam commented Feb 18, 2014

2014-02-18 09:42:52: totaam commented


  • r5492 lets us fall back to any other video encoder if we cannot instantiate the one we want: it will try x264 and even vpx if present. Failing that, we try jpeg, and as a last resort we default to plain RGB as we received it (which is bad, but should never happen, and is still better than not sending any pixels at all!) - a rough sketch of this fallback order follows the list
  • r5496 will try CUDA devices that have not had recent failures ahead of those that did - this takes precedence over the "percentage of free memory" sorting and should allow us to get close to 100% occupancy on multi-GPU / vGPU setups
  • r5497 will time out video encoding contexts left unused for more than 5 seconds (also sketched below). This applies to the proxy instance only - not to the main server yet, since it is less likely to see as much contention for resources
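
A rough sketch of the fallback order described in the first item above; the make_encoder() factory and the encoding names are hypothetical stand-ins, not the real codec loader:

```python
# Illustrative only: try the preferred encoder first, then fall back through
# x264, vpx and jpeg, ending with plain RGB passthrough as a last resort.
FALLBACK_ORDER = ("nvenc", "x264", "vpx", "jpeg", "rgb")

def get_working_encoder(preferred, make_encoder):
    candidates = [preferred] + [e for e in FALLBACK_ORDER if e != preferred]
    for encoding in candidates:
        try:
            return encoding, make_encoder(encoding)
        except Exception:
            # creation failed (ie: no free NVENC context), try the next one
            continue
    raise RuntimeError("no usable encoder found")
```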

(r5493 was missing from previous commits - oops)
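
And a rough sketch of the r5497 idle timeout, assuming a hypothetical registry of encoder contexts keyed by window id; only the 5 second value comes from the note above, everything else is illustrative:

```python
# Illustrative only: expire video encoder contexts left unused for more than
# IDLE_TIMEOUT seconds, as done in the proxy instance.
import time

IDLE_TIMEOUT = 5.0   # seconds

class ContextRegistry(object):
    def __init__(self):
        self.contexts = {}   # window id -> (encoder_context, last_used_timestamp)

    def mark_used(self, wid, context):
        self.contexts[wid] = (context, time.time())

    def expire_idle(self):
        now = time.time()
        for wid, (context, last_used) in list(self.contexts.items()):
            if now - last_used > IDLE_TIMEOUT:
                context.clean()   # hypothetical cleanup call
                del self.contexts[wid]
```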

smo: this is good enough for some testing... and I only have one card, so I cannot really test it very well.

Things to look out for:

  • nvenc memory leak #517 regressions: I believe this load balancing code should be safe from leaks, but I cannot be certain. The easiest test is to resize a fast-updating window (and if possible, do that on two contexts that live on different cards..), which should cause many encoder re-inits: destroying and creating new NVENC contexts for the new window sizes. The GPU's free memory should remain relatively constant throughout.
  • utilization: can we get close to 100% of encoding contexts used? (32 contexts per card - this will take a lot of clients and windows)
  • start multiple servers or use proxy encoding (delegated encoding mode #504): does the code still manage to allocate encoding contexts properly? (and fall back / retry as needed)
  • initial connection delay: having to initialize the CUDA context after the client connects adds an extra delay, even more so when there are multiple cards to probe. Is it bearable?
  • etc..

@totaam
Collaborator Author

totaam commented May 15, 2014

2014-05-15 21:39:20: smo commented


  • Haven't found any leaks and was able to load balance up to 10 sessions.
  • Tested proxy encoding with 10 sessions with no issues.
  • The connection delay doesn't seem to be an issue and is hardly noticeable.

Closing for now; will re-open if there are issues.

@totaam
Collaborator Author

totaam commented May 15, 2014

2014-05-15 21:39:39: smo changed status from new to closed

@totaam
Collaborator Author

totaam commented May 15, 2014

2014-05-15 21:39:39: smo changed resolution from ** to worksforme

@totaam totaam closed this as completed May 15, 2014
@totaam
Collaborator Author

totaam commented Nov 21, 2019

2019-11-21 07:44:03: antoine commented


See also: new CUDA load balancing feature in #2416.
