
[Bug]: ROCm Fooocus doesn't garbage collect allocated VRAM #3257

Open · infinity0 opened this issue Jul 9, 2024 · 13 comments · May be fixed by #3263 or #3262

Labels: bug (AMD), help wanted

Comments

@infinity0 commented Jul 9, 2024

Checklist

  • The issue has not been resolved by following the troubleshooting guide
  • The issue exists on a clean installation of Fooocus
  • The issue exists in the current version of Fooocus
  • The issue has not been reported before recently
  • The issue has been reported before but has not been fixed yet

What happened?

On ROCm / amdgpu, Fooocus doesn't garbage collect used VRAM even after several hours. This means that other applications, such as other AI image generators, cannot use the VRAM and give "out of memory" errors.

Steps to reproduce the problem

  1. With ROCm with an AMD GPU, use Fooocus as normal - generate a few random pictures with default settings.
  2. Wait a few hours then run radeontop. See that VRAM allocation is still many GB.
  3. Use some other AI tool such as InvokeAI - generate a few random pictures with default settings.
  4. See that this other AI tool gives Out of VRAM errors.
  5. Close Fooocus and repeat (2, without waiting) and (3).
  6. VRAM allocation in radeontop is back down to normal levels. Also, the other AI tool succeeds.

What should have happened?

  1. Fooocus should release VRAM after it is finished generating images.

What browsers do you use to access Fooocus?

No response

Where are you running Fooocus?

Locally

What operating system are you using?

Debian GNU/Linux

Console logs

No relevant logs in Fooocus.

As described above, behaviour is observed empirically via other means, i.e.

1. `radeontop` VRAM usage before and after shutting down Fooocus
2. console logs of {other AI tool} before and after shutting down Fooocus.
   - before: OutOfMemoryError specifically VRAM
   - after: works fine

Additional information

No response

@infinity0 added the bug and triage labels on Jul 9, 2024
@infinity0 (Author):

Note this problem is unique to Fooocus/ROCm. With InvokeAI/ROCm, I can observe the VRAM being used as the image is generated, but it is correctly released after the generation is finished.

Fooocus, however, hangs onto the memory indefinitely (I waited literally days), preventing other AI tools from working. There is no way to force it to release the memory from the UI; the only way is to restart Fooocus. I'm using git master @ 5a71495 dated 2024-07-01.

@infinity0 (Author):

Also, both InvokeAI and Fooocus are using PyTorch/ROCm, so what I am asking for is clearly possible. Someone more familiar with the code could probably have a look at how InvokeAI handles VRAM allocations, and port that into Fooocus.

@mashb1t (Collaborator) commented Jul 9, 2024

I assume you're not using low VRAM mode, which would force unloading after generation (AFAIK).
Fooocus keeps the model loaded depending on configuration and startup arguments. Please provide the startup command for further debugging, thanks.

@mashb1t added the bug (AMD) and feedback pending labels and removed the bug and triage labels on Jul 9, 2024
@infinity0 (Author):

I'm running python3 entry_with_update.py. The problem occurs with any of these flags:

  • --always-offload-from-vram
  • --always-high-vram
  • --always-low-vram

Example usage: 12940M / 16165M VRAM (80.05%), which goes back down to 1594M / 16165M VRAM (9.86%) after I close Fooocus.

Low VRAM mode (--always-low-vram) doesn't seem to help either; I waited several minutes after generating and the VRAM usage is still >50%.
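For reference, a quick way to see how much of that usage is memory PyTorch is actively using versus memory its caching allocator is merely holding on to is to print the allocator counters from inside the process. This is only a diagnostic sketch (the helper name below is made up); it relies on the fact that ROCm builds of PyTorch expose the same torch.cuda.* API via HIP.

```python
import torch

def dump_vram_stats(tag: str = "") -> None:
    # Hypothetical diagnostic helper, not part of Fooocus: shows how much VRAM
    # PyTorch has actually allocated for tensors vs. how much its caching
    # allocator has reserved. On ROCm builds, torch.cuda.* maps to HIP.
    allocated_mib = torch.cuda.memory_allocated() / 2**20
    reserved_mib = torch.cuda.memory_reserved() / 2**20
    print(f"[{tag}] allocated: {allocated_mib:.0f} MiB, "
          f"reserved (cached): {reserved_mib:.0f} MiB")
```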

@mashb1t (Collaborator) commented Jul 9, 2024

This is somewhat normal: some things are kept in cache / RAM / VRAM so that Fooocus can generate images faster the next time, instead of having to load them again. There is also currently no offload button, but --always-offload-from-vram should work.

If you do not want this behaviour, you can change the code and try to trigger the offload manually after generation yourself:

https://github.com/lllyasviel/Fooocus/blob/main/ldm_patched/modules/model_management.py#L357

Sadly I don't have an AMD card and can't confirm the issue, so please connect with other community members who have one by opening a new discussion and referencing this issue. Thanks!
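For anyone who wants to experiment with that, here is a minimal sketch of a manual offload hook. It assumes the unload_all_models() and soft_empty_cache() helpers exposed by the linked ldm_patched/modules/model_management.py module; verify the names and signatures in your checkout before relying on it.

```python
# Minimal sketch, not an official Fooocus API: assumes unload_all_models() and
# soft_empty_cache() exist in your checkout of model_management.py.
from ldm_patched.modules import model_management

def release_vram_after_generation() -> None:
    # Move all loaded models back to system RAM ...
    model_management.unload_all_models()
    # ... then ask the caching allocator to return cached blocks to the driver.
    model_management.soft_empty_cache(force=True)
```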

@mashb1t added the help wanted label and removed the feedback pending label on Jul 9, 2024
@infinity0 (Author):

--always-offload-from-vram doesn't work.

@mashb1t (Collaborator) commented Jul 9, 2024

I got that, but I can't confirm for AMD as I don't have an AMD GPU. Please get in touch with other users by opening a new discussion.

@infinity0 (Author) commented Jul 9, 2024

Are you saying you don't believe bug reports until at least one other person has corroborated them? I don't see every issue being duplicated in "Discussions" in this way, but alright, if you insist.

In the meantime I've written a script to automatically restart Fooocus if there are no console logs for 120 seconds. For Fooocus this needs to be run as ./timeout.py python -u entry_with_update.py, for the reason described in the script's docstring.

```python
#!/usr/bin/python
"""Run a command, except kill-and-re-run it if it doesn't produce stdout/stderr
within a given timeout.

If the command is a python script, you MOST LIKELY need to run it as `python -u`
for this wrapper to work properly, since Python block-buffers its output by
default when it is not attached to a terminal.
"""
import psutil
import select
import sys
import subprocess
import signal
import threading

output_t_s = 120
sigint_t_s = 10
trmkil_t_s = 5
log_prefix = "================"

autorestart = True

def stop(subproc):
    global autorestart
    autorestart = True # only autorestart if the process was stopped by us

    print(log_prefix, 'send SIGINT', subproc)
    subproc.send_signal(signal.SIGINT)
    # send SIGINT to all children processes too, this matches the behaviour
    # when you ctrl-C in a shell, and is required for many complex programs to
    # interpret SIGINT in the expected way.
    for c in subproc.children(True):
        print(log_prefix, 'send SIGINT', c)
        c.send_signal(signal.SIGINT)

    try:
        subproc.wait(timeout=sigint_t_s)
    except subprocess.TimeoutExpired:
        print(log_prefix, 'send SIGTERM')
        subproc.terminate()
        try:
            subproc.wait(timeout=trmkil_t_s)
        except subprocess.TimeoutExpired:
            print(log_prefix, 'send SIGKILL')
            subproc.kill()
            try:
                subproc.wait(timeout=trmkil_t_s)
            except subprocess.TimeoutExpired:
                pass

def run(args): # run the command which is passed as a parameter to this script
    global autorestart
    autorestart = False # don't autorestart unless we called stop()
    subproc = psutil.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stopper = None

    print(log_prefix, 'running', args, subproc)

    while subproc.returncode is None:
        rs, _, _ = select.select([subproc.stdout, subproc.stderr], [], [], output_t_s)
        for rf in rs:
            data = rf.read1(65536)
            buf = sys.stdout.buffer if rf is subproc.stdout else sys.stderr.buffer
            buf.write(data)
            buf.flush()
        if not rs and stopper is None:
            # No output within output_t_s seconds: stop the process in a
            # separate thread so we can keep draining its output here.
            stopper = threading.Thread(target=lambda: stop(subproc))
            stopper.start()
        # Refresh returncode so the loop also exits if the process dies on its
        # own, not only after stop() has wait()ed on it.
        subproc.poll()

    if stopper:
        stopper.join()

while autorestart:
    run(sys.argv[1:])
```

The code uses select.select so it can watch both stdout and stderr, which is necessary because Fooocus writes to both.

@infinity0 (Author) commented Jul 9, 2024

I have asked the community here: #3258

@mashb1t (Collaborator) commented Jul 9, 2024

Are you saying you don't believe bug reports until at least one other person has corroborated them? I don't see every issue being duplicated in "Discussions" in this way, but alright, if you insist.

I got that, but I can't confirm for AMD as I don't have an AMD GPU.

I also can't debug and/or fix this as I don't have the necessary hardware, so somebody else has to fix it.
=> asking the community is the next best thing to do, don't you agree?

@infinity0 (Author) commented Jul 10, 2024

The current code intentionally does not free memory on ROCm, with a comment "seems to make things worse on ROCm".

ldm_patched/modules/model_management.py#L769 - blame, original commit by @lllyasviel

I don't see that it makes anything "worse", so here is a PR that fixes that and makes ROCm behave the same as CUDA: #3262

If @lllyasviel can remember what "worse" actually means, then here is an alternative, more conservative PR that forces the free only when the --always-offload-from-vram flag is given: #3263
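For context, the guard in question lives in soft_empty_cache() and looks roughly like the sketch below (paraphrased here, with a simplified stand-in for the module's own is_nvidia() check; the exact code in the linked file may differ). As described above, #3262 effectively lets ROCm take the same empty_cache() path as CUDA, while #3263 only takes it when the offload flag forces it.

```python
import torch

def is_nvidia() -> bool:
    # Simplified stand-in for the module's own check: a CUDA build that is not
    # a HIP/ROCm build.
    return torch.version.cuda is not None and torch.version.hip is None

def soft_empty_cache(force: bool = False) -> None:
    # Paraphrase of the relevant branch in ldm_patched/modules/model_management.py.
    if torch.cuda.is_available():
        # The in-source comment says emptying the cache "seems to make things
        # worse on ROCm", so upstream only does it on NVIDIA unless forced.
        if force or is_nvidia():
            torch.cuda.empty_cache()
            torch.cuda.ipc_collect()
```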

@infinity0 (Author):

With #3262, the current code will free memory between image generations on ROCm, which is what already happens on CUDA.

A more ideal behaviour would be to have a timeout before freeing the memory, so that we don't free it unnecessarily when we are about to immediately generate another image. However, the current code doesn't do this for CUDA or anything else, so I consider it out of scope for this issue.
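If someone does want to experiment with such a delayed free later, a rough sketch of a debounced release is below. It is not part of #3262 or #3263; the DeferredVramRelease name and the 60-second default are made up for illustration, and the empty_cache() call applies to ROCm because PyTorch's ROCm builds expose it through the torch.cuda namespace.

```python
import threading
import torch

class DeferredVramRelease:
    """Toy sketch of a debounced VRAM release, not part of Fooocus: each
    finished generation (re)arms a timer, and the allocator cache is only
    emptied once no new generation has completed for `delay_s` seconds."""

    def __init__(self, delay_s: float = 60.0):
        self.delay_s = delay_s
        self._timer = None
        self._lock = threading.Lock()

    def generation_finished(self) -> None:
        # Call this after each image finishes to restart the countdown.
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self.delay_s, self._release)
            self._timer.daemon = True
            self._timer.start()

    def _release(self) -> None:
        # Return cached blocks to the driver so other applications can use the VRAM.
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
```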

@fkleon commented Jul 21, 2024

Thanks @infinity0, I have also noticed in the past that the VRAM is not freed while Fooocus is running, and I've needed to shut it down when other applications want to make use of the GPU.

I've tried the fix from #3262 on my system with an RDNA2 card (ROCm 6.1, kernel 6.7) and it works perfectly fine so far.
