
[Bug]: ROCm Fooocus doesn't garbage collect allocated VRAM #3257

Open · infinity0 opened this issue Jul 9, 2024 · 13 comments · May be fixed by #3263 or #3262

Labels: bug (AMD), help wanted

Comments

@infinity0 commented Jul 9, 2024

Checklist

  • The issue has not been resolved by following the troubleshooting guide
  • The issue exists on a clean installation of Fooocus
  • The issue exists in the current version of Fooocus
  • The issue has not been reported before recently
  • The issue has been reported before but has not been fixed yet

What happened?

On ROCm / amdgpu, Fooocus doesn't garbage collect used VRAM even after several hours. This means that other applications, such as other AI image generators, cannot use the VRAM and give "out of memory" errors.

Steps to reproduce the problem

  1. With ROCm with an AMD GPU, use Fooocus as normal - generate a few random pictures with default settings.
  2. Wait a few hours then run radeontop. See that VRAM allocation is still many GB.
  3. Use some other AI tool such as InvokeAI - generate a few random pictures with default settings.
  4. See that this other AI tool gives Out of VRAM errors.
  5. Close Fooocus and repeat (2, without waiting) and (3).
  6. VRAM allocation in radeontop is back down to normal levels. Also, the other AI tool succeeds.

What should have happened?

  1. Fooocus should release VRAM after it is finished generating images.

What browsers do you use to access Fooocus?

No response

Where are you running Fooocus?

Locally

What operating system are you using?

Debian GNU/Linux

Console logs

No relevant logs in Fooocus.

As described above, behaviour is observed empirically via other means, i.e.

1. `radeontop` VRAM usage before and after shutting down Fooocus
2. console logs of {other AI tool} before and after shutting down Fooocus.
   - before: OutOfMemoryError specifically VRAM
   - after: works fine

Additional information

No response

@infinity0 added the bug and triage labels on Jul 9, 2024
@infinity0 (Author):

Note this problem is unique to Fooocus/ROCm. With InvokeAI/ROCm, I can observe the VRAM being used as the image is generated, but it is correctly released after the generation is finished.

Fooocus, however, hangs onto the memory indefinitely (I waited literally days), preventing other AI tools from working. There is no way to force it to release the memory from the UI; the only way is to restart Fooocus. I'm using git master @ 5a71495 dated 2024-07-01.

@infinity0 (Author):

Also, both InvokeAI and Fooocus are using PyTorch/ROCm, so what I am asking for is clearly possible. Someone more familiar with the code could probably have a look at how InvokeAI handles VRAM allocations, and port that into Fooocus.

@mashb1t (Collaborator) commented Jul 9, 2024

I assume you're not using low VRAM mode, which would force unloading after generation (AFAIK).
Fooocus keeps the model loaded depending on configuration and startup arguments. Please provide the startup command for further debugging, thanks.

@mashb1t added the bug (AMD) and feedback pending labels and removed the bug and triage labels on Jul 9, 2024
@infinity0 (Author):

I'm running python3 entry_with_update.py. The problem occurs with any of these flags:

  • --always-offload-from-vram
  • --always-high-vram
  • --always-low-vram

Example usage: 12940M / 16165M VRAM (80.05%), which goes back down to 1594M / 16165M VRAM (9.86%) after I close Fooocus.

Low VRAM mode (--always-low-vram) doesn't seem to help either; I waited several minutes after generating and the VRAM usage is still >50%.
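For reference, a quick way to see how much of that usage is memory PyTorch is actively using versus memory its caching allocator is merely holding on to is to print the allocator counters from inside the process. This is only a diagnostic sketch (the helper name below is made up); it relies on the fact that ROCm builds of PyTorch expose the same torch.cuda.* API via HIP.

```python
import torch

def dump_vram_stats(tag: str = "") -> None:
    # Hypothetical diagnostic helper, not part of Fooocus: shows how much VRAM
    # PyTorch has actually allocated for tensors vs. how much its caching
    # allocator has reserved. On ROCm builds, torch.cuda.* maps to HIP.
    allocated_mib = torch.cuda.memory_allocated() / 2**20
    reserved_mib = torch.cuda.memory_reserved() / 2**20
    print(f"[{tag}] allocated: {allocated_mib:.0f} MiB, "
          f"reserved (cached): {reserved_mib:.0f} MiB")
```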

@mashb1t (Collaborator) commented Jul 9, 2024

This is somewhat normal: some things are kept in cache / RAM / VRAM so that Fooocus can generate images faster the next time, instead of having to load them again. There is also currently no offload button, but --always-offload-from-vram should work.

If you do not want this behaviour, you can change the code and try to trigger the offload manually after generation yourself:

https://github.com/lllyasviel/Fooocus/blob/main/ldm_patched/modules/model_management.py#L357

Sadly I don't have an AMD card and can't confirm the issue, so please connect with other community members who have one by opening a new discussion and referencing this issue. Thanks!
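For anyone who wants to experiment with that, here is a minimal sketch of a manual offload hook. It assumes the unload_all_models() and soft_empty_cache() helpers exposed by the linked ldm_patched/modules/model_management.py module; verify the names and signatures in your checkout before relying on it.

```python
# Minimal sketch, not an official Fooocus API: assumes unload_all_models() and
# soft_empty_cache() exist in your checkout of model_management.py.
from ldm_patched.modules import model_management

def release_vram_after_generation() -> None:
    # Move all loaded models back to system RAM ...
    model_management.unload_all_models()
    # ... then ask the caching allocator to return cached blocks to the driver.
    model_management.soft_empty_cache(force=True)
```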

@mashb1t added the help wanted label and removed the feedback pending label on Jul 9, 2024
@infinity0 (Author):

--always-offload-from-vram doesn't work.

@mashb1t (Collaborator) commented Jul 9, 2024

I got that, but I can't confirm for AMD as I don't have an AMD GPU. Please get in touch with other users by opening a new discussion.

@infinity0 (Author) commented Jul 9, 2024

Are you saying you don't believe bug reports until at least one other person has corroborated them? I don't see every issue being duplicated in "Discussions" in this way, but alright, if you insist.

In the meantime I've written a script to automatically restart Fooocus if there are no console logs for 120 seconds. For Fooocus this needs to be run as ./timeout.py python -u entry_with_update.py, for the reason described in the script's docstring.

```python
#!/usr/bin/python
"""Run a command, except kill-and-re-run it if it doesn't produce stdout/stderr
within a given timeout.

If the command is a python script, you MOST LIKELY need to run it as `python -u`
for this wrapper to work properly, since Python block-buffers its output by
default when it is not attached to a terminal.
"""
import psutil
import select
import sys
import subprocess
import signal
import threading

output_t_s = 120
sigint_t_s = 10
trmkil_t_s = 5
log_prefix = "================"

autorestart = True

def stop(subproc):
    global autorestart
    autorestart = True # only autorestart if the process was stopped by us

    print(log_prefix, 'send SIGINT', subproc)
    subproc.send_signal(signal.SIGINT)
    # send SIGINT to all children processes too, this matches the behaviour
    # when you ctrl-C in a shell, and is required for many complex programs to
    # interpret SIGINT in the expected way.
    for c in subproc.children(True):
        print(log_prefix, 'send SIGINT', c)
        c.send_signal(signal.SIGINT)

    try:
        subproc.wait(timeout=sigint_t_s)
    except subprocess.TimeoutExpired:
        print(log_prefix, 'send SIGTERM')
        subproc.terminate()
        try:
            subproc.wait(timeout=trmkil_t_s)
        except subprocess.TimeoutExpired:
            print(log_prefix, 'send SIGKILL')
            subproc.kill()
            try:
                subproc.wait(timeout=trmkil_t_s)
            except subprocess.TimeoutExpired:
                pass

def run(args): # run the command which is passed as a parameter to this script
    global autorestart
    autorestart = False # don't autorestart unless we called stop()
    subproc = psutil.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stopper = None

    print(log_prefix, 'running', args, subproc)

    while subproc.returncode is None:
        rs, _, _ = select.select([subproc.stdout, subproc.stderr], [], [], output_t_s)
        for rf in rs:
            data = rf.read1(65536)
            buf = sys.stdout.buffer if rf is subproc.stdout else sys.stderr.buffer
            buf.write(data)
            buf.flush()
        if not rs and stopper is None:
            # No output within output_t_s seconds: stop the process in a
            # separate thread so we can keep draining its output here.
            stopper = threading.Thread(target=lambda: stop(subproc))
            stopper.start()
        # Refresh returncode so the loop also exits if the process dies on its
        # own, not only after stop() has wait()ed on it.
        subproc.poll()

    if stopper:
        stopper.join()

while autorestart:
    run(sys.argv[1:])
```

The code uses select.select so it can watch both stdout and stderr, which is necessary because Fooocus writes to both.

@infinity0 (Author) commented Jul 9, 2024

I have asked the community here: #3258

@mashb1t (Collaborator) commented Jul 9, 2024

Are you saying you don't believe bug reports until at least one other person has corroborated them? I don't see every issue being duplicated in "Discussions" in this way, but alright, if you insist.

I got that, but I can't confirm for AMD as I don't have an AMD GPU.

I also can't debug and/or fix this as I don't have the necessary hardware, so somebody else has to fix it.
=> asking the community is the next best thing to do, don't you agree?

@infinity0 (Author) commented Jul 10, 2024

The current code intentionally does not free memory on ROCm, with a comment "seems to make things worse on ROCm".

ldm_patched/modules/model_management.py#L769 - blame, original commit by @lllyasviel

I don't see that it makes anything "worse", so here is a PR that fixes that and makes ROCm behave the same as CUDA: #3262

If @lllyasviel can remember what "worse" actually means, then here is an alternative, more conservative PR that forces the free only when the --always-offload-from-vram flag is given: #3263
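For context, the guard in question lives in soft_empty_cache() and looks roughly like the sketch below (paraphrased here, with a simplified stand-in for the module's own is_nvidia() check; the exact code in the linked file may differ). As described above, #3262 effectively lets ROCm take the same empty_cache() path as CUDA, while #3263 only takes it when the offload flag forces it.

```python
import torch

def is_nvidia() -> bool:
    # Simplified stand-in for the module's own check: a CUDA build that is not
    # a HIP/ROCm build.
    return torch.version.cuda is not None and torch.version.hip is None

def soft_empty_cache(force: bool = False) -> None:
    # Paraphrase of the relevant branch in ldm_patched/modules/model_management.py.
    if torch.cuda.is_available():
        # The in-source comment says emptying the cache "seems to make things
        # worse on ROCm", so upstream only does it on NVIDIA unless forced.
        if force or is_nvidia():
            torch.cuda.empty_cache()
            torch.cuda.ipc_collect()
```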

@infinity0 (Author):

With #3262, the current code will free memory between image generations on ROCm, which is what already happens on CUDA.

A more ideal behaviour would be to have a timeout before freeing the memory, so that we don't free it unnecessarily when we are about to immediately generate another image. However, the current code doesn't do this for CUDA or anything else, so I consider it out of scope for this issue.
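If someone does want to experiment with such a delayed free later, a rough sketch of a debounced release is below. It is not part of #3262 or #3263; the DeferredVramRelease name and the 60-second default are made up for illustration, and the empty_cache() call applies to ROCm because PyTorch's ROCm builds expose it through the torch.cuda namespace.

```python
import threading
import torch

class DeferredVramRelease:
    """Toy sketch of a debounced VRAM release, not part of Fooocus: each
    finished generation (re)arms a timer, and the allocator cache is only
    emptied once no new generation has completed for `delay_s` seconds."""

    def __init__(self, delay_s: float = 60.0):
        self.delay_s = delay_s
        self._timer = None
        self._lock = threading.Lock()

    def generation_finished(self) -> None:
        # Call this after each image finishes to restart the countdown.
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self.delay_s, self._release)
            self._timer.daemon = True
            self._timer.start()

    def _release(self) -> None:
        # Return cached blocks to the driver so other applications can use the VRAM.
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
```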

@fkleon commented Jul 21, 2024

Thanks @infinity0, I have also noticed in the past that the VRAM is not freed while Fooocus is running, and I've needed to shut it down when other applications want to make use of the GPU.

I've tried the fix from #3262 on my system with an RDNA2 card (ROCm 6.1, kernel 6.7) and it works perfectly fine so far.
