
Out of CUDA memory when training #14

Open
antithing opened this issue Jun 28, 2023 · 3 comments

Comments

@antithing

On a single RTX 3090. Is there a param I can adjust to make this work?

Thanks

@antithing
Author

I am also trying to train a model at 1920x1080. When I change the batch size and resolution in the config:

```json
{
    "seed": 2020,
    "save_dir": "release_model/",
    "data_loader": {
        "name": "davis",
        "data_root": "datasets/",
        "w": 1920,
        "h": 1080,
        "sample_length": 10
    },
    "losses": {
        "hole_weight": 1,
        "valid_weight": 1,
        "adversarial_weight": 0.01,
        "GAN_LOSS": "hinge"
    },
    "trainer": {
        "type": "Adam",
        "beta1": 0,
        "beta2": 0.99,
        "lr": 1e-4,
        "d2glr": 1,
        "batch_size": 4,
        "num_workers": 1,
        "verbosity": 2,
        "log_step": 100,
        "save_freq": 1e4,
        "valid_freq": 1e4,
        "iterations": 50e4,
        "niter": 30e4,
        "niter_steady": 30e4
    }
}
```

I see this error:

```
inpainting\STTN-master\STTN-master\model\sttn.py", line 188, in forward
    mm = m.view(b, t, 1, out_h, height, out_w, width)
RuntimeError: shape '[4, 10, 1, 60, 60, 108, 108]' is invalid for input of size 259200
```
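For what it's worth, my reading of `model/sttn.py` is that the `view` fails because the hard-coded `patchsize` list no longer tiles the resized feature maps once `w`/`h` change in the config. A minimal sketch of the constraint (the 1/4 downsampling factor and the default patch list are my assumptions, inferred from the code, not verified numbers):

```python
# Sketch of the constraint behind the RuntimeError above (my reading of
# model/sttn.py; the downsample factor and default patch list are assumptions):
# the mask is viewed as (b, t, 1, h//height, height, w//width, width), so
# each patch (width, height) must divide the internal feature map exactly.
def patches_fit(w, h, patchsize, downsample=4):
    fw, fh = w // downsample, h // downsample  # spatial size inside the model
    return all(fw % pw == 0 and fh % ph == 0 for pw, ph in patchsize)

default_patches = [(108, 60), (36, 20), (18, 10), (9, 5)]

# The default patches tile the default 432x240 resolution:
print(patches_fit(432, 240, default_patches))    # True
# ...but not 1920x1080, hence the invalid-shape error:
print(patches_fit(1920, 1080, default_patches))  # False
```

If that reading is right, the patch list has to be rescaled alongside the resolution, not just `w`, `h`, and `batch_size`.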

@alex-flwls

Try a lower batch size; set it to 1 and see what happens. Would it also be possible to run the training at half or a quarter of your resolution and then upscale? Transformer attention is notorious for scaling quadratically with its input, so an HD input with temporal attention is unlikely to fit in 24GB of VRAM (in my opinion, I could be wrong).
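To put rough numbers on the quadratic scaling (all constants below are illustrative assumptions, not measured STTN figures):

```python
# Back-of-the-envelope size of a raw attention matrix: tokens grow with
# frame area times clip length, and the matrix is tokens squared. The 1/4
# downsampling, fp32, and single-head assumptions here are illustrative.
def attn_matrix_gib(w, h, t, patch=(16, 9), downsample=4, bytes_per=4):
    fw, fh = w // downsample, h // downsample
    tokens = (fw // patch[0]) * (fh // patch[1]) * t
    return tokens ** 2 * bytes_per / 2 ** 30

full = attn_matrix_gib(1920, 1080, t=10)  # HD clip
half = attn_matrix_gib(960, 540, t=10)    # same clip at half resolution
print(round(full / half))                 # 16: halving w and h cuts this term ~16x
```

So dropping to half resolution buys roughly a 16x reduction on the attention-matrix term alone, which is why downscaling tends to help far more than trimming the batch size.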

@alex-flwls

I got this running at HD resolution (1920x1080) on an Nvidia A10G (24GB VRAM). This is for inference; I haven't tried training a new model yet.

Here are the patch sizes I used (in `model/sttn.py`):

```python
patchsize = [(480, 270), (160, 90), (32, 18), (16, 9)]
```

And here are the hyperparameters from `test.py`:

```python
w, h = 1920, 1080
ref_length = 10
neighbor_stride = 3
default_fps = 24
```

The results aren't great, though. I think this is possibly because limiting the number of neighbour and reference frames inhibits the model's ability to infer inpainted regions. Changing the patch sizes from the ones used in training probably isn't helping either.
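For anyone tuning `neighbor_stride` and `ref_length`: as I read `test.py`, neighbours come from a sliding window around the current frame and references are sampled on a fixed stride across the whole clip, so shrinking either to save memory directly shrinks the temporal context the model sees. A sketch (function name and exact indexing are my approximation, not the repo's code):

```python
# Approximate sketch of STTN's frame selection at inference time (my
# reading of test.py; names and exact boundary handling are assumptions).
def pick_frames(f, video_length, neighbor_stride=3, ref_length=10):
    # Sliding window of nearby frames around frame f...
    neighbors = list(range(max(0, f - neighbor_stride),
                           min(video_length, f + neighbor_stride + 1)))
    # ...plus long-range reference frames sampled every ref_length frames.
    refs = [i for i in range(0, video_length, ref_length)
            if i not in neighbors]
    return neighbors, refs

print(pick_frames(0, 50))  # ([0, 1, 2, 3], [10, 20, 30, 40])
```

Under that reading, a smaller `neighbor_stride` narrows the local window and a larger `ref_length` thins out the global references, which would explain the quality drop at HD settings.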
