Error with grid_sampler_2d_backward() #145

zikpefu · 2022-03-31T05:41:38Z

Describe the bug
I get the following traceback every time I try to train stylegan3:

/home/zikpefu/ece8550/final/stylegan3/training/augment.py:231: UserWarning: Specified kernel cache directory could not be created! This disables kernel caching. Specified directory is /home/zikpefu/.cache/torch/kernels. This warning will appear only once per process. (Triggered internally at  ../aten/src/ATen/native/cuda/jit_utils.cpp:860.)
  s = torch.exp2(torch.randn([batch_size], device=device) * self.scale_std)
Traceback (most recent call last):
  File "train.py", line 286, in <module>
    main() # pylint: disable=no-value-for-parameter
  File "/software/spackages/linux-centos8-x86_64/gcc-8.3.1/anaconda3-2019.10-v5cuhr6keyz5ryxcwvv2jkzfj2gwrj4a/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/software/spackages/linux-centos8-x86_64/gcc-8.3.1/anaconda3-2019.10-v5cuhr6keyz5ryxcwvv2jkzfj2gwrj4a/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/software/spackages/linux-centos8-x86_64/gcc-8.3.1/anaconda3-2019.10-v5cuhr6keyz5ryxcwvv2jkzfj2gwrj4a/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/software/spackages/linux-centos8-x86_64/gcc-8.3.1/anaconda3-2019.10-v5cuhr6keyz5ryxcwvv2jkzfj2gwrj4a/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "train.py", line 281, in main
    launch_training(c=c, desc=desc, outdir=opts.outdir, dry_run=opts.dry_run)
  File "train.py", line 96, in launch_training
    subprocess_fn(rank=0, c=c, temp_dir=temp_dir)
  File "train.py", line 47, in subprocess_fn
    training_loop.training_loop(rank=rank, **c)
  File "/home/zikpefu/ece8550/final/stylegan3/training/training_loop.py", line 278, in training_loop
    loss.accumulate_gradients(phase=phase.name, real_img=real_img, real_c=real_c, gen_z=gen_z, gen_c=gen_c, gain=phase.interval, cur_nimg=cur_nimg)
  File "/home/zikpefu/ece8550/final/stylegan3/training/loss.py", line 81, in accumulate_gradients
    loss_Gmain.mean().mul(gain).backward()
  File "/home/zikpefu/.local/lib/python3.7/site-packages/torch/_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/zikpefu/.local/lib/python3.7/site-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
  File "/home/zikpefu/.local/lib/python3.7/site-packages/torch/autograd/function.py", line 253, in apply
    return user_fn(self, *args)
  File "/home/zikpefu/ece8550/final/stylegan3/torch_utils/ops/grid_sample_gradfix.py", line 50, in backward
    grad_input, grad_grid = _GridSample2dBackward.apply(grad_output, input, grid)
  File "/home/zikpefu/ece8550/final/stylegan3/torch_utils/ops/grid_sample_gradfix.py", line 59, in forward
    grad_input, grad_grid = op(grad_output, input, grid, 0, 0, False)
RuntimeError: aten::grid_sampler_2d_backward() is missing value for argument 'output_mask'. Declaration: aten::grid_sampler_2d_backward(Tensor grad_output, Tensor input, Tensor grid, int interpolation_mode, int padding_mode,  bool align_corners, bool[2] output_mask) -> (Tensor, Tensor)

To Reproduce
Steps to reproduce the behavior:

1. In the 'stylegan3' directory, run: python train.py --outdir=training-runs-anime --cfg=stylegan3-t --data=dataset/anime_dataset.zip --gpus=1 --batch=32 --gamma=0.5 --mirror=1
2. See the error above.

Expected behavior
Training stylegan3-t on the anime image dataset should run without this error.

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

OS: Linux
PyTorch version: 1.11.0
CUDA toolkit version: Cuda compilation tools, release 10.2, V10.2.89
NVIDIA driver version: 470.42.01
GPU: V100 with NVLink
Docker: No

Additional context
All of the desktop info above is correct. Any help is appreciated.


yuhongxia21 · 2022-04-01T02:30:50Z

I have the same problem.

ARTELE · 2022-04-03T23:17:04Z

I have the same problem too.

to-mi · 2022-04-04T12:45:42Z

See the related issue in PyTorch repository: pytorch/pytorch#75018

My (non-expert) analysis: there was a backwards-incompatible change to grid_sampler_2d_backward in PyTorch 1.11.0 (it now takes an output_mask argument, so it can skip computing the gradient for the input when that gradient isn't needed), and the stylegan3 code calls this function directly.
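A minimal sketch of the version check this change implies, assuming you patch the direct call yourself rather than pulling an updated stylegan3. The helper names (`torch_version_tuple`, `grid_sampler_2d_backward_args`) are hypothetical. The first six positional arguments mirror the call in the traceback (`0, 0, False` = bilinear interpolation, zero padding, `align_corners=False`), and the two-element `output_mask` is appended only on 1.11+, as the `bool[2] output_mask` declaration in the error message requires.

```python
import re
from typing import Sequence, Tuple


def torch_version_tuple(version: str) -> Tuple[int, int]:
    """Return (major, minor) from strings like '1.11.0+cu113' or '1.10.0a0+git1234'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized torch version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def grid_sampler_2d_backward_args(version: str, grad_output, input_, grid,
                                  needs_grad: Sequence[bool]) -> tuple:
    """Build positional args for aten::grid_sampler_2d_backward (hypothetical helper).

    needs_grad = (gradient needed w.r.t. input, gradient needed w.r.t. grid).
    """
    # interpolation_mode=0 (bilinear), padding_mode=0 (zeros), align_corners=False,
    # matching the call shown in the traceback.
    args = (grad_output, input_, grid, 0, 0, False)
    if torch_version_tuple(version) >= (1, 11):
        # PyTorch 1.11 added the required bool[2] output_mask argument.
        args += ((bool(needs_grad[0]), bool(needs_grad[1])),)
    return args
```

Inside `_GridSample2dBackward.forward(ctx, grad_output, input, grid)` you would pass `torch.__version__` and `ctx.needs_input_grad[1:3]` (the input and grid slots), then unpack the result into the operator. The commit referenced further down in this thread is the authoritative fix.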

ZibbeZabbe · 2022-04-08T10:31:23Z

Specifying "pytorch=1.10.2" in environment.yml should work. I have had issues with Anaconda downloading the CPU-only build, so I personally use "pytorch=1.10.2=py3.9_cuda11.3_cudnn8_0".

More information is in PDillis/stylegan3-fun#7, which includes a pull request with an environment.yml that works for me.

lqu · 2022-04-20T00:15:23Z

I had the same problem, and the following command fixed it. Pinning pytorch=1.10 with conda should also work; I'm just not using conda.

pip3 install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio===0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
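After pinning, a quick sanity check can catch the CPU-only download mentioned above. This is a hypothetical helper, not part of stylegan3. It only inspects the strings you would get from `torch.__version__` and `torch.version.cuda` (which is `None` on CPU-only builds), and the 1.11 cutoff comes from the `output_mask` change discussed in this thread.

```python
import re
from typing import List, Optional


def check_torch_build(version: str, cuda_version: Optional[str]) -> List[str]:
    """Return human-readable problems with a PyTorch build (hypothetical helper).

    version:      value of torch.__version__, e.g. '1.10.0+cu113'
    cuda_version: value of torch.version.cuda (None on CPU-only builds)
    """
    problems = []
    if cuda_version is None:
        problems.append("CPU-only build: reinstall a CUDA build (e.g. a +cu113 wheel)")
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        problems.append(f"could not parse version {version!r}")
    elif (int(match.group(1)), int(match.group(2))) >= (1, 11):
        problems.append("PyTorch >= 1.11: needs a stylegan3 checkout with the grid_sample_gradfix port")
    return problems
```

With PyTorch installed, call it as `check_torch_build(torch.__version__, torch.version.cuda)`; an empty list means the pinned 1.10 CUDA build was picked up.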

jannehellsten · 2022-04-22T14:47:56Z

I pushed a change that should fix stylegan3 on PyTorch 1.11. I'd be curious to know whether it fixes the problems reported in this thread (it does for me).

zikpefu · 2022-04-22T21:31:37Z

Thanks @jannehellsten for the change!

secretsather · 2022-05-01T11:23:51Z

@jannehellsten, this still isn't working for me with the fix above:

D:\python\repos\stable\stylegan3\training\augment.py:231: UserWarning: Specified kernel cache directory could not be created! This disables kernel caching. Specified directory is C:\Users\secre\AppData\Local\Temp/torch/kernels. This warning will appear only once per process. (Triggered internally at ..\aten\src\ATen\native\cuda\jit_utils.cpp:860.)
s = torch.exp2(torch.randn([batch_size], device=device) * self.scale_std)

secretsather · 2022-05-01T11:27:41Z

Update: manually creating the directory fixed it for me. No warnings now.
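A minimal sketch of that workaround, for anyone hitting the same warning. The helper name is hypothetical; pass the exact directory printed in your own warning, since it differs by OS (compare the Linux and Windows warnings in this thread).

```python
import os
from pathlib import Path


def ensure_kernel_cache_dir(path: str) -> Path:
    """Create the kernel cache directory named in the UserWarning, if missing.

    `path` should be copied from the warning text, e.g. ~/.cache/torch/kernels.
    Returns the directory; raises PermissionError if it is not writable.
    """
    cache_dir = Path(path).expanduser()
    cache_dir.mkdir(parents=True, exist_ok=True)  # no-op if it already exists
    if not os.access(cache_dir, os.W_OK):
        raise PermissionError(f"kernel cache dir is not writable: {cache_dir}")
    return cache_dir
```

For example, `ensure_kernel_cache_dir("~/.cache/torch/kernels")` for the Linux path in the original report. Run it once before training; it is safe to call again.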

nurpax · 2022-05-01T16:04:44Z

I've seen this too, but I think it's just a warning? Training should still work; startup just takes longer. FWIW, it's a PyTorch bug: its kernel cache code fails to create this directory.

to-mi mentioned this issue Apr 4, 2022

RuntimeError: aten::grid_sampler_2d_backward() is missing value for argument 'output_mask'. pytorch/pytorch#75018

Closed

ZibbeZabbe mentioned this issue Apr 11, 2022

Creating environment results in not being able to train PDillis/stylegan3-fun#7

Closed

jannehellsten added a commit that referenced this issue Apr 22, 2022

pytorch 1.11 support: don't use conv2d_gradfix on v1.11, port grid_sample_gradfix to the new API thanks @timothybrooks for the fix! for #145

407db86

vsemecky pushed a commit to vsemecky/stylegan3 that referenced this issue May 15, 2022

pytorch 1.11 support: don't use conv2d_gradfix on v1.11, port grid_sample_gradfix to the new API thanks @timothybrooks for the fix! for NVlabs#145

f61d77a

nupurkmr9 mentioned this issue Jan 6, 2023

training with stylegan2 RuntimeError: derivative for aten::grid_sampler_2d_backward is not implemented nupurkmr9/vision-aided-gan#11

Open

soubhiksanyal mentioned this issue Feb 28, 2023

RuntimeError: derivative for aten::grid_sampler_2d_backward is not implemented hongfz16/EVA3D#7

Closed

woctezuma mentioned this issue Jan 10, 2024

RuntimeError: derivative for aten::grid_sampler_2d_backward is not implemented woctezuma/steam-stylegan2-ada-pytorch#4

Closed
