
About lr schedule: how to apply different learning rates to different parameters of the network? #25

Closed
rorschach-xiao opened this issue Jul 19, 2020 · 3 comments · Fixed by #26
Labels
documentation Improvements or additions to documentation

Comments

@rorschach-xiao

I want to apply different learning rates to the backbone parameters and the non-backbone parameters. Can the config below work?

optimizer = dict(
    type='SGD',
    lr=0.01,
    momentum=0.9,
    weight_decay=0.0005,
    paramwise_cfg=dict(
        custom_keys={
            '.backbone': dict(lr_mult=0.1, decay_mult=0.9)}))
@xvjiarui
Collaborator

Hi @rorschach-xiao
The config you provided will change the learning rate and weight decay of the parameters whose names contain '.backbone'.

You may refer to the training tricks documentation for changing only the head.
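
To see which parameters such a custom key selects, here is a minimal sketch of the substring-matching idea (my own illustration, not the actual mmcv implementation; the toy model and the key 'backbone' without the leading dot are assumptions chosen so the toy parameter names actually match):

# Minimal sketch: list which parameter names a custom key would touch.
# ToyModel and the key 'backbone' are hypothetical; the real matching is
# done inside mmcv's DefaultOptimizerConstructor.
import torch.nn as nn

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 8)
        self.decode_head = nn.Linear(8, 2)

model = ToyModel()
base_lr, base_wd = 0.01, 0.0005
custom_keys = {'backbone': dict(lr_mult=0.1, decay_mult=0.9)}

for name, param in model.named_parameters():
    lr, wd = base_lr, base_wd
    for key, key_cfg in custom_keys.items():
        if key in name:  # plain substring test on the parameter name
            lr = base_lr * key_cfg.get('lr_mult', 1.0)
            wd = base_wd * key_cfg.get('decay_mult', 1.0)
    print(f'{name}: lr={lr}, weight_decay={wd}')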

@rorschach-xiao
Author

Thanks for your reply! I noticed that in the training tricks doc, paramwise_cfg is added to optimizer_config:

optimizer_config = dict(
    paramwise_cfg=dict(
        custom_keys={
            'head': dict(lr_mult=10.)}))

But in the training code, mmseg uses cfg.optimizer to build the optimizer:

optimizer = build_optimizer(model, cfg.optimizer)

and in mmcv/runner/optimizer/builder.py, the key paramwise_cfg will be popped from cfg:

def build_optimizer(model, cfg):
    optimizer_cfg = copy.deepcopy(cfg)
    constructor_type = optimizer_cfg.pop('constructor',
                                         'DefaultOptimizerConstructor')
    paramwise_cfg = optimizer_cfg.pop('paramwise_cfg', None)
    optim_constructor = build_optimizer_constructor(
        dict(
            type=constructor_type,
            optimizer_cfg=optimizer_cfg,
            paramwise_cfg=paramwise_cfg))
    optimizer = optim_constructor(model)
    return optimizer

So I wonder: should paramwise_cfg be added to optimizer or to optimizer_config?

@xvjiarui
Collaborator

Hi @rorschach-xiao
Thanks for pointing this out. paramwise_cfg should be added to optimizer instead of optimizer_config. I will create a PR to fix the doc.
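
For reference, the corrected placement (a sketch that simply merges the two config fragments above; the lr_mult=10. value is the one from the training tricks example) would look like:

# paramwise_cfg lives inside the optimizer config so that
# build_optimizer(model, cfg.optimizer) can pop it and forward it to the
# optimizer constructor.
optimizer = dict(
    type='SGD',
    lr=0.01,
    momentum=0.9,
    weight_decay=0.0005,
    paramwise_cfg=dict(
        custom_keys={
            'head': dict(lr_mult=10.)}))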
