
DyHead PyTorch Implementation #10

Open
wants to merge 17 commits into master

Conversation

Coldestadam

Hi there,

Please look at my notes in my README text file. Hopefully you can use this code to get a head start, or at least give the community something to build on to finish the implementation. If something in the code is incorrect, I would really like to see the mistakes so I can understand what I misunderstood.

Thanks,
Adam

@ghost

ghost commented Sep 13, 2021

CLA assistant check
All CLA requirements met.

@jerryzhang-ss

Hi Adam,

Thanks a lot for your great work; it is really inspiring! However, I personally have some questions regarding the scale-aware attention:

  1. It seems we need to know the size of dimension S for this layer. But when we are dealing with multi-scale input, where the feature maps output by the feature extractor have arbitrary sizes, what should we do?
  2. From equation (3), did they use a global average pooling along dimensions (S, C)? If that were the case, we might not need the size of dimension S. But then what would be the purpose of the 1x1 conv layer?

Please correct me if I misunderstood anything. Again, thank you so much for your effort!

Jerry

@Coldestadam
Author

Hi Jerry,

I am not one of the authors of this paper, but I will try to answer in the way I understood it. Just keep in mind that I might have things wrong, as I said in my GitHub repo.

  1. I think the answer to your question is in section 3.1. Basically, the authors reshape all the output feature maps of the Feature Pyramid Network (FPN) into one tensor with dimensions (L, S, C). They describe it as finding the output feature map with the median height and width (H x W), then resizing all the other output feature maps to that size by downsampling or upsampling. This was tricky on my part, since the built-in RCNN-FPN models in PyTorch have four output feature maps from the FPN, so I decided to calculate the median of all the heights and widths and resize every map to that median size. Once each output has the same height and width, I concatenate the outputs into one tensor with dimensions (L, H, W, C), then flatten the height and width dimensions so the tensor becomes (L, S, C). That tensor is passed into the DyHead or any of the individual blocks (see the sketch after this list).

  2. Refer to Figure 1, where pi_L, with dimensions (L, C), is being multiplied by F. The 1x1 convolution layer is there to reduce the dimension S: the tensor F with dimensions (L, S, C) is transposed to (S, L, C), and the convolutional layer then treats (L, C) as (Height, Width). I admit the equation makes it confusing, but that is the way I understood it from Figure 1. The 1x1 convolution, acting like a global average pooling over S, is meant to approximate the function f in that equation. Its output is passed through the ReLU and sigmoid, and the result is multiplied with F to form the output of the scale-aware attention layer (also sketched below).
    Does that make sense?
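To make that concrete, here is a minimal PyTorch sketch of both steps as I understood them. The name `build_general_view` is just a placeholder of mine, and details like the nearest-neighbor interpolation, the batch dimension, and the exact ReLU/sigmoid ordering are my assumptions rather than anything the paper spells out:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def build_general_view(fpn_outputs):
    """Resize every FPN level to the median H x W, then flatten to (N, L, S, C)."""
    heights = sorted(t.shape[2] for t in fpn_outputs)
    widths = sorted(t.shape[3] for t in fpn_outputs)
    med_h = heights[len(heights) // 2]
    med_w = widths[len(widths) // 2]
    # Up- or down-sample each level (N, C, H_i, W_i) to the median size.
    resized = [F.interpolate(t, size=(med_h, med_w), mode="nearest")
               for t in fpn_outputs]
    stacked = torch.stack(resized, dim=1)          # (N, L, C, H, W)
    return stacked.flatten(3).permute(0, 1, 3, 2)  # (N, L, S, C), S = H * W


class ScaleAwareAttention(nn.Module):
    """Scale-aware attention the way I read Figure 1: transpose F so that S
    becomes the channel dimension, collapse it with a 1x1 conv over the
    (L, C) plane, then gate F with ReLU + sigmoid."""

    def __init__(self, s_size):
        super().__init__()
        self.conv = nn.Conv2d(s_size, 1, kernel_size=1)  # reduces S to 1

    def forward(self, x):                  # x: (N, L, S, C)
        pi = x.permute(0, 2, 1, 3)         # (N, S, L, C): S acts as channels
        pi = torch.relu(self.conv(pi))     # (N, 1, L, C)
        pi = torch.sigmoid(pi)             # attention weights over (L, C)
        return pi.permute(0, 2, 1, 3) * x  # (N, L, 1, C) broadcasts across S
```

For example, with `feats = [torch.randn(2, 256, s, s) for s in (100, 50, 25, 13)]`, `build_general_view(feats)` comes out as (2, 4, 2500, 256) (the median size is 50x50, so S = 2500), and `ScaleAwareAttention(2500)` gates it without changing the shape.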

Also thanks for your kind words.
Thanks,
Adam

@jerryzhang-ss

Hi Adam,

Thanks for your quick response and detailed explanation; it makes your thoughts much clearer.

Sorry that I didn't describe my question well in the first place. By "multi-scale input", I actually meant the raw input shape. Some detection frameworks like detectron2 support keep-ratio resizing with a range of shortest-edge values, like here. This can improve the robustness of the detection model, but it causes the feature shapes coming out of the backbone to be arbitrary. So if we fix s_size, we would probably fail in this scenario.
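To make the concern concrete, here is a toy calculation (the strides and the aspect ratio are made up) showing that the S produced by the median-size trick changes with the raw input size, so anything built with a fixed `s_size`, like the `Conv2d(s_size, 1, 1)` in the sketch above, would break:

```python
backbone_strides = [8, 16, 32, 64]  # typical FPN strides, my assumption

for short_edge in (640, 800):  # detectron2-style shortest-edge choices
    h, w = short_edge, int(short_edge * 4 / 3)  # keep-ratio resized input
    shapes = [(h // s, w // s) for s in backbone_strides]
    med_h = sorted(p[0] for p in shapes)[len(shapes) // 2]
    med_w = sorted(p[1] for p in shapes)[len(shapes) // 2]
    print(short_edge, "-> S =", med_h * med_w)  # prints a different S each time
```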

@qdd1234

qdd1234 commented Oct 31, 2021

Hi, thanks for your reproduction. I have a question: if I apply Dynamic Head to ATSS, is there only one final prediction branch? ATSS predicts at three scales, as shown in the picture, but Dynamic Head needs to concatenate the outputs of the FPN, so does that mean there is only one prediction scale?
[image: ATSS prediction scales]
