Add ASFF (three fuse feature layers) in the Head for V5(s,m,l,x) #2348

Closed
positive666 opened this issue Mar 3, 2021 · 24 comments
Labels
enhancement New feature or request

Comments

@positive666

positive666 commented Mar 3, 2021

🚀 Feature

Add ASFF feature-fusion layers to the head: the level 1 to level 3 feature maps are each fused into three output feature maps at the corresponding scales, and the fusion weights are adjusted adaptively.
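
For reference, the fusion rule can be written roughly as follows. The notation is taken from the ASFF paper rather than from the ASFFV5 code, which may differ in detail: $x^{n\to l}_{ij}$ is the level-$n$ feature resized to level $l$ at position $(i, j)$, and the $\lambda$ terms are per-pixel control parameters.

```latex
y^{l}_{ij} = \alpha^{l}_{ij}\, x^{1\to l}_{ij}
           + \beta^{l}_{ij}\, x^{2\to l}_{ij}
           + \gamma^{l}_{ij}\, x^{3\to l}_{ij},
\qquad
\alpha^{l}_{ij} + \beta^{l}_{ij} + \gamma^{l}_{ij} = 1,
\qquad
\alpha^{l}_{ij} =
  \frac{e^{\lambda^{l}_{\alpha_{ij}}}}
       {e^{\lambda^{l}_{\alpha_{ij}}} + e^{\lambda^{l}_{\beta_{ij}}} + e^{\lambda^{l}_{\gamma_{ij}}}}
```

The softmax form keeps the three weights non-negative and summing to one at every position, so each output pixel is a convex combination of the three resized inputs.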

Motivation

  1. Follow the feature fusion approach of yolov3_asff (paper).
  2. Add four optional yolov5_asff model structures (in YAML files).
  3. The ASFF method suits the YOLO series well, and from reading the paper I found that it has a reasonable interpretation. It can be incorporated into V5 as an alternative structure.
  4. Integrate the ASFF functions into the project, in the hope of contributing to the YOLOv5 project.

Pitch

I added an ASFFV5 class at line 310 of https://github.com/positive666/yolov5/blob/master/models/common.py.
It adds an ASFF layer structure for yolov5 (s, m, l, x), integrated into the YOLOv5 code base. It differs from v3_asff and adds an RFB block. For example, yolov5s.yaml:

head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, ASFFV5, [0, 512, 0.5]],  # 24
   [[17, 20, 23], 1, ASFFV5, [1, 256, 0.5]],  # 25
   [[17, 20, 23], 1, ASFFV5, [2, 128, 0.5]],  # 26
   # [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
   [[26, 25, 24], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]
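
To see why Detect reads from layers 26, 25 and 24, here is a small sketch that counts absolute layer indices for the head above. It assumes the backbone occupies indices 0 to 9, which is inferred from the `# 13` comment on the fourth head entry; `head_modules` is just a hand-copied list of module names, not something read from the repo.

```python
# Assumption: the backbone occupies layer indices 0-9, inferred from the
# "# 13" comment on the fourth head entry in the YAML above.
BACKBONE_LAYERS = 10

# Module names of the head entries, in the order they appear in the YAML.
head_modules = [
    "Conv", "nn.Upsample", "Concat", "C3",   # last one is layer 13
    "Conv", "nn.Upsample", "Concat", "C3",   # last one is layer 17 (P3)
    "Conv", "Concat", "C3",                  # last one is layer 20 (P4)
    "Conv", "Concat", "C3",                  # last one is layer 23 (P5)
    "ASFFV5", "ASFFV5", "ASFFV5",
    "Detect",
]

# Absolute index of every head layer in the assembled model.
layer_index = {i + BACKBONE_LAYERS: name for i, name in enumerate(head_modules)}

asff_indices = [i for i, name in layer_index.items() if name == "ASFFV5"]
detect_index = next(i for i, name in layer_index.items() if name == "Detect")

print(asff_indices)  # [24, 25, 26]
print(detect_index)  # 27
```

Since the Detect comment lists its inputs as P3, P4, P5 and the YAML passes `[26, 25, 24]`, the ASFFV5 entry with level argument 0 (layer 24) appears to be the coarsest scale, which would match the level numbering used in the ASFF paper.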

ASFF Interpretability

The paper also explains why the fusion weights are computed from the output features by a convolution: the fusion weights and the features are closely related.
Image
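
A minimal pure-Python sketch of that idea at a single spatial position: one logit per level comes from a 1x1 convolution over that level's features, and a softmax across the three levels gives the fusion weights. The function names and toy inputs are hypothetical and this is a simplification, not the ASFFV5 code from the fork.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a short list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_pixel(features, weight_kernels):
    """Fuse three feature vectors at a single spatial position.

    features:       three channel vectors, already resized to the same scale
                    and channel count (hypothetical inputs).
    weight_kernels: three vectors acting as 1x1 convolutions with a single
                    output channel, one per level (hypothetical parameters).
    """
    # A 1x1 convolution at one position is a dot product over channels.
    logits = [
        sum(w * x for w, x in zip(kernel, feat))
        for kernel, feat in zip(weight_kernels, features)
    ]
    weights = softmax(logits)  # non-negative and sums to 1
    fused = [
        sum(w * feat[c] for w, feat in zip(weights, features))
        for c in range(len(features[0]))
    ]
    return fused, weights

features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernels = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # zero kernels give equal weights
fused, weights = fuse_pixel(features, kernels)
print(weights, fused)
```

Because the logits depend on the features themselves, the weights change from pixel to pixel, which is the "adaptive" part of the method.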

COCO

| System | test-dev mAP | Time (V100) | Time (2080ti) |
| --- | --- | --- | --- |
| YOLOv3 608 | 33.0 | 20ms | 26ms |
| YOLOv3 608+ BoFs | 37.0 | 20ms | 26ms |
| YOLOv3 608 (our baseline) | 38.8 | 20ms | 26ms |
| YOLOv3 608+ ASFF | 40.6 | 22ms | 30ms |
| YOLOv3 608+ ASFF* | 42.4 | 22ms | 30ms |
| YOLOv3 800+ ASFF* | 43.9 | 34ms | 38ms |
| YOLOv3 MobileNetV1 416 + BoFs | 28.6 | - | 22ms |
| YOLOv3 MobileNetV2 416 (our baseline) | 29.0 | - | 22ms |
| YOLOv3 MobileNetV2 416 +ASFF | 30.6 | - | 24ms |

I also plan to add some other tricks, such as Aware-IoU and ideas from transformers. I will run more experiments and make further changes in the future.

@positive666 positive666 added the enhancement New feature or request label Mar 3, 2021
@github-actions
Contributor

github-actions bot commented Mar 3, 2021

👋 Hello @positive666, thank you for your interest in 🚀 YOLOv5! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.

If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we can not help you.

If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.

For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at [email protected].

Requirements

Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:

$ pip install -r requirements.txt

Environments

YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):

Status

CI CPU testing

If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), testing (test.py), inference (detect.py) and export (export.py) on MacOS, Windows, and Ubuntu every 24 hours and on every commit.

@cszer

cszer commented Mar 3, 2021

Hello, check the issues in the YOLOv4 repo: the authors of ASFF used the whole bag of freebies, and standalone ASFF adds only 0.5 mAP.

@positive666
Author

positive666 commented Mar 3, 2021

> Hello, check the issues in the YOLOv4 repo: the authors of ASFF used the whole bag of freebies, and standalone ASFF adds only 0.5 mAP.

I am very happy to receive your reply. Yes, I have verified similar conclusions on some datasets. I want to integrate this module into V5 for the convenience of subsequent research, and my first change is to add ASFF after PANet. The output of this ASFFV5 layer differs from the V3 version, and I still need to study and understand it further. I originally wanted to add BiFPN, but I think the extra feature layers and dense connections would increase the training time. Thank you for your reply.

@glenn-jocher
Member

@positive666 thanks for the idea! I see you submitted a PR, I will take a look there.

I experimented with ASFF with YOLOv3 before, but had difficulty implementing it as we used to build our pytorch models from the darknet cfg files, which placed the output layers in very different places in the model.

I think now with all the output layers located in the Detect() layer, an ASFF implementation should be a bit easier to do.

@positive666
Author

positive666 commented Mar 7, 2021

@glenn-jocher Thank you for your reply. I'm now verifying this on COCO.
I have another question. My first experiment used a cigarette detection dataset of 5000 images, trained for 300 epochs, and the mAP always stays at about 0.7. I didn't add any additional training data; I only wanted to check whether adding ASFF helps, and it does not improve the result significantly. One thought I have is that even an identical mAP can't guarantee the same inference performance in the future. I have now added some lightweight attention modules, which have not been submitted in the PR yet, and I will continue to run experiments.

@glenn-jocher
Member

@positive666 I think what you're describing is how well your results generalize to the wider world. This is typically why COCO is used as a benchmark, as it overlaps many common use cases. It takes a long time to train though, so if you want to prototype results quickly I would recommend VOC, which still generalizes somewhat but is much smaller and faster to train. You can train VOC in Colab in less than a day, especially the smaller models:

https://colab.research.google.com/github/ultralytics/yolov5/blob/master/tutorial.ipynb?hl=en#scrollTo=BSgFCAcMbk1R

# VOC
for b, m in zip([64, 48, 32, 16], ['yolov5s', 'yolov5m', 'yolov5l', 'yolov5x']):  # zip(batch_size, model)
  !python train.py --batch {b} --weights {m}.pt --data voc.yaml --epochs 50 --cache --img 512 --nosave --hyp hyp.finetune.yaml --project VOC --name {m}
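
The `!python` prefix only works inside a notebook such as Colab. Outside a notebook, the same four runs could be launched from plain Python, roughly like this. The flags are copied from the loop above; `voc_command` is a hypothetical helper name, and the `subprocess.run` call is left commented out so the sketch only prints the commands.

```python
import shlex

batch_sizes = [64, 48, 32, 16]
models = ["yolov5s", "yolov5m", "yolov5l", "yolov5x"]

def voc_command(batch, model):
    """Build the train.py argument list for one model (flags copied from the loop above)."""
    return [
        "python", "train.py",
        "--batch", str(batch),
        "--weights", f"{model}.pt",
        "--data", "voc.yaml",
        "--epochs", "50",
        "--cache",
        "--img", "512",
        "--nosave",
        "--hyp", "hyp.finetune.yaml",
        "--project", "VOC",
        "--name", model,
    ]

commands = [voc_command(b, m) for b, m in zip(batch_sizes, models)]
for cmd in commands:
    print(shlex.join(cmd))
    # To actually train, run from the YOLOv5 repo root with:
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` would stop the loop if one training run fails.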

@cszer

cszer commented Mar 8, 2021

I have tried your modification together with my own modifications aimed at small-object detection. I achieved 30.5 mAP@0.5:0.95 on small objects (COCO) at 75 GFLOPs (this module adds 20 GFLOPs), but an ablation is needed to verify its impact.

@cszer

cszer commented Mar 11, 2021

I have done the ablation: this module is useless. It adds only 0.4 to the small-object mAP@0.5:0.95 at a cost of 20 GFLOPs.
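
To put the trade-off in numbers, here is a quick back-of-the-envelope calculation using only the figures reported in this thread. It assumes the 75 GFLOPs total includes the module's 20 GFLOPs, so the model without it sits at 55 GFLOPs.

```python
total_gflops = 75.0    # model with the module, as reported earlier in the thread
module_gflops = 20.0   # compute attributed to the ASFF module
map_gain = 0.4         # small-object mAP@0.5:0.95 gain found in the ablation

# Assumption: the 75 GFLOPs figure already includes the module's 20 GFLOPs.
baseline_gflops = total_gflops - module_gflops   # 55.0
relative_cost = module_gflops / baseline_gflops  # about 0.36
gain_per_gflop = map_gain / module_gflops        # 0.02 mAP points per GFLOP

print(f"baseline: {baseline_gflops:.0f} GFLOPs")
print(f"extra compute: {relative_cost:.0%}")
print(f"mAP gain per added GFLOP: {gain_per_gflop:.3f}")
```

Under that assumption the module costs roughly a third more compute for a 0.4 point gain.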

@cszer

cszer commented Mar 11, 2021

I think the best target to study now is replacing convolutions with involutions.

@glenn-jocher
Member

@cszer involutions?

@cszer

cszer commented Mar 11, 2021

> @cszer involutions?

Yes, check this paper https://arxiv.org/abs/2103.06255

@glenn-jocher
Member

@cszer wow! Just out yesterday. Thanks for the link.

@cszer

cszer commented Mar 11, 2021

> @cszer wow! Just out yesterday. Thanks for the link.

10 telegram channels help me a lot))

@glenn-jocher
Member

glenn-jocher commented Mar 11, 2021

@cszer what 10 telegram channels?

Paper seems interesting, a nice bridge between attention (across channels) and convolutions (across image space). AP increase is slight, but it's also accompanied by slight size and FLOPS reductions.
https://github.com/d-li14/involution#object-detection-and-instance-segmentation-on-coco
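
As a rough illustration of the idea, here is a naive 1D involution in pure Python: the kernel taps at each position are generated from the feature vector at that position and then shared across all channels. This is a simplified sketch, not the code from the involution repo; it uses a single linear map (`kernel_weights`, a hypothetical parameter) for kernel generation and a single group.

```python
def involution_1d(x, kernel_weights, kernel_size=3):
    """Naive 1D involution with a single group and zero padding.

    x:              list of C channels, each a list of L values.
    kernel_weights: K x C matrix (hypothetical) that maps the feature vector at
                    one position to the K kernel taps used at that position.
    """
    channels, length = len(x), len(x[0])
    half = kernel_size // 2
    out = [[0.0] * length for _ in range(channels)]
    for i in range(length):
        pixel = [x[c][i] for c in range(channels)]
        # Kernel generated from the feature at position i (position-specific).
        taps = [sum(w * v for w, v in zip(row, pixel)) for row in kernel_weights]
        for c in range(channels):
            acc = 0.0
            for k, tap in enumerate(taps):
                j = i + k - half
                if 0 <= j < length:       # zero padding outside the signal
                    acc += tap * x[c][j]  # same taps for every channel
            out[c][i] = acc
    return out

x = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
# Each tap equals the first channel's value at the current position.
w = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
y = involution_1d(x, w)
print(y)  # [[3.0, 12.0, 15.0], [1.0, 2.0, 3.0]]
```

A convolution does the opposite: its kernel is the same at every position but differs per channel.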

@glenn-jocher
Member

glenn-jocher commented Mar 11, 2021

@cszer I've raised issue #1 on the involutions repo (yay): d-li14/involution#1

The straightforward implementation seems to be to use this involution() module here, replacing the MMDetection Conv modules with the local YOLOv5 Conv() module:
https://github.com/d-li14/involution/blob/main/det/mmdet/models/utils/involution_naive.py

@glenn-jocher
Member

@cszer I've created an Involution PR #2435 to experiment.

@positive666
Author

> I have tried your modification, with my modifications that aims to small objects detection. I have achived 30.5 0.5:0.95 small map (Coco) with 75 Gflops(this module adds 20 Gflops) , but ablation is needed to verify impact

> I have done ablation , this module is useless , adds only 0.4 map to 0.5:0.95 small map for 20 Gflops

@positive666
Author

> I have done ablation , this module is useless , adds only 0.4 map to 0.5:0.95 small map for 20 Gflops

@glenn-jocher @cszer Hello. I trained the v5 small model on VOC before and ran some related ablation comparisons, and the AP improvement on the test set is indeed not large. Results on the VOC2007 test set, with CBAM added on its own and trained without pretrained weights, and otherwise using already trained YOLOv5 weights:

  1. cbam_v5s: mAP@.5 0.56, mAP@.5:.95 0.30, 16.6 GFLOPs (without loading weights)
  2. asff_v5s: mAP@.5 0.56, mAP@.5:.95 0.38, 20 GFLOPs

But I feel that my own experiments on V5s are not sufficient, and these simple experiments cannot explain why the attention mechanism fails to help. I have been busy recently and will continue the verification later; so far I have only run one simple ablation with ASFF and CBAM. This attempt has prompted some exploration and thinking. I have started to pay attention to some of the difficulties in anchor-based object detection: the introduction of positive samples, and the way classification and regression are independent yet interfere with each other. My thinking concerns the weak correlation between classification and regression in detection. I plan to use these attention mechanisms to improve the classification and regression losses, for example with Aware-IoU. Thank you for your feedback.

@glenn-jocher
Member

glenn-jocher commented Mar 12, 2021

@positive666 those mAPs seem pretty low. The baseline VOC training script (below) will train YOLOv5s to about 0.85 mAP@0.5 (and YOLOv5x to about [email protected]):

https://colab.research.google.com/github/ultralytics/yolov5/blob/master/tutorial.ipynb?hl=en#scrollTo=BSgFCAcMbk1R

# VOC
for b, m in zip([64, 48, 32, 16], ['yolov5s', 'yolov5m', 'yolov5l', 'yolov5x']):  # zip(batch_size, model)
  !python train.py --batch {b} --weights {m}.pt --data voc.yaml --epochs 50 --cache --img 512 --nosave --hyp hyp.finetune.yaml --project VOC --name {m}

@glenn-jocher
Member

@positive666 BTW, you can see these VOC training logs here:
https://wandb.ai/glenn-jocher/VOC

@developer0hye
Contributor

@positive666 @glenn-jocher

How about the attention layer proposed in ECANet?

Someone has already checked its performance with yolov3-tiny.

Look at these results.

image

@positive666
Author

positive666 commented May 17, 2021 via email

In general, adding an attention module gives an almost negligible improvement over the baseline on public datasets with YOLOv5. You can try it. There is indeed ECA code in my fork, but I have not registered it or tried it. I tried CBAM and COORD; the latter may behave a little better, but there is still no improvement. My personal view is that the YOLOv5 backbone has already been trained to generalize well. You can also train it yourself. Good luck!

------------------ Original message ------------------
From: "Yonghye @.>; Sent: Sunday, May 16, 2021, 2:25 PM; To: @.>; Cc: @.>; @.>; Subject: Re: [ultralytics/yolov5] Add ASFF (three fuse feature layers) int the Head for V5(s,m,l,x) (#2348)

> @positive666 @glenn-jocher How about attention layer proposed in ECANet? Someone already checked its performance with yolov3-tiny. Look at this results.

@phunix9

phunix9 commented May 24, 2021

@positive666 Hello, thank you for your contribution. I have a question after adding the CBAM layer according to your code. When I start training, the loss becomes NaN after several epochs (for example 10 or 100 epochs). However, when I use yolov5s.yaml without the CBAM layer, training succeeds. Do you know the reason? Thanks!

@farajist

@phunix9 did you find a solution to the NaN loss issue?
