
Question about how to get the result in table4 in the paper #4

Closed

cai525 opened this issue Nov 13, 2023 · 2 comments

@cai525

cai525 commented Nov 13, 2023

Thanks for your code. It's remarkable work!

I'm trying to reproduce the results in the paper FINE-TUNE THE PRETRAINED ATST MODEL FOR SOUND EVENT DETECTION. Using the same settings as this repository, I get nearly the same results as shown in Table 2. But when I try to reproduce the result in Table 4 (i.e. the SOTA result in this paper, PSDS1 = 0.583, PSDS2 = 0.810), I only get PSDS1 = 0.5695 and PSDS2 = 0.7997, even though I changed the median-filter settings to {3, 28, 7, 4, 7, 22, 48, 19, 10, 50}, following the guide of RCT: Random consistency training for semi-supervised sound event detection.

Is the difference caused by a wrong setting of the median filter lengths? Or could it be the PyTorch Lightning mode (I set it to DP, because some bugs occur when running in DDP mode with multiple GPUs)? Or is it just due to the randomness of the results?
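
For reference, the kind of class-wise median filtering I mean can be sketched as follows (a minimal sketch assuming scipy is available; the class ordering of the lengths is hypothetical and may not match the repo's config, and this is not the repo's actual post-processing code):

```python
import numpy as np
from scipy.ndimage import median_filter

# Per-class median-filter lengths (in frames) quoted above, following RCT;
# the class ordering here is hypothetical and may differ from the repo's config.
FILTER_LENGTHS = [3, 28, 7, 4, 7, 22, 48, 19, 10, 50]

def smooth_predictions(frame_probs: np.ndarray) -> np.ndarray:
    """Apply a class-wise median filter to frame-level probabilities.

    frame_probs: array of shape (n_frames, n_classes), with n_classes == 10.
    """
    smoothed = np.empty_like(frame_probs)
    for c, length in enumerate(FILTER_LENGTHS):
        # Smooth each class track independently with its own window length.
        smoothed[:, c] = median_filter(frame_probs[:, c], size=length)
    return smoothed
```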

@SaoYear
Member

SaoYear commented Nov 14, 2023

Hi, thanks for your interest!

The training configuration in this work is indeed complicated, which makes reproduction difficult. I would like to answer your questions step by step:

  1. I recently modified the home page README file and updated the config.yaml file for the stage-2 training. The const_max should be set to 70, as shown in Table 1 of the paper.
  2. I recommend using gradient accumulation to first reproduce the results (as shown in the updated documents on the home page). Our experiments were run on an A100-80G GPU, so we did not use multiple GPUs to fine-tune our model. I have already uploaded the stage-1 checkpoint on the home page. You could use it for stage-2 fine-tuning and see whether you can reproduce the reported performance (a difference within ~0.005 is possible). You can refer to the last table on the repo home page to configure your training process (batch size = [4, 4, 8, 8] + accum_grad = 6); see the sketch after this list.
  3. About multi-GPU training: I don't think DP is a good choice in this case. Our experiments include a randomly applied mixup augmentation. When using DP, the forward pass runs in separate threads and the outputs are then gathered onto one device to compute the loss. Therefore, the gathered outputs may not all come from the same forward pass, which can lead to problems (e.g. NaN loss) when computing gradients. I would recommend DDP in this case, because it computes the gradients directly on each device; see the sketch after this list as well.
  4. Post-processing: you do not need to change the median filter in the configuration files. The results in Table 2 are obtained with the default median filter (7 for all classes), and the results in Table 4 use the median filter setup in stage_2.yaml.
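
To illustrate points 2 and 3, here is a minimal PyTorch Lightning sketch combining gradient accumulation with the DDP strategy (not the actual training script of this repo; `DummySEDModule` and the toy data are hypothetical stand-ins):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class DummySEDModule(pl.LightningModule):
    """Hypothetical stand-in for the repo's SED LightningModule."""

    def __init__(self):
        super().__init__()
        self.net = nn.Linear(64, 10)  # toy feature size, 10 event classes

    def training_step(self, batch, batch_idx):
        feats, targets = batch
        logits = self.net(feats)
        return nn.functional.binary_cross_entropy_with_logits(logits, targets)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-4)

# Toy data standing in for the real dataloaders.
feats = torch.randn(32, 64)
targets = torch.randint(0, 2, (32, 10)).float()
loader = DataLoader(TensorDataset(feats, targets), batch_size=4)

trainer = pl.Trainer(
    accelerator="auto",
    devices=1,                      # single GPU, as in the paper
    # devices=2, strategy="ddp",    # multi-GPU alternative: prefer DDP over DP
    accumulate_grad_batches=6,      # accum_grad = 6 emulates a larger batch
    max_epochs=1,                   # toy value; follow the repo config in practice
)
trainer.fit(DummySEDModule(), train_dataloaders=loader)
```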

Hope these suggestions help!

@cai525
Author

cai525 commented Nov 14, 2023

Thanks for your detailed reply. I will try to follow your instructions. Thank you again for your remarkable work and your kind response. 👍

@cai525 cai525 closed this as completed Nov 14, 2023