
Question about how to get the result in table4 in the paper #4

Closed

cai525 opened this issue Nov 13, 2023 · 2 comments

@cai525

cai525 commented Nov 13, 2023

Thanks for your code. It's remarkable work!

I'm trying to reproduce the results in the paper FINE-TUNE THE PRETRAINED ATST MODEL FOR SOUND EVENT DETECTION. Using the same settings as this repository, I get nearly the same results as shown in Table 2. But when I try to reproduce the result in Table 4 (i.e. the SOTA result in this paper, PSDS1 = 0.583, PSDS2 = 0.810), I only get PSDS1 = 0.5695 and PSDS2 = 0.7997, even though I changed the median-filter settings to {3, 28, 7, 4, 7, 22, 48, 19, 10, 50}, following the guide of RCT: Random consistency training for semi-supervised sound event detection.

Is the difference caused by a wrong setting of the median filter lengths? Or could it be the PyTorch Lightning mode (I set it to DP, because some bugs occur when running in DDP mode with multiple GPUs)? Or is it just due to the randomness of the results?
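
For reference, the kind of class-wise median filtering I mean can be sketched as follows (a minimal sketch assuming scipy is available; the class ordering of the lengths is hypothetical and may not match the repo's config, and this is not the repo's actual post-processing code):

```python
import numpy as np
from scipy.ndimage import median_filter

# Per-class median-filter lengths (in frames) quoted above, following RCT;
# the class ordering here is hypothetical and may differ from the repo's config.
FILTER_LENGTHS = [3, 28, 7, 4, 7, 22, 48, 19, 10, 50]

def smooth_predictions(frame_probs: np.ndarray) -> np.ndarray:
    """Apply a class-wise median filter to frame-level probabilities.

    frame_probs: array of shape (n_frames, n_classes), with n_classes == 10.
    """
    smoothed = np.empty_like(frame_probs)
    for c, length in enumerate(FILTER_LENGTHS):
        # Smooth each class track independently with its own window length.
        smoothed[:, c] = median_filter(frame_probs[:, c], size=length)
    return smoothed
```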

@SaoYear
Member

SaoYear commented Nov 14, 2023

Hi, thanks for your interest!

The training configuration in this work is indeed complicated, which makes reproduction difficult. I would like to answer your questions step by step:

  1. I recently modified the home page README file and updated the config.yaml file for the stage-2 training. The const_max should be set to 70, as shown in Table 1 of the paper.
  2. I recommend using gradient accumulation to first reproduce the results (as shown in the updated documents on the home page). Our experiments were run on an A100-80G GPU, so we did not use multiple GPUs to fine-tune our model. I have already uploaded the stage-1 checkpoint on the home page. You could use it for stage-2 fine-tuning and see whether you can reproduce the reported performance (a difference within ~0.005 is possible). You can refer to the last table on the repo home page to configure your training process (batch size = [4, 4, 8, 8] + accum_grad = 6); see the sketch after this list.
  3. About multi-GPU training: I don't think DP is a good choice in this case. Our experiments include a randomly applied mixup augmentation. When using DP, the forward pass runs in separate threads and the outputs are then gathered onto one device to compute the loss. Therefore, the gathered outputs may not all come from the same forward pass, which can lead to problems (e.g. NaN loss) when computing gradients. I would recommend DDP in this case, because it computes the gradients directly on each device; see the sketch after this list as well.
  4. Post-processing: you do not need to change the median filter in the configuration files. The results in Table 2 are obtained with the default median filter (7 for all classes), and the results in Table 4 use the median filter setup in stage_2.yaml.
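
To illustrate points 2 and 3, here is a minimal PyTorch Lightning sketch combining gradient accumulation with the DDP strategy (not the actual training script of this repo; `DummySEDModule` and the toy data are hypothetical stand-ins):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

class DummySEDModule(pl.LightningModule):
    """Hypothetical stand-in for the repo's SED LightningModule."""

    def __init__(self):
        super().__init__()
        self.net = nn.Linear(64, 10)  # toy feature size, 10 event classes

    def training_step(self, batch, batch_idx):
        feats, targets = batch
        logits = self.net(feats)
        return nn.functional.binary_cross_entropy_with_logits(logits, targets)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-4)

# Toy data standing in for the real dataloaders.
feats = torch.randn(32, 64)
targets = torch.randint(0, 2, (32, 10)).float()
loader = DataLoader(TensorDataset(feats, targets), batch_size=4)

trainer = pl.Trainer(
    accelerator="auto",
    devices=1,                      # single GPU, as in the paper
    # devices=2, strategy="ddp",    # multi-GPU alternative: prefer DDP over DP
    accumulate_grad_batches=6,      # accum_grad = 6 emulates a larger batch
    max_epochs=1,                   # toy value; follow the repo config in practice
)
trainer.fit(DummySEDModule(), train_dataloaders=loader)
```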

Hope these suggestions help!

@cai525
Author

cai525 commented Nov 14, 2023

Thanks for your detailed reply. I will try to follow your instructions. Thank you again for your remarkable work and your kind response. 👍

@cai525 cai525 closed this as completed Nov 14, 2023