Trivial questions about the used models #3

Closed
Ming-er opened this issue Nov 13, 2023 · 3 comments

Comments

@Ming-er
Ming-er commented Nov 13, 2023

Dear author, really sorry to bother you again.
I find that the atst-c2f model generally performs better than the atst-frame model in both tagging and detection tasks. Why didn't you use this model for the downstream DESED training? Also, will the atst-c2f model be made publicly available?

@SaoYear
Member

SaoYear commented Nov 13, 2023

Hi, no worries, and thanks for the very interesting question!

You're right: according to the ATST-Frame work, the C2F model performs better than ATST-Frame alone. There are two reasons we did not use it in this work:

  1. Simply put, C2F performed worse when we used it in the first training stage of ATST-SED. In this work, we used the model fine-tuned on AS-2M as the pretrained model. However, since C2F unavoidably contains the clip-level information distilled from ATST-Clip, it actually performed worse than ATST-Frame. In our previous experience, the CLS token of ATST-Clip has a negative effect on SED, and the distillation on AS-2M (C2F) trains ATST-Frame to behave similarly to the ATST-Clip CLS token.

  2. The right way to use C2F on the DESED dataset would be to fine-tune ATST-Clip-AS_2M first and then distill it into ATST-Frame-AS_2M. However, we did not implement this process in ATST-SED, because our main focus was fine-tuning the pretrained model and we did not want to complicate the fine-tuning procedure during the development stage. Honestly, we do not yet know how a fine-tuned C2F model would perform.

All models trained/fine-tuned in the ATST-Frame work will be released in the audiossl repo. We still need some time (about one month) to organize the code and checkpoint files.

@Ming-er
Author

Ming-er commented Nov 13, 2023

I get it, thanks for your reply~

@SaoYear
Member

SaoYear commented Nov 16, 2023

I will close this issue if there are no further questions. You are welcome to ask any other questions in a new issue : )

@SaoYear SaoYear closed this as completed Nov 16, 2023