Use on own audio stream #12

Closed
HeChengHui opened this issue Jun 7, 2024 · 3 comments

@HeChengHui

@SaoYear
Thank you for your work!
I want to detect a unique audio cue in the audio captured from my video stream. Assuming that I have trained a fine-tuned model, how can I go about using it in a causal manner?

@SaoYear
Member

SaoYear commented Jun 12, 2024

Hi! Thanks for the interest and sorry for the late response!

The CRNN model for SED is causal, while ATST-SED is not. ATST-SED is built on a pretrained ATST model, which is Transformer-based, and the self-attention modules in ATST are NOT the "masked" version, so every frame attends to future frames as well as past ones. ATST therefore cannot be used in an online manner.
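To illustrate the point about masking, here is a toy sketch in plain Python. It is not the ATST code: the `self_attention` function, the scalar "frames", and the two example streams are all made up for illustration. It only shows why an unmasked attention layer cannot run online: the output at a given frame changes when later frames change, whereas with a causal mask it does not.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames, causal):
    """Toy single-head self-attention over scalar frames (query = key = value)."""
    outputs = []
    for t, query in enumerate(frames):
        # A causal ("masked") layer lets frame t see only frames 0..t;
        # an unmasked layer sees the whole sequence, including the future.
        visible = frames[: t + 1] if causal else frames
        weights = softmax([query * key for key in visible])
        outputs.append(sum(w * v for w, v in zip(weights, visible)))
    return outputs


past = [0.5, -1.0, 2.0]
stream_a = past + [0.0, 0.0]
stream_b = past + [3.0, -2.0]  # same past, different future

t = len(past) - 1  # index of the last "already heard" frame
masked_a = self_attention(stream_a, causal=True)[t]
masked_b = self_attention(stream_b, causal=True)[t]
unmasked_a = self_attention(stream_a, causal=False)[t]
unmasked_b = self_attention(stream_b, causal=False)[t]

print("masked output independent of future frames:", masked_a == masked_b)
print("unmasked output independent of future frames:", unmasked_a == unmasked_b)
```

In a streaming setting the future frames do not exist yet, so only the masked variant could produce its final output for frame `t` at the moment frame `t` arrives.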

As for obtaining the audio cue, if you want to capture it offline, you can refer to this issue.

Hope this helps, and please contact me if you have any other problems.

Nian

@HeChengHui
Author

@SaoYear

Thank you for the clarification.

> The CRNN model for SED is causal

Which model are you referring to? Is it in a different repo?

@SaoYear
Member

SaoYear commented Jun 13, 2024

Sorry, by CRNN I mean the baseline model of the DCASE challenge 2021-2023; you can find it here.

This CRNN is also included in the ATST-SED architecture, which is why I mentioned it.
