Add AliMeeting multi-condition training recipe #751
Conversation
--max-states 8
```

Pretrained model is available at <https://huggingface.co/desh2608/icefall-asr-alimeeting-pruned-transducer-stateless7>
Could you upload some test_waves to this Hugging Face repo?
That would make it easier to test the model in sherpa.
Please have a look at
https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless8-2022-11-14/tree/main/test_wavs
Done.
If there are no further changes needed, perhaps this can be merged? (There is a failing test, but it seems unrelated to the changes in this PR.)
Thanks!
@desh2608 You can try your pre-trained model at https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition
Thanks again for the contribution.
By the way, I have added the model to
Thanks! I'll check it out.
@desh2608 (Your test waves are encoded as 32-bit floats. I converted them to 16-bit int16 encoding in the above PR.)
Thanks. Done.
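In practice the float-to-PCM conversion above would typically be done with a tool like sox or the soundfile library; purely as an illustration of the sample-format change itself, here is a minimal stdlib sketch (the `float32_to_int16` helper is hypothetical, not part of icefall):

```python
import struct

def float32_to_int16(raw: bytes) -> bytes:
    """Convert raw little-endian float32 samples (range [-1, 1])
    to little-endian 16-bit PCM, clamping out-of-range values."""
    n = len(raw) // 4
    floats = struct.unpack("<%df" % n, raw)
    ints = [max(-32768, min(32767, int(round(x * 32767)))) for x in floats]
    return struct.pack("<%dh" % n, *ints)
```

A real conversion script would read and rewrite the WAV header as well; this only handles the sample payload.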
This is similar to the AMI ASR recipe, i.e., we train a single model by combining training data from different mic settings: IHM (individual headset mic), IHM + reverb, SDM (single distant mic), and GSS-enhanced audio, and use this model to evaluate all conditions.
This recipe also includes text normalization similar to the M2MeT challenge baseline. Here are the results using `--epoch 15 --avg 8` and modified beam search:

As a comparison, the eval set results for single-speaker ASR from the baseline paper are:
In the above table, `Eval-Ali-near` is equivalent to IHM, and `Eval-Ali-far` is equivalent to SDM.
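The multi-condition setup described above amounts to interleaving examples from several training sources; icefall recipes typically do this with lhotse (e.g. muxing cut sets). As a rough illustration of the idea only, here is a stdlib sketch of weighted interleaving (the `mux` helper below is hypothetical, not the recipe's actual code):

```python
import random

def mux(*sources, weights=None, seed=0):
    """Interleave examples from several training sources by weighted
    random sampling, continuing until every source is exhausted."""
    rng = random.Random(seed)
    iters = [iter(s) for s in sources]
    weights = list(weights or [1.0] * len(iters))
    while iters:
        i = rng.choices(range(len(iters)), weights=weights)[0]
        try:
            yield next(iters[i])
        except StopIteration:
            # Drop the exhausted source and its weight.
            del iters[i], weights[i]
```

For example, `mux(ihm_cuts, ihm_rvb_cuts, sdm_cuts, gss_cuts)` would yield a single mixed stream covering all four conditions.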