
Add AliMeeting multi-condition training recipe #751

Merged (2 commits) on Dec 10, 2022


@desh2608 (Collaborator) commented Dec 9, 2022

This recipe is similar to the AMI ASR recipe: we train a single model on the combined training data from several microphone settings (IHM, IHM + reverb, SDM, and GSS-enhanced audio), and then use that one model to evaluate each condition.
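The pooling described above can be sketched as a toy example. The utterance IDs and condition names (`ihm`, `ihm_rvb`, `sdm`, `gss`) are hypothetical; icefall recipes operate on Lhotse cut manifests, and the recipe's actual data loading and sampling may differ from this plain-list illustration.

```python
import random
from typing import Dict, List, Tuple


def combine_conditions(
    per_condition: Dict[str, List[str]], seed: int = 0
) -> List[Tuple[str, str]]:
    """Pool utterance IDs from every mic condition into one shuffled training list.

    Each entry is (utterance_id, condition). The condition tag is kept only so the
    mix can be inspected; a single model is trained on the whole pool.
    """
    pooled = [(utt, cond) for cond, utts in per_condition.items() for utt in utts]
    random.Random(seed).shuffle(pooled)  # deterministic shuffle for reproducibility
    return pooled


if __name__ == "__main__":
    # Hypothetical IDs: the same speech appears under several mic settings.
    data = {
        "ihm": ["utt1", "utt2"],
        "ihm_rvb": ["utt1_rvb", "utt2_rvb"],
        "sdm": ["utt1_sdm"],
        "gss": ["utt1_gss"],
    }
    for utt, cond in combine_conditions(data):
        print(cond, utt)
```

Because every condition feeds the same model, evaluation on IHM, SDM, or GSS-enhanced audio uses one set of weights rather than one model per condition.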

The recipe also includes text normalization similar to that of the M2MeT challenge baseline. Here are the results using `--epoch 15 --avg 8` and modified beam search:

| Evaluation set | eval CER (%) | test CER (%) |
|---|---|---|
| IHM | 9.58 | 11.53 |
| SDM | 23.37 | 25.85 |
| MDM (GSS-enhanced) | 11.82 | 14.22 |
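The results above were decoded with `--epoch 15 --avg 8`. A minimal sketch of checkpoint averaging follows, under the simplest reading of those flags: average the parameters saved at the end of epochs 8 through 15. This reading is an assumption for illustration; icefall's stateless7 recipes may instead derive this average from a running-average model stored in the checkpoints, and real checkpoints hold tensors rather than the plain float lists used here.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of floats.
Checkpoint = Dict[str, List[float]]


def epochs_to_average(epoch: int, avg: int) -> List[int]:
    """Epochs covered by `--epoch E --avg N`, read as the last N epochs ending at E."""
    if not 1 <= avg <= epoch:
        raise ValueError("expected 1 <= avg <= epoch")
    return list(range(epoch - avg + 1, epoch + 1))


def average_checkpoints(ckpts: List[Checkpoint]) -> Checkpoint:
    """Element-wise arithmetic mean of every parameter across the checkpoints."""
    if not ckpts:
        raise ValueError("no checkpoints to average")
    n = len(ckpts)
    return {
        name: [sum(vals) / n for vals in zip(*(c[name] for c in ckpts))]
        for name in ckpts[0]
    }


if __name__ == "__main__":
    epochs = epochs_to_average(15, 8)
    print(epochs)  # [8, 9, 10, 11, 12, 13, 14, 15]
    # Toy checkpoints whose single parameter equals the epoch number.
    ckpts = [{"w": [float(e), float(e)]} for e in epochs]
    print(average_checkpoints(ckpts))  # {'w': [11.5, 11.5]}
```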

For comparison, the eval-set results for single-speaker ASR reported in the baseline paper are:

[Image: table of eval-set results for single-speaker ASR from the baseline paper]

In the above table, Eval-Ali-near corresponds to IHM, and Eval-Ali-far corresponds to SDM.
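The numbers in the results table are character error rates (CER): the character-level edit distance (substitutions + deletions + insertions) divided by the number of reference characters. A minimal corpus-level sketch follows. The `normalize` function is a hypothetical stand-in (NFKC folding, dropping punctuation and whitespace), not the M2MeT baseline normalization, whose exact rules this PR does not spell out.

```python
import unicodedata
from typing import List


def normalize(text: str) -> str:
    """Hypothetical normalizer: fold full-width forms via NFKC, drop punctuation and spaces."""
    text = unicodedata.normalize("NFKC", text)
    return "".join(
        ch
        for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance between two symbol sequences (single-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,  # deletion
                cur[j - 1] + 1,  # insertion
                prev[j - 1] + (r != h),  # substitution (or match at zero cost)
            )
        prev = cur
    return prev[-1]


def cer(refs: List[str], hyps: List[str]) -> float:
    """Corpus-level CER in percent: total character edits / total reference characters."""
    errors = total = 0
    for ref, hyp in zip(refs, hyps):
        r, h = list(normalize(ref)), list(normalize(hyp))
        errors += edit_distance(r, h)
        total += len(r)
    if total == 0:
        raise ValueError("references are empty after normalization")
    return 100.0 * errors / total
```

Errors are summed over the whole set before dividing, so long utterances weigh more than short ones; averaging per-utterance CERs would give a different number.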

```
--max-states 8
```

The pretrained model is available at <https://huggingface.co/desh2608/icefall-asr-alimeeting-pruned-transducer-stateless7>
Collaborator commented:

Could you upload some test waves to this Hugging Face repo?
They make it easier to test the model in sherpa.

Please have a look at
https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless8-2022-11-14/tree/main/test_wavs

@desh2608 (Collaborator, Author):

Done.

@desh2608 (Collaborator, Author) commented Dec 10, 2022:

If no further changes are needed, could this be merged? (One test is failing, but it appears unrelated to the changes in this PR.)

Collaborator commented:

Thanks!

Collaborator commented:

@desh2608 You can try your pre-trained model at https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition

Thanks again for the contribution.

@csukuangfj merged commit c4aaf3e into k2-fsa:master on Dec 10, 2022.
@csukuangfj (Collaborator) commented:

By the way, I have added the model to
https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition

[Screenshot, 2022-12-10]

@desh2608 (Collaborator, Author) replied:


Thanks! I'll check it out.

@csukuangfj (Collaborator) commented:

@desh2608 Could you please merge
https://huggingface.co/desh2608/icefall-asr-alimeeting-pruned-transducer-stateless7/discussions/1

(Your test waves are encoded as 32-bit floats. I converted them to 16-bit int16 in the PR above.)
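The float-to-int16 conversion mentioned above can be sketched as follows: clip float samples to [-1.0, 1.0], scale them to signed 16-bit integers, and write mono 16-bit PCM with Python's standard `wave` module. This is not the script used in the Hugging Face PR. The 32767 scale factor is one common convention (some tools use 32768), the 16 kHz default sample rate is an assumption, and reading the original float WAV files would need a library such as `soundfile`, since `wave` does not read 32-bit float WAV.

```python
import struct
import wave
from typing import List, Sequence


def float_to_int16(samples: Sequence[float]) -> List[int]:
    """Map float samples in [-1.0, 1.0] to int16 values, clipping out-of-range input."""
    out = []
    for x in samples:
        x = max(-1.0, min(1.0, x))  # clip so the scaled value fits in int16
        out.append(int(round(x * 32767)))
    return out


def int16_pcm_bytes(samples: Sequence[float]) -> bytes:
    """Little-endian signed 16-bit PCM bytes, as stored in a WAV data chunk."""
    pcm = float_to_int16(samples)
    return struct.pack("<%dh" % len(pcm), *pcm)


def write_int16_wav(path: str, samples: Sequence[float], sample_rate: int = 16000) -> None:
    """Write a mono 16-bit PCM WAV file (the 16 kHz default is an assumption)."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(int16_pcm_bytes(samples))
```

For example, `write_int16_wav("out.wav", samples)` writes a 16-bit copy of float samples already loaded into memory; `out.wav` is a hypothetical filename.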

@desh2608 (Collaborator, Author) replied:


Thanks. Done.
