
Call on more volunteers to add recipes with new datasets or new models #394

Open
luomingshuang opened this issue Jun 3, 2022 · 14 comments


@luomingshuang
Collaborator

luomingshuang commented Jun 3, 2022

The next-generation Kaldi is developing rapidly. We have obtained competitive results on some large and popular datasets based on k2, icefall, and Lhotse. Now we want to apply it to many more datasets, and we welcome volunteers to help add recipes for new datasets or new models. Comment on this issue if you want to work on a dataset or model, so I can put your name in the appropriate volunteer slot; this avoids duplicated work. (Note: you may also choose a dataset that does not appear in the following table.)

| dataset | model | volunteer |
|---|---|---|
| WenetSpeech | pruned_transducer_stateless2 | @luomingshuang |
| AISHELL4 | pruned_transducer_stateless5 (#399) | @luomingshuang |
| MGB2 | conformer_ctc (#396) | @AmirHussein96 |
| Switchboard | | @ngoel17 |
| TAL_CSASR | pruned_transducer_stateless5 | @luomingshuang |
| AISHELL2 | pruned_transducer_stateless5 (done!) | @yuekaizhang |
| AISHELL3 | | |
| THCHS-30 | | |
| TED-LIUM | | |
| TED-LIUMv2 | | |
| Iban | | |
| TIBMD@MUC | | |
| … | … | … |
luomingshuang pinned this issue Jun 3, 2022
@luomingshuang
Collaborator Author

luomingshuang commented Jun 3, 2022

I suggest you have a look at this detailed tutorial: https://icefall.readthedocs.io/en/latest/contributing/how-to-create-a-recipe.html

Here I want to provide a simple tutorial on how to add a recipe quickly and easily. Before you build a recipe, I strongly suggest you look at our existing recipes (https://github.com/k2-fsa/icefall/tree/master/egs) and this tutorial (https://icefall.readthedocs.io/en/latest/contributing/how-to-create-a-recipe.html). You will find that there is not much you need to modify or add. Below are the steps for adding a recipe to icefall. (Don't be afraid of making mistakes; many people will help you complete the recipe as long as you submit your PR.)

Suppose I am building a pruned_transducer_stateless2 recipe for an English dataset such as tedlium:

  1. First, make a directory called `egs/tedlium/ASR` and `cd egs/tedlium/ASR`. Then create a symbolic link to `shared` with `ln -s ../../../egs/librispeech/ASR/shared .`
  2. Building on step 1, write a `prepare.sh` script that prepares the data for training and testing. (I suggest you learn from other `prepare.sh` scripts, such as `egs/librispeech/ASR/prepare.sh` and `egs/tedlium3/ASR/prepare.sh`.) You also have to create a directory called `local`: the files and functions used in `prepare.sh` live there, and you can copy the Python files from `egs/librispeech/ASR/local/` or `egs/tedlium3/ASR/local/`. Use the text to train a BPE model. You also have to write a `compute_fbank_tedlium.py` to compute the fbank features. If there is no recipe for this dataset in Lhotse yet, you can submit a PR there to add one.
  3. After step 2 you will have the data for training and decoding. In this step, create a model directory called `pruned_transducer_stateless2` that holds the training and decoding files; you can copy them from `egs/librispeech/ASR/pruned_transducer_stateless2` to `egs/tedlium/ASR/pruned_transducer_stateless2`. You also need to change the places in `train.py` and `decode.py` where the data is read, substituting the corresponding dataset name (e.g., change librispeech to tedlium). Then you can train with GPUs.
  4. After step 3 you will have trained models in `pruned_transducer_stateless2`. You can choose which decoding method, epoch, and average to use for decoding.
  5. After training and decoding, add a `README.md` and `RESULTS.md` to record your training log and results.
  6. It would be wonderful if you also uploaded your pretrained model and decoding logs to Hugging Face. For how to build a Colab notebook, see https://colab.research.google.com/drive/1CO1bXJ-2khDckZIW8zjOPHGSKLHpTDlp?usp=sharing; for how to set up a new model repository on Hugging Face, see https://huggingface.co/csukuangfj/icefall-asr-librispeech-transducer-stateless-bpe-500-2022-02-07.
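The scaffolding in steps 1 and 2 can be sketched as a few shell commands (run from the icefall repository root; the commented `cp` lines are examples of helpers you might reuse, so adjust them to your dataset):

```shell
# Step 1: create the recipe directory and link the shared helpers.
mkdir -p egs/tedlium/ASR/local
cd egs/tedlium/ASR
ln -s ../../../egs/librispeech/ASR/shared .

# Step 2: start from an existing recipe's data preparation scripts.
# (Commented out here because the source files live in a real icefall
# checkout; copy whichever helpers your dataset needs.)
# cp ../../librispeech/ASR/prepare.sh .
# cp ../../librispeech/ASR/local/compute_fbank_librispeech.py \
#    local/compute_fbank_tedlium.py
```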

Suppose I am building a pruned_transducer_stateless2 recipe for a Chinese dataset such as thchs30:

  1. First, make a directory called `egs/thchs30/ASR` and `cd egs/thchs30/ASR`. Then create a symbolic link to `shared` with `ln -s ../../../egs/aishell/ASR/shared .`
  2. Building on step 1, write a `prepare.sh` script that prepares the data for training and testing. (I suggest you learn from other `prepare.sh` scripts, such as `egs/wenetspeech/ASR/prepare.sh` and `egs/aidatatang_200zh/ASR/prepare.sh`.) You also have to create a directory called `local`: the files and functions used in `prepare.sh` live there, and you can copy the Python files from `egs/wenetspeech/ASR/local/` or `egs/aidatatang_200zh/ASR/local/`. You can decide whether word segmentation is necessary according to your dataset's text. You also have to write a `compute_fbank_thchs30.py` to compute the fbank features. If there is no recipe for this dataset in Lhotse yet, you can submit a PR there to add one.
  3. After step 2 you will have the data for training and decoding. In this step, create a model directory called `pruned_transducer_stateless2` that holds the training and decoding files; you can copy them from `egs/wenetspeech/ASR/pruned_transducer_stateless2` to `egs/thchs30/ASR/pruned_transducer_stateless2`. You also need to change the places in `train.py` and `decode.py` where the data is read, substituting the corresponding dataset name (e.g., change wenetspeech to thchs30). Then you can train with GPUs.
  4. After step 3 you will have trained models in `pruned_transducer_stateless2`. You can choose which decoding method, epoch, and average to use for decoding.
  5. After training and decoding, add a `README.md` and `RESULTS.md` to record your training log and results.
  6. It would be wonderful if you provided a Colab notebook and uploaded your pretrained model and decoding logs to Hugging Face. For how to build a Colab notebook, see https://colab.research.google.com/drive/1EV4e1CHa1GZgEF-bZgizqI9RyFFehIiN?usp=sharing; for how to set up a new model repository on Hugging Face, see https://huggingface.co/luomingshuang/icefall_asr_wenetspeech_pruned_transducer_stateless2.
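Step 3's renaming can be done mechanically with `sed` before reviewing the diff by hand. A minimal illustration (the `train.py` here is a one-line stand-in, not the real file, and `Thchs30AsrDataModule` is a hypothetical name):

```shell
mkdir -p egs/thchs30/ASR/pruned_transducer_stateless2
cd egs/thchs30/ASR/pruned_transducer_stateless2
# Stand-in for the real train.py copied from egs/wenetspeech/ASR:
printf 'from asr_datamodule import WenetSpeechAsrDataModule\n' > train.py
# Rename dataset-specific identifiers, then review the diff by hand:
sed -i 's/WenetSpeech/Thchs30/g' train.py
cat train.py   # -> from asr_datamodule import Thchs30AsrDataModule
```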

@csukuangfj
Collaborator

Could you add the tutorial to https://github.com/k2-fsa/icefall/tree/master/docs?

@luomingshuang
Collaborator Author

Oh, I see there are already very detailed tutorials in https://github.com/k2-fsa/icefall/tree/master/docs. I just wrote a simple tutorial here. I think https://icefall.readthedocs.io/en/latest/contributing/index.html is enough.

@ngoel17
Contributor

ngoel17 commented Jun 6, 2022

fisher-swbd recipe coming soon.

@yuekaizhang
Collaborator

I would like to try pruned_transducer_stateless2 on AISHELL2 if no one is doing it.

@csukuangfj
Collaborator

> I would like to try pruned_transducer_stateless2 on AISHELL2 if no one is doing it.

@yuekaizhang You are very welcome.

PS: Please use pruned_transducer_stateless4 or pruned_transducer_stateless5, which support saving averaged models periodically during training. That helps improve performance.

@desh2608
Collaborator

Mentioning here to avoid recipe duplication. I will work on recipes for AMI and AliMeeting this fall. For both these datasets, there are close-talk and far-field recordings available. The idea would be to train a single model that can handle both settings. Additionally, we can also use GSS-enhanced multi-channel data for training, although this is optional. (We found during the CHiME-6 challenge that it helps significantly for overlapped speech.)

@videodanchik
Contributor

I'm working on the Tedlium conformer_ctc2 recipe.

@teowenshen
Contributor

I am working on a Japanese CSJ recipe. So far I have managed a working lang_char model using the conv_emformer_transducer_stateless2 setup, yielding the preliminary results below at 28 epochs.

| dataset | CER |
|---|---|
| eval1 | 5.67 |
| eval2 | 4.2 |
| eval3 | 4.4 |

In the spirit of pythonising the recipe, I have rewritten the bash and perl data preparation scripts from Kaldi's recipe. However, this yielded a somewhat different transcript than Kaldi's, so my results are not directly comparable with ESPnet and Kaldi.

I will send in a pull request once a version comparable to ESPnet and Kaldi is up.

@csukuangfj
Collaborator

> I am working on a Japanese CSJ recipe. So far I have managed a working lang_char model using the conv_emformer_transducer_stateless2 setup, yielding the preliminary results below at 28 epochs. […]
>
> I will send in a pull request once a version comparable to espnet and kaldi is up.

Thanks!

@desh2608
Collaborator

desh2608 commented Nov 21, 2022

AMI recipe is now available: #698

@AmirHussein96
Contributor

MGB2 is also available: #396

@desh2608
Collaborator

desh2608 commented Dec 7, 2022

ASR recipes often require some form of corpus-specific text normalization. We are trying to make such normalizations available in the manifest preparation stage in Lhotse (e.g., see AMI, CHiME-6, AliMeeting recipes in Lhotse). The specific implementations are done in the lhotse.recipes.utils and called using an additional normalize_text argument in the prepare function. If you are working on an ASR recipe for a dataset that requires some specific text normalization, please consider adding this functionality in the Lhotse recipe so that people using Lhotse outside of icefall may also benefit from it.
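As a toy illustration of what such normalization does (this is not Lhotse's actual implementation; the real rules live in lhotse.recipes.utils and are selected via the normalize_text argument of the prepare function), a Kaldi-style upper-casing plus punctuation stripping might look like:

```shell
# Toy corpus-specific text normalization: uppercase and strip punctuation.
normalize() {
  tr '[:lower:]' '[:upper:]' | sed 's/[[:punct:]]//g'
}
echo "Hello, world! it's a test." | normalize
# -> HELLO WORLD ITS A TEST
```

Keeping such rules in the Lhotse recipe, rather than in each downstream toolkit, means every Lhotse user gets identical transcripts for the same corpus.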

@desh2608
Collaborator

AliMeeting multi-condition training recipe is merged: #751
