
AMI diarization and overlap detector #4265

Merged on Oct 3, 2020 (65 commits)


desh2608
Contributor

This PR contains the following:

  • Diarization recipe for AMI mix-headset data --- using AHC, spectral, and VBx clustering
  • Training and evaluation of a TDNN-LSTM overlap detector

Data preparation and ASR training
ASR (with oracle segments), SAD, and diarization
Decoding and scoring with diarized output
RNNLM rescoring and multichannel recipe
Added diarization scoring for overlapping regions only
Added diarizer auto-download (Pavel's script)
@desh2608
Contributor Author

@RuslanSel thanks for pointing that out.
@johnjosephmorgan yeah, you are right. I have fixed these issues now. Hopefully it should work.

@johnjosephmorgan
Contributor

johnjosephmorgan commented Sep 28, 2020 via email

@desh2608
Contributor Author

@johnjosephmorgan sorry it seems I have been distracted today! Pushed the local/overlap directory now

@RuslanSel

@desh2608 It looks like local/overlap/prepare_overlap_graph.py is still missing.

@johnjosephmorgan
Contributor

Hi @desh2608 I am getting an extra affix _1a appended to a directory name.
I think the problem is in
local/train_overlap_detector.sh
where
local/overlap/run_tdnn_lstm_1a.sh
gets called.
The argument to the --dir option has an affix _1a appended to it.
I think in
local/overlap/run_tdnn_lstm_1a.sh
another affix _1a gets appended to the directory name.
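
The doubled suffix described above is easy to reproduce in isolation. What follows is a minimal sketch, not the recipe's actual code: the variable `affix`, the `exp/overlap/...` path, and the functions `callee_buggy` and `callee_fixed` are hypothetical stand-ins for the caller (local/train_overlap_detector.sh) and the callee (local/overlap/run_tdnn_lstm_1a.sh).

```shell
#!/usr/bin/env bash
# Hypothetical illustration of a doubled affix. Names and paths are
# assumptions for this sketch, not copied from the recipe.

affix=1a

# Callee as it might behave: it always appends its own affix to --dir.
callee_buggy() {
  local dir=$1
  echo "${dir}_${affix}"
}

# One possible guard: append the affix only if the directory does not
# already end with it.
callee_fixed() {
  local dir=$1
  case "$dir" in
    *_"${affix}") echo "$dir" ;;
    *)            echo "${dir}_${affix}" ;;
  esac
}

# The caller passes a directory that already carries the affix.
caller_dir="exp/overlap/tdnn_lstm_${affix}"

buggy_result=$(callee_buggy "$caller_dir")
fixed_result=$(callee_fixed "$caller_dir")
echo "buggy: $buggy_result"
echo "fixed: $fixed_result"
```

A simpler remedy than the guard is to append the affix in only one of the two scripts.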

@johnjosephmorgan
Contributor

@desh2608
In the script
steps/overlap/post_process_output.sh
on line 59, the script
local/overlap/output_to_rttm.py
is called.
This needs to be fixed.
I'm not sure if you just need to change the local to steps or if you want to put this script under local/overlap.

@johnjosephmorgan
Contributor

@desh2608
BTW: I changed the local to steps and stage 9 finished.
I ran through the whole recipe.
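
The change from local to steps amounts to a one-line path substitution. The sketch below runs against a throwaway directory so nothing in a real tree is touched. Only the two script paths come from this thread; the file contents are invented for illustration, and the real line 59 is not reproduced.

```shell
#!/usr/bin/env bash
# Sketch: find and rewrite a stale local/ path inside a steps/ script.
# The script body written below is made up for this example.

workdir=$(mktemp -d)
mkdir -p "$workdir/steps/overlap"
script="$workdir/steps/overlap/post_process_output.sh"

# Stand-in for the offending call.
printf '%s\n' '#!/usr/bin/env bash' \
  'local/overlap/output_to_rttm.py "$@"' > "$script"

# Count the steps/ scripts that still call into local/overlap/.
stale_before=$(grep -rl 'local/overlap/' "$workdir/steps" | wc -l | tr -d ' ')

# Rewrite the prefix; "-i.bak" is accepted by both GNU and BSD sed.
sed -i.bak 's#local/overlap/output_to_rttm\.py#steps/overlap/output_to_rttm.py#' "$script"
rm -f "$script.bak"

stale_after=$(grep -rl 'local/overlap/' "$workdir/steps" | wc -l | tr -d ' ')
fixed_line=$(grep 'output_to_rttm' "$script")
echo "stale references before: $stale_before, after: $stale_after"
```

Running the same `grep -rl 'local/' steps/` over a real checkout is a quick way to spot other generic scripts that still depend on recipe-local files.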

@desh2608
Contributor Author

desh2608 commented Sep 29, 2020

@johnjosephmorgan thanks, I'll change it. I think it's better to keep it under steps since it is a generic script.

Thanks for running through the recipe. Really appreciate it!

@desh2608
Contributor Author

desh2608 commented Oct 3, 2020

@danpovey perhaps this can be merged.

danpovey merged commit bcd163c into kaldi-asr:master on Oct 3, 2020
@pzelasko
Contributor

pzelasko commented Oct 4, 2020

Out of curiosity, what DERs did you achieve with these models? I think it would be useful to add a RESULTS file for this directory with this information (also for others to know whether they successfully reproduced the models).

@johnjosephmorgan
Contributor

johnjosephmorgan commented Oct 4, 2020 via email

@desh2608
Contributor Author

desh2608 commented Oct 4, 2020

@johnjosephmorgan those are somewhat worse than what I got (36.5% missed and 16.1% false alarm). Perhaps there is some hyperparameter config that is different. I'll check and confirm.

@pzelasko The DERs on dev/eval are:
AHC: 27.0/28.3
Spectral: 27.6/26.9
VBx: 26.8/26.2
Note that these are with oracle SAD.

@johnjosephmorgan
Contributor

johnjosephmorgan commented Oct 4, 2020 via email

@desh2608
Contributor Author

desh2608 commented Oct 4, 2020

@johnjosephmorgan that shouldn't affect the overlap detection result. The overlap detector is trained separately in this script.

@johnjosephmorgan
Contributor

johnjosephmorgan commented Oct 4, 2020 via email

@johnjosephmorgan
Contributor

@desh2608 In local/train_overlap_detector.sh you say that at decode time the overlap detector could be used to do SAD. Is it a matter of consolidating the single and overlap labels as speech?

@desh2608
Contributor Author

desh2608 commented Oct 4, 2020

Yes, it can in principle. But the detector is trained using annotations which are not precise enough to do good speech activity detection. The CHiME-6 recipe trains a SAD model on 16 kHz data, and that model is also available on kaldi-asr.org.
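
The label consolidation asked about above could look roughly like this. It is a minimal sketch under stated assumptions: the three label names (`sil`, `single`, `overlap`), the one-line-per-recording layout, the recording ID `rec1`, and the function `collapse_to_sad` are all hypothetical, since the detector's actual output format is not shown in this thread.

```shell
#!/usr/bin/env bash
# Sketch: collapse three-class overlap-detector labels into speech/non-speech.
# Label names and input layout are assumptions for illustration.

collapse_to_sad() {
  # Input:  <recording-id> <label> <label> ...
  # Output: same layout, with single/overlap mapped to "speech" and
  #         every other label mapped to "nonspeech".
  awk '{
    out = $1
    for (i = 2; i <= NF; i++) {
      out = out " " (($i == "single" || $i == "overlap") ? "speech" : "nonspeech")
    }
    print out
  }'
}

sad_labels=$(echo "rec1 sil single single overlap sil" | collapse_to_sad)
echo "$sad_labels"
```

As the reply notes, this only works in principle: the boundaries inherit whatever imprecision the training annotations had.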

@RuslanSel

I got an error message when I ran stage 8 (local/train_overlap_detector.sh):
steps/nnet3/get_egs_targets.sh: number of utterances 23 in your training data is too small versus --num-utts-subset=12
... you probably have so little data that it doesn't make sense to train a neural net.

@desh2608
Contributor Author

desh2608 commented Oct 5, 2020

@RuslanSel can you check if your training data was prepared correctly?

@RuslanSel

My data/train/wav.scp has 23 Mix-Headset.wav files.
Is this the right amount?

@desh2608
Contributor Author

desh2608 commented Oct 5, 2020

That doesn't sound right. I have 133 Mix-Headset wav files in my training set. You can still train with that data, by setting --num-utts-subset to less than 0.25*23, i.e., 5 or fewer, in train_tdnn_lstm_1a.sh, but I doubt the neural network would be able to learn much with so little data.
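
The arithmetic behind "5 or fewer" can be sketched as follows, taking the 0.25 factor quoted above at face value. The function name `max_num_utts_subset` is hypothetical, and the snippet only computes a bound; it does not call any Kaldi script.

```shell
#!/usr/bin/env bash
# Sketch: largest --num-utts-subset that stays strictly below
# 0.25 * (number of utterances). Integer arithmetic only.

max_num_utts_subset() {
  local num_utts=$1
  # Largest integer s such that 4*s < num_utts.
  echo $(( (num_utts - 1) / 4 ))
}

echo "23 utterances  -> at most $(max_num_utts_subset 23)"
echo "133 utterances -> at most $(max_num_utts_subset 133)"
```

With a real data directory, the count could be taken from something like `wc -l < data/train/wav.scp`, assuming one line per recording as in the case discussed here.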

@RuslanSel

Thanks.
Something is wrong with my download.
What value of $mic is right for
local/ami_download.sh $mic $AMI_DIR
because right now I get: mic: unbound variable

@desh2608
Contributor Author

desh2608 commented Oct 5, 2020

You can set mic=ihm for this recipe.
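
The "unbound variable" message is what bash prints when an unset variable is expanded under `set -u`. A minimal sketch of giving both variables a value before the call is shown below; the `/path/to/ami` placeholder is an assumption, and the command is only echoed here, not executed.

```shell
#!/usr/bin/env bash
# Sketch: avoid "unbound variable" by assigning defaults before use.
set -u   # with nounset, expanding an unset variable is an error

mic=${mic:-ihm}                    # value suggested in the reply above
AMI_DIR=${AMI_DIR:-/path/to/ami}   # placeholder path, an assumption

download_cmd="local/ami_download.sh $mic $AMI_DIR"
echo "$download_cmd"
```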

@johnjosephmorgan
Contributor

johnjosephmorgan commented Oct 5, 2020 via email

@RuslanSel

I set mic=ihm, but there are zero Mix-Headset wav files after the download.
There are only Headset files.

desh2608 deleted the ami_ovl branch on November 5, 2020 01:30
@johnjosephmorgan
Contributor

Hello @desh2608 Could you post the trained overlap models to the kaldi webpage?
