AMI diarization and overlap detector #4265
Conversation
This PR contains the following:
- Data preparation and ASR training
- ASR (with oracle segments), SAD, and diarization
- Decoding and scoring with diarized output
- RNNLM rescoring and multichannel recipe
- Added diarization scoring for overlapping regions only
- Added diarizer auto-download (Pavel's script)
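As a rough illustration of what "diarization scoring for overlapping regions only" involves, here is a minimal, self-contained sketch. This is not the PR's actual scoring code; the toy RTTM contents and the 10 ms grid are assumptions for illustration. It measures the total time where two or more speakers are active:

```shell
# Toy RTTM: fields are type, file, channel, onset, duration, ..., speaker.
# The contents below are made up for illustration.
cat > toy.rttm <<'EOF'
SPEAKER meet1 1 0.00 2.00 <NA> <NA> spkA <NA> <NA>
SPEAKER meet1 1 1.50 2.00 <NA> <NA> spkB <NA> <NA>
EOF

# Count, on a 10 ms grid, how many frames have >= 2 active speakers.
overlap=$(awk '{ s = $4; e = $4 + $5;
    for (t = int(s * 100); t < int(e * 100); t++) cnt[t]++ }
  END { total = 0;
    for (t in cnt) if (cnt[t] >= 2) total++;
    printf "%.2f", total / 100.0 }' toy.rttm)
echo "total overlap time: ${overlap}s"
```

Here spkA is active from 0.00-2.00 and spkB from 1.50-3.50, so the region 1.50-2.00 counts as overlap, giving 0.50 s.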
@RuslanSel thanks for pointing out.
@desh2608 I see the changes in local/train_overlap_detector.sh and the deletion in wsj/s5/steps/overlap, but I do not see local/overlap.
… On Sep 28, 2020, at 4:33 PM, Desh Raj ***@***.***> wrote:
@RuslanSel <https://github.com/RuslanSel> thanks for pointing out.
@johnjosephmorgan <https://github.com/johnjosephmorgan> yeah you are right. I have fixed these issues now. Hopefully should work.
@johnjosephmorgan sorry, it seems I have been distracted today! Pushed the
@desh2608 Looks like local/overlap/prepare_overlap_graph.py is still missing.
Hi @desh2608 I am getting an extra affix _1a appended to a directory name.
@johnjosephmorgan thanks, I'll change it. I think it's better to keep it under steps since it is a generic script. Thanks for running through the recipe. Really appreciate it!
@danpovey perhaps this can be merged.
Out of curiosity, what DERs did you achieve with these models? I think it would be useful to add a RESULTS file for this directory with this information (also for others to know whether they successfully reproduced the models).
FWIW, here is the output of running stage 9, where the evaluation takes place. I don't understand the results yet.
./run.sh: performing overlap detection on dev
--convert_data_dir_to_whole true --output-scale 1 2 1 data/dev exp/overlap_1a/tdnn_lstm_1a_1a exp/overlap_1a/dev
diff: exp/overlap_1a/dev/final.raw: No such file or directory
steps/nnet3/compute_output.sh --nj 15 --cmd run.pl --iter final --extra-left-context 0 --extra-right-context 0 --extra-left-context-initial -1 --extra-right-context-final -1 --frames-per-chunk 300 --apply-exp true --frame-subsampling-factor 1 data/dev_whole exp/overlap_1a/dev/overlap exp/overlap_1a/dev
utils/data/get_utt2dur.sh: data/dev_whole/utt2dur already exists with the expected length. We won't recompute it.
local/detect_overlaps.sh: Decoding output
utils/data/get_utt2dur.sh: data/dev_whole/utt2dur already exists with the expected length. We won't recompute it.
local/detect_overlaps.sh: Created output overlap RTTM at exp/overlap_1a/dev/rttm_overlap
./run.sh: evaluating output..
MISSED SPEAKER TIME = 1711.01 secs ( 40.9 percent of scored speaker time)
FALARM SPEAKER TIME = 1062.66 secs ( 25.4 percent of scored speaker time)
./run.sh: performing overlap detection on eval
--convert_data_dir_to_whole true --output-scale 1 2 1 data/eval exp/overlap_1a/tdnn_lstm_1a_1a exp/overlap_1a/eval
utils/fix_data_dir.sh: file data/eval_whole/utt2spk is not in sorted order or not unique, sorting it
fix_data_dir.sh: kept all 16 utterances.
fix_data_dir.sh: old files are kept in data/eval_whole/.backup
fix_data_dir.sh: kept all 16 utterances.
fix_data_dir.sh: old files are kept in data/eval_whole/.backup
steps/make_mfcc.sh --mfcc-config conf/mfcc_hires.conf --nj 16 --cmd run.pl --write-utt2num-frames true data/eval_whole
utils/validate_data_dir.sh: Successfully validated data-directory data/eval_whole
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
steps/make_mfcc.sh: Succeeded creating MFCC features for eval_whole
steps/compute_cmvn_stats.sh data/eval_whole
Succeeded creating CMVN stats for eval_whole
fix_data_dir.sh: kept all 16 utterances.
fix_data_dir.sh: old files are kept in data/eval_whole/.backup
diff: exp/overlap_1a/eval/final.raw: No such file or directory
steps/nnet3/compute_output.sh --nj 16 --cmd run.pl --iter final --extra-left-context 0 --extra-right-context 0 --extra-left-context-initial -1 --extra-right-context-final -1 --frames-per-chunk 300 --apply-exp true --frame-subsampling-factor 1 data/eval_whole exp/overlap_1a/eval/overlap exp/overlap_1a/eval
utils/data/get_utt2dur.sh: data/eval_whole/utt2dur already exists with the expected length. We won't recompute it.
local/detect_overlaps.sh: Decoding output
utils/data/get_utt2dur.sh: data/eval_whole/utt2dur already exists with the expected length. We won't recompute it.
local/detect_overlaps.sh: Created output overlap RTTM at exp/overlap_1a/eval/rttm_overlap
./run.sh: evaluating output..
MISSED SPEAKER TIME = 1868.99 secs ( 46.1 percent of scored speaker time)
FALARM SPEAKER TIME = 856.43 secs ( 21.1 percent of scored speaker time)
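For reference, the MISSED/FALARM lines above are just durations expressed as a fraction of the total scored speaker time. A minimal sketch of that arithmetic, using the dev numbers above; the scored total of 4183.4 s is an assumption, back-computed from the reported percentages rather than taken from any log:

```shell
missed=1711.01   # missed speaker time (secs), from the dev log above
falarm=1062.66   # false-alarm speaker time (secs), from the dev log above
scored=4183.4    # assumed total scored speaker time, implied by the percentages

mp=$(awk -v x="$missed" -v s="$scored" 'BEGIN { printf "%.1f", 100 * x / s }')
fp=$(awk -v x="$falarm" -v s="$scored" 'BEGIN { printf "%.1f", 100 * x / s }')
echo "MISSED = ${mp}%  FALARM = ${fp}%"
```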
@johnjosephmorgan those are somewhat worse than what I got (36.5% missed and 16.1% false alarm). Perhaps there is some hyperparameter config that is different. I'll check and confirm. @pzelasko The DERs on dev/eval are: AHC: 27.0/28.3, Spectral: 27.6/26.9, VBx: 26.8/26.2. Note that these are with oracle SAD.
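For readers comparing these numbers: DER is the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total scored speaker time. A minimal sketch of the arithmetic; the component values below are illustrative assumptions, not figures from these runs:

```shell
scored=4180.0     # total scored speaker time (secs) -- illustrative
missed=752.4      # missed speech (secs) -- illustrative
falarm=334.4      # false-alarm speech (secs) -- illustrative
confusion=41.8    # speaker-confusion time (secs) -- illustrative

der=$(awk -v s="$scored" -v m="$missed" -v f="$falarm" -v c="$confusion" \
  'BEGIN { printf "%.1f", 100.0 * (m + f + c) / s }')
echo "DER = ${der}%"
```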
@desh2608 recall that I trained the Voxceleb models on MFCC vectors of dimension 30. Could this explain the worse results?
… On Oct 4, 2020, at 12:18 PM, Desh Raj ***@***.***> wrote:
@johnjosephmorgan <https://github.com/johnjosephmorgan> those are somewhat worse than what I got (36.5% missed and 16.1% false alarm). Perhaps there is some hyperparameter config that is different. I'll check and confirm.
@pzelasko <https://github.com/pzelasko> The DERs on dev/eval are:
AHC: 27.0/28.3
Spectral: 27.6/26.9
VBx: 26.8/26.2
Note that these are with oracle SAD.
@johnjosephmorgan that shouldn't affect the overlap detection result. The overlap detector is trained separately in local/train_overlap_detector.sh.
@desh2608
I have a different question:
Could you train a speech activity detector with the same AMI corpus you used to train the overlap detector?
I ask because I'd like to use 16k data to train an SAD instead of the 8k data used in the aspire recipe.
… On Oct 4, 2020, at 3:23 PM, Desh Raj ***@***.***> wrote:
@johnjosephmorgan <https://github.com/johnjosephmorgan> that shouldn't affect the overlap detection result. The overlap detector is trained separately in this <https://github.com/kaldi-asr/kaldi/blob/master/egs/ami/s5c/local/train_overlap_detector.sh> script.
@desh2608 In local/train_overlap_detector.sh you say that at decode time the overlap detector could be used to do SAD. Is it a matter of consolidating the single and overlap labels as speech?
Yes, it can in principle. But the detector is trained using annotations which are not precise enough to do good speech activity detection. The CHiME-6 recipe trains a SAD on 16k data, and the model is also available on kaldi-asr.org.
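To make the "consolidate single and overlap labels as speech" idea concrete, here is a minimal sketch. The frame labels, the 0/1/2 class coding, and the 10 ms frame shift are all illustrative assumptions, not the detector's actual output format. It collapses per-frame detector classes into speech segments:

```shell
# Per-frame classes: 0 = silence, 1 = single speaker, 2 = overlap (assumed coding).
labels="0 0 1 1 2 2 2 1 0 0"

speech=$(echo "$labels" | awk -v shift=0.01 '{
  start = -1; out = "";
  for (i = 1; i <= NF; i++) {
    active = ($i == 1 || $i == 2);   # single OR overlap both count as speech
    if (active && start < 0) start = (i - 1) * shift;
    if (!active && start >= 0) {
      seg = sprintf("%.2f-%.2f", start, (i - 1) * shift);
      out = (out == "" ? seg : out " " seg); start = -1;
    }
  }
  if (start >= 0) {
    seg = sprintf("%.2f-%.2f", start, NF * shift);
    out = (out == "" ? seg : out " " seg);
  }
  print out
}')
echo "speech segments: $speech"
```

With the toy label stream above, frames 3-8 are speech, so a single segment from 0.02 s to 0.08 s comes out.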
I've got an error message,
@RuslanSel can you check if your training data was prepared correctly?
My data/train/wav.scp has 23 Mix-Headset.wav files.
That doesn't sound right. I have 133 Mix-Headset wav files in my training set. You can still train with that data, by setting
Thanks. Something is wrong with my download. What $mic is right for local/ami_download.sh $mic $AMI_DIR? Because now I get "mic: unbound variable".
You can set
I have 132 files in data/tran_ovl/wav.scp. Maybe I'm missing 1 file? @desh2608 could you verify that you have 133 files and not 132?
… On Oct 5, 2020, at 2:48 PM, RuslanSel ***@***.***> wrote:
Thanks.
Something wrong with my download.
What $mic is right for
local/ami_download.sh $mic $AMI_DIR
cause now mic: unbound variable
I set mic=ihm, but there are zero Mix-Headset wavs after the download.
Hello @desh2608, could you post the trained overlap models to the kaldi webpage?