AMI diarization and overlap detector #4265
Conversation
This PR contains the following:
- Data preparation and ASR training
- ASR (with oracle segments), SAD, and diarization
- Decoding and scoring with diarized output
- RNNLM rescoring and multichannel recipe
- Added diarization scoring for overlapping regions only
- Added diarizer auto-download (Pavel's script)
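As a rough illustration of what "diarization scoring for overlapping regions only" involves, here is a minimal, self-contained sketch. This is not the PR's actual scoring code; the toy RTTM contents and the 10 ms grid are assumptions for illustration. It measures the total time where two or more speakers are active:

```shell
# Toy RTTM: fields are type, file, channel, onset, duration, ..., speaker.
# The contents below are made up for illustration.
cat > toy.rttm <<'EOF'
SPEAKER meet1 1 0.00 2.00 <NA> <NA> spkA <NA> <NA>
SPEAKER meet1 1 1.50 2.00 <NA> <NA> spkB <NA> <NA>
EOF

# Count, on a 10 ms grid, how many frames have >= 2 active speakers.
overlap=$(awk '{ s = $4; e = $4 + $5;
    for (t = int(s * 100); t < int(e * 100); t++) cnt[t]++ }
  END { total = 0;
    for (t in cnt) if (cnt[t] >= 2) total++;
    printf "%.2f", total / 100.0 }' toy.rttm)
echo "total overlap time: ${overlap}s"
```

Here spkA is active from 0.00-2.00 and spkB from 1.50-3.50, so the region 1.50-2.00 counts as overlap, giving 0.50 s.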
@RuslanSel thanks for pointing out.
@desh2608 I see the changes in local/train_overlap_detector.sh and the deletion in wsj/s5/steps/overlap, but I do not see local/overlap.
… On Sep 28, 2020, at 4:33 PM, Desh Raj ***@***.***> wrote:
@RuslanSel <https://github.com/RuslanSel> thanks for pointing out.
@johnjosephmorgan <https://github.com/johnjosephmorgan> yeah you are right. I have fixed these issues now. Hopefully should work.
@johnjosephmorgan sorry, it seems I have been distracted today! Pushed the
@desh2608 Looks like local/overlap/prepare_overlap_graph.py is still missing.
Hi @desh2608 I am getting an extra affix _1a appended to a directory name.
@johnjosephmorgan thanks, I'll change it. I think it's better to keep it under steps since it is a generic script. Thanks for running through the recipe. Really appreciate it!
@danpovey perhaps this can be merged.
Out of curiosity, what DERs did you achieve with these models? I think it would be useful to add a RESULTS file for this directory with this information (also for others to know whether they successfully reproduced the models).
FWIW, here is the output of running stage 9, where the evaluation takes place. I don't understand the results yet.
./run.sh: performing overlap detection on dev
--convert_data_dir_to_whole true --output-scale 1 2 1 data/dev exp/overlap_1a/tdnn_lstm_1a_1a exp/overlap_1a/dev
diff: exp/overlap_1a/dev/final.raw: No such file or directory
steps/nnet3/compute_output.sh --nj 15 --cmd run.pl --iter final --extra-left-context 0 --extra-right-context 0 --extra-left-context-initial -1 --extra-right-context-final -1 --frames-per-chunk 300 --apply-exp true --frame-subsampling-factor 1 data/dev_whole exp/overlap_1a/dev/overlap exp/overlap_1a/dev
utils/data/get_utt2dur.sh: data/dev_whole/utt2dur already exists with the expected length. We won't recompute it.
local/detect_overlaps.sh: Decoding output
utils/data/get_utt2dur.sh: data/dev_whole/utt2dur already exists with the expected length. We won't recompute it.
local/detect_overlaps.sh: Created output overlap RTTM at exp/overlap_1a/dev/rttm_overlap
./run.sh: evaluating output..
MISSED SPEAKER TIME = 1711.01 secs ( 40.9 percent of scored speaker time)
FALARM SPEAKER TIME = 1062.66 secs ( 25.4 percent of scored speaker time)
./run.sh: performing overlap detection on eval
--convert_data_dir_to_whole true --output-scale 1 2 1 data/eval exp/overlap_1a/tdnn_lstm_1a_1a exp/overlap_1a/eval
utils/fix_data_dir.sh: file data/eval_whole/utt2spk is not in sorted order or not unique, sorting it
fix_data_dir.sh: kept all 16 utterances.
fix_data_dir.sh: old files are kept in data/eval_whole/.backup
fix_data_dir.sh: kept all 16 utterances.
fix_data_dir.sh: old files are kept in data/eval_whole/.backup
steps/make_mfcc.sh --mfcc-config conf/mfcc_hires.conf --nj 16 --cmd run.pl --write-utt2num-frames true data/eval_whole
utils/validate_data_dir.sh: Successfully validated data-directory data/eval_whole
steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
steps/make_mfcc.sh: Succeeded creating MFCC features for eval_whole
steps/compute_cmvn_stats.sh data/eval_whole
Succeeded creating CMVN stats for eval_whole
fix_data_dir.sh: kept all 16 utterances.
fix_data_dir.sh: old files are kept in data/eval_whole/.backup
diff: exp/overlap_1a/eval/final.raw: No such file or directory
steps/nnet3/compute_output.sh --nj 16 --cmd run.pl --iter final --extra-left-context 0 --extra-right-context 0 --extra-left-context-initial -1 --extra-right-context-final -1 --frames-per-chunk 300 --apply-exp true --frame-subsampling-factor 1 data/eval_whole exp/overlap_1a/eval/overlap exp/overlap_1a/eval
utils/data/get_utt2dur.sh: data/eval_whole/utt2dur already exists with the expected length. We won't recompute it.
local/detect_overlaps.sh: Decoding output
utils/data/get_utt2dur.sh: data/eval_whole/utt2dur already exists with the expected length. We won't recompute it.
local/detect_overlaps.sh: Created output overlap RTTM at exp/overlap_1a/eval/rttm_overlap
./run.sh: evaluating output..
MISSED SPEAKER TIME = 1868.99 secs ( 46.1 percent of scored speaker time)
FALARM SPEAKER TIME = 856.43 secs ( 21.1 percent of scored speaker time)
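For reference, the MISSED/FALARM lines above are just durations expressed as a fraction of the total scored speaker time. A minimal sketch of that arithmetic, using the dev numbers above; the scored total of 4183.4 s is an assumption, back-computed from the reported percentages rather than taken from any log:

```shell
missed=1711.01   # missed speaker time (secs), from the dev log above
falarm=1062.66   # false-alarm speaker time (secs), from the dev log above
scored=4183.4    # assumed total scored speaker time, implied by the percentages

mp=$(awk -v x="$missed" -v s="$scored" 'BEGIN { printf "%.1f", 100 * x / s }')
fp=$(awk -v x="$falarm" -v s="$scored" 'BEGIN { printf "%.1f", 100 * x / s }')
echo "MISSED = ${mp}%  FALARM = ${fp}%"
```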
@johnjosephmorgan those are somewhat worse than what I got (36.5% missed and 16.1% false alarm). Perhaps there is some hyperparameter config that is different. I'll check and confirm. @pzelasko The DERs on dev/eval are: AHC: 27.0/28.3, Spectral: 27.6/26.9, VBx: 26.8/26.2. Note that these are with oracle SAD.
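For readers comparing these numbers: DER is the sum of missed speech, false-alarm speech, and speaker-confusion time, divided by the total scored speaker time. A minimal sketch of the arithmetic; the component values below are illustrative assumptions, not figures from these runs:

```shell
scored=4180.0     # total scored speaker time (secs) -- illustrative
missed=752.4      # missed speech (secs) -- illustrative
falarm=334.4      # false-alarm speech (secs) -- illustrative
confusion=41.8    # speaker-confusion time (secs) -- illustrative

der=$(awk -v s="$scored" -v m="$missed" -v f="$falarm" -v c="$confusion" \
  'BEGIN { printf "%.1f", 100.0 * (m + f + c) / s }')
echo "DER = ${der}%"
```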
@desh2608 recall that I trained the Voxceleb models on MFCC vectors of dimension 30. Could this explain the worse results?
… On Oct 4, 2020, at 12:18 PM, Desh Raj ***@***.***> wrote:
@johnjosephmorgan <https://github.com/johnjosephmorgan> those are somewhat worse than what I got (36.5% missed and 16.1% false alarm). Perhaps there is some hyperparameter config that is different. I'll check and confirm.
@pzelasko <https://github.com/pzelasko> The DERs on dev/eval are:
AHC: 27.0/28.3
Spectral: 27.6/26.9
VBx: 26.8/26.2
Note that these are with oracle SAD.
@johnjosephmorgan that shouldn't affect the overlap detection result. The overlap detector is trained separately in local/train_overlap_detector.sh.
@desh2608
I have a different question:
Could you train a speech activity detector with the same AMI corpus you used to train the overlap detector?
I ask because I'd like to use 16k data to train an SAD instead of the 8k data used in the aspire recipe.
… On Oct 4, 2020, at 3:23 PM, Desh Raj ***@***.***> wrote:
@johnjosephmorgan <https://github.com/johnjosephmorgan> that shouldn't affect the overlap detection result. The overlap detector is trained separately in this <https://github.com/kaldi-asr/kaldi/blob/master/egs/ami/s5c/local/train_overlap_detector.sh> script.
@desh2608 In local/train_overlap_detector.sh you say that at decode time the overlap detector could be used to do SAD. Is it a matter of consolidating the single and overlap labels as speech?
Yes, it can in principle. But the detector is trained using annotations which are not precise enough to do good speech activity detection. The CHiME-6 recipe trains a SAD on 16k data, and the model is also available on kaldi-asr.org.
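To make the "consolidate single and overlap labels as speech" idea concrete, here is a minimal sketch. The frame labels, the 0/1/2 class coding, and the 10 ms frame shift are all illustrative assumptions, not the detector's actual output format. It collapses per-frame detector classes into speech segments:

```shell
# Per-frame classes: 0 = silence, 1 = single speaker, 2 = overlap (assumed coding).
labels="0 0 1 1 2 2 2 1 0 0"

speech=$(echo "$labels" | awk -v shift=0.01 '{
  start = -1; out = "";
  for (i = 1; i <= NF; i++) {
    active = ($i == 1 || $i == 2);   # single OR overlap both count as speech
    if (active && start < 0) start = (i - 1) * shift;
    if (!active && start >= 0) {
      seg = sprintf("%.2f-%.2f", start, (i - 1) * shift);
      out = (out == "" ? seg : out " " seg); start = -1;
    }
  }
  if (start >= 0) {
    seg = sprintf("%.2f-%.2f", start, NF * shift);
    out = (out == "" ? seg : out " " seg);
  }
  print out
}')
echo "speech segments: $speech"
```

With the toy label stream above, frames 3-8 are speech, so a single segment from 0.02 s to 0.08 s comes out.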
I've got an error message,
@RuslanSel can you check if your training data was prepared correctly?
My data/train/wav.scp has 23 Mix-Headset.wav files.
That doesn't sound right. I have 133 Mix-Headset wav files in my training set. You can still train with that data, by setting
Thanks. Something is wrong with my download. What $mic is right for local/ami_download.sh $mic $AMI_DIR? Because now I get "mic: unbound variable".
You can set
I have 132 files in data/tran_ovl/wav.scp. Maybe I'm missing 1 file? @desh2608 could you verify that you have 133 files and not 132?
… On Oct 5, 2020, at 2:48 PM, RuslanSel ***@***.***> wrote:
Thanks.
Something wrong with my download.
What $mic is right for
local/ami_download.sh $mic $AMI_DIR
cause now mic: unbound variable
I set mic=ihm, but there are zero Mix-Headset wavs after the download.
Hello @desh2608, could you post the trained overlap models to the kaldi webpage?