resolved conflicts for #2722 #2725

Merged
merged 330 commits into from
Sep 26, 2018
365beff
changed name of README file
johnjosephmorgan Sep 25, 2017
86c31ea
deleted WERs from README
johnjosephmorgan Sep 25, 2017
aa40cae
removed cuda_cmd
johnjosephmorgan Sep 25, 2017
785681b
moved dictionary download to run.sh
johnjosephmorgan Sep 25, 2017
1e3c10c
moved dictionary download to run.sh
johnjosephmorgan Sep 25, 2017
ab5e009
using ./utils/format_lm.sh
johnjosephmorgan Sep 25, 2017
315c0ba
fixed indentation
johnjosephmorgan Sep 25, 2017
bdc27a0
deleted redundant ivector step that is already done in chain model sc…
johnjosephmorgan Sep 25, 2017
f11fbcf
using directory data/local/lm instead of language_models to keep ever…
johnjosephmorgan Sep 25, 2017
1931b23
removed hostname command since the -d option is not compatible with bsd
johnjosephmorgan Sep 25, 2017
be1276e
put mini batch chunk sizes back to 256,128,64 and set initial and fina…
Sep 26, 2017
ca7b814
changed --trainer.num-chunk-per-minibatch=256,128,64 to --tra…
johnjosephmorgan Sep 26, 2017
2424eba
put mini batch chunk sizes back to 256,128,64 and set initial and fina…
johnjosephmorgan Sep 26, 2017
95e8b8d
merging
Sep 27, 2017
9d728b5
merge
Sep 27, 2017
f88b5ae
merging
Sep 27, 2017
84a0566
merging
Sep 27, 2017
775ea4a
merging
Sep 27, 2017
65cb7b2
removed _tgsmall suffix and exiting after stage 15
johnjosephmorgan Sep 27, 2017
2ffc077
moved monophone decoding to after training and using data/lang_test i…
johnjosephmorgan Sep 27, 2017
b2a36dd
delete lm rescoring step
johnjosephmorgan Sep 27, 2017
d407efe
removed _tgsmall suffix and removed decoding with lm rescoring
johnjosephmorgan Sep 27, 2017
be3d0b9
moved evaluation stages to immediately after training stages
johnjosephmorgan Sep 27, 2017
0ff6ff4
changing mini librispeech hard coded location to LDC Heroico corpus
johnjosephmorgan Sep 30, 2017
33c4e5e
merging
Oct 2, 2017
1ab3cdc
merging
Oct 2, 2017
689fbcc
added tests using gp lm
johnjosephmorgan Oct 2, 2017
ff71212
merging
johnjosephmorgan Oct 2, 2017
557a638
fixing references to new lm in format command
johnjosephmorgan Oct 2, 2017
c7a4efc
Spanish lm is named SP not ES
johnjosephmorgan Oct 2, 2017
f87bbd1
merge
Oct 3, 2017
28ad257
added simple and gplm to testing
johnjosephmorgan Oct 3, 2017
41413b5
merge
Oct 4, 2017
4e810e9
changed nj from 30 to 20
johnjosephmorgan Oct 4, 2017
595ce4c
added scripts to build lm with the subs corpus
johnjosephmorgan Oct 4, 2017
f54b927
included lm building steps with subs corpus
johnjosephmorgan Oct 4, 2017
1f1c444
got rid of the GlobalPhone LM. It probably was trained on text in a d…
johnjosephmorgan Oct 5, 2017
6ca1cfb
merge
Oct 6, 2017
39fec5b
consolidated subs processing in 1 script
johnjosephmorgan Oct 6, 2017
2b45c0e
deleted subs scripts that were consolidated into local/subs_restrict_…
johnjosephmorgan Oct 6, 2017
f4b8c93
updated processing of subs data for lm
johnjosephmorgan Oct 6, 2017
9b322dc
moved data prep to script under local
johnjosephmorgan Oct 8, 2017
177026b
moved subs lm prep to prepare lm script
johnjosephmorgan Oct 8, 2017
5028b5f
deleted multiple lm code
johnjosephmorgan Oct 8, 2017
6b1d7c0
deleted subs prep script
johnjosephmorgan Oct 8, 2017
35263af
moved to subs trained lm
johnjosephmorgan Oct 9, 2017
f692ea2
fixed paths to directories and files
johnjosephmorgan Oct 9, 2017
c540add
simplified names of files
johnjosephmorgan Oct 9, 2017
cebb778
minor fixes
johnjosephmorgan Oct 9, 2017
2e46612
minor stage changes
johnjosephmorgan Oct 10, 2017
1164dce
added lower casing
johnjosephmorgan Oct 10, 2017
f4b23f7
merge
Oct 10, 2017
3c6698e
condition text remove punctuation down case
johnjosephmorgan Oct 10, 2017
d23bba7
process with conditioned text and in vocabulary data
johnjosephmorgan Oct 10, 2017
541341e
update to lm preparation scripts
johnjosephmorgan Oct 10, 2017
1a86d9f
merge
Oct 11, 2017
30e527e
updated WER scores
johnjosephmorgan Oct 11, 2017
eb01599
changed hard coded clsp disk names
johnjosephmorgan Oct 11, 2017
f9d7a08
merge
Oct 12, 2017
4e5be51
added WER scores obtained on CLSP cluster
johnjosephmorgan Oct 12, 2017
82bf0f3
Changed reference to mini_librispeech to heroico. This probably shoul…
johnjosephmorgan Oct 15, 2017
c4566b0
more fixes to clsp cluster case, trying to mimic mini_librispeech setup
johnjosephmorgan Oct 15, 2017
d9759ae
Changed data splitting from mini_librispeech setup to heroico.
johnjosephmorgan Oct 15, 2017
8bee95a
Removed special data splitting on clsp cluster.
johnjosephmorgan Oct 16, 2017
8694f3e
merge
Oct 17, 2017
4d5760b
white space instead of tabs
johnjosephmorgan Nov 7, 2017
e05a62a
small tweaks for running on the CLSP cluster
jtrmal Nov 10, 2017
b981622
training text is now an input variable
johnjosephmorgan Nov 12, 2017
31051ca
training lm on acoustic model text instead of subs
johnjosephmorgan Nov 12, 2017
3477db1
merge
Nov 13, 2017
09d03e6
merge
Nov 13, 2017
d39a0ba
adding subs back in
johnjosephmorgan Nov 13, 2017
e4dba94
fixing argument formatting
johnjosephmorgan Nov 14, 2017
093320b
added arg 0 to echo statements
johnjosephmorgan Nov 14, 2017
b59d91a
addressing overfitting comment, number of leaves, proportional shrinkin…
johnjosephmorgan Nov 14, 2017
dcb7089
merge
Nov 14, 2017
4019ab5
merge
Nov 14, 2017
947fcb3
separated out the recordings with overlapping prompts and put it in a…
johnjosephmorgan Nov 14, 2017
d2699c8
made a devtest set with recordings that had overlapping prompts
johnjosephmorgan Nov 14, 2017
1ca3490
added devtest
johnjosephmorgan Nov 14, 2017
38ae387
made devtest with recordings that had prompts that overlapped with US…
johnjosephmorgan Nov 14, 2017
7507b79
added devtest
johnjosephmorgan Nov 14, 2017
fedc016
merge
Nov 15, 2017
2aef442
going back a couple of steps
johnjosephmorgan Nov 15, 2017
dd039b9
added the first script to the chain tuning
johnjosephmorgan Nov 15, 2017
8cf0afe
linked the first chain script to the 1a script under the tuning direc…
johnjosephmorgan Nov 15, 2017
927b0e3
changed affix variable to 1a from 1c and added devtest to test set
johnjosephmorgan Nov 15, 2017
35b1c71
added a 1b script for tuning the number of leaves in the tree
johnjosephmorgan Nov 15, 2017
a1c9428
lowering layer dimension from 512 to 400
johnjosephmorgan Nov 15, 2017
b74d052
replaced the old WER scores with the new scores
johnjosephmorgan Nov 15, 2017
808051e
inserted info and WERs
johnjosephmorgan Nov 15, 2017
967f4d8
working on corpus description
johnjosephmorgan Nov 15, 2017
d7a8cc7
linked to 1b
johnjosephmorgan Nov 15, 2017
c8df419
merge
Nov 16, 2017
43b1891
merge
Nov 16, 2017
eac5181
put layer dimension back to 512, number of leaves down to 2000 from 3…
johnjosephmorgan Nov 16, 2017
3589bd2
removes proportional shrink
johnjosephmorgan Nov 16, 2017
acfe5af
merge
Nov 27, 2017
23e2729
made a variable for number of leaves
johnjosephmorgan Nov 27, 2017
9cc9cdf
merge
Nov 28, 2017
69d3967
merge
Nov 28, 2017
21a329c
experiment 1b changes the number of leaves from 3500 to 2000
johnjosephmorgan Nov 28, 2017
d64b40a
inserted info and WERS
johnjosephmorgan Nov 28, 2017
45ae460
comment out table
johnjosephmorgan Nov 28, 2017
843bcfd
commented out table
johnjosephmorgan Nov 28, 2017
42585b2
experiment 1c removes proportional shrinking
johnjosephmorgan Nov 29, 2017
7d6cef7
merge
Nov 29, 2017
5b01124
changed affix to 1b from 1a
johnjosephmorgan Nov 29, 2017
af0bec0
added WERS and info
johnjosephmorgan Nov 29, 2017
508e52c
1c sets number of leaves to 2500
johnjosephmorgan Nov 29, 2017
cf2617b
merge
Nov 29, 2017
8c71937
remove proportional shrink option
johnjosephmorgan Nov 29, 2017
5d4b4de
added WERs and info
johnjosephmorgan Nov 29, 2017
311257f
merge
Nov 30, 2017
21ad50d
merge
Nov 30, 2017
e7c120d
added l2 regularization opts
johnjosephmorgan Nov 30, 2017
d7eec95
modified layer definitions
johnjosephmorgan Nov 30, 2017
51d810a
adding the script to compare WERs from mini_librispeech adapted to t…
johnjosephmorgan Nov 30, 2017
4df2d76
merge
johnjosephmorgan Nov 30, 2017
4c1c04c
update with WERs and info
johnjosephmorgan Nov 30, 2017
b826f00
lower epochs to 7 from 10
johnjosephmorgan Nov 30, 2017
d5d596e
setting to heroico test sets
Dec 1, 2017
53ae873
printing my test sets
johnjosephmorgan Dec 1, 2017
2691e18
printing heroico test sets
johnjosephmorgan Dec 1, 2017
329cfda
inserted WERs info and comparison to 1d
johnjosephmorgan Dec 1, 2017
fccb4e1
minor deletion
johnjosephmorgan Dec 1, 2017
62dd291
lowering epochs again
johnjosephmorgan Dec 1, 2017
aeb0c3b
added results
johnjosephmorgan Dec 1, 2017
bf9fce4
set epochs to 5 instead of 4
johnjosephmorgan Dec 1, 2017
25ab21e
update info and results
johnjosephmorgan Dec 7, 2017
7d005f7
tested with layer dimension set to 256
johnjosephmorgan Dec 7, 2017
c3bef6c
tested with layer dimensions set to 128
johnjosephmorgan Dec 7, 2017
c88a123
symlink to best tuning run 1e
johnjosephmorgan Dec 7, 2017
ddbdb04
removed a misplaced script
johnjosephmorgan Dec 10, 2017
1ed5ae3
Delete apache license file.
johnjosephmorgan Jan 11, 2018
795d31c
Deleted non-best tuning scripts.
johnjosephmorgan Jan 11, 2018
0d62906
deleted link to current best tuning script.
johnjosephmorgan Jan 11, 2018
1227161
Linked run tdnn script to new best tuning run.
johnjosephmorgan Jan 11, 2018
a134fbe
Configuration files for pitch and plp feature extraction.
johnjosephmorgan Apr 5, 2018
329a51b
Format fixing.
johnjosephmorgan Apr 5, 2018
5da92ab
Format fixing.
johnjosephmorgan Apr 5, 2018
d003c61
Updating run script.
johnjosephmorgan Apr 6, 2018
d7b698a
Fixing run script.
johnjosephmorgan Apr 6, 2018
9082c69
Changed lang_test to lang.
johnjosephmorgan Apr 6, 2018
db08d28
mfcc to plp_pitch.
johnjosephmorgan Apr 6, 2018
8426246
Forgot to delete the mfcc_.
johnjosephmorgan Apr 6, 2018
e4350b1
No ivectors no hires no sp.
johnjosephmorgan Apr 9, 2018
013aaa2
Removed a dires directory.
johnjosephmorgan Apr 9, 2018
63bdf7f
Fixed alli dir.
johnjosephmorgan Apr 9, 2018
2f5e70b
Deleted affix.
johnjosephmorgan Apr 9, 2018
eb5c86b
Fixing tree step.
johnjosephmorgan Apr 9, 2018
a8f2b92
Fixed typo.
johnjosephmorgan Apr 9, 2018
758e69b
No ivector directory.
johnjosephmorgan Apr 9, 2018
65506ce
Fixing directories.
johnjosephmorgan Apr 9, 2018
83adc0c
Removing ivector layer.
johnjosephmorgan Apr 9, 2018
14a8ac6
Removed ivector feature option from training command.
johnjosephmorgan Apr 9, 2018
aac8519
Input dimension is 16 not 40.
johnjosephmorgan Apr 9, 2018
1ceffa4
Cleaning up.
johnjosephmorgan Apr 10, 2018
292fd00
Proportional shrink.
johnjosephmorgan Apr 10, 2018
2fac95a
Commented out lines with results.
johnjosephmorgan Apr 10, 2018
575a639
Added script to download subs corpus.
johnjosephmorgan Jun 18, 2018
f98f1d2
Fixing subs data prep.
johnjosephmorgan Jun 18, 2018
56a6281
Fixed subs data prep.
johnjosephmorgan Jun 18, 2018
494391e
Minor stage shifts.
johnjosephmorgan Jun 18, 2018
718d881
Added script to download heroico resources.
johnjosephmorgan Jun 18, 2018
1f1df35
Putting downloads in scripts under local.
johnjosephmorgan Jun 18, 2018
3470873
Hardwire paths.
johnjosephmorgan Jun 18, 2018
7812faa
Make download directory here.
johnjosephmorgan Jun 18, 2018
fc9e879
Not making download directory in run.sh script.
johnjosephmorgan Jun 18, 2018
746ed7e
Changed permissions.
johnjosephmorgan Jun 18, 2018
04cdbd3
Fixed data directory path variable.
johnjosephmorgan Jun 18, 2018
5856c84
Fixing data directory paths.
johnjosephmorgan Jun 18, 2018
976dc24
fix_data_dir.sh not fix_datadir.sh.
johnjosephmorgan Jun 18, 2018
83f0029
Stage end problem fixed.
johnjosephmorgan Jun 18, 2018
7b1c21e
mfcc not plp_pitch.
johnjosephmorgan Jun 18, 2018
3fdccf6
Removed a nasty bug. An extra s on the config directory.
johnjosephmorgan Jun 19, 2018
71abd90
Changed lang_test to just lang.
johnjosephmorgan Jun 19, 2018
b037b80
Only one final blank.
johnjosephmorgan Jun 21, 2018
0563a7a
Deleted ./.
johnjosephmorgan Jun 21, 2018
31bf353
Updated results with Dan Poveys command.
johnjosephmorgan Jun 21, 2018
d490178
I copied the cmd.sh from mini_librispeech.
johnjosephmorgan Jun 21, 2018
1e3ecc9
cmd.sh from mini_librispeech
johnjosephmorgan Jun 21, 2018
ebe34ae
Reconciling Dan Poveys changes with the subs lm scripts.
johnjosephmorgan Jun 21, 2018
a123db9
Minor fixes.
johnjosephmorgan Jun 21, 2018
38e8bca
Fixed paths.
johnjosephmorgan Jun 21, 2018
0d885f3
Reconciling with Dan Poveys changes.
johnjosephmorgan Jun 21, 2018
935d76a
Reconciling with Dan Poveys changes.
johnjosephmorgan Jun 21, 2018
524bcd8
Reconciling with Dan Poveys changes.
johnjosephmorgan Jun 21, 2018
9ea95d8
Fixing paths.
johnjosephmorgan Jun 21, 2018
69c2879
Updated WERs.
johnjosephmorgan Jun 21, 2018
64abf9f
Fixed symtab variable. Why did this not get committed before?
johnjosephmorgan Jun 22, 2018
c4e1c4e
I deleted words incorrectly.
johnjosephmorgan Jun 22, 2018
53a5017
Deleted line that removes temporary work.
johnjosephmorgan Jun 25, 2018
bf7c645
lang_test instead of lang.
johnjosephmorgan Jun 25, 2018
5820469
Including mono and chain results.
johnjosephmorgan Jun 25, 2018
08c77da
Download in pwd so we do not keep downloading after removing data dir…
johnjosephmorgan Jul 16, 2018
ccc4da6
Download to pwd.
johnjosephmorgan Jul 16, 2018
5a2a60c
Data is in pwd.
johnjosephmorgan Jul 16, 2018
9abf709
Downloading is done in stage -1.
johnjosephmorgan Jul 16, 2018
1ae0384
Downloaded dictionary is in pwd.
johnjosephmorgan Jul 16, 2018
76f445d
Use in_vocabulary.txt instead of es.txt to train lm.
johnjosephmorgan Jul 16, 2018
839e199
updating.
johnjosephmorgan Jul 22, 2018
81a61bb
Addressed conflicts.
johnjosephmorgan Jul 26, 2018
aff7501
cnn tdnn chain model script.
johnjosephmorgan Jul 26, 2018
1862797
Inserted nj variable.
johnjosephmorgan Jul 27, 2018
cacf3cd
Added thread variable.
johnjosephmorgan Jul 27, 2018
9e07bfa
Recognize that speakers are common across recordings and answers in m…
johnjosephmorgan Jul 27, 2018
ce24c6e
Download subs first.
johnjosephmorgan Jul 30, 2018
7d31d4e
Moved open into else clause.
johnjosephmorgan Jul 30, 2018
6759c9f
Removed variables for lexicon, corpus and subs.
johnjosephmorgan Jul 30, 2018
01bdbbc
Replaced sed with tr.
johnjosephmorgan Aug 1, 2018
7b7340b
Use more readable variable names.
johnjosephmorgan Aug 1, 2018
70b19b9
Minor changes.
johnjosephmorgan Aug 1, 2018
bdd0880
Removed plp and pitch config files.
johnjosephmorgan Aug 1, 2018
df3fda4
Removed chain cnn tdnn script.
johnjosephmorgan Aug 1, 2018
dac4362
Merge branch 'master' into jjm_kaldi
johnjosephmorgan Aug 4, 2018
5271745
Put the speech and lexicon data source variable in run.sh so people k…
johnjosephmorgan Aug 6, 2018
5e7a92b
Put the subs text data source variable in run.sh so people know wher…
johnjosephmorgan Aug 6, 2018
52c6f37
Make the data source variables arguments to the download scripts unde…
johnjosephmorgan Aug 6, 2018
31ff767
Merge branch 'master' into jjm_kaldi
johnjosephmorgan Aug 7, 2018
82062a5
Update triphone results after training lm on subs corpus.
johnjosephmorgan Aug 7, 2018
1812c1b
Added chain model results after training lm with subs.
johnjosephmorgan Aug 7, 2018
66c6808
Added latest chain model results.
johnjosephmorgan Aug 7, 2018
64aea9a
Make corpus directory an argument to data prep script.
johnjosephmorgan Aug 8, 2018
fab4432
Adding grammar decoding.
johnjosephmorgan Sep 7, 2018
f1d2eee
Added recipe for Tunisian Modern Standard Arabic corpus.
johnjosephmorgan Sep 19, 2018
4d6b615
Removing grammar directory.
johnjosephmorgan Sep 19, 2018
bd6f86d
Cleaning up.
johnjosephmorgan Sep 19, 2018
3705077
Fixed affix.
johnjosephmorgan Sep 19, 2018
b1152ed
Added requirements.
johnjosephmorgan Sep 19, 2018
d2e55c9
Merge.
johnjosephmorgan Sep 19, 2018
f2ac53c
resolved conflicts
xiaohui-zhang Sep 19, 2018
f01cf69
mfcc instead of plp features.
johnjosephmorgan Sep 19, 2018
fe7d648
mfcc instead of plp features.
johnjosephmorgan Sep 19, 2018
de2add2
Merge branch 'jjm_kaldi' of https://github.com/johnjosephmorgan/kaldi…
xiaohui-zhang Sep 20, 2018
c553b88
Remove header from qcri lexicon.
johnjosephmorgan Sep 20, 2018
ae683ff
Use a python script instead of a perl module to convert buckwalter to…
johnjosephmorgan Sep 20, 2018
9c73132
Adding a python script from Andy Roberts that converts buckwalter enc…
johnjosephmorgan Sep 20, 2018
3e92297
Use python script and bash shell script to convert qcri lexicon from …
johnjosephmorgan Sep 20, 2018
fb42057
Remove the exit.
johnjosephmorgan Sep 20, 2018
aac91a8
Skip initial blank in non silence list.
johnjosephmorgan Sep 21, 2018
e21f930
If the lexicon gets downloaded, unzip it and remove the header.
johnjosephmorgan Sep 21, 2018
a067e01
Merge branch 'jjm_kaldi' of https://github.com/johnjosephmorgan/kaldi…
xiaohui-zhang Sep 21, 2018
c86008e
Fix cut command to extend to end of line with -f 2- option.
johnjosephmorgan Sep 21, 2018
8043e34
Inserted a case for MACOSX.
johnjosephmorgan Sep 21, 2018
f4bddba
adding results
xiaohui-zhang Sep 26, 2018
30f115f
Merge branch 'master' of https://github.com/kaldi-asr/kaldi into tmp0919
xiaohui-zhang Sep 26, 2018
47be46d
minor fix
xiaohui-zhang Sep 26, 2018
85e633d
Change queue.pl to pbs.pl in error messages.
johnjosephmorgan Sep 26, 2018
3aab2bf
Merge branch 'jjm_kaldi' of https://github.com/johnjosephmorgan/kaldi…
xiaohui-zhang Sep 26, 2018
resolved conflicts
xiaohui-zhang committed Sep 19, 2018
commit f2ac53cec332ae24426f68766e9c13bb17451a33
65 changes: 44 additions & 21 deletions egs/heroico/s5/RESULTS
@@ -1,25 +1,48 @@
# for dir in $(echo exp/tri*/decode* | grep -v 'si/'); do grep WER $dir/wer* | utils/best_wer.sh; done

%WER 17.44 [ 1334 / 7650, 215 ins, 245 del, 874 sub ] exp/tri1/decode_devtest/wer_14_0.5
%WER 10.16 [ 762 / 7498, 68 ins, 126 del, 568 sub ] exp/tri1/decode_native/wer_16_0.5
%WER 16.60 [ 1530 / 9215, 175 ins, 222 del, 1133 sub ] exp/tri1/decode_nonnative/wer_15_0.5
%WER 13.74 [ 2297 / 16713, 248 ins, 344 del, 1705 sub ] exp/tri1/decode_test/wer_15_0.5
%WER 17.53 [ 1341 / 7650, 243 ins, 224 del, 874 sub ] exp/tri2b/decode_devtest/wer_16_0.0
%WER 9.23 [ 692 / 7498, 58 ins, 108 del, 526 sub ] exp/tri2b/decode_native/wer_17_0.5
%WER 17.19 [ 1584 / 9215, 200 ins, 205 del, 1179 sub ] exp/tri2b/decode_nonnative/wer_17_0.5
%WER 13.67 [ 2284 / 16713, 258 ins, 312 del, 1714 sub ] exp/tri2b/decode_test/wer_17_0.5
%WER 15.74 [ 1204 / 7650, 224 ins, 193 del, 787 sub ] exp/tri3b/decode_devtest/wer_17_0.0
%WER 20.14 [ 1541 / 7650, 287 ins, 234 del, 1020 sub ] exp/tri3b/decode_devtest.si/wer_16_0.0
%WER 6.80 [ 510 / 7498, 48 ins, 57 del, 405 sub ] exp/tri3b/decode_native/wer_17_0.0
%WER 11.20 [ 840 / 7498, 110 ins, 114 del, 616 sub ] exp/tri3b/decode_native.si/wer_17_1.0
%WER 14.22 [ 1310 / 9215, 150 ins, 164 del, 996 sub ] exp/tri3b/decode_nonnative/wer_17_1.0
%WER 22.43 [ 2067 / 9215, 322 ins, 234 del, 1511 sub ] exp/tri3b/decode_nonnative.si/wer_17_1.0
%WER 11.00 [ 1838 / 16713, 183 ins, 249 del, 1406 sub ] exp/tri3b/decode_test/wer_17_1.0
%WER 17.47 [ 2919 / 16713, 437 ins, 352 del, 2130 sub ] exp/tri3b/decode_test.si/wer_17_1.0
# old results before adding Movie subtitles text corpus in LM training:
# %WER 67.01 [ 5126 / 7650, 837 ins, 575 del, 3714 sub ] exp/tri1/decode_devtest/wer_14_1.0
# %WER 62.39 [ 4678 / 7498, 768 ins, 397 del, 3513 sub ] exp/tri1/decode_native/wer_13_1.0
# %WER 67.05 [ 6179 / 9215, 895 ins, 606 del, 4678 sub ] exp/tri1/decode_nonnative/wer_13_1.0
# %WER 64.97 [ 10859 / 16713, 1678 ins, 999 del, 8182 sub ] exp/tri1/decode_test/wer_13_1.0
# %WER 65.90 [ 5041 / 7650, 1016 ins, 416 del, 3609 sub ] exp/tri2b/decode_devtest/wer_12_1.0
# %WER 61.26 [ 4593 / 7498, 908 ins, 300 del, 3385 sub ] exp/tri2b/decode_native/wer_14_1.0
# %WER 67.51 [ 6221 / 9215, 1085 ins, 524 del, 4612 sub ] exp/tri2b/decode_nonnative/wer_14_1.0
# %WER 64.87 [ 10842 / 16713, 2004 ins, 838 del, 8000 sub ] exp/tri2b/decode_test/wer_14_1.0
# %WER 66.09 [ 5056 / 7650, 1078 ins, 402 del, 3576 sub ] exp/tri3b/decode_devtest/wer_16_1.0
# %WER 74.88 [ 5728 / 7650, 1210 ins, 426 del, 4092 sub ] exp/tri3b/decode_devtest.si/wer_15_1.0
# %WER 61.19 [ 4588 / 7498, 1038 ins, 255 del, 3295 sub ] exp/tri3b/decode_native/wer_14_1.0
# %WER 70.99 [ 5323 / 7498, 1185 ins, 301 del, 3837 sub ] exp/tri3b/decode_native.si/wer_16_1.0
# %WER 66.35 [ 6114 / 9215, 1186 ins, 421 del, 4507 sub ] exp/tri3b/decode_nonnative/wer_17_1.0
# %WER 76.36 [ 7037 / 9215, 1420 ins, 467 del, 5150 sub ] exp/tri3b/decode_nonnative.si/wer_16_1.0
# %WER 64.06 [ 10706 / 16713, 2245 ins, 657 del, 7804 sub ] exp/tri3b/decode_test/wer_15_1.0
# %WER 73.97 [ 12362 / 16713, 2608 ins, 766 del, 8988 sub ] exp/tri3b/decode_test.si/wer_16_1.0
# %WER 53.07 [ 4060 / 7650, 744 ins, 376 del, 2940 sub ] exp/chain/tdnn1e_sp/decode_devtest/wer_7_1.0
# %WER 54.47 [ 4084 / 7498, 536 ins, 475 del, 3073 sub ] exp/chain/tdnn1e_sp/decode_native/wer_7_1.0
# %WER 63.01 [ 5806 / 9215, 685 ins, 784 del, 4337 sub ] exp/chain/tdnn1e_sp/decode_nonnative/wer_7_1.0
# %WER 59.25 [ 9903 / 16713, 1226 ins, 1259 del, 7418 sub ] exp/chain/tdnn1e_sp/decode_test/wer_7_1.0

# chain model results:
# new results:
%WER 18.27 [ 1398 / 7650, 213 ins, 253 del, 932 sub ] exp/tri1/decode_devtest/wer_15_0.5
%WER 9.95 [ 746 / 7498, 74 ins, 108 del, 564 sub ] exp/tri1/decode_native/wer_13_0.5
%WER 16.63 [ 1532 / 9215, 197 ins, 183 del, 1152 sub ] exp/tri1/decode_nonnative/wer_17_0.0
%WER 13.68 [ 2287 / 16713, 207 ins, 360 del, 1720 sub ] exp/tri1/decode_test/wer_17_0.5
%WER 17.19 [ 1315 / 7650, 227 ins, 231 del, 857 sub ] exp/tri2b/decode_devtest/wer_17_0.5
%WER 9.23 [ 692 / 7498, 60 ins, 103 del, 529 sub ] exp/tri2b/decode_native/wer_16_0.5
%WER 17.16 [ 1581 / 9215, 184 ins, 216 del, 1181 sub ] exp/tri2b/decode_nonnative/wer_17_0.5
%WER 13.64 [ 2279 / 16713, 241 ins, 326 del, 1712 sub ] exp/tri2b/decode_test/wer_17_0.5
%WER 15.36 [ 1175 / 7650, 212 ins, 210 del, 753 sub ] exp/tri3b/decode_devtest/wer_17_0.5
%WER 20.27 [ 1551 / 7650, 269 ins, 257 del, 1025 sub ] exp/tri3b/decode_devtest.si/wer_14_1.0
%WER 6.40 [ 480 / 7498, 50 ins, 58 del, 372 sub ] exp/tri3b/decode_native/wer_16_0.0
%WER 10.91 [ 818 / 7498, 100 ins, 112 del, 606 sub ] exp/tri3b/decode_native.si/wer_16_1.0
%WER 14.30 [ 1318 / 9215, 206 ins, 134 del, 978 sub ] exp/tri3b/decode_nonnative/wer_17_0.0
%WER 21.62 [ 1992 / 9215, 286 ins, 224 del, 1482 sub ] exp/tri3b/decode_nonnative.si/wer_16_1.0
%WER 10.78 [ 1802 / 16713, 247 ins, 195 del, 1360 sub ] exp/tri3b/decode_test/wer_17_0.0
%WER 16.81 [ 2809 / 16713, 374 ins, 338 del, 2097 sub ] exp/tri3b/decode_test.si/wer_16_1.0

12.84 [ 982 / 7650, 187 ins, 165 del, 630 sub ] exp/chain/tdnn1b_sp/decode_devtest/wer_10_0.5
12.13 [ 1118 / 9215, 99 ins, 161 del, 858 sub ] exp/chain/tdnn1b_sp/decode_nonnative/wer_13_0.0
9.47 [ 1582 / 16713, 119 ins, 254 del, 1209 sub ] exp/chain/tdnn1b_sp/decode_test/wer_11_0.5
5.87 [ 440 / 7498, 28 ins, 79 del, 333 sub ] exp/chain/tdnn1b_sp/decode_native/wer_11_0.5
# chain model results:
# for dir in $(echo exp/chain/tdnn1b_sp/decode* | grep -v 'si/'); do grep WER $dir/wer* | utils/best_wer.sh; done
%WER 12.99 [ 994 / 7650, 192 ins, 163 del, 639 sub ] exp/chain/tdnn1b_sp/decode_devtest/wer_10_1.0
%WER 12.47 [ 1149 / 9215, 119 ins, 174 del, 856 sub ] exp/chain/tdnn1b_sp/decode_nonnative/wer_12_0.0
%WER 9.64 [ 1611 / 16713, 169 ins, 240 del, 1202 sub ] exp/chain/tdnn1b_sp/decode_test/wer_12_0.0
%WER 6.13 [ 460 / 7498, 52 ins, 55 del, 353 sub ] exp/chain/tdnn1b_sp/decode_native/wer_10_0.0
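In each `%WER` line above, the bracketed counts determine the percentage: WER = 100 × (ins + del + sub) / reference words. The sketch below recomputes that figure from one line as a sanity check. `wer_from_line` is a hypothetical helper written for this illustration and is not part of Kaldi; it assumes the field layout shown in the RESULTS file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kaldi): recompute the WER percentage
# from the counts in a Kaldi-style "%WER" line.
wer_from_line() {
  printf '%s\n' "$1" | awk '{
    # Drop commas so the counts parse as plain numbers; awk re-splits the fields.
    gsub(",", "")
    # Layout after stripping: %WER 17.44 [ 1334 / 7650 215 ins 245 del 874 sub ] path
    total = $6; n_ins = $7; n_del = $9; n_sub = $11
    printf "%.2f\n", 100 * (n_ins + n_del + n_sub) / total
  }'
}

# 215 + 245 + 874 = 1334 errors over 7650 reference words
wer_from_line "%WER 17.44 [ 1334 / 7650, 215 ins, 245 del, 874 sub ] exp/tri1/decode_devtest/wer_14_0.5"
# prints 17.44
```

The same check applies to any other line in the file, since every entry carries its own error and reference-word counts.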
1 change: 0 additions & 1 deletion egs/heroico/s5/local/prepare_data.sh
@@ -14,7 +14,6 @@ set -e
set -o pipefail

tmpdir=data/local/tmp
datadir=$(pwd)/LDC2006S37

# acoustic models are trained on the heroico corpus
# testing is done on the usma corpus
12 changes: 2 additions & 10 deletions egs/heroico/s5/run.sh
@@ -21,19 +21,11 @@ set -e
set -o pipefail
set -u

# the location of the LDC corpus; this location works for the CLSP grid.
#datadir=/export/corpora5/LDC/LDC2006S37
# The corpus and lexicon are on openslr.org
speech="https://www.openslr.org/resources/39/LDC2006S37.tar.gz"
lexicon="https://www.openslr.org/resources/34/santiago.tar.gz"

# Location of the Movie subtitles text corpus
subs_src="https://opus.lingfil.uu.se/download.php?f=OpenSubtitles2018/en-es.txt.zip"

# don't change tmpdir, the location is used explicitly in scripts in local/.
tmpdir=data/local/tmp

if [ $stage -le -1 ]; then
if [ $stage -le 0 ]; then
# download the corpus from openslr
local/heroico_download.sh $speech $lexicon
# Get data for lm training
@@ -42,7 +34,7 @@ fi

if [ $stage -le 1 ]; then
echo "Makin lists for building models."
local/prepare_data.sh
local/prepare_data.sh $datadir
fi

if [ $stage -le 2 ]; then
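
The `run.sh` hunks above gate each step on `$stage`, so a run can resume partway by starting from a higher stage number. The following is a minimal sketch of that pattern only: `run_from_stage` and the echoed labels are placeholders invented for illustration, and the real recipe calls scripts under `local/` instead.

```shell
#!/usr/bin/env bash
set -e
set -u

# Illustrative only: each stage prints a label instead of running recipe scripts.
# run_from_stage is a hypothetical wrapper; run.sh itself uses a top-level $stage.
run_from_stage() {
  stage=$1
  if [ "$stage" -le 0 ]; then
    echo "stage 0: download corpus and LM text"
  fi
  if [ "$stage" -le 1 ]; then
    echo "stage 1: prepare data lists"
  fi
  if [ "$stage" -le 2 ]; then
    echo "stage 2: later steps (not shown in the hunk)"
  fi
}

# Starting at stage 1 skips the download and runs stages 1 and 2.
run_from_stage 1
```

Because the comparison is `-le`, changing a guard from `-1` to `0` moves that step into the range that runs when the starting stage is 0.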