resolved conflicts for #2722 #2725
Merged
Changes from 250 commits
330 commits
365beff
changed name of README file
johnjosephmorgan 86c31ea
deleted WERs from README
johnjosephmorgan aa40cae
removed cuda_cmd
johnjosephmorgan 785681b
moved dictionary download to run.sh
johnjosephmorgan 1e3c10c
moved dictionary download to run.sh
johnjosephmorgan ab5e009
using ./utils/format_lm.sh
johnjosephmorgan 315c0ba
fixed indentation
johnjosephmorgan bdc27a0
deleted redundant ivector step that is already done in chain model sc…
johnjosephmorgan f11fbcf
using directory data/local/lm instead of language_models to keep ever…
johnjosephmorgan 1931b23
removed hostname command since the -d option is not compatible with bsd
johnjosephmorgan be1276e
put mini batch chun sizes back to 256,128,64 and set initial and fina…
ca7b814
changed --trainer.num-chunk-per-minibatch=256,128,64 to --tra…
johnjosephmorgan 2424eba
put mini batch chun sizes back to 256,128,64 and set initial and fina…
johnjosephmorgan 95e8b8d
merging
9d728b5
merge
f88b5ae
merging
84a0566
merging
775ea4a
merging
65cb7b2
removed _tgsmall suffix and exiting after stage 15
johnjosephmorgan 2ffc077
moved monophone decoding to after training and using data/lang_test i…
johnjosephmorgan b2a36dd
delete lm rescoring step
johnjosephmorgan d407efe
removed _tgsmall suffix and removed decoding with lm rescoring
johnjosephmorgan be3d0b9
moved evaluation stages to immediately after training stages
johnjosephmorgan 0ff6ff4
changing mini librispeech hard coded location to LDC Heroico corpus
johnjosephmorgan 33c4e5e
merging
1ab3cdc
merging
689fbcc
added tests using gp lm
johnjosephmorgan ff71212
merging
johnjosephmorgan 557a638
fixing references to new lm in format command
johnjosephmorgan c7a4efc
Spanish lm is named SP not ES
johnjosephmorgan f87bbd1
merge
28ad257
added simple and gplm to testing
johnjosephmorgan 41413b5
merge
4e810e9
changed nj from 30 to 20
johnjosephmorgan 595ce4c
added scripts to build lm with the subs corpus
johnjosephmorgan f54b927
included lm building steps with subs corpus
johnjosephmorgan 1f1c444
got rid of the GlobalPhone LM. It probably was trained on text in a d…
johnjosephmorgan 6ca1cfb
merge
39fec5b
consolidated subs processing in 1 script
johnjosephmorgan 2b45c0e
deleted subs scripts that were consolidated into local/subs_restrict_…
johnjosephmorgan f4b8c93
updated processing of subs data for lm
johnjosephmorgan 9b322dc
moved data prep to script under local
johnjosephmorgan 177026b
moved subs lm prep to prepare lm script
johnjosephmorgan 5028b5f
deleted multiple lm code
johnjosephmorgan 6b1d7c0
deleted subs prep script
johnjosephmorgan 35263af
moved to subs trained lm
johnjosephmorgan f692ea2
fixed paths to directories and files
johnjosephmorgan c540add
simplified names of files
johnjosephmorgan cebb778
minor fixes
johnjosephmorgan 2e46612
minor stage changes
johnjosephmorgan 1164dce
added lower casing
johnjosephmorgan f4b23f7
merge
3c6698e
condition text remove punctuation down case
johnjosephmorgan d23bba7
process with conditioned text and in vocabulary data
johnjosephmorgan 541341e
update to lm preparation scripts
johnjosephmorgan 1a86d9f
merge
30e527e
updated WER scores
johnjosephmorgan eb01599
changed hard coded clsp disk names
johnjosephmorgan f9d7a08
merge
4e5be51
added WER scores obtained on CLSP cluster
johnjosephmorgan 82bf0f3
Changed reference to mini_librispeech to heroico. This probably shoul…
johnjosephmorgan c4566b0
more fixes to clsp cluster case, trying to mimic mini_librispeech setup
johnjosephmorgan d9759ae
Changed data splitting from mini_librispeech setup to heroico.
johnjosephmorgan 8bee95a
Removed special data splitting on clsp cluster.
johnjosephmorgan 8694f3e
merge
4d5760b
white space instead of tabs
johnjosephmorgan e05a62a
small tweaks for running on the CLSP cluster
jtrmal b981622
training text is now an input variable
johnjosephmorgan 31051ca
training lm on acoustic model text instead of subs
johnjosephmorgan 3477db1
merge
09d03e6
merge
d39a0ba
adding subs back in
johnjosephmorgan e4dba94
fixing argument formatting
johnjosephmorgan 093320b
added arg 0 to echo statements
johnjosephmorgan b59d91a
adressing overfitting comment, number of leave, proportional shrinkin…
johnjosephmorgan dcb7089
merge
4019ab5
merge
947fcb3
separated out the recordings with overlapping prompts and put it in a…
johnjosephmorgan d2699c8
made a devtest set with recordings that had overlapping prompts
johnjosephmorgan 1ca3490
added devtest
johnjosephmorgan 38ae387
made devtest with recordings that had prompts that overlapped with US…
johnjosephmorgan 7507b79
added devtest
johnjosephmorgan fedc016
merge
2aef442
going back a couple of steps
johnjosephmorgan dd039b9
added the first script to the chain tuning
johnjosephmorgan 8cf0afe
linked the first chain script to the 1a script under the tuning direc…
johnjosephmorgan 927b0e3
changed affix variable to 1a from 1c and added devtest to test set
johnjosephmorgan 35b1c71
added a 1b script for tuning the number of leaves in the tree
johnjosephmorgan a1c9428
lowering layer dimension from 512 to 400
johnjosephmorgan b74d052
replaced the old WER scores with the new scores
johnjosephmorgan 808051e
inserted info and WERs
johnjosephmorgan 967f4d8
working on corpus description
johnjosephmorgan d7a8cc7
linked to 1b
johnjosephmorgan c8df419
merge
43b1891
merge
eac5181
put layer dimension back to 512, number of leaves down to 2000 from 3…
johnjosephmorgan 3589bd2
removes proportional shrink
johnjosephmorgan acfe5af
merge
23e2729
made a variable for number of leaves
johnjosephmorgan 9cc9cdf
merge
69d3967
merge
21a329c
experiment 1b changes the number of leaves from 3500 to 2000
johnjosephmorgan d64b40a
inserted info and WERS
johnjosephmorgan 45ae460
comment out table
johnjosephmorgan 843bcfd
commented out table
johnjosephmorgan 42585b2
experiment 1c removes proportional shrinking
johnjosephmorgan 7d6cef7
merge
5b01124
changed affix to 1b from 1a
johnjosephmorgan af0bec0
added WERS and info
johnjosephmorgan 508e52c
1c sets number of leaves to 2500
johnjosephmorgan cf2617b
merge
8c71937
remove proportional shrink option
johnjosephmorgan 5d4b4de
added WERs and info
johnjosephmorgan 311257f
merge
21ad50d
merge
e7c120d
added l2 regularization opts
johnjosephmorgan d7eec95
modified layer definitions
johnjosephmorgan 51d810a
adding the script to compare WERs from mini_librispeech adapted to t…
johnjosephmorgan 4df2d76
merge
johnjosephmorgan 4c1c04c
update with WERs and info
johnjosephmorgan b826f00
lower epochs to 7 from 10
johnjosephmorgan d5d596e
setting to heroico test sets
53ae873
printing my test sets
johnjosephmorgan 2691e18
printing heroico test sets
johnjosephmorgan 329cfda
inserted WERs info and comparison to 1d
johnjosephmorgan fccb4e1
minor deletion
johnjosephmorgan 62dd291
lowering epochs again
johnjosephmorgan aeb0c3b
added results
johnjosephmorgan bf9fce4
set epochs to 5 instead of 4
johnjosephmorgan 25ab21e
update info and results
johnjosephmorgan 7d005f7
tested with layer dimension set to 256
johnjosephmorgan c3bef6c
tested with layer dimensions set to 128
johnjosephmorgan c88a123
symlink to best tuning run 1e
johnjosephmorgan ddbdb04
removed a misplaced script
johnjosephmorgan 1ed5ae3
Delete apache license file.
johnjosephmorgan 795d31c
Deleted non-best tuning scripts.
johnjosephmorgan 0d62906
deleted link to current best tuning script.
johnjosephmorgan 1227161
Linked run tdnn script to new best tuning run.
johnjosephmorgan a134fbe
Configuration files for pitch and plp feature extraction.
johnjosephmorgan 329a51b
Format fixing.
johnjosephmorgan 5da92ab
Format fixing.
johnjosephmorgan d003c61
Updating run script.
johnjosephmorgan d7b698a
Fixing run script.
johnjosephmorgan 9082c69
Changed lang_test to lang.
johnjosephmorgan db08d28
mfcc to plp_pitch.
johnjosephmorgan 8426246
Forgot to delete the mfcc_.
johnjosephmorgan e4350b1
No ivectors no hires no sp.
johnjosephmorgan 013aaa2
Removed a dires directory.
johnjosephmorgan 63bdf7f
Fixed alli dir.
johnjosephmorgan 2f5e70b
Deleted affix.
johnjosephmorgan eb5c86b
Fixing tree step.
johnjosephmorgan a8f2b92
Fixed typo.
johnjosephmorgan 758e69b
No ivector directory.
johnjosephmorgan 65506ce
Fixing directories.
johnjosephmorgan 83adc0c
Removing ivector layer.
johnjosephmorgan 14a8ac6
Removed ivector feature option from training command.
johnjosephmorgan aac8519
Input dimension is 16 not 40.
johnjosephmorgan 1ceffa4
Cleaning up.
johnjosephmorgan 292fd00
Proportional shrink.
johnjosephmorgan 2fac95a
Commented out lines with results.
johnjosephmorgan 575a639
Added script to download subs corpus.
johnjosephmorgan f98f1d2
Fixing subs data prep.
johnjosephmorgan 56a6281
Fixed subs data prep.
johnjosephmorgan 494391e
Minor stage shifts.
johnjosephmorgan 718d881
Added script to download heroico resources.
johnjosephmorgan 1f1df35
Putting downloads in scripts under local.
johnjosephmorgan 3470873
Hardwire paths.
johnjosephmorgan 7812faa
Make download directory here.
johnjosephmorgan fc9e879
Not making download directory in run.sh script.
johnjosephmorgan 746ed7e
Changed permissions.
johnjosephmorgan 04cdbd3
Fixed data directory path variable.
johnjosephmorgan 5856c84
Fixing data directory paths.
johnjosephmorgan 976dc24
fix_data_dir.sh not fix_datadir.sh.
johnjosephmorgan 83f0029
Stage end problem fixed.
johnjosephmorgan 7b1c21e
mfcc not plp_pitch.
johnjosephmorgan 3fdccf6
Removed a nasty bug. An extra s on the config directory.
johnjosephmorgan 71abd90
Changed lang_test to just lang.
johnjosephmorgan b037b80
Only one final blank.
johnjosephmorgan 0563a7a
Deleted ./.
johnjosephmorgan 31bf353
Updated results with Dan Poveys command.
johnjosephmorgan d490178
I copied the cmd.sh from mini_librispeech.
johnjosephmorgan 1e3ecc9
cmd.sh from mini_librispeech
johnjosephmorgan ebe34ae
Reconciling Dan Poveys changes with the subs lm scripts.
johnjosephmorgan a123db9
Minor fixes.
johnjosephmorgan 38e8bca
Fixed paths.
johnjosephmorgan 0d885f3
Reconciling with Dan Poveys changes.
johnjosephmorgan 935d76a
Reconciling with Dan Poveys changes.
johnjosephmorgan 524bcd8
Reconciling with Dan Poveys changes.
johnjosephmorgan 9ea95d8
Fixing paths.
johnjosephmorgan 69c2879
Updated WERs.
johnjosephmorgan 64abf9f
Fixed symtab variable. Why did this not get committed before?
johnjosephmorgan c4e1c4e
I deleted words incorrectly.
johnjosephmorgan 53a5017
Deleted line that removes temporary work.
johnjosephmorgan bf7c645
lang_test instead of lang.
johnjosephmorgan 5820469
Including mono and chain results.
johnjosephmorgan 08c77da
Download in pwd so we do not keep downloading after removing data dir…
johnjosephmorgan ccc4da6
Download to pwd.
johnjosephmorgan 5a2a60c
Data is in pwd.
johnjosephmorgan 9abf709
Downloading is done in stage -1.
johnjosephmorgan 1ae0384
Downloaded dictionary is in pwd.
johnjosephmorgan 76f445d
Use in_vocabulary.txt instead of es.txt to train lm.
johnjosephmorgan 839e199
updating.
johnjosephmorgan 81a61bb
Addressed conflicts.
johnjosephmorgan aff7501
cnn tdnn chain model script.
johnjosephmorgan 1862797
Inserted nj variable.
johnjosephmorgan cacf3cd
Added thread variable.
johnjosephmorgan 9e07bfa
Recognize that speakers are common across recordings and answers in m…
johnjosephmorgan ce24c6e
Download subs first .
johnjosephmorgan 7d31d4e
Moved open into else clause.
johnjosephmorgan 6759c9f
Removed varaibles for lexicon, corpus and subs.
johnjosephmorgan 01bdbbc
Replaced sed with tr.
johnjosephmorgan 7b7340b
Use more readable variable names.
johnjosephmorgan 70b19b9
Minor changes.
johnjosephmorgan bdd0880
Removed plp and pitch config files.
johnjosephmorgan df3fda4
Removed chain cnn tdnn script.
johnjosephmorgan dac4362
Merge branch 'master' into jjm_kaldi
johnjosephmorgan 5271745
Put the speech and lexicon data source variable in run.sh so people k…
johnjosephmorgan 5e7a92b
Put the subs text data source variable in run.sh so people know wher…
johnjosephmorgan 52c6f37
Make the data source variables arguments to the download scripts unde…
johnjosephmorgan 31ff767
Merge branch 'master' into jjm_kaldi
johnjosephmorgan 82062a5
Update triphone results after training lm on subs corpus.
johnjosephmorgan 1812c1b
Added chain model results after training lm with subs.
johnjosephmorgan 66c6808
Added latest chain model results.
johnjosephmorgan 64aea9a
Make corpus directory an argument to data prep script.
johnjosephmorgan fab4432
Adding grammar decoding.
johnjosephmorgan f1d2eee
Added recipe for Tunisian Modern Standard Arabic corpus.
johnjosephmorgan 4d6b615
Removing grammar directory.
johnjosephmorgan bd6f86d
Cleaning up.
johnjosephmorgan 3705077
Fixed affix.
johnjosephmorgan b1152ed
Added requirements.
johnjosephmorgan d2e55c9
Merge.
johnjosephmorgan f2ac53c
resolved conflicts
xiaohui-zhang f01cf69
mfcc instead of plp features.
johnjosephmorgan fe7d648
mfcc instead of plp features.
johnjosephmorgan de2add2
Merge branch 'jjm_kaldi' of https://github.com/johnjosephmorgan/kaldi…
xiaohui-zhang c553b88
Remove header from qcri lexicon.
johnjosephmorgan ae683ff
Use a python script instead of a perl module to convert buckwalter to…
johnjosephmorgan 9c73132
Adding a python script from Andy Roberts that converts buckwalter enc…
johnjosephmorgan 3e92297
Use python script and bash shell script to convert qcri lexicon from …
johnjosephmorgan fb42057
Remove the exit.
johnjosephmorgan aac91a8
Skip initial blank in non silence list.
johnjosephmorgan e21f930
If the lexicon gets downloaded, unzip it and remove the header.
johnjosephmorgan a067e01
Merge branch 'jjm_kaldi' of https://github.com/johnjosephmorgan/kaldi…
xiaohui-zhang c86008e
Fix cut command to extend to end of line with -f 2- option.
johnjosephmorgan 8043e34
Inserted a case for MACOSX.
johnjosephmorgan f4bddba
adding results
xiaohui-zhang 30f115f
Merge branch 'master' of https://github.com/kaldi-asr/kaldi into tmp0919
xiaohui-zhang 47be46d
minor fix
xiaohui-zhang 85e633d
Change queue.pl to pbs.pl in error messages.
johnjosephmorgan 3aab2bf
Merge branch 'jjm_kaldi' of https://github.com/johnjosephmorgan/kaldi…
xiaohui-zhang
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
A Kaldi recipe for Arabic using the Tunisian_MSA corpus. | ||
|
||
Extra Requirements: | ||
This recipe uses the QCRI lexicon which uses the Buckwalter encoding. | ||
In order to convert the Buckwalter to utf-8, the Encode::Arabic::Buckwalter perl module is required. | ||
On Ubuntu, install the package libencode-arabic-perl. | ||
On Mac OS X, use cpanm (cpanminus) to install the Perl module. | ||
|
||
Description of the Tunisian_MSA Corpus | ||
The Tunisian_MSA corpus was originally collected to train acoustic models for pronunciation modeling in Arabic language learning applications. | ||
The data collection took place in 2003 near Tunis, the capital of the Republic of Tunisia, at the Military Academy of Fondouk Jedied. | ||
The Tunisian_MSA corpus is divided into recited and prompted speech subcorpora. | ||
The recited speech appears under the recordings directory and the prompted speech under the answers directory. | ||
Each of the 118 informants contributed to both subcorpora by reciting sentences and providing answers to prompted questions. | ||
The Tunisian_MSA corpus has 11.2 hours of speech. | ||
|
||
With the exception of speech from two speakers, the entire corpus was used for training. | ||
|
||
A small corpus was collected for testing. | ||
|
||
A pronunciation dictionary is also available from openslr.org. | ||
It covers all the words uttered in the Tunisian_MSA corpus and the test corpus. | ||
The QCRI lexicon was used as a starting point for writing this lexicon. | ||
The phones are the same as those used in the QCRI lexicon. |
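The extra requirement listed in the README above can be checked before running the recipe. The following is a minimal sketch, not part of the PR: the function name `install_hint` is hypothetical, and it assumes that a Linux host is Debian/Ubuntu-like, which may not hold on other distributions.

```shell
#!/bin/sh
# Sketch only: report whether Encode::Arabic::Buckwalter can be loaded and,
# if not, suggest the install command the README mentions.

# install_hint is a hypothetical helper; the Linux branch assumes Debian/Ubuntu.
install_hint() {
  case "$1" in
    Linux)  echo "sudo apt-get install libencode-arabic-perl" ;;
    Darwin) echo "cpanm Encode::Arabic::Buckwalter" ;;
    *)      echo "install Encode::Arabic::Buckwalter from CPAN" ;;
  esac
}

# 'perl -MModule -e 1' exits non-zero when the module cannot be loaded.
if perl -MEncode::Arabic::Buckwalter -e 1 2>/dev/null; then
  echo "Encode::Arabic::Buckwalter is available"
else
  echo "Encode::Arabic::Buckwalter is missing; try: $(install_hint "$(uname -s)")"
fi
```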
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
# you can change cmd.sh depending on what type of queue you are using. | ||
# If you have no queueing system and want to run on a local machine, you | ||
# can change all instances of 'queue.pl' to run.pl (but be careful and run | ||
# commands one by one: most recipes will exhaust the memory on your | ||
# machine). queue.pl works with GridEngine (qsub). slurm.pl works | ||
# with slurm. Different queues are configured differently, with different | ||
# queue names and different ways of specifying things like memory; | ||
# to account for these differences you can create and edit the file | ||
# conf/queue.conf to match your queue's configuration. Search for | ||
# conf/queue.conf in https://kaldi-asr.org/doc/queue.html for more information, | ||
# or search for the string 'default_config' in utils/queue.pl or utils/slurm.pl. | ||
|
||
export train_cmd="queue.pl --mem 2G" | ||
export decode_cmd="queue.pl --mem 4G" | ||
export mkgraph_cmd="queue.pl --mem 8G" |
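For a machine without a queueing system, the comment in this file suggests replacing queue.pl with run.pl. A minimal sketch of such a local variant follows; it is not part of the PR, it assumes run.pl from Kaldi's utils directory is reachable on PATH, and the memory options are simply left out here.

```shell
#!/bin/sh
# Sketch only: local-machine variant of cmd.sh.
# Every job runs on the current host through run.pl instead of a grid queue.
export train_cmd="run.pl"
export decode_cmd="run.pl"
export mkgraph_cmd="run.pl"
```

As the original comment warns, running whole recipes this way can exhaust memory on a single machine, so stages are best run one at a time.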
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
--use-energy=false # only non-default option. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# config for high-resolution MFCC features, intended for neural network training | ||
# Note: we keep all cepstra, so it has the same info as filterbank features, | ||
# but MFCC is more easily compressible (because less correlated) which is why | ||
# we prefer this method. | ||
--use-energy=false # use average of log energy, not energy. | ||
--num-mel-bins=40 # similar to Google's setup. | ||
--num-ceps=40 # there is no dimensionality reduction. | ||
--low-freq=20 # low cutoff frequency for mel bins... this is high-bandwidth data, so | ||
# there might be some information at the low end. | ||
--high-freq=-400 # high cutoff frequency, relative to Nyquist of 8000 (=7600) |
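The negative `--high-freq` value is an offset from the Nyquist frequency rather than an absolute cutoff. The arithmetic behind the "(=7600)" in the comment can be sketched as follows, assuming the 16000 Hz sample rate set in the other configuration files of this PR; the function name `effective_high_freq` is hypothetical.

```shell
#!/bin/sh
# Sketch only: resolve a Kaldi-style --high-freq value to an absolute cutoff.
# Arguments: sample frequency in Hz, configured high-freq value.
# A value <= 0 is treated as an offset from the Nyquist frequency.
effective_high_freq() (
  nyquist=$(( $1 / 2 ))
  if [ "$2" -le 0 ]; then
    echo $(( nyquist + $2 ))
  else
    echo "$2"
  fi
)

effective_high_freq 16000 -400   # prints 7600
```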
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
# configuration file for apply-cmvn-online, used in the script ../local/run_online_decoding.sh |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
--sample-frequency=16000 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
--sample-frequency=16000 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,77 @@ | ||
#!/usr/bin/env perl | ||
|
||
# Copyright 2018 John Morgan | ||
# Apache 2.0. | ||
|
||
# answers_make_lists.pl - make acoustic model training lists | ||
|
||
use strict; | ||
use warnings; | ||
use Carp; | ||
|
||
use File::Spec; | ||
use File::Copy; | ||
use File::Basename; | ||
|
||
my $tmpdir = 'data/local/tmp/tunis'; | ||
|
||
system "mkdir -p $tmpdir/answers"; | ||
|
||
# input wav file list | ||
my $wav_list = "$tmpdir/answers_wav.txt"; | ||
|
||
# output temporary wav.scp files | ||
my $wav_scp = "$tmpdir/answers/wav.scp"; | ||
|
||
# output temporary utt2spk files | ||
my $u = "$tmpdir/answers/utt2spk"; | ||
|
||
# output temporary text files | ||
my $t = "$tmpdir/answers/text"; | ||
|
||
# initialize hash for prompts | ||
my %prompt = (); | ||
|
||
# store prompts in hash | ||
LINEA: while ( my $line = <> ) { | ||
    chomp $line; | ||
    my ($num,$sent) = split /\t/sxm, $line, 2; | ||
|
||
    my ($machine,$s,$mode,$language,$i) = split /\_/sxm, $num; | ||
    # the utterance name | ||
    my $utt = $machine . '_' . $s . '_' . 'a' . '_' . $i; | ||
    $prompt{$utt} = $sent; | ||
} | ||
|
||
# Write wav.scp, utt2spk and text files. | ||
open my $W, '<', $wav_list or croak "problem with $wav_list $!"; | ||
open my $O, '>', $wav_scp or croak "problem with $wav_scp $!"; | ||
open my $U, '>', $u or croak "problem with $u $!"; | ||
open my $T, '>', $t or croak "problem with $t $!"; | ||
|
||
LINE: while ( my $line = <$W> ) { | ||
    chomp $line; | ||
    # keep only files from the Answers subcorpus | ||
    next LINE if ( $line !~ /Answers/sxm ); | ||
    next LINE if ( $line =~ /Recordings/sxm ); | ||
    my ($volume,$directories,$file) = File::Spec->splitpath( $line ); | ||
    my @dirs = split /\//sxm, $directories; | ||
    my $r = basename $line, '.wav'; | ||
    my $machine = $dirs[-3]; | ||
    my $s = $dirs[-1]; | ||
    my $rid = $machine . '_' . $s . '_' . 'a' . '_' . $r; | ||
    if ( exists $prompt{$rid} ) { | ||
        print ${T} "$rid\t$prompt{$rid}\n" or croak; | ||
    } elsif ( defined $rid ) { | ||
        # no prompt for this recording: report it and skip | ||
        print STDERR "problem\t$rid\n" or croak; | ||
        next LINE; | ||
    } else { | ||
        croak "$line"; | ||
    } | ||
|
||
    print ${O} "$rid sox $line -t wav - |\n" or croak; | ||
    print ${U} "$rid ${machine}_${s}_a\n" or croak; | ||
} | ||
close $U or croak; | ||
close $T or croak; | ||
close $W or croak; | ||
close $O or croak; |
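The script above matches prompts to wav files by building the same utterance ID from two sources: the prompt ID on standard input and the wav file path. The sketch below restates that ID construction in shell for illustration only. The function names, the example prompt ID, and the example path are hypothetical; it assumes prompt IDs have exactly five underscore-separated fields and that the speaker directory sits two levels below the machine directory, as the `$dirs[-3]` and `$dirs[-1]` indices imply.

```shell
#!/bin/sh
# Sketch only: the utterance-ID construction used by the Perl script.

# Prompt ID fields: machine_speaker_mode_language_index.
# The mode and language fields are dropped and replaced by a literal 'a'.
prompt_id_to_utt() (
  printf '%s\n' "$1" | {
    IFS=_ read -r machine speaker mode language index
    echo "${machine}_${speaker}_a_${index}"
  }
)

# Wav path layout assumed: .../<machine>/<subdir>/<speaker>/<file>.wav
wav_path_to_utt() (
  file=$(basename "$1" .wav)
  dir=$(dirname "$1")
  speaker=$(basename "$dir")
  machine=$(basename "$(dirname "$(dirname "$dir")")")
  echo "${machine}_${speaker}_a_${file}"
)

# Hypothetical inputs; both calls yield the same ID, so the prompt is found.
prompt_id_to_utt "m1_s3_x_ar_0007"
wav_path_to_utt "corpus/m1/Answers/s3/0007.wav"
```

A wav file whose ID has no matching prompt is reported on standard error and skipped, so it never reaches wav.scp, utt2spk, or text.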
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
#!/usr/bin/env perl | ||
# Input buckwalter encoded Arabic and print it out as utf-8 encoded Arabic. | ||
use strict; | ||
use warnings; | ||
use Carp; | ||
|
||
use Encode::Arabic::Buckwalter; # imports just like 'use Encode' would, plus more | ||
|
||
while ( my $line = <>) { | ||
    print encode 'utf8', decode 'buckwalter', $line; | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
tuning/run_tdnn_1a.sh |
this should be removed too.