save_path
experimenti committed Oct 8, 2018
1 parent 840756b commit 991cd66
Showing 1,094 changed files with 2,958 additions and 11 deletions.
3 changes: 3 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,3 @@
{
"python.pythonPath": "/home/john/anaconda3/envs/ds/bin/python"
}
38 changes: 38 additions & 0 deletions data/an4/LICENSE
@@ -0,0 +1,38 @@
AN4 License Terms

This audio database is free for use for any purpose (commercial or otherwise) subject to the restrictions detailed below.

/* ====================================================================
* Copyright (c) 1991-2005 Carnegie Mellon University. All rights
* reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided with the
* distribution.
*
* This work was supported in part by funding from the Defense Advanced
* Research Projects Agency and the National Science Foundation of the
* United States of America, and the CMU Sphinx Speech Consortium.
*
* THIS SOFTWARE IS PROVIDED BY CARNEGIE MELLON UNIVERSITY ``AS IS'' AND
* ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
* THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
* PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL CARNEGIE MELLON UNIVERSITY
* NOR ITS EMPLOYEES BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
* LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
* DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
* THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
* (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
* OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*
* ====================================================================
*/
20 changes: 20 additions & 0 deletions data/an4/README
@@ -0,0 +1,20 @@
This directory contains the Census (AN4) database audio files. Some
files from the original database were excluded, namely those
with filenames starting with "cen9".

The AN4 database was recorded at Carnegie Mellon University circa
1991. For more details, please see "Acoustical and environmental
robustness in automatic speech recognition", by Alex Acero, published
by Kluwer Academic Publishers, 1993.

The files are in raw PCM format, sampled at 16kHz, in little endian
byte order.

The directories contain:

-wav/an4_clstk: training data set recorded on close talking microphone.

-wav/an4test_clstk: test data set recorded on close talking microphone.

-etc: directory containing the transcriptions, control files, dictionary etc.
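
The README above describes the audio as headerless raw PCM at 16 kHz in little-endian byte order. A minimal sketch of decoding such data in Python follows. The sample width is not stated in the README, so the 16-bit signed width used here is an assumption, and the function names are illustrative only.

```python
import struct

SAMPLE_RATE_HZ = 16_000  # sampling rate given in the README
BYTES_PER_SAMPLE = 2     # ASSUMPTION: 16-bit signed samples (not stated in the README)


def decode_raw_pcm(data):
    """Decode headerless little-endian PCM bytes into a list of integer samples."""
    # Ignore a trailing partial sample, if any, so unpack gets a whole number of samples.
    usable = len(data) - (len(data) % BYTES_PER_SAMPLE)
    count = usable // BYTES_PER_SAMPLE
    # "<" selects little-endian byte order; "h" is a signed 16-bit integer.
    return list(struct.unpack(f"<{count}h", data[:usable]))


def duration_seconds(samples):
    """Length of the recording in seconds at the README's sampling rate."""
    return len(samples) / SAMPLE_RATE_HZ
```

To use it on a file, read the bytes in binary mode (`open(path, "rb").read()`) and pass them to `decode_raw_pcm`. Because the files carry no header, the rate and width must be supplied by the caller rather than read from the file.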

130 changes: 130 additions & 0 deletions data/an4/etc/an4.dic
@@ -0,0 +1,130 @@
A AH
A(2) EY
AND AE N D
AND(2) AH N D
APOSTROPHE AH P AA S T R AH F IY
APRIL EY P R AH L
AREA EH R IY AH
AUGUST AA G AH S T
AUGUST(2) AO G AH S T
B B IY
C S IY
CODE K OW D
D D IY
DECEMBER D IH S EH M B ER
E IY
EIGHT EY T
EIGHTEEN EY T IY N
EIGHTEENTH EY T IY N TH
EIGHTH EY T TH
EIGHTH(2) EY TH
EIGHTY EY T IY
ELEVEN IH L EH V AH N
ELEVEN(2) IY L EH V AH N
ELEVENTH IH L EH V AH N TH
ELEVENTH(2) IY L EH V AH N TH
ENTER EH N ER
ENTER(2) EH N T ER
ERASE IH R EY S
ERASE(2) IY R EY S
F EH F
FEBRUARY F EH B AH W EH R IY
FEBRUARY(2) F EH B R UW W EH R IY
FEBRUARY(3) F EH B UW W EH R IY
FEBRUARY(4) F EH B Y AH W EH R IY
FEBRUARY(5) F EH B Y UW W EH R IY
FIFTEEN F IH F T IY N
FIFTEENTH F IH F T IY N TH
FIFTH F IH F TH
FIFTH(2) F IH TH
FIFTY F IH F T IY
FIRST F ER S T
FIVE F AY V
FORTY F AO R T IY
FOUR F AO R
FOURTEEN F AO R T IY N
FOURTH F AO R TH
G JH IY
GO G OW
H EY CH
HALF HH AE F
HELP HH EH L P
HUNDRED HH AH N D ER D
HUNDRED(2) HH AH N D R AH D
HUNDRED(3) HH AH N D R IH D
HUNDRED(4) HH AH N ER D
I AY
J JH EY
JANUARY JH AE N Y UW EH R IY
JULY JH AH L AY
JULY(2) JH UW L AY
JUNE JH UW N
K K EY
L EH L
M EH M
MARCH M AA R CH
MAY M EY
N EH N
NINE N AY N
NINETEEN N AY N T IY N
NINETY N AY N T IY
NINTH N AY N TH
NO N OW
NOVEMBER N OW V EH M B ER
O OW
OCTOBER AA K T OW B ER
OF AH V
OH OW
ONE HH W AH N
ONE(2) W AH N
P P IY
Q K Y UW
R AA R
REPEAT R IH P IY T
REPEAT(2) R IY P IY T
RUBOUT R AH B AW T
S EH S
SECOND S EH K AH N
SECOND(2) S EH K AH N D
SEPTEMBER S EH P T EH M B ER
SEVEN S EH V AH N
SEVENTEEN S EH V AH N T IY N
SEVENTH S EH V AH N TH
SEVENTY S EH V AH N IY
SEVENTY(2) S EH V AH N T IY
SIX S IH K S
SIXTEEN S IH K S T IY N
SIXTEENTH S IH K S T IY N TH
SIXTH S IH K S TH
SIXTY S IH K S T IY
START S T AA R T
STOP S T AA P
T T IY
TEN T EH N
THIRD TH ER D
THIRTEEN TH ER T IY N
THIRTIETH TH ER T IY AH TH
THIRTIETH(2) TH ER T IY IH TH
THIRTY TH ER D IY
THIRTY(2) TH ER T IY
THOUSAND TH AW Z AH N
THOUSAND(2) TH AW Z AH N D
THREE TH R IY
TWELFTH T W EH L F TH
TWELVE T W EH L V
TWENTIETH T W EH N IY AH TH
TWENTIETH(2) T W EH N IY IH TH
TWENTIETH(3) T W EH N T IY AH TH
TWENTIETH(4) T W EH N T IY IH TH
TWENTY T W EH N IY
TWENTY(2) T W EH N T IY
TWO T UW
U Y UW
V V IY
W D AH B AH L Y UW
X EH K S
Y W AY
YES Y EH S
Z Z IY
ZERO Z IH R OW
ZERO(2) Z IY R OW
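
Each dictionary line above is a head word followed by its phones, and alternative pronunciations carry a numeric suffix such as `AND(2)`. The sketch below shows one way to group the variants under their base word. The function name and the returned structure are illustrative choices, not part of the commit.

```python
import re
from collections import defaultdict

# Head word with an optional "(n)" variant suffix, e.g. "AND" or "AND(2)".
_ENTRY = re.compile(r"^(?P<word>\S+?)(?:\((?P<variant>\d+)\))?$")


def parse_dictionary(lines):
    """Map each base word to a list of pronunciations (each a list of phones)."""
    lexicon = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines and entries without any phones
        match = _ENTRY.match(fields[0])
        # Variants are appended in file order, so "A" comes before "A(2)".
        lexicon[match.group("word")].append(fields[1:])
    return dict(lexicon)
```

For example, `parse_dictionary(["A AH", "A(2) EY"])` yields a single key `"A"` with two pronunciations, `["AH"]` and `["EY"]`.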
3 changes: 3 additions & 0 deletions data/an4/etc/an4.filler
@@ -0,0 +1,3 @@
<s> SIL
</s> SIL
<sil> SIL
34 changes: 34 additions & 0 deletions data/an4/etc/an4.phone
@@ -0,0 +1,34 @@
AA
AE
AH
AO
AW
AY
B
CH
D
EH
ER
EY
F
G
HH
IH
IY
JH
K
L
M
N
OW
P
R
S
SIL
T
TH
UW
V
W
Y
Z
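
A phone list like the one above can serve as a consistency check on the pronunciations in the dictionary. The helper below is a hypothetical sketch: it reports any phone that appears in a pronunciation but is missing from the phone set.

```python
def unknown_phones(pronunciations, phone_set):
    """Return the phones used in `pronunciations` that are absent from `phone_set`.

    `pronunciations` is an iterable of phone lists, e.g. [["AH"], ["EY", "T"]].
    """
    used = {phone for pron in pronunciations for phone in pron}
    return used - set(phone_set)
```

An empty result means every phone in the pronunciations is declared in the phone list.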
145 changes: 145 additions & 0 deletions data/an4/etc/an4.ug.lm
@@ -0,0 +1,145 @@
#############################################################################
This is a 2-gram language model, based on a vocabulary of 13 words,
which begins "<s>", "</s>", "oh"...
This is an OPEN-vocabulary model (type 1)
(OOVs were mapped to UNK, which is treated as any other vocabulary word)
This file is in the ARPA-standard format introduced by Doug Paul.

p(wd3|wd1,wd2)= if(trigram exists) p_3(wd1,wd2,wd3)
else if(bigram w1,w2 exists) bo_wt_2(w1,w2)*p(wd3|wd2)
else p(wd3|w2)

p(wd2|wd1)= if(bigram exists) p_2(wd1,wd2)
else bo_wt_1(wd1)*p_1(wd2)

All probs and back-off weights (bo_wt) are given in log10 form.

Data formats:

Beginning of data mark: \data\
ngram 1=nr # number of 1-grams
ngram 2=nr # number of 2-grams

\1-grams:
p_1 wd_1 bo_wt_1
\2-grams:
p_2 wd_1 wd_2

end of data mark: \end\

\data\
ngram 1=107
ngram 2=1

\1-grams:
-2.0253 <UNK> 0.0000
-2.0253 </s> -99.0000
-99.0000 <s> 0.0000
-2.0253 A 0.0000
-2.0253 AND 0.0000
-2.0253 APOSTROPHE 0.0000
-2.0253 APRIL 0.0000
-2.0253 AREA 0.0000
-2.0253 AUGUST 0.0000
-2.0253 B 0.0000
-2.0253 C 0.0000
-2.0253 CODE 0.0000
-2.0253 D 0.0000
-2.0253 DECEMBER 0.0000
-2.0253 E 0.0000
-2.0253 EIGHT 0.0000
-2.0253 EIGHTEEN 0.0000
-2.0253 EIGHTEENTH 0.0000
-2.0253 EIGHTH 0.0000
-2.0253 EIGHTY 0.0000
-2.0253 ELEVEN 0.0000
-2.0253 ELEVENTH 0.0000
-2.0253 ENTER 0.0000
-2.0253 ERASE 0.0000
-2.0253 F 0.0000
-2.0253 FEBRUARY 0.0000
-2.0253 FIFTEEN 0.0000
-2.0253 FIFTEENTH 0.0000
-2.0253 FIFTH 0.0000
-2.0253 FIFTY 0.0000
-2.0253 FIRST 0.0000
-2.0253 FIVE 0.0000
-2.0253 FORTY 0.0000
-2.0253 FOUR 0.0000
-2.0253 FOURTEEN 0.0000
-2.0253 FOURTH 0.0000
-2.0253 G 0.0000
-2.0253 GO 0.0000
-2.0253 H 0.0000
-2.0253 HALF 0.0000
-2.0253 HALL 0.0000
-2.0253 HELP 0.0000
-2.0253 HUNDRED 0.0000
-2.0253 I 0.0000
-2.0253 J 0.0000
-2.0253 JANUARY 0.0000
-2.0253 JULY 0.0000
-2.0253 JUNE 0.0000
-2.0253 K 0.0000
-2.0253 L 0.0000
-2.0253 LANE 0.0000
-2.0253 M 0.0000
-2.0253 MARCH 0.0000
-2.0253 MAY 0.0000
-2.0253 MEMORY 0.0000
-2.0253 N 0.0000
-2.0253 NINE 0.0000
-2.0253 NINETEEN 0.0000
-2.0253 NINETY 0.0000
-2.0253 NINTH 0.0000
-2.0253 NO 0.0000
-2.0253 O 0.0000
-2.0253 OCTOBER 0.0000
-2.0253 OF 0.0000
-2.0253 OH 0.0000
-2.0253 ONE 0.0000
-2.0253 P 0.0000
-2.0253 Q 0.0000
-2.0253 R 0.0000
-2.0253 REPEAT 0.0000
-2.0253 RUBOUT 0.0000
-2.0253 S 0.0000
-2.0253 SECOND 0.0000
-2.0253 SEPTEMBER 0.0000
-2.0253 SEVEN 0.0000
-2.0253 SEVENTEEN 0.0000
-2.0253 SEVENTH 0.0000
-2.0253 SEVENTY 0.0000
-2.0253 SIX 0.0000
-2.0253 SIXTEEN 0.0000
-2.0253 SIXTEENTH 0.0000
-2.0253 SIXTH 0.0000
-2.0253 SIXTY 0.0000
-2.0253 START 0.0000
-2.0253 STOP 0.0000
-2.0253 T 0.0000
-2.0253 TEN 0.0000
-2.0253 THIRD 0.0000
-2.0253 THIRTIETH 0.0000
-2.0253 THIRTY 0.0000
-2.0253 THOUSAND 0.0000
-2.0253 THREE 0.0000
-2.0253 TWELFTH 0.0000
-2.0253 TWELVE 0.0000
-2.0253 TWELVTH 0.0000
-2.0253 TWENTIETH 0.0000
-2.0253 TWENTY 0.0000
-2.0253 TWO 0.0000
-2.0253 U 0.0000
-2.0253 V 0.0000
-2.0253 W 0.0000
-2.0253 WEAN 0.0000
-2.0253 X 0.0000
-2.0253 Y 0.0000
-2.0253 YES 0.0000
-2.0253 Z 0.0000
-2.0253 ZERO 0.0000

\2-grams:
0.0000 <s> </s>
\end\
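
The header of the model file above gives the bigram back-off rule: use p_2(wd1,wd2) if the bigram exists, otherwise bo_wt_1(wd1)*p_1(wd2). Since all values are stored in log10 form, the product becomes a sum. The sketch below parses the two n-gram sections and applies that rule. The function names are illustrative, and parsing deliberately starts only at the standalone `\data\` line, because the descriptive header also contains a `\1-grams:` line. As a side observation, -2.0253 is approximately log10(1/106), which would be consistent with a uniform distribution over the 106 unigrams other than `<s>`.

```python
def parse_arpa(lines):
    """Parse the 1-gram and 2-gram sections of an ARPA-format model.

    Returns (unigrams, bigrams) where
      unigrams maps word -> (log10 probability, log10 back-off weight)
      bigrams maps (w1, w2) -> log10 probability
    """
    unigrams, bigrams = {}, {}
    started = False   # becomes True at the standalone \data\ marker
    section = None
    for raw in lines:
        line = raw.strip()
        if not started:
            started = line == "\\data\\"
            continue
        if line == "\\end\\":
            break
        if line.startswith("\\"):
            section = line  # e.g. \1-grams: or \2-grams:
            continue
        fields = line.split()
        if section == "\\1-grams:" and len(fields) >= 2:
            # A missing back-off weight is treated as log10(1) = 0.
            backoff = float(fields[2]) if len(fields) > 2 else 0.0
            unigrams[fields[1]] = (float(fields[0]), backoff)
        elif section == "\\2-grams:" and len(fields) >= 3:
            bigrams[(fields[1], fields[2])] = float(fields[0])
    return unigrams, bigrams


def bigram_logprob(unigrams, bigrams, w1, w2):
    """log10 p(w2 | w1) using the back-off rule from the file header."""
    if (w1, w2) in bigrams:
        return bigrams[(w1, w2)]
    # Back off: log10(bo_wt_1(w1) * p_1(w2)) = bo_wt_1(w1) + p_1(w2) in log space.
    return unigrams[w1][1] + unigrams[w2][0]
```

With this model, nearly every word pair takes the back-off branch, because the file lists only one explicit bigram (`<s> </s>`).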
Binary file added data/an4/etc/an4.ug.lm.DMP
Binary file not shown.