Wav2Letter Speech Recognition with OneFlow

Implementation of Wav2Letter (a speech recognition model from Facebook AI Research (FAIR)) with OneFlow.

Requirements

pip install -r requirements.txt

Data

We train and evaluate our models on the Google Speech Commands Dataset, a lightweight, easy-to-use dataset for testing model performance.

Data Preprocess

data.py contains scripts to process the Google Speech Commands audio data into features compatible with Wav2Letter.

Running data.py converts each audio clip into 13 MFCC features with a maximum frame length of 250 (these are short audio clips); shorter clips are zero-padded. Target transcripts are integer encoded and padded to the same length. The final outputs are NumPy arrays saved as x.npy and y.npy in the ./speech_data directory.
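For orientation, below is a minimal sketch of this preprocessing step. It assumes librosa for MFCC extraction and a character-level label map; the actual feature library, parameters, and vocabulary used by data.py may differ.

import os
import numpy as np
import librosa

SAMPLE_RATE = 16000   # Speech Commands clips are 16 kHz
N_MFCC = 13           # MFCC coefficients per frame
MAX_FRAMES = 250      # pad/clip every utterance to this many frames

def extract_features(wav_path):
    """Load a clip and return a zero-padded (N_MFCC, MAX_FRAMES) MFCC matrix."""
    audio, _ = librosa.load(wav_path, sr=SAMPLE_RATE)
    mfcc = librosa.feature.mfcc(y=audio, sr=SAMPLE_RATE, n_mfcc=N_MFCC)
    padded = np.zeros((N_MFCC, MAX_FRAMES), dtype=np.float32)
    padded[:, :min(mfcc.shape[1], MAX_FRAMES)] = mfcc[:, :MAX_FRAMES]
    return padded

def encode_target(word, char_map, max_len):
    """Integer-encode a keyword character by character, then zero-pad."""
    ids = [char_map[c] for c in word]
    return np.array(ids + [0] * (max_len - len(ids)), dtype=np.int64)

# Hypothetical driver: loop over the dataset, stack the results, and save
# the arrays that training expects under ./speech_data:
# np.save("./speech_data/x.npy", np.stack(features))
# np.save("./speech_data/y.npy", np.stack(targets))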

Train

bash train.sh
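For readers unfamiliar with the model, here is a rough sketch of a Wav2Letter-style training step using OneFlow's PyTorch-style nn API. The layer sizes, vocabulary size, optimizer settings, and function names are illustrative assumptions, not the actual contents of train.sh.

import oneflow as flow
import oneflow.nn as nn

class Wav2LetterLike(nn.Module):
    """Stack of 1D convolutions over MFCC frames, in the spirit of Wav2Letter."""
    def __init__(self, num_features=13, num_classes=28):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv1d(num_features, 250, kernel_size=48, stride=2, padding=23),
            nn.ReLU(),
            nn.Conv1d(250, 250, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.Conv1d(250, 2000, kernel_size=32, padding=16),
            nn.ReLU(),
            nn.Conv1d(2000, num_classes, kernel_size=1),
        )

    def forward(self, x):            # x: (batch, num_features, frames)
        return self.layers(x)        # logits: (batch, num_classes, frames')

model = Wav2LetterLike()
ctc_loss = nn.CTCLoss(blank=0)
optimizer = flow.optim.Adam(model.parameters(), lr=1e-4)

def train_step(x, y, target_lengths):
    # CTCLoss expects log-probabilities shaped (time, batch, classes).
    logits = model(x)
    log_probs = nn.functional.log_softmax(logits, dim=1).permute(2, 0, 1)
    input_lengths = flow.full((x.shape[0],), log_probs.shape[0], dtype=flow.int32)
    loss = ctc_loss(log_probs, y, input_lengths, target_lengths)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss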

Infer

bash infer.sh
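A common way to turn the network's per-frame outputs into a transcription is greedy CTC decoding: take the argmax in each frame, collapse repeats, and drop blanks. A small NumPy sketch follows, assuming blank index 0 and a hypothetical index-to-character map; infer.sh may use a different decoder.

import numpy as np

def greedy_ctc_decode(logits, idx_to_char, blank=0):
    """Greedy CTC decode for one utterance.

    logits: (num_classes, frames) array of per-frame scores.
    idx_to_char: hypothetical mapping from class index to character.
    """
    best_path = logits.argmax(axis=0)
    decoded = []
    prev = blank
    for idx in best_path:
        if idx != prev and idx != blank:
            decoded.append(idx_to_char[int(idx)])
        prev = idx
    return "".join(decoded)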

WER

Framework   WER
OneFlow     0.3031
PyTorch     0.3099