Added instructions to tune LMs, added script to select params
sean.narenthiran committed Aug 1, 2019
1 parent b8e34cc commit 62bd824
Showing 3 changed files with 53 additions and 65 deletions.
104 changes: 40 additions & 64 deletions README.md
@@ -52,69 +52,11 @@ Finally clone this repo and run this within the repo:
pip install -r requirements.txt
```

## Usage
## Training

### Datasets

Currently supports AN4, TEDLIUM, Voxforge and LibriSpeech. Scripts will set up the dataset and create the manifest files used in data loading.

#### AN4

To download and set up the AN4 dataset, run the below command in the root folder of the repo:

```
cd data; python an4.py
```

#### TEDLIUM

You have the option to download the raw dataset file manually or through the script (which will cache it).
The file is found [here](https://www.openslr.org/resources/19/TEDLIUM_release2.tar.gz).

To download and set up the TEDLIUM_V2 dataset, run the below command in the root folder of the repo:

```
cd data; python ted.py # Optionally if you have downloaded the raw dataset file, pass --tar_path /path/to/TEDLIUM_release2.tar.gz
```

#### Voxforge

To download and set up the Voxforge dataset, run the below command in the root folder of the repo:

```
cd data; python voxforge.py
```

Note that this dataset does not come with a validation or test set.

#### LibriSpeech

To download and set up the LibriSpeech dataset, run the below command in the root folder of the repo:

```
cd data; python librispeech.py
```

You have the option to download the raw dataset files manually or through the script (which will cache them as well).
If you download them manually, create the following folder structure and put the corresponding tar files, downloaded from [here](https://www.openslr.org/12/), into it.

```
cd data/
mkdir LibriSpeech/ # This can be anything as long as you specify the directory path as --target-dir when running the librispeech.py script
mkdir LibriSpeech/val/
mkdir LibriSpeech/test/
mkdir LibriSpeech/train/
```

Now put the `tar.gz` files in the correct folders. They will be used in the data pre-processing for LibriSpeech and removed after the dataset has been formatted.

Optionally, you can specify the exact LibriSpeech files you want if you don't want to add all of them. This can be done as shown below:

```
cd data/
python librispeech.py --files-to-use "train-clean-100.tar.gz, train-clean-360.tar.gz,train-other-500.tar.gz, dev-clean.tar.gz,dev-other.tar.gz, test-clean.tar.gz,test-other.tar.gz"
```
Currently supports AN4, TEDLIUM, Voxforge, Common Voice and LibriSpeech. Scripts will set up the dataset and create the manifest files used in data loading. The scripts can be found in the data/ folder. Many of the scripts also allow you to download the raw dataset files separately if you prefer.
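
For example, to download and set up the AN4 dataset, run the following from the root of the repo (the other dataset scripts follow the same pattern):

```
cd data; python an4.py
```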

#### Custom Dataset

@@ -139,7 +81,7 @@ cd data/
python merge_manifests.py --output-path merged_manifest.csv --merge-dir all-manifests/ --min-duration 1 --max-duration 15 # durations in seconds
```

## Training
### Training a Model

```
python train.py --train-manifest data/train_manifest.csv --val-manifest data/val_manifest.csv
@@ -161,7 +103,7 @@ python train.py --tensorboard --logdir log_dir/ # Make sure the Tensorboard inst

For both visualisation tools, you can add your own name to the run by changing the `--id` parameter when training.
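
For example, assuming `--id` accepts an arbitrary string for the run name, a Visdom run could be labelled like this:

```
python train.py --visdom --id 'AN4 baseline'
```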

## Multi-GPU Training
### Multi-GPU Training

We support multi-GPU training via the distributed parallel wrapper (see [here](https://github.com/NVIDIA/sentiment-discovery/blob/master/analysis/scale.md) and [here](https://github.com/SeanNaren/deepspeech.pytorch/issues/211) to see why we don't use DataParallel).

@@ -181,7 +123,7 @@ python -m multiproc train.py --visdom --cuda --device-ids 0,1,2,3 # Add your par

We suggest using the NCCL backend which defaults to TCP if Infiniband isn't available.

## Mixed Precision
### Mixed Precision

If you are using NVIDIA Volta cards or above to train your model, we highly suggest turning on mixed precision for speed/memory benefits. More information can be found [here](https://docs.nvidia.com/deeplearning/sdk/mixed-precision-training/index.html).
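
A sketch of how this might be enabled, assuming the training script exposes a `--mixed-precision` flag (check `python train.py --help` for the exact option name in your version):

```
python train.py --train-manifest data/train_manifest.csv --val-manifest data/val_manifest.csv --cuda --mixed-precision  # flag name assumed
```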

@@ -279,7 +221,9 @@ An example script to output a transcription has been provided:
python transcribe.py --model-path models/deepspeech.pth --audio-path /path/to/audio.wav
```

## Server
If you used mixed-precision or half precision when training the model, you can use the `--half` flag for a speed/memory benefit.
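
For example, reusing the transcription command above with a model trained in mixed precision:

```
python transcribe.py --model-path models/deepspeech.pth --audio-path /path/to/audio.wav --half
```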

## Inference Server

Included is a basic server script that allows POST requests to be sent to the server to transcribe files.

@@ -289,6 +233,38 @@ python server.py --host 0.0.0.0 --port 8000 # Run on one window
curl -X POST http://0.0.0.0:8000/transcribe -H "Content-type: multipart/form-data" -F "file=@/path/to/input.wav"
```

## Using an ARPA LM

We support using KenLM-based LMs. Below are instructions on how to take the LibriSpeech LMs found [here](https://www.openslr.org/11/) and tune the decoding parameters to give the best results on LibriSpeech.

### Tuning the LibriSpeech LMs

First, ensure you've set up the LibriSpeech datasets using the scripts in the data/ folder.
In addition, download the latest pre-trained LibriSpeech model from the releases page, as well as the ARPA model you want to tune from [here](https://www.openslr.org/11/). For the below we use the 4-gram ARPA model.

Next we need to generate the acoustic output that will be used to evaluate the LM parameters on the LibriSpeech validation set.
```
python test.py --test-manifest data/librispeech_val_manifest.csv --model-path librispeech_pretrained_v2.pth --cuda --half --save-output librispeech_val_output.npy
```

We use a beam width of 128, which gives reasonable results. We suggest using a CPU-intensive node to carry out the grid search.

```
python search_lm_params.py --num-workers 16 --saved-output librispeech_val_output.npy --output-path libri_tune_output.json --lm-alpha-from 0 --lm-alpha-to 5 --lm-beta-from 0 --lm-beta-to 3 --lm-path 4-gram.arpa --model-path librispeech_pretrained_v2.pth --beam-width 128 --lm-workers 16
```

This will run a grid search across the alpha/beta parameters using a beam width of 128. Use the below script to find the best alpha/beta params:

```
python select_lm_params.py --input-path libri_tune_output.json
```

Use these alpha/beta parameters when using the beam decoder.
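
For example, a sketch of evaluating with the tuned values (the decoder flags are assumed from `test.py`'s decoder options; substitute the alpha/beta printed by `select_lm_params.py`):

```
python test.py --test-manifest data/librispeech_val_manifest.csv --model-path librispeech_pretrained_v2.pth --cuda --half --decoder beam --lm-path 4-gram.arpa --alpha <best-alpha> --beta <best-beta> --beam-width 128
```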

### Building your own LM

To build your own LM you need to use the KenLM repo found [here](https://github.com/kpu/kenlm). Have a read of the documentation to get a sense of how to train your own LM. Once trained, the steps above can be used to find the appropriate decoding parameters.
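
As a rough sketch, assuming you have compiled the KenLM binaries and have a plain-text training corpus (here called `corpus.txt`, a hypothetical file), a 4-gram ARPA LM can be built and optionally converted to a binary file for faster loading:

```
bin/lmplz -o 4 < corpus.txt > my_lm.arpa      # train a 4-gram LM on the corpus
bin/build_binary my_lm.arpa my_lm.binary      # optional: convert to KenLM binary format
```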

### Alternate Decoders
By default, `test.py` and `transcribe.py` use a `GreedyDecoder` which picks the highest-likelihood output label at each timestep. Repeated and blank symbols are then filtered to give the final output.

2 changes: 1 addition & 1 deletion tune_decoder.py → search_lm_params.py
@@ -11,7 +11,7 @@
from model import DeepSpeech
from opts import add_decoder_args

parser = argparse.ArgumentParser(description='DeepSpeech transcription')
parser = argparse.ArgumentParser(description='Tune an ARPA LM based on a pre-trained acoustic model output')
parser.add_argument('--model-path', default='models/deepspeech_final.pth',
help='Path to model file created by training')
parser.add_argument('--saved-output', default="", type=str, help='Path to output from test.py')
12 changes: 12 additions & 0 deletions select_lm_params.py
@@ -0,0 +1,12 @@
import argparse
import json

parser = argparse.ArgumentParser(description='Select the best parameters based on the WER')
parser.add_argument('--input-path', type=str, help='Output json file from search_lm_params')
args = parser.parse_args()

with open(args.input_path) as f:
    results = json.load(f)

min_results = min(results, key=lambda x: x[2]) # Find the minimum WER (alpha, beta, WER, CER)
print("Alpha: %f \nBeta: %f \nWER: %f\nCER: %f" % tuple(min_results))
