Commit: updated readme for slimmed down test of pytorch 1

John committed Oct 7, 2018
1 parent 840756b commit 2e858f4
Showing 1 changed file with 19 additions and 193 deletions.
212 changes: 19 additions & 193 deletions README.md
@@ -1,87 +1,44 @@
# deepspeech.pytorch

Forked from SeanNaren/deepspeech.pytorch

Added a nightly build of PyTorch to requirements.txt; testing with Python 3.7 and PyTorch 1.0 (pre-release).

## Optionally:

- Build PyTorch from source using conda: follow the instructions at https://github.com/pytorch/pytorch#from-source, except run `conda install -c pytorch magma-cuda92` (for CUDA 9.2).

# Naren Doc

Implementation of DeepSpeech2 using [Baidu Warp-CTC](https://github.com/baidu-research/warp-ctc).
Creates a network based on the [DeepSpeech2](https://arxiv.org/pdf/1512.02595v1.pdf) architecture, trained with the CTC activation function.

## Features
Forked from SeanNaren/deepspeech.pytorch

* Train DeepSpeech, configurable RNN types and architectures with multi-GPU support.
* Language model support using KenLM (WIP; no instructions for building an LM yet).
* Multiple dataset downloaders, with support for AN4, TED, Voxforge and LibriSpeech. Datasets can be merged, and custom datasets are supported.
* Noise injection for online training to improve noise robustness.
* Audio augmentation to improve noise robustness.
* Easy start/stop capabilities in the event of a crash or hard stop during training.
* Visdom/Tensorboard support for visualizing training graphs.

Testing with Python 3.7 and PyTorch 1.0 (pre-release).

# Installation
## Features

Several libraries need to be installed for training to work. I will assume that everything is being
installed in an Anaconda installation on Ubuntu.
Sean Naren's excellent implementation contains a lot of functionality (distributed training, etc.) that is not included here.

Install [PyTorch](https://github.com/pytorch/pytorch#installation) if you haven't already.
Source install: https://github.com/pytorch/pytorch#from-source
or, typically: `conda install pytorch-nightly -c pytorch`
* Train DeepSpeech, configurable RNN types, with the PyTorch 1.0 CTC loss (see the sketch after the WIP list below).
* Multiple dataset downloaders, with support for AN4, TED, Voxforge and LibriSpeech. Datasets can be merged, and custom datasets are supported.

WIP:
- [ ] Multi-GPU support.
- [ ] CPU support.
- [ ] Language model support using KenLM (WIP; no instructions for building an LM yet).
- [ ] Noise injection for online training to improve noise robustness.
- [ ] Audio augmentation to improve noise robustness.
- [ ] Easy start/stop capabilities in the event of a crash or hard stop during training.
- [ ] Visdom/Tensorboard support for visualizing training graphs.
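
The feature list above mentions the PyTorch 1.0 CTC loss. For orientation only, here is a minimal, self-contained sketch of `torch.nn.CTCLoss` with made-up tensor shapes; it is not taken from this repository's training code.

```
import torch
import torch.nn as nn

# Made-up sizes: T time steps, N batch items, C character classes (index 0 = CTC blank).
T, N, C = 50, 4, 29
log_probs = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()  # stand-in for network output
targets = torch.randint(1, C, (N, 20), dtype=torch.long)                   # target label indices (no blanks)
input_lengths = torch.full((N,), T, dtype=torch.long)                      # length of each input sequence
target_lengths = torch.randint(10, 21, (N,), dtype=torch.long)             # length of each transcript

ctc_loss = nn.CTCLoss(blank=0)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()
print(loss.item())
```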

Install this fork for Warp-CTC bindings:
# Installation

conda / Ubuntu with Python 3.7
```
git clone https://github.com/SeanNaren/warp-ctc.git
cd warp-ctc
mkdir build; cd build
cmake ..
make
export CUDA_HOME="/usr/local/cuda"
cd ../pytorch_binding
python setup.py install
conda create -n deepspeech python=3.7
```

Install pytorch audio:
```
sudo apt-get install sox libsox-dev libsox-fmt-all
git clone https://github.com/pytorch/audio.git
cd audio
pip install cffi
python setup.py install
```
Note: if you are working with ongoing releases of the PyTorch code, delete any old compiled binaries in `build/` prior to running the setup so the extension links against the correct version.

Install [PyTorch 1.0](https://pytorch.org/get-started/locally/) if you haven't already,
or, typically: `pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cu92/torch_nightly.html`

If you want decoding to support beam search with an optional language model, install ctcdecode:
```
git clone --recursive https://github.com/parlance/ctcdecode.git
cd ctcdecode
pip install .
```

Finally clone this repo and run this within the repo:
```
pip install -r requirements.txt
```

## Docker

There is no official Docker Hub image; however, a Dockerfile is provided so you can build one on your own system.

```
sudo nvidia-docker build -t deepspeech2.docker .
sudo nvidia-docker run -ti -v `pwd`/data:/workspace/data -p 8888:8888 deepspeech2.docker # Opens a Jupyter notebook, mounting the /data drive in the container
```

If you'd prefer bash:

```
nvidia-docker run -ti -v `pwd`/data:/workspace/data --entrypoint=/bin/bash deepspeech2.docker # Opens a bash terminal, mounting the /data drive in the container
```
# Usage
@@ -180,113 +137,6 @@ Use `python train.py --help` for more parameters and options.
There is also [Visdom](https://github.com/facebookresearch/visdom) support to visualize training. Once a server has been started, to use:
```
python train.py --visdom
```

There is also [Tensorboard](https://github.com/lanpa/tensorboard-pytorch) support to visualize training. Follow the instructions to set up. To use:

```
python train.py --tensorboard --logdir log_dir/ # Make sure the Tensorboard instance is pointed at this log directory
```

For both visualisation tools, you can add your own name to the run by changing the `--id` parameter when training.

## Multi-GPU Training

We support multi-GPU training via the distributed parallel wrapper (see [here](https://github.com/NVIDIA/sentiment-discovery/blob/master/analysis/scale.md) and [here](https://github.com/SeanNaren/deepspeech.pytorch/issues/211) to see why we don't use DataParallel).

To use multi-GPU:

```
python -m multiproc train.py --visdom --cuda # Add your parameters as normal, multiproc will scale to all GPUs automatically
```

multiproc will open a log for all processes other than the main process.

We suggest using the Gloo backend, which defaults to TCP if InfiniBand isn't available. NCCL2 can also be used as a backend. More information [here](https://pytorch.org/docs/master/distributed.html#distributed-basics).
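
As a rough illustration of backend selection only (not code from this repository), a single-process sketch of initializing `torch.distributed` with the Gloo backend; the address and world size are placeholder values, and `multiproc`/`train.py` normally handle this wiring for you.

```
import torch.distributed as dist

# Placeholder rendezvous values; in a real run each worker gets its own rank
# and the world size matches the number of GPUs/processes.
dist.init_process_group(backend="gloo",
                        init_method="tcp://127.0.0.1:23456",
                        rank=0,
                        world_size=1)
```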

You can also specify specific GPU IDs rather than allowing the script to use all available GPUs:

```
python -m multiproc train.py --visdom --cuda --device-ids 0,1,2,3 # Add your parameters as normal, will only run on 4 GPUs
```

### Noise Augmentation/Injection

There is support for two different types of noise: noise augmentation and noise injection.

#### Noise Augmentation

Applies small changes to the tempo and gain when loading audio to increase robustness. To enable it, use the `--augment` flag when training.

#### Noise Injection

Dynamically adds noise into the training data to increase robustness. To use, first fill a directory with all of the noise files you want to sample from.
The data loader will randomly pick samples from this directory.

To enable noise injection, use `--noise-dir /path/to/noise/dir/` to specify where your noise files are. There are a few noise parameters to tweak, such as
`--noise_prob` to determine the probability that noise is added, and the `--noise-min` and `--noise-max` parameters to determine the minimum and maximum noise levels added in training.

Included is a script to inject noise into an audio file to hear what different noise levels/files would sound like. Useful for curating the noise dataset.

```
python noise_inject.py --input-path /path/to/input.wav --noise-path /path/to/noise.wav --output-path /path/to/input_injected.wav --noise-level 0.5 # higher levels mean more noise
```
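
For intuition only, a hypothetical sketch of what mixing a noise clip into a speech waveform at a given level can look like; the function below is invented for illustration and is not the repository's implementation.

```
import numpy as np

def inject_noise(signal, noise, noise_level=0.5):
    """Illustrative only: mix a noise waveform into a speech waveform at noise_level."""
    # Repeat the noise clip if it is shorter than the speech signal, then trim to length.
    if len(noise) < len(signal):
        repeats = int(np.ceil(len(signal) / len(noise)))
        noise = np.tile(noise, repeats)
    return signal + noise_level * noise[: len(signal)]
```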

### Checkpoints

Training supports saving checkpoints of the model so that training can be continued should an error or early termination occur. To enable epoch
checkpoints, use:

```
python train.py --checkpoint
```

To enable checkpoints every N batches through the epoch as well as epoch saving:

```
python train.py --checkpoint --checkpoint-per-batch N # N is the number of batches to wait before saving a checkpoint.
```

Note that for the batch checkpointing system to work, you cannot change the batch size when loading a checkpointed model from its original training
run.

To continue from a checkpointed model that has been saved:

```
python train.py --continue-from models/deepspeech_checkpoint_epoch_N_iter_N.pth
```

This continues from the same training state and, if enabled, recreates the Visdom graph to continue from.

If you would like to start from a previously saved model's weights without continuing its training state, add the `--finetune` flag to restart training
from the `--continue-from` weights.

### Choosing batch sizes

Included is a script that can be used to benchmark whether training can run on your hardware, and the limits on the model size and batch
size you can use. To use:

```
python benchmark.py --batch-size 32
```

Use the flag `--help` to see other parameters that can be used with the script.

### Model details

Saved models contain the metadata of their training process. To see the metadata, run the command below:

```
python model.py --model-path models/deepspeech.pth
```

Also note that there is no final softmax layer on the model, since warp-ctc applies the softmax internally during training. Any decoder built on top of the model will need to apply this softmax itself, so take this into consideration!
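
To illustrate the point above with stand-in components (the linear layer, sizes, and input below are placeholders, not the actual DeepSpeech model), you apply the normalization yourself before decoding:

```
import torch
import torch.nn.functional as F

# Stand-ins only: a linear layer in place of a trained DeepSpeech model, with assumed sizes.
model = torch.nn.Linear(161, 29)      # 161 spectrogram features -> 29 character classes
spect = torch.randn(1, 100, 161)      # dummy (batch, time, features) input

with torch.no_grad():
    out = model(spect)                # raw activations; no softmax inside the model
    probs = F.softmax(out, dim=-1)    # normalize yourself before handing the output to a decoder
```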

## Testing/Inference

To evaluate a trained model on a test set (has to be in the same format as the training set):
```
python test.py --model-path models/deepspeech.pth --test-manifest /path/to/test_manifest.csv --cuda
```
@@ -298,30 +148,6 @@ An example script to output a transcription has been provided:
```
python transcribe.py --model-path models/deepspeech.pth --audio-path /path/to/audio.wav
```
## Server

Included is a basic server script that allows POST requests to be sent to the server to transcribe files.

```
python server.py --host 0.0.0.0 --port 8000 # Run on one window
curl -X POST http://0.0.0.0:8000/transcribe -H "Content-type: multipart/form-data" -F "file=@/path/to/input.wav"
```
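
If you prefer Python to curl, an equivalent client sketch using the third-party `requests` library (assumed to be installed) might look like this:

```
import requests

# Assumes server.py is running locally on port 8000, as in the example above.
with open("/path/to/input.wav", "rb") as f:
    response = requests.post("http://0.0.0.0:8000/transcribe",
                             files={"file": f})
print(response.text)
```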

### Alternate Decoders
By default, `test.py` and `transcribe.py` use a `GreedyDecoder` which picks the highest-likelihood output label at each timestep. Repeated and blank symbols are then filtered to give the final output.

A beam search decoder can optionally be used with the installation of the `ctcdecode` library as described in the Installation section. The `test` and `transcribe` scripts have a `--decoder` argument. To use the beam decoder, add `--decoder beam`. The beam decoder enables additional decoding parameters (see the sketch after this list):
- **beam_width**: how many beams to consider at each timestep
- **lm_path**: optional binary KenLM language model to use for decoding
- **alpha**: weight for language model
- **beta**: bonus weight for words
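
For orientation only, a hedged sketch of calling `ctcdecode` directly with a made-up label set and dummy probabilities; `test.py` and `transcribe.py` already wrap this for you, so treat the values below as assumptions rather than the repository's exact usage.

```
import torch
from ctcdecode import CTCBeamDecoder

labels = ["_", " ", "a", "b", "c"]            # "_" as the CTC blank; the real label set is larger
decoder = CTCBeamDecoder(labels, beam_width=10, blank_id=0)

probs = torch.rand(1, 100, len(labels)).softmax(dim=2)   # dummy (batch, time, labels) output
beam_results, beam_scores, timesteps, out_lens = decoder.decode(probs)

best = beam_results[0][0][: out_lens[0][0]]   # label indices of the best-scoring beam
print("".join(labels[int(i)] for i in best))
```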

### Time offsets

Use the `--offsets` flag to get positional information for each character in the transcription when using the `transcribe.py` script. The offsets are based on the size
of the output tensor, which you need to convert into the format you require.
For example, based on the default parameters you could multiply the offsets by a scalar (duration of the file in seconds / size of the output) to get the offsets in seconds.
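
As a concrete, made-up example of that conversion:

```
# Hypothetical numbers: a 4.0 second clip produced an output tensor with 200 time steps.
duration_seconds = 4.0
output_size = 200
seconds_per_step = duration_seconds / output_size     # 0.02 s per output step

char_offsets = [12, 30, 55]                           # example offsets from transcribe.py --offsets
print([o * seconds_per_step for o in char_offsets])   # -> roughly [0.24, 0.6, 1.1]
```
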
## Pre-trained models
