Commit: updated readme for slimmed down test of pytorch 1

John committed Oct 7, 2018
1 parent 840756b commit 2e858f4
Showing 1 changed file with 19 additions and 193 deletions.
212 changes: 19 additions & 193 deletions README.md
@@ -1,87 +1,44 @@
# deepspeech.pytorch

Forked from SeanNaren/deepspeech.pytorch

Added a nightly build of PyTorch to requirements.txt; testing with Python 3.7 and PyTorch 1.0 (pre-release).

## Optionally:

- Build PyTorch from source using conda: follow the instructions at https://github.com/pytorch/pytorch#from-source, except run `conda install -c pytorch magma-cuda92` (for CUDA 9.2).

# Naren Doc

Implementation of DeepSpeech2 using [Baidu Warp-CTC](https://github.com/baidu-research/warp-ctc).
Creates a network based on the [DeepSpeech2](https://arxiv.org/pdf/1512.02595v1.pdf) architecture, trained with the CTC activation function.

## Features
Forked from SeanNaren/deepspeech.pytorch

* Train DeepSpeech, configurable RNN types and architectures with multi-GPU support.
* Language model support using KenLM (WIP; no instructions for building an LM yet).
* Multiple dataset downloaders, with support for AN4, TED, Voxforge and LibriSpeech. Datasets can be merged, and custom datasets are supported.
* Noise injection for online training to improve noise robustness.
* Audio augmentation to improve noise robustness.
* Easy start/stop capabilities in the event of a crash or hard stop during training.
* Visdom/Tensorboard support for visualizing training graphs.

Testing with Python 3.7 and PyTorch 1.0 (pre-release).

# Installation
## Features

Several libraries need to be installed for training to work. I will assume that everything is being
installed in an Anaconda installation on Ubuntu.
Sean Naren's excellent implementation contains a lot of functionality (distributed training, etc.) that is not included here.

Install [PyTorch](https://github.com/pytorch/pytorch#installation) if you haven't already.
Source install: https://github.com/pytorch/pytorch#from-source
or, typically: `conda install pytorch-nightly -c pytorch`
* Train DeepSpeech, configurable RNN types, with the PyTorch 1.0 CTC loss (see the sketch after the WIP list below).
* Multiple dataset downloaders, with support for AN4, TED, Voxforge and LibriSpeech. Datasets can be merged, and custom datasets are supported.

WIP:
- [ ] Multi-GPU support.
- [ ] CPU support.
- [ ] Language model support using KenLM (WIP; no instructions for building an LM yet).
- [ ] Noise injection for online training to improve noise robustness.
- [ ] Audio augmentation to improve noise robustness.
- [ ] Easy start/stop capabilities in the event of a crash or hard stop during training.
- [ ] Visdom/Tensorboard support for visualizing training graphs.
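
The feature list above mentions the PyTorch 1.0 CTC loss. For orientation only, here is a minimal, self-contained sketch of `torch.nn.CTCLoss` with made-up tensor shapes; it is not taken from this repository's training code.

```
import torch
import torch.nn as nn

# Made-up sizes: T time steps, N batch items, C character classes (index 0 = CTC blank).
T, N, C = 50, 4, 29
log_probs = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_()  # stand-in for network output
targets = torch.randint(1, C, (N, 20), dtype=torch.long)                   # target label indices (no blanks)
input_lengths = torch.full((N,), T, dtype=torch.long)                      # length of each input sequence
target_lengths = torch.randint(10, 21, (N,), dtype=torch.long)             # length of each transcript

ctc_loss = nn.CTCLoss(blank=0)
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()
print(loss.item())
```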

Install this fork for Warp-CTC bindings:
# Installation

conda / Ubuntu with Python 3.7
```
git clone https://github.com/SeanNaren/warp-ctc.git
cd warp-ctc
mkdir build; cd build
cmake ..
make
export CUDA_HOME="/usr/local/cuda"
cd ../pytorch_binding
python setup.py install
conda create -n deepspeech python=3.7
```

Install pytorch audio:
```
sudo apt-get install sox libsox-dev libsox-fmt-all
git clone https://github.com/pytorch/audio.git
cd audio
pip install cffi
python setup.py install
```
Note: if you are working with ongoing releases of the PyTorch code, delete any old compiled binaries in `build/` prior to running the setup so the extension links against the correct version.

Install [PyTorch 1.0](https://pytorch.org/get-started/locally/) if you haven't already,
or, typically: `pip install torch_nightly -f https://download.pytorch.org/whl/nightly/cu92/torch_nightly.html`

If you want decoding to support beam search with an optional language model, install ctcdecode:
```
git clone --recursive https://github.com/parlance/ctcdecode.git
cd ctcdecode
pip install .
```

Finally clone this repo and run this within the repo:
```
pip install -r requirements.txt
```

## Docker

There is no official Docker Hub image; however, a Dockerfile is provided so you can build one on your own system.

```
sudo nvidia-docker build -t deepspeech2.docker .
sudo nvidia-docker run -ti -v `pwd`/data:/workspace/data -p 8888:8888 deepspeech2.docker # Opens a Jupyter notebook, mounting the /data drive in the container
```

If you'd prefer bash:

```
nvidia-docker run -ti -v `pwd`/data:/workspace/data --entrypoint=/bin/bash deepspeech2.docker # Opens a bash terminal, mounting the /data drive in the container
```
# Usage
@@ -180,113 +137,6 @@ Use `python train.py --help` for more parameters and options.
There is also [Visdom](https://github.com/facebookresearch/visdom) support to visualize training. Once a server has been started, to use:
```
python train.py --visdom
```

There is also [Tensorboard](https://github.com/lanpa/tensorboard-pytorch) support to visualize training. Follow the instructions to set up. To use:

```
python train.py --tensorboard --logdir log_dir/ # Make sure the Tensorboard instance is pointed at this log directory
```

For both visualisation tools, you can add your own name to the run by changing the `--id` parameter when training.

## Multi-GPU Training

We support multi-GPU training via the distributed parallel wrapper (see [here](https://github.com/NVIDIA/sentiment-discovery/blob/master/analysis/scale.md) and [here](https://github.com/SeanNaren/deepspeech.pytorch/issues/211) to see why we don't use DataParallel).

To use multi-GPU:

```
python -m multiproc train.py --visdom --cuda # Add your parameters as normal, multiproc will scale to all GPUs automatically
```

multiproc will open a log for all processes other than the main process.

We suggest using the Gloo backend, which defaults to TCP if InfiniBand isn't available. NCCL2 can also be used as a backend. More information [here](https://pytorch.org/docs/master/distributed.html#distributed-basics).
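
As a rough illustration of backend selection only (not code from this repository), a single-process sketch of initializing `torch.distributed` with the Gloo backend; the address and world size are placeholder values, and `multiproc`/`train.py` normally handle this wiring for you.

```
import torch.distributed as dist

# Placeholder rendezvous values; in a real run each worker gets its own rank
# and the world size matches the number of GPUs/processes.
dist.init_process_group(backend="gloo",
                        init_method="tcp://127.0.0.1:23456",
                        rank=0,
                        world_size=1)
```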

You can also specify specific GPU IDs rather than allowing the script to use all available GPUs:

```
python -m multiproc train.py --visdom --cuda --device-ids 0,1,2,3 # Add your parameters as normal, will only run on 4 GPUs
```

### Noise Augmentation/Injection

There is support for two different types of noise: noise augmentation and noise injection.

#### Noise Augmentation

Applies small changes to the tempo and gain when loading audio to increase robustness. To enable it, use the `--augment` flag when training.

#### Noise Injection

Dynamically adds noise into the training data to increase robustness. To use, first fill a directory with all of the noise files you want to sample from.
The data loader will randomly pick samples from this directory.

To enable noise injection, use `--noise-dir /path/to/noise/dir/` to specify where your noise files are. There are a few noise parameters to tweak, such as
`--noise_prob` to determine the probability that noise is added, and the `--noise-min` and `--noise-max` parameters to determine the minimum and maximum noise levels added in training.

Included is a script to inject noise into an audio file to hear what different noise levels/files would sound like. Useful for curating the noise dataset.

```
python noise_inject.py --input-path /path/to/input.wav --noise-path /path/to/noise.wav --output-path /path/to/input_injected.wav --noise-level 0.5 # higher levels mean more noise
```
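
For intuition only, a hypothetical sketch of what mixing a noise clip into a speech waveform at a given level can look like; the function below is invented for illustration and is not the repository's implementation.

```
import numpy as np

def inject_noise(signal, noise, noise_level=0.5):
    """Illustrative only: mix a noise waveform into a speech waveform at noise_level."""
    # Repeat the noise clip if it is shorter than the speech signal, then trim to length.
    if len(noise) < len(signal):
        repeats = int(np.ceil(len(signal) / len(noise)))
        noise = np.tile(noise, repeats)
    return signal + noise_level * noise[: len(signal)]
```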

### Checkpoints

Training supports saving checkpoints of the model so that training can be continued should an error or early termination occur. To enable epoch
checkpoints, use:

```
python train.py --checkpoint
```

To enable checkpoints every N batches through the epoch as well as epoch saving:

```
python train.py --checkpoint --checkpoint-per-batch N # N is the number of batches to wait before saving a checkpoint.
```

Note that for the batch checkpointing system to work, you cannot change the batch size when loading a checkpointed model from its original training
run.

To continue from a checkpointed model that has been saved:

```
python train.py --continue-from models/deepspeech_checkpoint_epoch_N_iter_N.pth
```

This continues from the same training state and, if enabled, recreates the Visdom graph to continue from.

If you would like to start from a previously saved model's weights without continuing its training state, add the `--finetune` flag to restart training
from the `--continue-from` weights.

### Choosing batch sizes

Included is a script that can be used to benchmark whether training can run on your hardware, and the limits on the model size and batch
size you can use. To use:

```
python benchmark.py --batch-size 32
```

Use the flag `--help` to see other parameters that can be used with the script.

### Model details

Saved models contain the metadata of their training process. To see the metadata, run the command below:

```
python model.py --model-path models/deepspeech.pth
```

Also note that there is no final softmax layer on the model, since warp-ctc applies the softmax internally during training. Any decoder built on top of the model will need to apply this softmax itself, so take this into consideration!
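
To illustrate the point above with stand-in components (the linear layer, sizes, and input below are placeholders, not the actual DeepSpeech model), you apply the normalization yourself before decoding:

```
import torch
import torch.nn.functional as F

# Stand-ins only: a linear layer in place of a trained DeepSpeech model, with assumed sizes.
model = torch.nn.Linear(161, 29)      # 161 spectrogram features -> 29 character classes
spect = torch.randn(1, 100, 161)      # dummy (batch, time, features) input

with torch.no_grad():
    out = model(spect)                # raw activations; no softmax inside the model
    probs = F.softmax(out, dim=-1)    # normalize yourself before handing the output to a decoder
```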

## Testing/Inference

To evaluate a trained model on a test set (has to be in the same format as the training set):
```
python test.py --model-path models/deepspeech.pth --test-manifest /path/to/test_manifest.csv --cuda
```
@@ -298,30 +148,6 @@ An example script to output a transcription has been provided:
```
python transcribe.py --model-path models/deepspeech.pth --audio-path /path/to/audio.wav
```
## Server

Included is a basic server script that allows POST requests to be sent to the server to transcribe files.

```
python server.py --host 0.0.0.0 --port 8000 # Run on one window
curl -X POST http://0.0.0.0:8000/transcribe -H "Content-type: multipart/form-data" -F "file=@/path/to/input.wav"
```
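
If you prefer Python to curl, an equivalent client sketch using the third-party `requests` library (assumed to be installed) might look like this:

```
import requests

# Assumes server.py is running locally on port 8000, as in the example above.
with open("/path/to/input.wav", "rb") as f:
    response = requests.post("http://0.0.0.0:8000/transcribe",
                             files={"file": f})
print(response.text)
```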

### Alternate Decoders
By default, `test.py` and `transcribe.py` use a `GreedyDecoder` which picks the highest-likelihood output label at each timestep. Repeated and blank symbols are then filtered to give the final output.

A beam search decoder can optionally be used with the installation of the `ctcdecode` library as described in the Installation section. The `test` and `transcribe` scripts have a `--decoder` argument. To use the beam decoder, add `--decoder beam`. The beam decoder enables additional decoding parameters (see the sketch after this list):
- **beam_width**: how many beams to consider at each timestep
- **lm_path**: optional binary KenLM language model to use for decoding
- **alpha**: weight for language model
- **beta**: bonus weight for words
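
For orientation only, a hedged sketch of calling `ctcdecode` directly with a made-up label set and dummy probabilities; `test.py` and `transcribe.py` already wrap this for you, so treat the values below as assumptions rather than the repository's exact usage.

```
import torch
from ctcdecode import CTCBeamDecoder

labels = ["_", " ", "a", "b", "c"]            # "_" as the CTC blank; the real label set is larger
decoder = CTCBeamDecoder(labels, beam_width=10, blank_id=0)

probs = torch.rand(1, 100, len(labels)).softmax(dim=2)   # dummy (batch, time, labels) output
beam_results, beam_scores, timesteps, out_lens = decoder.decode(probs)

best = beam_results[0][0][: out_lens[0][0]]   # label indices of the best-scoring beam
print("".join(labels[int(i)] for i in best))
```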

### Time offsets

Use the `--offsets` flag to get positional information for each character in the transcription when using the `transcribe.py` script. The offsets are based on the size
of the output tensor, which you need to convert into the format you require.
For example, based on the default parameters you could multiply the offsets by a scalar (duration of the file in seconds / size of the output) to get the offsets in seconds.
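
As a concrete, made-up example of that conversion:

```
# Hypothetical numbers: a 4.0 second clip produced an output tensor with 200 time steps.
duration_seconds = 4.0
output_size = 200
seconds_per_step = duration_seconds / output_size     # 0.02 s per output step

char_offsets = [12, 30, 55]                           # example offsets from transcribe.py --offsets
print([o * seconds_per_step for o in char_offsets])   # -> roughly [0.24, 0.6, 1.1]
```
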
## Pre-trained models
