Merge pull request SeanNaren#222 from ryanleary/posix-args
Switch cmd-line args to POSIX-style
Sean Naren committed Jan 18, 2018
2 parents 29b1cc8 + bec7121 commit 7c79fbf
Showing 14 changed files with 118 additions and 118 deletions.
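The change is mechanical: every underscore-style long option (for example `--batch_size`) becomes its POSIX/GNU dash-separated equivalent (`--batch-size`) across the data-preparation, benchmarking, and utility scripts, with the README updated to match.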
36 changes: 18 additions & 18 deletions README.md
@@ -12,7 +12,7 @@ Creates a network based on the [DeepSpeech2](https://arxiv.org/pdf/1512.02595v1.p
* Noise injection for online training to improve noise robustness.
* Audio augmentation to improve noise robustness.
* Easy start/stop capabilities in the event of crash or hard stop during training.
-* Visdom/Tensorboard support for visualising training graphs.
+* Visdom/Tensorboard support for visualizing training graphs.

# Installation

@@ -102,7 +102,7 @@ In order to do this you must create the following folder structure and put the c

```
cd data/
-mkdir LibriSpeech/ # This can be anything as long as you specify the directory path as --target_dir when running the librispeech.py script
+mkdir LibriSpeech/ # This can be anything as long as you specify the directory path as --target-dir when running the librispeech.py script
mkdir LibriSpeech/val/
mkdir LibriSpeech/test/
mkdir LibriSpeech/train/
@@ -115,7 +115,7 @@ Optionally you can specify the exact librispeech files you want if you don't wan

```
cd data/
-python librispeech.py --files_to_use "train-clean-100.tar.gz, train-clean-360.tar.gz,train-other-500.tar.gz, dev-clean.tar.gz,dev-other.tar.gz, test-clean.tar.gz,test-other.tar.gz"
+python librispeech.py --files-to-use "train-clean-100.tar.gz,train-clean-360.tar.gz,train-other-500.tar.gz,dev-clean.tar.gz,dev-other.tar.gz,test-clean.tar.gz,test-other.tar.gz"
```

### Custom Dataset
@@ -138,27 +138,27 @@ containing all the manifests you want to merge. You can also prune short and lon

```
cd data/
-python merge_manifests.py --output_path merged_manifest.csv --merge_dir all_manifests/ --min_duration 1 --max_duration 15 # durations in seconds
+python merge_manifests.py --output-path merged_manifest.csv --merge-dir all-manifests/ --min-duration 1 --max-duration 15 # durations in seconds
```
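For reference, each manifest is a plain CSV file in which every row pairs an audio file with its transcript file; the paths below are illustrative:

```
/path/to/audio1.wav,/path/to/transcript1.txt
/path/to/audio2.wav,/path/to/transcript2.txt
```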

## Training

```
-python train.py --train_manifest data/train_manifest.csv --val_manifest data/val_manifest.csv
+python train.py --train-manifest data/train_manifest.csv --val-manifest data/val_manifest.csv
```

Use `python train.py --help` for more parameters and options.

-There is also [Visdom](https://github.com/facebookresearch/visdom) support to visualise training. Once a server has been started, to use:
+There is also [Visdom](https://github.com/facebookresearch/visdom) support to visualize training. Once a server has been started, to use:

```
python train.py --visdom
```

-There is also [Tensorboard](https://github.com/lanpa/tensorboard-pytorch) support to visualise training. Follow the instructions to set up. To use:
+There is also [Tensorboard](https://github.com/lanpa/tensorboard-pytorch) support to visualize training. Follow the instructions to set up. To use:

```
-python train.py --tensorboard --logdir log_dir/ # Make sure the tensorboard instance is made pointing to this log directory
+python train.py --tensorboard --logdir log_dir/ # Make sure the Tensorboard instance points to this log directory
```
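For reference, the Tensorboard server can then be started against the same directory (assuming a standard Tensorboard installation, with the directory name matching the `--logdir` value above):

```
tensorboard --logdir log_dir/
```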

For both visualization tools, you can add your own name to the run by changing the `--id` parameter when training, as in the example below.
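For example (the id string here is illustrative):

```
python train.py --visdom --id "librispeech baseline"
```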
@@ -176,13 +176,13 @@ Applies small changes to the tempo and gain when loading audio to increase robus
Dynamically adds noise into the training data to increase robustness. To use, first fill a directory with all the noise files you want to sample from.
The dataloader will randomly pick samples from this directory.

-To enable noise injection, use the `--noise_dir /path/to/noise/dir/` to specify where your noise files are. There are a few noise parameters to tweak, such as
-`--noise_prob` to determine the probability that noise is added, and the `--noise_min`, `--noise_max` parameters to determine the minimum and maximum noise to add in training.
+To enable noise injection, use the `--noise-dir /path/to/noise/dir/` to specify where your noise files are. There are a few noise parameters to tweak, such as
+`--noise-prob` to determine the probability that noise is added, and the `--noise-min`, `--noise-max` parameters to determine the minimum and maximum noise to add in training.

Included is a script to inject noise into an audio file to hear what different noise levels/files would sound like. Useful for curating the noise dataset.

```
-python noise_inject.py --input_path /path/to/input.wav --noise_path /path/to/noise.wav --output_path /path/to/input_injected.wav --noise_level 0.5 # higher levels means more noise
+python noise_inject.py --input-path /path/to/input.wav --noise-path /path/to/noise.wav --output-path /path/to/input_injected.wav --noise-level 0.5 # a higher level means more noise
```

### Checkpoints
@@ -197,7 +197,7 @@ python train.py --checkpoint
To enable checkpoints every N batches through the epoch as well as epoch saving:

```
-python train.py --checkpoint --checkpoint_per_batch N # N is the number of batches to wait till saving a checkpoint at this batch.
+python train.py --checkpoint --checkpoint-per-batch N # N is the number of batches to wait before saving a checkpoint.
```

Note that for the batch checkpointing system to work, you cannot change the batch size when loading a checkpointed model from its original training
@@ -206,21 +206,21 @@ run.
To continue from a checkpointed model that has been saved:

```
-python train.py --continue_from models/deepspeech_checkpoint_epoch_N_iter_N.pth.tar
+python train.py --continue-from models/deepspeech_checkpoint_epoch_N_iter_N.pth.tar
```

This continues from the same training state and, if enabled, recreates the Visdom graph to continue from.

If you would like to start from a previous checkpoint model but not continue training, add the `--finetune` flag to restart training
-from the `--continue_from` weights.
+from the `--continue-from` weights.

### Choosing batch sizes

Included is a script that benchmarks whether training can run on your hardware, and the limits on the model and
batch sizes you can use. To use:

```
-python benchmark.py --batch_size 32
+python benchmark.py --batch-size 32
```

Use the flag `--help` to see other parameters that can be used with the script.
@@ -230,7 +230,7 @@ Use the flag `--help` to see other parameters that can be used with the script.
Saved models contain the metadata of their training process. To see the metadata, run the command below:

```
-python model.py --model_path models/deepspeech.pth.tar
+python model.py --model-path models/deepspeech.pth.tar
```

Note that the model has no final softmax layer: warp-ctc applies the softmax internally during training. Anything built on top of the model, such as complex decoders, will have to apply the softmax itself, so take this into consideration!
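As a minimal sketch of that caveat (the tensor shape and label count here are illustrative, not the model's actual contract), the missing softmax can be applied to the raw model scores before decoding:

```
import torch
import torch.nn.functional as F

# stand-in for raw acoustic model output: (batch, seq_len, num_classes) scores
logits = torch.randn(1, 50, 29)
# apply the softmax that warp-ctc otherwise handles internally during training
probs = F.softmax(logits, dim=-1)  # feed these probabilities to the decoder
```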
@@ -240,13 +240,13 @@
To evaluate a trained model on a test set (which must be in the same format as the training set):

```
-python test.py --model_path models/deepspeech.pth.tar --test_manifest /path/to/test_manifest.csv --cuda
+python test.py --model-path models/deepspeech.pth.tar --test-manifest /path/to/test_manifest.csv --cuda
```

An example script to output a transcription has been provided:

```
-python transcribe.py --model_path models/deepspeech.pth.tar --audio_path /path/to/audio.wav
+python transcribe.py --model-path models/deepspeech.pth.tar --audio-path /path/to/audio.wav
```

### Alternate Decoders
16 changes: 8 additions & 8 deletions benchmark.py
@@ -8,18 +8,18 @@
from model import DeepSpeech, supported_rnns

parser = argparse.ArgumentParser()
-parser.add_argument('--batch_size', type=int, default=32, help='Size of input')
+parser.add_argument('--batch-size', type=int, default=32, help='Size of input')
parser.add_argument('--seconds', type=int, default=15,
                    help='The size of the fake input in seconds using default stride of 0.01, '
                         '15s is usually the maximum duration')
-parser.add_argument('--dry_runs', type=int, default=20, help='Dry runs before measuring performance')
+parser.add_argument('--dry-runs', type=int, default=20, help='Dry runs before measuring performance')
parser.add_argument('--runs', type=int, default=20, help='How many benchmark runs to measure performance')
-parser.add_argument('--labels_path', default='labels.json', help='Path to the labels to infer over in the model')
-parser.add_argument('--hidden_size', default=400, type=int, help='Hidden size of RNNs')
-parser.add_argument('--hidden_layers', default=4, type=int, help='Number of RNN layers')
-parser.add_argument('--rnn_type', default='lstm', help='Type of the RNN. rnn|gru|lstm are supported')
-parser.add_argument('--sample_rate', default=16000, type=int, help='Sample rate')
-parser.add_argument('--window_size', default=.02, type=float, help='Window size for spectrogram in seconds')
+parser.add_argument('--labels-path', default='labels.json', help='Path to the labels to infer over in the model')
+parser.add_argument('--hidden-size', default=400, type=int, help='Hidden size of RNNs')
+parser.add_argument('--hidden-layers', default=4, type=int, help='Number of RNN layers')
+parser.add_argument('--rnn-type', default='lstm', help='Type of the RNN. rnn|gru|lstm are supported')
+parser.add_argument('--sample-rate', default=16000, type=int, help='Sample rate')
+parser.add_argument('--window-size', default=.02, type=float, help='Window size for spectrogram in seconds')
args = parser.parse_args()

input = torch.randn(args.batch_size, 1, 161, args.seconds * 100).cuda()
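One point worth calling out with this rename: argparse converts dashes in long option names to underscores when building attribute names, so `args.batch_size` in the snippet above still resolves even though the flag is now `--batch-size`. A minimal standalone illustration:

```
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--batch-size', type=int, default=32)
args = parser.parse_args(['--batch-size', '64'])
# argparse maps '--batch-size' to the attribute name 'batch_size'
print(args.batch_size)  # prints 64
```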
6 changes: 3 additions & 3 deletions data/an4.py
@@ -8,10 +8,10 @@
from utils import create_manifest

parser = argparse.ArgumentParser(description='Processes and downloads an4.')
-parser.add_argument('--target_dir', default='an4_dataset/', help='Path to save dataset')
-parser.add_argument('--min_duration', default=1, type=int,
+parser.add_argument('--target-dir', default='an4_dataset/', help='Path to save dataset')
+parser.add_argument('--min-duration', default=1, type=int,
                    help='Prunes training samples shorter than the min duration (given in seconds, default 1)')
-parser.add_argument('--max_duration', default=15, type=int,
+parser.add_argument('--max-duration', default=15, type=int,
                    help='Prunes training samples longer than the max duration (given in seconds, default 15)')
args = parser.parse_args()

14 changes: 7 additions & 7 deletions data/common_voice.py
@@ -8,14 +8,14 @@
from utils import create_manifest

parser = argparse.ArgumentParser(description='Downloads and processes Mozilla Common Voice dataset.')
parser.add_argument("--target_dir", default='CommonVoice_dataset/', type=str, help="Directory to store the dataset.")
parser.add_argument("--tar_path", type=str, help="Path to the Common Voice *.tar file if downloaded (Optional).")
parser.add_argument('--sample_rate', default=16000, type=int, help='Sample rate')
parser.add_argument('--min_duration', default=1, type=int,
parser.add_argument("--target-dir", default='CommonVoice_dataset/', type=str, help="Directory to store the dataset.")
parser.add_argument("--tar-path", type=str, help="Path to the Common Voice *.tar file if downloaded (Optional).")
parser.add_argument('--sample-rate', default=16000, type=int, help='Sample rate')
parser.add_argument('--min-duration', default=1, type=int,
help='Prunes training samples shorter than the min duration (given in seconds, default 1)')
parser.add_argument('--max_duration', default=15, type=int,
parser.add_argument('--max-duration', default=15, type=int,
help='Prunes training samples longer than the max duration (given in seconds, default 15)')
parser.add_argument('--files_to_process', default="cv-valid-dev.csv,cv-valid-test.csv,cv-valid-train.csv",
parser.add_argument('--files-to-process', default="cv-valid-dev.csv,cv-valid-test.csv,cv-valid-train.csv",
type=str, help='list of *.csv file names to process')
args = parser.parse_args()
COMMON_VOICE_URL = "https://common-voice-data-download.s3.amazonaws.com/cv_corpus_v1.tar.gz"
@@ -85,4 +85,4 @@ def main():
args.max_duration)

if __name__ == "__main__":
-    main()
\ No newline at end of file
+    main()
10 changes: 5 additions & 5 deletions data/librispeech.py
@@ -7,16 +7,16 @@
import shutil

parser = argparse.ArgumentParser(description='Processes and downloads LibriSpeech dataset.')
parser.add_argument("--target_dir", default='LibriSpeech_dataset/', type=str, help="Directory to store the dataset.")
parser.add_argument('--sample_rate', default=16000, type=int, help='Sample rate')
parser.add_argument('--files_to_use', default="train-clean-100.tar.gz,"
parser.add_argument("--target-dir", default='LibriSpeech_dataset/', type=str, help="Directory to store the dataset.")
parser.add_argument('--sample-rate', default=16000, type=int, help='Sample rate')
parser.add_argument('--files-to-use', default="train-clean-100.tar.gz,"
"train-clean-360.tar.gz,train-other-500.tar.gz,"
"dev-clean.tar.gz,dev-other.tar.gz,"
"test-clean.tar.gz,test-other.tar.gz", type=str,
help='list of file names to download')
parser.add_argument('--min_duration', default=1, type=int,
parser.add_argument('--min-duration', default=1, type=int,
help='Prunes training samples shorter than the min duration (given in seconds, default 1)')
parser.add_argument('--max_duration', default=15, type=int,
parser.add_argument('--max-duration', default=15, type=int,
help='Prunes training samples longer than the max duration (given in seconds, default 15)')
args = parser.parse_args()

8 changes: 4 additions & 4 deletions data/merge_manifests.py
@@ -8,12 +8,12 @@
from utils import order_and_prune_files

parser = argparse.ArgumentParser(description='Merges all manifest CSV files in specified folder.')
-parser.add_argument('--merge_dir', default='manifests/', help='Path to all manifest files you want to merge')
-parser.add_argument('--min_duration', default=1, type=int,
+parser.add_argument('--merge-dir', default='manifests/', help='Path to all manifest files you want to merge')
+parser.add_argument('--min-duration', default=1, type=int,
                    help='Prunes any samples shorter than the min duration (given in seconds, default 1)')
-parser.add_argument('--max_duration', default=15, type=int,
+parser.add_argument('--max-duration', default=15, type=int,
                    help='Prunes any samples longer than the max duration (given in seconds, default 15)')
-parser.add_argument('--output_path', default='merged_manifest.csv', help='Output path to merged manifest')
+parser.add_argument('--output-path', default='merged_manifest.csv', help='Output path to merged manifest')

args = parser.parse_args()

10 changes: 5 additions & 5 deletions data/ted.py
@@ -9,12 +9,12 @@
from tqdm import tqdm

parser = argparse.ArgumentParser(description='Processes and downloads TED-LIUMv2 dataset.')
parser.add_argument("--target_dir", default='TEDLIUM_dataset/', type=str, help="Directory to store the dataset.")
parser.add_argument("--tar_path", type=str, help="Path to the TEDLIUM_release tar if downloaded (Optional).")
parser.add_argument('--sample_rate', default=16000, type=int, help='Sample rate')
parser.add_argument('--min_duration', default=1, type=int,
parser.add_argument("--target-dir", default='TEDLIUM_dataset/', type=str, help="Directory to store the dataset.")
parser.add_argument("--tar-path", type=str, help="Path to the TEDLIUM_release tar if downloaded (Optional).")
parser.add_argument('--sample-rate', default=16000, type=int, help='Sample rate')
parser.add_argument('--min-duration', default=1, type=int,
help='Prunes training samples shorter than the min duration (given in seconds, default 1)')
parser.add_argument('--max_duration', default=15, type=int,
parser.add_argument('--max-duration', default=15, type=int,
help='Prunes training samples longer than the max duration (given in seconds, default 15)')
args = parser.parse_args()

8 changes: 4 additions & 4 deletions data/voxforge.py
@@ -14,12 +14,12 @@
VOXFORGE_URL_16kHz = 'http:https://www.repository.voxforge1.org/downloads/SpeechCorpus/Trunk/Audio/Main/16kHz_16bit/'

parser = argparse.ArgumentParser(description='Processes and downloads VoxForge dataset.')
parser.add_argument("--target_dir", default='voxforge_dataset/', type=str, help="Directory to store the dataset.")
parser.add_argument('--sample_rate', default=16000,
parser.add_argument("--target-dir", default='voxforge_dataset/', type=str, help="Directory to store the dataset.")
parser.add_argument('--sample-rate', default=16000,
type=int, help='Sample rate')
parser.add_argument('--min_duration', default=1, type=int,
parser.add_argument('--min-duration', default=1, type=int,
help='Prunes training samples shorter than the min duration (given in seconds, default 1)')
parser.add_argument('--max_duration', default=15, type=int,
parser.add_argument('--max-duration', default=15, type=int,
help='Prunes training samples longer than the max duration (given in seconds, default 15)')
args = parser.parse_args()

2 changes: 1 addition & 1 deletion model.py
@@ -286,7 +286,7 @@ def get_meta(model):
import argparse

parser = argparse.ArgumentParser(description='DeepSpeech model information')
-parser.add_argument('--model_path', default='models/deepspeech_final.pth.tar',
+parser.add_argument('--model-path', default='models/deepspeech_final.pth.tar',
                    help='Path to model file created by training')
args = parser.parse_args()
package = torch.load(args.model_path, map_location=lambda storage, loc: storage)
10 changes: 5 additions & 5 deletions noise_inject.py
@@ -6,11 +6,11 @@
from data.data_loader import load_audio, NoiseInjection

parser = argparse.ArgumentParser()
-parser.add_argument('--input_path', default='input.wav', help='The input audio to inject noise into')
-parser.add_argument('--noise_path', default='noise.wav', help='The noise file to mix in')
-parser.add_argument('--output_path', default='output.wav', help='The noise file to mix in')
-parser.add_argument('--sample_rate', default=16000, help='Sample rate to save output as')
-parser.add_argument('--noise_level', type=float, default=1.0,
+parser.add_argument('--input-path', default='input.wav', help='The input audio to inject noise into')
+parser.add_argument('--noise-path', default='noise.wav', help='The noise file to mix in')
+parser.add_argument('--output-path', default='output.wav', help='Path to save the noise-injected output audio')
+parser.add_argument('--sample-rate', default=16000, help='Sample rate to save output as')
+parser.add_argument('--noise-level', type=float, default=1.0,
                    help='The Signal to Noise ratio (higher means more noise)')
args = parser.parse_args()

