Open-source speech-to-text software written in TensorFlow.
Python 3, portaudio19-dev and ffmpeg are required.
On Ubuntu, install them via
sudo apt install python3-pip portaudio19-dev ffmpeg
pip3 install git+https://github.com/timediv/speechT
Currently speechT is based on the Wav2Letter paper and the CTC loss function.
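As a rough illustration of how CTC output is turned into text, here is a minimal sketch of the standard CTC collapsing rule: merge consecutive repeated labels, then drop the blank symbol. It is not taken from the speechT code base; the function name `ctc_collapse` and the use of `_` as the blank symbol are assumptions made for this example.

```python
from itertools import groupby

BLANK = "_"  # assumed blank symbol, chosen only for this illustration


def ctc_collapse(frame_labels, blank=BLANK):
    """Collapse a per-frame label sequence into a transcript.

    Step 1: merge runs of identical consecutive labels.
    Step 2: remove the blank symbol.
    """
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != blank)


# A blank between the two "l" runs keeps the double letter.
print(ctc_collapse("hh_e_ll_lloo__"))
```

The blank symbol is what lets the model emit genuinely doubled letters: without a blank between the two runs of `l`, they would be merged into one.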
The speech corpus from https://www.openslr.org/12/ is automatically downloaded.
Note: The corpus is about 30GB!
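Given the size of the download, it may be worth checking free disk space first. The following is a small sketch using only the Python standard library; the helper name `has_free_space` is hypothetical, and the 30 GB threshold is the approximate figure from the note above. Preprocessing may need additional space that this check does not account for.

```python
import shutil

CORPUS_SIZE_GB = 30  # approximate corpus size, taken from the note above


def has_free_space(path=".", required_gb=CORPUS_SIZE_GB):
    """Return True if the filesystem containing `path` has at least `required_gb` GiB free."""
    free_gb = shutil.disk_usage(path).free / 1024 ** 3
    return free_gb >= required_gb


if has_free_space("."):
    print("Enough free space for the corpus download.")
else:
    print("Less than about 30 GB free; consider freeing up space first.")
```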
The data must be preprocessed before training:
speecht-cli preprocess
Then, to run the training, execute
speecht-cli train
Use --help for more details.
You can monitor the training and view other logs in TensorBoard:
tensorboard --logdir log/
To evaluate on the test set, run
speecht-cli evaluate
Use --help for more details.
To record from your microphone and print the transcription, run
speecht-cli record
Use --help for more details.
If you'd like to use KenLM as a language model for decoding, you need to compile and install tensorflow-with-kenlm.
Then run:
speecht-cli evaluate --language-model YOUR_KENLM_FILES_DIRECTORY/