A collection of audio-security-related resources
-
Voice Cloning
- Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks
- Python toolbox with a GUI that claims to clone a voice from 5 seconds of sample audio
- Uses PyTorch and requires a GPU
- Video presentation of toolbox features
- Python, Theano
- Python, TensorFlow
- Python
- Python, TensorFlow, librosa
- Python
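The first entry above relies on cycle-consistent adversarial networks (CycleGAN), which is what lets voice conversion train without parallel (time-aligned) utterances from the source and target speakers. As a sketch of the general CycleGAN idea, and not necessarily the exact objective used in that paper: with a generator $G_{X \to Y}$ mapping source-speaker features to target-speaker features and $G_{Y \to X}$ mapping back, the cycle-consistency loss penalises any round trip that fails to return the original input.

```latex
\mathcal{L}_{cyc}(G_{X \to Y}, G_{Y \to X}) =
    \mathbb{E}_{x \sim P_X}\big[\lVert G_{Y \to X}(G_{X \to Y}(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim P_Y}\big[\lVert G_{X \to Y}(G_{Y \to X}(y)) - y \rVert_1\big]
```

This term is added to an adversarial loss for each direction, so the discriminators push converted speech to sound like the target speaker while the cycle term keeps the linguistic content intact.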
Here is a collection of audio datasets for training new models:
- Contains urban sound tags such as birds, car horns, etc.
- The Ryerson Audio-Visual Database of Emotional Speech and Song
- Contains emotion tags
- 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books
- Sourced from the LibriVox project
- Data comes from professional audiobooks produced by Usborne Publishing
- The CSTR VCTK Corpus includes speech data from 109 native speakers of English with various accents. Each speaker reads about 400 sentences, most of them selected from a newspaper, plus the Rainbow Passage and an elicitation paragraph intended to identify the speaker's accent.
- All speech data was recorded with an identical setup: an omnidirectional head-mounted microphone (DPA 4035), a 96 kHz sampling rate at 24 bits, in a hemi-anechoic chamber at the University of Edinburgh.
- A TensorFlow implementation of Baidu's DeepSpeech architecture
- Research project from Carnegie Mellon University
- Golang library
- Python library
- Based on the Google paper "Tacotron: Towards End-to-End Speech Synthesis" (April 2017), which presents a neural text-to-speech model that learns to synthesize speech directly from (text, audio) pairs.
- An implementation of Tacotron speech synthesis in TensorFlow
- WaveNet is a deep neural network for generating raw audio. It was created by researchers at London-based artificial intelligence firm DeepMind. The technique, outlined in a paper in September 2016,[1] is able to generate more realistic-sounding human-like voices by sampling real human speech and directly modelling waveforms.
- Keras implementation of WaveNet
- TensorFlow implementation of WaveNet
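On the audio side of each (text, audio) pair, the Tacotron paper's decoder predicts 80-band mel-scale spectrogram frames rather than raw samples. A minimal sketch of where those 80 bands sit, assuming the HTK-style mel formula (libraries differ; librosa, for example, defaults to the Slaney variant) and a hypothetical 0 to 8000 Hz range:

```python
import math


def hz_to_mel(f_hz: float) -> float:
    # HTK-style mel scale; other variants (e.g. Slaney) give different values
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    # Exact inverse of hz_to_mel
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels: int = 80, fmin: float = 0.0, fmax: float = 8000.0) -> list[float]:
    """Return n_mels + 2 frequencies in Hz, evenly spaced on the mel scale.

    Each consecutive triple (low, centre, high) defines one triangular
    filter of a mel filterbank. fmin/fmax defaults are assumptions.
    """
    lo, hi = hz_to_mel(fmin), hz_to_mel(fmax)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]


if __name__ == "__main__":
    edges = mel_band_edges()
    print(f"first band centre: {edges[1]:.1f} Hz, last band centre: {edges[-2]:.1f} Hz")
```

Because the points are evenly spaced in mel rather than in Hz, low-frequency bands are narrow and high-frequency bands are wide, which roughly matches how pitch differences are perceived.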
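To model waveforms directly, the WaveNet paper first reduces each 16-bit sample to one of 256 values with μ-law companding, so the network can predict the next sample with a 256-way softmax. A minimal pure-Python sketch of that transform, assuming samples are floats already scaled to [-1, 1] (a vectorised NumPy version would be the practical choice for whole files):

```python
import math

MU = 255  # 256 quantization levels, as described in the WaveNet paper


def mu_law_encode(x: float, mu: int = MU) -> int:
    """Map a sample in [-1, 1] to an integer class in [0, mu]."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range samples
    # Companding: f(x) = sign(x) * ln(1 + mu*|x|) / ln(1 + mu)
    f = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Rescale [-1, 1] to [0, mu] and round to the nearest integer class
    q = int(math.floor((f + 1.0) / 2.0 * mu + 0.5))
    return max(0, min(mu, q))


def mu_law_decode(q: int, mu: int = MU) -> float:
    """Invert mu_law_encode, returning an approximate sample in [-1, 1]."""
    y = 2.0 * q / mu - 1.0
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)


if __name__ == "__main__":
    for sample in (0.0, 0.01, 0.5, -0.9):
        q = mu_law_encode(sample)
        print(sample, q, round(mu_law_decode(q), 4))
```

The logarithmic curve spends more of the 256 classes on quiet samples, where human hearing is most sensitive, so the round-trip error is small near zero and larger near full scale.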
Sometimes open-source projects aren't up to par, or a service with more compute resources behind it makes more sense for a project.
- Free voice cloning service
- Free usage tier
- Otherwise $0.0125/min
- Requires text to be chunked, so you'll need to split it along pauses
- Free usage tier
- Roughly $0.10/min, so not cheap
- Automatically transcribes speech to text for free!
- Various audio security services. Use with caution.
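For the services above that require text to be chunked, here is a minimal sketch of splitting along pauses. It assumes that the punctuation marks `. ! ? ; : ,` indicate a spoken pause and uses a hypothetical 300-character limit; neither comes from any particular service's documentation, so check the real request limit before relying on it.

```python
import re

# Whitespace that follows pause punctuation; this pause set is an assumption.
_PAUSE_SPLIT = re.compile(r"(?<=[.!?;:,])\s+")


def chunk_text(text: str, max_chars: int = 300) -> list[str]:
    """Split text into chunks of at most max_chars, breaking at pauses where possible.

    max_chars is a hypothetical limit; replace it with the service's actual one.
    """
    pieces = [p for p in _PAUSE_SPLIT.split(text.strip()) if p]
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        # Fallback: a single pause-delimited piece that is still too long
        # gets split at the last space before the limit (or hard-cut).
        while len(piece) > max_chars:
            cut = piece.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars
            head, piece = piece[:cut].rstrip(), piece[cut:].lstrip()
            if current:
                chunks.append(current)
                current = ""
            chunks.append(head)
        if not piece:
            continue
        # Greedily pack consecutive pieces into the current chunk.
        candidate = f"{current} {piece}" if current else piece
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "Hello there. This is a test, with a pause; and another sentence!"
    for chunk in chunk_text(sample, max_chars=20):
        print(repr(chunk))
```

Packing pieces greedily keeps the number of requests (and any per-request overhead) low, while only breaking mid-clause when a single clause exceeds the limit.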