kai5263499/audio-security-awesome

audio-security-awesome


A collection of audio-security-related resources


Voice Cloning Blog Posts


  • Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks

Voice Cloning Papers


Voice Cloning Implementations


  • Python, Theano
  • Python, TensorFlow
  • Python
  • Python, TensorFlow, librosa
  • Python

Datasets


Here is a collection of audio datasets for training new models

  • Contains urban sound tags such as birds and car horns
  • The Ryerson Audio-Visual Database of Emotional Speech and Song
  • Contains emotion tags
  • 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books
  • Sourced from the LibriVox project
  • Data comes from professional audiobooks produced by Usborne Publishing
  • This CSTR VCTK Corpus includes speech data uttered by 109 native speakers of English with various accents. Each speaker reads out about 400 sentences, most of which were selected from a newspaper plus the Rainbow Passage and an elicitation paragraph intended to identify the speaker's accent.
  • All speech data was recorded using an identical recording setup: an omni-directional head-mounted microphone (DPA 4035), 96kHz sampling frequency at 24 bits and in a hemi-anechoic chamber of the University of Edinburgh.

Speech To Text


  • A TensorFlow implementation of Baidu's DeepSpeech architecture

Text to Speech


  • Based on a paper published by Google in April 2017, Tacotron: Towards End-to-End Speech Synthesis, which presents a neural text-to-speech model that learns to synthesize speech directly from (text, audio) pairs.
  • An implementation of Tacotron speech synthesis in TensorFlow
  • WaveNet is a deep neural network for generating raw audio, created by researchers at the London-based artificial intelligence firm DeepMind. The technique, outlined in a paper in September 2016, generates more realistic-sounding, human-like voices by sampling real human speech and directly modelling waveforms.
  • Keras implementation of WaveNet
  • TensorFlow implementation of WaveNet

Commercial Services


Sometimes open source projects aren't up to par, or a service with greater compute resources behind it makes more sense for a project.

  • Free voice cloning service
  • Free usage tier
  • Otherwise $0.0125/min
  • Requires text to be chunked, so you'll need to split it along pauses
  • Free usage tier
  • Roughly $0.10/min, which is not cheap
  • Automatically transcribes speech to text for free!
  • Various audio security services. Use with caution.
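
For services that require chunked text, one possible approach is to split at punctuation, which roughly corresponds to spoken pauses. The sketch below is only an illustration: the function name `chunk_text` and the `max_chars` limit are hypothetical, and the real length limit depends on the service you use.

```python
import re


def chunk_text(text, max_chars=200):
    """Split text into chunks of at most max_chars, breaking at punctuation.

    max_chars is an assumed limit; check your service's documentation.
    """
    # Break after sentence or clause punctuation that is followed by whitespace.
    pieces = re.split(r"(?<=[.!?;:,])\s+", text.strip())
    chunks = []
    current = ""
    for piece in pieces:
        if not piece:
            continue
        candidate = f"{current} {piece}".strip()
        if len(candidate) <= max_chars:
            # Still under the limit, so keep accumulating.
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single piece longer than the limit falls back to word boundaries.
        while len(piece) > max_chars:
            cut = piece.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars  # no space found: hard cut
            chunks.append(piece[:cut].strip())
            piece = piece[cut:].strip()
        current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "Hello there. This is a test, with a pause; and more words here!"
chunks = chunk_text(sample, max_chars=25)
for chunk in chunks:
    print(chunk)
```

Each chunk can then be submitted as a separate request and the resulting audio concatenated in order.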
