- Italy, Bologna
- @loretoparisi
Audio Generation
Official PyTorch implementation of the paper: "Catch-A-Waveform: Learning to Generate Audio from a Single Short Example" (NeurIPS 2021)
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Speech recognition tool to convert audio to text transcripts, for Linux and Raspberry Pi.
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
A repository for demos illustrating features of the Web Speech API. See https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API for more details.
Official implementation of the RAVE model: a Realtime Audio Variational autoEncoder
A lightweight yet powerful audio-to-MIDI converter with pitch bend detection
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.
🔊 Text-Prompted Generative Audio Model
Port of OpenAI's Whisper model in C/C++
Muzic: Music Understanding and Generation with Artificial Intelligence
The code for the bark-voicecloning model. Training and inference.
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Unofficial implementation of NVIDIA P-Flow TTS paper
An Open Source text-to-speech system built by inverting Whisper.
On-device Speech Recognition for Apple Silicon
Zero-Shot Speech Editing and Text-to-Speech in the Wild
React Native Expo wrapper for the Swift WhisperKit library
Localized watermarking for AI-generated speech audio, with state-of-the-art robustness and a very fast detector