Stars
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
1 minute of voice data can also be used to train a good TTS model! (few-shot voice cloning)
Self-supervised learning for fast pitch estimation
This is the code for SpeechTokenizer, presented in the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on
SLMTokBench for the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"
Unified Speech Language Model for the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024)
*CREPE+HYBRID TRAINING* A very experimental fork of the Retrieval-based-Voice-Conversion-WebUI repo that incorporates a variety of other f0 methods, along with a hybrid f0 nanmedian method.
Super-Resolution Neural Operator, in CVPR 2023
Official code of OSRT: Omnidirectional Image Super-Resolution with Distortion-aware Transformer
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Speed up Stable Diffusion with this one simple trick!
Easily train a good VC (voice conversion) model with 10 minutes or less of voice data!