Stars
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output conversational capabilities.
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
Instant voice cloning by MIT and MyShell.
Inference and training library for high-quality TTS models.
AI powered speech denoising and enhancement. Adapted for Windows and optimized.
Noise suppression using deep filtering
Multiple NVIDIA GPUs or Apple Silicon for Large Language Model Inference?
Upsampling Artifacts in Neural Audio Synthesis: https://arxiv.org/abs/2010.14356
End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions
Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. Advanced audio processing.
The official implementation of HierSpeech++
Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
AI powered speech denoising and enhancement
Faster Tortoise inference than Tortoise Fast Fork
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
A rough and ready Python utility which splits audio files based on silence and desired min/max chunk duration.
Versatile audio super resolution (any -> 48kHz) with AudioSR.
Public voice datasets used for our Text-to-Speech voices.
152334H / DL-Art-School
Forked from neonbjb/DL-Art-School
TorToiSe fine-tuning with DLAS
Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
A multi-voice TTS system trained with an emphasis on quality
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
2D and 3D face alignment library built using PyTorch
The source code of "DINet: deformation inpainting network for realistic face visually dubbing on high resolution video."