Stars
Inference code for the paper "Spirit-LM: Interleaved Spoken and Written Language Model".
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
GeneFace++: Generalized and Stable Real-Time 3D Talking Face Generation; Official Code
MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
AI-powered speech denoising and enhancement
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
A nearly-live implementation of OpenAI's Whisper.
Text-to-Speech for languages of India
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
Multilingual Voice Understanding Model
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
Inference and training library for high-quality TTS models.
Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
A list of scripts/notebooks I'd like to keep handy
VITS2 backbone with multilingual BERT
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.
Whisper real-time streaming for long-form speech-to-text transcription and translation
🤢 LipSick: Fast, High Quality, Low Resource Lipsync Tool 🤮
indianajson / wav2lip-HD
Forked from ajay-sainy/Wav2Lip-GFPGAN. Alternative to Flawless AI's TrueSync: makes the lips in a video match the provided audio using Wav2Lip and GFPGAN.
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Talking Head (3D): A JavaScript class for real-time lip-sync using Ready Player Me full-body 3D avatars.
Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.
ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing
AlwaysReddy is an LLM voice assistant that is always just a hotkey away.
Solve sudokus from video in real time with computer vision and neural networks