LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,561 172 Updated Nov 14, 2024

collabora / WhisperLive

A nearly-live implementation of OpenAI's Whisper.

Python 2,050 281 Updated Nov 5, 2024

AI4Bharat / Indic-TTS

Text-to-Speech for languages of India

Jupyter Notebook 151 36 Updated Nov 8, 2024

FunAudioLLM / CosyVoice

Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

Python 6,305 673 Updated Nov 15, 2024

FunAudioLLM / SenseVoice

Multilingual Voice Understanding Model

Python 3,434 311 Updated Oct 18, 2024

yl4579 / StyleTTS2

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Python 4,949 417 Updated Aug 10, 2024

huggingface / parler-tts

Inference and training library for high-quality TTS models.

Python 4,637 471 Updated Oct 30, 2024

hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

Python 34,393 4,241 Updated Nov 16, 2024

ylacombe / scripts_and_notebooks

A list of scripts/notebooks I'd like to keep handy

Jupyter Notebook 13 2 Updated Aug 15, 2024

fishaudio / Bert-VITS2

vits2 backbone with multilingual-bert

Python 8,002 1,133 Updated Nov 15, 2024

m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Python 12,482 1,311 Updated Aug 21, 2024

sanchit-gandhi / whisper-jax

JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.

Jupyter Notebook 4,437 386 Updated Apr 3, 2024

ufal / whisper_streaming

Whisper realtime streaming for long speech-to-text transcription and translation

Python 2,079 252 Updated Nov 15, 2024

KwaiVGI / LivePortrait

Bring portraits to life!

Python 12,963 1,378 Updated Nov 12, 2024

Inferencer / LipSick

🤢 LipSick: Fast, High Quality, Low Resource Lipsync Tool 🤮

Python 182 28 Updated Jul 16, 2024

indianajson / wav2lip-HD

Forked from ajay-sainy/Wav2Lip-GFPGAN

Alternative to Flawless AI's TrueSync. Make lips in video match provided audio using the power of Wav2Lip and GFPGAN.

Python 112 22 Updated Jul 14, 2024

Aman Rai 9throok
