Inference code for the paper "Spirit-LM Interleaved Spoken and Written Language Model".

Python 776 50 Updated Oct 28, 2024

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Jupyter Notebook 2,303 158 Updated Jun 25, 2024

Multi-modal conversational AI (xRx) system

Python 245 40 Updated Nov 9, 2024

GeneFace++: Generalized and Stable Real-Time 3D Talking Face Generation; Official Code

Python 1,564 225 Updated Oct 18, 2024

MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting

Python 2,833 354 Updated Nov 15, 2024
Python 6,744 525 Updated Oct 31, 2024

AI powered speech denoising and enhancement

Python 1,427 142 Updated Nov 5, 2024

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,561 172 Updated Nov 14, 2024

A nearly-live implementation of OpenAI's Whisper.

Python 2,050 281 Updated Nov 5, 2024

Text-to-Speech for languages of India

Jupyter Notebook 151 36 Updated Nov 8, 2024

Multilingual large voice generation model with full-stack support for inference, training, and deployment.

Python 6,305 673 Updated Nov 15, 2024

Multilingual Voice Understanding Model

Python 3,434 311 Updated Oct 18, 2024

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Python 4,949 417 Updated Aug 10, 2024

Inference and training library for high-quality TTS models.

Python 4,637 471 Updated Oct 30, 2024

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

Python 34,393 4,241 Updated Nov 16, 2024

A list of scripts/notebooks I'd like to keep handy

Jupyter Notebook 13 2 Updated Aug 15, 2024

vits2 backbone with multilingual-bert

Python 8,002 1,133 Updated Nov 15, 2024

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Python 12,482 1,311 Updated Aug 21, 2024

JAX implementation of OpenAI's Whisper model for up to 70x speed-up on TPU.

Jupyter Notebook 4,437 386 Updated Apr 3, 2024

Whisper realtime streaming for long speech-to-text transcription and translation

Python 2,079 252 Updated Nov 15, 2024

Bring portraits to life!

Python 12,963 1,378 Updated Nov 12, 2024

🤢 LipSick: Fast, High Quality, Low Resource Lipsync Tool 🤮

Python 182 28 Updated Jul 16, 2024

Alternative to Flawless AI's TrueSync. Make lips in video match provided audio using the power of Wav2Lip and GFPGAN.

Python 112 22 Updated Jul 14, 2024

Silero VAD: pre-trained enterprise-grade Voice Activity Detector

Python 4,363 425 Updated Nov 13, 2024

Talking Head (3D): A JavaScript class for real-time lip-sync using Ready Player Me full-body 3D avatars.

JavaScript 344 105 Updated Nov 10, 2024

Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.

TypeScript 66,984 3,674 Updated Nov 16, 2024

ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing

Python 65 6 Updated May 20, 2024

A fast, local neural text to speech system

C++ 6,561 477 Updated Oct 21, 2024

AlwaysReddy is an LLM voice assistant that is always just a hotkey away.

Python 663 67 Updated Nov 14, 2024

Solve sudokus from video in real time with computer vision and neural networks

Python 117 19 Updated Jul 13, 2019