PyTorch implementation of the Differential-Transformer architecture for sequence modeling, specifically tailored as a decoder-only model similar to large language models (LLMs). The architecture in…

Python 27 4 Updated Oct 27, 2024

anliyuan / Ultralight-Digital-Human

An ultra-lightweight digital human model that can run in real time on mobile devices

Python 796 134 Updated Nov 4, 2024

THUDM / GLM-4-Voice

GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

Python 2,106 169 Updated Oct 31, 2024

Winn1y / Awesome-Human-Motion-Video-Generation

Human Motion Video Generation: A Survey (https://www.techrxiv.org/users/836049/articles/1228135-human-motion-video-generation-a-survey)

87 4 Updated Nov 4, 2024

gpt-omni / mini-omni2

Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.

Python 1,482 177 Updated Nov 6, 2024

kyutai-labs / moshi

Python 6,651 506 Updated Oct 31, 2024

xinchen-ai / Westlake-Omni

Python 165 13 Updated Sep 24, 2024

pengzhendong / streaming-sensevoice

Pseudo Streaming SenseVoice with Hotwords

Python 73 11 Updated Nov 2, 2024

fudan-generative-vision / hallo2

Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation

Python 3,507 492 Updated Nov 6, 2024

SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 6,669 773 Updated Nov 5, 2024

ToTheBeginning / PuLID

[NeurIPS 2024] Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment

Python 2,547 178 Updated Nov 1, 2024

FireRedTeam / FireRedTTS

An Open-Sourced LLM-empowered Foundation TTS System

Python 415 29 Updated Oct 17, 2024

revdotcom / reverb

Open source inference code for Rev's model

Python 325 21 Updated Oct 28, 2024

yufan1012 / MonoGaussianAvatar

Python 108 3 Updated Sep 27, 2024

kyegomez / LiqudNet

Implementation of Liquid Nets in PyTorch

Python 51 7 Updated Nov 4, 2024

jdh-algo / JoyHallo

JoyHallo: Digital human model for Mandarin

Python 275 28 Updated Oct 8, 2024

usual2970 / certimate

An open-source SSL certificate management tool that helps you automatically apply for and deploy SSL certificates, and automatically renews them when they are about to expire.

TypeScript 4,627 411 Updated Nov 7, 2024

postech-ami / 3d-talking-head-av-guidance

Python 13 1 Updated Sep 25, 2024

postech-ami / MultiTalk

[INTERSPEECH'24] Official repository for "MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset"

Python 74 8 Updated Nov 5, 2024

damo-cv / RealisDance

The official implementation of RealisDance

C 221 13 Updated Nov 5, 2024

FuxiVirtualHuman / free_avatar

Jupyter Notebook 33 4 Updated Sep 11, 2024

ictnlp / LLaMA-Omni

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 2,524 168 Updated Sep 24, 2024

gpt-omni / mini-omni

Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python 3,059 272 Updated Nov 5, 2024

Tony-Tan / CUDA_Freshman

Cuda 2,185 436 Updated Jan 16, 2024

xingchensong / S3Tokenizer

Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice

Python 123 12 Updated Oct 12, 2024

Lovemefan lovemefan

Lists (22)

AI Other

ASR

avatar

dataset

Diffusion

✨ Inspiration

kws

Language Model

llm

Mindspore

Music

python code style

quantization

RUST

Singing Voice Synthesis

Speech Editing

SpeechEnhance

speechllm

super resolution

TTS

Tools

Microservices
