TengXun - Beijing

Stars
Official code of DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior
[Three Years of Interviews, Five Years of Mock Exams] A handbook for algorithm engineers, covering AI-industry interview and written-exam experience and practical knowledge across AIGC, traditional deep learning, autonomous driving, machine learning, computer vision, natural language processing, SLAM, embodied intelligence, the metaverse, AGI, and more.
Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
GeneFace: Generalized and High-Fidelity 3D Talking Face Synthesis; ICLR 2023; Official code
[CVPR 2024] This is the official source for our paper "SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis"
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis; ICLR 2024 Spotlight; Official code
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Strong and Open Vision Language Assistant for Mobile Devices
Code for Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation (CVPR 2021)
[CVPR2023] The implementation for "DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation"
Official repository for Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation
ModelScope: bring the notion of Model-as-a-Service to life.
FaceChain is a deep-learning toolchain for generating your Digital-Twin.
MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
HeadGAN - Official PyTorch Implementation (ICCV 2021)
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For an HD commercial model, please try out Sync Labs.
MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation [ECCV2020]
The source code of "DINet: deformation inpainting network for realistic face visually dubbing on high resolution video."
[CVPR 2023] Talking face implementation for "Identity-Preserving Talking Face Generation With Landmark and Appearance Priors"
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation