TengXun - Beijing

Stars
Official code of DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior
[Three Years of Interviews, Five Years of Mock Exams] A handbook for algorithm engineers, covering AI-industry interview and written-exam experience and practical knowledge across AIGC, traditional deep learning, autonomous driving, machine learning, computer vision, natural language processing, SLAM, embodied intelligence, the metaverse, AGI, and more.
Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
GeneFace: Generalized and High-Fidelity 3D Talking Face Synthesis; ICLR 2023; Official code
[CVPR 2024] This is the official source for our paper "SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis"
Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis; ICLR 2024 Spotlight; Official code
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Strong and Open Vision Language Assistant for Mobile Devices
Code for Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation (CVPR 2021)
[CVPR2023] The implementation for "DiffTalk: Crafting Diffusion Models for Generalized Audio-Driven Portraits Animation"
Official repository for Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation
ModelScope: bring the notion of Model-as-a-Service to life.
FaceChain is a deep-learning toolchain for generating your Digital-Twin.
MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
HeadGAN - Official PyTorch Implementation (ICCV 2021)
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For an HD commercial model, please try out Sync Labs.
MEAD: A Large-scale Audio-visual Dataset for Emotional Talking-face Generation [ECCV2020]
The source code of "DINet: deformation inpainting network for realistic face visually dubbing on high resolution video."
[CVPR 2023] Talking face implementation for "Identity-Preserving Talking Face Generation With Landmark and Appearance Priors"
EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation