- Zhejiang University, Harbin Institute of Technology
- Shanghai
Stars
MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
[CVPR'24] DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation
[NeurIPS 2024] Official implementation of MambaAD: Exploring State Space Models for Multi-class Unsupervised Anomaly Detection.
[ICRA23] Efficient Implicit Neural Reconstruction Using LiDAR
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
[NeurIPS 2023] FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing
Large Motion Model for Unified Multi-Modal Motion Generation
Towards Variable and Coordinated Holistic Co-Speech Motion Generation, CVPR 2024
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
A deep neural network that directly reconstructs the motion of a 3D human skeleton from monocular video [ToG 2020]
[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
[CVPR 2022] Code for "Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation"
MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model
Freetalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness (ICASSP 2024)
Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set (CVPRW 2019)
FaceChain is a deep-learning toolchain for generating your Digital-Twin.
We present MocapNET, a real-time method that estimates the 3D human pose directly in the popular Bio Vision Hierarchy (BVH) format, given estimations of the 2D body joints originating from monocula…
[CVPR 2024] Official Implementation of "Seamless Human Motion Composition with Blended Positional Encodings".
This repository contains an example script to convert from a SMPL model to a bvh file.
PyTorch implementation of our paper MaxQ: Multi-Axis Query for N:M Sparsity Network, accepted by CVPR 2024.
Denoising Diffusion Probabilistic Models
[NeurIPS 2023] MotionGPT: Human Motion as a Foreign Language, a unified motion-language generation model using LLMs
Official implementation of "TM2T: Stochastic and Tokenized Modeling for the Reciprocal Generation of 3D Human Motions and Texts (ECCV2022)"
Resource, Evaluation and Detection Papers for ChatGPT
Speech Gesture Generation from the Trimodal Context of Text, Audio, and Speaker Identity (SIGGRAPH Asia 2020)
Erase specific content from the video that you don't wanna see