GitHub - AkashCS/dreamtalk: Official implementations for paper: DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models

DreamTalk: When Expressive Talking Head Generation
Meets Diffusion Probabilistic Models

DreamTalk is a diffusion-based audio-driven expressive talking head generation framework that can produce high-quality talking head videos across diverse speaking styles. DreamTalk exhibits robust performance with a diverse array of inputs, including songs, speech in multiple languages, noisy audio, and out-of-domain portraits.

News

[2023.12] Released inference code and pretrained checkpoint.

Installation

conda create -n dreamtalk python=3.7.0
conda activate dreamtalk
pip install -r requirements.txt
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
conda update ffmpeg

pip install urllib3==1.26.6
pip install transformers==4.28.1
pip install dlib
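
After installation, it can be useful to confirm that the pinned versions are the ones actually active in the environment. The following is a minimal sketch, not part of the repository: the helper name version_mismatches is hypothetical, and the installed mapping is assumed to be filled in by the caller (for example from each module's __version__ attribute).

```python
from typing import Dict, List

# Versions pinned by the installation commands above.
PINNED_VERSIONS = {
    "torch": "1.8.0",
    "torchvision": "0.9.0",
    "torchaudio": "0.8.0",
    "urllib3": "1.26.6",
    "transformers": "4.28.1",
}


def version_mismatches(
    installed: Dict[str, str],
    pinned: Dict[str, str] = PINNED_VERSIONS,
) -> List[str]:
    """Return one message per package whose version differs from its pin.

    Hypothetical helper for illustration; `installed` maps package names
    to version strings collected by the caller.
    """
    mismatches = []
    for name, expected in pinned.items():
        found = installed.get(name)
        # Drop a local build suffix such as "+cu111" before comparing.
        base = found.split("+")[0] if found else None
        if base != expected:
            mismatches.append(f"{name}: expected {expected}, found {found}")
    return mismatches
```

An empty list means every pinned package matches; a missing package is reported with "found None".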

Download Checkpoints

Given the potential social impact, we have discontinued public download access to the checkpoints. To obtain them, please send a request by email to [email protected]. Note that sending this email implies your consent to use the provided method solely for academic research purposes.

Put the downloaded checkpoints into the checkpoints folder.

Inference

Run the script:

python inference_for_demo_video.py \
--wav_path data/audio/acknowledgement_english.m4a \
--style_clip_path data/style_clip/3DMM/M030_front_neutral_level1_001.mat \
--pose_path data/pose/RichardShelby_front_neutral_level1_001.mat \
--image_path data/src_img/uncropped/male_face.png \
--cfg_scale 1.0 \
--max_gen_len 30 \
--output_name acknowledgement_english@M030_front_neutral_level1_001@male_face
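
For batch runs, the command above can be assembled programmatically. The sketch below is an illustration rather than part of the repository: build_inference_command is a hypothetical helper, and the output naming scheme (audio stem, style clip stem, and image stem joined by "@") is an assumption inferred from the single example above.

```python
import shlex
from pathlib import Path
from typing import List


def build_inference_command(
    wav_path: str,
    style_clip_path: str,
    pose_path: str,
    image_path: str,
    cfg_scale: float = 1.0,
    max_gen_len: int = 30,
) -> List[str]:
    """Assemble the argument list for inference_for_demo_video.py."""
    # Assumed naming convention, inferred from the example command only:
    # <audio stem>@<style clip stem>@<image stem>.
    output_name = "@".join(
        Path(p).stem for p in (wav_path, style_clip_path, image_path)
    )
    return [
        "python", "inference_for_demo_video.py",
        "--wav_path", wav_path,
        "--style_clip_path", style_clip_path,
        "--pose_path", pose_path,
        "--image_path", image_path,
        "--cfg_scale", str(cfg_scale),
        "--max_gen_len", str(max_gen_len),
        "--output_name", output_name,
    ]


cmd = build_inference_command(
    "data/audio/acknowledgement_english.m4a",
    "data/style_clip/3DMM/M030_front_neutral_level1_001.mat",
    "data/pose/RichardShelby_front_neutral_level1_001.mat",
    "data/src_img/uncropped/male_face.png",
)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Passing cmd to subprocess.run(cmd, check=True) from the repository root would launch the script with the same arguments as the shell command.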

wav_path specifies the input audio. Files with the extensions wav, mp3, m4a, and mp4 (video with sound) should all be compatible.
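
A quick pre-flight check on the audio path can catch obvious mistakes before a run. This is a minimal sketch with a hypothetical helper name; it only tests the extensions listed above, and other formats may also work.

```python
from pathlib import Path

# Extensions named in this README; the list is not claimed to be exhaustive.
LISTED_AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".mp4"}


def is_listed_audio_format(path: str) -> bool:
    """Return True if the file extension is one of those listed above."""
    return Path(path).suffix.lower() in LISTED_AUDIO_EXTENSIONS
```

The comparison is case-insensitive, so a file named clip.WAV is accepted as well.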

style_clip_path specifies the reference speaking style and pose_path specifies the head pose. Both are 3DMM parameter sequences extracted from reference videos. You can follow PIRenderer to extract 3DMM parameters from your own videos. Note that the video frame rate should be 25 FPS. In addition, videos used as head pose references should first be cropped to $256\times256$ using the scripts in FOMM video preprocessing.
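
The two constraints on reference videos (25 FPS, and a 256x256 crop for head pose references) can be checked before extracting 3DMM parameters. The sketch below assumes the frame rate and frame size have already been read from the video metadata by some other tool; the function name and its arguments are hypothetical.

```python
from typing import List

REQUIRED_FPS = 25            # frame rate stated above
POSE_CROP_SIZE = (256, 256)  # crop size stated above for pose references


def reference_video_problems(
    fps: float, width: int, height: int, used_for_pose: bool
) -> List[str]:
    """List the ways a reference video deviates from the stated requirements."""
    problems = []
    if abs(fps - REQUIRED_FPS) > 1e-3:
        problems.append(f"frame rate is {fps} FPS, expected {REQUIRED_FPS}")
    # The crop requirement is only stated for head pose reference videos.
    if used_for_pose and (width, height) != POSE_CROP_SIZE:
        problems.append(
            f"frame size is {width}x{height}, expected "
            f"{POSE_CROP_SIZE[0]}x{POSE_CROP_SIZE[1]}"
        )
    return problems
```

An empty list means the video satisfies both stated constraints for its intended use.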

image_path specifies the input portrait. Its resolution should be larger than

