
A Microbenchmark for Talking-Face Synthesis

Dataset | Website

This repository contains the datasets and testing scripts for talking-face synthesis.

A microbenchmark lets researchers evaluate new algorithms quickly. This repository can be customized and applied to other audio-visual talking-face datasets.

Datasets

The benchmark contains six videos: three of English speakers and three of Chinese speakers.

File Structure

├── driving_audios
│   ├── [9.3M] may_english_audio.aac
│   ├── [3.3M] macron_english_trim_audio.aac
│   ├── [3.5M] obama1_english_audio.aac
│   ├── [780K] laoliang_chinese_50s_audio.mp3
│   ├── [4.3M] luoxiang_chinese_audio.mp3
│   ├── [8.9M] zuijiapaidang_chinese_audio.mp3
├── source_images
│   ├── [294K] may.png
│   ├── [202K] macron.png
│   ├── [213M] obama1.png
│   ├── [206K] zuijiapaidang.png
│   ├── [175K] luoxiang.png
│   ├── [204K] laoliang.png
├── reference_videos
│   ├── [56M] obama1_english.mp4, 03:38.16, 25fps, 450x450, 46 sentences
│   ├── [96M] may_english.mp4, 04:02.97, 25fps, 512x512, 35 sentences
│   ├── [24M] macron_english_trim.mp4, 03:31.92, 25fps, 512x512, 49 sentences
│   ├── [3.6M] laoliang_chinese_50s.mp4, 00:49.85, 30fps, 410x380, 40 sentences
│   ├── [14M] luoxiang_chinese.mp4, 04:40.01, 25fps, 350x500, 32 sentences
│   ├── [28M] zuijiapaidang_chinese.mp4, 09:41.98, 30fps, 460x450, 85 sentences
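The layout above pairs each speaker's source image with a driving audio track and a reference video. A test script might enumerate those triplets as in the minimal Python sketch below. The `SPEAKERS` table, the `Sample` type, and the dataset root directory are illustrative assumptions. They are not part of the repository's own scripts.

```python
from pathlib import Path
from typing import NamedTuple


class Sample(NamedTuple):
    """One benchmark item: source image, driving audio, and ground-truth video."""
    speaker: str
    language: str
    source_image: Path
    driving_audio: Path
    reference_video: Path


# Hypothetical mapping: speaker -> (language, file stem, audio extension),
# transcribed by hand from the file tree above.
SPEAKERS = {
    "may": ("english", "may_english", ".aac"),
    "macron": ("english", "macron_english_trim", ".aac"),
    "obama1": ("english", "obama1_english", ".aac"),
    "laoliang": ("chinese", "laoliang_chinese_50s", ".mp3"),
    "luoxiang": ("chinese", "luoxiang_chinese", ".mp3"),
    "zuijiapaidang": ("chinese", "zuijiapaidang_chinese", ".mp3"),
}


def list_samples(root: Path) -> list[Sample]:
    """Build one Sample per speaker; paths are not checked for existence."""
    samples = []
    for speaker, (language, stem, audio_ext) in SPEAKERS.items():
        samples.append(
            Sample(
                speaker=speaker,
                language=language,
                source_image=root / "source_images" / f"{speaker}.png",
                driving_audio=root / "driving_audios" / f"{stem}_audio{audio_ext}",
                reference_video=root / "reference_videos" / f"{stem}.mp4",
            )
        )
    return samples
```

For example, `list_samples(Path("benchmark_data"))` (a hypothetical root) yields six entries, three per language. Each entry can then be passed to a one-shot pipeline.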

English Speakers
obama1_english.mp4	may_english.mp4	macron_english_trim.mp4
<iframe src="https://drive.google.com/file/d/1g-T1nvL0KqBkInIRVSSbOvmC1LiCB36o/preview"></iframe>	<iframe src="https://drive.google.com/file/d/1UMQZP7j8ORLJpHYiUMc-FexDp_SX7386/preview"></iframe>	<iframe src="https://drive.google.com/file/d/1ReG45fm8wnz_a3ZJ3qOhPJGgS8LywKaS/preview"></iframe>
Chinese Speakers
laoliang_chinese_50s.mp4	luoxiang_chinese.mp4	zuijiapaidang_chinese.mp4
<iframe src="https://drive.google.com/file/d/1jk9gX2R7KcD_Q2WF-zs7e2Es3lfKBCpK/preview"></iframe>	<iframe src="https://drive.google.com/file/d/1d1haMYyA9mH0Wc1NgkEAuHtk30KpLJME/preview"></iframe>	<iframe src="https://drive.google.com/file/d/1H-DhAj2K8EESbCUWvr6ylcUqKIFVJ94k/preview"></iframe>

Benchmark

To measure the performance of Wav2Lip and SadTalker, we run both pipelines on all six videos and evaluate the outputs with the following metrics (↑: higher is better; ↓: lower is better):

Sync↑: SyncNet confidence score (lip-sync);
PSNR↑: peak signal-to-noise ratio (identity preservation);
SSIM↑: structural similarity index measure (identity preservation);
FID↓: Fréchet Inception distance (image quality);
IS↑: Inception score (image quality; reported for the few-shot pipelines only).
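The NumPy sketch below makes these metric definitions concrete. It covers PSNR, a simplified whole-image SSIM, and the Fréchet distance behind FID. It also covers the Sync confidence, which, as we understand it, the public SyncNet evaluation code computes as median minus minimum. This is illustrative only. The benchmark's numbers come from its off-the-shelf tools. Standard SSIM uses local Gaussian windows. FID needs Inception-v3 features and Sync needs SyncNet embeddings, and neither network is included here.

```python
import numpy as np


def psnr(generated: np.ndarray, reference: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two frames of identical shape."""
    mse = np.mean((generated.astype(np.float64) - reference.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return float(10.0 * np.log10(max_value ** 2 / mse))


def global_ssim(x: np.ndarray, y: np.ndarray, max_value: float = 255.0) -> float:
    """SSIM over the whole image as one window (simplified; library values will differ)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * max_value) ** 2  # stabilizing constants from the SSIM definition
    c2 = (0.03 * max_value) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two (N, D) feature sets, D >= 2."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr(sqrt(A B)) equals the sum of square-rooted eigenvalues of the symmetric
    # matrix sqrt(A) B sqrt(A), so eigh/eigvalsh suffice (no scipy.linalg.sqrtm).
    w, v = np.linalg.eigh(cov_a)
    sqrt_a = (v * np.sqrt(np.clip(w, 0, None))) @ v.T
    eig = np.linalg.eigvalsh(sqrt_a @ cov_b @ sqrt_a)
    tr_covmean = np.sqrt(np.clip(eig, 0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2 * tr_covmean)


def sync_confidence(dists: np.ndarray, vshift: int) -> tuple[int, float]:
    """Offset and confidence from a (frames, 2*vshift+1) audio-visual distance matrix.

    Assumed to mirror the SyncNet evaluation code: average distances over frames,
    take the best offset, and report median distance minus minimum distance.
    """
    mean_dist = dists.mean(axis=0)
    min_idx = int(np.argmin(mean_dist))
    offset = vshift - min_idx
    confidence = float(np.median(mean_dist) - mean_dist[min_idx])
    return offset, confidence
```

In a real evaluation, PSNR and SSIM would be averaged over aligned frame pairs of the generated and reference videos.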

Implementation (off-the-shelf tools)

Qualitative Results for One-shot Pipelines

English Speakers (Wav2Lip)
obama1_Wav2Lip.mp4 PSNR: 32.287, SSIM: 0.951, FID: 18.993	may_Wav2Lip.mp4 PSNR: 32.572, SSIM: 0.936, FID: 33.941	macron_Wav2Lip.mp4 PSNR: 35.737, SSIM: 0.969, FID: 6.121
<iframe src="https://drive.google.com/file/d/159jlICcQEs5A-_bxnH752fjL49P4uzuw/preview"></iframe>	<iframe src="https://drive.google.com/file/d/195V0U8rjnce4aujAI2AZhpCwqKddXHGA/preview"></iframe>	<iframe src="https://drive.google.com/file/d/1Z0bIbqmVgNdECxgYLedUPVpW6uwquE1z/preview"></iframe>
Chinese Speakers (Wav2Lip)
laoliang_Wav2Lip.mp4 PSNR: 31.444, SSIM: 0.939, FID: 19.192	luoxiang_Wav2Lip.mp4 PSNR: 34.367, SSIM: 0.971, FID: 23.631	zuijiapaidang_Wav2Lip.mp4 PSNR: 20.364, SSIM: 0.783, FID: 49.04
<iframe src="https://drive.google.com/file/d/1SKfceJZ_142bETjqc-FyCtem-SSFlWI4/preview"></iframe>	<iframe src="https://drive.google.com/file/d/15Dt0-5rRbWiYDW4GuzfZGxK8ndjk2MOy/preview"></iframe>	<iframe src="https://drive.google.com/file/d/12iFMIexJkpG9dDmatfFD9yd-LG-bk1dw/preview"></iframe>

English Speakers (SadTalker)
obama1_SadTalker.mp4 PSNR: 20.587, SSIM: 0.754, FID: 24.051	may_SadTalker.mp4 PSNR: 19.211, SSIM: 0.701, FID: 46.182	macron_SadTalker.mp4 PSNR: 18.729, SSIM: 0.763, FID: 98.982
<iframe src="https://drive.google.com/file/d/1xw0gsxCIGJOKpdAudHM1M5mc7qFaQnBv/preview"></iframe>	<iframe src="https://drive.google.com/file/d/1wAFcDyK_Yma4pBHNQZAUJzWEzIsL6rS0/preview"></iframe>	<iframe src="https://drive.google.com/file/d/1y8NmIkXmgCXYKXxJKAEhYwjsh1LSiTiq/preview"></iframe>
Chinese Speakers (SadTalker)
laoliang_SadTalker.mp4 PSNR: 18.536, SSIM: 0.672, FID: 52.362	luoxiang_SadTalker.mp4 PSNR: 14.363, SSIM: 0.598, FID: 104.221	zuijiapaidang_SadTalker.mp4 PSNR: 17.359, SSIM: 0.725, FID: 4.781
<iframe src="https://drive.google.com/file/d/1i5fu_iYkg98a6vRvPw7tg8Z2mRvp4PV3/preview"></iframe>	<iframe src="https://drive.google.com/file/d/1Ln5WBpa2PMWT0vDMfB0M_Una_o5j2QL3/preview"></iframe>	<iframe src="https://drive.google.com/file/d/1m8itAbvVVi5kx67_00mUo7vpTGs0gwpw/preview"></iframe>

Quantitative Results for One-shot Pipelines

Speakers	Pipeline	Sync↑	PSNR↑	SSIM↑	FID↓
English	Wav2Lip	xxx	33.532	0.952	19.685
English	SadTalker	xxx	19.509	0.739	56.407
Chinese	Wav2Lip	xxx	28.725	0.897	30.621
Chinese	SadTalker	xxx	16.753	0.665	68.120
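For each language, the PSNR column appears to equal the mean of the three per-video PSNR values in the qualitative results. The short check below recomputes it. The dictionary is transcribed from those results, and the function name is hypothetical.

```python
from statistics import mean

# Per-video PSNR values copied from the qualitative results
# (English: obama1, may, macron; Chinese: laoliang, luoxiang, zuijiapaidang).
PER_VIDEO_PSNR = {
    ("Wav2Lip", "English"): [32.287, 32.572, 35.737],
    ("Wav2Lip", "Chinese"): [31.444, 34.367, 20.364],
    ("SadTalker", "English"): [20.587, 19.211, 18.729],
    ("SadTalker", "Chinese"): [18.536, 14.363, 17.359],
}


def mean_psnr(per_video: dict) -> dict:
    """Average the per-video scores for each (pipeline, language) pair, to 3 decimals."""
    return {key: round(mean(values), 3) for key, values in per_video.items()}


for (pipeline, language), score in mean_psnr(PER_VIDEO_PSNR).items():
    print(f"{pipeline:<10} {language:<8} {score:.3f}")
```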

Because NeRF-based renderers (GeneFace and ER-NeRF) are person-dependent, we train a separate model for each speaker on the first 3 minutes of the macron and zuijiapaidang videos.
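Preparing the person-specific training clips amounts to cutting the first 180 seconds of each reference video. The sketch below only builds an ffmpeg argument list and does not run anything. It also counts training frames from the frame rates in the file tree. The helper names and output paths are assumptions. The GeneFace and ER-NeRF repositories have their own preprocessing steps, which this does not replace.

```python
from pathlib import Path

TRAIN_SECONDS = 180  # "first 3 minutes" of each person-specific video


def trim_command(src: Path, dst: Path, seconds: int = TRAIN_SECONDS) -> list[str]:
    """ffmpeg argv that keeps only the first `seconds` of `src`.

    `-t` placed after the input limits the output duration; no codec is given,
    so ffmpeg re-encodes with its default encoders for the output container.
    """
    return ["ffmpeg", "-y", "-i", str(src), "-t", str(seconds), str(dst)]


def training_frames(fps: int, seconds: int = TRAIN_SECONDS) -> int:
    """Frames in the training clip, assuming a constant frame rate."""
    return fps * seconds


# macron_english_trim.mp4 is 25 fps and zuijiapaidang_chinese.mp4 is 30 fps.
print(training_frames(25), training_frames(30))
```

To actually cut a clip, pass the list to `subprocess.run(trim_command(src, dst), check=True)`.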

Qualitative Results for Few-shot Pipelines

English Speakers
macron_GeneFace.mp4	macron_ER-NeRF.mp4
Chinese Speakers
zuijiapaidang_GeneFace.mp4	zuijiapaidang_ER-NeRF.mp4

Quantitative Results for Few-shot Pipelines

Speaker	Pipeline	Sync↑	PSNR↑	SSIM↑	FID↓	IS↑
macron (English)	GeneFace	xxx	xxx	xxx	xxx	xxx
macron (English)	ER-NeRF	xxx	xxx	xxx	xxx	xxx
zuijiapaidang (Chinese)	GeneFace	xxx	xxx	xxx	xxx	xxx
zuijiapaidang (Chinese)	ER-NeRF	xxx	xxx	xxx	xxx	xxx
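IS↑ appears only in the few-shot table. Under its standard definition, IS = exp(E_x[KL(p(y|x) || p(y))]). Here p(y|x) is the Inception-v3 class distribution for a generated frame, and p(y) is its average over all frames. The NumPy sketch below assumes those softmax outputs are already computed. The classifier itself is not included.

```python
import numpy as np


def inception_score(probs: np.ndarray, eps: float = 1e-12) -> float:
    """IS for an (N, C) matrix whose rows are per-frame class probabilities."""
    p_y = probs.mean(axis=0, keepdims=True)  # marginal class distribution p(y)
    # KL(p(y|x) || p(y)) per frame; eps avoids log(0) for zero probabilities.
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))
```

Common implementations average the score over several splits of the frames. This sketch uses a single split, so it ranges from 1 (all frames look alike) up to C (confident and evenly spread predictions).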
