This project is based on SoundSketch and implements Wav2Lip for video lip synthesis: given a source video and a driving audio track, it generates lip shapes synchronized to the voice. A configurable enhancement step sharpens the synthesized lip (or full-face) region to improve the clarity of the generated lips, and the DAIN deep-learning frame-interpolation algorithm adds intermediate frames to smooth the lip-motion transitions between frames, making the synthesized lip shapes smoother, more realistic, and more natural.
```bash
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
conda install ffmpeg
pip install -r requirements.txt

# If you want to use the DAIN model for frame interpolation, you also need to install PaddlePaddle.
# CUDA 11.2
python -m pip install paddlepaddle-gpu==2.3.2.post112 \
    -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
```
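As a quick sanity check after installation, a small stdlib-only sketch like the following can report which of the required packages are importable (the names are import names, e.g. `paddle` for `paddlepaddle-gpu`; this helper is not part of the project):

```python
import importlib.util

def check_deps(names=("torch", "torchvision", "torchaudio", "paddle")):
    """Return {import name: importable?} for each required package."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

if __name__ == "__main__":
    for pkg, ok in check_deps().items():
        print(f"{pkg}: {'installed' if ok else 'MISSING'}")
```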
```
SoundSketch-Video-LipSync
├── checkpoints
│   ├── BFM_Fitting
│   ├── DAIN_weight
│   ├── hub
│   ├── ...
├── dian_output
│   ├── ...
├── examples
│   ├── audio
│   ├── video
├── results
│   ├── ...
├── src
│   ├── ...
├── sync_show
├── third_part
│   ├── ...
├── ...
├── inference.py
├── README.md
```
```bash
python inference.py --driven_audio <audio.wav> \
                    --source_video <video.mp4> \
                    --enhancer <none,lip,face> \
                    --use_DAIN \
                    --time_step 0.5
```

- `--enhancer`: enhancement region, one of `none`, `lip`, `face` (default: `lip`)
- `--use_DAIN`: enable DAIN frame interpolation; this uses a large amount of GPU memory and is time-consuming
- `--time_step`: frame-interpolation step; default `0.5`, i.e. 25fps -> 50fps; `0.25` gives 25fps -> 100fps
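To drive the same CLI from a script, e.g. to batch-process several audio/video pairs, the command line above can be assembled programmatically. This is only a sketch that assumes the flags documented here, nothing beyond them:

```python
import subprocess

def build_cmd(audio, video, enhancer="lip", use_dain=False, time_step=0.5):
    """Assemble the inference.py invocation documented above."""
    cmd = ["python", "inference.py",
           "--driven_audio", str(audio),
           "--source_video", str(video),
           "--enhancer", enhancer]
    if use_dain:
        cmd += ["--use_DAIN", "--time_step", str(time_step)]
    return cmd

def run_pairs(pairs, dry_run=True):
    """Print (and optionally run) one inference per (audio, video) pair."""
    for audio, video in pairs:
        cmd = build_cmd(audio, video, use_dain=True)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```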
The synthesized results are saved in the `./sync_show` directory:

- `original.mp4`: the original video
- `sync_none.mp4`: synthesis without any enhancement
- `none_dain_50fps.mp4`: DAIN frame interpolation only, 25fps -> 50fps
- `lip_dain_50fps.mp4`: lip-area enhancement for a clearer lip shape + DAIN interpolation, 25fps -> 50fps
- `face_dain_50fps.mp4`: full-face enhancement for a clearer lip shape + DAIN interpolation, 25fps -> 50fps

Videos generated by the different methods for comparison:

- `our.mp4`: generated by this project (SoundSketch-Video-LipSync)
- `SoundSketch.mp4`: generated by the full SoundSketch pipeline
- `retalking.mp4`: generated by VideoReTalking
- `wav2lip.mp4`: generated by Wav2Lip
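The relation between `--time_step` and the output frame rates in the file names above is simple: DAIN synthesizes a frame at each fractional timestamp, so the output rate is the input rate divided by the time step. As a one-line sketch:

```python
def output_fps(input_fps: float, time_step: float) -> float:
    """DAIN inserts a frame every time_step fraction, so the rate scales by 1/time_step."""
    return input_fps / time_step

print(output_fps(25, 0.5))   # 50.0
print(output_fps(25, 0.25))  # 100.0
```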
lip_sync.mp4
When the videos are spliced together for display, the frame rate is unified to 25fps, so the effect of frame interpolation is not visible here; for details, compare the individual videos in the `./sync_show` directory.

Comparison of this project's lip synthesis with SoundSketch, VideoReTalking, and Wav2Lip:
| our | SoundSketch |
|---|---|
| our_sync.mp4 | sadtalker_sync.mp4 |

| retalking | wav2lip |
|---|---|
| retalking_sync.mp4 | wa2lip_sync.mp4 |
The videos shown in this README have been resized; to compare the originals, view the synthesized videos of each category in the `./sync_show` directory.

The pretrained checkpoints directory should look like this:
```
├── checkpoints
│   ├── BFM_Fitting
│   ├── DAIN_weight
│   ├── hub
│   ├── auido2exp_00300-model.pth
│   ├── auido2pose_00140-model.pth
│   ├── epoch_20.pth
│   ├── facevid2vid_00189-model.pth.tar
│   ├── GFPGANv1.3.pth
│   ├── GPEN-BFR-512.pth
│   ├── mapping_00109-model.pth.tar
│   ├── ParseNet-latest.pth
│   ├── RetinaFace-R50.pth
│   ├── shape_predictor_68_face_landmarks.dat
│   ├── wav2lip.pth
```
Download the pretrained checkpoints from one of the following:

- Baidu Netdisk: https://pan.baidu.com/s/15-zjk64SGQnRT9qIduTe2A (extraction code: klfv)
- Google Drive: https://drive.google.com/file/d/1lW4mf5YNtS4MAD7ZkAauDDWp2N3_Qzs7/view?usp=sharing
- Quark Netdisk: https://pan.quark.cn/s/2a1042b1d046 (extraction code: zMBP)

```bash
# Download the archive and extract it into the project root
# (extraction is required for the Google Drive and Quark Netdisk downloads).
cd SoundSketch-Video-LipSync
tar -zxvf checkpoints.tar.gz
```
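After extracting, a short sketch like the following can confirm that the expected checkpoint files are present. The file list mirrors the tree above, and `checkpoints` as the directory name follows the project layout; the helper itself is not part of the project:

```python
from pathlib import Path

EXPECTED = [
    "auido2exp_00300-model.pth",
    "auido2pose_00140-model.pth",
    "epoch_20.pth",
    "facevid2vid_00189-model.pth.tar",
    "GFPGANv1.3.pth",
    "GPEN-BFR-512.pth",
    "mapping_00109-model.pth.tar",
    "ParseNet-latest.pth",
    "RetinaFace-R50.pth",
    "shape_predictor_68_face_landmarks.dat",
    "wav2lip.pth",
]

def missing_checkpoints(ckpt_dir="checkpoints"):
    """Return the expected checkpoint files that are not present in ckpt_dir."""
    root = Path(ckpt_dir)
    return [name for name in EXPECTED if not (root / name).exists()]

if __name__ == "__main__":
    missing = missing_checkpoints()
    print("All checkpoints found." if not missing else f"Missing: {missing}")
```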
- SadTalker-Video-Lip-Sync: https://github.com/Zz-ww/SadTalker-Video-Lip-Sync
- SADTalker: https://github.com/Winfredy/SADTalker
- VideoReTalking: https://github.com/vinthony/video-retalking
- DAIN: https://arxiv.org/abs/1904.00830
- PaddleGAN: https://github.com/PaddlePaddle/PaddleGAN