

SoundSketch-Video-LipSync

This project is based on SoundSketch and implements Wav2Lip-style video lip synthesis: an audio file drives the generation of new lip shapes for a source video, a configurable enhancement method is applied to the synthesized lip (or face) region to improve its clarity, and the DAIN deep-learning frame-interpolation algorithm adds intermediate frames to smooth the transitions between synthesized lip shapes, making the result more fluid, realistic, and natural.

1. Environment preparation (Environment)

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
conda install ffmpeg
pip install -r requirements.txt

# If you want to use the DAIN model for frame interpolation, you need to install PaddlePaddle.
# CUDA 11.2
python -m pip install paddlepaddle-gpu==2.3.2.post112 \
-f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html
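
After installation, the environment can be sanity-checked with a short Python snippet (an illustrative sketch, not part of this repository); it only confirms that PyTorch, ffmpeg, and, if installed, PaddlePaddle are visible and CUDA-enabled:

# check_env.py - illustrative environment check (not part of this repository)
import shutil
import torch

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("ffmpeg on PATH:", shutil.which("ffmpeg") is not None)

try:
    import paddle  # only required if you plan to use --use_DAIN
    print("paddle:", paddle.__version__, "| compiled with CUDA:", paddle.is_compiled_with_cuda())
except ImportError:
    print("paddle not installed; DAIN frame interpolation will be unavailable")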

2. Project structure (Repository structure)

SoundSketch-Video-LipSync
├──checkpoints
|   ├──BFM_Fitting
|   ├──DAIN_weight
|   ├──hub
|   ├── ...
├──dian_output
|   ├── ...
├──examples
|   ├── audio
|   ├── video
├──results
|   ├── ...
├──src
|   ├── ...
├──sync_show
├──third_part
|   ├── ...
├──...
├──inference.py
├──README.md

3. Model inference (Inference)

# --enhancer:  enhancement region, one of none / lip / face (default: lip)
# --use_DAIN:  enable DAIN frame interpolation (uses a large amount of GPU memory and is time-consuming)
# --time_step: frame-interpolation step; default 0.5, i.e. 25fps -> 50fps; 0.25 gives 25fps -> 100fps
python inference.py --driven_audio <audio.wav> \
                    --source_video <video.mp4> \
                    --enhancer <none|lip|face> \
                    --use_DAIN \
                    --time_step 0.5
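
For example, a typical run with lip-region enhancement and DAIN interpolation might look like this (the file names under examples/ are placeholders; substitute your own audio and video):

python inference.py --driven_audio examples/audio/sample.wav \
                    --source_video examples/video/sample.mp4 \
                    --enhancer lip \
                    --use_DAIN \
                    --time_step 0.5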

4. Synthetic effects (Results)

# The synthesized results are shown in the ./sync_show directory:
# original.mp4          original video
# sync_none.mp4         synthesis without enhancement
# none_dain_50fps.mp4   DAIN frame interpolation only, 25fps -> 50fps
# lip_dain_50fps.mp4    lip-region enhancement for clearer lip shapes + DAIN interpolation, 25fps -> 50fps
# face_dain_50fps.mp4   full-face enhancement for clearer lip shapes + DAIN interpolation, 25fps -> 50fps

# The following videos compare the results of the different methods:
# our.mp4          video generated by SoundSketch-Video-LipSync (this project)
# SoundSketch.mp4  full video generated by SoundSketch
# retalking.mp4    video generated by retalking
# wav2lip.mp4      video generated by wav2lip
lip_sync.mp4

When the comparison videos are spliced together, the frame rate is unified to 25fps, so the effect of frame interpolation is not visible there. For details, compare the individual videos in the ./sync_show directory.
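
For example, two of the ./sync_show videos can be played side by side with ffmpeg's hstack filter (an illustrative command; the output file name is arbitrary):

# Compare the DAIN-only output with the lip-enhanced output side by side (illustrative)
ffmpeg -i sync_show/none_dain_50fps.mp4 -i sync_show/lip_dain_50fps.mp4 \
       -filter_complex hstack=inputs=2 side_by_side.mp4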

Comparison of the lip-synthesis results of this project with SoundSketch, retalking, and wav2lip:

our:         our_sync.mp4
SoundSketch: sadtalker_sync.mp4
retalking:   retalking_sync.mp4
wav2lip:     wa2lip_sync.mp4

The videos displayed in this README have been resized. For a full-resolution comparison, view the synthesized videos of each category in the ./sync_show directory.

5. Pre-trained model (Pretrained model)

The pre-trained model checkpoints are organized as follows:

├──checkpoints
|   ├──BFM_Fitting
|   ├──DAIN_weight
|   ├──hub
|   ├──auido2exp_00300-model.pth
|   ├──auido2pose_00140-model.pth
|   ├──epoch_20.pth
|   ├──facevid2vid_00189-model.pth.tar
|   ├──GFPGANv1.3.pth
|   ├──GPEN-BFR-512.pth
|   ├──mapping_00109-model.pth.tar
|   ├──ParseNet-latest.pth
|   ├──RetinaFace-R50.pth
|   ├──shape_predictor_68_face_landmarks.dat
|   ├──wav2lip.pth

Download links for the pre-trained model checkpoints:

Baidu Netdisk: https://pan.baidu.com/s/15-zjk64SGQnRT9qIduTe2A  (extraction code: klfv)

Google Drive: https://drive.google.com/file/d/1lW4mf5YNtS4MAD7ZkAauDDWp2N3_Qzs7/view?usp=sharing

Quark Netdisk: https://pan.quark.cn/s/2a1042b1d046  (extraction code: zMBP)

# Download the archive and extract it into the project directory (required for the Google Drive and Quark Netdisk downloads)
cd SoundSketch-Video-LipSync
tar -zxvf checkpoints.tar.gz
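
After extraction, a short Python check (an illustrative sketch, not part of this repository) can confirm that the checkpoint files listed above are in place:

# check_checkpoints.py - verify the files listed above exist (illustrative)
from pathlib import Path

expected = [
    "auido2exp_00300-model.pth",
    "auido2pose_00140-model.pth",
    "epoch_20.pth",
    "facevid2vid_00189-model.pth.tar",
    "GFPGANv1.3.pth",
    "GPEN-BFR-512.pth",
    "mapping_00109-model.pth.tar",
    "ParseNet-latest.pth",
    "RetinaFace-R50.pth",
    "shape_predictor_68_face_landmarks.dat",
    "wav2lip.pth",
]

missing = [f for f in expected if not (Path("checkpoints") / f).exists()]
print("All checkpoints present." if not missing else f"Missing: {missing}")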

