Official PyTorch implementation of "ControlVideo: Training-free Controllable Text-to-Video Generation"
ControlVideo adapts ControlNet to the video counterpart without any finetuning, aiming to directly inherit its high-quality and consistent generation.
- [07/16/2023] Add HuggingFace demo!
- [07/11/2023] Support ControlNet 1.1 based version!
- [05/28/2023] Thanks to chenxwh for adding a Replicate demo!
- [05/25/2023] Code of ControlVideo released!
- [05/23/2023] Paper of ControlVideo released!
All pre-trained weights should be downloaded to the `checkpoints/` directory, including the pre-trained weights of Stable Diffusion v1.5, ControlNet 1.0 conditioned on canny edges, depth maps, and human poses, and ControlNet 1.1. The `flownet.pkl` file contains the weights of RIFE. The final file tree should look like:
```
checkpoints
├── stable-diffusion-v1-5
├── sd-controlnet-canny
├── sd-controlnet-depth
├── sd-controlnet-hed
├── control_v11p_sd15_lineart
└── flownet.pkl
```
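If you prefer to fetch the weights programmatically, the minimal sketch below uses `huggingface_hub`; the repository IDs (e.g. `runwayml/stable-diffusion-v1-5`, `lllyasviel/sd-controlnet-canny`) are assumptions based on where these checkpoints are commonly published, and `flownet.pkl` still has to be obtained from the RIFE project separately.

```python
# Minimal sketch: download the pre-trained weights into checkpoints/.
# The repo IDs below are assumptions (the usual Hugging Face locations of
# these checkpoints); adjust them if the project points to other sources.
from pathlib import Path
from huggingface_hub import snapshot_download

CKPT_DIR = Path("checkpoints")
REPOS = {
    "stable-diffusion-v1-5": "runwayml/stable-diffusion-v1-5",
    "sd-controlnet-canny": "lllyasviel/sd-controlnet-canny",
    "sd-controlnet-depth": "lllyasviel/sd-controlnet-depth",
    "sd-controlnet-hed": "lllyasviel/sd-controlnet-hed",
    "control_v11p_sd15_lineart": "lllyasviel/control_v11p_sd15_lineart",
}

for local_name, repo_id in REPOS.items():
    snapshot_download(repo_id=repo_id, local_dir=CKPT_DIR / local_name)

# flownet.pkl (the RIFE weights) is not part of these repos; download it
# manually from the RIFE project and place it at checkpoints/flownet.pkl.
```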
```bash
conda create -n controlvideo python=3.10
conda activate controlvideo
pip install -r requirements.txt
```
Note: `xformers` is recommended to save memory and running time. `controlnet-aux` is updated to version 0.0.6.
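For reference, this is how xformers memory-efficient attention is typically switched on for a diffusers pipeline; whether the ControlVideo scripts already do this depends on the version you run, so treat it as an optional sketch rather than a required step.

```python
# Optional sketch: enable xformers memory-efficient attention on a diffusers
# pipeline to reduce GPU memory use and speed up the attention layers.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "checkpoints/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

try:
    pipe.enable_xformers_memory_efficient_attention()
except Exception as err:
    # Falls back to the default attention implementation if xformers is missing.
    print(f"xformers not enabled: {err}")
```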
To use one_trajectory.py or run_control.net, you first need to set logger.prefix to the path of the input video.
one_trajectory.py is used to generate a video from a single trajectory input video. The input video must first be placed on the server at the location that logger.prefix points to.
Use inference.simple to take an input video and generate a single output video; use inference.main to take an input video and generate 5 samples for the trajectory input video. A hedged sketch of these options follows below.
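The dotted option names (logger.prefix, inference.simple, inference.main) suggest a structured config. The sketch below shows how such a config might look, assuming an OmegaConf/Hydra-style layout; the exact field structure in this repository may differ, so treat it purely as an illustration.

```python
# Hypothetical sketch of the config fields referenced above, assuming an
# OmegaConf/Hydra-style configuration; the real layout in this repo may differ.
from omegaconf import OmegaConf

cfg = OmegaConf.create(
    {
        "logger": {
            # Path of the input video on the server (must exist before running).
            "prefix": "/path/to/input_video.mp4",
        },
        "inference": {
            # simple: generate one video; main: generate 5 samples per trajectory.
            "simple": True,
            "main": False,
        },
    }
)

print(OmegaConf.to_yaml(cfg))
```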
This repository borrows heavily from Diffusers, ControlNet, Tune-A-Video, and RIFE. The code of the HuggingFace demo borrows from fffiloni/ControlVideo. Thanks for their contributions!
There are also many interesting works on video generation: Tune-A-Video, Text2Video-Zero, Follow-Your-Pose, Control-A-Video, among others.