
Make It Move: Controllable Image-to-Video Generation with Text Descriptions


This repository contains the datasets and source code used in the CVPR 2022 paper "Make It Move: Controllable Image-to-Video Generation with Text Descriptions".


Update

  • We improved MAGE with a more powerful autoencoder and a controller over the VAE. The code and models of the improved version, MAGE+, have been released on Google Drive.
  • We proposed two no-reference evaluation metrics, action precision and referring expression precision, which evaluate the precision of fine-grained motions with a captioning-and-matching method. (We chose SwinBERT as the captioning model. Please download the model trained on the CATER-GENs from Google Drive and put it under 'metrics/swinbert_cater'.)
# Launch the SwinBERT environment (assumes SwinBERT is checked out under /home/user/SwinBERT)
$ docker run --gpus all --ipc=host --rm -it --mount src=/home/user/SwinBERT/,dst=/videocap,type=bind --mount src=/home/user/,dst=/home/user/,type=bind -w /videocap linjieli222/videocap_torch1.7:fairscale bash -c "source /videocap/setup.sh && bash"
# Caption the generated videos with the SwinBERT model trained on the CATER-GENs
$ python metrics/swinbert_cater/eval_precision_run_caption_VidSwinBert.py --do_lower_case --do_test --eval_model_dir ./metrics/swinbert_cater/ --test_video_fname /home/results/
# Match the generated captions against the ground-truth descriptions to compute the precision scores
$ python eval_precision.py --data-root /home/user/datasets/CATER-GEN-v1 --gen-caption /home/user/results/catergenv1_diverse/generated_captions.json --mode ambiguous

Dataset Generation

Moving MNIST datasets

The scripts used to generate the Moving MNIST datasets are modified from Sync-DRAW. Run the following commands to generate Single Moving MNIST, Double Moving MNIST, and our Modified Double Moving MNIST, respectively.

$ python data/mnist_caption_single.py
$ python data/mnist_caption_double.py
$ python data/mnist_caption_double_modified.py
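
For orientation, the sketch below illustrates the general recipe these scripts follow: a digit is bounced across a small canvas and paired with a templated sentence describing its motion. The canvas size, velocity, and caption wording are illustrative assumptions, not the exact values or templates used by the scripts.

import numpy as np

def make_moving_digit_video(digit_img, digit_label, num_frames=20, canvas=64, seed=0):
    """Bounce one digit around a square canvas; return (frames, caption).
    Illustrative sketch only -- not the repository's generation code."""
    rng = np.random.default_rng(seed)
    h, w = digit_img.shape
    y, x = int(rng.integers(0, canvas - h)), int(rng.integers(0, canvas - w))
    direction = str(rng.choice(["up and down", "left and right"]))
    if direction == "up and down":
        dy, dx = int(rng.choice([-2, 2])), 0
    else:
        dy, dx = 0, int(rng.choice([-2, 2]))
    frames = np.zeros((num_frames, canvas, canvas), dtype=np.uint8)
    for t in range(num_frames):
        frames[t, y:y + h, x:x + w] = digit_img
        # reflect the velocity when the digit reaches a canvas border
        if not 0 <= y + dy <= canvas - h:
            dy = -dy
        if not 0 <= x + dx <= canvas - w:
            dx = -dx
        y, x = y + dy, x + dx
    caption = f"the digit {digit_label} is moving {direction}."
    return frames, caption

# Example with a dummy 28x28 "digit"
video, caption = make_moving_digit_video(np.full((28, 28), 255, np.uint8), 7)
print(video.shape, caption)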

CATER-GENs

Datasets Download

The original CATER-GEN-v1 and CATER-GEN-v2 used in our paper are provided at link1 and link2, respectively.

Create Your Own Datasets

Thanks to the authors of CATER and CLEVR for making their code available, you can also generate your own datasets as follows.

First, please generate videos and metadata following the CATER guideline. Set the hyper-parameters min_objects, max_objects, num_frames, num_images, width, and height as desired, and fix CAM_MOTION = False and start_frame = 0. Then, generate the text descriptions by running:

$ python data/gen_cater_text_anno.py
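
To give a sense of what this step produces, the sketch below turns CATER-style metadata into one templated sentence per moving object. The field names ("objects", "movements", "instance", etc.) and the phrase table are assumptions made for illustration and may differ from the exact schema consumed by data/gen_cater_text_anno.py.

import json

# Hypothetical mapping from CATER action names to caption phrases.
ACTION_PHRASES = {
    "_slide": "is sliding",
    "_rotate": "is rotating",
    "_pick_place": "is picked up and placed",
    "_contain": "is containing another object",
}

def describe_scene(metadata_path):
    """Emit one templated sentence per (object, action) pair found in the metadata."""
    with open(metadata_path) as f:
        meta = json.load(f)
    sentences = []
    for obj in meta["objects"]:
        for action in meta["movements"].get(obj["instance"], []):
            phrase = ACTION_PHRASES.get(action[0])
            if phrase:
                sentences.append(
                    f"the {obj['size']} {obj['material']} {obj['shape']} {phrase}."
                )
    return " ".join(sentences)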

MAGE

Our proposed baseline, MAGE, is trained in two stages. The first stage trains a VQ-VAE encoder and decoder. The second stage trains the remaining video generation model. The trained models are provided on Google Drive.

Environment

Our code has been tested on Ubuntu 18.04. Before starting, please configure your Anaconda environment by running:

$ conda create -n mage python=3.8
$ conda activate mage
$ pip install -r requirements.txt

Stage 1. VQ-VAE Training

$ python train_vqvae.py --dataset mnist --data-root /data/data_file --output-folder ./models/vqvae_model_file
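
For readers unfamiliar with this stage, the sketch below shows the vector-quantization bottleneck that a VQ-VAE places between its encoder and decoder (straight-through estimator variant). The codebook size, embedding dimension, and loss weight are illustrative defaults, not the hyper-parameters used by train_vqvae.py.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Maps continuous encoder features to their nearest codebook entries."""
    def __init__(self, num_codes=512, code_dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta

    def forward(self, z_e):                       # z_e: (B, D, H, W) encoder output
        z = z_e.permute(0, 2, 3, 1).contiguous()  # -> (B, H, W, D)
        flat = z.view(-1, z.size(-1))
        # nearest codebook entry for every spatial position
        dists = torch.cdist(flat, self.codebook.weight)
        idx = dists.argmin(dim=1)
        z_q = self.codebook(idx).view_as(z)
        # codebook loss + commitment loss
        loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())
        # straight-through estimator: copy gradients from z_q back to the encoder
        z_q = z + (z_q - z).detach()
        return z_q.permute(0, 3, 1, 2).contiguous(), loss, idx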

Stage 2. MAGE Training

$ python main_mage.py --split train --config config/model.yaml --checkpoint-path ./models/MAGE/model_path 

Sampling

$ python main_mage.py --split test --config config/model.yaml --checkpoint-path ./models/MAGE/model_path
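
For quickly eyeballing sampled results, a small helper like the one below can turn a generated clip into a GIF. It assumes the sampled frames are available as a uint8 NumPy array of shape (T, H, W) or (T, H, W, 3); it is not part of main_mage.py, and the sampler's actual output format is not assumed here.

import imageio
import numpy as np

def save_gif(frames: np.ndarray, path: str, fps: int = 4) -> None:
    """Write a (T, H, W[, 3]) uint8 frame stack to an animated GIF."""
    imageio.mimsave(path, [f for f in frames], duration=1.0 / fps)

# Example: save_gif(sampled_video, "sample.gif")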

Citation

If you find this repository useful in your research, please cite:

@InProceedings{hu2022mage,
    title={Make It Move: Controllable Image-to-Video Generation with Text Descriptions},
    author={Yaosi Hu and Chong Luo and Zhenzhong Chen},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2022}
}
