RaMViD

This code and README are based on the codebase from https://github.com/openai/improved-diffusion

Installation

Import and create the enroot container

$ enroot import docker://nvcr.io#nvidia/pytorch:21.04-py3
$ enroot create --name container_name nvidia+pytorch+21.04-py3.sqsh

and, in addition, install the required packages:

pip install torch
pip install tqdm
pip install "blobfile>=0.11.0"
pip install mpi4py
pip install matplotlib
pip install av 
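The pip commands above are presumably run inside the container. As a minimal sketch (assuming a standard enroot setup; exact flags may differ on your system), you can open a writable shell in the container and install the packages from there:

$ enroot start --rw container_name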

Preparing Data

Our dataloader can handle videos in the .gif, .mp4, or .avi format. Create a folder with your data and simply pass --data_dir path/to/videos to the training script.
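For example, the data folder can simply be a flat directory of video files (hypothetical file names):

path/to/videos/
├── clip_0001.mp4
├── clip_0002.gif
└── clip_0003.avi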

Training

Following the original codebase, we split our hyperparameters into three groups: model architecture, diffusion process, and training flags.

Kinetics-600:

MODEL_FLAGS="--image_size 64 --num_channels 128 --num_res_blocks 3 --scale_time_dim 0";
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule linear";
TRAIN_FLAGS="--lr 2e-5 --batch_size 8 --microbatch 2 --seq_len 16 --max_num_mask_frames 4 --uncondition_rate 0.25";

BAIR:

MODEL_FLAGS="--image_size 64 --num_channels 128 --num_res_blocks 2 --scale_time_dim 0";
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule linear";
TRAIN_FLAGS="--lr 2e-5 --batch_size 4 --microbatch 2 --seq_len 20 --max_num_mask_frames 4 --uncondition_rate 0.25";

UCF-101:

MODEL_FLAGS="--image_size 64 --num_channels 128 --num_res_blocks 3 --scale_time_dim 0";
DIFFUSION_FLAGS="--diffusion_steps 1000 --noise_schedule linear";
TRAIN_FLAGS="--lr 2e-5 --batch_size 8 --microbatch 2 --seq_len 16 --max_num_mask_frames 4 --uncondition_rate 0.75";

Once you have set up your hyperparameters, you can run an experiment like so:

python scripts/video_train.py --data_dir path/to/videos $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS

You may also want to train in a distributed manner. In this case, run the same command with mpirun:

mpirun -n $NUM_GPUS python scripts/video_train.py --data_dir path/to/videos $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS

When training in a distributed manner, you must manually divide the --batch_size argument by the number of ranks. In lieu of distributed training, you may use --microbatch 16 (or --microbatch 1 in extremely memory-limited cases) to reduce memory usage. For example, see the command after this paragraph.
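As an illustration, with the Kinetics-600 flags above (global batch size 8) split across 4 GPUs (a hypothetical GPU count), each rank would use --batch_size 2:

mpirun -n 4 python scripts/video_train.py --data_dir path/to/videos $MODEL_FLAGS $DIFFUSION_FLAGS --lr 2e-5 --batch_size 2 --microbatch 2 --seq_len 16 --max_num_mask_frames 4 --uncondition_rate 0.25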

The logs and saved models will be written to a logging directory determined by the OPENAI_LOGDIR environment variable. If it is not set, then a temporary directory will be created in /tmp.
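For example, to keep logs and checkpoints in a fixed directory of your choice:

export OPENAI_LOGDIR=/path/to/logdir
python scripts/video_train.py --data_dir path/to/videos $MODEL_FLAGS $DIFFUSION_FLAGS $TRAIN_FLAGS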

Sampling

The above training script saves checkpoints to .pt files in the logging directory. These checkpoints will have names like ema_0.9999_200000.pt and model200000.pt. You will likely want to sample from the EMA models, since those produce much better samples.

Once you have a path to your model, you can generate a large batch of samples like so:

python scripts/video_sample.py --model_path /path/to/model.pt $MODEL_FLAGS $DIFFUSION_FLAGS

Again, this will save results to a logging directory. Samples are saved as a large npz file, where arr_0 in the file is a large batch of samples.
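As a quick sanity check (assuming the output file is named samples.npz; the actual name may include the batch shape), you can inspect arr_0 with NumPy:

python -c "import numpy as np; x = np.load('samples.npz')['arr_0']; print(x.shape, x.dtype)"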

Just like for training, you can run video_sample.py through MPI to use multiple GPUs and machines.

You can change the number of sampling steps using the --timestep_respacing argument. For example, --timestep_respacing 250 uses 250 steps to sample.
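For example, to sample with 250 steps:

python scripts/video_sample.py --model_path /path/to/model.pt --timestep_respacing 250 $MODEL_FLAGS $DIFFUSION_FLAGS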
