VDT

[ICLR2024] The official implementation of the paper "VDT: General-purpose Video Diffusion Transformers via Mask Modeling", by Haoyu Lu, Guoxing Yang, Nanyi Fei, Yuqi Huo, Zhiwu Lu, Ping Luo, Mingyu Ding.

Introduction

This work introduces the Video Diffusion Transformer (VDT), which pioneers the use of transformers in diffusion-based video generation. It features transformer blocks with modularized temporal and spatial attention modules, allowing each component to be optimized separately while leveraging the rich spatial-temporal representations that transformers provide.
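As a rough illustration of this modularized design, here is a minimal sketch (assuming PyTorch) of how a single block might alternate temporal attention, where each spatial location attends across frames, with spatial attention, where patches within a frame attend to one another. All class and parameter names are hypothetical and do not reflect the repository's actual code.

    import torch
    import torch.nn as nn

    class VDTBlockSketch(nn.Module):
        """Illustrative only: one block with decoupled temporal and
        spatial attention; names and sizes are assumptions."""
        def __init__(self, dim=768, heads=12):
            super().__init__()
            self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm1 = nn.LayerNorm(dim)
            self.norm2 = nn.LayerNorm(dim)
            self.norm3 = nn.LayerNorm(dim)
            self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

        def forward(self, x):
            # x: (batch, frames, patches, dim) video patch tokens
            b, f, p, d = x.shape
            # Temporal attention: each spatial location attends across frames.
            t = x.permute(0, 2, 1, 3).reshape(b * p, f, d)
            h = self.norm1(t)
            t = t + self.temporal_attn(h, h, h)[0]
            x = t.reshape(b, p, f, d).permute(0, 2, 1, 3)
            # Spatial attention: patches within each frame attend to one another.
            s = x.reshape(b * f, p, d)
            h = self.norm2(s)
            s = s + self.spatial_attn(h, h, h)[0]
            x = s.reshape(b, f, p, d)
            # Shared feed-forward network with a residual connection.
            return x + self.mlp(self.norm3(x))

A batch of two 16-frame clips with 64 patch tokens each, x = torch.randn(2, 16, 64, 768), passes through the block with its shape unchanged.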

VDT offers several appealing benefits:

  • It excels at capturing temporal dependencies, producing temporally consistent video frames and even simulating the dynamics of 3D objects over time.
  • It accepts flexible conditioning information through simple concatenation in the token space, effectively unifying video generation and prediction tasks (see the sketch below).
  • Its modularized design facilitates a spatial-temporal decoupled training strategy, leading to improved efficiency.
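The second point above is the mask-modeling idea in miniature: condition frames and noisy frames share a single token sequence. The snippet below is a hypothetical illustration of that concatenation, with made-up tensor shapes.

    import torch

    # Hypothetical shapes: (batch, frames, patches, dim).
    cond_tokens = torch.randn(2, 8, 64, 768)    # clean tokens from observed frames
    noisy_tokens = torch.randn(2, 8, 64, 768)   # noised tokens for frames to generate
    # Conditioning is plain concatenation along the frame axis; unconditional
    # generation is the special case with zero condition frames.
    tokens = torch.cat([cond_tokens, noisy_tokens], dim=1)  # (2, 16, 64, 768)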

Extensive experiments on video generation, prediction, and dynamics modeling (i.e., physics-based QA) demonstrate the effectiveness of VDT in a range of scenarios, including autonomous driving, human action, and physics-based simulation.

Getting Started

  • Python 3, PyTorch >= 1.8.0, and torchvision >= 0.7.0 are required for the current codebase (see the sanity check below).
  • To install the remaining dependencies, run
    conda env create -f environment.yml
    conda activate VDT
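
Once the environment is activated, a quick sanity check (a generic snippet, not part of the repository) should report versions that meet the requirements above:

    import torch
    import torchvision

    print("PyTorch:", torch.__version__)            # expected >= 1.8.0
    print("torchvision:", torchvision.__version__)  # expected >= 0.7.0
    print("CUDA available:", torch.cuda.is_available())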

Checkpoint

We now provide a Physion_Collide checkpoint for Physion-Collision conditional generation (video prediction). You can download it from here.

Inference

We provide an inference script for Physion_Collide video prediction. To sample results, first download the checkpoint, then run:

python physion_sample.py --ckpt $CHECKPOINT_PATH 

We also provide a simple demo in inference_physion.ipynb. Have fun!
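For orientation, the sketch below outlines what a mask-modeling prediction loop of this kind typically does: observed-frame tokens stay clean while future-frame tokens are iteratively denoised. It is a simplified, generic DDPM-style loop with a made-up model signature, not the repository's API; see physion_sample.py for the actual procedure.

    import torch

    @torch.no_grad()
    def predict_future_frames(model, observed, num_future, steps=50):
        # Hypothetical sketch. `observed` holds clean tokens of shape
        # (batch, frames, patches, dim); `model(x, t)` is assumed to
        # predict the noise for every token in the sequence.
        b, f_obs, p, d = observed.shape
        betas = torch.linspace(1e-4, 0.02, steps)
        alphas_bar = torch.cumprod(1.0 - betas, dim=0)
        future = torch.randn(b, num_future, p, d)        # start from pure noise
        for t in reversed(range(steps)):
            x = torch.cat([observed, future], dim=1)     # concatenate in token space
            t_batch = torch.full((b,), t, dtype=torch.long)
            eps = model(x, t_batch)[:, f_obs:]           # predicted noise, future part only
            # Standard DDPM ancestral update applied to the future tokens.
            future = (future - betas[t] / (1 - alphas_bar[t]).sqrt() * eps) / (1 - betas[t]).sqrt()
            if t > 0:
                future = future + betas[t].sqrt() * torch.randn_like(future)
        return future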

Acknowledgement

Our codebase is built on DiT, SlotFormer, and MVCD. We thank the authors for their nicely organized code!
