MMVC: Learned Multi-Mode Video Compression with Block-based Prediction Mode Selection and Density-Adaptive Entropy Coding

Code for MMVC: Learned Multi-Mode Video Compression with Block-based Prediction Mode Selection and Density-Adaptive Entropy Coding (CVPR 2023), a block-wise video compression algorithm.

Introduction and Framework

In this paper, we propose multi-mode video compression (MMVC), a block-wise mode-ensemble deep video compression framework that selects the optimal mode for feature-domain prediction, adapting to different motion patterns.
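As a rough illustration of per-block mode selection, the sketch below picks, for one block, the candidate prediction with the smallest residual. The selection criterion (sum of absolute differences), the function names, and the flat-list block representation are assumptions made for this example; the README does not state the criterion used in the paper, and the actual code operates on learned feature tensors.

```python
from typing import Dict, Sequence


def residual_cost(target: Sequence[float], prediction: Sequence[float]) -> float:
    """Sum of absolute differences: an assumed proxy for the cost of coding the residual."""
    return sum(abs(t - p) for t, p in zip(target, prediction))


def select_mode(target: Sequence[float], candidates: Dict[str, Sequence[float]]) -> str:
    """Return the name of the candidate prediction with the smallest residual cost."""
    return min(candidates, key=lambda name: residual_cost(target, candidates[name]))


# Toy block and three hypothetical mode outputs (values are made up).
target = [1.0, 2.0, 3.0, 4.0]
candidates = {
    "convlstm": [1.0, 2.5, 3.0, 4.5],
    "flow_conditioned": [1.0, 2.0, 3.0, 4.1],
    "propagation": [0.0, 0.0, 0.0, 0.0],
}
best_mode = select_mode(target, candidates)
print(best_mode)
```

In a full codec the chosen mode index would also have to be signaled to the decoder for each block, so a realistic criterion would weigh that overhead as well.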

The proposed modes include ConvLSTM-based feature-domain prediction, optical-flow-conditioned feature-domain prediction, and feature propagation, which together cover a wide range of cases from static scenes without apparent motion to dynamic scenes with a moving camera. We partition the feature space into blocks so that temporal prediction operates on spatial block-based representations.
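The block partitioning can be pictured with a minimal sketch that splits a 2D feature map into non-overlapping square blocks and stitches them back together. The block size, the nested-list representation, and the helper names `split_into_blocks` and `stitch_blocks` are hypothetical choices for this example, not the repository's implementation.

```python
from typing import List, Tuple

Grid = List[List[float]]


def split_into_blocks(feature: Grid, block: int) -> List[Tuple[int, int, Grid]]:
    """Split a 2D map into non-overlapping block x block tiles, in row-major order.

    Each entry is (top, left, tile) so the tile can be placed back later.
    """
    height, width = len(feature), len(feature[0])
    if height % block or width % block:
        raise ValueError("feature size must be a multiple of the block size")
    blocks = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            tile = [row[left:left + block] for row in feature[top:top + block]]
            blocks.append((top, left, tile))
    return blocks


def stitch_blocks(blocks: List[Tuple[int, int, Grid]], height: int, width: int) -> Grid:
    """Place every tile back at its recorded position."""
    out = [[0.0] * width for _ in range(height)]
    for top, left, tile in blocks:
        for dy, row in enumerate(tile):
            out[top + dy][left:left + len(row)] = row
    return out


# A 4x4 toy feature map split into four 2x2 blocks.
feature = [[float(4 * r + c) for c in range(4)] for r in range(4)]
blocks = split_into_blocks(feature, 2)
restored = stitch_blocks(blocks, 4, 4)
print(len(blocks), restored == feature)
```

Because each block keeps its position, every block can be predicted and coded independently and the results reassembled into a full frame.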

We consider both dense and sparse post-quantization residual blocks for entropy coding, and apply optional run-length coding to sparse residuals to improve the compression rate.
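The idea of treating sparse and dense residual blocks differently can be sketched as follows. This is only an illustration: the density threshold of 0.25, the `(zero_run, value)` pair format, and all function names are assumptions for the example, and the paper's actual entropy coder (arithmetic coding over the resulting symbols) is not shown.

```python
from typing import List, Tuple, Union

Pairs = List[Tuple[int, int]]


def density(residual: List[int]) -> float:
    """Fraction of non-zero entries in a quantized residual block."""
    if not residual:
        return 0.0
    return sum(1 for v in residual if v != 0) / len(residual)


def run_length_encode(residual: List[int]) -> Pairs:
    """Encode as (zero_run, value) pairs; a trailing zero run is stored with value 0."""
    pairs: Pairs = []
    zeros = 0
    for v in residual:
        if v == 0:
            zeros += 1
        else:
            pairs.append((zeros, v))
            zeros = 0
    if zeros:
        pairs.append((zeros, 0))
    return pairs


def run_length_decode(pairs: Pairs) -> List[int]:
    """Invert run_length_encode."""
    out: List[int] = []
    for zeros, v in pairs:
        out.extend([0] * zeros)
        if v != 0:
            out.append(v)
    return out


def choose_coding(residual: List[int], threshold: float = 0.25) -> Tuple[str, Union[Pairs, List[int]]]:
    """Use run-length coding only when the block is sparse (assumed threshold)."""
    if density(residual) < threshold:
        return "rle", run_length_encode(residual)
    return "dense", list(residual)


sparse = [0, 0, 0, 3, 0, 0, 0, 0, -1, 0, 0, 0]
dense = [1, -2, 0, 3]
sparse_mode, sparse_payload = choose_coding(sparse)
dense_mode, dense_payload = choose_coding(dense)
print(sparse_mode, sparse_payload)
print(dense_mode, dense_payload)
```

Run-length coding pays off only when long zero runs dominate, which is why the sketch leaves dense blocks untouched.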

Performance

RD trade-off curve

MMVC achieves state-of-the-art performance on benchmark datasets.

Reconstruction Visualization

Details of the static background and dynamic objects are well preserved. Compared with HEVC, our result shows fewer block artifacts and preserves finer details.

Visualization of Multiple Predictions

The decoded scenes are obtained from the predicted features without residual. By adopting multiple prediction modes that complement each other, our prediction is able to cover content variety in the original frame with a shorter bitstream.

Ablation Study

Effectiveness of Optical Flow conditioned Prediction

The residual between the raw frame and the frame warped with optical flow information is minor when the motion is slow, indicating that the optical-flow-based prediction mode works well for frames with little motion.
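To make the warping residual concrete, here is a small sketch that warps the previous frame with a per-pixel flow field and measures the mean absolute residual against the current frame. The flow convention (each target pixel stores the displacement `(dx, dy)` from its source position), nearest-neighbor sampling, and border clamping are assumptions for this toy example; in the repository the flow comes from RAFT, and a real pipeline would typically use bilinear sampling on tensors.

```python
from typing import List, Tuple

Frame = List[List[float]]
Flow = List[List[Tuple[float, float]]]


def warp(prev: Frame, flow: Flow) -> Frame:
    """Backward warp: out[y][x] = prev[y - dy][x - dx], clamped at the borders."""
    height, width = len(prev), len(prev[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = flow[y][x]
            src_x = min(max(x - round(dx), 0), width - 1)
            src_y = min(max(y - round(dy), 0), height - 1)
            row.append(prev[src_y][src_x])
        out.append(row)
    return out


def mean_abs_residual(a: Frame, b: Frame) -> float:
    """Mean absolute difference between two frames of equal size."""
    total = sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return total / (len(a) * len(a[0]))


# A single bright pixel moves one step to the right between frames.
prev = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
current = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
flow = [[(1.0, 0.0)] * 4 for _ in range(2)]

warped = warp(prev, flow)
print(mean_abs_residual(current, prev), mean_abs_residual(current, warped))
```

In this toy case the warped frame matches the current frame exactly, so only the flow needs to be known; with real content the residual is small rather than zero.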

Mapping between pixel domain and latent domain

The spatial correlation between the pixel domain and latent space is straightforward. The reconstructions of the divided latent blocks are stitched into a single frame, and the difference between the reconstruction and the raw frame is small, showing the effectiveness of block-wise video compression in latent space.

Datasets

To use the datasets from the paper, please download the UVG dataset and the Kinetics dataset.

The UVG and Kinetics datasets are used for training the prediction network.
The Kinetics and UVG datasets are used for testing the performance.
Note that we use the learning-based image compression algorithm (Liu et al.) as the intra-frame compression for each single frame.
The latent it produces is used as the optimal latent for each frame.

Arithmetic Coding

To use the entropy coding method in this paper, download a general-purpose Python library that implements arithmetic coding. (That code is not provided in this repository due to a license conflict.)

Train a new model

Please download the optical flow model into the Optical_flow_master folder, and change the path in the corresponding file. The optical flow method used in this paper is RAFT.

To train a model with optical-flow-conditioned prediction:

$ python train_new_model.py

To test the trained model:

$ python Testing_new.py

Citation

Please cite our paper if you find it useful for your research.

@inproceedings{liu2023mmvc,
  title={MMVC: Learned Multi-Mode Video Compression with Block-based Prediction Mode Selection and Density-Adaptive Entropy Coding},
  author={Liu, Bowen and Chen, Yu and Machineni, Rakesh Chowdary and Liu, Shiyu and Kim, Hun-Seok},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={18487--18496},
  year={2023}
}

