This is the repository of the paper "HiNeRV: Video Compression with Hierarchical Encoding-based Neural Representation" (NeurIPS 2023).
By Ho Man Kwan, Ge Gao, Fan Zhang, Andrew Gower and David Bull
accelerate==0.23.0
deepspeed==0.11.1
pytorch-msssim==1.0.0
timm==0.9.7
torch==2.1.0
torchac==0.9.3
torchvision==0.16.0
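One way to install the pinned versions above is directly with pip; a sketch (the virtual environment step is optional):

```shell
# Optional: isolate the environment first.
python -m venv .venv && source .venv/bin/activate

# Install the pinned dependencies listed above.
pip install accelerate==0.23.0 deepspeed==0.11.1 pytorch-msssim==1.0.0 \
    timm==0.9.7 torch==2.1.0 torchac==0.9.3 torchvision==0.16.0
```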
This implementation requires the videos to first be converted into PNG frames. For example, you can use FFmpeg:
mkdir video
ffmpeg -video_size 1920x1080 -pixel_format yuv420p -i video.yuv video/%4d.png
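To convert all sequences in one go, the command above can be wrapped in a loop; a sketch, assuming the seven 1080p UVG sequence names and that the raw .yuv files sit in the current directory (this invokes ffmpeg, so adjust paths for your setup):

```shell
# Convert every UVG sequence into a folder of PNG frames.
for name in Beauty Bosphorus HoneyBee Jockey ReadySetGo ShakeNDry YachtRide; do
    mkdir -p "${name}"
    ffmpeg -video_size 1920x1080 -pixel_format yuv420p \
        -i "${name}.yuv" "${name}/%4d.png"
done
```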
The UVG dataset can be downloaded from the UVG website (https://ultravideo.fi/dataset.html).
To train HiNeRV-S with the UVG dataset (ReadySetGo sequence) and the video compression setting:
dataset_dir=~/Datasets/UVG/1920x1080
dataset_name=ReadySetGo
output=~/Models/HiNeRV
train_cfg=$(cat "cfgs/train/hinerv_1920x1080.txt")
model_cfg=$(cat "cfgs/models/uvg-hinerv-s_1920x1080.txt")
accelerate launch --mixed_precision=fp16 --dynamo_backend=inductor hinerv_main.py \
--dataset ${dataset_dir} --dataset-name ${dataset_name} --output ${output} \
${train_cfg} ${model_cfg} --batch-size 144 --eval-batch-size 1 --grad-accum 1 --log-eval false --seed 0
The output will be saved into a new folder in the output path, e.g. ~/Models/HiNeRV/HiNeRV-20231030-032238-133f0dfc. This path can be used for resuming training or for loading the bitstream directly.
Please note that the batch size refers to the number of patches, so make sure to adjust it accordingly if you change the patch size (see 'cfgs/train/hinerv_1920x1080.txt' for details). While the original configuration uses 120x120 patches, larger patches (e.g., 'cfgs/train/hinerv_1920x1080_480x360.txt' uses 480x360 patches) reduce overhead but may slightly impact performance.
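As a sanity check for the relationship between patch size and batch size, the batch size of 144 used above corresponds to one full 1920x1080 frame tiled into 120x120 patches (16 x 9 = 144). A minimal sketch (the exact patching in HiNeRV, e.g. padding or overlap, may differ):

```python
import math

def patches_per_frame(frame_w: int, frame_h: int, patch_w: int, patch_h: int) -> int:
    # Number of patches needed to tile one frame (ceil covers non-divisible edges).
    return math.ceil(frame_w / patch_w) * math.ceil(frame_h / patch_h)

print(patches_per_frame(1920, 1080, 120, 120))   # 16 * 9 = 144 patches
print(patches_per_frame(1920, 1080, 480, 360))   # 4 * 3 = 12 patches
```

With 480x360 patches, a proportionally smaller batch size (e.g. 12 per frame) covers the same number of pixels per step.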
To save the model outputs into images, set --log-eval to true.
To evaluate with the compressed bitstream:
dataset_dir=~/Datasets/UVG/1920x1080
dataset_name=ReadySetGo
output=~/Models/HiNeRV
train_cfg=$(cat "cfgs/train/hinerv_1920x1080.txt")
model_cfg=$(cat "cfgs/models/uvg-hinerv-s_1920x1080.txt")
checkpoint_path=~/Models/HiNeRV/HiNeRV-20231030-032238-133f0dfc
accelerate launch --mixed_precision=fp16 --dynamo_backend=inductor hinerv_main.py \
--dataset ${dataset_dir} --dataset-name ${dataset_name} --output ${output} \
${train_cfg} ${model_cfg} --batch-size 144 --eval-batch-size 1 --grad-accum 1 --log-eval false --seed 0 \
--bitstream ${checkpoint_path} --bitstream-q 6 --eval-only
Here, checkpoint_path is the directory of the trained model, and --bitstream-q sets the quantization level.
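When comparing rate-distortion results, the bitstream size is typically reported as bits per pixel (bpp), i.e. the total size in bits divided by the total number of pixels in the sequence. A small sketch of the arithmetic (the 3 MB bitstream size here is an illustrative assumption, not a measured result):

```python
def bits_per_pixel(bitstream_bytes: int, num_frames: int, width: int, height: int) -> float:
    """Total bitstream size in bits divided by the total number of pixels."""
    return bitstream_bytes * 8 / (num_frames * width * height)

# Example: a hypothetical 3 MB bitstream for a 600-frame 1920x1080 UVG sequence.
bpp = bits_per_pixel(3_000_000, 600, 1920, 1080)
print(f"{bpp:.4f} bpp")  # → 0.0193 bpp
```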
To train HiNeRV-S with the 37 epochs setting (no pruning/quantization):
dataset_dir=~/Datasets/UVG/1920x1080
dataset_name=ReadySetGo
output=~/Models/HiNeRV
train_cfg=$(cat "cfgs/train/hinerv_1920x1080_37e_no-compress.txt")
model_cfg=$(cat "cfgs/models/uvg-hinerv-s_1920x1080.txt")
accelerate launch --mixed_precision=fp16 --dynamo_backend=inductor hinerv_main.py \
--dataset ${dataset_dir} --dataset-name ${dataset_name} --output ${output} \
${train_cfg} ${model_cfg} --batch-size 144 --eval-batch-size 1 --grad-accum 1 --log-eval false --seed 0
This implementation has slightly improved average performance compared to the original one. The results for both the original and this version will be provided in the 'results' folder.
Part of the implementation is based on the code from PyTorch Image Models and HNeRV.
Please consider citing our work if you find it useful.
@inproceedings{kwan2023hinerv,
  author    = {Ho Man Kwan and Ge Gao and Fan Zhang and Andrew Gower and David Bull},
  title     = {HiNeRV: Video Compression with Hierarchical Encoding-based Neural Representation},
  booktitle = {NeurIPS},
  year      = {2023}
}