Rethinking Self-Supervised Correspondence Learning: A Video Frame-level Similarity Perspective

This repository is the official implementation of VFS, introduced in the paper:

Rethinking Self-Supervised Correspondence Learning: A Video Frame-level Similarity Perspective
Jiarui Xu, Xiaolong Wang
ICCV 2021 (Oral)

The project page with video is at https://jerryxu.net/VFS/.

UPDATE

- 2021.12.18: Fixed some bugs when building the original Dockerfile.

Citation

If you find our work useful in your research, please cite:

@article{xu2021rethinking,
  title={Rethinking Self-Supervised Correspondence Learning: A Video Frame-level Similarity Perspective},
  author={Xu, Jiarui and Wang, Xiaolong},
  journal={arXiv preprint arXiv:2103.17263},
  year={2021}
}

Environment Setup

Python 3.7
PyTorch 1.6-1.8
mmaction2
davis2017-evaluation
got10k

The codebase is built on the awesome MMAction2. Please follow the MMAction2 installation instructions to set up the environment.

Quick start full script:

conda create -n vfs python=3.7 -y
conda activate vfs
conda install pytorch==1.8.0 torchvision==0.9.0 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.8.0/index.html
# install customized evaluation API for DAVIS
pip install git+https://github.com/xvjiarui/davis2017-evaluation
# install evaluation API for OTB
pip install got10k

# install VFS
git clone https://github.com/xvjiarui/VFS/
cd VFS
pip install -e .

We also provide a Dockerfile in the docker/ folder (~12.6 GB).

The code is developed and tested with PyTorch 1.6-1.8. It also runs with PyTorch 1.9, but the OTB evaluation accuracy is slightly worse. Please feel free to open a PR if you find the reason.
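Because only PyTorch 1.6-1.8 is tested, it can be worth checking the installed version before running the OTB evaluation. The following is a hypothetical helper, not part of the repository; in practice you would pass `torch.__version__` to `in_tested_range`. It only compares the major and minor version numbers.

```python
# Hypothetical helper (not part of VFS): check whether a PyTorch version
# string falls inside the tested 1.6-1.8 range. In practice, call
# in_tested_range(torch.__version__).


def parse_major_minor(version: str) -> tuple:
    """Return (major, minor) from strings such as '1.8.0' or '1.8.1+cu111'."""
    core = version.split("+")[0]  # drop local build tags like '+cu111'
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def in_tested_range(version: str, low=(1, 6), high=(1, 8)) -> bool:
    """True if low <= (major, minor) <= high, using tuple comparison."""
    return low <= parse_major_minor(version) <= high


if __name__ == "__main__":
    for v in ["1.6.0", "1.8.1+cu111", "1.9.0"]:
        print(v, in_tested_range(v))
```

Tuple comparison keeps versions such as 1.10 correctly outside the range, which a plain string comparison would get wrong.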

Model Zoo

Fine-grained correspondence

Backbone	Config	J&F-Mean	J-Mean	F-Mean	Download	Inference cmd
ResNet-18	cfg	66.7	64.0	69.5	pretrain ckpt	cmd `./tools/dist_test.sh configs/r18_nc_sgd_cos_100e_r2_1xNx8_k400.py https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/r18_nc_sgd_cos_100e_r2_1xNx8_k400-db1a4c0d.pth 1 --eval davis --options test_cfg.save_np=True`
ResNet-50	cfg	69.5	67.0	72.0	pretrain ckpt	cmd `./tools/dist_test.sh configs/r50_nc_sgd_cos_100e_r5_1xNx2_k400.py https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/r50_nc_sgd_cos_100e_r5_1xNx2_k400-d7ce3ad0.pth 1 --eval davis --options test_cfg.save_np=True`

Note: We report the accuracy of the last block in res4. To evaluate all blocks, pass --options test_cfg.all_blocks=True. The reproduced performance in this repo is slightly higher than reported in the paper.
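J&F-Mean in the table above is the average of J-Mean (region similarity, the Jaccard index or IoU between predicted and ground-truth masks) and F-Mean (contour accuracy). The sketch below illustrates only J for a single pair of binary masks and the J&F average; the boundary F-measure is omitted, and the reported numbers come from the davis2017-evaluation package, not from this code. Treating two empty masks as a perfect match is an assumed convention. As a sanity check, (64.0 + 69.5) / 2 = 66.75 for the ResNet-18 row, which agrees with the reported 66.7 up to rounding of the per-metric scores.

```python
import numpy as np


def region_similarity(pred: np.ndarray, gt: np.ndarray) -> float:
    """J: intersection-over-union (Jaccard index) of two binary masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # assumed convention: both masks empty counts as a match
    return float(np.logical_and(pred, gt).sum() / union)


def jf_mean(j_mean: float, f_mean: float) -> float:
    """J&F-Mean: the average of the J-Mean and F-Mean scores."""
    return (j_mean + f_mean) / 2.0


if __name__ == "__main__":
    pred = np.array([[1, 1, 0], [0, 1, 0]])
    gt = np.array([[1, 0, 0], [0, 1, 1]])
    print(region_similarity(pred, gt))  # 2 overlapping / 4 in union = 0.5
    print(jf_mean(64.0, 69.5))          # 66.75
```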

Object-level correspondence

Backbone	Config	Precision	Success	Download	Inference cmd
ResNet-18	cfg	70.0	52.3	tracking ckpt	cmd `python projects/siamfc-pytorch/train_siamfc.py configs/r18_sgd_cos_100e_r2_1xNx8_k400.py --checkpoint https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/r18_sgd_cos_100e_r2_1xNx8_k400-e3b6a4bc.pth`
ResNet-50	cfg	73.9	52.5	tracking ckpt	cmd `python projects/siamfc-pytorch/train_siamfc.py configs/r50_sgd_cos_100e_r5_1xNx2_k400.py --checkpoint https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/r50_sgd_cos_100e_r5_1xNx2_k400-b7fb2a38.pth --options out_scale=0.00001 out_channels=2048`

Note: We fine-tune an extra linear layer. The reproduced performance in this repo is slightly higher than reported in the paper.
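Precision and Success are the standard OTB metrics. This is a minimal sketch assuming the usual definitions, as used by the got10k toolkit's OTB experiment: Precision is the fraction of frames whose predicted box center lies within 20 pixels of the ground-truth center, and Success is the area under the success curve, i.e. the mean, over 21 IoU thresholds evenly spaced in [0, 1], of the fraction of frames whose IoU exceeds the threshold. Boxes are (x, y, w, h). The numbers in the table come from the got10k evaluation and are reported as percentages.

```python
import numpy as np


def center_error(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Per-frame distance between box centers; boxes are (x, y, w, h) rows."""
    pred_c = pred[:, :2] + pred[:, 2:] / 2.0
    gt_c = gt[:, :2] + gt[:, 2:] / 2.0
    return np.linalg.norm(pred_c - gt_c, axis=1)


def box_iou(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Per-frame IoU between (x, y, w, h) boxes."""
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / np.maximum(union, 1e-12)  # guard against empty boxes


def precision_score(pred, gt, threshold=20.0) -> float:
    """Fraction of frames with center error <= threshold pixels."""
    return float(np.mean(center_error(pred, gt) <= threshold))


def success_score(pred, gt, num_thresholds=21) -> float:
    """Area under the success curve: mean over IoU thresholds in [0, 1]
    of the fraction of frames whose IoU exceeds the threshold."""
    overlaps = box_iou(pred, gt)
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    return float(np.mean([np.mean(overlaps > t) for t in thresholds]))


if __name__ == "__main__":
    gt = np.array([[0, 0, 10, 10], [30, 30, 10, 10]], dtype=float)
    pred = np.array([[0, 0, 10, 10], [0, 0, 10, 10]], dtype=float)
    print(precision_score(pred, gt))  # 0.5: the second frame is ~42 px off
    print(success_score(pred, gt))    # 10/21: IoUs are 1.0 and 0.0
```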

Data Preparation

We use Kinetics-400 for self-supervised correspondence pretraining.

The fine-grained correspondence is evaluated on DAVIS2017 without any fine-tuning.

The object-level correspondence is evaluated on OTB-100 under the linear probing setting (fine-tuning an extra linear layer).

The overall file structure is as follows:

vfs
├── mmaction
├── tools
├── configs
├── data
│   ├── kinetics400
│   │   ├── videos_train
│   │   │   ├── kinetics400_train_list_videos.txt
│   │   │   ├── train
│   │   │   │   ├── abseiling/
│   │   │   │   ├── air_drumming/
│   │   │   │   ├── ...
│   │   │   │   ├── yoga/
│   │   │   │   ├── zumba/
│   ├── davis
│   │   ├── DAVIS
│   │   │   ├── Annotations
│   │   │   │   ├── 480p
│   │   │   │   │   ├── bike-packing/
│   │   │   │   │   ├── ...
│   │   │   │   │   ├── soapbox/
│   │   │   ├── ImageSets
│   │   │   │   ├── 2017/
│   │   │   │   ├── davis2017_val_list_rawframes.txt
│   │   │   ├── JPEGImages
│   │   │   │   ├── 480p
│   │   │   │   │   ├── bike-packing/
│   │   │   │   │   ├── ...
│   │   │   │   │   ├── soapbox/
│   ├── otb
│   │   ├── Basketball/
│   │   ├── ...
│   │   ├── Woman/
│   ├── GOT-10k
│   │   ├── train
│   │   │   ├── GOT-10k_Train_000001/
│   │   │   ├── ...
│   │   │   ├── GOT-10k_Train_009335/
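Before launching training or evaluation, it can help to confirm that the layout above is in place. This hypothetical helper checks a subset of the paths shown in the tree, relative to the repository root; data/otb is left out because OTB-100 is downloaded automatically.

```python
from pathlib import Path

# A subset of the paths from the file tree above (not exhaustive).
# data/otb is omitted because OTB-100 is downloaded automatically.
EXPECTED = [
    "data/kinetics400/videos_train/kinetics400_train_list_videos.txt",
    "data/kinetics400/videos_train/train",
    "data/davis/DAVIS/Annotations/480p",
    "data/davis/DAVIS/ImageSets/2017",
    "data/davis/DAVIS/ImageSets/davis2017_val_list_rawframes.txt",
    "data/davis/DAVIS/JPEGImages/480p",
    "data/GOT-10k/train",
]


def missing_paths(root) -> list:
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("missing:\n" + "\n".join(missing))
    else:
        print("all expected dataset paths are in place")
```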

The instructions for preparing each dataset are as follows.

Kinetics-400

Please follow the documentation here to prepare Kinetics-400. The dataset can be downloaded from kinetics-dataset.
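The tree above lists kinetics400_train_list_videos.txt. Assuming it follows MMAction2's video-list annotation convention (one `relative/path.mp4 label` entry per line), a sketch like the following can parse it and report videos that are missing on disk. The function names, and the assumption that paths are relative to a video root you pass in, are hypothetical.

```python
from pathlib import Path


def parse_video_list(lines):
    """Parse 'relative/path.mp4 label' lines into (path, label) pairs.

    ASSUMPTION: MMAction2-style video list, one video per line, with the
    path and an integer label separated by whitespace.
    """
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        path, label = line.rsplit(maxsplit=1)  # rsplit tolerates spaces in paths
        samples.append((path, int(label)))
    return samples


def missing_videos(samples, video_root):
    """Return the paths (relative to video_root) that do not exist."""
    root = Path(video_root)
    return [path for path, _ in samples if not (root / path).exists()]


# Usage (paths are illustrative):
#   with open("data/kinetics400/videos_train/kinetics400_train_list_videos.txt") as f:
#       samples = parse_video_list(f)
#   print(len(missing_videos(samples, "data/kinetics400/videos_train")))
```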

DAVIS2017

The DAVIS2017 dataset can be downloaded from the official website. We use the 480p validation set for evaluation.

# download data
wget https://data.vision.ee.ethz.ch/csergi/share/davis/DAVIS-2017-trainval-480p.zip
# download filelist
wget https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/davis2017_val_list_rawframes.txt
# unzip data
unzip DAVIS-2017-trainval-480p.zip

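Then place the unzipped DAVIS folder under data/davis/ and the downloaded davis2017_val_list_rawframes.txt under data/davis/DAVIS/ImageSets/, following the file structure above.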

OTB-100

The OTB-100 frames and annotations will be downloaded automatically.

GOT-10k

Repo: https://github.com/got-10k/toolkit

Paper: https://arxiv.org/pdf/1810.11981.pdf

The GOT-10k dataset can be downloaded from the official website.

Then unzip it and place it according to the file structure above.

Links to download full_data.zip (choose the link that works best for you):

Baiduyun Disk: https://pan.baidu.com/s/15iXqOEBj99S8-VTpmsLiOg
Google Drive: https://drive.google.com/file/d/1b75MBq7MbDQUc682IoECIekoRim_Ydk1/view?usp=sharing

Links to download val_data.zip (choose the link that works best for you):

Baiduyun Disk: https://pan.baidu.com/s/1wj9AbST0HC2aCnXjy7aTng
Google Drive: https://drive.google.com/file/d/1ZJJZfftL_EEU61TwyHPhLmtZP7nq5QR7/view?usp=sharing

Links to download test_data.zip (choose the link that works best for you):

Baiduyun Disk: https://pan.baidu.com/s/1ygo7CPzNjbhlgjLyl4ANZg
Google Drive: https://drive.google.com/file/d/1Ni7W2r7_nojaQhVNMlya9szknrmKkkRq/view?usp=sharing

@article{huang2021got10k,
   title={GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild},
   volume={43},
   ISSN={1939-3539},
   url={https://dx.doi.org/10.1109/TPAMI.2019.2957464},
   DOI={10.1109/tpami.2019.2957464},
   number={5},
   journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
   publisher={Institute of Electrical and Electronics Engineers (IEEE)},
   author={Huang, Lianghua and Zhao, Xin and Huang, Kaiqi},
   year={2021},
   month={May},
   pages={1562--1577}
}

JHMDB

The split file for JHMDB is available at https://github.com/Liusifei/UVC/blob/jhmdb/testlist_split1.txt. The validation list for VIP is VIP_Fine/lists/val_videos.txt in the downloaded data.

Run Experiments

Pretrain

./tools/dist_train.sh ${CONFIG} ${GPUS}

We use 2 and 8 GPUs for ResNet-18 and ResNet-50 models respectively.

On RTX3090, ResNet-50 takes 1 week to train 500 epochs. ResNet-18 takes 2 days to train 100 epochs.
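For budgeting compute, these timings can be turned into GPU-hours as wall-clock days x 24 x number of GPUs. This sketch assumes the RTX 3090 timings were measured with the GPU counts stated above (8 GPUs for ResNet-50, 2 for ResNet-18) and that "1 week" means 7 days; both are assumptions, and the helper names are hypothetical.

```python
# Hypothetical helpers for rough compute budgeting.
# ASSUMPTION: timings were measured with 8 GPUs (ResNet-50) and
# 2 GPUs (ResNet-18), and "1 week" means 7 days.


def gpu_hours(days: float, num_gpus: int) -> float:
    """Total GPU-hours for a run of `days` wall-clock days on `num_gpus` GPUs."""
    return days * 24 * num_gpus


def gpu_hours_per_epoch(days: float, num_gpus: int, epochs: int) -> float:
    """Average GPU-hours spent per training epoch."""
    return gpu_hours(days, num_gpus) / epochs


if __name__ == "__main__":
    print("ResNet-50:", gpu_hours(7, 8), "GPU-hours for 500 epochs")  # 1344
    print("ResNet-18:", gpu_hours(2, 2), "GPU-hours for 100 epochs")  # 96
```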

Inference

To run the following inference and evaluation, we need to convert the pretrained checkpoint into the same format as torchvision ResNet.

python tools/convert_weights/convert_to_pretrained.py ${PRETRAIN_CHECKPOINT} ${BACKBONE_WEIGHT}
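Converting to the torchvision ResNet format essentially means renaming checkpoint keys. The sketch below illustrates what such a conversion typically involves; it is not the logic of convert_to_pretrained.py. It assumes an MMCV-style checkpoint whose `state_dict` stores backbone weights under a `backbone.` prefix with MMAction2 ConvModule-style names (e.g. `conv1.conv.weight`, `downsample.bn.bias`), and maps them to torchvision names (`conv1.weight`, `bn1.*`, `downsample.0.*`, `downsample.1.*`). The renaming rules are hypothetical; inspect your checkpoint's keys before relying on them.

```python
import re
from collections import OrderedDict

# HYPOTHETICAL renaming rules: MMAction2 ConvModule-style ResNet names ->
# torchvision ResNet names. Verify against your checkpoint's actual keys.
RENAME_RULES = [
    (re.compile(r"conv(\d)\.conv\."), r"conv\1."),  # conv1.conv.* -> conv1.*
    (re.compile(r"conv(\d)\.bn\."), r"bn\1."),      # conv1.bn.*   -> bn1.*
    (re.compile(r"downsample\.conv\."), "downsample.0."),
    (re.compile(r"downsample\.bn\."), "downsample.1."),
]


def to_torchvision_names(state_dict, prefix="backbone."):
    """Keep backbone weights only, strip `prefix`, and rename the keys."""
    converted = OrderedDict()
    for key, value in state_dict.items():
        if not key.startswith(prefix):
            continue  # drop heads/projectors that torchvision ResNet lacks
        name = key[len(prefix):]
        for pattern, repl in RENAME_RULES:
            name = pattern.sub(repl, name)
        converted[name] = value
    return converted


# Usage with PyTorch (assumes an MMCV-style checkpoint with a "state_dict" key):
#   import torch
#   ckpt = torch.load("pretrain.pth", map_location="cpu")
#   torch.save(to_torchvision_names(ckpt["state_dict"]), "backbone.pth")
```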

Evaluate fine-grained correspondence on DAVIS2017

./tools/dist_test.sh ${CONFIG} ${BACKBONE_WEIGHT} ${GPUS}  --eval davis

You may pass --options test_cfg.save_np=True to save memory.

Inference cmd examples:

# testing r18 model
./tools/dist_test.sh configs/r18_nc_sgd_cos_100e_r2_1xNx8_k400.py https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/r18_nc_sgd_cos_100e_r2_1xNx8_k400-db1a4c0d.pth 1  --eval davis --options test_cfg.save_np=True
# testing r50 model
./tools/dist_test.sh configs/r50_nc_sgd_cos_100e_r5_1xNx2_k400.py https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/r50_nc_sgd_cos_100e_r5_1xNx2_k400-d7ce3ad0.pth 1  --eval davis --options test_cfg.save_np=True

Evaluate object-level correspondence

ResNet-18:

 python projects/siamfc-pytorch/train_siamfc.py ${CONFIG} --pretrained ${BACKBONE_WEIGHT}

ResNet-50:

 python projects/siamfc-pytorch/train_siamfc.py ${CONFIG} --pretrained ${BACKBONE_WEIGHT} --options out_scale=0.00001 out_channels=2048

The results will be saved in work_dirs/${CONFIG}/siamfc.
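The ResNet-50 command above passes out_scale=0.00001 and out_channels=2048 (2048 is the output width of ResNet-50's last stage, versus 512 for ResNet-18). In SiamFC, the response map is the cross-correlation of exemplar features with search-region features, multiplied by a constant out_scale. Below is a naive NumPy sketch of such a head, assuming SiamFC-PyTorch's head follows this form; one plausible reading of the smaller out_scale for ResNet-50 is that summing over 2048 channels produces larger raw responses. The loops favor clarity over speed.

```python
import numpy as np


def xcorr_response(z: np.ndarray, x: np.ndarray, out_scale: float) -> np.ndarray:
    """Slide exemplar features z (C, h, w) over search features x (C, H, W)
    and return the scaled (H - h + 1, W - w + 1) response map."""
    c, h, w = z.shape
    cx, H, W = x.shape
    assert c == cx, "z and x must have the same number of channels"
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            # inner product over all channels and the spatial window
            out[i, j] = np.sum(z * x[:, i:i + h, j:j + w])
    return out * out_scale


if __name__ == "__main__":
    z = np.ones((2, 1, 1))
    x = np.arange(8, dtype=float).reshape(2, 2, 2)
    print(xcorr_response(z, x, out_scale=0.5))  # [[2. 3.] [4. 5.]]
```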

To run inference with the provided tracking checkpoints:

 python projects/siamfc-pytorch/train_siamfc.py ${CONFIG} --checkpoint ${TRACKING_CHECKPOINT}

Inference cmd examples:

# testing r18 model
python projects/siamfc-pytorch/train_siamfc.py configs/r18_sgd_cos_100e_r2_1xNx8_k400.py --checkpoint https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/r18_sgd_cos_100e_r2_1xNx8_k400-e3b6a4bc.pth
# testing r50 model
python projects/siamfc-pytorch/train_siamfc.py configs/r50_sgd_cos_100e_r5_1xNx2_k400.py --checkpoint https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/r50_sgd_cos_100e_r5_1xNx2_k400-b7fb2a38.pth --options out_scale=0.00001 out_channels=2048

Other

The similarity matrix between two video frames is calculated in vanilla_tracker.py: https://github.com/xvjiarui/VFS/blob/master/mmaction/models/trackers/vanilla_tracker.py#L150
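Given the paper's frame-level similarity perspective, it may help to see a frame-level score next to a pixel-level similarity matrix. This is a generic NumPy illustration, not the code in vanilla_tracker.py: `pixel_affinity` builds a softmax-normalized cosine-similarity matrix between all pixels of a target and a source feature map, `propagate_labels` uses it to carry per-pixel labels (e.g. one-hot masks) from the source frame to the target frame, and `frame_similarity` compares the two frames through globally average-pooled features. The temperature of 0.07 and all function names are assumptions.

```python
import numpy as np


def l2_normalize(x, axis, eps=1e-12):
    """Scale vectors along `axis` to unit L2 norm."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def pixel_affinity(target, source, temperature=0.07):
    """Affinity between all pixels of two (C, H, W) feature maps.

    Returns an (H*W, H*W) matrix whose row i is a softmax distribution
    over source pixels for target pixel i.
    """
    c = target.shape[0]
    a = l2_normalize(target.reshape(c, -1), axis=0)  # (C, HW_t)
    b = l2_normalize(source.reshape(c, -1), axis=0)  # (C, HW_s)
    sim = a.T @ b / temperature                      # cosine similarity / tau
    sim -= sim.max(axis=1, keepdims=True)            # numerical stability
    weights = np.exp(sim)
    return weights / weights.sum(axis=1, keepdims=True)


def propagate_labels(affinity, source_labels):
    """Map per-pixel source labels (HW_s, K) to target labels (HW_t, K)."""
    return affinity @ source_labels


def frame_similarity(f1, f2):
    """Cosine similarity of globally average-pooled (C, H, W) features."""
    v1 = l2_normalize(f1.mean(axis=(1, 2)), axis=0)
    v2 = l2_normalize(f2.mean(axis=(1, 2)), axis=0)
    return float(v1 @ v2)
```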

Acknowledgements

The codebase is based on MMAction2. The fine-grained correspondence inference and evaluation follow TimeCycle, UVC and videowalk. The object-level correspondence inference and evaluation are based on SiamFC-PyTorch and vince.

Thank you all for the great open source repositories!
