Skip to content
/ VFS Public
forked from xvjiarui/VFS

Rethinking Self-Supervised Correspondence Learning: A Video Frame-level Similarity Perspective, in ICCV 2021 (Oral)

License

Notifications You must be signed in to change notification settings

JerryX1110/VFS

 
 

Repository files navigation

Rethinking Self-Supervised Correspondence Learning: A Video Frame-level Similarity Perspective

This repository is the official implementation for VFS introduced in the paper:

Rethinking Self-Supervised Correspondence Learning: A Video Frame-level Similarity Perspective
Jiarui Xu, Xiaolong Wang
ICCV 2021 (Oral)

The project page with video is at https://jerryxu.net/VFS/.

UPDATE

- 2021.12.18: Solve some bugs when building the original dockerfile.

Citation

If you find our work useful in your research, please cite:

@article{xu2021rethinking,
  title={Rethinking Self-Supervised Correspondence Learning: A Video Frame-level Similarity Perspective},
  author={Xu, Jiarui and Wang, Xiaolong},
  journal={arXiv preprint arXiv:2103.17263},
  year={2021}
}

Environmental Setup

  • Python 3.7
  • PyTorch 1.6-1.8
  • mmaction2
  • davis2017-evaluation
  • got10k

The codebase is implemented based on the awesome MMAction2, please follow the install instruction of MMAction2 to setup the environment.

Quick start full script:

conda create -n vfs python=3.7 -y
conda activate vfs
conda install pytorch==1.8.0 torchvision==0.9.0 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.8.0/index.html
# install customized evaluation API for DAVIS
pip install git+https://github.com/xvjiarui/davis2017-evaluation
# install evaluation API for OTB
pip install got10k

# install VFS
git clone https://github.com/xvjiarui/VFS/
cd VFS
pip install -e .

We also provide the Dockerfile under docker/ folder. (~12.6G)

The code is developed and tested based on PyTorch 1.6-1.8. It also runs smoothly with PyTorch 1.9 but the accuracy is slightly worse for OTB evaluation. Please feel free to open a PR if you find the reason.

Model Zoo

Fine-grained correspondence

Backbone Config J&F-Mean J-Mean F-Mean Download Inference cmd
ResNet-18 cfg 66.7 64.0 69.5 pretrain ckpt
cmd./tools/dist_test.sh configs/r18_nc_sgd_cos_100e_r2_1xNx8_k400.py https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/r18_nc_sgd_cos_100e_r2_1xNx8_k400-db1a4c0d.pth 1 --eval davis --options test_cfg.save_np=True
ResNet-50 cfg 69.5 67.0 72.0 pretrain ckpt
cmd./tools/dist_test.sh configs/r50_nc_sgd_cos_100e_r5_1xNx2_k400.py https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/r50_nc_sgd_cos_100e_r5_1xNx2_k400-d7ce3ad0.pth 1 --eval davis --options test_cfg.save_np=True

Note: We report the accuracy of the last block in res4, to evaluate all blocks, please pass --options test_cfg.all_blocks=True. The reproduced performance in this repo is slightly higher than reported in the paper.

Object-level correspondence

Backbone Config Precision Success Download Inference cmd
ResNet-18 cfg 70.0 52.3 tracking ckpt
cmdpython projects/siamfc-pytorch/train_siamfc.py configs/r18_sgd_cos_100e_r2_1xNx8_k400.py --checkpoint https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/r18_sgd_cos_100e_r2_1xNx8_k400-e3b6a4bc.pth
ResNet-50 cfg 73.9 52.5 tracking ckpt
cmdpython projects/siamfc-pytorch/train_siamfc.py configs/r50_sgd_cos_100e_r5_1xNx2_k400.py --checkpoint https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/r50_sgd_cos_100e_r2_1xNx2_k400-b7fb2a38.pth --options out_scale=0.00001 out_channels=2048

Note: We fine-tune an extra linear layer. The reproduced performance in this repo is slightly higher than reported in the paper.

Data Preparation

We use Kinetics-400 for self-supervised correspondence pretraining.

The fine-grained correspondence is evaluated on DAVIS2017 w/o any fine-tuning.

The object-level correspondence is evaluated on OTB-100 under linear probing setting (fine-tuning an extra linear layer).

The overall file structure is as followed:

vfs
├── mmaction
├── tools
├── configs
├── data
│   ├── kinetics400
│   │   ├── videos_train
│   │   │   ├── kinetics400_train_list_videos.txt
│   │   │   ├── train
│   │   │   │   ├── abseiling/
│   │   │   │   ├── air_drumming/
│   │   │   │   ├── ...
│   │   │   │   ├── yoga/
│   │   │   │   ├── zumba/
│   ├── davis
│   │   ├── DAVIS
│   │   │   ├── Annotations
│   │   │   │   ├── 480p
│   │   │   │   │   ├── bike-packing/
│   │   │   │   │   ├── ...
│   │   │   │   │   ├── soapbox/
│   │   │   ├── ImageSets
│   │   │   │   ├── 2017/
│   │   │   │   ├── davis2017_val_list_rawframes.txt
│   │   │   ├── JPEGImages
│   │   │   │   ├── 480p
│   │   │   │   │   ├── bike-packing/
│   │   │   │   │   ├── ...
│   │   │   │   │   ├── soapbox/
│   ├── otb
│   │   ├── Basketball/
│   │   ├── ...
│   │   ├── Woman/
│   ├── GOT-10k
│   │   ├── train
│   │   │   ├── GOT-10k_Train_000001/
│   │   │   ├── ...
│   │   │   ├── GOT-10k_Train_009335/

The instructions for preparing each dataset are as followed.

Kinetics-400

Please follow the documentation here to prepare the Kinetics-400. The dataset could be downloaded from kinetics-dataset.

DAVIS2017

DAVIS2017 dataset could be downloaded from the official website. We use the 480p validation set for evaluation.

# download data
wget https://data.vision.ee.ethz.ch/csergi/share/davis/DAVIS-2017-trainval-480p.zip
# download filelist
wget https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/davis2017_val_list_rawframes.txt
# unzip data
unzip DAVIS-2017-trainval-480p.zip

Then please unzip and place them according to the file structure above.

OTB-100

The OTB-100 frames and annotations will be downloaded automatically.

GOT-10k

image

image

repo:https://github.com/got-10k/toolkit

paper:https://arxiv.org/pdf/1810.11981.pdf

GOT-10k dataset could be downloaded from the official website.

Then please unzip and place them according to the file structure above.

  • Links to download full_data.zip (choose the link that works best for you):
  • Links to download val_data.zip (choose the link that works best for you):
  • Links to download test_data.zip (choose the link that works best for you):
@article{2021,
   title={GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild},
   volume={43},
   ISSN={1939-3539},
   url={http:https://dx.doi.org/10.1109/TPAMI.2019.2957464},
   DOI={10.1109/tpami.2019.2957464},
   number={5},
   journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
   publisher={Institute of Electrical and Electronics Engineers (IEEE)},
   author={Huang, Lianghua and Zhao, Xin and Huang, Kaiqi},
   year={2021},
   month={May},
   pages={1562–1577}
}

JHMDB

Please find the split txt file for JHMDB here:https://github.com/Liusifei/UVC/blob/jhmdb/testlist_split1.txt. The validation list for VIP is VIP_Fine/lists/val_videos.txt from your download data.

Run Experiments

Pretrain

./tools/dist_train.sh ${CONFIG} ${GPUS}

We use 2 and 8 GPUs for ResNet-18 and ResNet-50 models respectively.

Official: 2 GPUs for ResNet-18 and 8 GPUs for ResNet-50.

On RTX3090, ResNet-50 takes 1 week to train 500 epochs. ResNet-18 takes 2 days to train 100 epochs.

Inference

To run the following inference and evaluation, we need to convert the pretrained checkpoint into the same format as torchvision ResNet.

python tools/convert_weights/convert_to_pretrained.py ${PRETRAIN_CHECKPOINT} ${BACKBONE_WEIGHT}

Evaluate fine-grained correspondence on DAVIS2017

./tools/dist_test.sh ${CONFIG} ${BACKBONE_WEIGHT} ${GPUS}  --eval davis

You may pass --options test_cfg.save_np=True to save memory.

Inference cmd examples:

# testing r18 model
./tools/dist_test.sh configs/r18_nc_sgd_cos_100e_r2_1xNx8_k400.py https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/r18_nc_sgd_cos_100e_r2_1xNx8_k400-db1a4c0d.pth 1  --eval davis --options test_cfg.save_np=True
# testing r50 model
./tools/dist_test.sh configs/r50_nc_sgd_cos_100e_r5_1xNx2_k400.py https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/r50_nc_sgd_cos_100e_r5_1xNx2_k400-d7ce3ad0.pth 1  --eval davis --options test_cfg.save_np=True

Evaluate object-level correspondence

ResNet-18:

 python projects/siamfc-pytorch/train_siamfc.py ${CONFIG} --pretrained ${BACKBONE_WEIGHT}

ResNet-50:

 python projects/siamfc-pytorch/train_siamfc.py ${CONFIG} --pretrained ${BACKBONE_WEIGHT} --options out_scale=0.00001 out_channels=2048

The results will be saved in work_dirs/${CONFIG}/siamfc.

To inference with provided tracking checkpoints:

 python projects/siamfc-pytorch/train_siamfc.py ${CONFIG} --checkpoint ${TRACKING_CHECKPOINT}

Inference cmd examples:

# testing r18 model
python projects/siamfc-pytorch/train_siamfc.py configs/r18_sgd_cos_100e_r2_1xNx8_k400.py --checkpoint https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/r18_sgd_cos_100e_r2_1xNx8_k400-e3b6a4bc.pth
# testing r50 model
python projects/siamfc-pytorch/train_siamfc.py configs/r50_sgd_cos_100e_r5_1xNx2_k400.py --checkpoint https://github.com/xvjiarui/VFS/releases/download/v0.1-rc1/r50_sgd_cos_100e_r5_1xNx2_k400-b7fb2a38.pth --options out_scale=0.00001 out_channels=2048

Other

calculate the frame similarity matrix between two video frames https://github.com/xvjiarui/VFS/blob/master/mmaction/models/trackers/vanilla_tracker.py#L150

Acknowledgements

The codebase is based on MMAction2. The fine-grained correspondence inference and evaluation follows TimeCycle, UVC and videowalk. The object-level correspondence inference and evaluation is based on SiamFC-PyTorch and vince.

Thank you all for the great open source repositories!

About

Rethinking Self-Supervised Correspondence Learning: A Video Frame-level Similarity Perspective, in ICCV 2021 (Oral)

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 97.2%
  • Shell 2.5%
  • Dockerfile 0.3%