
zhang-tao-whu/CTVIS_VIT

 
 


CTVIS: Consistent Training for Online Video Instance Segmentation

Kaining Ying1,2*,   Qing Zhong4*,   Weian Mao4,   Zhenhua Wang3#,   Hao Chen1#

Lin Yuanbo Wu5,   Yifan Liu4,   Chengxiang Fan1,   Yunzhi Zhuge4,   Chunhua Shen1

1Zhejiang University,   2Zhejiang University of Technology

3Northwest A&F University,   4The University of Adelaide,   5Swansea University

📰 News

  • [2023/06/18] CTVIS wins 2nd place in the Pixel-level Video Understanding Challenge (VPS Track) at CVPR 2023.
  • [2023/07/14] CTVIS is accepted by ICCV 2023! Congrats! ✌️
  • [2023/07/24] We will release the code ASAP. Stay tuned!
  • [2023/07/31] We release the code and weights for YTVIS19_R50.
  • [2023/08/24] CTVIS wins 2nd place in the 5th Large-scale Video Object Segmentation Challenge - Track 2: Video Instance Segmentation at ICCV 2023.

🔨 Install

The following commands build the conda environment.

conda create -n ctvis python=3.10 -y 
conda activate ctvis
pip install torch==2.0.0 torchvision  

# install D2
git clone https://gitee.com/yingkaining/detectron2.git
python -m pip install -e detectron2

# install mmcv
pip install openmim
mim install "mmcv==1.7.1"

pip install -r requirements.txt

cd mask2former/modeling/pixel_decoder/ops
sh make.sh
cd ../../../../
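
After running the commands above, it can help to confirm that the main dependencies are visible to the interpreter before starting a long job. The following is a minimal sketch, not part of the repository: it only checks that the top-level modules `torch`, `torchvision`, `detectron2` and `mmcv` can be located, and does not verify versions or the compiled ops built by `make.sh`. The helper name `missing_packages` is hypothetical.

```python
import importlib.util

# Top-level module names of the packages installed in the steps above.
REQUIRED = ["torch", "torchvision", "detectron2", "mmcv"]


def missing_packages(names):
    """Return the module names that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Run it inside the activated `ctvis` environment; any name it reports points to an install step that needs to be repeated.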

🏀 Dataset Preparation

We recommend organizing the datasets in the following layout; refer to this for more details.

$DETECTRON2_DATASETS
+-- coco
|   |
|   +-- annotations
|   |   |
|   |   +-- instances_{train,val}2017.json
|   |   +-- coco2ytvis2019_train.json
|   |   +-- coco2ytvis2021_train.json
|   |   +-- coco2ovis_train.json
|   |
|   +-- {train,val}2017
|       |
|       +-- *.jpg
|
+-- ytvis_2019
|   ...
|
+-- ytvis_2021
|   ...
|
+-- ovis
    ...

Note that the annotation files coco2ytvis2019_train.json, coco2ytvis2021_train.json and coco2ovis_train.json are generated by the following command:

python tools/convert_coco2ytvis.py
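
Once the converted annotations exist, the directory layout shown above can be checked with a short script. This is a sketch under stated assumptions, not a tool shipped with the repository: the entries in `EXPECTED` are copied from the tree above, the helper name `missing_entries` is hypothetical, and falling back to `datasets` when `DETECTRON2_DATASETS` is unset is an assumption about the local setup. The contents of `ytvis_2019`, `ytvis_2021` and `ovis` are elided in the tree, so only the directories themselves are checked.

```python
import os
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED = [
    "coco/annotations/instances_train2017.json",
    "coco/annotations/instances_val2017.json",
    "coco/annotations/coco2ytvis2019_train.json",
    "coco/annotations/coco2ytvis2021_train.json",
    "coco/annotations/coco2ovis_train.json",
    "coco/train2017",
    "coco/val2017",
    "ytvis_2019",
    "ytvis_2021",
    "ovis",
]


def missing_entries(root):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    # Assumption: use ./datasets when DETECTRON2_DATASETS is not set.
    dataset_root = os.environ.get("DETECTRON2_DATASETS", "datasets")
    for rel in missing_entries(dataset_root):
        print("missing:", rel)
```

An empty output means every listed file and directory was found under the dataset root.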

If you want to visualize the dataset, you can use the following script (YTVIS19):

python browse_datasets.py ytvis_2019_train --save-dir /path/to/save/dir

⚾️ Training and Evaluation

Training

We initialize CTVIS with the weights of Mask2Former pretrained on MS-COCO. Download them first and place them in checkpoints/.

Mask2Former-R50-COCO: Official Download Link

Mask2Former-SwinL-COCO: Official Download Link

Next, you can train CTVIS, for example on YTVIS19 with the R50 backbone:

python train_ctvis.py --config-file configs/ytvis_2019/CTVIS_R50.yaml --num-gpus 8 OUTPUT_DIR work_dirs/CTVIS_YTVIS19_R50

Evaluation

During training, the model is evaluated on the validation set periodically. You can also evaluate a model separately, like this:

python train_ctvis.py --config-file configs/ytvis_2019/CTVIS_R50.yaml --eval-only --num-gpus 8 OUTPUT_DIR work_dirs/CTVIS_YTVIS19_R50 MODEL.WEIGHTS /path/to/model/weight/file
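
The training and evaluation commands above differ only in the `--eval-only` flag and the `MODEL.WEIGHTS` override, so a script that launches several runs can build both from one helper. The sketch below is illustrative and assumes nothing beyond the flags shown above; the function name `build_command` is hypothetical and not part of the repository.

```python
import shlex


def build_command(config_file, output_dir, num_gpus=8, weights=None):
    """Build the train_ctvis.py argument list.

    When weights is given, the command runs evaluation only and loads
    that checkpoint; otherwise it is a plain training command.
    """
    cmd = ["python", "train_ctvis.py", "--config-file", config_file]
    if weights is not None:
        cmd.append("--eval-only")
    cmd += ["--num-gpus", str(num_gpus), "OUTPUT_DIR", output_dir]
    if weights is not None:
        cmd += ["MODEL.WEIGHTS", weights]
    return cmd


if __name__ == "__main__":
    train_cmd = build_command(
        "configs/ytvis_2019/CTVIS_R50.yaml",
        "work_dirs/CTVIS_YTVIS19_R50",
    )
    # Print the command as a single shell-safe string.
    print(shlex.join(train_cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the repository root.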

You can download the model weights from the Model Zoo. To obtain the AP, the submission files must be uploaded to CodaLab. We recommend using the following script to push the submission to CodaLab. We appreciate this project for providing such a useful feature.

python tools/codalab_upload.py --result-dir /path/to/your/submission/dir --id ytvis19 --account your_codalab_account_email --password your_codalab_account_password
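
Typing the account password directly on the command line leaves it in the shell history. One possible workaround, sketched below, reads the credentials from environment variables and builds the same command. The variable names `CODALAB_ACCOUNT` and `CODALAB_PASSWORD` and the helper `build_upload_command` are assumptions made for this example, not something the upload script defines. The password is still passed as a process argument, so this only keeps it out of the history file.

```python
import os


def build_upload_command(result_dir, challenge_id, env=os.environ):
    """Build the codalab_upload.py argument list from environment credentials."""
    try:
        # Hypothetical variable names chosen for this sketch.
        account = env["CODALAB_ACCOUNT"]
        password = env["CODALAB_PASSWORD"]
    except KeyError as exc:
        raise RuntimeError(f"environment variable {exc.args[0]} is not set") from exc
    return [
        "python", "tools/codalab_upload.py",
        "--result-dir", result_dir,
        "--id", challenge_id,
        "--account", account,
        "--password", password,
    ]
```

With both variables exported, `subprocess.run(build_upload_command("/path/to/your/submission/dir", "ytvis19"), check=True)` would launch the upload from the repository root.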

Demo and Visualization

We support inference on specified videos (demo/demo.py) as well as visualization of all videos in a given dataset (demo/visualize_all_videos.py).

# demo
python demo/demo.py --config-file configs/ytvis_2019/CTVIS_R50.yaml --video-input /path/to/input/video --output /path/to/save/output --save-frames --opts MODEL.WEIGHTS /path/to/your/checkpoint

💽 Model Zoo

YouTube-VIS 2019

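| Model | Backbone | AP | AP50 | AP75 | AR1 | AR10 | Link |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CTVIS | ResNet-50 | 55.2 | 79.5 | 60.2 | 51.3 | 63.7 | 1Drive |
| CTVIS | Swin-L (200 queries) | 65.6 | 87.7 | 72.2 | 56.5 | 70.4 | |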

YouTube-VIS 2021

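| Model | Backbone | AP | AP50 | AP75 | AR1 | AR10 | Link |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CTVIS | ResNet-50 | 50.1 | 73.7 | 54.7 | 41.8 | 59.5 | |
| CTVIS | Swin-L (200 queries) | 61.2 | 84 | 68.8 | 48 | 65.8 | |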

YouTube-VIS 2022

Note: YouTube-VIS 2022 shares the same training set as YouTube-VIS 2021.

| Model | Backbone | AP | APS | APL | Link |
| --- | --- | --- | --- | --- | --- |
| CTVIS | ResNet-50 | 44.9 | 50.3 | 39.4 | |
| CTVIS | Swin-L (200 queries) | 53.8 | 61.2 | 46.4 | |

OVIS

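| Model | Backbone | AP | AP50 | AP75 | AR1 | AR10 | Link |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CTVIS | ResNet-50 | 35.5 | 60.8 | 34.9 | 16.1 | 41.9 | |
| CTVIS | Swin-L (200 queries) | 46.9 | 71.5 | 47.5 | 19.1 | 52.1 | |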

🫡 Acknowledgements

We sincerely thank HIGH-FLYER for providing valuable computational resources. We are also grateful to the following open-source projects for their inspiration:

🪪 License

The content of this project itself is licensed under LICENSE.

📇 Cite our Paper

If you find this project useful in your research, please cite our paper.

@misc{ying2023ctvis,
      title={{CTVIS}: {C}onsistent {T}raining for {O}nline {V}ideo {I}nstance {S}egmentation}, 
      author={Kaining Ying and Qing Zhong and Weian Mao and Zhenhua Wang and Hao Chen and Lin Yuanbo Wu and Yifan Liu and Chengxiang Fan and Yunzhi Zhuge and Chunhua Shen},
      year={2023},
      eprint={2307.12616},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
