Kaining Ying1,2*, Qing Zhong4*, Weian Mao4, Zhenhua Wang3#, Hao Chen1#
Lin Yuanbo Wu5, Yifan Liu4, Chengxiang Fan1, Yunzhi Zhuge4, Chunhua Shen1
1Zhejiang University, 2Zhejiang University of Technology
3Northwest A&F University, 4The University of Adelaide, 5Swansea University
- [2023/06/18] CTVIS wins the 2nd Place in the Pixel-level Video Understanding Challenge (VPS Track) at CVPR 2023.
- [2023/07/14] Our work CTVIS is accepted by ICCV 2023! Congrats! ✌️
- [2023/07/24] We will release the code ASAP. Stay tuned!
- [2023/07/31] We release the code and weights on YTVIS19_R50.
- [2023/08/24] CTVIS wins the 2nd Place in The 5th Large-scale Video Object Segmentation Challenge - Track 2: Video Instance Segmentation at ICCV 2023.
Here we provide the command lines to build the conda environment.
conda create -n ctvis python=3.10 -y
conda activate ctvis
pip install torch==2.0.0 torchvision
# install D2
git clone https://gitee.com/yingkaining/detectron2.git
python -m pip install -e detectron2
# install mmcv
pip install openmim
mim install "mmcv==1.7.1"
pip install -r requirements.txt
cd mask2former/modeling/pixel_decoder/ops
sh make.sh
cd ../../../../
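After installation, you can run a quick sanity check (a minimal sketch; it merely verifies that the key packages import and that CUDA is visible):

python -c "import torch, detectron2, mmcv; print(torch.__version__, torch.cuda.is_available())"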
We recommend that you organize the datasets in the following format, and refer to this for more details.
$DETECTRON2_DATASETS
+-- coco
| |
| +-- annotations
| | |
| | +-- instances_{train,val}2017.json
| | +-- coco2ytvis2019_train.json
| | +-- coco2ytvis2021_train.json
| | +-- coco2ovis_train.json
| |
| +-- {train,val}2017
| |
| +-- *.jpg
|
+-- ytvis_2019
| ...
|
+-- ytvis_2021
| ...
|
+-- ovis
...
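detectron2 resolves the dataset root from the DETECTRON2_DATASETS environment variable (defaulting to ./datasets when unset), so point it at the directory above first:

export DETECTRON2_DATASETS=/path/to/your/datasets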
Note that the annotations `coco2ytvis2019_train.json`, `coco2ytvis2021_train.json` and `coco2ovis_train.json` are generated by the following command:
python tools/convert_coco2ytvis.py
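Afterwards, you can quickly confirm that the converted annotations exist, e.g.:

ls $DETECTRON2_DATASETS/coco/annotations/coco2*_train.json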
If you want to visualize the dataset, you can use the following script (YTVIS19):
python browse_datasets.py ytvis_2019_train --save-dir /path/to/save/dir
We use the weights of Mask2Former pretrained on MS-COCO as initialization. You should download them first and place them in the `checkpoints/` directory.
Mask2Former-R50-COCO: Official Download Link
Mask2Former-SwinL-COCO: Official Download Link
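For example (a sketch only; the checkpoint filename is illustrative and depends on the file you downloaded):

mkdir -p checkpoints
mv /path/to/downloaded/mask2former_r50_coco.pkl checkpoints/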
Next, you can train CTVIS, for example on YTVIS19 with the R50 backbone.
python train_ctvis.py --config-file configs/ytvis_2019/CTVIS_R50.yaml --num-gpus 8 OUTPUT_DIR work_dirs/CTVIS_YTVIS19_R50
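If you train with fewer GPUs, you can override the batch size and learning rate on the command line in the usual detectron2 fashion (a sketch; scale the values to match your setup):

python train_ctvis.py --config-file configs/ytvis_2019/CTVIS_R50.yaml --num-gpus 4 OUTPUT_DIR work_dirs/CTVIS_YTVIS19_R50 SOLVER.IMS_PER_BATCH 8 SOLVER.BASE_LR 0.00005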
Typically, the model is evaluated on the validation set periodically during training. You can also evaluate the model separately, like this:
python train_ctvis.py --config-file configs/ytvis_2019/CTVIS_R50.yaml --eval-only --num-gpus 8 OUTPUT_DIR work_dirs/CTVIS_YTVIS19_R50 MODEL.WEIGHTS /path/to/model/weight/file
You can download the model weights from the Model Zoo. Finally, we need to submit the submission files to CodaLab to get the AP. We recommend using the following script to push the submission to CodaLab. We appreciate this project for providing such a useful feature.
python tools/codalab_upload.py --result-dir /path/to/your/submission/dir --id ytvis19 --account your_codalab_account_email --password your_codalab_account_password
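If you prefer to submit manually instead, the CodaLab servers for these benchmarks typically expect a zip archive of the results file (a sketch, assuming the predictions were saved as results.json):

cd /path/to/your/submission/dir && zip submission.zip results.json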
We support inference on specified videos (`demo/demo.py`) as well as visualization of all videos in a given dataset (`demo/visualize_all_videos.py`).
# demo
python demo/demo.py --config-file configs/ytvis_2019/CTVIS_R50.yaml --video-input /path/to/input/video --output /path/to/save/output --save-frames --opts MODEL.WEIGHTS /path/to/your/checkpoint
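For visualizing every video in a dataset, here is a sketch of the second script's invocation (we assume it follows the same flag conventions as demo/demo.py):

python demo/visualize_all_videos.py --config-file configs/ytvis_2019/CTVIS_R50.yaml --output /path/to/save/output --opts MODEL.WEIGHTS /path/to/your/checkpoint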
YouTube-VIS 2019:

| Model | Backbone | AP | AP50 | AP75 | AR1 | AR10 | Link |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CTVIS | ResNet-50 | 55.2 | 79.5 | 60.2 | 51.3 | 63.7 | 1Drive |
| CTVIS | Swin-L (200 queries) | 65.6 | 87.7 | 72.2 | 56.5 | 70.4 | |
YouTube-VIS 2021:

| Model | Backbone | AP | AP50 | AP75 | AR1 | AR10 | Link |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CTVIS | ResNet-50 | 50.1 | 73.7 | 54.7 | 41.8 | 59.5 | |
| CTVIS | Swin-L (200 queries) | 61.2 | 84.0 | 68.8 | 48.0 | 65.8 | |
Note: YouTube-VIS 2022 shares the same training set as YouTube-VIS 2021.
YouTube-VIS 2022:

| Model | Backbone | AP | APS | APL | Link |
| --- | --- | --- | --- | --- | --- |
| CTVIS | ResNet-50 | 44.9 | 50.3 | 39.4 | |
| CTVIS | Swin-L (200 queries) | 53.8 | 61.2 | 46.4 | |
OVIS:

| Model | Backbone | AP | AP50 | AP75 | AR1 | AR10 | Link |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CTVIS | ResNet-50 | 35.5 | 60.8 | 34.9 | 16.1 | 41.9 | |
| CTVIS | Swin-L (200 queries) | 46.9 | 71.5 | 47.5 | 19.1 | 52.1 | |
We sincerely appreciate HIGH-FLYER for providing valuable computational resources. At the same time, we would like to express our gratitude to the following open source projects for their inspiration:
The content of this project itself is licensed under the terms in LICENSE.
If you find this project useful for your research, please kindly cite our paper:
@misc{ying2023ctvis,
title={{CTVIS}: {C}onsistent {T}raining for {O}nline {V}ideo {I}nstance {S}egmentation},
author={Kaining Ying and Qing Zhong and Weian Mao and Zhenhua Wang and Hao Chen and Lin Yuanbo Wu and Yifan Liu and Chengxiang Fan and Yunzhi Zhuge and Chunhua Shen},
year={2023},
eprint={2307.12616},
archivePrefix={arXiv},
primaryClass={cs.CV}
}