A Generalized Framework for Video Instance Segmentation (CVPR 2023)

Miran Heo, Sukjun Hwang, Jeongseok Hyun, Hanjung Kim, Seoung Wug Oh, Joon-Young Lee, Seon Joo Kim

[arXiv] [BibTeX]

Updates

Feb 28, 2023: GenVIS is accepted to CVPR 2023!
Jan 20, 2023: Code is now available!

Installation

GenVIS is built upon VITA. See installation instructions.

Getting Started

We provide a script, train_net_genvis.py, that can train all of the configs provided in GenVIS.

To train a model with train_net_genvis.py on VIS, first set up the corresponding datasets following Preparing Datasets.

Then start training from the weights pretrained on the target VIS dataset, which are available in VITA's Model Zoo:

python train_net_genvis.py --num-gpus 4 \
  --config-file configs/genvis/ovis/genvis_R50_bs8_online.yaml \
  MODEL.WEIGHTS vita_r50_ovis.pth

To evaluate a model's performance, run:

python train_net_genvis.py --num-gpus 4 \
  --config-file configs/genvis/ovis/genvis_R50_bs8_online.yaml \
  --eval-only MODEL.WEIGHTS /path/to/checkpoint_file
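
The two commands above differ only in the --eval-only flag and the weights path, so they can be assembled from the same parts when scripting several runs. The following is a minimal sketch, not part of the repository: the helper name build_genvis_command and its parameters are hypothetical, and it only reproduces the flags shown above (--num-gpus, --config-file, --eval-only, and the MODEL.WEIGHTS override).

```python
import shlex
from typing import List


def build_genvis_command(
    config_file: str,
    weights: str,
    num_gpus: int = 4,
    eval_only: bool = False,
) -> List[str]:
    """Assemble the argument list for train_net_genvis.py.

    Hypothetical helper for illustration; it mirrors the commands in this
    README and does not inspect the script's actual argument parser.
    """
    cmd = [
        "python", "train_net_genvis.py",
        "--num-gpus", str(num_gpus),
        "--config-file", config_file,
    ]
    if eval_only:
        # Evaluation uses the same script with one extra flag.
        cmd.append("--eval-only")
    # Config overrides are passed last as KEY VALUE pairs.
    cmd += ["MODEL.WEIGHTS", weights]
    return cmd


config = "configs/genvis/ovis/genvis_R50_bs8_online.yaml"
train_cmd = build_genvis_command(config, "vita_r50_ovis.pth")
eval_cmd = build_genvis_command(config, "/path/to/checkpoint_file", eval_only=True)

# Print the commands in a form that can be pasted into a shell.
print(shlex.join(train_cmd))
print(shlex.join(eval_cmd))
```

To launch a run instead of printing it, the list can be passed to subprocess.run(train_cmd, check=True) from the repository root, assuming the datasets and weights are already in place.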

Model Zoo

Additional weights will be added soon!

YouTubeVIS-2019

Backbone	Method	AP	AP50	AP75	AR1	AR10	Download
R-50	online	50.0	71.5	54.6	49.5	59.7	model
R-50	semi-online	51.3	72.0	57.8	49.5	60.0	model
Swin-L	online	64.0	84.9	68.3	56.1	69.4	model
Swin-L	semi-online	63.8	85.7	68.5	56.3	68.4	model

YouTubeVIS-2021

Backbone	Method	AP	AP50	AP75	AR1	AR10	Download
R-50	online	47.1	67.5	51.5	41.6	54.7	model
R-50	semi-online	46.3	67.0	50.2	40.6	53.2	model
Swin-L	online	59.6	80.9	65.8	48.7	65.0	model
Swin-L	semi-online	60.1	80.9	66.5	49.1	64.7	model

OVIS

Backbone	Method	AP	AP50	AP75	AR1	AR10	Download
R-50	online	35.8	60.8	36.2	16.3	39.6	model
R-50	semi-online	34.5	59.4	35.0	16.6	38.3	model
Swin-L	online	45.2	69.1	48.4	19.1	48.6	model
Swin-L	semi-online	45.4	69.2	47.8	18.9	49.0	model

License

The majority of GenVIS is licensed under an Apache-2.0 License. However, portions of the project are available under separate license terms: Detectron2 (Apache-2.0 License), IFC (Apache-2.0 License), Mask2Former (MIT License), Deformable-DETR (Apache-2.0 License), and VITA (Apache-2.0 License).

Citing GenVIS

If you use GenVIS in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.

@inproceedings{GenVIS,
  title={A Generalized Framework for Video Instance Segmentation},
  author={Heo, Miran and Hwang, Sukjun and Hyun, Jeongseok and Kim, Hanjung and Oh, Seoung Wug and Lee, Joon-Young and Kim, Seon Joo},
  booktitle={CVPR},
  year={2023}
}

@inproceedings{VITA,
  title={VITA: Video Instance Segmentation via Object Token Association},
  author={Heo, Miran and Hwang, Sukjun and Oh, Seoung Wug and Lee, Joon-Young and Kim, Seon Joo},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022}
}

Acknowledgement

Our code is largely based on Detectron2, IFC, Mask2Former, Deformable DETR, and VITA. We are truly grateful for their excellent work.

