Learning Better Video Query with SAM for Video Instance Segmentation (TCSVT 2024)

Hao Fang, Tong Zhang, Xiaofei Zhou, Xinxin Zhang

[paper] [BibTeX]


Installation

See installation instructions.

Getting Started

We provide a script, train_net.py, designed to train all the configs provided for LBVQ.

To train a model with "train_net.py" on VIS, first set up the corresponding datasets following Preparing Datasets for LBVQ.

Then run it with the COCO pretrained weights from the Model Zoo:

python train_net.py --num-gpus 8 \
  --config-file configs/youtubevis_2019/lbvq_R50_bs8.yaml \
  MODEL.WEIGHTS mask2former_r50_coco.pkl

To evaluate a model's performance, use

python train_net.py \
  --config-file configs/youtubevis_2019/lbvq_R50_bs8.yaml \
  --eval-only MODEL.WEIGHTS lbvq_r50_ytvis19.pth
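Evaluation on the YouTube-VIS splits writes predictions in the standard YTVIS submission format: a JSON list of per-instance records with `video_id`, `category_id`, `score`, and RLE `segmentations`. As a hypothetical post-processing sketch (the file path and helper name are our own assumptions, not part of LBVQ's code):

```python
import json
from collections import defaultdict

def top_score_per_video(results_path):
    """Report the highest prediction score per video from a YTVIS-style results file.

    Each record is expected to look like:
    {"video_id": int, "category_id": int, "score": float, "segmentations": [...]}
    """
    with open(results_path) as f:
        preds = json.load(f)
    best = defaultdict(float)
    for p in preds:
        # Keep only the most confident instance seen for each video.
        best[p["video_id"]] = max(best[p["video_id"]], p["score"])
    return dict(best)
```

This can be handy for spotting videos where the model is uniformly unconfident before submitting to the evaluation server.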

If you want to use SAM to refine your results, use

python train_net.py \
  --config-file configs/youtubevis_2019/lbvq_R50_bs8.yaml \
  --eval-only MODEL.WEIGHTS lbvq_r50_ytvis19.pth SAM True
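Refining a coarse mask with a SAM-style predictor typically means prompting it with a box derived from that mask. A minimal sketch of the box-prompt step in plain NumPy (the function name is ours for illustration; this is not LBVQ's exact refinement code):

```python
import numpy as np

def mask_to_box_prompt(mask):
    """Convert a binary mask of shape (H, W) to an XYXY box prompt.

    Returns None when the mask is empty.
    """
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    # SAM-style predictors accept box prompts as [x0, y0, x1, y1].
    return np.array([xs.min(), ys.min(), xs.max(), ys.max()], dtype=np.float32)
```

The resulting box could then be passed to a SAM predictor (e.g. a `predict(box=...)` style call) to obtain a sharper mask for the same instance.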

To visualize a video in the dataset, use

python demo_lbvq/demo.py --config-file configs/youtubevis_2019/lbvq_R50_bs8.yaml \
  --input datasets/ytvis_2019/valid/JPEGImages/xxxxxxx/*.jpg \
  --output output/demo --save-frames True \
  --opts MODEL.WEIGHTS lbvq_r50_ytvis2019.pth

Model Zoo

Pretrained weights on COCO

| Name | R-50 | R-101 |
| :--- | :--: | :---: |
| Mask2Former | model | model |

HQ-SAM

| Name | vit_h |
| :--- | :---: |
| HQ-SAM | model |

YouTubeVIS-2019

| Name | Backbone | AP | AP50 | AP75 | AR1 | AR10 | Download |
| :--- | :------- | :-: | :--: | :--: | :-: | :--: | :------: |
| LBVQ | R-50 | 52.2 | 74.8 | 57.7 | 49.9 | 59.8 | model |
| LBVQ | R-101 | 53.1 | 76.3 | 60.2 | 50.0 | 59.2 | model |

YouTubeVIS-2021

| Name | Backbone | AP | AP50 | AP75 | AR1 | AR10 | Download |
| :--- | :------- | :-: | :--: | :--: | :-: | :--: | :------: |
| LBVQ | R-50 | 44.8 | 67.4 | 46.0 | 41.6 | 52.3 | model |

OVIS

| Name | Backbone | AP | AP50 | AP75 | AR1 | AR10 | Download |
| :--- | :------- | :-: | :--: | :--: | :-: | :--: | :------: |
| LBVQ | R-50 | 22.2 | 45.3 | 19.0 | 12.4 | 27.5 | model |

License

The majority of LBVQ is licensed under the Apache-2.0 License. However, portions of the project are available under separate license terms: Detectron2 (Apache-2.0 License), Mask2Former (MIT License), and VITA (Apache-2.0 License).

Citing LBVQ

If you use LBVQ in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.

@article{Fang2024learning,
  title={Learning Better Video Query with SAM for Video Instance Segmentation},
  author={Fang, Hao and Zhang, Tong and Zhou, Xiaofei and Zhang, Xinxin},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  year={2024},
  publisher={IEEE}
}

Acknowledgement

Our code is largely based on Detectron2, Mask2Former, and VITA. We are truly grateful for their excellent work.
