Learning Better Video Query with SAM for Video Instance Segmentation (TCSVT 2024)

Hao Fang, Tong Zhang, Xiaofei Zhou, Xinxin Zhang

[paper] [BibTeX]


Installation

See installation instructions.

Getting Started

We provide a script, train_net.py, designed to train all the configs provided for LBVQ.

To train a model with "train_net.py" on VIS, first set up the corresponding datasets following Preparing Datasets for LBVQ.

Then run it with the COCO pretrained weights from the Model Zoo:

python train_net.py --num-gpus 8 \
  --config-file configs/youtubevis_2019/lbvq_R50_bs8.yaml \
  MODEL.WEIGHTS mask2former_r50_coco.pkl

To evaluate a model's performance, use

python train_net.py \
  --config-file configs/youtubevis_2019/lbvq_R50_bs8.yaml \
  --eval-only MODEL.WEIGHTS lbvq_r50_ytvis19.pth
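Evaluation on the YouTube-VIS splits writes predictions in the standard YTVIS submission format: a JSON list of per-instance records with `video_id`, `category_id`, `score`, and RLE `segmentations`. As a hypothetical post-processing sketch (the file path and helper name are our own assumptions, not part of LBVQ's code):

```python
import json
from collections import defaultdict

def top_score_per_video(results_path):
    """Report the highest prediction score per video from a YTVIS-style results file.

    Each record is expected to look like:
    {"video_id": int, "category_id": int, "score": float, "segmentations": [...]}
    """
    with open(results_path) as f:
        preds = json.load(f)
    best = defaultdict(float)
    for p in preds:
        # Keep only the most confident instance seen for each video.
        best[p["video_id"]] = max(best[p["video_id"]], p["score"])
    return dict(best)
```

This can be handy for spotting videos where the model is uniformly unconfident before submitting to the evaluation server.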

If you want to use SAM to refine your results, use

python train_net.py \
  --config-file configs/youtubevis_2019/lbvq_R50_bs8.yaml \
  --eval-only MODEL.WEIGHTS lbvq_r50_ytvis19.pth SAM True
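Refining a coarse mask with a SAM-style predictor typically means prompting it with a box derived from that mask. A minimal sketch of the box-prompt step in plain NumPy (the function name is ours for illustration; this is not LBVQ's exact refinement code):

```python
import numpy as np

def mask_to_box_prompt(mask):
    """Convert a binary mask of shape (H, W) to an XYXY box prompt.

    Returns None when the mask is empty.
    """
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    # SAM-style predictors accept box prompts as [x0, y0, x1, y1].
    return np.array([xs.min(), ys.min(), xs.max(), ys.max()], dtype=np.float32)
```

The resulting box could then be passed to a SAM predictor (e.g. a `predict(box=...)` style call) to obtain a sharper mask for the same instance.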

To visualize a video in the dataset, use

python demo_lbvq/demo.py --config-file configs/youtubevis_2019/lbvq_R50_bs8.yaml \
  --input datasets/ytvis_2019/valid/JPEGImages/xxxxxxx/*.jpg \
  --output output/demo --save-frames True \
  --opts MODEL.WEIGHTS lbvq_r50_ytvis2019.pth

Model Zoo

Pretrained weights on COCO

| Name | R-50 | R-101 |
| :--- | :--: | :---: |
| Mask2Former | model | model |

HQ-SAM

| Name | vit_h |
| :--- | :---: |
| HQ-SAM | model |

YouTubeVIS-2019

| Name | Backbone | AP | AP50 | AP75 | AR1 | AR10 | Download |
| :--- | :------- | :-: | :--: | :--: | :-: | :--: | :------: |
| LBVQ | R-50 | 52.2 | 74.8 | 57.7 | 49.9 | 59.8 | model |
| LBVQ | R-101 | 53.1 | 76.3 | 60.2 | 50.0 | 59.2 | model |

YouTubeVIS-2021

| Name | Backbone | AP | AP50 | AP75 | AR1 | AR10 | Download |
| :--- | :------- | :-: | :--: | :--: | :-: | :--: | :------: |
| LBVQ | R-50 | 44.8 | 67.4 | 46.0 | 41.6 | 52.3 | model |

OVIS

| Name | Backbone | AP | AP50 | AP75 | AR1 | AR10 | Download |
| :--- | :------- | :-: | :--: | :--: | :-: | :--: | :------: |
| LBVQ | R-50 | 22.2 | 45.3 | 19.0 | 12.4 | 27.5 | model |

License

The majority of LBVQ is licensed under the Apache-2.0 License. However, portions of the project are available under separate license terms: Detectron2 (Apache-2.0 License), Mask2Former (MIT License), and VITA (Apache-2.0 License).

Citing LBVQ

If you use LBVQ in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.

@article{Fang2024learning,
  title={Learning Better Video Query with SAM for Video Instance Segmentation},
  author={Fang, Hao and Zhang, Tong and Zhou, Xiaofei and Zhang, Xinxin},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  year={2024},
  publisher={IEEE}
}

Acknowledgement

Our code is largely based on Detectron2, Mask2Former, and VITA. We are truly grateful for their excellent work.
