
Single-Stage Visual Query Localization in Egocentric Videos (NeurIPS 2023)


Open-set visual object query search & localization in long-form videos

Hanwen Jiang, Santhosh Ramakrishnan, Kristen Grauman

Installation

conda create --name vqloc python=3.8
conda activate vqloc

# Install pytorch or use your own torch version
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.6 -c pytorch -c conda-forge

pip install -r requirements.txt 
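
To confirm the environment is set up correctly, an optional quick check (assuming a CUDA-capable GPU is visible) is:

# Optional sanity check: print the installed torch version and whether CUDA is available
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"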

Pre-trained Weights

The pre-trained model weights are available here.

Train VQLoC

Download Dataset

  • Please follow steps 1, 2, 4, and 5 of the vq2d baseline to process the dataset into video clips.

Training

  • Use ./train.sh and adjust the training config accordingly (a minimal launch sketch follows this list).
  • The default training configuration requires at most about 200 GB of GPU memory in total, e.g., 8 A40 GPUs (48 GB each).
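
A minimal launch sketch, assuming train.sh wraps the distributed training entry point; the GPU indices below are placeholders, and the config to edit is whichever file ./train.sh references:

# Edit the training config referenced by ./train.sh, then launch on the GPUs you want to use
# (GPU indices are placeholders)
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash ./train.sh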

Evaluate VQLoC

    1. Use ./inference_predict.sh to run inference on the target video clips. Change the path to your model checkpoint in the script.
    2. Use python inference_results.py --cfg ./config/val.yaml to format the results. Use --eval and --cfg ./config/eval.yaml for evaluation (to submit to the leaderboard).
    3. Use python evaluate.py to compute the metrics. Change --pred-file and --gt-file accordingly. (A combined command sketch follows this list.)
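
The three steps above can be chained as follows; the checkpoint is set inside the prediction script, and the prediction/ground-truth paths below are placeholders:

# 1) Run inference on the target video clips (set the model checkpoint path inside the script)
bash ./inference_predict.sh

# 2) Format the raw predictions for the validation split
python inference_results.py --cfg ./config/val.yaml
#    or format them for a leaderboard submission on the evaluation split
python inference_results.py --eval --cfg ./config/eval.yaml

# 3) Compute the metrics (file paths are placeholders)
python evaluate.py --pred-file path/to/predictions --gt-file path/to/ground_truth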

Known Issues

  • Hard negative mining is not very stable, so we set use_hnm=False by default.
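
If you want to experiment with hard negative mining despite the instability, first locate where the flag is defined; the search below assumes it lives in the config files or the training script:

# Find where use_hnm is set before flipping it to True
grep -rn "use_hnm" ./config ./train.sh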

Citation

@article{jiang2023vqloc,
   title={Single-Stage Visual Query Localization in Egocentric Videos},
   author={Jiang, Hanwen and Ramakrishnan, Santhosh and Grauman, Kristen},
   journal={arXiv preprint arXiv:2306.09324},
   year={2023}
}
