AirObject is a CVPR 2022 research project. In this work, we present a temporal, class-agnostic object encoding method that produces global keypoint graph-based embeddings of objects. Specifically, the global 3D object embeddings are generated by a temporal convolutional network that operates on structural information from multiple frames, which is obtained from a graph attention-based encoding method. This repo contains the official code to run AirObject and the other baselines presented in the paper for Video Object Identification. We also provide code for training and global object tracking.
Matching Temporally Evolving Representations using AirObject
For more details, please see:
- Full paper PDF: AirObject: A Temporally Evolving Graph Embedding for Object Identification.
- Authors: Nikhil Varma Keetha, Chen Wang, Yuheng Qiu, Kuan Xu, Sebastian Scherer
Simply run the following commands:
conda create --channel conda-forge --name airobj --file ./AirObject/conda_requirements.txt
conda activate airobj
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=10.2 -c pytorch
pip install pyyaml opencv-python scipy tqdm pycocotools
cd ./AirObject/cocoapi/PythonAPI
python setup.py build
python setup.py install
For data loading, we use the dataloaders in the datasets folder.
For inference, please download train.zip and annotations_train.json from the OVIS Dataset.
For inference, please also download the models.zip file.
We first pre-extract SuperPoint features for all the images. Please modify the superpoint_extraction.yaml config file to extract SuperPoint features for different datasets:
python './AirObject/superpoint_extraction.py' -c './AirObject/config/superpoint_extraction.yaml' -g 1
Please modify the eval.yaml config file to test different methods and datasets.
On the first run, we save the video object dictionary to the save_dir to avoid redundant computation. To reuse the saved dictionary on subsequent runs, set resume to True.
python './AirObject/eval.py' -c './AirObject/config/eval.yaml' -g 1
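The save-then-resume behaviour described above is a standard caching pattern. The following is a minimal sketch of that pattern, not the code in eval.py: the function name `load_or_build`, the file name `video_objects.pkl`, and the use of pickle are all assumptions made for illustration.

```python
import pickle
from pathlib import Path


def load_or_build(save_dir, build_fn, resume, filename="video_objects.pkl"):
    """Load a cached dictionary when resume is set, otherwise build and save it.

    Hypothetical helper: the file name and pickle format are assumptions,
    not the format eval.py necessarily uses.
    """
    cache_path = Path(save_dir) / filename

    # Reuse the cached result only when asked to and when the file exists.
    if resume and cache_path.exists():
        with cache_path.open("rb") as f:
            return pickle.load(f)

    # Otherwise run the expensive computation and store the result.
    result = build_fn()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("wb") as f:
        pickle.dump(result, f)
    return result
```

With this shape, the first call with `resume=False` pays the full cost and writes the cache, and later calls with `resume=True` only read the file.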
PR-AUC(%) results on YT-VIS Test Split:
2D Baseline | 3D Baseline | NetVLAD | SeqNet | AirObject |
---|---|---|---|---|
81.00 | 80.01 | 75.43 | 89.42 | 91.20 |
PR-AUC(%) results on OVIS:
2D Baseline | 3D Baseline | NetVLAD | SeqNet | AirObject |
---|---|---|---|---|
40.11 | 43.50 | 40.54 | 62.60 | 63.07 |
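For readers unfamiliar with the metric, PR-AUC is the area under the precision-recall curve traced out by sweeping a threshold over match scores. The sketch below shows one common way to compute it, using trapezoidal integration and a starting point of recall 0, precision 1. It is an illustration under those assumptions, and the evaluation scripts in this repo may compute the value differently. The function name `pr_auc` and its inputs (a list of similarity scores and a list of 0/1 ground-truth match labels) are hypothetical.

```python
def pr_auc(scores, labels):
    """Area under the precision-recall curve via the trapezoidal rule.

    scores: similarity scores, higher means more likely a true match.
    labels: 1 for a true match, 0 otherwise.
    Tied scores are treated as separate thresholds in this simple sketch.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank candidate matches from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    area = 0.0
    tp = fp = 0
    prev_recall, prev_precision = 0.0, 1.0  # assumed curve start
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        # Trapezoid between the previous and the current curve point.
        area += (recall - prev_recall) * (precision + prev_precision) / 2
        prev_recall, prev_precision = recall, precision
    return area
```

Multiplying the returned value by 100 gives a percentage on the same scale as the tables above.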
Please modify the eval_seq.yaml config file to test SeqNet & AirObject for different sequence lengths.
Use the saved video object dictionary obtained from eval.py.
python './AirObject/eval_seq.py' -c './AirObject/config/eval_seq.yaml' -g 1
Expected directory structure for track.py:
Base Directory/
├── image_0
├── semantic
└── sp_0
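Before running the tracking scripts, it can help to confirm that the base directory matches this layout. The sketch below only checks that the three subdirectories exist. The helper name `missing_subdirs` is hypothetical and is not part of this repo.

```python
from pathlib import Path

# Subdirectory names taken from the expected layout above.
EXPECTED_SUBDIRS = ("image_0", "semantic", "sp_0")


def missing_subdirs(base_dir):
    """Return the expected subdirectories that are absent from base_dir."""
    base = Path(base_dir)
    return [name for name in EXPECTED_SUBDIRS if not (base / name).is_dir()]
```

An empty list means all three folders are present. Any returned names point to a preprocessing step that has not yet written its output into the base directory.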
MaskRCNN Inference:
python './tracking/maskrcnn_inference.py' -c './config/tracking/maskrcnn_inference.yaml' -g 1
SuperPoint Inference:
python './tracking/superpoint_inference.py' -c './config/tracking/superpoint_inference.yaml' -g 1
Please modify track.yaml to run global object tracking with different methods and parameters.
Global Object Tracking:
python './tracking/track.py' -c './config/tracking/track.yaml' -g 1
To train the Graph Attention Encoder (please refer to train_gcn.yaml):
python './train/train_gcn.py' -c './config/train_gcn.yaml' -g 1
To train NetVLAD (please refer to train_netvlad.yaml):
python './train/train_netvlad.py' -c './config/train_netvlad.yaml' -g 1
To train SeqNet (please refer to train_seqnet.yaml):
python './train/train_seqnet.py' -c './config/train_seqnet.yaml' -g 1
To train AirObject (please refer to train_airobj.yaml):
python './train/train_airobj.py' -c './config/train_airobj.yaml' -g 1
If any ideas from the paper or code from this repo are used, please consider citing:
@inproceedings{keetha2022airobject,
title = {AirObject: A Temporally Evolving Graph Embedding for Object Identification},
author = {Keetha, Nikhil Varma and Wang, Chen and Qiu, Yuheng and Xu, Kuan and Scherer, Sebastian},
booktitle = {CVPR},
year = {2022},
url = {https://arxiv.org/abs/2111.15150}}
The code is licensed under the BSD 3-Clause License.
The authors acknowledge the support from the AirLab and Robotics Institute, Carnegie Mellon University. This work was supported by ONR Grant N0014-19-1-2266 and ARL DCIST CRA award W911NF-17-2-0181.
We would also like to acknowledge the AirCode and YT-VOS API code repos.
Please check out the Air Series Articles.