This is an implementation of ViTDet based on MMDetection, MMCV, and MMEngine.
Following the original setting, this project is trained with a total batch size of 64 (16 GPUs with 4 images per GPU).
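The total batch size is the number of GPUs multiplied by the images per GPU. The arithmetic can be checked in the shell before launching a job. `IMGS_PER_GPU` and `TOTAL_BATCH` are hypothetical variable names used only for illustration and are not read by MMDetection.

```shell
#!/usr/bin/env bash
# Illustrative check only: IMGS_PER_GPU and TOTAL_BATCH are not consumed by MMDetection.
GPUS=16           # GPUs requested for the job
IMGS_PER_GPU=4    # per-GPU batch size in the original setting
TOTAL_BATCH=$((GPUS * IMGS_PER_GPU))
echo "total batch size: ${TOTAL_BATCH}"
```

Keeping the total at 64 with fewer GPUs would require more images per GPU (e.g. 8 GPUs × 8 images); this README does not state whether such a setup reproduces the reported accuracy.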
In MMDetection's root directory, run the following command to train the model:
`GPUS=${GPUS} ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR}`
Below is an example that trains ViTDet with 16 GPUs on a Slurm partition named `dev` and sets the work directory to a shared file system:
`GPUS=16 ./tools/slurm_train.sh dev vitdet_mask_b projects/ViTDet/configs/vitdet_mask-rcnn_vit-b-mae_lsj-100e.py /nfs/xxxx/vitdet_mask-rcnn_vit-b-mae_lsj-100e`
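In the example above, the work directory is named after the config file. The sketch below derives `WORK_DIR` from `CONFIG_FILE` and, by default, only prints the resulting command so the arguments can be reviewed. `DRY_RUN` and `SHARED_ROOT` are hypothetical names introduced for this snippet, and `/nfs/xxxx` is kept as the placeholder from the example.

```shell
#!/usr/bin/env bash
# Sketch: derive the work directory from the config name, as in the example above.
# DRY_RUN and SHARED_ROOT are hypothetical names, not MMDetection options.
PARTITION=dev
JOB_NAME=vitdet_mask_b
CONFIG_FILE=projects/ViTDet/configs/vitdet_mask-rcnn_vit-b-mae_lsj-100e.py
SHARED_ROOT=/nfs/xxxx   # placeholder for a shared file system path
WORK_DIR="${SHARED_ROOT}/$(basename "${CONFIG_FILE}" .py)"
export GPUS=16          # read by tools/slurm_train.sh from the environment

if [ "${DRY_RUN:-1}" = "1" ]; then
  # Print the command instead of submitting the job.
  echo "GPUS=${GPUS} ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} ${WORK_DIR}"
else
  ./tools/slurm_train.sh "${PARTITION}" "${JOB_NAME}" "${CONFIG_FILE}" "${WORK_DIR}"
fi
```

Running it with `DRY_RUN=0` submits the job; any other value (or leaving it unset) only prints the command.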
In MMDetection's root directory, run the following command to test the model:
`python tools/test.py projects/ViTDet/configs/vitdet_mask-rcnn_vit-b-mae_lsj-100e.py ${CHECKPOINT_PATH}`
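`${CHECKPOINT_PATH}` should point to a trained checkpoint, typically one saved under the work directory used for training. A minimal guard that fails early when the file is missing is sketched below; the `run_test` function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: refuse to launch tools/test.py when the checkpoint file does not exist.
# run_test is a hypothetical helper name used only in this example.
CONFIG_FILE=projects/ViTDet/configs/vitdet_mask-rcnn_vit-b-mae_lsj-100e.py

run_test() {
  local checkpoint="$1"
  if [ ! -f "${checkpoint}" ]; then
    echo "checkpoint not found: ${checkpoint}" >&2
    return 1
  fi
  python tools/test.py "${CONFIG_FILE}" "${checkpoint}"
}

# Usage: run_test "${CHECKPOINT_PATH}"
```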
Built on MMDetection, this project closely reproduces the official ViTDet results for both testing and training:
Method | Backbone | Pretrained Model | Training set | Test set | Epoch | Val Box AP | Val Mask AP | Download |
---|---|---|---|---|---|---|---|---|
ViTDet | ViT-B | MAE | COCO2017 Train | COCO2017 Val | 100 | 51.6 | 45.7 | model / log |
Note:
- The mask AP is slightly lower than that of the official repository.
- Code and weights for other model versions will be released in the future.
@article{li2022exploring,
title={Exploring plain vision transformer backbones for object detection},
author={Li, Yanghao and Mao, Hanzi and Girshick, Ross and He, Kaiming},
journal={arXiv preprint arXiv:2203.16527},
year={2022}
}
Checklist:

- Milestone 1: PR-ready, and acceptable to be one of the `projects/`.
  - Finish the code
  - Basic docstrings & proper citation
  - Test-time correctness
  - A full README
- Milestone 2: Indicates a successful model implementation.
  - Training-time correctness
- Milestone 3: Good to be a part of our core package!
  - Type hints and docstrings
  - Unit tests
  - Code polishing
  - Metafile.yml
- Move your modules into the core package following the codebase's file hierarchy structure.
- Refactor your modules into the core package following the codebase's file hierarchy structure.