Fast and accurate single stage object detection with end-to-end GPU optimization.
ODTK is a single-shot object detector with various backbones and detection heads, allowing a range of performance/accuracy trade-offs.
It is optimized for end-to-end GPU processing using:
- The PyTorch deep learning framework with ONNX support
- NVIDIA Apex for mixed precision and distributed training
- NVIDIA DALI for optimized data pre-processing
- NVIDIA TensorRT for high-performance inference
- NVIDIA DeepStream for optimized real-time video streams support
This repo now supports rotated bounding box detections. See the rotated detections training and rotated detections inference documents for more information on how to use the `--rotated-bbox` option.
Bounding box annotations are described by `[x, y, w, h, theta]`.
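As a minimal, hedged sketch of how the flag is used (the model name, image directory and annotation file below are placeholders for your own rotated dataset, not values from this repo):

```bash
# Sketch only: enable rotated boxes with --rotated-bbox; annotations must
# provide [x, y, w, h, theta] for each object.
odtk train model_rotated.pth --backbone ResNet50FPN --rotated-bbox \
    --images /dataset/train/images/ --annotations /dataset/train/rotated_annotations.json
```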
The detection pipeline allows the user to select a specific backbone depending on the preferred latency/accuracy trade-off.

The table below shows ODTK RetinaNet model accuracy and inference latency & FPS (frames per second) for COCO 2017 (train/val) after the full training schedule. Inference results include bounding-box post-processing for a batch size of 1, measured at `--resize 800` using `--with-dali` on an FP16 TensorRT engine.
Backbone | mAP @[IoU=0.50:0.95] | Training Time on DGX1v | Inference latency FP16 on V100 | Inference latency INT8 on T4 | Inference latency FP16 on A100 | Inference latency INT8 on A100 |
---|---|---|---|---|---|---|
ResNet18FPN | 0.318 | 5 hrs | 14 ms; 71 FPS | 18 ms; 56 FPS | 9 ms; 110 FPS | 7 ms; 141 FPS |
MobileNetV2FPN | 0.333 | | 14 ms; 74 FPS | 18 ms; 56 FPS | 9 ms; 114 FPS | 7 ms; 138 FPS |
ResNet34FPN | 0.343 | 6 hrs | 16 ms; 64 FPS | 20 ms; 50 FPS | 10 ms; 103 FPS | 7 ms; 142 FPS |
ResNet50FPN | 0.358 | 7 hrs | 18 ms; 56 FPS | 22 ms; 45 FPS | 11 ms; 93 FPS | 8 ms; 129 FPS |
ResNet101FPN | 0.376 | 10 hrs | 22 ms; 46 FPS | 27 ms; 37 FPS | 13 ms; 78 FPS | 9 ms; 117 FPS |
ResNet152FPN | 0.393 | 12 hrs | 26 ms; 38 FPS | 33 ms; 31 FPS | 15 ms; 66 FPS | 10 ms; 103 FPS |
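As a rough, hedged illustration of the measurement setup above (model and data paths are examples; the exact benchmarking harness used for the table is not shown here):

```bash
# Approximate the table's setup: batch size 1, DALI pre-processing, --resize 800.
# Building the FP16 TensorRT engine itself is covered by the export step further below.
odtk infer retinanet_rn18fpn.pth --images /coco/images/val2017/ \
    --annotations /coco/annotations/instances_val2017.json \
    --resize 800 --with-dali
```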
For best performance, use the latest PyTorch NGC docker container. Clone this repository, build and run your own image:
```bash
git clone https://github.com/nvidia/retinanet-examples
docker build -t odtk:latest retinanet-examples/
docker run --gpus all --rm --ipc=host -it odtk:latest
```
Training, inference, evaluation and model export can be done through the `odtk` utility.
For more details, including a list of parameters, please refer to the TRAINING and INFERENCE documentation.
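For a quick look at the available parameters from inside the container, each sub-command exposes the usual command-line help (a hedged example, assuming the CLI's standard `--help` flag):

```bash
# One sub-command per stage of the workflow.
odtk train --help    # training and fine-tuning options
odtk infer --help    # inference and evaluation options
odtk export --help   # TensorRT engine export options
```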
Train a detection model on COCO 2017 from a pre-trained backbone:
```bash
odtk train retinanet_rn50fpn.pth --backbone ResNet50FPN \
    --images /coco/images/train2017/ --annotations /coco/annotations/instances_train2017.json \
    --val-images /coco/images/val2017/ --val-annotations /coco/annotations/instances_val2017.json
```
Fine-tune a pre-trained model on your dataset. In the example below we use Pascal VOC with JSON annotations:
```bash
odtk train model_mydataset.pth --backbone ResNet50FPN \
    --fine-tune retinanet_rn50fpn.pth \
    --classes 20 --iters 10000 --val-iters 1000 --lr 0.0005 \
    --resize 512 --jitter 480 640 --images /voc/JPEGImages/ \
    --annotations /voc/pascal_train2012.json --val-annotations /voc/pascal_val2012.json
```
Note: the shorter side of the input images will be resized to `resize` as long as the longer side doesn't get larger than `max-size`. During training, the images will be randomly resized to a new size within the `jitter` range.
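As a hedged example of how these three options interact (the values below are illustrative, not recommended defaults):

```bash
# Illustrative values only: shorter side resized to 800, longer side capped at 1333,
# and training-time sizes drawn at random from the 640-1024 jitter range.
odtk train model.pth --backbone ResNet50FPN \
    --resize 800 --max-size 1333 --jitter 640 1024 \
    --images /coco/images/train2017/ --annotations /coco/annotations/instances_train2017.json
```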
Evaluate your detection model on COCO 2017:
```bash
odtk infer retinanet_rn50fpn.pth --images /coco/images/val2017/ --annotations /coco/annotations/instances_val2017.json
```
Run inference on your dataset:
```bash
odtk infer retinanet_rn50fpn.pth --images /dataset/val --output detections.json
```
For faster inference, export the detection model to an optimized FP16 TensorRT engine: