Skip to content

Commit

Permalink
Nvidia TensorRT detector (blakeblackshear#4718)
Browse files Browse the repository at this point in the history
* Initial WIP dockerfile and scripts to add tensorrt support

* Add tensorRT detector

* WIP attempt to install TensorRT 8.5

* Updates to detector for cuda python library

* TensorRT Cuda library rework WIP

Does not run

* Fixes from rebase to detector factory

* Fix parsing output memory pointer

* Handle TensorRT logs with the python logger

* Use non-async interface and convert input data to float32. Detection runs without error.

* Make TensorRT a separate build from the base Frigate image.

* Add script and documentation for generating TRT Models

* Add support for TensorRT devcontainer

* Add labelmap to trt model script and docs.  Cleanup of old scripts.

* Update detect to normalize input tensor using model input type

* Add config for selecting GPU. Fix Async inference. Update documentation.

* Update some CUDA libraries to clean up version warning

* Add CI stage to build TensorRT tag

* Add note in docs for image tag and model support
  • Loading branch information
NateMeyer committed Dec 30, 2022
1 parent e3ec292 commit 3f05f74
Show file tree
Hide file tree
Showing 9 changed files with 515 additions and 16 deletions.
12 changes: 12 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,19 @@ jobs:
context: .
push: true
platforms: linux/amd64,linux/arm64,linux/arm/v7
target: frigate
tags: |
ghcr.io/blakeblackshear/frigate:${{ github.ref_name }}-${{ env.SHORT_SHA }}
cache-from: type=gha
cache-to: type=gha,mode=max
- name: Build and push TensorRT
uses: docker/build-push-action@v3
with:
context: .
push: true
platforms: linux/amd64
target: frigate-tensorrt
tags: |
ghcr.io/blakeblackshear/frigate:${{ github.ref_name }}-${{ env.SHORT_SHA }}-tensorrt
cache-from: type=gha
cache-to: type=gha,mode=max
31 changes: 27 additions & 4 deletions Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,15 @@ WORKDIR /rootfs/usr/local/go2rtc/bin
RUN wget -qO go2rtc "https://github.com/AlexxIT/go2rtc/releases/download/v0.1-rc.5/go2rtc_linux_${TARGETARCH}" \
&& chmod +x go2rtc


####
#
# OpenVino Support
#
# 1. Download and convert a model from Intel's Public Open Model Zoo
# 2. Build libUSB without udev to handle NCS2 enumeration
#
####
# Download and Convert OpenVino model
FROM base_amd64 AS ov-converter
ARG DEBIAN_FRONTEND
Expand Down Expand Up @@ -115,8 +124,6 @@ RUN /bin/mkdir -p '/usr/local/lib' && \
/usr/bin/install -c -m 644 libusb-1.0.pc '/usr/local/lib/pkgconfig' && \
ldconfig



FROM wget AS models

# Get model and labels
Expand Down Expand Up @@ -160,7 +167,8 @@ RUN apt-get -qq update \
libtbb2 libtbb-dev libdc1394-22-dev libopenexr-dev \
libgstreamer-plugins-base1.0-dev libgstreamer1.0-dev \
# scipy dependencies
gcc gfortran libopenblas-dev liblapack-dev
gcc gfortran libopenblas-dev liblapack-dev && \
rm -rf /var/lib/apt/lists/*

RUN wget -q https://bootstrap.pypa.io/get-pip.py -O get-pip.py \
&& python3 get-pip.py "pip"
Expand All @@ -176,6 +184,10 @@ RUN pip3 install -r requirements.txt
COPY requirements-wheels.txt /requirements-wheels.txt
RUN pip3 wheel --wheel-dir=/wheels -r requirements-wheels.txt

# Add TensorRT wheels to another folder
COPY requirements-tensorrt.txt /requirements-tensorrt.txt
RUN mkdir -p /trt-wheels && pip3 wheel --wheel-dir=/trt-wheels -r requirements-tensorrt.txt


# Collect deps in a single layer
FROM scratch AS deps-rootfs
Expand Down Expand Up @@ -283,7 +295,18 @@ COPY migrations migrations/
COPY --from=web-build /work/dist/ web/

# Frigate final container
FROM deps
FROM deps AS frigate

WORKDIR /opt/frigate/
COPY --from=rootfs / /

# Frigate w/ TensorRT Support as separate image
FROM frigate AS frigate-tensorrt
RUN --mount=type=bind,from=wheels,source=/trt-wheels,target=/deps/trt-wheels \
pip3 install -U /deps/trt-wheels/*.whl

# Dev Container w/ TRT
FROM devcontainer AS devcontainer-trt

RUN --mount=type=bind,from=wheels,source=/trt-wheels,target=/deps/trt-wheels \
pip3 install -U /deps/trt-wheels/*.whl
17 changes: 11 additions & 6 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -10,22 +10,27 @@ version:
echo 'VERSION = "$(VERSION)-$(COMMIT_HASH)"' > frigate/version.py

local: version
docker buildx build --tag frigate:latest --load .
docker buildx build --target=frigate --tag frigate:latest --load .

local-trt: version
docker buildx build --target=frigate-tensorrt --tag frigate:latest-tensorrt --load .

amd64:
docker buildx build --platform linux/amd64 --tag $(IMAGE_REPO):$(VERSION)-$(COMMIT_HASH) .
docker buildx build --platform linux/amd64 --target=frigate --tag $(IMAGE_REPO):$(VERSION)-$(COMMIT_HASH) .
docker buildx build --platform linux/amd64 --target=frigate-tensorrt --tag $(IMAGE_REPO):$(VERSION)-$(COMMIT_HASH)-tensorrt .

arm64:
docker buildx build --platform linux/arm64 --tag $(IMAGE_REPO):$(VERSION)-$(COMMIT_HASH) .
docker buildx build --platform linux/arm64 --target=frigate --tag $(IMAGE_REPO):$(VERSION)-$(COMMIT_HASH) .

armv7:
docker buildx build --platform linux/arm/v7 --tag $(IMAGE_REPO):$(VERSION)-$(COMMIT_HASH) .
docker buildx build --platform linux/arm/v7 --target=frigate --tag $(IMAGE_REPO):$(VERSION)-$(COMMIT_HASH) .

build: version amd64 arm64 armv7
docker buildx build --platform linux/arm/v7,linux/arm64/v8,linux/amd64 --tag $(IMAGE_REPO):$(VERSION)-$(COMMIT_HASH) .
docker buildx build --platform linux/arm/v7,linux/arm64/v8,linux/amd64 --target=frigate --tag $(IMAGE_REPO):$(VERSION)-$(COMMIT_HASH) .

push: build
docker buildx build --push --platform linux/arm/v7,linux/arm64/v8,linux/amd64 --tag $(IMAGE_REPO):${GITHUB_REF_NAME}-$(COMMIT_HASH) .
docker buildx build --push --platform linux/arm/v7,linux/arm64/v8,linux/amd64 --target=frigate --tag $(IMAGE_REPO):${GITHUB_REF_NAME}-$(COMMIT_HASH) .
docker buildx build --push --platform linux/amd64 --target=frigate-tensorrt --tag $(IMAGE_REPO):${GITHUB_REF_NAME}-$(COMMIT_HASH)-tensorrt .

run: local
docker run --rm --publish=5000:5000 --volume=${PWD}/config/config.yml:/config/config.yml frigate:latest
Expand Down
10 changes: 10 additions & 0 deletions docker-compose.yml
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,15 @@ services:
shm_size: "256mb"
build:
context: .
# Use target devcontainer-trt for TensorRT dev
target: devcontainer
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
devices:
- /dev/bus/usb:/dev/bus/usb
# - /dev/dri:/dev/dri # for intel hwaccel, needs to be updated for your hardware
Expand All @@ -21,6 +29,8 @@ services:
- /etc/localtime:/etc/localtime:ro
- ./config/config.yml:/config/config.yml:ro
- ./debug:/media/frigate
# Create the trt-models folder using the documented method of generating TRT models
# - ./debug/trt-models:/trt-models
- /dev/bus/usb:/dev/bus/usb
mqtt:
container_name: mqtt
Expand Down
37 changes: 37 additions & 0 deletions docker/tensorrt_models.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
#!/bin/bash

set -euxo pipefail

CUDA_HOME=/usr/local/cuda
LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
OUTPUT_FOLDER=/tensorrt_models
echo "Generating the following TRT Models: ${YOLO_MODELS:="yolov4-tiny-288,yolov4-tiny-416,yolov7-tiny-416"}"

# Create output folder
mkdir -p ${OUTPUT_FOLDER}

# Install packages
pip install --upgrade pip && pip install onnx==1.9.0 protobuf==3.20.3

# Clone tensorrt_demos repo
git clone --depth 1 https://github.com/yeahme49/tensorrt_demos.git /tensorrt_demos

# Build libyolo
cd /tensorrt_demos/plugins && make all
cp libyolo_layer.so ${OUTPUT_FOLDER}/libyolo_layer.so

# Download yolo weights
cd /tensorrt_demos/yolo && ./download_yolo.sh

# Build trt engine
cd /tensorrt_demos/yolo

for model in ${YOLO_MODELS//,/ }
do
python3 yolo_to_onnx.py -m ${model}
python3 onnx_to_tensorrt.py -m ${model}
cp /tensorrt_demos/yolo/${model}.trt ${OUTPUT_FOLDER}/${model}.trt;
done

# Download Labelmap
wget -q https://github.com/openvinotoolkit/open_model_zoo/raw/master/data/dataset_classes/coco_91cl.txt -O ${OUTPUT_FOLDER}/coco_91cl.txt
97 changes: 91 additions & 6 deletions docs/docs/configuration/detectors.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,11 +3,10 @@ id: detectors
title: Detectors
---

Frigate provides the following builtin detector types: `cpu`, `edgetpu`, and `openvino`. By default, Frigate will use a single CPU detector. Other detectors may require additional configuration as described below. When using multiple detectors they will run in dedicated processes, but pull from a common queue of detection requests from across all cameras.

**Note**: There is not yet support for Nvidia GPUs to perform object detection with tensorflow. It can be used for ffmpeg decoding, but not object detection.
Frigate provides the following builtin detector types: `cpu`, `edgetpu`, `openvino`, and `tensorrt`. By default, Frigate will use a single CPU detector. Other detectors may require additional configuration as described below. When using multiple detectors they will run in dedicated processes, but pull from a common queue of detection requests from across all cameras.

## CPU Detector (not recommended)

The CPU detector type runs a TensorFlow Lite model utilizing the CPU without hardware acceleration. It is recommended to use a hardware accelerated detector type instead for better performance. To configure a CPU based detector, set the `"type"` attribute to `"cpu"`.

The number of threads used by the interpreter can be specified using the `"num_threads"` attribute, and defaults to `3.`
Expand Down Expand Up @@ -60,6 +59,7 @@ detectors:
```

### Native Coral (Dev Board)

_warning: may have [compatibility issues](https://github.com/blakeblackshear/frigate/issues/1706) after `v0.9.x`_

```yaml
Expand Down Expand Up @@ -99,7 +99,7 @@ The OpenVINO detector type runs an OpenVINO IR model on Intel CPU, GPU and VPU h

The OpenVINO device to be used is specified using the `"device"` attribute according to the naming conventions in the [Device Documentation](https://docs.openvino.ai/latest/openvino_docs_OV_UG_Working_with_devices.html). Other supported devices could be `AUTO`, `CPU`, `GPU`, `MYRIAD`, etc. If not specified, the default OpenVINO device will be selected by the `AUTO` plugin.

OpenVINO is supported on 6th Gen Intel platforms (Skylake) and newer. A supported Intel platform is required to use the `GPU` device with OpenVINO. The `MYRIAD` device may be run on any platform, including Arm devices. For detailed system requirements, see [OpenVINO System Requirements](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/system-requirements.html)
OpenVINO is supported on 6th Gen Intel platforms (Skylake) and newer. A supported Intel platform is required to use the `GPU` device with OpenVINO. The `MYRIAD` device may be run on any platform, including Arm devices. For detailed system requirements, see [OpenVINO System Requirements](https://www.intel.com/content/www/us/en/developer/tools/openvino-toolkit/system-requirements.html)

An OpenVINO model is provided in the container at `/openvino-model/ssdlite_mobilenet_v2.xml` and is used by this detector type by default. The model comes from Intel's Open Model Zoo [SSDLite MobileNet V2](https://github.com/openvinotoolkit/open_model_zoo/tree/master/models/public/ssdlite_mobilenet_v2) and is converted to an FP16 precision IR model. Use the model configuration shown below when using the OpenVINO detector.

Expand All @@ -121,7 +121,7 @@ model:

### Intel NCS2 VPU and Myriad X Setup

Intel produces a neural net inference accelleration chip called Myriad X. This chip was sold in their Neural Compute Stick 2 (NCS2) which has been discontinued. If intending to use the MYRIAD device for accelleration, additional setup is required to pass through the USB device. The host needs a udev rule installed to handle the NCS2 device.
Intel produces a neural net inference accelleration chip called Myriad X. This chip was sold in their Neural Compute Stick 2 (NCS2) which has been discontinued. If intending to use the MYRIAD device for accelleration, additional setup is required to pass through the USB device. The host needs a udev rule installed to handle the NCS2 device.

```bash
sudo usermod -a -G users "$(whoami)"
Expand All @@ -139,11 +139,96 @@ Additionally, the Frigate docker container needs to run with the following confi
```bash
--device-cgroup-rule='c 189:\* rmw' -v /dev/bus/usb:/dev/bus/usb
```

or in your compose file:

```yml
device_cgroup_rules:
- 'c 189:* rmw'
- "c 189:* rmw"
volumes:
- /dev/bus/usb:/dev/bus/usb
```

## NVidia TensorRT Detector

NVidia GPUs may be used for object detection using the TensorRT libraries. Due to the size of the additional libraries, this detector is only provided in images with the `-tensorrt` tag suffix. This detector is designed to work with Yolo models for object detection.

### Minimum Hardware Support

The TensorRT detector uses the 11.x series of CUDA libraries which have minor version compatibility. The minimum driver version on the host system must be `>=450.80.02`. Also the GPU must support a Compute Capability of `5.0` or greater. This generally correlates to a Maxwell-era GPU or newer, check the NVIDIA GPU Compute Capability table linked below.

> **TODO:** NVidia claims support on compute 3.5 and 3.7, but marks it as deprecated. This would have some, but not all, Kepler GPUs as possibly working. This needs testing before making any claims of support.
There are improved capabilities in newer GPU architectures that TensorRT can benefit from, such as INT8 operations and Tensor cores. The features compatible with your hardware will be optimized when the model is converted to a trt file. Currently the script provided for generating the model provides a switch to enable/disable FP16 operations. If you wish to use newer features such as INT8 optimization, more work is required.

#### Compatibility References:

[NVIDIA TensorRT Support Matrix](https://docs.nvidia.com/deeplearning/tensorrt/archives/tensorrt-841/support-matrix/index.html)

[NVIDIA CUDA Compatibility](https://docs.nvidia.com/deploy/cuda-compatibility/index.html)

[NVIDIA GPU Compute Capability](https://developer.nvidia.com/cuda-gpus)

### Generate Models

The models used for TensorRT must be preprocessed on the same hardware platform that they will run on. This means that each user must run additional setup to generate these model files for the TensorRT library. A script is provided that will build several common models.

To generate the model files, create a new folder to save the models, download the script, and launch a docker container that will run the script.

```bash
mkdir trt-models
wget https://raw.githubusercontent.com/blakeblackshear/frigate/nvidia-detector/docker/tensorrt_models.sh
chmod +x tensorrt_models.sh
docker run --gpus=all --rm -it -v `pwd`/trt-models:/tensorrt_models -v `pwd`/tensorrt_models.sh:/tensorrt_models.sh nvcr.io/nvidia/tensorrt:22.07-py3 /tensorrt_models.sh
```

The `trt-models` folder can then be mapped into your frigate container as `trt-models` and the models referenced from the config.

If your GPU does not support FP16 operations, you can pass the environment variable `-e USE_FP16=False` to the `docker run` command to disable it.

Specific models can be selected by passing an environment variable to the `docker run` command. Use the form `-e YOLO_MODELS=yolov4-416,yolov4-tiny-416` to select one or more model names. The models available are shown below.

```
yolov3-288
yolov3-416
yolov3-608
yolov3-spp-288
yolov3-spp-416
yolov3-spp-608
yolov3-tiny-288
yolov3-tiny-416
yolov4-288
yolov4-416
yolov4-608
yolov4-csp-256
yolov4-csp-512
yolov4-p5-448
yolov4-p5-896
yolov4-tiny-288
yolov4-tiny-416
yolov4x-mish-320
yolov4x-mish-640
yolov7-tiny-288
yolov7-tiny-416
```

### Configuration Parameters

The TensorRT detector can be selected by specifying `tensorrt` as the model type. The GPU will need to be passed through to the docker container using the same methods described in the [Hardware Acceleration](hardware_acceleration.md#nvidia-gpu) section. If you pass through multiple GPUs, you can select which GPU is used for a detector with the `device` configuration parameter. The `device` parameter is an integer value of the GPU index, as shown by `nvidia-smi` within the container.

The TensorRT detector uses `.trt` model files that are located in `/trt-models/` by default. These model file path and dimensions used will depend on which model you have generated.

```yaml
detectors:
tensorrt:
type: tensorrt
device: 0 #This is the default, select the first GPU

model:
path: /trt-models/yolov7-tiny-416.trt
labelmap_path: /trt-models/coco_91cl.txt
input_tensor: nchw
input_pixel_format: rgb
width: 416
height: 416
```
Loading

0 comments on commit 3f05f74

Please sign in to comment.