Nvidia TensorRT detector #4718
Conversation
✅ Deploy Preview for frigate-docs ready!
As far as I understand, we're not looking to have that in this 0.12 release; it adds a whole other level of device and user support / complexity.
FYI, the current image size is about 5GB when building the amd64 image.
Wow, that's massive. I wonder if it makes sense to have an image as it is now and then one with everything plus the nvidia detector. That gets more confusing though.
I was able to get around the inference CUDNN errors by avoiding the async memcopy and execute functions. Not sure why these had trouble, may revisit this later if needed.
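For context, a minimal sketch of what the synchronous path looks like with the TensorRT Python API and pycuda, as opposed to the async variants; the buffer and context names are placeholders and this is not this PR's actual code.

```python
# Illustrative sketch only: synchronous copy + inference, avoiding the async
# memcpy/execute calls. Assumes a TensorRT execution context and pycuda-managed
# device buffers; `context`, `d_input`, `d_output`, and `bindings` are placeholders.
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # creates a CUDA context on import

def infer_sync(context, h_input, d_input, d_output, h_output, bindings):
    # Copy input to the GPU, run inference, and copy results back, all
    # synchronously. The async variants (memcpy_htod_async / execute_async_v2)
    # additionally take a CUDA stream handle.
    cuda.memcpy_htod(d_input, h_input)
    context.execute_v2(bindings)
    cuda.memcpy_dtoh(h_output, d_output)
    return h_output
```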
I think you might be able to get smaller image sizes using the […]. I just submitted a new pull request #4774 which enables detectors to reside in separate containers and communicate using zeromq. This might help facilitate maintaining various flavors of container images?
That may be an option, but first I want to get something working in a single image. I'll take a closer look at your PR. The cuda base image does not have TensorRT, python, or any other dependencies of frigate installed, so it would likely end up being close to the same size.

The next issue I'm trying to wrap my head around is the creation of the trt engine. From what I can tell, the process to generate the trt model 1) requires an NVidia GPU, and 2) generates a model specific to that GPU. The other TensorRT images that I am working from solve this by generating the trt engine during initialization (and preferably saving the output so it only has to do so once). Unfortunately, this PR currently doesn't include all the tools and onnx/tensorflow/pytorch frameworks to do the conversion, and the two images I looked at that do (#2548 and #3016) are 2-4 times larger than mine. Options could be to bloat this image more with the extra tools for conversion, or to have the user run another dedicated script/image to do the conversion before running frigate. From what I understand of the Frigate project, I'm giving the following considerations: […]

Sorry for my rambling. I admittedly haven't looked too closely at how the HA add-on is set up; maybe adding the second image isn't too much of a stretch. Maybe we move forward with a different tag for the TensorRT-enabled image. If there is a preferred solution, let me know. I haven't made as much progress as I'd hoped the last 2 weeks for a variety of reasons, but I should be able to get a bit more done this week.
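For illustration, here is a minimal sketch of the "generate the engine on first start, then cache it" pattern described above, assuming an ONNX model and the tensorrt Python API. Paths, the workspace size, and function names are placeholders, not code from this PR.

```python
# Hedged sketch: build a TensorRT engine from ONNX once, cache it to disk,
# and reuse it on later starts. Engines are specific to the GPU and TensorRT
# version they were built with, hence the per-machine generation step.
import os
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def load_or_build_engine(onnx_path: str, engine_path: str):
    runtime = trt.Runtime(TRT_LOGGER)
    if os.path.exists(engine_path):
        # Reuse the engine cached from a previous run.
        with open(engine_path, "rb") as f:
            return runtime.deserialize_cuda_engine(f.read())

    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        parser.parse(f.read())

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30  # 1 GiB, illustrative
    serialized = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(serialized)
    return runtime.deserialize_cuda_engine(serialized)
```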
Yeah, I think as powerful as nvidia GPU based detection is, it is still relatively niche for the overall frigate user base and will continue to be that way. What I was imagining was a separate nvidia image which uses the main (current) Docker image as its base and builds on top of that with the tensorrt parts, so the other detectors can be combined with tensorrt. I think regardless of how it's done, there needs to be a minimal image option. That's also just my ramblings / opinions.
There are some TensorRT images here that should be useful as a base: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tensorrt

My thought on this is to lean more into relegating Nvidia GPU based detection to sidecar containers or a remote container on another machine, especially in the case of the HA addon. This scenario was actually my entire goal with the zeromq PR. I would also leave this as more of an advanced use case with limited support, but I don't know how well that flies. But the core image can continue to work and be supported as it always has that way.
Are there any issues with zeromq when one side is using a different minor version of python? Frigate is currently using 3.9, but all of the Nvidia images I've seen use 3.8 or 3.10. We could use the Nvidia base image for a sidecar, but I'm not sure what will break if we try to build the whole project on top of that. Do you have any examples of a remote detector using your zeromq changes? For now I'll plan on building an image with a […]
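For illustration, a minimal sketch of how a remote detector over zeromq could look (this is not the actual #4774 API): the wire format is plain bytes, so the Python minor version on each end shouldn't matter as long as the payload avoids pickle. The endpoint, tensor shape, and dtype below are assumptions.

```python
# Hedged REQ/REP sketch of a detector sidecar speaking raw bytes over zeromq.
import numpy as np
import zmq

def serve_detector(run_inference, endpoint="tcp://0.0.0.0:5555"):
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REP)
    sock.bind(endpoint)
    while True:
        # Assumed fixed 320x320x3 uint8 frames; a real protocol would carry shape/dtype.
        frame = np.frombuffer(sock.recv(), dtype=np.uint8).reshape(320, 320, 3)
        detections = run_inference(frame)  # e.g. an Nx6 float32 array
        sock.send(np.asarray(detections, dtype=np.float32).tobytes())

def detect_remote(frame, endpoint="tcp://detector:5555"):
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REQ)
    sock.connect(endpoint)
    sock.send(frame.astype(np.uint8).tobytes())
    return np.frombuffer(sock.recv(), dtype=np.float32).reshape(-1, 6)
```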
Given that most of the time people with nvidia will be running frigate only on one pc, I think it would be more than acceptable to them (me included) to download library files (if required) via an init container and place them inside the volume on the first ever run. That should be a good enough compromise in relation to the image size.
Use -tensorrt, not -nvidia, so it's not confused with the previous -nvidia tag.
(force-pushed from 2881256 to 945a382)
Thanks for the input! Would you mind taking a look at the updated info at the top of this PR about generating the models and let me know if you have any questions. I tried estimating what the final URL would be to download the script from once it was merged into master; use the link I posted above to my fork for now.
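For reference, generating the models with the script typically means running it inside one of NVidia's TensorRT containers; the exact invocation below is a hedged example, and the image tag and mount paths are assumptions rather than values taken from this PR's docs.

```bash
# Hedged example: run the model-generation script inside NVidia's TensorRT
# container so the .trt engines are built on the GPU they will run on.
mkdir -p trt-models
docker run --gpus=all --rm -it \
  -v "$(pwd)/trt-models:/tensorrt_models" \
  -v "$(pwd)/tensorrt_models.sh:/tensorrt_models.sh" \
  nvcr.io/nvidia/tensorrt:22.07-py3 /tensorrt_models.sh
```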
I think I fixed the async execution and added a way to select which GPU is used if multiple are present. I only have a single GPU, so I have not tested that the selected GPU is actually used. I think this leaves the branch in a usable state if we want to kick off some testing. I will rebase and build an image.
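For illustration, selecting a specific GPU with pycuda usually amounts to something like the sketch below; `device_id` and the function name are placeholders, not this PR's code.

```python
# Hedged sketch of binding inference to a specific GPU when more than one is
# present; `device_id` would come from the detector config.
import pycuda.driver as cuda

def create_context_on_device(device_id: int):
    cuda.init()
    device = cuda.Device(device_id)    # 0 is the first GPU
    cuda_ctx = device.make_context()   # make this GPU current for this thread
    # ... load the TensorRT engine and run inference against this context ...
    return cuda_ctx  # caller should pop()/detach() the context when done
```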
(force-pushed from 0d82c7c to bd4983d)
Image for testing at ghcr.io/natemeyer/frigate:0.12.0-bd4983d-tensorrt.
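For anyone testing, one hedged way to run the image with GPU access is sketched below; the volume path, port, and shm-size are placeholders, not values prescribed by this PR.

```bash
# Hedged example of running the test image with GPU access exposed to the container.
docker run -d --name frigate-trt \
  --gpus=all \
  --shm-size=256m \
  -v /path/to/config:/config \
  -p 5000:5000 \
  ghcr.io/natemeyer/frigate:0.12.0-bd4983d-tensorrt
```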
Seeing the warning […]
Yes, I've seen the same warning all along; it seems benign. I'll try upgrading that package to a newer version.
Image with updated libraries: ghcr.io/natemeyer/frigate:0.12.0-e1b075a-tensorrt
Spent some time sifting through the image this morning with dive. Dang, those nvidia and tensorrt packages are massive. I think a separate tag here is probably the right approach.
This looks really great.
Just noticed two things:
- Looks like this will be limited to yolo models specifically, so the docs should probably state that.
- We will need to update the GitHub Actions ci.yml workflow in addition to the Makefile (a possible Makefile addition is sketched below).
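For illustration, the Makefile addition might look roughly like the sketch below, mirroring the ci.yml change later in this thread; the target name and tag are assumptions, not the project's actual Makefile.

```makefile
# Hedged sketch of a Makefile target for the TensorRT image.
# (Recipe lines must be indented with a tab.)
local-trt:
	docker buildx build --target=frigate-tensorrt \
		--tag frigate:latest-tensorrt \
		--load .
```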
I think the docs also need to more obviously call out that the tensorrt detector is only available in the specific image.
@blakeblackshear I got a permission error trying to push a commit with the ci.yml changes below:

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index c89b4e9..1c166eb 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -36,7 +36,19 @@ jobs:
context: .
push: true
platforms: linux/amd64,linux/arm64,linux/arm/v7
+ target: frigate
tags: |
ghcr.io/blakeblackshear/frigate:${{ github.ref_name }}-${{ env.SHORT_SHA }}
cache-from: type=gha
cache-to: type=gha,mode=max
+ - name: Build and push TensorRT
+ uses: docker/build-push-action@v3
+ with:
+ context: .
+ push: true
+ platforms: linux/amd64
+ target: frigate-tensorrt
+ tags: |
+ ghcr.io/blakeblackshear/frigate:${{ github.ref_name }}-${{ env.SHORT_SHA }}-tensorrt
+ cache-from: type=gha
+ cache-to: type=gha,mode=max
\ No newline at end of file
I don't think there should be a permission issue, other users have made PRs with changes to those files despite not having write access. What was the error exactly? I think that looks right though.
I'll try again. It's my own fork, so I should be able to push to it. Hmm...
Oh, ha. My access token […]
OK, these two items should be updated now.
@NateMeyer I was going through other nvidia related discussions here and am a bit lost on how to get any of the PRs running separately. I know that this PR is merged to v0.12, but I'd like to know how to build only the NateMeyer:nvidia-detector branch's docker image. Are the instructions present anywhere? Is it just following the […]? Thanks!!! :)
Hey @klutzzykitty, this branch builds on the Frigate dockerfile, based on Debian 11, and adds in the CUDA, TensorRT, and supporting libraries in the […]. If you don't need a specific base image, it would be easier to start with the TensorRT image from NVidia's container repo. The instructions to convert the TRT Yolo models make use of one of these images from NVidia. Hope this helps!
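For illustration, the pip-based install approach described in this thread might look roughly like the following in a multi-stage Dockerfile. The stage names echo the targets in the ci.yml diff earlier in the thread, while the package list and lack of version pins are assumptions, not copied from this PR.

```dockerfile
# Hedged sketch: add CUDA/TensorRT wheels from NVIDIA's pip index on top of
# the Debian-based Frigate image, instead of using the nvidia docker image as a base.
FROM frigate AS frigate-tensorrt
RUN pip3 install nvidia-pyindex && \
    pip3 install nvidia-tensorrt nvidia-cuda-runtime-cu11 nvidia-cudnn-cu11
```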
@NateMeyer Following that would also build frigate v0.12 with all the new UI features as well, wouldn't it? I was trying to ask if your fork builds on v0.11.1 with only nvidia tensorrt enabled. Sure, I can run […]
You can see on the branch that the nvidia-detector work started on commit 368c07c, which is well into 0.12 development. Also, it is unclear why that would make a difference; either way you are building frigate with the webUI / other modules. You'll probably just want to follow https://deploy-preview-4055--frigate-docs.netlify.app/development/contributing and run off the latest dev branch. That way you don't need to worry about building the whole image and can just work on the frigate backend.
Hey Nick! It sure doesn't make a difference for normal use. I was, however, trying to experiment with a few of my rough understandings and changes already made on my custom-built frigate image, which is on v0.11.1. I was thinking development of the tensorrt integration would have used the code of whatever stable release existed at that time, i.e. v0.11.1. So I suppose I'd have to go with following the image build instructions with the v0.12 betas only, isn't it?
This branch is a work-in-progress and will still need cleanup before merging. Currently the amd64 image builds and should have the necessary libraries for cuda/tensorrt. These are installed from the NVidia pyindex repository. This was the easiest way I could find to install the libraries without needing to use the nvidia docker image as a base or to log in to download the installers.

This is based on the work from yury-sannikov (#2548) and yeahme49 (#3016).
Current Status:
- Non-functional
- Working with a single camera input
- Occasional segfault
- CUDNN Mapping Error when running inference
- Inference runs, but results are not parsed correctly
- Fixed issue with input normalization
- Generate yolo models by running the tensorrt_models.sh script. See draft documentation for more details.

TODO:
- […] .trt format
- Evaluate support for Jetson platforms (arm64). [Arm platforms are not expected to be included in the 0.12 release]