
Nvidia TensorRT detector #4718

Merged: 18 commits merged into blakeblackshear:dev on Dec 30, 2022

Conversation

@NateMeyer (Contributor) commented Dec 15, 2022:

This branch is a work-in-progress and will still need cleanup before merging. Currently the amd64 image builds and should have the necessary libraries for CUDA/TensorRT. These are installed from the Nvidia pyindex repository, which was the easiest way I could find to install the libraries without needing to use the Nvidia docker image as a base or to log in to download the installers.

This is based on the work from yury-sannikov (#2548) and yeahme49 (#3016)

Current Status:

  • Working with a single camera input (previously non-functional)
  • amd64 image builds w/ TensorRT and Cuda python libraries
  • Engine loads a yolov4 model
  • occasional segfault
  • CUDNN Mapping Error when running inference
  • Inference runs; results are now parsed correctly after fixing an input normalization issue

Generate YOLO models by running the tensorrt_models.sh script. See the draft documentation for more details.

mkdir trt-models
wget https://raw.githubusercontent.com/NateMeyer/frigate/nvidia-detector/docker/tensorrt_models.sh
chmod +x tensorrt_models.sh
docker run --gpus=all --rm -it -v `pwd`/trt-models:/tensorrt_models -v `pwd`/tensorrt_models.sh:/tensorrt_models.sh nvcr.io/nvidia/tensorrt:22.07-py3 /tensorrt_models.sh

TODO:

  • Get detector working with cuda-python api
  • Add build stage to generate and convert a model to the .trt format
  • Update detector to format input data correctly per model (uint8 vs float32, etc.; see the sketch after this list)
  • Evaluate support for jetson platforms (arm64) [Arm platforms are not expected to be included in the 0.12 release]
  • Update documentation
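
For reference, the input-formatting TODO above amounts to something like the following (a minimal numpy sketch with illustrative names, not the final detector code):

import numpy as np

def preprocess(frame, model_dtype=np.float32):
    """Illustrative only: reshape an HWC uint8 frame into the NCHW layout a
    YOLO TensorRT engine typically expects, normalizing only for float models."""
    tensor = np.transpose(frame, (2, 0, 1))[np.newaxis, ...]  # HWC -> NCHW
    if model_dtype == np.float32:
        # float32 models usually expect pixel values scaled to [0, 1]
        return np.ascontiguousarray(tensor, dtype=np.float32) / 255.0
    # uint8 models take the raw pixel values as-is
    return np.ascontiguousarray(tensor, dtype=np.uint8)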

@netlify bot commented Dec 15, 2022:

Deploy Preview for frigate-docs ready!

🔨 Latest commit: 177551a
🔍 Latest deploy log: https://app.netlify.com/sites/frigate-docs/deploys/63af14d0d7d6470008f28d24
😎 Deploy Preview: https://deploy-preview-4718--frigate-docs.netlify.app

@NickM-27 (Collaborator) commented:

Evaluate support for jetson platforms (arm64)

As far as I understand, we're not looking to include that in the 0.12 release; it adds a whole other level of device and user support complexity.

@NateMeyer (Author) commented:

FYI, the current image size is about 5GB when building the amd64 image.

@NickM-27 (Collaborator) commented Dec 21, 2022:

Wow that's massive, I wonder if it makes sense to have an image as it is now and then one with everything plus nvidia detector. That gets more confusing though.

@NateMeyer (Author) commented:

I was able to get around the inference CUDNN errors by avoiding the async memcpy and execute functions. I'm not sure why those had trouble; I may revisit this later if needed.
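
For anyone following along, the difference between the two paths is roughly the following (a pycuda-flavoured sketch for illustration only; the detector here is being written against the cuda-python bindings, so the exact calls differ):

import pycuda.driver as cuda
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context on the default device

def infer_sync(context, d_input, d_output, host_input, host_output):
    """Blocking copies plus execute_v2 -- the path that is currently working."""
    cuda.memcpy_htod(d_input, host_input)                    # host -> device
    context.execute_v2(bindings=[int(d_input), int(d_output)])
    cuda.memcpy_dtoh(host_output, d_output)                  # device -> host
    return host_output

def infer_async(context, d_input, d_output, host_input, host_output, stream):
    """Stream-ordered copies plus execute_async_v2; the host buffers must be
    page-locked (cuda.pagelocked_empty) for the async copies to be valid."""
    cuda.memcpy_htod_async(d_input, host_input, stream)
    context.execute_async_v2(bindings=[int(d_input), int(d_output)],
                             stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(host_output, d_output, stream)
    stream.synchronize()  # wait for the stream before reading host_output
    return host_output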

@dennispg (Contributor) commented:

Wow that's massive, I wonder if it makes sense to have an image as it is now and then one with everything plus nvidia detector. That gets more confusing though.

I think you might be able to get smaller image sizes using the nvidia/cuda base images instead of installing pycuda and tensorrt manually. Also, you could set environment variables that enable/disable the plugin, using conditional imports depending on whether the image has the supported dependencies.
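
Something along these lines, just to illustrate the idea (the DETECTOR_TENSORRT variable and the detector names are hypothetical, not anything in Frigate today):

import logging
import os

logger = logging.getLogger(__name__)

TENSORRT_AVAILABLE = False
# Only attempt the heavy import when the user opts in via an environment
# variable (hypothetical name) and the image actually ships the library.
if os.environ.get("DETECTOR_TENSORRT", "0") == "1":
    try:
        import tensorrt  # noqa: F401
        TENSORRT_AVAILABLE = True
    except ImportError:
        logger.warning("DETECTOR_TENSORRT is set but TensorRT is not installed")

def available_detectors():
    # Register only the detector plugins whose dependencies imported cleanly.
    detectors = ["cpu", "edgetpu"]
    if TENSORRT_AVAILABLE:
        detectors.append("tensorrt")
    return detectors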

I just submitted a new pull request #4774, which enables detectors to reside in separate containers and communicate using zeromq. This might help facilitate maintaining various flavors of container images?

@NateMeyer (Author) commented:

Wow that's massive, I wonder if it makes sense to have an image as it is now and then one with everything plus nvidia detector. That gets more confusing though.

I think you might be able to get smaller image sizes using the nvidia/cuda base images instead of installing pycuda and tensorrt manually. Also, you could set environment variables that enable/disable the plugin, using conditional imports depending on whether the image has the supported dependencies.

I just submitted a new pull request #4774, which enables detectors to reside in separate containers and communicate using zeromq. This might help facilitate maintaining various flavors of container images?

That may be an option, but first I want to get something working in a single image. I'll take a closer look at your PR. The cuda base image does not have TensorRT, python, or any other dependencies of frigate installed, so it would likely end up being close to the same size.

The next issue I'm trying to wrap my head around is the creation of the TRT engine. From what I can tell, the process to generate the TRT model 1) requires an Nvidia GPU, and 2) generates a model specific to that GPU. The other TensorRT images that I am working from solve this by generating the engine during initialization (and preferably saving the output so it only has to be done once). Unfortunately, this PR currently doesn't include all the tools and the ONNX/TensorFlow/PyTorch frameworks needed to do the conversion, and the two images I looked at that do (#2548 and #3016) are 2-4 times larger than mine.
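
To make that trade-off concrete, the generate-on-first-start approach those images take boils down to something like this (a sketch only; it assumes an ONNX model and the TensorRT 8.x Python API, neither of which this image currently ships with):

import os
import pycuda.autoinit  # noqa: F401 -- a CUDA context must exist before deserializing
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

def load_or_build_engine(onnx_path, engine_path):
    """Deserialize a cached engine if present, otherwise build one from ONNX.
    The build needs the target GPU present and its output is specific to that
    GPU, which is why it has to run on the user's machine rather than in CI."""
    if os.path.exists(engine_path):
        with open(engine_path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
            return runtime.deserialize_cuda_engine(f.read())

    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError("Failed to parse " + onnx_path)

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30  # 1 GiB of scratch space for tactic selection
    serialized = builder.build_serialized_network(network, config)

    with open(engine_path, "wb") as f:
        f.write(serialized)  # cache so later startups skip the slow build
    with trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(serialized)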

Options could be to bloat this image further with the extra conversion tools, or to have the user run a separate dedicated script/image to do the conversion before running Frigate. From what I understand of the Frigate project, I'm weighing the following considerations:

  1. How does it affect the user who just installs Frigate as a Home Assistant add-on? Not everyone is familiar with Docker and containers, and requiring a complicated setup process could turn people away from the project.
  2. How hard is it to support users who run into issues? The more complicated setting up Frigate is, the more time developers spend helping troubleshoot instead of working on new features.
  3. Image sizes should be reasonably efficient. General good practice to not force everyone to download large libraries that they will not need.

Sorry for my rambling. I admittedly haven't looked too closely at how the HA add-on is set up; maybe adding a second image isn't too much of a stretch. Maybe we move forward with a different tag for the TensorRT-enabled image. If there is a preferred solution, let me know. I haven't made as much progress as I'd hoped the last two weeks for a variety of reasons, but I should be able to get a bit more done this week.

@NickM-27 (Collaborator) commented:

Yeah, as powerful as Nvidia GPU based detection is, I think it is still relatively niche for the overall Frigate user base and will continue to be that way.

What I was imagining was a separate nvidia image which uses the main (current) Docker image as its base and builds on top of that with the tensorrt parts so the other detectors can be combined with tensorrt.

I think regardless of how it's done, there needs to be a minimal image option. That's also just my ramblings / opinions.

@dennispg (Contributor) commented:

There are some TensorRT images here that should be useful as a base: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tensorrt

My thought on this is to lean more into relegating Nvidia GPU based detection to sidecar containers or a remote container on another machine, especially in the case of the HA addon. This scenario was actually my entire goal with the zeromq PR. I would also leave this as more of an advanced use case with limited support, though I don't know how well that flies. That way, the core image can continue to work and be supported as it always has.
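
Just to illustrate the sidecar shape (this is not the actual protocol from #4774; the socket type, port, and payload layout here are assumptions), the client side of a remote detector over zeromq can be as small as:

import numpy as np
import zmq

def query_remote_detector(frame, endpoint="tcp://detector:5555"):
    """Send one raw frame to a detector sidecar and wait for its detections."""
    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.REQ)  # simple request/reply, one frame in flight at a time
    sock.connect(endpoint)
    sock.send(frame.tobytes(), copy=False)
    # assume the sidecar replies with flat float32 rows of [label, score, box]
    return np.frombuffer(sock.recv(), dtype=np.float32).reshape(-1, 6)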

@NateMeyer (Author) commented:

Are there any issues with zeromq when one side is using a different minor version of Python? Frigate is currently using 3.9, but all of the Nvidia images I've seen use 3.8 or 3.10. We could use the Nvidia base image for a sidecar, but I'm not sure what will break if we try to build the whole project on top of that.

Do you have any examples of a remote detector using your zeromq changes?

For now I'll plan on building an image with a -nvidia tag suffix.

@alekc commented Dec 28, 2022:

3. Image sizes should be reasonably efficient. General good practice to not force everyone to download large libraries that they will not need

Given that most of the time people with Nvidia GPUs will be running Frigate on only one PC, I think it would be more than acceptable for them (me included) to download the library files (if required) via an init container and place them inside a volume on the first run.

That should be a good enough compromise in relation to the image size.

@NickM-27 (Collaborator) commented:

For now I'll plan on building an image with a -nvidia tag suffix.

Use -tensorrt, not -nvidia, so it's not confused with the previous -nvidia image.

@NateMeyer (Author) commented:

  1. Image sizes should be reasonably efficient. General good practice to not force everyone to download large libraries that they will not need

Given that most of the times people with nvidia will be running frigate only on one pc, I think it would be more than acceptable to them (me included) to download library files (if required) via init container and place them inside the volume on the first ever run.

That should be a good enough compromise in relation to the image size.

Thanks for the input! Would you mind taking a look at the updated info at the top of this PR about generating the models and letting me know if you have any questions?

I tried estimating what the final URL to download the script from will be once it is merged into master; for now, use the link to my fork that I posted above.

@NateMeyer (Author) commented:

I think I fixed the async execution and added a way to select which GPU is used if multiple are present. I only have a single GPU, so I have not tested that the selected GPU is actually used. I think this leaves the branch in a usable state if we want to kick off some testing. I will rebase and build an image.
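
For reference, the GPU selection part amounts to binding the CUDA context to the configured device index before any buffers or the engine are created. A rough pycuda-style sketch (the detector here targets the cuda-python API, so these calls are illustrative, and the config key name is up to the config schema):

import pycuda.driver as cuda

def create_context(gpu_index=0):
    """Bind this detector process to the GPU chosen in the config; everything
    allocated afterwards (buffers, TensorRT engine) then lives on that device."""
    cuda.init()
    if gpu_index >= cuda.Device.count():
        raise ValueError("Requested GPU %d but only %d device(s) are present"
                         % (gpu_index, cuda.Device.count()))
    device = cuda.Device(gpu_index)
    return device.make_context()  # caller is responsible for pop()/detach() on shutdown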

@NateMeyer (Author) commented:

Image for testing at ghcr.io/natemeyer/frigate:0.12.0-bd4983d-tensorrt.

NateMeyer marked this pull request as ready for review on December 29, 2022 at 20:42.
@NickM-27 (Collaborator) commented:

Seeing the warning "TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.4.0", but it's running well with an inference speed of 5 ms with yolov4tiny-416.

@NateMeyer (Author) commented:

Seeing the warning "TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.4.0", but it's running well with an inference speed of 5 ms with yolov4tiny-416.

Yes, I've seen the same warning all along; it seems benign. I'll try upgrading that package to a newer version.

@NateMeyer (Author) commented:

Image with updated libraries: ghcr.io/natemeyer/frigate:0.12.0-e1b075a-tensorrt

@blakeblackshear (Owner) commented:

Spent some time sifting through the image this morning with dive. Dang those nvidia and tensorrt packages are massive. I think a separate tag here is probably the right approach.

@blakeblackshear (Owner) left a review:


This looks really great.

Just noticed two things:

  • Looks like this will be limited to yolo models specifically, so the docs should probably state that.
  • We will need to update the github actions ci.yml workflow in addition to the Makefile

@NickM-27 (Collaborator) commented:

I think the docs also need to more obviously call out that the tensorrt detector is only available in the specific stable-tensorrt image, so that there is no confusion there.

@NateMeyer (Author) commented:

@blakeblackshear I got a permission error trying to push a commit with the ci.yml changes. I've not done too much with GitHub's CI before, but I think the changes below are what you're looking for?

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index c89b4e9..1c166eb 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -36,7 +36,19 @@ jobs:
           context: .
           push: true
           platforms: linux/amd64,linux/arm64,linux/arm/v7
+          target: frigate
           tags: |
             ghcr.io/blakeblackshear/frigate:${{ github.ref_name }}-${{ env.SHORT_SHA }}
           cache-from: type=gha
           cache-to: type=gha,mode=max
+      - name: Build and push TensorRT
+        uses: docker/build-push-action@v3
+        with:
+          context: .
+          push: true
+          platforms: linux/amd64
+          target: frigate-tensorrt
+          tags: |
+            ghcr.io/blakeblackshear/frigate:${{ github.ref_name }}-${{ env.SHORT_SHA }}-tensorrt
+          cache-from: type=gha
+          cache-to: type=gha,mode=max
\ No newline at end of file

@NickM-27 (Collaborator) commented Dec 30, 2022:

I don't think there should be a permission issue, other users have made PRs with changes to those files despite not having write access. What was the error exactly?

I think that looks right though.

@NateMeyer (Author) commented:

I'll try again. It's my own fork, so I should be able to push to it. hmm...

@NateMeyer (Author) commented Dec 30, 2022:

Oh, ha. My access token didn't have workflow permissions. Whoops!

@NateMeyer (Author) commented:

This looks really great.

Just noticed two things:

  • Looks like this will be limited to yolo models specifically, so the docs should probably state that.
  • We will need to update the github actions ci.yml workflow in addition to the Makefile

OK, these two items should be updated now.

blakeblackshear merged commit 3f05f74 into blakeblackshear:dev on Dec 30, 2022.
@klutzzykitty commented Mar 19, 2023:

@NateMeyer
Hey Nate!
Just went through the PR, and that's some good work on integrating support for TRT.

I was going through the other Nvidia-related discussions here and am a bit lost on how to get any of the PRs running separately.

I know that this PR is merged into v0.12, but I'd like to know how to build a Docker image of only the NateMeyer:nvidia-detector branch.
I've been thinking of experimenting with other Nvidia ecosystem software like DeepStream and the Triton Inference Server, so building v0.12 is overkill compared to building just the Nvidia TensorRT detector branch's image.

Are the instructions available anywhere?
I tried looking through your PR, but apart from the code changes, I couldn't make out what steps to follow to build an image from it.

Is it just a matter of following the Makefile, or is there more to it, like creating the TRT YOLO models, etc.?

Thanks!!! :)

@NateMeyer (Author) commented:

Hey @klutzzykitty,

This branch builds on the Frigate Dockerfile, based on Debian 11, and adds in CUDA, TensorRT, and the supporting libraries from the requirements-tensorrt.txt file. If you just want to build the TensorRT version, you can run make local-trt to build that target in the Makefile.

If you don't need a specific base image, it would be easier to start with the TensorRT image from NVidia's container repo. The instructions to convert the TRT Yolo models make use of one of these images from NVidia.

Hope this helps!

@klutzzykitty commented:

@NateMeyer
Thanks! Yeah, that helps, but that's not exactly what I was looking for.

Following that would also build Frigate v0.12 with all the new UI features as well, wouldn't it?

I was trying to ask whether your fork builds on v0.11.1 with only the Nvidia TensorRT detector enabled, because you started working on it when 0.11.1 was the official stable release, didn't you?

Sure, I can run make local-trt, but that also builds Frigate with the other additions like WebRTC and the UI changes. Or does running make local-trt from your fork build it on v0.11.1?

@NickM-27 (Collaborator) commented:

@NateMeyer Thanks! Yeah, that helps, but that's not exactly what I was looking for.

Following that would also build Frigate v0.12 with all the new UI features as well, wouldn't it?

I was trying to ask whether your fork builds on v0.11.1 with only the Nvidia TensorRT detector enabled, because you started working on it when 0.11.1 was the official stable release, didn't you?

Sure, I can run make local-trt, but that also builds Frigate with the other additions like WebRTC and the UI changes. Or does running make local-trt from your fork build it on v0.11.1?

You can see on the branch that the nvidia-detector work started from commit 368c07c, which is well into 0.12 development. Also, it's unclear why that would make a difference; either way you are building Frigate with the web UI and other modules.

You'll probably just want to follow https://deploy-preview-4055--frigate-docs.netlify.app/development/contributing and run off the latest dev branch. That way you don't need to worry about building the whole image and can just work on the Frigate backend.

@klutzzykitty commented Mar 21, 2023:

@NateMeyer Thanks! Yeah, that helps, but that's not exactly what I was looking for.
Following that would also build Frigate v0.12 with all the new UI features as well, wouldn't it?
I was trying to ask whether your fork builds on v0.11.1 with only the Nvidia TensorRT detector enabled, because you started working on it when 0.11.1 was the official stable release, didn't you?
Sure, I can run make local-trt, but that also builds Frigate with the other additions like WebRTC and the UI changes. Or does running make local-trt from your fork build it on v0.11.1?

You can see on the branch that the nvidia-detector work started from commit 368c07c, which is well into 0.12 development. Also, it's unclear why that would make a difference; either way you are building Frigate with the web UI and other modules.

You'll probably just want to follow https://deploy-preview-4055--frigate-docs.netlify.app/development/contributing and run off the latest dev branch. That way you don't need to worry about building the whole image and can just work on the Frigate backend.

Hey Nick!
Ah, I see now.

It sure doesn't make a difference for normal use. I was, however, trying to experiment with a few of my rough ideas and changes already made on my custom-built Frigate image, which is on v0.11.1. For example, having a notification trigger when neighbours throw garbage into a certain area (relies on motion detection), or identifying the color of my dogs when they are detected, as I have a few Kangals of different colors.

I was thinking the development of the TensorRT integration would have used the code of whatever stable release existed at the time, i.e. v0.11.1.

So I suppose I'd have to follow the image build instructions with the v0.12 betas only, then?
