Nvidia TensorRT detector #4718
Conversation
✅ Deploy Preview for frigate-docs ready!
As far as I understand, we're not looking to have that in this 0.12 release; it adds a whole other level of device and user support / complexity.
FYI, the current image size is about 5GB when building the amd64 image.
Wow, that's massive. I wonder if it makes sense to have an image as it is now and then one with everything plus the nvidia detector. That gets more confusing though.
I was able to get around the inference CUDNN errors by avoiding the async memcopy and execute functions. Not sure why these had trouble, may revisit this later if needed.
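For context, a minimal sketch of what the synchronous path looks like with the TensorRT Python API and pycuda, as opposed to the async variants; the buffer and context names are placeholders and this is not this PR's actual code.

```python
# Illustrative sketch only: synchronous copy + inference, avoiding the async
# memcpy/execute calls. Assumes a TensorRT execution context and pycuda-managed
# device buffers; `context`, `d_input`, `d_output`, and `bindings` are placeholders.
import numpy as np
import pycuda.driver as cuda
import pycuda.autoinit  # creates a CUDA context on import

def infer_sync(context, h_input, d_input, d_output, h_output, bindings):
    # Copy input to the GPU, run inference, and copy results back, all
    # synchronously. The async variants (memcpy_htod_async / execute_async_v2)
    # additionally take a CUDA stream handle.
    cuda.memcpy_htod(d_input, h_input)
    context.execute_v2(bindings)
    cuda.memcpy_dtoh(h_output, d_output)
    return h_output
```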
I think you might be able to get smaller image sizes using the […]. I just submitted a new pull request #4774 which enables detectors to reside in separate containers and communicate using zeromq. This might help facilitate maintaining various flavors of container images?
That may be an option, but first I want to get something working in a single image. I'll take a closer look at your PR. The cuda base image does not have TensorRT, python, or any other dependencies of frigate installed, so it would likely end up being close to the same size.

The next issue I'm trying to wrap my head around is the creation of the trt engine. From what I can tell, the process to generate the trt model 1) requires an NVidia GPU, and 2) generates a model specific to that GPU. The other TensorRT images that I am working from solve this by generating the trt engine during initialization (and preferably saving the output so it only has to do so once). Unfortunately, this PR currently doesn't include all the tools and onnx/tensorflow/pytorch frameworks to do the conversion, and the two images I looked at that do (#2548 and #3016) are 2-4 times larger than mine. Options could be to bloat this image more with the extra tools for conversion, or to have the user run another dedicated script/image to do the conversion before running frigate. From what I understand of the Frigate project, I'm giving the following considerations: […]

Sorry for my rambling. I admittedly haven't looked too closely at how the HA add-on is set up; maybe adding the second image isn't too much of a stretch. Maybe we move forward with a different tag for the TensorRT-enabled image. If there is a preferred solution, let me know. I haven't made as much progress as I'd hoped the last 2 weeks for a variety of reasons, but I should be able to get a bit more done this week.
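For illustration, here is a minimal sketch of the "generate the engine on first start, then cache it" pattern described above, assuming an ONNX model and the tensorrt Python API. Paths, the workspace size, and function names are placeholders, not code from this PR.

```python
# Hedged sketch: build a TensorRT engine from ONNX once, cache it to disk,
# and reuse it on later starts. Engines are specific to the GPU and TensorRT
# version they were built with, hence the per-machine generation step.
import os
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def load_or_build_engine(onnx_path: str, engine_path: str):
    runtime = trt.Runtime(TRT_LOGGER)
    if os.path.exists(engine_path):
        # Reuse the engine cached from a previous run.
        with open(engine_path, "rb") as f:
            return runtime.deserialize_cuda_engine(f.read())

    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        parser.parse(f.read())

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30  # 1 GiB, illustrative
    serialized = builder.build_serialized_network(network, config)
    with open(engine_path, "wb") as f:
        f.write(serialized)
    return runtime.deserialize_cuda_engine(serialized)
```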
Yeah, I think as powerful as nvidia GPU based detection is, it is still relatively niche for the overall frigate user base and will continue to be that way. What I was imagining was a separate nvidia image which uses the main (current) Docker image as its base and builds on top of that with the tensorrt parts, so the other detectors can be combined with tensorrt. I think regardless of how it's done, there needs to be a minimal image option. That's also just my ramblings / opinions.
There are some TensorRT images here that should be useful as a base: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tensorrt

My thought on this is to lean more into relegating Nvidia GPU based detection to sidecar containers or a remote container on another machine, especially in the case of the HA addon. This scenario was actually my entire goal with the zeromq PR. I would also leave this as more of an advanced use case with limited support, but I don't know how well that flies. But the core image can continue to work and be supported as it always has that way.
Are there any issues with zeromq when one side is using a different minor version of python? Frigate is currently using 3.9, but all of the Nvidia images I've seen use 3.8 or 3.10. We could use the Nvidia base image for a sidecar, but I'm not sure what will break if we try to build the whole project on top of that. Do you have any examples of a remote detector using your zeromq changes? For now I'll plan on building an image with a […]
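For illustration, a minimal sketch of how a remote detector over zeromq could look (this is not the actual #4774 API): the wire format is plain bytes, so the Python minor version on each end shouldn't matter as long as the payload avoids pickle. The endpoint, tensor shape, and dtype below are assumptions.

```python
# Hedged REQ/REP sketch of a detector sidecar speaking raw bytes over zeromq.
import numpy as np
import zmq

def serve_detector(run_inference, endpoint="tcp://0.0.0.0:5555"):
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REP)
    sock.bind(endpoint)
    while True:
        # Assumed fixed 320x320x3 uint8 frames; a real protocol would carry shape/dtype.
        frame = np.frombuffer(sock.recv(), dtype=np.uint8).reshape(320, 320, 3)
        detections = run_inference(frame)  # e.g. an Nx6 float32 array
        sock.send(np.asarray(detections, dtype=np.float32).tobytes())

def detect_remote(frame, endpoint="tcp://detector:5555"):
    ctx = zmq.Context()
    sock = ctx.socket(zmq.REQ)
    sock.connect(endpoint)
    sock.send(frame.astype(np.uint8).tobytes())
    return np.frombuffer(sock.recv(), dtype=np.float32).reshape(-1, 6)
```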
Given that most of the time people with nvidia will be running frigate only on one pc, I think it would be more than acceptable to them (me included) to download library files (if required) via an init container and place them inside the volume on the first ever run. That should be a good enough compromise in relation to the image size.
Use -tensorrt, not -nvidia, so it's not confused with the previous -nvidia tag.
(force-pushed from 2881256 to 945a382)
Thanks for the input! Would you mind taking a look at the updated info at the top of this PR about generating the models and let me know if you have any questions. I tried estimating what the final URL would be to download the script from once it was merged into master; use the link I posted above to my fork for now.
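For reference, generating the models with the script typically means running it inside one of NVidia's TensorRT containers; the exact invocation below is a hedged example, and the image tag and mount paths are assumptions rather than values taken from this PR's docs.

```bash
# Hedged example: run the model-generation script inside NVidia's TensorRT
# container so the .trt engines are built on the GPU they will run on.
mkdir -p trt-models
docker run --gpus=all --rm -it \
  -v "$(pwd)/trt-models:/tensorrt_models" \
  -v "$(pwd)/tensorrt_models.sh:/tensorrt_models.sh" \
  nvcr.io/nvidia/tensorrt:22.07-py3 /tensorrt_models.sh
```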
I think I fixed the async execution and added a way to select which GPU is used if multiple are present. I only have a single GPU, so I have not tested that the selected GPU is actually used. I think this leaves the branch in a usable state if we want to kick off some testing. I will rebase and build an image.
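For illustration, selecting a specific GPU with pycuda usually amounts to something like the sketch below; `device_id` and the function name are placeholders, not this PR's code.

```python
# Hedged sketch of binding inference to a specific GPU when more than one is
# present; `device_id` would come from the detector config.
import pycuda.driver as cuda

def create_context_on_device(device_id: int):
    cuda.init()
    device = cuda.Device(device_id)    # 0 is the first GPU
    cuda_ctx = device.make_context()   # make this GPU current for this thread
    # ... load the TensorRT engine and run inference against this context ...
    return cuda_ctx  # caller should pop()/detach() the context when done
```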
(force-pushed from 0d82c7c to bd4983d)
Image for testing at ghcr.io/natemeyer/frigate:0.12.0-bd4983d-tensorrt.
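For anyone testing, one hedged way to run the image with GPU access is sketched below; the volume path, port, and shm-size are placeholders, not values prescribed by this PR.

```bash
# Hedged example of running the test image with GPU access exposed to the container.
docker run -d --name frigate-trt \
  --gpus=all \
  --shm-size=256m \
  -v /path/to/config:/config \
  -p 5000:5000 \
  ghcr.io/natemeyer/frigate:0.12.0-bd4983d-tensorrt
```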
Seeing the warning […]
Yes, I've seen the same warning all along; it seems benign. I'll try upgrading that package to a newer version.
Image with updated libraries: ghcr.io/natemeyer/frigate:0.12.0-e1b075a-tensorrt
Spent some time sifting through the image this morning with dive. Dang, those nvidia and tensorrt packages are massive. I think a separate tag here is probably the right approach.
This looks really great.
Just noticed two things:
- Looks like this will be limited to yolo models specifically, so the docs should probably state that.
- We will need to update the GitHub Actions ci.yml workflow in addition to the Makefile (a possible Makefile addition is sketched below).
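For illustration, the Makefile addition might look roughly like the sketch below, mirroring the ci.yml change later in this thread; the target name and tag are assumptions, not the project's actual Makefile.

```makefile
# Hedged sketch of a Makefile target for the TensorRT image.
# (Recipe lines must be indented with a tab.)
local-trt:
	docker buildx build --target=frigate-tensorrt \
		--tag frigate:latest-tensorrt \
		--load .
```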
I think the docs also need to more obviously call out that the tensorrt detector is only available in the specific image.
@blakeblackshear I got a permission error trying to push a commit with the ci.yml changes below:

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index c89b4e9..1c166eb 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -36,7 +36,19 @@ jobs:
context: .
push: true
platforms: linux/amd64,linux/arm64,linux/arm/v7
+ target: frigate
tags: |
ghcr.io/blakeblackshear/frigate:${{ github.ref_name }}-${{ env.SHORT_SHA }}
cache-from: type=gha
cache-to: type=gha,mode=max
+ - name: Build and push TensorRT
+ uses: docker/build-push-action@v3
+ with:
+ context: .
+ push: true
+ platforms: linux/amd64
+ target: frigate-tensorrt
+ tags: |
+ ghcr.io/blakeblackshear/frigate:${{ github.ref_name }}-${{ env.SHORT_SHA }}-tensorrt
+ cache-from: type=gha
+ cache-to: type=gha,mode=max
\ No newline at end of file
I don't think there should be a permission issue, other users have made PRs with changes to those files despite not having write access. What was the error exactly? I think that looks right though.
I'll try again. It's my own fork, so I should be able to push to it. Hmm...
Oh, ha. My access token […]
OK, these two items should be updated now.
@NateMeyer I was going through other nvidia related discussions here and am a bit lost on how to get any of the PRs running separately. I know that this PR is merged to v0.12, but I'd like to know how to build only the NateMeyer:nvidia-detector branch's docker image. Are the instructions present anywhere? Is it just following the […]? Thanks!!! :)
Hey @klutzzykitty, this branch builds on the Frigate dockerfile, based on Debian 11, and adds in the CUDA, TensorRT, and supporting libraries in the […]. If you don't need a specific base image, it would be easier to start with the TensorRT image from NVidia's container repo. The instructions to convert the TRT Yolo models make use of one of these images from NVidia. Hope this helps!
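For illustration, the pip-based install approach described in this thread might look roughly like the following in a multi-stage Dockerfile. The stage names echo the targets in the ci.yml diff earlier in the thread, while the package list and lack of version pins are assumptions, not copied from this PR.

```dockerfile
# Hedged sketch: add CUDA/TensorRT wheels from NVIDIA's pip index on top of
# the Debian-based Frigate image, instead of using the nvidia docker image as a base.
FROM frigate AS frigate-tensorrt
RUN pip3 install nvidia-pyindex && \
    pip3 install nvidia-tensorrt nvidia-cuda-runtime-cu11 nvidia-cudnn-cu11
```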
@NateMeyer Following that would also build frigate v0.12 with all the new UI features as well, wouldn't it? I was trying to ask if your fork builds on v0.11.1 with only nvidia tensorrt enabled. Sure, I can run […]
You can see on the branch that the nvidia-detector work started on commit 368c07c, which is well into 0.12 development. Also, it is unclear why that would make a difference; either way you are building frigate with the webUI / other modules. You'll probably just want to follow https://deploy-preview-4055--frigate-docs.netlify.app/development/contributing and run off the latest dev branch. That way you don't need to worry about building the whole image and can just work on the frigate backend.
Hey Nick! It sure doesn't make a difference for normal use. I was, however, trying to experiment with a few of my rough understandings and changes already made on my custom-built frigate image, which is on v0.11.1. I was thinking development of the tensorrt integration would have used the code of whatever stable release existed at that time, i.e. v0.11.1. So I suppose I'd have to go with following the image build instructions with the v0.12 betas only, isn't it?
This branch is a work-in-progress and will still need cleanup before merging. Currently the amd64 image builds and should have the necessary libraries for cuda/tensorrt. These are installed from the NVidia pyindex repository. This was the easiest way I could find to install the libraries without needing to use the nvidia docker image as a base or to log in to download the installers.

This is based on the work from yury-sannikov (#2548) and yeahme49 (#3016).
Current Status:
- Non-functional
- Working with a single camera input
- Occasional segfault
- CUDNN Mapping Error when running inference
- Inference runs, but results are not parsed correctly
- Fixed issue with input normalization
- Generate yolo models by running the tensorrt_models.sh script. See draft documentation for more details.

TODO:
- […] .trt format
- Evaluate support for Jetson platforms (arm64). [Arm platforms are not expected to be included in the 0.12 release]