Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It includes NVIDIA's TensorRT-LLM as a submodule to provide GPU-accelerated inference on NVIDIA GPUs.
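Since the library is described as loadable by any server at runtime, a minimal sketch of what that pattern looks like in C++ may help. The library path and the `create_engine` entry-point symbol below are hypothetical placeholders for illustration, not the library's documented API:

```cpp
// Sketch: a host server loading an inference-engine shared library
// at runtime via dlopen. Path and symbol names are assumptions.
#include <dlfcn.h>
#include <cstdio>

int main() {
    // Open the engine's shared object lazily, keeping symbols local.
    void* handle = dlopen("./libengine.so", RTLD_LAZY | RTLD_LOCAL);
    if (!handle) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    // Resolve a C-style factory symbol exported by the engine
    // (hypothetical name; the real engine defines its own interface).
    using CreateFn = void* (*)();
    auto create_engine =
        reinterpret_cast<CreateFn>(dlsym(handle, "create_engine"));
    if (!create_engine) {
        std::fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(handle);
        return 1;
    }

    void* engine = create_engine();  // server would now route requests here
    (void)engine;

    dlclose(handle);
    return 0;
}
```

This runtime-loading design lets a server swap inference backends without recompiling against any one engine.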