Installing on Windows

Note

The Windows release of TensorRT-LLM is currently in beta. We recommend checking out the v0.12.0 tag for the most stable experience.

Prerequisites

  1. Clone this repository using Git for Windows.

  2. Install the dependencies one of two ways:

    1. Install all dependencies together.

      1. Run the provided PowerShell script setup_env.ps1, located in the /windows/ folder, which installs Python and CUDA 12.4.1 automatically with default settings. The script must be run from a PowerShell session started as Administrator.

      ./setup_env.ps1 [-skipCUDA] [-skipPython]
      
      2. Close and re-open any existing PowerShell or Git Bash windows so they pick up the Path changes made by the setup_env.ps1 script.

    2. Install the dependencies one at a time.

      1. Install Python 3.10.

        1. Select Add python.exe to PATH at the start of the installation. The installation may only add the python command, but not the python3 command.

        2. Navigate to the installation path %USERPROFILE%\AppData\Local\Programs\Python\Python310 (AppData is a hidden folder) and copy python.exe to python3.exe.

      2. Install CUDA 12.5.1 Toolkit. Use the Express Installation option. Installation may require a restart.

  3. If you are using a conda environment, run the following command before installing TensorRT-LLM.

    conda install -c conda-forge pyarrow
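
Before moving on to the installation steps, it can help to confirm that the prerequisites above are actually visible from a fresh shell. The following is a minimal sketch, not part of TensorRT-LLM: the helper names parse_nvcc_release and check_prerequisites are hypothetical, and the sketch assumes that the CUDA Toolkit installer placed nvcc on the Path and that `nvcc --version` prints a line containing "release X.Y".

```python
import re
import shutil
import subprocess
import sys


def parse_nvcc_release(version_output):
    """Return the CUDA release (e.g. "12.4") found in `nvcc --version` text, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", version_output)
    return match.group(1) if match else None


def check_prerequisites():
    """Collect a small report on the interpreter and CUDA toolkit found on the Path."""
    report = {
        # Version of the interpreter running this script (3.10 is expected here).
        "python_version": "{}.{}".format(*sys.version_info[:2]),
        # None means the python3 command is missing (see the copy step above).
        "python3_on_path": shutil.which("python3"),
        # None means nvcc is not visible; a new shell may be needed.
        "nvcc_on_path": shutil.which("nvcc"),
        "cuda_release": None,
    }
    if report["nvcc_on_path"]:
        result = subprocess.run(
            [report["nvcc_on_path"], "--version"],
            capture_output=True,
            text=True,
            check=False,
        )
        report["cuda_release"] = parse_nvcc_release(result.stdout)
    return report
```

Printing the dictionary returned by check_prerequisites() in a newly opened PowerShell window shows which of the commands were picked up. A value of None for python3_on_path or nvcc_on_path suggests that the corresponding prerequisite step has not taken effect yet.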
    

Steps

  1. Install TensorRT-LLM.

If you have an existing TensorRT installation (from an older version of tensorrt_llm), uninstall it first:

pip uninstall -y tensorrt tensorrt_libs tensorrt_bindings
pip uninstall -y nvidia-cublas-cu12 nvidia-cuda-nvrtc-cu12 nvidia-cuda-runtime-cu12 nvidia-cudnn-cu12

Then install TensorRT-LLM with the following command.

pip install tensorrt_llm==0.12.0 --extra-index-url https://pypi.nvidia.com --extra-index-url https://download.pytorch.org/whl/cu121/torch/

Run the following command to verify that your TensorRT-LLM installation is working properly.

python -c "import tensorrt_llm; print(tensorrt_llm._utils.trt_version())"

  2. Build the model.

  3. Deploy the model.
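
The verification command in step 1 imports tensorrt_llm, which can itself fail if the environment is broken. As a complementary check, the sketch below reads package metadata without importing anything. It relies only on the standard-library importlib.metadata module. The helper names installed_version and report_versions are hypothetical, and the sketch assumes that the distribution name registered by pip matches the name used in the install command.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report_versions(dist_names):
    """Map each distribution name to its installed version (None when missing)."""
    return {name: installed_version(name) for name in dist_names}
```

For example, report_versions(["tensorrt_llm", "tensorrt"]) would show whether pip registered the packages and with which versions. The same helper can be pointed at the package names from the uninstall commands in step 1 to see whether an older installation is still present before installing.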

Known Issue

  1. OSError: exception: access violation reading 0x0000000000000000 during import tensorrt_llm or trtllm-build.

This may be caused by an outdated version of the Microsoft Visual C++ Redistributable. Install the latest version and retry. Check the system path to make sure the latest version, installed in System32, is searched first. Also check your dependencies to make sure no other package ships an outdated copy (for example, the pyarrow package might contain an outdated MSVC DLL).
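
To see which copies of a runtime DLL are present and in what order they would be encountered, a small script can help. The sketch below is a rough approximation only: it checks a list of directories in the order given, whereas the real Windows DLL search order involves more locations than the Path variable. The helper names are hypothetical, and the choice of vcruntime140.dll and msvcp140.dll as the files to look for is an assumption about which MSVC runtime DLLs are relevant.

```python
import os


def find_dll_copies(dll_name, search_dirs):
    """Return paths to every copy of dll_name, in the order the directories are given."""
    found = []
    for directory in search_dirs:
        candidate = os.path.join(directory, dll_name)
        if os.path.isfile(candidate):
            found.append(candidate)
    return found


def path_directories():
    """Split the PATH environment variable into its non-empty directories."""
    return [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
```

Calling find_dll_copies("vcruntime140.dll", path_directories()) lists the copies reachable through the Path, first match first. Passing package folders under site-packages instead (for example, the pyarrow folder) can reveal bundled copies whose file versions are worth comparing against the one in System32.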