
feat: MVP - Better error handling for Nvidia driver and CUDA toolkit #2619

Closed
Tracked by #2165
Van-QA opened this issue Apr 4, 2024 · 3 comments
Assignees
Labels
needs designs Needs designs P1: important Important feature / fix type: feature request A new feature
Milestone

Comments

@Van-QA
Copy link
Contributor

Van-QA commented Apr 4, 2024

Problem
Users encounter difficulties when attempting to run models on the GPU due to a missing driver or missing dependencies. We need to help users resolve these issues from within the application, via an in-app download for the required DLLs and a guideline, reducing the frustration caused by missing dependencies.

Success Criteria for Nvidia Driver MVP

  1. Users encountering Nvidia driver issues should be provided with clear guidance on installation steps.
  2. Error messages related to missing Nvidia drivers should direct users to installation instructions.
  3. Handle the complexities around `nvidia-smi` commands and the export path for Nvidia drivers so the application runs smoothly.
  4. A restart of the Jan app is required; after that, users should be able to continue using the app.
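One way criteria 2 and 3 could be met is by probing `nvidia-smi` (which ships with the driver) and mapping the result to the sample copy below. This is only an illustrative sketch under that assumption, not Jan's actual implementation; all function names are hypothetical:

```typescript
import { execSync } from "child_process";

// Probe for a working Nvidia driver: `nvidia-smi` is installed alongside the
// driver, so a missing binary or non-zero exit suggests the driver is absent
// or broken.
export function isNvidiaDriverAvailable(): boolean {
  try {
    execSync("nvidia-smi -L", { stdio: "pipe" });
    return true;
  } catch {
    return false;
  }
}

// Pure mapping from driver availability to the user-facing error, kept
// separate so the copy can be unit-tested without real hardware.
export function nvidiaDriverError(available: boolean): string | undefined {
  if (available) return undefined;
  return (
    "Problem with Nvidia drivers. Please follow the 'Nvidia Drivers guideline' " +
    "to access installation instructions and ensure proper functioning of the application."
  );
}
```

Splitting the probe from the message keeps the guidance text testable even on machines without an Nvidia GPU.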

Success Criteria for CUDA toolkit MVP

  1. If the initial attempt fails due to a missing CUDA toolkit, users should be prompted to install it. Clear error messages related to the missing CUDA toolkit should redirect users to the settings for the download / installation process of the CUDA toolkit.
  2. Users should be able to download the binary file without encountering errors.
  3. A restart of the Jan app is required; after that, users should be able to continue using the app.

Sample copy
For CUDA Toolkit:

  • Error: The CUDA toolkit may be unavailable. Please use the 'Install Additional Dependencies' setting to proceed with the download / installation process.

  • Description (if any): Initiates the download and installation process for optional dependencies. Use this setting if you encounter errors related to CUDA toolkit during application execution.

For Nvidia Driver:

  • Error: Problem with Nvidia drivers. Please follow the 'Nvidia Drivers guideline' to access installation instructions and ensure proper functioning of the application.

  • Description (if any): Provides guidance on downloading and installing Nvidia drivers, which are crucial for optimal performance of the application via GPU. Use this document if you encounter issues related to missing or outdated Nvidia drivers during application execution.
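The sample copy above could live in code as a single table keyed by failure kind, so the same strings are reused by every surface that reports the error. A sketch; the key names and record shape are hypothetical:

```typescript
export type DependencyError = "cuda-toolkit-missing" | "nvidia-driver-missing";

// The sample copy above, as structured data.
export const ERROR_COPY: Record<DependencyError, { error: string; description: string }> = {
  "cuda-toolkit-missing": {
    error:
      "The CUDA toolkit may be unavailable. Please use the 'Install Additional " +
      "Dependencies' setting to proceed with the download / installation process.",
    description:
      "Initiates the download and installation process for optional dependencies. " +
      "Use this setting if you encounter errors related to CUDA toolkit during " +
      "application execution.",
  },
  "nvidia-driver-missing": {
    error:
      "Problem with Nvidia drivers. Please follow the 'Nvidia Drivers guideline' " +
      "to access installation instructions and ensure proper functioning of the application.",
    description:
      "Provides guidance on downloading and installing Nvidia drivers, which are " +
      "crucial for optimal performance of the application via GPU.",
  },
};
```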

@Van-QA Van-QA added the type: feature request A new feature label Apr 4, 2024
@Van-QA Van-QA changed the title feat: Better error handling for Nvidia driver and CUDA toolkit feat: Better error handling for Nvidia driver and CUDA toolkit via settings Apr 4, 2024
@Van-QA Van-QA added this to the v0.4.11 milestone Apr 4, 2024
@Van-QA Van-QA added the P1: important Important feature / fix label Apr 4, 2024
@Van-QA Van-QA changed the title feat: Better error handling for Nvidia driver and CUDA toolkit via settings feat: Better error handling for Nvidia driver and CUDA toolkit Apr 4, 2024
@imtuyethan imtuyethan self-assigned this Apr 4, 2024
@imtuyethan imtuyethan added the needs designs Needs designs label Apr 4, 2024
@Van-QA Van-QA changed the title feat: Better error handling for Nvidia driver and CUDA toolkit feat: MVP - Better error handling for Nvidia driver and CUDA toolkit Apr 4, 2024
@Van-QA Van-QA mentioned this issue Apr 5, 2024
10 tasks
@louis-jan
Copy link
Contributor

louis-jan commented Apr 11, 2024

Since we will be working on the Better GPU Onboarding epic, this issue is just a small part of error handling and has been scaled down for release 0.4.11.

I would prefer to focus solely on Model load error handling and provide a clear error message with guidance for users to install additional dependencies directly. This would eliminate the need for extra steps, such as Thread -> GPU Setting -> Extension Setting, which falls under the broader topic of GPU onboarding.

AC:

  • The user attempts to load a model and sends a message. If the model requires additional dependencies, such as the CUDA toolkit, they will receive an error message: "The CUDA toolkit may be unavailable. Please use the Install Additional Dependencies setting to proceed with the download/installation process."
  • When the user clicks on Install Additional Dependencies, they are directed to the Nitro Inference Extension settings.
  • The user clicks on Install and waits for the installation process to complete.
  • Finally, the user can resend the message, and it should be successful this time, as the required dependencies are now installed.
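The AC above amounts to: catch the model-load failure, attach an action that deep-links to the extension settings, and let the user retry once installation completes. A hypothetical sketch of that mapping (not Jan's real API; the settings route is a placeholder):

```typescript
export interface LoadFailureNotice {
  message: string;
  // Optional call-to-action rendered alongside the error.
  action?: { label: string; route: string };
}

// Map a model-load failure to the notice shown in the thread. The route is a
// stand-in for wherever the Nitro Inference Extension settings actually live.
export function noticeForLoadFailure(cudaToolkitMissing: boolean): LoadFailureNotice {
  if (cudaToolkitMissing) {
    return {
      message:
        "The CUDA toolkit may be unavailable. Please use the Install Additional " +
        "Dependencies setting to proceed with the download/installation process.",
      action: {
        label: "Install Additional Dependencies",
        route: "settings/extensions/nitro-inference",
      },
    };
  }
  return { message: "Failed to load the model. Please try again." };
}
```

After the install finishes and the app restarts, resending the message goes through the normal load path, so no special retry logic is needed beyond surfacing this notice.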

We have also scaled down the scope for the scenario where the Nvidia driver is missing: users will not be able to enable GPU acceleration without a properly installed Nvidia driver.

This behavior is consistent with our TensorRT-LLM support.

@imtuyethan
Copy link
Contributor


As it is urgent and important to cut the release, I'd go with your decision for now, since the approach I took isn't correct either (it needs a full epic, which I'm going to discuss with Daniel & Hiro this afternoon).

We need a bandage anyway, so let's go with whatever is easier for you.

@Van-QA Van-QA closed this as completed Apr 15, 2024