
feat: MVP - Better error handling for Nvidia driver and CUDA toolkit #2619

Closed
Tracked by #2165
Van-QA opened this issue Apr 4, 2024 · 3 comments
Assignees
Labels
needs designs Needs designs P1: important Important feature / fix type: feature request A new feature
Milestone

Comments

@Van-QA
Copy link
Contributor

Van-QA commented Apr 4, 2024

Problem
Users encounter difficulties when attempting to run models on the GPU due to a missing driver or missing dependencies. We need to help users resolve these issues from within the application, via an in-app download for the required DLLs and a guideline, reducing the frustration caused by missing dependencies.

Success Criteria for Nvidia Driver MVP

  1. Users encountering Nvidia driver issues should be provided with clear guidance on installation steps.
  2. Error messages related to missing Nvidia drivers should direct users to installation instructions.
  3. Handle the complexities around `nvidia-smi` commands and the export path for Nvidia drivers so the application runs smoothly.
  4. A restart of the Jan app is required; after that, users should be able to continue using the app.
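One way criteria 2 and 3 could be met is by probing `nvidia-smi` (which ships with the driver) and mapping the result to the sample copy below. This is only an illustrative sketch under that assumption, not Jan's actual implementation; all function names are hypothetical:

```typescript
import { execSync } from "child_process";

// Probe for a working Nvidia driver: `nvidia-smi` is installed alongside the
// driver, so a missing binary or non-zero exit suggests the driver is absent
// or broken.
export function isNvidiaDriverAvailable(): boolean {
  try {
    execSync("nvidia-smi -L", { stdio: "pipe" });
    return true;
  } catch {
    return false;
  }
}

// Pure mapping from driver availability to the user-facing error, kept
// separate so the copy can be unit-tested without real hardware.
export function nvidiaDriverError(available: boolean): string | undefined {
  if (available) return undefined;
  return (
    "Problem with Nvidia drivers. Please follow the 'Nvidia Drivers guideline' " +
    "to access installation instructions and ensure proper functioning of the application."
  );
}
```

Splitting the probe from the message keeps the guidance text testable even on machines without an Nvidia GPU.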

Success Criteria for CUDA toolkit MVP

  1. If the initial attempt fails due to a missing CUDA toolkit, users should be prompted to install it. Clear error messages related to the missing CUDA toolkit should redirect users to the settings for the download / installation process of the CUDA toolkit.
  2. Users should be able to download the binary file without encountering errors.
  3. A restart of the Jan app is required; after that, users should be able to continue using the app.

Sample copy
For CUDA Toolkit:

  • Error: The CUDA toolkit may be unavailable. Please use the 'Install Additional Dependencies' setting to proceed with the download / installation process.

  • Description (if any): Initiates the download and installation process for optional dependencies. Use this setting if you encounter errors related to CUDA toolkit during application execution.

For Nvidia Driver:

  • Error: Problem with Nvidia drivers. Please follow the 'Nvidia Drivers guideline' to access installation instructions and ensure proper functioning of the application.

  • Description (if any): Provides guidance on downloading and installing Nvidia drivers, which are crucial for optimal performance of the application via GPU. Use this document if you encounter issues related to missing or outdated Nvidia drivers during application execution.
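The sample copy above could live in code as a single table keyed by failure kind, so the same strings are reused by every surface that reports the error. A sketch; the key names and record shape are hypothetical:

```typescript
export type DependencyError = "cuda-toolkit-missing" | "nvidia-driver-missing";

// The sample copy above, as structured data.
export const ERROR_COPY: Record<DependencyError, { error: string; description: string }> = {
  "cuda-toolkit-missing": {
    error:
      "The CUDA toolkit may be unavailable. Please use the 'Install Additional " +
      "Dependencies' setting to proceed with the download / installation process.",
    description:
      "Initiates the download and installation process for optional dependencies. " +
      "Use this setting if you encounter errors related to CUDA toolkit during " +
      "application execution.",
  },
  "nvidia-driver-missing": {
    error:
      "Problem with Nvidia drivers. Please follow the 'Nvidia Drivers guideline' " +
      "to access installation instructions and ensure proper functioning of the application.",
    description:
      "Provides guidance on downloading and installing Nvidia drivers, which are " +
      "crucial for optimal performance of the application via GPU.",
  },
};
```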

@Van-QA Van-QA added the type: feature request A new feature label Apr 4, 2024
@Van-QA Van-QA changed the title feat: Better error handling for Nvidia driver and CUDA toolkit feat: Better error handling for Nvidia driver and CUDA toolkit via settings Apr 4, 2024
@Van-QA Van-QA added this to the v0.4.11 milestone Apr 4, 2024
@Van-QA Van-QA added the P1: important Important feature / fix label Apr 4, 2024
@Van-QA Van-QA changed the title feat: Better error handling for Nvidia driver and CUDA toolkit via settings feat: Better error handling for Nvidia driver and CUDA toolkit Apr 4, 2024
@imtuyethan imtuyethan self-assigned this Apr 4, 2024
@imtuyethan imtuyethan added the needs designs Needs designs label Apr 4, 2024
@Van-QA Van-QA changed the title feat: Better error handling for Nvidia driver and CUDA toolkit feat: MVP - Better error handling for Nvidia driver and CUDA toolkit Apr 4, 2024
@Van-QA Van-QA mentioned this issue Apr 5, 2024
10 tasks
@louis-jan
Copy link
Contributor

louis-jan commented Apr 11, 2024

Since we will be working on the Better GPU Onboarding epic, this issue is just a small part of error handling and has been scaled down for release 0.4.11.

I would prefer to focus solely on Model load error handling and provide a clear error message with guidance for users to install additional dependencies directly. This would eliminate the need for extra steps, such as Thread -> GPU Setting -> Extension Setting, which falls under the broader topic of GPU onboarding.

AC:

  • The user attempts to load a model and sends a message. If the model requires additional dependencies, such as the CUDA toolkit, they will receive an error message: "The CUDA toolkit may be unavailable. Please use the Install Additional Dependencies setting to proceed with the download/installation process."
  • When the user clicks on Install Additional Dependencies, they are directed to the Nitro Inference Extension settings.
  • The user clicks on Install and waits for the installation process to complete.
  • Finally, the user can resend the message, and it should be successful this time, as the required dependencies are now installed.
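The AC above amounts to: catch the model-load failure, attach an action that deep-links to the extension settings, and let the user retry once installation completes. A hypothetical sketch of that mapping (not Jan's real API; the settings route is a placeholder):

```typescript
export interface LoadFailureNotice {
  message: string;
  // Optional call-to-action rendered alongside the error.
  action?: { label: string; route: string };
}

// Map a model-load failure to the notice shown in the thread. The route is a
// stand-in for wherever the Nitro Inference Extension settings actually live.
export function noticeForLoadFailure(cudaToolkitMissing: boolean): LoadFailureNotice {
  if (cudaToolkitMissing) {
    return {
      message:
        "The CUDA toolkit may be unavailable. Please use the Install Additional " +
        "Dependencies setting to proceed with the download/installation process.",
      action: {
        label: "Install Additional Dependencies",
        route: "settings/extensions/nitro-inference",
      },
    };
  }
  return { message: "Failed to load the model. Please try again." };
}
```

After the install finishes and the app restarts, resending the message goes through the normal load path, so no special retry logic is needed beyond surfacing this notice.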

We have also scaled down the scope for the scenario where the Nvidia driver is missing: users will not be able to enable GPU acceleration without a properly installed Nvidia driver.

This behavior is consistent with our TensorRT-LLM support.

@imtuyethan
Copy link
Contributor


As it is urgent and important to cut the release, I'd go with your decision for now, since the approach I took isn't correct either (it needs a full epic, which I'm going to discuss with Daniel & Hiro this afternoon).

We need a bandage anyway, so let's go with whatever is easier for you.

@Van-QA Van-QA closed this as completed Apr 15, 2024