[backport v2.8.next1] Add warning banner to allocate the number of nodes + 1 vGPUs #11009

Open
github-actions bot opened this issue May 10, 2024 · 4 comments
Labels
area/harvester, kind/enhancement, QA/dev-automation (issues that engineers have written automation around so QA doesn't have to look at this), QA/None, release-note

Comments

@github-actions
Contributor

This is a backport issue for #10989, automatically created via GitHub Actions workflow initiated by @gaktive

Original issue body:

Setup
Rancher version: v2.8-head
Browser type & version: Chrome Version 124.0.6367.78
Harvester Version: v1.3.0

To Reproduce

  1. Set up multiple vGPU profiles in Harvester.
  2. Import Harvester into Rancher.
  3. Go to Virtualization Management -> Harvester UI for the cluster -> vGPU Devices and enable a vGPU with 2 allocatable.
  4. From Cluster Management, create a new 2-node RKE2 cluster with Harvester as the downstream provider. Under Advanced options, add the vGPU with 2 allocatable resources (the same number as the cluster nodes).
  5. After the cluster has been created, edit the cluster's config file.
  6. Observe the logs of the failed provisioning process.

Result
Once the Harvester cluster is redeployed for any reason (the user edits the config, the nodes go into an error state, etc.), the new VMs spin up before the old ones are completely shut down. This causes the "un-schedulable" error because the vGPUs held by the old VMs are not available yet.
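
To make the failure mode concrete, here is a minimal sketch of the resource arithmetic. It is a simplified model, not Harvester's scheduler: the function name `can_schedule_replacement` is hypothetical, and it assumes each VM holds exactly one vGPU and that one replacement VM starts while all old VMs are still running.

```python
def can_schedule_replacement(allocatable_vgpus: int, running_vms: int,
                             vgpus_per_vm: int = 1) -> bool:
    """Return True if one replacement VM can claim its vGPU(s) while
    every old VM still holds its own (hypothetical, simplified model)."""
    in_use = running_vms * vgpus_per_vm      # vGPUs still held by old VMs
    free = allocatable_vgpus - in_use        # vGPUs left for a new VM
    return free >= vgpus_per_vm


# Scenario from this report: 2 nodes and 2 allocatable vGPUs.
print(can_schedule_replacement(allocatable_vgpus=2, running_vms=2))  # False
# With one spare vGPU the replacement VM has something to claim.
print(can_schedule_replacement(allocatable_vgpus=3, running_vms=2))  # True
```

With 2 allocatable vGPUs and 2 running VMs, nothing is free until an old VM fully shuts down, which matches the behavior described above.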

Expected Result
We could add a warning banner in the UI recommending that the user provision N+1 allocatable vGPUs, where N is the number of nodes.
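
As a rough illustration of the check such a banner could rely on, the sketch below returns a message when fewer than N+1 vGPUs are allocatable. It is written in Python for brevity and is not the Rancher dashboard code; the function name `vgpu_allocation_warning` and the message wording are assumptions made for this example.

```python
from typing import Optional


def vgpu_allocation_warning(node_count: int, allocatable_vgpus: int) -> Optional[str]:
    """Return a warning string if fewer than N+1 vGPUs are allocatable,
    where N is the number of nodes; return None otherwise.
    (Hypothetical helper; the message text is illustrative only.)"""
    recommended = node_count + 1
    if allocatable_vgpus >= recommended:
        return None
    return (
        f"This cluster has {node_count} node(s) but only {allocatable_vgpus} "
        f"allocatable vGPU(s). Provision at least {recommended} so that "
        f"replacement VMs can be scheduled during a redeploy."
    )


# The 2-node cluster from the reproduction steps, with 2 and then 3 vGPUs.
print(vgpu_allocation_warning(node_count=2, allocatable_vgpus=2))
print(vgpu_allocation_warning(node_count=2, allocatable_vgpus=3))  # None
```

Returning `None` when the recommendation is met lets the caller decide whether to render a banner at all.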

@github-actions github-actions bot added area/harvester kind/enhancement QA/dev-automation Issues that engineers have written automation around so QA doesn't have look at this QA/None release-note labels May 10, 2024
@github-actions github-actions bot added this to the v2.8.next2 milestone May 10, 2024
@nwmac
Member

nwmac commented Jul 2, 2024

@torchiaf based on this comment #11017 (comment) - I assume the same is true here - and that we should push this to 2.8.next2 - can you confirm?

@torchiaf
Member

torchiaf commented Jul 2, 2024

> @torchiaf based on this comment #11017 (comment) - I assume the same is true here - and that we should push this to 2.8.next2 - can you confirm?

@nwmac confirmed.

@nwmac
Member

nwmac commented Jul 2, 2024

FYI @gaktive

@gaktive
Member

gaktive commented Aug 1, 2024

This may be unblocked by Harvester now. @torchiaf check if that's the case since we need to include the warning in this release too. @ibrokethecloud can get an env up upon request to help.

@torchiaf torchiaf changed the title from "[backport v2.8.next2] Add warning banner to allocate the number of nodes + 1 vGPUs" to "[backport v2.8.next1] Add warning banner to allocate the number of nodes + 1 vGPUs" Aug 7, 2024

3 participants