[Cudo] Update and bugfixes #3256

Merged
merged 9 commits into from
Jul 1, 2024

JungleCatSW
Contributor

Adds checks to prevent VMs from hanging in the pending state, and updates the VM base images to a newer CUDA version.
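
For readers unfamiliar with the failure mode, here is a minimal sketch of the general idea behind such a check: poll the VM status with a deadline and raise instead of waiting forever. It is not the code in this PR. The function name `wait_until_running`, the `get_status` callable, and the status strings `'running'`, `'failed'`, and `'deleted'` are hypothetical and do not reflect the Cudo API.

```python
import time
from typing import Callable


def wait_until_running(get_status: Callable[[], str],
                       timeout_s: float = 300.0,
                       poll_interval_s: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll a VM status until it is running, failing instead of hanging.

    `get_status` is a hypothetical callable that returns the current VM
    state as a string; the state names below are assumptions.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == 'running':
            return
        if status in ('failed', 'deleted'):
            # Terminal states: waiting longer cannot help.
            raise RuntimeError(f'VM entered terminal state: {status}')
        if time.monotonic() >= deadline:
            # Still pending (or similar) after the deadline: give up.
            raise TimeoutError(f'VM still {status!r} after {timeout_s}s')
        sleep(poll_interval_s)
```

Passing `sleep` as a parameter keeps the loop easy to exercise in tests without real delays.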

Tested (run the relevant ones):

  • [x] Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
  • All smoke tests: pytest tests/test_smoke.py
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
  • Backward compatibility tests: bash tests/backward_comaptibility_tests.sh

Collaborator

@Michaelvll Michaelvll left a comment

Thank you for updating the Cudo support @JungleCatSW! Left two comments :)

sky/provision/cudo/instance.py
sky/provision/cudo/cudo_wrapper.py
Collaborator

@Michaelvll Michaelvll left a comment

Thanks for the update @JungleCatSW! Just left a comment.

sky/provision/cudo/cudo_wrapper.py
Collaborator

@Michaelvll Michaelvll left a comment

Thanks for the change @JungleCatSW! Left a comment for a potential remnant. : )

sky/clouds/service_catalog/data_fetchers/fetch_cudo.py
Collaborator

@Michaelvll Michaelvll left a comment

Thanks for the fix @JungleCatSW! It works for me. Merging.

Note: not sure why it happens, but while launching a V100 machine on Cudo Compute, SkyPilot fails to pip install its dependency ray[default]==2.9.3, because the package takes forever to download.

@Michaelvll
Collaborator

Just tested it with sky launch --gpus V100 --cloud cudo nvidia-smi; sky autostop -i 0 --down; sky status -r, and it works perfectly. Merging now.

@Michaelvll Michaelvll merged commit 0a4b0ef into skypilot-org:master Jul 1, 2024
20 checks passed
Michaelvll pushed a commit that referenced this pull request Aug 23, 2024
* bug fixes and improvements

* moved shared function to helper, added error message

* moved catalog helper to utils

* small fixes

* fetch cudo fix

* id fix for vms.csv file

* format fix