[Storage] Add storage translation for newly created buckets #3671
Conversation
Thanks @romilbhardwaj for catching this! I found that it might make more sense to do the translation locally, instead of skipping the check on the controller. See comments below. Wdyt?
sky/data/storage.py (outdated)
@@ -325,6 +326,13 @@ def __deepcopy__(self, memo):

    def _validate_existing_bucket(self):
        """Validates the storage fields for existing buckets."""
        if controller_utils.is_running_on_jobs_controller():
Instead of skipping the check on the controller, we may want to translate those storage entries into external ones in controller_utils.maybe_translate_local_file_mounts_and_sync_up before submitting to the controller. Reason: if the storage entry does not contain a store field, it might be created on one cloud provider locally, but the controller may create another one on a different cloud provider:
file_mounts:
  /output_path:
    name: my-bucket
    mode: MOUNT
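A minimal sketch of this direction, assuming a helper that rewrites a bucket-creating entry into an external one after the bucket has been created locally (the function name, dict shapes, and URL prefixes are illustrative, not SkyPilot's actual API):

# Hypothetical helper, for illustration only.
def translate_new_bucket_to_external(storage_cfg: dict, created_store: str) -> dict:
    """Rewrite a bucket-creating storage entry so the controller mounts the
    already-created bucket instead of creating a new one on another cloud."""
    prefixes = {'s3': 's3://', 'gcs': 'gs://'}
    bucket_name = storage_cfg['name']
    return {
        # Pointing source at the existing bucket makes the controller treat
        # it as external storage rather than something it must create.
        'source': prefixes[created_store] + bucket_name,
        'mode': 'MOUNT',
        'persistent': storage_cfg.get('persistent', True),
    }

Doing this inside maybe_translate_local_file_mounts_and_sync_up keeps the controller-side validation intact while guaranteeing the job writes to the bucket that was actually created before submission.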
Thanks @Michaelvll - updated the implementation to translate storage entries to externally sourced storages. Updated PR title and description too. Running tests.
Thanks for the update @romilbhardwaj! The PR looks mostly good to me. : )
    'persistent': storage_obj.persistent,
    'mode': storage_lib.StorageMode.MOUNT.value,
})
updated_mount_storages[storage_path] = new_storage
Do we need to set force_delete to True, so that when persistent=False, the jobs controller can delete the storage, even if it is considered as external storage on the controller?
good point! we should add force_delete. updated the PR
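A sketch of what that follow-up could look like, building on the diff above; treating force_delete as an attribute set on the constructed Storage object is an assumption here, not necessarily the exact mechanism in the merged change:

new_storage = storage_lib.Storage.from_yaml_config({
    # source pointing at the already-created bucket is what makes this an
    # externally sourced storage on the controller (URL shape assumed).
    'source': 's3://my-bucket',
    'persistent': storage_obj.persistent,
    'mode': storage_lib.StorageMode.MOUNT.value,
})
# Let the controller delete the bucket when persistent=False, even though
# it is considered external storage there.
new_storage.force_delete = True
updated_mount_storages[storage_path] = new_storage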
Thanks @Michaelvll! Tested with:
Thanks for the update @romilbhardwaj! Just tested it a bit and it works great. LGTM.
Co-authored-by: Zhanghao Wu <[email protected]>
Closes #3666.
(Earlier approach, superseded during review: skip the storage name validation on the controller, since it is already done when the job is submitted from the client and the guardrails are not required there.) We now translate storage objects that create new buckets into externally sourced buckets.
E.g., a file_mounts entry that would create a new bucket is translated into an externally sourced one.
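An illustrative before/after, using the my-bucket example from earlier in the thread and assuming an s3:// store URL (the exact snippets in the PR description may differ):

file_mounts:
  /output_path:
    name: my-bucket   # no source: SkyPilot would create this bucket
    mode: MOUNT

is translated to roughly:

file_mounts:
  /output_path:
    source: s3://my-bucket   # assumed URL of the bucket created before submission
    mode: MOUNT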
Also updated smoke tests to test for writing to an object store.
Tested (run the relevant ones):
pytest -v tests/test_smoke.py::test_managed_jobs_storage --kubernetes
pytest -v tests/test_smoke.py::test_managed_jobs_storage --aws
pytest -v tests/test_smoke.py::test_managed_jobs_storage --gcp
pytest tests/test_smoke.py::TestStorageWithCredentials