Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core] Add option to avoid uploading credentials and custom labels for GCP #2904

Merged
merged 17 commits into from
Feb 12, 2024

Conversation

Michaelvll
Copy link
Collaborator

@Michaelvll Michaelvll commented Dec 27, 2023

This is to avoid uploading credentials of the current cloud to be launched, as we can use the service account assigned to the instance to access the buckets and cloud APIs.

Fixes #2820

To enable this feature, a user should specify the following config in the ~/.sky/config.yaml, and this only supports AWS and GCP for now.

aws:
  remote_identity: SERVICE_ACCOUNT

gcp:
  remote_identity: SERVICE_ACCOUNT

Tested (run the relevant ones):

  • Code formatting: bash format.sh
  • Any manual or new tests for this PR (please specify below)
    • sky launch --cloud gcp --cpus 2 -i 0 echo hi
    • sky launch --cloud gcp --cpus 2 -i 0 echo hi with the following config.yaml
      gcp:
        remote_identity: SERVICE_ACCOUNT
        instance_tags:
          owner: zhwu
    • sky launch --cloud aws --cpus 2 -i 0 echo hi with the following config.yaml
      aws:
        remote_identity: SERVICE_ACCOUNT
        instance_tags:
          owner: zhwu
    • sky launch --cloud gcp --cpus 2 -i 0 --num-nodes 2 echo hi
  • All smoke tests: pytest tests/test_smoke.py --gcp
  • All smoke tests: pytest tests/test_smoke.py --aws
  • Relevant individual smoke tests: pytest tests/test_smoke.py::test_fill_in_the_name
    • pytest tests/test_smoke.py::test_file_mounts --aws with two private buckets (1 aws, 1 gcp) enabled in using_file_mounts.yaml.
    • pytest tests/test_smoke.py::test_file_mounts --gcp with two private buckets (1 aws, 1 gcp) enabled in using_file_mounts.yaml.
  • Backward compatibility tests: bash tests/backward_comaptibility_tests.sh

@Michaelvll Michaelvll changed the title [Core] Avoid uploading credentials of the current cloud [Core] Add option to avoid uploading credentials and custom labels for GCP Feb 8, 2024
@Michaelvll Michaelvll marked this pull request as ready for review February 8, 2024 22:58
Copy link
Member

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Michaelvll, did a pass and some questions.

docs/source/reference/config.rst Outdated Show resolved Hide resolved
docs/source/reference/config.rst Outdated Show resolved Hide resolved
docs/source/reference/config.rst Outdated Show resolved Hide resolved
docs/source/reference/config.rst Outdated Show resolved Hide resolved
sky/templates/gcp-ray.yml.j2 Show resolved Hide resolved
sky/backends/backend_utils.py Outdated Show resolved Hide resolved
sky/backends/backend_utils.py Outdated Show resolved Hide resolved
f'{skypilot_config.loaded_config_path!r} for {cloud}, but it '
'not supported this cloud, please use \'USER\' instead.')
excluded_clouds = [cloud]
credentials = sky_check.get_cloud_credential_file_mounts(excluded_clouds)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's always a singleton, any reason to make the arg a list?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was for the generality of the function, as in the future it might be able to allow the remote instance to not have multiple clouds' credential, e.g., if the image has the cloud credential set already. I am ok to make it singleton first, but I guess it does not matter that much to have an argument as a list. : )

sky/exceptions.py Outdated Show resolved Hide resolved
sky/exceptions.py Outdated Show resolved Hide resolved
@concretevitamin
Copy link
Member

Actually a question: Do we want users to toggle this setting vs. always defaulting to use a service account for clouds that support it?

@Michaelvll
Copy link
Collaborator Author

Michaelvll commented Feb 11, 2024

Actually a question: Do we want users to toggle this setting vs. always defaulting to use a service account for clouds that support it?

I think it might be better to ask the user to specify the config explicitly to turn this on. Reasons:

  1. If SERVICE_ACCOUNTS is turned on by default, any managed spot jobs/service replica launched on the clouds different from the controller will not be able to access the resources (e.g., the buckets) on the cloud as the controller. (This might be possible to solve by having some special handling for the controller in our code to still upload those credentials to the controllers)
  2. The service accounts can have different permissions than the user account, it might be quite suprising if users can access a bucket locally while the remote instance fails to access it.
  3. Considering this feature is requested by some of our users, it might be easier and more convenient for most of the others to have the same permission on the remote instances created.

Wdyt?

@concretevitamin
Copy link
Member

concretevitamin commented Feb 11, 2024

For (2) & (3): Is it true that our main branch currently uses service accounts for AWS/GCP remote nodes? IIRC service accounts take precedence over the credential files present on those nodes in identity discovery.

If this is true and the permission issue hasn't caused any troubles, maybe it's fine. #2820 mentioned some permission mismatch which I don't fully understand.

For (1): I think there are two cases

  • single-cloud users
  • cloud X controller (spot or serve) to launch clusters on cloud Y

Since former is more common, we probably should minimize their config overhead and default to the no-upload behavior if we can make it work. Once the user wants the latter behavior, they should be able to set this flag and have the (maybe an existing) controller use the upload behavior for new jobs.


Relatedly we need to test the AWS_PROFILE=xx sky launch case in #2820.

Copy link
Member

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, some minor nits.

docs/source/reference/config.rst Outdated Show resolved Hide resolved
sky/backends/backend_utils.py Outdated Show resolved Hide resolved
sky/backends/cloud_vm_ray_backend.py Outdated Show resolved Hide resolved
sky/backends/cloud_vm_ray_backend.py Outdated Show resolved Hide resolved
sky/backends/cloud_vm_ray_backend.py Outdated Show resolved Hide resolved
sky/backends/cloud_vm_ray_backend.py Outdated Show resolved Hide resolved
sky/provision/slurm/__init__.py Outdated Show resolved Hide resolved
raise NotImplementedError

def in_cloud_list(self, cloud_list: Iterable['Cloud']) -> bool:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed the boundary is subjective. Here I think we want to have a minimal cognitive load for readers that browse and learn about the interface. If that reader is a new cloud implementor or someone who wants to know "what does it take to implement a Cloud", I think it's best to leave out such one-liner utility funcs.

sky/utils/schemas.py Outdated Show resolved Hide resolved
Copy link
Member

@concretevitamin concretevitamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @Michaelvll!

docs/source/reference/config.rst Outdated Show resolved Hide resolved
docs/source/reference/config.rst Outdated Show resolved Hide resolved
@@ -40,4 +41,6 @@
'CLOUD_REGISTRY',
'ProvisionerVersion',
'StatusVersion',
# Utility functions
'cloud_in_list',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(In the future we can put utils in this init module too)

@Michaelvll Michaelvll merged commit 2f0432b into master Feb 12, 2024
19 checks passed
@Michaelvll Michaelvll deleted the avoid-upload-credential branch February 12, 2024 23:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Skypilot copies .aws/credentials
2 participants