From 6a13b094585bd85674825088779077dc1d03104c Mon Sep 17 00:00:00 2001
From: shrekris-anyscale <92341594+shrekris-anyscale@users.noreply.github.com>
Date: Wed, 24 May 2023 12:50:37 -0700
Subject: [PATCH] [Docs] [runtime_env] Add instructions on using `.netrc` for remote URIs (#35578)

Users can provide dependencies via a remote URI in their runtime_env. To access private dependencies, users must include authentication information with their request. Commonly, this is done by including credentials in the URI itself. However, this pattern can be insecure since Ray may log the URI or use it to name temporary directories. Instead, users should supply their credentials using a .netrc file.

This change adds documentation explaining how to use a .netrc file on VMs or KubeRay.

Thanks to @Xalag and @Martin4R for the discussion in #28253. Some of the examples have been adapted from that issue.

netrc documentation link: https://anyscale-ray--35578.com.readthedocs.build/en/35578/ray-core/runtime_env_auth.html
runtime_env URL templates link: https://anyscale-ray--35578.com.readthedocs.build/en/35578/ray-core/handling-dependencies.html#option-2-manually-create-url-slower-to-implement-but-recommended-for-production-environments

Related issue number

See #28253
---
 doc/source/ray-core/handling-dependencies.rst |   2 +-
 doc/source/ray-core/runtime_env_auth.md       | 122 ++++++++++++++++++
 2 files changed, 123 insertions(+), 1 deletion(-)
 create mode 100644 doc/source/ray-core/runtime_env_auth.md

diff --git a/doc/source/ray-core/handling-dependencies.rst b/doc/source/ray-core/handling-dependencies.rst
index c9ddd015d0054..26ca1cde706ff 100644
--- a/doc/source/ray-core/handling-dependencies.rst
+++ b/doc/source/ray-core/handling-dependencies.rst
@@ -683,7 +683,7 @@ Here is a list of different use cases and corresponding URLs:
     runtime_env = {"working_dir": ("https://github.com"
                                    "/[username]/[repository]/archive/[commit hash].zip")}
 
-- Example: Retrieve package from a private GitHub repository using a Personal Access Token
+- Example: Retrieve package from a private GitHub repository using a Personal Access Token **during development**. **For production** see :ref:`this document <runtime-env-auth>` to learn how to authenticate private dependencies safely.
 
 .. testcode::
 
diff --git a/doc/source/ray-core/runtime_env_auth.md b/doc/source/ray-core/runtime_env_auth.md
new file mode 100644
index 0000000000000..93f306727e736
--- /dev/null
+++ b/doc/source/ray-core/runtime_env_auth.md
@@ -0,0 +1,122 @@
+(runtime-env-auth)=
+# Authenticating Remote URIs in runtime_env
+
+This section helps you:
+
+* Avoid leaking remote URI credentials in your `runtime_env`
+* Provide credentials safely in KubeRay
+* Understand best practices for authenticating your remote URI
+
+## Authenticating Remote URIs
+
+You can add dependencies to your `runtime_env` with [remote URIs](remote-uris). This is straightforward for files hosted publicly, because you simply paste the public URI into your `runtime_env`:
+
+```python
+runtime_env = {"working_dir": (
+        "https://github.com/"
+        "username/repo/archive/refs/heads/master.zip"
+    )
+}
+```
+
+However, dependencies hosted privately, in a private GitHub repo for example, require authentication. One common way to authenticate is to insert credentials into the URI itself:
+
+```python
+runtime_env = {"working_dir": (
+        "https://username:personal_access_token@github.com/"
+        "username/repo/archive/refs/heads/master.zip"
+    )
+}
+```
+
+In this example, `personal_access_token` is a secret credential that authenticates this URI. While Ray can successfully access your dependencies using authenticated URIs, **you should not include secret credentials in your URIs** for two reasons:
+
+1. Ray may log the URIs used in your `runtime_env`, which means the Ray logs could contain your credentials.
+2. Ray stores your remote dependency package in a local directory, and it uses a parsed version of the remote URI–including your credential–as the directory's name.
+
+In short, your remote URI is not treated as a secret, so it should not contain secret info. Instead, use a `netrc` file.
+
+## Running on VMs: the netrc File
+
+The [netrc file](https://www.gnu.org/software/inetutils/manual/html_node/The-_002enetrc-file.html) contains credentials that Ray uses to automatically log into remote servers. Set your credentials in this file instead of in the remote URI:
+
+```bash
+# "$HOME/.netrc"
+
+machine github.com
+login username
+password personal_access_token
+```
+
+In this example, the `machine github.com` line specifies that any access to `github.com` should be authenticated using the provided `login` and `password`.
+
+:::{note}
+On Unix, name the `netrc` file as `.netrc`. On Windows, name the
+file as `_netrc`.
+:::
+
+The `netrc` file requires owner read/write access, so make sure to run the `chmod` command after creating the file:
+
+```bash
+chmod 600 "$HOME/.netrc"
+```
+
+Add the `netrc` file to your VM container's home directory, so Ray can access the `runtime_env`'s private remote URIs, even when they don't contain credentials.
+
+## Running on KubeRay: Secrets with netrc
+
+[KubeRay](https://ray-project.github.io/kuberay/) can also obtain credentials from a `netrc` file for remote URIs. Supply your `netrc` file using a Kubernetes secret and a Kubernetes volume with these steps:
+
+1\. Launch your Kubernetes cluster.
+
+2\. Create the `netrc` file locally in your home directory.
+
+3\. Store the `netrc` file's contents as a Kubernetes secret on your cluster:
+
+```bash
+kubectl create secret generic netrc-secret --from-file=.netrc="$HOME/.netrc"
+```
+
+4\. Expose the secret to your KubeRay application using a mounted volume, and update the `NETRC` environment variable to point to the `netrc` file. Include the following YAML in your KubeRay config.
+
+```yaml
+headGroupSpec:
+    ...
+    containers:
+        - name: ...
+          image: rayproject/ray:latest
+          ...
+          volumeMounts:
+            - mountPath: "/home/ray/netrcvolume/"
+              name: netrc-kuberay
+              readOnly: true
+          env:
+            - name: NETRC
+              value: "/home/ray/netrcvolume/.netrc"
+    volumes:
+        - name: netrc-kuberay
+          secret:
+            secretName: netrc-secret
+
+workerGroupSpecs:
+    ...
+    containers:
+        - name: ...
+          image: rayproject/ray:latest
+          ...
+          volumeMounts:
+            - mountPath: "/home/ray/netrcvolume/"
+              name: netrc-kuberay
+              readOnly: true
+          env:
+            - name: NETRC
+              value: "/home/ray/netrcvolume/.netrc"
+    volumes:
+        - name: netrc-kuberay
+          secret:
+            secretName: netrc-secret
+```
+
+5\. Apply your KubeRay config.
+
+Your KubeRay application can use the `netrc` file to access private remote URIs, even when they don't contain credentials.
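
Reviewer note (not part of the patch): the `netrc` format and the `chmod 600` requirement described in the new page can be sanity-checked with a short Python sketch. It uses only the standard library's `netrc` module to parse the file; this is an illustration and is not necessarily how Ray itself reads credentials. The helper names `write_netrc` and `lookup` are hypothetical, and the host, login, and token are the placeholder values from the documentation example.

```python
import netrc
import os
import stat
import tempfile


def write_netrc(path, machine, login, password):
    """Write a single-entry netrc file with owner-only read/write access."""
    # Create the file with mode 0o600 so new files are never world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(f"machine {machine}\nlogin {login}\npassword {password}\n")
    # Same effect as `chmod 600`; also tightens a file that already existed.
    os.chmod(path, 0o600)


def lookup(path, machine):
    """Return (login, password) for `machine`, or None if no entry matches."""
    entry = netrc.netrc(path).authenticators(machine)
    if entry is None:
        return None
    login, _account, password = entry
    return login, password


if __name__ == "__main__":
    # Use a temporary directory so the sketch never touches a real ~/.netrc.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, ".netrc")
        write_netrc(path, "github.com", "username", "personal_access_token")
        print(oct(stat.S_IMODE(os.stat(path).st_mode)))
        print(lookup(path, "github.com"))
```

Running a check like this against the real file before creating the Kubernetes secret can catch a malformed entry or overly permissive mode early, without putting the token in any URI.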