Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core] Add TLS/SSL support to gRPC channels #18631

Merged
merged 80 commits into from
Oct 21, 2021
Merged

Conversation

oscarknagg
Copy link
Contributor

@oscarknagg oscarknagg commented Sep 15, 2021

Why are these changes needed?

Ray currently has limited authentication and security capabilities. Some organisations need these features for compliance reasons in order to use Ray. Also Dask, a similar library to Ray, has mTLS support already so this PR would give feature parity to Ray.

This PR adds optional TLS support to all gRPC channels in Ray. TLS is off by default bit if TLS support is enabled then users will need a correct set of SSL credentials in order to connect to the Ray client server. Also, gRPC communications between workers will be encrypted.

Performance impact

Adding mTLS does have a performance impact. The plot below shows the ratio of execution times between Ray with/without mTLS for some Dask-on-Ray operations on random DataFrames. The performance impact is negligible for large data presumably because TLS handshakes between workers become a small fraction of execution time. For small data there is a significant performance hit although I believe many organisations would accept this tradeoff. This performance hit could be reduced by using connection pooling/session tickets as optimisations later down the line.

image

Request for feedback

At the moment TLS support is toggled on/off with a set of environment variables. This is convenient for me and is already widely used in Ray Tune however this doesn't seem to be done much in Ray Core. Should I implement new command line/ ray.init() arguments for TLS or are environment variables acceptable? If we go down the path of new CLI args what form should they take? (I would suggest following Dask's example as its fairly simple)

I've added a new test module to test that only connections from authenticated clients are allowed when TLS is enabled. I've also changed some of the fixtures in 'test_basic.py` to also run some tests with TLS enabled to check that basic functionality still works with TLS. However, Dask has a separate test file to check that basic operations work with TLS. I think my way is better because it reduces duplication - do you agree or do you think it's better to have a separate test file for this?

Related issue number

Closes #17290

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

src/ray/rpc/grpc_server.h Outdated Show resolved Hide resolved
Copy link
Contributor

@ericl ericl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just one suggestion on the config definition.

@ericl
Copy link
Contributor

ericl commented Oct 19, 2021

Few more conflicts.

@ericl ericl removed the @external-author-action-required Alternate tag for PRs where the author doesn't have labeling permission. label Oct 19, 2021
@ericl
Copy link
Contributor

ericl commented Oct 20, 2021

Can you pull in master again? I think the build might have been broken at your current master commit.

@ericl ericl added the @external-author-action-required Alternate tag for PRs where the author doesn't have labeling permission. label Oct 20, 2021
@oscarknagg
Copy link
Contributor Author

I've pulled in master again, hopefully this one looks good.

@ericl
Copy link
Contributor

ericl commented Oct 20, 2021

Everything looks good except Windows build:

//python/ray/tests:test_tls_auth FAILED in 3 out of 3 in 36.9s
Stats over 3 runs: max = 36.9s, min = 35.5s, avg = 36.4s, dev = 0.6s
C:/tmp/vlncal46/execroot/com_github_ray_project_ray/bazel-out/x64_windows-opt/testlogs/python/ray/tests/test_tls_auth/test.log
C:/tmp/vlncal46/execroot/com_github_ray_project_ray/bazel-out/x64_windows-opt/testlogs/python/ray/tests/test_tls_auth/test_attempts/attempt_1.log
C:/tmp/vlncal46/execroot/com_github_ray_project_ray/bazel-out/x64_windows-opt/testlogs/python/ray/tests/test_tls_auth/test_attempts/attempt_2.log

@ericl ericl removed the @external-author-action-required Alternate tag for PRs where the author doesn't have labeling permission. label Oct 21, 2021
@ericl ericl merged commit 5a05e89 into ray-project:master Oct 21, 2021
@zhe-thoughts
Copy link
Collaborator

This is very cool contribution. Thanks @oscarknagg (and @ericl for review!)

@oscarknagg
Copy link
Contributor Author

Thanks @zhe-thoughts. Would the project be open to another contribution that adds optional TLS to Ray's Redis communication? (https://redis.io/topics/encryption)

@zhe-thoughts
Copy link
Collaborator

zhe-thoughts commented Oct 21, 2021

@oscarknagg : we are working on removing Redis as a dependency actually: https://github.com/ray-project/ray/milestone/65

cc @mwtian

@mwtian
Copy link
Member

mwtian commented Oct 21, 2021

@oscarknagg @zhe-thoughts Great contribution! Yes, the plan is to make Ray not shipping Redis by default, targeting EOY.

@oscarknagg
Copy link
Contributor Author

@mwtian That's really cool. I assume this means that the pickled code for remote functions/actors will also be sent using gRPC via the internal KV store?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Authentication and encryption in Ray - SSL support
7 participants