-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core] Add TLS/SSL support to gRPC channels #18631
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just one suggestion on the config definition.
…nel in recent code
Few more conflicts. |
Can you pull in master again? I think the build might have been broken at your current master commit. |
I've pulled in master again, hopefully this one looks good. |
Everything looks good except Windows build: //python/ray/tests:test_tls_auth FAILED in 3 out of 3 in 36.9s |
This is very cool contribution. Thanks @oscarknagg (and @ericl for review!) |
Thanks @zhe-thoughts. Would the project be open to another contribution that adds optional TLS to Ray's Redis communication? (https://redis.io/topics/encryption) |
@oscarknagg : we are working on removing Redis as a dependency actually: https://github.com/ray-project/ray/milestone/65 cc @mwtian |
@oscarknagg @zhe-thoughts Great contribution! Yes, the plan is to make Ray not shipping Redis by default, targeting EOY. |
@mwtian That's really cool. I assume this means that the pickled code for remote functions/actors will also be sent using gRPC via the internal KV store? |
Why are these changes needed?
Ray currently has limited authentication and security capabilities. Some organisations need these features for compliance reasons in order to use Ray. Also Dask, a similar library to Ray, has mTLS support already so this PR would give feature parity to Ray.
This PR adds optional TLS support to all gRPC channels in Ray. TLS is off by default bit if TLS support is enabled then users will need a correct set of SSL credentials in order to connect to the Ray client server. Also, gRPC communications between workers will be encrypted.
Performance impact
Adding mTLS does have a performance impact. The plot below shows the ratio of execution times between Ray with/without mTLS for some Dask-on-Ray operations on random DataFrames. The performance impact is negligible for large data presumably because TLS handshakes between workers become a small fraction of execution time. For small data there is a significant performance hit although I believe many organisations would accept this tradeoff. This performance hit could be reduced by using connection pooling/session tickets as optimisations later down the line.
Request for feedback
At the moment TLS support is toggled on/off with a set of environment variables. This is convenient for me and is already widely used in Ray Tune however this doesn't seem to be done much in Ray Core. Should I implement new command line/ ray.init() arguments for TLS or are environment variables acceptable? If we go down the path of new CLI args what form should they take? (I would suggest following Dask's example as its fairly simple)
I've added a new test module to test that only connections from authenticated clients are allowed when TLS is enabled. I've also changed some of the fixtures in 'test_basic.py` to also run some tests with TLS enabled to check that basic functionality still works with TLS. However, Dask has a separate test file to check that basic operations work with TLS. I think my way is better because it reduces duplication - do you agree or do you think it's better to have a separate test file for this?
Related issue number
Closes #17290
Checks
scripts/format.sh
to lint the changes in this PR.