[Core] Add TLS/SSL support to gRPC channels #18631

oscarknagg · 2021-09-15T12:57:05Z

Why are these changes needed?

Ray currently has limited authentication and security capabilities. Some organisations need these features for compliance reasons in order to use Ray. Dask, a library similar to Ray, already supports mTLS, so this PR would bring Ray to feature parity.

This PR adds optional TLS support to all gRPC channels in Ray. TLS is off by default, but if it is enabled, users will need a correct set of SSL credentials in order to connect to the Ray client server. gRPC communication between workers will also be encrypted.
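A minimal sketch of what optional mTLS on a gRPC channel can look like from Python using the public grpcio API. The helper names (`MtlsMaterial`, `read_mtls_material`, `open_channel`) are hypothetical and are not the functions added in this PR; only `grpc.ssl_channel_credentials`, `grpc.secure_channel` and `grpc.insecure_channel` are actual grpcio calls.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class MtlsMaterial:
    """PEM-encoded bytes needed for a mutually authenticated channel."""

    ca_cert: bytes
    cert_chain: bytes
    private_key: bytes


def read_mtls_material(ca_path: str, cert_path: str, key_path: str) -> MtlsMaterial:
    """Read the CA certificate, own certificate and private key from disk."""
    return MtlsMaterial(
        ca_cert=Path(ca_path).read_bytes(),
        cert_chain=Path(cert_path).read_bytes(),
        private_key=Path(key_path).read_bytes(),
    )


def open_channel(address: str, material: Optional[MtlsMaterial] = None):
    """Return a secure channel if TLS material is given, else an insecure one."""
    # Imported here so the file-reading helper above works without grpcio installed.
    import grpc

    if material is None:
        return grpc.insecure_channel(address)
    credentials = grpc.ssl_channel_credentials(
        root_certificates=material.ca_cert,
        private_key=material.private_key,
        certificate_chain=material.cert_chain,
    )
    return grpc.secure_channel(address, credentials)
```

Passing a private key and certificate chain on the client side is what makes this mutual TLS: the server can then verify the client as well as the other way round.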

Performance impact

Adding mTLS does have a performance impact. The plot below shows the ratio of execution times between Ray with/without mTLS for some Dask-on-Ray operations on random DataFrames. The performance impact is negligible for large data, presumably because TLS handshakes between workers become a small fraction of execution time. For small data there is a significant performance hit, although I believe many organisations would accept this tradeoff. This hit could be reduced later by optimisations such as connection pooling or session tickets.

Request for feedback

At the moment TLS support is toggled on/off with a set of environment variables. This is convenient for me and is already widely used in Ray Tune; however, it doesn't seem to be done much in Ray Core. Should I implement new command-line/ray.init() arguments for TLS, or are environment variables acceptable? If we go down the path of new CLI args, what form should they take? (I would suggest following Dask's example, as it's fairly simple.)
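For illustration, an environment-variable toggle of this kind could be read as in the sketch below. `RAY_USE_TLS` is named in this PR's commit list; the three path variable names, the accepted flag values and the helper `tls_paths_from_env` are assumptions made for the example, not a description of the merged code.

```python
import os
from typing import Mapping, NamedTuple, Optional


class TlsPaths(NamedTuple):
    ca_cert: str
    server_cert: str
    server_key: str


# RAY_USE_TLS appears in this PR's commits; the path variable names
# below are assumed for illustration.
_PATH_VARS = ("RAY_TLS_CA_CERT", "RAY_TLS_SERVER_CERT", "RAY_TLS_SERVER_KEY")


def tls_paths_from_env(env: Mapping[str, str] = os.environ) -> Optional[TlsPaths]:
    """Return certificate paths if TLS is switched on, or None if it is off."""
    # Assumed convention: "1" or "true" (any case) enables TLS.
    if env.get("RAY_USE_TLS", "0").lower() not in ("1", "true"):
        return None
    # Do not assume the path variables are set just because the flag is.
    missing = [name for name in _PATH_VARS if not env.get(name)]
    if missing:
        raise ValueError(
            "TLS is enabled but these variables are unset: " + ", ".join(missing)
        )
    return TlsPaths(*(env[name] for name in _PATH_VARS))
```

Taking the environment as a parameter keeps the function easy to test and would also make it straightforward to layer explicit CLI or `ray.init()` arguments on top later.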

I've added a new test module to test that only connections from authenticated clients are allowed when TLS is enabled. I've also changed some of the fixtures in `test_basic.py` to run some tests with TLS enabled, to check that basic functionality still works with TLS. However, Dask has a separate test file to check that basic operations work with TLS. I think my way is better because it reduces duplication. Do you agree, or do you think it's better to have a separate test file for this?
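The idea of reusing existing tests in both modes can be sketched with the standard library alone. The helpers `patched_env` and `run_in_both_modes` are hypothetical and are not the fixtures from this PR; in a pytest suite the same effect would more likely come from a parametrized fixture.

```python
import os
from contextlib import contextmanager
from typing import Callable, Dict, Iterator, List


@contextmanager
def patched_env(overrides: Dict[str, str]) -> Iterator[None]:
    """Temporarily set environment variables, restoring the old values on exit."""
    saved = {name: os.environ.get(name) for name in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for name, old in saved.items():
            if old is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = old


def run_in_both_modes(check: Callable[[], object]) -> List[object]:
    """Run the same check once with the TLS flag off and once with it on."""
    results = []
    for flag in ("0", "1"):
        with patched_env({"RAY_USE_TLS": flag}):
            results.append(check())
    return results
```

A real TLS-enabled run would also need certificates to be generated and their paths exported before the check starts; that step is omitted here.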

Related issue number

Closes #17290

Checks

I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

src/ray/common/ray_config_def.h

src/ray/rpc/grpc_server.h

ericl

LGTM, just one suggestion on the config definition.

ericl · 2021-10-19T18:01:50Z

Few more conflicts.

ericl · 2021-10-20T02:54:12Z

Can you pull in master again? I think the build might have been broken at your current master commit.

oscarknagg · 2021-10-20T08:29:39Z

I've pulled in master again, hopefully this one looks good.

ericl · 2021-10-20T18:58:43Z

Everything looks good except Windows build:

//python/ray/tests:test_tls_auth FAILED in 3 out of 3 in 36.9s
Stats over 3 runs: max = 36.9s, min = 35.5s, avg = 36.4s, dev = 0.6s
C:/tmp/vlncal46/execroot/com_github_ray_project_ray/bazel-out/x64_windows-opt/testlogs/python/ray/tests/test_tls_auth/test.log
C:/tmp/vlncal46/execroot/com_github_ray_project_ray/bazel-out/x64_windows-opt/testlogs/python/ray/tests/test_tls_auth/test_attempts/attempt_1.log
C:/tmp/vlncal46/execroot/com_github_ray_project_ray/bazel-out/x64_windows-opt/testlogs/python/ray/tests/test_tls_auth/test_attempts/attempt_2.log

zhe-thoughts · 2021-10-21T05:40:55Z

This is a very cool contribution. Thanks @oscarknagg (and @ericl for the review!)

oscarknagg · 2021-10-21T13:58:53Z

Thanks @zhe-thoughts. Would the project be open to another contribution that adds optional TLS to Ray's Redis communication? (https://redis.io/topics/encryption)

zhe-thoughts · 2021-10-21T15:26:46Z

@oscarknagg : we are working on removing Redis as a dependency actually: https://github.com/ray-project/ray/milestone/65

cc @mwtian

mwtian · 2021-10-21T15:59:57Z

@oscarknagg @zhe-thoughts Great contribution! Yes, the plan is for Ray to stop shipping Redis by default, targeting EOY.

oscarknagg · 2021-10-21T16:26:29Z

@mwtian That's really cool. I assume this means that the pickled code for remote functions/actors will also be sent using gRPC via the internal KV store?

oscarknagg added 30 commits September 8, 2021 12:36

Add use_tls_ member to GrpcServer

efe18dd

Hacky TLS

d38af35

Create secure gRPC channels in Python code

3b5f210

Remove unecessary std::cout

01c5cd9

More TLS

2769675

Merge branch 'master' of https://github.com/ray-project/ray into tls

c6ad485

Linting

2962be3

Add secure grpc in tests

64be21a

Fix secure grpc server initialisation

d38e2b0

Merge branch 'master' of https://github.com/ray-project/ray into tls

7aaa8ac

Use single environment variable as feature flag

1668ecc

Merge branch 'master' of https://github.com/ray-project/ray into tls

621cfc7

Pass environment in test_client_builder.py

a2c49d6

Read RAY_USE_TLS in client worker

0b73c38

Unify init_grpc_channel and init_aiogrpc_channel functions

ddc8749

Make function to add port to grpc server

966fc49

Upgrade to mTLS

b173b78

Merge branch 'master' of https://github.com/ray-project/ray into tls

bc39b8f

Function to load certs from env variables

65361a2

Merge branch 'master' of https://github.com/ray-project/ray into tls

2dcff3a

Add example cluster yaml which generates self-signed keys

f19e7a7

Add TLS auth test

b57c2e2

Merge branch 'master' of https://github.com/ray-project/ray into tls

a4cc458

Add some fixtures to run test_basic.py with TLS auth

65f0080

Merge branch 'master' of https://github.com/ray-project/ray into tls

da45c78

Fix test_tls_auth.py

b4dc0ca

Remove duplicated ReadFile function

16c0cb3

Formatting

30bebae

Remove EKS cluster YAML

c551c30

Don't assume TLS env vars are set

1fa0fbf

ericl reviewed Oct 18, 2021

src/ray/common/ray_config_def.h (outdated)

ericl reviewed Oct 18, 2021

src/ray/common/ray_config_def.h (outdated)

ericl reviewed Oct 18, 2021

src/ray/rpc/grpc_server.h (outdated)

ericl approved these changes Oct 18, 2021

oscarknagg added 6 commits October 19, 2021 10:23

Remove unecessary logic in ray_config_def.h

50c2da2

Actually check for ConnectionError in test_client_connect_to_tls_server

8599854

Merge branch 'master' of https://github.com/ray-project/ray into tls

60355a2

Remove unused ReadFile declaration

9dfd106

Lint

4feae45

Replace grpc.insercure_channel with ray._private.utils.init_grpc_chan…

5b57d7d

…nel in recent code

Merge branch 'master' of https://github.com/ray-project/ray into tls

a600eaf

ericl removed the @external-author-action-required label Oct 19, 2021

Trigger retest

67d32b7

ericl added the @external-author-action-required label Oct 20, 2021

Merge branch 'master' of https://github.com/ray-project/ray into tls

6fa08dc

oscarknagg and others added 4 commits October 20, 2021 22:27

Attempt to fix windows build

f4032f1

Merge branch 'master' of https://github.com/ray-project/ray into tls

7fb64f0

Merge branch 'master' into tls

f4c8ae7

Update worker.py

e74d707

ericl removed the @external-author-action-required label Oct 21, 2021

ericl merged commit 5a05e89 into ray-project:master Oct 21, 2021
