[core][1/N] Add standalone runtime_env_agent. #37158

rynewang · 2023-07-06T20:17:15Z

Why are these changes needed?

Breaks runtime env agent from dashboard agent into a standalone process. Part of issue #35472.

The whole initiative is very big (see PR #36917) so we break it down into smaller pieces.

This PR introduces a new standalone runtime_env_agent that is a HTTP server and is independent from dashboard_agent. It uses aiohttp so we vendor it in wheel building time. The vendor path (ray/_private/runtime_env/thirdparty_files) are .gitignored and excluded from lints.

The file test_runtime_env_agent.py is almost identical to the pre-existing dashboard/modules/runtime_env/tests/test_runtime_env_agent.py, only with these changes:

import paths
removal of grpc server
extra loggings

The main.py is the "driver" which serves arg parsing and http server.

To limit the PR size, this change does NOT invoke or run the agent anywhere (other than the unit test).

Checks

I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

Signed-off-by: Ruiyang Wang <[email protected]>

rkooo567

I have one question!

Any particular reason why we copy & paste runtime_env_agent.py now? Is it difficult to just use the modules under dashboard and change it later? (so that we can avoid having accidental code into runtime_env_agent.py while we are working on refactoring). In this way, we can avoid creating a new test file now (and if we want to change the path, we can do it right before we enable it by default)?

rkooo567 · 2023-07-07T08:15:10Z

python/ray/_private/runtime_env/agent/runtime_env_agent.py

@@ -0,0 +1,560 @@
+import asyncio


Can I assume this code is 100% copy/paste?

It's not 100% copy paste. I have these changes:

imports, change of *_consts.py; introduced GcsAioClient; remove runtime_env_agent_pb2_grpc

removed each function's grpc context arg

added logs

The diff is pasted here:

diff --git a/tmp/1.py b/python/ray/_private/runtime_env/agent/runtime_env_agent.py index babaa97eb0..2d5052a633 100644 --- a/tmp/1.py +++ b/python/ray/_private/runtime_env/agent/runtime_env_agent.py @@ -11,15 +11,14 @@ from ray._private.ray_constants import ( DEFAULT_RUNTIME_ENV_TIMEOUT_SECONDS, ) -import ray.dashboard.consts as dashboard_consts -import ray.dashboard.modules.runtime_env.runtime_env_consts as runtime_env_consts -import ray.dashboard.utils as dashboard_utils +import ray._private.runtime_env.agent.runtime_env_consts as runtime_env_consts from ray._private.ray_logging import setup_component_logger from ray._private.runtime_env.conda import CondaPlugin from ray._private.runtime_env.container import ContainerManager from ray._private.runtime_env.context import RuntimeEnvContext from ray._private.runtime_env.java_jars import JavaJarsPlugin from ray._private.runtime_env.pip import PipPlugin +from ray._private.gcs_utils import GcsAioClient from ray._private.runtime_env.plugin import ( RuntimeEnvPlugin, create_for_plugin_if_needed, @@ -29,9 +28,8 @@ from ray._private.runtime_env.plugin import RuntimeEnvPluginManager from ray._private.runtime_env.py_modules import PyModulesPlugin from ray._private.runtime_env.working_dir import WorkingDirPlugin from ray.core.generated import ( - agent_manager_pb2, runtime_env_agent_pb2, - runtime_env_agent_pb2_grpc, + agent_manager_pb2, ) from ray.core.generated.runtime_env_common_pb2 import ( RuntimeEnvState as ProtoRuntimeEnvState, @@ -158,10 +156,7 @@ class ReferenceTable: return self._runtime_env_reference -class RuntimeEnvAgent( - dashboard_utils.DashboardAgentModule, - runtime_env_agent_pb2_grpc.RuntimeEnvServiceServicer, -): +class RuntimeEnvAgent: """An RPC server to create and delete runtime envs. Attributes: @@ -170,10 +165,19 @@ class RuntimeEnvAgent( LOG_FILENAME = "runtime_env_agent.log" - def __init__(self, dashboard_agent): - super().__init__(dashboard_agent) - self._runtime_env_dir = dashboard_agent.runtime_env_dir - self._logging_params = dashboard_agent.logging_params + def __init__( + self, + runtime_env_dir, + logging_params, + gcs_address, + temp_dir, + address, + runtime_env_agent_port, + ): + super().__init__() + self._runtime_env_dir = runtime_env_dir + self._logging_params = logging_params + self._gcs_address = gcs_address self._per_job_logger_cache = dict() # Cache the results of creating envs to avoid repeatedly calling into # conda and other slow calls. @@ -181,7 +185,7 @@ class RuntimeEnvAgent( # Maps a serialized runtime env to a lock that is used # to prevent multiple concurrent installs of the same env. self._env_locks: Dict[str, asyncio.Lock] = dict() - self._gcs_aio_client = self._dashboard_agent.gcs_aio_client + self._gcs_aio_client = GcsAioClient(address=self._gcs_address) self._pip_plugin = PipPlugin(self._runtime_env_dir) self._conda_plugin = CondaPlugin(self._runtime_env_dir) @@ -194,7 +198,7 @@ class RuntimeEnvAgent( self._working_dir_plugin = WorkingDirPlugin( self._runtime_env_dir, self._gcs_aio_client ) - self._container_manager = ContainerManager(dashboard_agent.temp_dir) + self._container_manager = ContainerManager(temp_dir) # TODO(architkulkarni): "base plugins" and third-party plugins should all go # through the same code path. We should never need to refer to @@ -226,6 +230,12 @@ class RuntimeEnvAgent( # be confined to the runtime env agent log file `self.LOG_FILENAME`. self._logger.propagate = False + self._logger.info("Starting runtime env agent at pid %s", os.getpid()) + self._logger.info("Parent raylet pid is %s", int(os.environ["RAY_RAYLET_PID"])) + self._logger.info( + "Listening to address %s, port %d", address, runtime_env_agent_port + ) + def uris_parser(self, runtime_env): result = list() for name, plugin_setup_context in self._plugin_manager.plugins.items(): @@ -251,7 +261,7 @@ class RuntimeEnvAgent( loop = get_or_create_event_loop() # Cache the bad runtime env result by ttl seconds. loop.call_later( - dashboard_consts.BAD_RUNTIME_ENV_CACHE_TTL_SECONDS, + runtime_env_consts.BAD_RUNTIME_ENV_CACHE_TTL_SECONDS, delete_runtime_env, ) else: @@ -268,7 +278,7 @@ class RuntimeEnvAgent( self._per_job_logger_cache[job_id] = per_job_logger return self._per_job_logger_cache[job_id] - async def GetOrCreateRuntimeEnv(self, request, context): + async def GetOrCreateRuntimeEnv(self, request): self._logger.debug( f"Got request from {request.source_process} to increase " "reference for runtime env: " @@ -487,7 +497,7 @@ class RuntimeEnvAgent( error_message=error_message, ) - async def DeleteRuntimeEnvIfPossible(self, request, context): + async def DeleteRuntimeEnvIfPossible(self, request): self._logger.info( f"Got request from {request.source_process} to decrease " "reference for runtime env: " @@ -516,7 +526,7 @@ class RuntimeEnvAgent( status=agent_manager_pb2.AGENT_RPC_STATUS_OK ) - async def GetRuntimeEnvsInfo(self, request, context): + async def GetRuntimeEnvsInfo(self, request): """Return the runtime env information of the node.""" # TODO(sang): Currently, it only includes runtime_env information. # We should include the URI information which includes, @@ -548,13 +558,3 @@ class RuntimeEnvAgent( reply.runtime_env_states.append(runtime_env_state) reply.total = len(runtime_env_states) return reply - - async def run(self, server): - if server: - runtime_env_agent_pb2_grpc.add_RuntimeEnvServiceServicer_to_server( - self, server - ) - - @staticmethod - def is_minimal_module(): - return True

rkooo567 · 2023-07-07T08:15:18Z

python/ray/_private/runtime_env/agent/runtime_env_consts.py

@@ -0,0 +1,20 @@
+import ray._private.ray_constants as ray_constants


same question as ^

replied above

python/ray/_private/runtime_env/agent/main.py

rkooo567 · 2023-07-07T08:21:45Z

python/ray/_private/runtime_env/agent/main.py

+ # reply is serialzied protobuf GetRuntimeEnvsInfoReply
+ async def get_runtime_envs_info(request):
+ data = await request.read()
+ request = runtime_env_agent_pb2.GetRuntimeEnvsInfoRequest()


Q: if the parsing fails here, what's going to happen? Is there any good way to do validation?

it raises (protobuf's) message.DecodeError. aiohttp handles this by returning a 500:

< HTTP/1.1 500 Internal Server Error < Content-Type: text/plain; charset=utf-8 < Content-Length: 55 < Date: Fri, 07 Jul 2023 18:40:54 GMT < Server: Python/3.9 aiohttp/3.8.4 < Connection: close < 500 Internal Server Error * Closing connection 0 Server got itself in trouble%

rkooo567 · 2023-07-07T08:27:27Z

python/ray/_private/runtime_env/agent/main.py

+
+import runtime_env_consts # noqa: E402
+from runtime_env_agent import RuntimeEnvAgent # noqa: E402
+from aiohttp import web # noqa: E402


Q: how does it work with aiohttp dependency downloaded by ray[default]?

we prepend the sys path so it will always use vendored aiohttp regardless of the customer's environment

Signed-off-by: Ruiyang Wang <[email protected]>

rkooo567

Hmm I prefer to not create a new file runtime_env_agent.py (any way we just add some if/else to the original file?). But if it is necessary, let's just make sure to keep a close eye on agent/runtime_env_agent.py (so that we don't miss bug fixes and stuff like that added to that file).

python/ray/_private/runtime_env/agent/runtime_env_agent.py

scv119 · 2023-07-11T02:23:59Z

Nice work!

Why are these changes needed? Breaks runtime env agent from dashboard agent into a standalone process. Part of issue ray-project#35472. The whole initiative is very big (see PR ray-project#36917) so we break it down into smaller pieces. This PR introduces a new standalone runtime_env_agent that is a HTTP server and is independent from dashboard_agent. It uses aiohttp so we vendor it in wheel building time. The vendor path (ray/_private/runtime_env/thirdparty_files) are .gitignored and excluded from lints. The file test_runtime_env_agent.py is almost identical to the pre-existing dashboard/modules/runtime_env/tests/test_runtime_env_agent.py, only with these changes: import paths removal of grpc server extra loggings The main.py is the "driver" which serves arg parsing and http server. To limit the PR size, this change does NOT invoke or run the agent anywhere (other than the unit test). Signed-off-by: Ruiyang Wang <[email protected]> Signed-off-by: Bhavpreet Singh <[email protected]>

Why are these changes needed? Breaks runtime env agent from dashboard agent into a standalone process. Part of issue ray-project#35472. The whole initiative is very big (see PR ray-project#36917) so we break it down into smaller pieces. This PR introduces a new standalone runtime_env_agent that is a HTTP server and is independent from dashboard_agent. It uses aiohttp so we vendor it in wheel building time. The vendor path (ray/_private/runtime_env/thirdparty_files) are .gitignored and excluded from lints. The file test_runtime_env_agent.py is almost identical to the pre-existing dashboard/modules/runtime_env/tests/test_runtime_env_agent.py, only with these changes: import paths removal of grpc server extra loggings The main.py is the "driver" which serves arg parsing and http server. To limit the PR size, this change does NOT invoke or run the agent anywhere (other than the unit test). Signed-off-by: Ruiyang Wang <[email protected]> Signed-off-by: 久龙 <[email protected]>

#37158 broke `Doctest (CPU)`. It introduced vendored files that are casuing import issues during doctest. This PR fixes the issue by ignoring the vendored files. Signed-off-by: Balaji Veeramani <[email protected]>

Part of issue #35472. This PR adds a boost beast based HTTP client, and uses it to talk to runtime env agent as introduced in #37158. This new library is purely additive, and is not used anywhere. In the next PR it will be used to replace agent_manager.

Why are these changes needed? Breaks runtime env agent from dashboard agent into a standalone process. Part of issue ray-project#35472. The whole initiative is very big (see PR ray-project#36917) so we break it down into smaller pieces. This PR introduces a new standalone runtime_env_agent that is a HTTP server and is independent from dashboard_agent. It uses aiohttp so we vendor it in wheel building time. The vendor path (ray/_private/runtime_env/thirdparty_files) are .gitignored and excluded from lints. The file test_runtime_env_agent.py is almost identical to the pre-existing dashboard/modules/runtime_env/tests/test_runtime_env_agent.py, only with these changes: import paths removal of grpc server extra loggings The main.py is the "driver" which serves arg parsing and http server. To limit the PR size, this change does NOT invoke or run the agent anywhere (other than the unit test). Signed-off-by: Ruiyang Wang <[email protected]> Signed-off-by: Bhavpreet Singh <[email protected]>

ray-project#37158 broke `Doctest (CPU)`. It introduced vendored files that are casuing import issues during doctest. This PR fixes the issue by ignoring the vendored files. Signed-off-by: Balaji Veeramani <[email protected]>

…oject#37280) Part of issue ray-project#35472. This PR adds a boost beast based HTTP client, and uses it to talk to runtime env agent as introduced in ray-project#37158. This new library is purely additive, and is not used anywhere. In the next PR it will be used to replace agent_manager.

Why are these changes needed? Breaks runtime env agent from dashboard agent into a standalone process. Part of issue ray-project#35472. The whole initiative is very big (see PR ray-project#36917) so we break it down into smaller pieces. This PR introduces a new standalone runtime_env_agent that is a HTTP server and is independent from dashboard_agent. It uses aiohttp so we vendor it in wheel building time. The vendor path (ray/_private/runtime_env/thirdparty_files) are .gitignored and excluded from lints. The file test_runtime_env_agent.py is almost identical to the pre-existing dashboard/modules/runtime_env/tests/test_runtime_env_agent.py, only with these changes: import paths removal of grpc server extra loggings The main.py is the "driver" which serves arg parsing and http server. To limit the PR size, this change does NOT invoke or run the agent anywhere (other than the unit test). Signed-off-by: Ruiyang Wang <[email protected]> Signed-off-by: NripeshN <[email protected]>

ray-project#37158 broke `Doctest (CPU)`. It introduced vendored files that are casuing import issues during doctest. This PR fixes the issue by ignoring the vendored files. Signed-off-by: Balaji Veeramani <[email protected]> Signed-off-by: NripeshN <[email protected]>

…oject#37280) Part of issue ray-project#35472. This PR adds a boost beast based HTTP client, and uses it to talk to runtime env agent as introduced in ray-project#37158. This new library is purely additive, and is not used anywhere. In the next PR it will be used to replace agent_manager. Signed-off-by: NripeshN <[email protected]>

Why are these changes needed? Breaks runtime env agent from dashboard agent into a standalone process. Part of issue ray-project#35472. The whole initiative is very big (see PR ray-project#36917) so we break it down into smaller pieces. This PR introduces a new standalone runtime_env_agent that is a HTTP server and is independent from dashboard_agent. It uses aiohttp so we vendor it in wheel building time. The vendor path (ray/_private/runtime_env/thirdparty_files) are .gitignored and excluded from lints. The file test_runtime_env_agent.py is almost identical to the pre-existing dashboard/modules/runtime_env/tests/test_runtime_env_agent.py, only with these changes: import paths removal of grpc server extra loggings The main.py is the "driver" which serves arg parsing and http server. To limit the PR size, this change does NOT invoke or run the agent anywhere (other than the unit test). Signed-off-by: Ruiyang Wang <[email protected]> Signed-off-by: harborn <[email protected]>

ray-project#37158 broke `Doctest (CPU)`. It introduced vendored files that are casuing import issues during doctest. This PR fixes the issue by ignoring the vendored files. Signed-off-by: Balaji Veeramani <[email protected]> Signed-off-by: harborn <[email protected]>

…oject#37280) Part of issue ray-project#35472. This PR adds a boost beast based HTTP client, and uses it to talk to runtime env agent as introduced in ray-project#37158. This new library is purely additive, and is not used anywhere. In the next PR it will be used to replace agent_manager. Signed-off-by: harborn <[email protected]>

Why are these changes needed? Breaks runtime env agent from dashboard agent into a standalone process. Part of issue ray-project#35472. The whole initiative is very big (see PR ray-project#36917) so we break it down into smaller pieces. This PR introduces a new standalone runtime_env_agent that is a HTTP server and is independent from dashboard_agent. It uses aiohttp so we vendor it in wheel building time. The vendor path (ray/_private/runtime_env/thirdparty_files) are .gitignored and excluded from lints. The file test_runtime_env_agent.py is almost identical to the pre-existing dashboard/modules/runtime_env/tests/test_runtime_env_agent.py, only with these changes: import paths removal of grpc server extra loggings The main.py is the "driver" which serves arg parsing and http server. To limit the PR size, this change does NOT invoke or run the agent anywhere (other than the unit test). Signed-off-by: Ruiyang Wang <[email protected]>

ray-project#37158 broke `Doctest (CPU)`. It introduced vendored files that are casuing import issues during doctest. This PR fixes the issue by ignoring the vendored files. Signed-off-by: Balaji Veeramani <[email protected]>

…oject#37280) Part of issue ray-project#35472. This PR adds a boost beast based HTTP client, and uses it to talk to runtime env agent as introduced in ray-project#37158. This new library is purely additive, and is not used anywhere. In the next PR it will be used to replace agent_manager.

Why are these changes needed? Breaks runtime env agent from dashboard agent into a standalone process. Part of issue ray-project#35472. The whole initiative is very big (see PR ray-project#36917) so we break it down into smaller pieces. This PR introduces a new standalone runtime_env_agent that is a HTTP server and is independent from dashboard_agent. It uses aiohttp so we vendor it in wheel building time. The vendor path (ray/_private/runtime_env/thirdparty_files) are .gitignored and excluded from lints. The file test_runtime_env_agent.py is almost identical to the pre-existing dashboard/modules/runtime_env/tests/test_runtime_env_agent.py, only with these changes: import paths removal of grpc server extra loggings The main.py is the "driver" which serves arg parsing and http server. To limit the PR size, this change does NOT invoke or run the agent anywhere (other than the unit test). Signed-off-by: Ruiyang Wang <[email protected]> Signed-off-by: e428265 <[email protected]>

ray-project#37158 broke `Doctest (CPU)`. It introduced vendored files that are casuing import issues during doctest. This PR fixes the issue by ignoring the vendored files. Signed-off-by: Balaji Veeramani <[email protected]> Signed-off-by: e428265 <[email protected]>

…oject#37280) Part of issue ray-project#35472. This PR adds a boost beast based HTTP client, and uses it to talk to runtime env agent as introduced in ray-project#37158. This new library is purely additive, and is not used anywhere. In the next PR it will be used to replace agent_manager. Signed-off-by: e428265 <[email protected]>

…oject#37280) Part of issue ray-project#35472. This PR adds a boost beast based HTTP client, and uses it to talk to runtime env agent as introduced in ray-project#37158. This new library is purely additive, and is not used anywhere. In the next PR it will be used to replace agent_manager. Signed-off-by: Victor <[email protected]>

Added new runtime_env_agent.

a910b2f

Signed-off-by: Ruiyang Wang <[email protected]>

rynewang requested review from jjyao, rickyyx and rkooo567 July 6, 2023 20:17

rynewang requested review from richardliaw, ericl and edoakes as code owners July 6, 2023 20:17

rynewang assigned scv119, jjyao, rickyyx and rkooo567 Jul 6, 2023

lint

9c45382

Signed-off-by: Ruiyang Wang <[email protected]>

rynewang requested a review from pcmoritz July 6, 2023 20:35

lint 2

1323c73

Signed-off-by: Ruiyang Wang <[email protected]>

rkooo567 reviewed Jul 7, 2023

View reviewed changes

rkooo567 added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Jul 7, 2023

add type annotation

c9de5c8

Signed-off-by: Ruiyang Wang <[email protected]>

rynewang removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Jul 7, 2023

rkooo567 approved these changes Jul 10, 2023

View reviewed changes

rynewang assigned edoakes and richardliaw Jul 10, 2023

edoakes approved these changes Jul 10, 2023

View reviewed changes

pcmoritz reviewed Jul 10, 2023

View reviewed changes

python/ray/_private/runtime_env/agent/runtime_env_agent.py Show resolved Hide resolved

rynewang mentioned this pull request Jul 11, 2023

[core] [2/N] Add async C++ http client for runtime env agent. #37280

Merged

8 tasks

scv119 merged commit 2e64c1d into ray-project:master Jul 11, 2023
118 of 122 checks passed

bveeramani mentioned this pull request Jul 12, 2023

[CI][Docs] Fix Doctest (CPU) #37358

Merged

8 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[core][1/N] Add standalone runtime_env_agent. #37158

[core][1/N] Add standalone runtime_env_agent. #37158

rynewang commented Jul 6, 2023 •

edited

Loading

rkooo567 left a comment

rkooo567 Jul 7, 2023

rynewang Jul 7, 2023 •

edited

Loading

rkooo567 Jul 7, 2023

rynewang Jul 7, 2023

rkooo567 Jul 7, 2023

rynewang Jul 7, 2023

rkooo567 Jul 7, 2023

rynewang Jul 7, 2023

rkooo567 left a comment •

edited

Loading

scv119 commented Jul 11, 2023

		@@ -0,0 +1,20 @@
		import ray._private.ray_constants as ray_constants

[core][1/N] Add standalone runtime_env_agent. #37158

[core][1/N] Add standalone runtime_env_agent. #37158

Conversation

rynewang commented Jul 6, 2023 • edited Loading

Why are these changes needed?

Checks

rkooo567 left a comment

Choose a reason for hiding this comment

rkooo567 Jul 7, 2023

Choose a reason for hiding this comment

rynewang Jul 7, 2023 • edited Loading

Choose a reason for hiding this comment

rkooo567 Jul 7, 2023

Choose a reason for hiding this comment

rynewang Jul 7, 2023

Choose a reason for hiding this comment

rkooo567 Jul 7, 2023

Choose a reason for hiding this comment

rynewang Jul 7, 2023

Choose a reason for hiding this comment

rkooo567 Jul 7, 2023

Choose a reason for hiding this comment

rynewang Jul 7, 2023

Choose a reason for hiding this comment

rkooo567 left a comment • edited Loading

Choose a reason for hiding this comment

scv119 commented Jul 11, 2023

rynewang commented Jul 6, 2023 •

edited

Loading

rynewang Jul 7, 2023 •

edited

Loading

rkooo567 left a comment •

edited

Loading