Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core][1/N] Add standalone runtime_env_agent. #37158

Merged
merged 4 commits into from
Jul 11, 2023

Conversation

rynewang
Copy link
Contributor

@rynewang rynewang commented Jul 6, 2023

Why are these changes needed?

Breaks runtime env agent from dashboard agent into a standalone process. Part of issue #35472.

The whole initiative is very big (see PR #36917) so we break it down into smaller pieces.

This PR introduces a new standalone runtime_env_agent that is a HTTP server and is independent from dashboard_agent. It uses aiohttp so we vendor it in wheel building time. The vendor path (ray/_private/runtime_env/thirdparty_files) are .gitignored and excluded from lints.

The file test_runtime_env_agent.py is almost identical to the pre-existing dashboard/modules/runtime_env/tests/test_runtime_env_agent.py, only with these changes:

  • import paths
  • removal of grpc server
  • extra loggings

The main.py is the "driver" which serves arg parsing and http server.

To limit the PR size, this change does NOT invoke or run the agent anywhere (other than the unit test).

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Ruiyang Wang <[email protected]>
Signed-off-by: Ruiyang Wang <[email protected]>
@rynewang rynewang requested a review from pcmoritz July 6, 2023 20:35
Signed-off-by: Ruiyang Wang <[email protected]>
Copy link
Contributor

@rkooo567 rkooo567 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one question!

Any particular reason why we copy & paste runtime_env_agent.py now? Is it difficult to just use the modules under dashboard and change it later? (so that we can avoid having accidental code into runtime_env_agent.py while we are working on refactoring). In this way, we can avoid creating a new test file now (and if we want to change the path, we can do it right before we enable it by default)?

@@ -0,0 +1,560 @@
import asyncio
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I assume this code is 100% copy/paste?

Copy link
Contributor Author

@rynewang rynewang Jul 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not 100% copy paste. I have these changes:

  • imports, change of *_consts.py; introduced GcsAioClient; remove runtime_env_agent_pb2_grpc
  • removed each function's grpc context arg
  • added logs

The diff is pasted here:

diff --git a/tmp/1.py b/python/ray/_private/runtime_env/agent/runtime_env_agent.py
index babaa97eb0..2d5052a633 100644
--- a/tmp/1.py
+++ b/python/ray/_private/runtime_env/agent/runtime_env_agent.py
@@ -11,15 +11,14 @@ from ray._private.ray_constants import (
     DEFAULT_RUNTIME_ENV_TIMEOUT_SECONDS,
 )
 
-import ray.dashboard.consts as dashboard_consts
-import ray.dashboard.modules.runtime_env.runtime_env_consts as runtime_env_consts
-import ray.dashboard.utils as dashboard_utils
+import ray._private.runtime_env.agent.runtime_env_consts as runtime_env_consts
 from ray._private.ray_logging import setup_component_logger
 from ray._private.runtime_env.conda import CondaPlugin
 from ray._private.runtime_env.container import ContainerManager
 from ray._private.runtime_env.context import RuntimeEnvContext
 from ray._private.runtime_env.java_jars import JavaJarsPlugin
 from ray._private.runtime_env.pip import PipPlugin
+from ray._private.gcs_utils import GcsAioClient
 from ray._private.runtime_env.plugin import (
     RuntimeEnvPlugin,
     create_for_plugin_if_needed,
@@ -29,9 +28,8 @@ from ray._private.runtime_env.plugin import RuntimeEnvPluginManager
 from ray._private.runtime_env.py_modules import PyModulesPlugin
 from ray._private.runtime_env.working_dir import WorkingDirPlugin
 from ray.core.generated import (
-    agent_manager_pb2,
     runtime_env_agent_pb2,
-    runtime_env_agent_pb2_grpc,
+    agent_manager_pb2,
 )
 from ray.core.generated.runtime_env_common_pb2 import (
     RuntimeEnvState as ProtoRuntimeEnvState,
@@ -158,10 +156,7 @@ class ReferenceTable:
         return self._runtime_env_reference
 
 
-class RuntimeEnvAgent(
-    dashboard_utils.DashboardAgentModule,
-    runtime_env_agent_pb2_grpc.RuntimeEnvServiceServicer,
-):
+class RuntimeEnvAgent:
     """An RPC server to create and delete runtime envs.
 
     Attributes:
@@ -170,10 +165,19 @@ class RuntimeEnvAgent(
 
     LOG_FILENAME = "runtime_env_agent.log"
 
-    def __init__(self, dashboard_agent):
-        super().__init__(dashboard_agent)
-        self._runtime_env_dir = dashboard_agent.runtime_env_dir
-        self._logging_params = dashboard_agent.logging_params
+    def __init__(
+        self,
+        runtime_env_dir,
+        logging_params,
+        gcs_address,
+        temp_dir,
+        address,
+        runtime_env_agent_port,
+    ):
+        super().__init__()
+        self._runtime_env_dir = runtime_env_dir
+        self._logging_params = logging_params
+        self._gcs_address = gcs_address
         self._per_job_logger_cache = dict()
         # Cache the results of creating envs to avoid repeatedly calling into
         # conda and other slow calls.
@@ -181,7 +185,7 @@ class RuntimeEnvAgent(
         # Maps a serialized runtime env to a lock that is used
         # to prevent multiple concurrent installs of the same env.
         self._env_locks: Dict[str, asyncio.Lock] = dict()
-        self._gcs_aio_client = self._dashboard_agent.gcs_aio_client
+        self._gcs_aio_client = GcsAioClient(address=self._gcs_address)
 
         self._pip_plugin = PipPlugin(self._runtime_env_dir)
         self._conda_plugin = CondaPlugin(self._runtime_env_dir)
@@ -194,7 +198,7 @@ class RuntimeEnvAgent(
         self._working_dir_plugin = WorkingDirPlugin(
             self._runtime_env_dir, self._gcs_aio_client
         )
-        self._container_manager = ContainerManager(dashboard_agent.temp_dir)
+        self._container_manager = ContainerManager(temp_dir)
 
         # TODO(architkulkarni): "base plugins" and third-party plugins should all go
         # through the same code path.  We should never need to refer to
@@ -226,6 +230,12 @@ class RuntimeEnvAgent(
         # be confined to the runtime env agent log file `self.LOG_FILENAME`.
         self._logger.propagate = False
 
+        self._logger.info("Starting runtime env agent at pid %s", os.getpid())
+        self._logger.info("Parent raylet pid is %s", int(os.environ["RAY_RAYLET_PID"]))
+        self._logger.info(
+            "Listening to address %s, port %d", address, runtime_env_agent_port
+        )
+
     def uris_parser(self, runtime_env):
         result = list()
         for name, plugin_setup_context in self._plugin_manager.plugins.items():
@@ -251,7 +261,7 @@ class RuntimeEnvAgent(
                 loop = get_or_create_event_loop()
                 # Cache the bad runtime env result by ttl seconds.
                 loop.call_later(
-                    dashboard_consts.BAD_RUNTIME_ENV_CACHE_TTL_SECONDS,
+                    runtime_env_consts.BAD_RUNTIME_ENV_CACHE_TTL_SECONDS,
                     delete_runtime_env,
                 )
             else:
@@ -268,7 +278,7 @@ class RuntimeEnvAgent(
             self._per_job_logger_cache[job_id] = per_job_logger
         return self._per_job_logger_cache[job_id]
 
-    async def GetOrCreateRuntimeEnv(self, request, context):
+    async def GetOrCreateRuntimeEnv(self, request):
         self._logger.debug(
             f"Got request from {request.source_process} to increase "
             "reference for runtime env: "
@@ -487,7 +497,7 @@ class RuntimeEnvAgent(
                 error_message=error_message,
             )
 
-    async def DeleteRuntimeEnvIfPossible(self, request, context):
+    async def DeleteRuntimeEnvIfPossible(self, request):
         self._logger.info(
             f"Got request from {request.source_process} to decrease "
             "reference for runtime env: "
@@ -516,7 +526,7 @@ class RuntimeEnvAgent(
             status=agent_manager_pb2.AGENT_RPC_STATUS_OK
         )
 
-    async def GetRuntimeEnvsInfo(self, request, context):
+    async def GetRuntimeEnvsInfo(self, request):
         """Return the runtime env information of the node."""
         # TODO(sang): Currently, it only includes runtime_env information.
         # We should include the URI information which includes,
@@ -548,13 +558,3 @@ class RuntimeEnvAgent(
             reply.runtime_env_states.append(runtime_env_state)
         reply.total = len(runtime_env_states)
         return reply
-
-    async def run(self, server):
-        if server:
-            runtime_env_agent_pb2_grpc.add_RuntimeEnvServiceServicer_to_server(
-                self, server
-            )
-
-    @staticmethod
-    def is_minimal_module():
-        return True

@@ -0,0 +1,20 @@
import ray._private.ray_constants as ray_constants
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question as ^

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replied above

python/ray/_private/runtime_env/agent/main.py Outdated Show resolved Hide resolved
# reply is serialzied protobuf GetRuntimeEnvsInfoReply
async def get_runtime_envs_info(request):
data = await request.read()
request = runtime_env_agent_pb2.GetRuntimeEnvsInfoRequest()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: if the parsing fails here, what's going to happen? Is there any good way to do validation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it raises (protobuf's) message.DecodeError. aiohttp handles this by returning a 500:

< HTTP/1.1 500 Internal Server Error
< Content-Type: text/plain; charset=utf-8
< Content-Length: 55
< Date: Fri, 07 Jul 2023 18:40:54 GMT
< Server: Python/3.9 aiohttp/3.8.4
< Connection: close
<
500 Internal Server Error

* Closing connection 0
Server got itself in trouble%


import runtime_env_consts # noqa: E402
from runtime_env_agent import RuntimeEnvAgent # noqa: E402
from aiohttp import web # noqa: E402
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: how does it work with aiohttp dependency downloaded by ray[default]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we prepend the sys path so it will always use vendored aiohttp regardless of the customer's environment

@rkooo567 rkooo567 added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Jul 7, 2023
Signed-off-by: Ruiyang Wang <[email protected]>
@rynewang rynewang removed the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Jul 7, 2023
Copy link
Contributor

@rkooo567 rkooo567 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I prefer to not create a new file runtime_env_agent.py (any way we just add some if/else to the original file?). But if it is necessary, let's just make sure to keep a close eye on agent/runtime_env_agent.py (so that we don't miss bug fixes and stuff like that added to that file).

@scv119 scv119 merged commit 2e64c1d into ray-project:master Jul 11, 2023
118 of 122 checks passed
@scv119
Copy link
Contributor

scv119 commented Jul 11, 2023

Nice work!

Bhav00 pushed a commit to Bhav00/ray that referenced this pull request Jul 11, 2023
Why are these changes needed?
Breaks runtime env agent from dashboard agent into a standalone process. Part of issue ray-project#35472.

The whole initiative is very big (see PR ray-project#36917) so we break it down into smaller pieces.

This PR introduces a new standalone runtime_env_agent that is a HTTP server and is independent from dashboard_agent. It uses aiohttp so we vendor it in wheel building time. The vendor path (ray/_private/runtime_env/thirdparty_files) are .gitignored and excluded from lints.

The file test_runtime_env_agent.py is almost identical to the pre-existing dashboard/modules/runtime_env/tests/test_runtime_env_agent.py, only with these changes:

import paths
removal of grpc server
extra loggings
The main.py is the "driver" which serves arg parsing and http server.

To limit the PR size, this change does NOT invoke or run the agent anywhere (other than the unit test).

Signed-off-by: Ruiyang Wang <[email protected]>
Signed-off-by: Bhavpreet Singh <[email protected]>
SongGuyang pushed a commit to alipay/ant-ray that referenced this pull request Jul 12, 2023
Why are these changes needed?
Breaks runtime env agent from dashboard agent into a standalone process. Part of issue ray-project#35472.

The whole initiative is very big (see PR ray-project#36917) so we break it down into smaller pieces.

This PR introduces a new standalone runtime_env_agent that is a HTTP server and is independent from dashboard_agent. It uses aiohttp so we vendor it in wheel building time. The vendor path (ray/_private/runtime_env/thirdparty_files) are .gitignored and excluded from lints.

The file test_runtime_env_agent.py is almost identical to the pre-existing dashboard/modules/runtime_env/tests/test_runtime_env_agent.py, only with these changes:

import paths
removal of grpc server
extra loggings
The main.py is the "driver" which serves arg parsing and http server.

To limit the PR size, this change does NOT invoke or run the agent anywhere (other than the unit test).

Signed-off-by: Ruiyang Wang <[email protected]>
Signed-off-by: 久龙 <[email protected]>
@bveeramani bveeramani mentioned this pull request Jul 12, 2023
8 tasks
c21 pushed a commit that referenced this pull request Jul 12, 2023
#37158 broke `Doctest (CPU)`. It introduced vendored files that are casuing import issues during doctest. This PR fixes the issue by ignoring the vendored files.

Signed-off-by: Balaji Veeramani <[email protected]>
rkooo567 pushed a commit that referenced this pull request Jul 19, 2023
Part of issue #35472. This PR adds a boost beast based HTTP client, and uses it to talk to runtime env agent as introduced in #37158.

This new library is purely additive, and is not used anywhere. In the next PR it will be used to replace agent_manager.
Bhav00 pushed a commit to Bhav00/ray that referenced this pull request Jul 24, 2023
Why are these changes needed?
Breaks runtime env agent from dashboard agent into a standalone process. Part of issue ray-project#35472.

The whole initiative is very big (see PR ray-project#36917) so we break it down into smaller pieces.

This PR introduces a new standalone runtime_env_agent that is a HTTP server and is independent from dashboard_agent. It uses aiohttp so we vendor it in wheel building time. The vendor path (ray/_private/runtime_env/thirdparty_files) are .gitignored and excluded from lints.

The file test_runtime_env_agent.py is almost identical to the pre-existing dashboard/modules/runtime_env/tests/test_runtime_env_agent.py, only with these changes:

import paths
removal of grpc server
extra loggings
The main.py is the "driver" which serves arg parsing and http server.

To limit the PR size, this change does NOT invoke or run the agent anywhere (other than the unit test).

Signed-off-by: Ruiyang Wang <[email protected]>
Signed-off-by: Bhavpreet Singh <[email protected]>
Bhav00 pushed a commit to Bhav00/ray that referenced this pull request Jul 24, 2023
ray-project#37158 broke `Doctest (CPU)`. It introduced vendored files that are casuing import issues during doctest. This PR fixes the issue by ignoring the vendored files.

Signed-off-by: Balaji Veeramani <[email protected]>
Bhav00 pushed a commit to Bhav00/ray that referenced this pull request Jul 24, 2023
…oject#37280)

Part of issue ray-project#35472. This PR adds a boost beast based HTTP client, and uses it to talk to runtime env agent as introduced in ray-project#37158.

This new library is purely additive, and is not used anywhere. In the next PR it will be used to replace agent_manager.
Bhav00 pushed a commit to Bhav00/ray that referenced this pull request Jul 28, 2023
…oject#37280)

Part of issue ray-project#35472. This PR adds a boost beast based HTTP client, and uses it to talk to runtime env agent as introduced in ray-project#37158.

This new library is purely additive, and is not used anywhere. In the next PR it will be used to replace agent_manager.
NripeshN pushed a commit to NripeshN/ray that referenced this pull request Aug 15, 2023
Why are these changes needed?
Breaks runtime env agent from dashboard agent into a standalone process. Part of issue ray-project#35472.

The whole initiative is very big (see PR ray-project#36917) so we break it down into smaller pieces.

This PR introduces a new standalone runtime_env_agent that is a HTTP server and is independent from dashboard_agent. It uses aiohttp so we vendor it in wheel building time. The vendor path (ray/_private/runtime_env/thirdparty_files) are .gitignored and excluded from lints.

The file test_runtime_env_agent.py is almost identical to the pre-existing dashboard/modules/runtime_env/tests/test_runtime_env_agent.py, only with these changes:

import paths
removal of grpc server
extra loggings
The main.py is the "driver" which serves arg parsing and http server.

To limit the PR size, this change does NOT invoke or run the agent anywhere (other than the unit test).

Signed-off-by: Ruiyang Wang <[email protected]>
Signed-off-by: NripeshN <[email protected]>
NripeshN pushed a commit to NripeshN/ray that referenced this pull request Aug 15, 2023
ray-project#37158 broke `Doctest (CPU)`. It introduced vendored files that are casuing import issues during doctest. This PR fixes the issue by ignoring the vendored files.

Signed-off-by: Balaji Veeramani <[email protected]>
Signed-off-by: NripeshN <[email protected]>
NripeshN pushed a commit to NripeshN/ray that referenced this pull request Aug 15, 2023
…oject#37280)

Part of issue ray-project#35472. This PR adds a boost beast based HTTP client, and uses it to talk to runtime env agent as introduced in ray-project#37158.

This new library is purely additive, and is not used anywhere. In the next PR it will be used to replace agent_manager.

Signed-off-by: NripeshN <[email protected]>
harborn pushed a commit to harborn/ray that referenced this pull request Aug 17, 2023
Why are these changes needed?
Breaks runtime env agent from dashboard agent into a standalone process. Part of issue ray-project#35472.

The whole initiative is very big (see PR ray-project#36917) so we break it down into smaller pieces.

This PR introduces a new standalone runtime_env_agent that is a HTTP server and is independent from dashboard_agent. It uses aiohttp so we vendor it in wheel building time. The vendor path (ray/_private/runtime_env/thirdparty_files) are .gitignored and excluded from lints.

The file test_runtime_env_agent.py is almost identical to the pre-existing dashboard/modules/runtime_env/tests/test_runtime_env_agent.py, only with these changes:

import paths
removal of grpc server
extra loggings
The main.py is the "driver" which serves arg parsing and http server.

To limit the PR size, this change does NOT invoke or run the agent anywhere (other than the unit test).

Signed-off-by: Ruiyang Wang <[email protected]>
Signed-off-by: harborn <[email protected]>
harborn pushed a commit to harborn/ray that referenced this pull request Aug 17, 2023
ray-project#37158 broke `Doctest (CPU)`. It introduced vendored files that are casuing import issues during doctest. This PR fixes the issue by ignoring the vendored files.

Signed-off-by: Balaji Veeramani <[email protected]>
Signed-off-by: harborn <[email protected]>
harborn pushed a commit to harborn/ray that referenced this pull request Aug 17, 2023
…oject#37280)

Part of issue ray-project#35472. This PR adds a boost beast based HTTP client, and uses it to talk to runtime env agent as introduced in ray-project#37158.

This new library is purely additive, and is not used anywhere. In the next PR it will be used to replace agent_manager.

Signed-off-by: harborn <[email protected]>
harborn pushed a commit to harborn/ray that referenced this pull request Aug 17, 2023
Why are these changes needed?
Breaks runtime env agent from dashboard agent into a standalone process. Part of issue ray-project#35472.

The whole initiative is very big (see PR ray-project#36917) so we break it down into smaller pieces.

This PR introduces a new standalone runtime_env_agent that is a HTTP server and is independent from dashboard_agent. It uses aiohttp so we vendor it in wheel building time. The vendor path (ray/_private/runtime_env/thirdparty_files) are .gitignored and excluded from lints.

The file test_runtime_env_agent.py is almost identical to the pre-existing dashboard/modules/runtime_env/tests/test_runtime_env_agent.py, only with these changes:

import paths
removal of grpc server
extra loggings
The main.py is the "driver" which serves arg parsing and http server.

To limit the PR size, this change does NOT invoke or run the agent anywhere (other than the unit test).

Signed-off-by: Ruiyang Wang <[email protected]>
harborn pushed a commit to harborn/ray that referenced this pull request Aug 17, 2023
ray-project#37158 broke `Doctest (CPU)`. It introduced vendored files that are casuing import issues during doctest. This PR fixes the issue by ignoring the vendored files.

Signed-off-by: Balaji Veeramani <[email protected]>
harborn pushed a commit to harborn/ray that referenced this pull request Aug 17, 2023
…oject#37280)

Part of issue ray-project#35472. This PR adds a boost beast based HTTP client, and uses it to talk to runtime env agent as introduced in ray-project#37158.

This new library is purely additive, and is not used anywhere. In the next PR it will be used to replace agent_manager.
arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023
Why are these changes needed?
Breaks runtime env agent from dashboard agent into a standalone process. Part of issue ray-project#35472.

The whole initiative is very big (see PR ray-project#36917) so we break it down into smaller pieces.

This PR introduces a new standalone runtime_env_agent that is a HTTP server and is independent from dashboard_agent. It uses aiohttp so we vendor it in wheel building time. The vendor path (ray/_private/runtime_env/thirdparty_files) are .gitignored and excluded from lints.

The file test_runtime_env_agent.py is almost identical to the pre-existing dashboard/modules/runtime_env/tests/test_runtime_env_agent.py, only with these changes:

import paths
removal of grpc server
extra loggings
The main.py is the "driver" which serves arg parsing and http server.

To limit the PR size, this change does NOT invoke or run the agent anywhere (other than the unit test).

Signed-off-by: Ruiyang Wang <[email protected]>
Signed-off-by: e428265 <[email protected]>
arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023
ray-project#37158 broke `Doctest (CPU)`. It introduced vendored files that are casuing import issues during doctest. This PR fixes the issue by ignoring the vendored files.

Signed-off-by: Balaji Veeramani <[email protected]>
Signed-off-by: e428265 <[email protected]>
arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023
…oject#37280)

Part of issue ray-project#35472. This PR adds a boost beast based HTTP client, and uses it to talk to runtime env agent as introduced in ray-project#37158.

This new library is purely additive, and is not used anywhere. In the next PR it will be used to replace agent_manager.

Signed-off-by: e428265 <[email protected]>
vymao pushed a commit to vymao/ray that referenced this pull request Oct 11, 2023
…oject#37280)

Part of issue ray-project#35472. This PR adds a boost beast based HTTP client, and uses it to talk to runtime env agent as introduced in ray-project#37158.

This new library is purely additive, and is not used anywhere. In the next PR it will be used to replace agent_manager.

Signed-off-by: Victor <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants