-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core][1/N] Add standalone runtime_env_agent. #37158
[core][1/N] Add standalone runtime_env_agent. #37158
Conversation
Signed-off-by: Ruiyang Wang <[email protected]>
Signed-off-by: Ruiyang Wang <[email protected]>
Signed-off-by: Ruiyang Wang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have one question!
Any particular reason why we copy & paste runtime_env_agent.py now? Is it difficult to just use the modules under dashboard and change it later? (so that we can avoid having accidental code into runtime_env_agent.py while we are working on refactoring). In this way, we can avoid creating a new test file now (and if we want to change the path, we can do it right before we enable it by default)?
@@ -0,0 +1,560 @@ | |||
import asyncio |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I assume this code is 100% copy/paste?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not 100% copy paste. I have these changes:
- imports, change of
*_consts.py
; introducedGcsAioClient
; removeruntime_env_agent_pb2_grpc
- removed each function's grpc
context
arg - added logs
The diff is pasted here:
diff --git a/tmp/1.py b/python/ray/_private/runtime_env/agent/runtime_env_agent.py
index babaa97eb0..2d5052a633 100644
--- a/tmp/1.py
+++ b/python/ray/_private/runtime_env/agent/runtime_env_agent.py
@@ -11,15 +11,14 @@ from ray._private.ray_constants import (
DEFAULT_RUNTIME_ENV_TIMEOUT_SECONDS,
)
-import ray.dashboard.consts as dashboard_consts
-import ray.dashboard.modules.runtime_env.runtime_env_consts as runtime_env_consts
-import ray.dashboard.utils as dashboard_utils
+import ray._private.runtime_env.agent.runtime_env_consts as runtime_env_consts
from ray._private.ray_logging import setup_component_logger
from ray._private.runtime_env.conda import CondaPlugin
from ray._private.runtime_env.container import ContainerManager
from ray._private.runtime_env.context import RuntimeEnvContext
from ray._private.runtime_env.java_jars import JavaJarsPlugin
from ray._private.runtime_env.pip import PipPlugin
+from ray._private.gcs_utils import GcsAioClient
from ray._private.runtime_env.plugin import (
RuntimeEnvPlugin,
create_for_plugin_if_needed,
@@ -29,9 +28,8 @@ from ray._private.runtime_env.plugin import RuntimeEnvPluginManager
from ray._private.runtime_env.py_modules import PyModulesPlugin
from ray._private.runtime_env.working_dir import WorkingDirPlugin
from ray.core.generated import (
- agent_manager_pb2,
runtime_env_agent_pb2,
- runtime_env_agent_pb2_grpc,
+ agent_manager_pb2,
)
from ray.core.generated.runtime_env_common_pb2 import (
RuntimeEnvState as ProtoRuntimeEnvState,
@@ -158,10 +156,7 @@ class ReferenceTable:
return self._runtime_env_reference
-class RuntimeEnvAgent(
- dashboard_utils.DashboardAgentModule,
- runtime_env_agent_pb2_grpc.RuntimeEnvServiceServicer,
-):
+class RuntimeEnvAgent:
"""An RPC server to create and delete runtime envs.
Attributes:
@@ -170,10 +165,19 @@ class RuntimeEnvAgent(
LOG_FILENAME = "runtime_env_agent.log"
- def __init__(self, dashboard_agent):
- super().__init__(dashboard_agent)
- self._runtime_env_dir = dashboard_agent.runtime_env_dir
- self._logging_params = dashboard_agent.logging_params
+ def __init__(
+ self,
+ runtime_env_dir,
+ logging_params,
+ gcs_address,
+ temp_dir,
+ address,
+ runtime_env_agent_port,
+ ):
+ super().__init__()
+ self._runtime_env_dir = runtime_env_dir
+ self._logging_params = logging_params
+ self._gcs_address = gcs_address
self._per_job_logger_cache = dict()
# Cache the results of creating envs to avoid repeatedly calling into
# conda and other slow calls.
@@ -181,7 +185,7 @@ class RuntimeEnvAgent(
# Maps a serialized runtime env to a lock that is used
# to prevent multiple concurrent installs of the same env.
self._env_locks: Dict[str, asyncio.Lock] = dict()
- self._gcs_aio_client = self._dashboard_agent.gcs_aio_client
+ self._gcs_aio_client = GcsAioClient(address=self._gcs_address)
self._pip_plugin = PipPlugin(self._runtime_env_dir)
self._conda_plugin = CondaPlugin(self._runtime_env_dir)
@@ -194,7 +198,7 @@ class RuntimeEnvAgent(
self._working_dir_plugin = WorkingDirPlugin(
self._runtime_env_dir, self._gcs_aio_client
)
- self._container_manager = ContainerManager(dashboard_agent.temp_dir)
+ self._container_manager = ContainerManager(temp_dir)
# TODO(architkulkarni): "base plugins" and third-party plugins should all go
# through the same code path. We should never need to refer to
@@ -226,6 +230,12 @@ class RuntimeEnvAgent(
# be confined to the runtime env agent log file `self.LOG_FILENAME`.
self._logger.propagate = False
+ self._logger.info("Starting runtime env agent at pid %s", os.getpid())
+ self._logger.info("Parent raylet pid is %s", int(os.environ["RAY_RAYLET_PID"]))
+ self._logger.info(
+ "Listening to address %s, port %d", address, runtime_env_agent_port
+ )
+
def uris_parser(self, runtime_env):
result = list()
for name, plugin_setup_context in self._plugin_manager.plugins.items():
@@ -251,7 +261,7 @@ class RuntimeEnvAgent(
loop = get_or_create_event_loop()
# Cache the bad runtime env result by ttl seconds.
loop.call_later(
- dashboard_consts.BAD_RUNTIME_ENV_CACHE_TTL_SECONDS,
+ runtime_env_consts.BAD_RUNTIME_ENV_CACHE_TTL_SECONDS,
delete_runtime_env,
)
else:
@@ -268,7 +278,7 @@ class RuntimeEnvAgent(
self._per_job_logger_cache[job_id] = per_job_logger
return self._per_job_logger_cache[job_id]
- async def GetOrCreateRuntimeEnv(self, request, context):
+ async def GetOrCreateRuntimeEnv(self, request):
self._logger.debug(
f"Got request from {request.source_process} to increase "
"reference for runtime env: "
@@ -487,7 +497,7 @@ class RuntimeEnvAgent(
error_message=error_message,
)
- async def DeleteRuntimeEnvIfPossible(self, request, context):
+ async def DeleteRuntimeEnvIfPossible(self, request):
self._logger.info(
f"Got request from {request.source_process} to decrease "
"reference for runtime env: "
@@ -516,7 +526,7 @@ class RuntimeEnvAgent(
status=agent_manager_pb2.AGENT_RPC_STATUS_OK
)
- async def GetRuntimeEnvsInfo(self, request, context):
+ async def GetRuntimeEnvsInfo(self, request):
"""Return the runtime env information of the node."""
# TODO(sang): Currently, it only includes runtime_env information.
# We should include the URI information which includes,
@@ -548,13 +558,3 @@ class RuntimeEnvAgent(
reply.runtime_env_states.append(runtime_env_state)
reply.total = len(runtime_env_states)
return reply
-
- async def run(self, server):
- if server:
- runtime_env_agent_pb2_grpc.add_RuntimeEnvServiceServicer_to_server(
- self, server
- )
-
- @staticmethod
- def is_minimal_module():
- return True
@@ -0,0 +1,20 @@ | |||
import ray._private.ray_constants as ray_constants |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question as ^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replied above
# reply is serialzied protobuf GetRuntimeEnvsInfoReply | ||
async def get_runtime_envs_info(request): | ||
data = await request.read() | ||
request = runtime_env_agent_pb2.GetRuntimeEnvsInfoRequest() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: if the parsing fails here, what's going to happen? Is there any good way to do validation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it raises (protobuf's) message.DecodeError. aiohttp handles this by returning a 500:
< HTTP/1.1 500 Internal Server Error
< Content-Type: text/plain; charset=utf-8
< Content-Length: 55
< Date: Fri, 07 Jul 2023 18:40:54 GMT
< Server: Python/3.9 aiohttp/3.8.4
< Connection: close
<
500 Internal Server Error
* Closing connection 0
Server got itself in trouble%
|
||
import runtime_env_consts # noqa: E402 | ||
from runtime_env_agent import RuntimeEnvAgent # noqa: E402 | ||
from aiohttp import web # noqa: E402 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q: how does it work with aiohttp dependency downloaded by ray[default]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we prepend the sys path so it will always use vendored aiohttp
regardless of the customer's environment
Signed-off-by: Ruiyang Wang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I prefer to not create a new file runtime_env_agent.py
(any way we just add some if/else to the original file?). But if it is necessary, let's just make sure to keep a close eye on agent/runtime_env_agent.py (so that we don't miss bug fixes and stuff like that added to that file).
Nice work! |
Why are these changes needed? Breaks runtime env agent from dashboard agent into a standalone process. Part of issue ray-project#35472. The whole initiative is very big (see PR ray-project#36917) so we break it down into smaller pieces. This PR introduces a new standalone runtime_env_agent that is a HTTP server and is independent from dashboard_agent. It uses aiohttp so we vendor it in wheel building time. The vendor path (ray/_private/runtime_env/thirdparty_files) are .gitignored and excluded from lints. The file test_runtime_env_agent.py is almost identical to the pre-existing dashboard/modules/runtime_env/tests/test_runtime_env_agent.py, only with these changes: import paths removal of grpc server extra loggings The main.py is the "driver" which serves arg parsing and http server. To limit the PR size, this change does NOT invoke or run the agent anywhere (other than the unit test). Signed-off-by: Ruiyang Wang <[email protected]> Signed-off-by: Bhavpreet Singh <[email protected]>
Why are these changes needed? Breaks runtime env agent from dashboard agent into a standalone process. Part of issue ray-project#35472. The whole initiative is very big (see PR ray-project#36917) so we break it down into smaller pieces. This PR introduces a new standalone runtime_env_agent that is a HTTP server and is independent from dashboard_agent. It uses aiohttp so we vendor it in wheel building time. The vendor path (ray/_private/runtime_env/thirdparty_files) are .gitignored and excluded from lints. The file test_runtime_env_agent.py is almost identical to the pre-existing dashboard/modules/runtime_env/tests/test_runtime_env_agent.py, only with these changes: import paths removal of grpc server extra loggings The main.py is the "driver" which serves arg parsing and http server. To limit the PR size, this change does NOT invoke or run the agent anywhere (other than the unit test). Signed-off-by: Ruiyang Wang <[email protected]> Signed-off-by: 久龙 <[email protected]>
#37158 broke `Doctest (CPU)`. It introduced vendored files that are casuing import issues during doctest. This PR fixes the issue by ignoring the vendored files. Signed-off-by: Balaji Veeramani <[email protected]>
Why are these changes needed? Breaks runtime env agent from dashboard agent into a standalone process. Part of issue ray-project#35472. The whole initiative is very big (see PR ray-project#36917) so we break it down into smaller pieces. This PR introduces a new standalone runtime_env_agent that is a HTTP server and is independent from dashboard_agent. It uses aiohttp so we vendor it in wheel building time. The vendor path (ray/_private/runtime_env/thirdparty_files) are .gitignored and excluded from lints. The file test_runtime_env_agent.py is almost identical to the pre-existing dashboard/modules/runtime_env/tests/test_runtime_env_agent.py, only with these changes: import paths removal of grpc server extra loggings The main.py is the "driver" which serves arg parsing and http server. To limit the PR size, this change does NOT invoke or run the agent anywhere (other than the unit test). Signed-off-by: Ruiyang Wang <[email protected]> Signed-off-by: Bhavpreet Singh <[email protected]>
ray-project#37158 broke `Doctest (CPU)`. It introduced vendored files that are casuing import issues during doctest. This PR fixes the issue by ignoring the vendored files. Signed-off-by: Balaji Veeramani <[email protected]>
…oject#37280) Part of issue ray-project#35472. This PR adds a boost beast based HTTP client, and uses it to talk to runtime env agent as introduced in ray-project#37158. This new library is purely additive, and is not used anywhere. In the next PR it will be used to replace agent_manager.
…oject#37280) Part of issue ray-project#35472. This PR adds a boost beast based HTTP client, and uses it to talk to runtime env agent as introduced in ray-project#37158. This new library is purely additive, and is not used anywhere. In the next PR it will be used to replace agent_manager.
Why are these changes needed? Breaks runtime env agent from dashboard agent into a standalone process. Part of issue ray-project#35472. The whole initiative is very big (see PR ray-project#36917) so we break it down into smaller pieces. This PR introduces a new standalone runtime_env_agent that is a HTTP server and is independent from dashboard_agent. It uses aiohttp so we vendor it in wheel building time. The vendor path (ray/_private/runtime_env/thirdparty_files) are .gitignored and excluded from lints. The file test_runtime_env_agent.py is almost identical to the pre-existing dashboard/modules/runtime_env/tests/test_runtime_env_agent.py, only with these changes: import paths removal of grpc server extra loggings The main.py is the "driver" which serves arg parsing and http server. To limit the PR size, this change does NOT invoke or run the agent anywhere (other than the unit test). Signed-off-by: Ruiyang Wang <[email protected]> Signed-off-by: NripeshN <[email protected]>
ray-project#37158 broke `Doctest (CPU)`. It introduced vendored files that are casuing import issues during doctest. This PR fixes the issue by ignoring the vendored files. Signed-off-by: Balaji Veeramani <[email protected]> Signed-off-by: NripeshN <[email protected]>
…oject#37280) Part of issue ray-project#35472. This PR adds a boost beast based HTTP client, and uses it to talk to runtime env agent as introduced in ray-project#37158. This new library is purely additive, and is not used anywhere. In the next PR it will be used to replace agent_manager. Signed-off-by: NripeshN <[email protected]>
Why are these changes needed? Breaks runtime env agent from dashboard agent into a standalone process. Part of issue ray-project#35472. The whole initiative is very big (see PR ray-project#36917) so we break it down into smaller pieces. This PR introduces a new standalone runtime_env_agent that is a HTTP server and is independent from dashboard_agent. It uses aiohttp so we vendor it in wheel building time. The vendor path (ray/_private/runtime_env/thirdparty_files) are .gitignored and excluded from lints. The file test_runtime_env_agent.py is almost identical to the pre-existing dashboard/modules/runtime_env/tests/test_runtime_env_agent.py, only with these changes: import paths removal of grpc server extra loggings The main.py is the "driver" which serves arg parsing and http server. To limit the PR size, this change does NOT invoke or run the agent anywhere (other than the unit test). Signed-off-by: Ruiyang Wang <[email protected]> Signed-off-by: harborn <[email protected]>
ray-project#37158 broke `Doctest (CPU)`. It introduced vendored files that are casuing import issues during doctest. This PR fixes the issue by ignoring the vendored files. Signed-off-by: Balaji Veeramani <[email protected]> Signed-off-by: harborn <[email protected]>
…oject#37280) Part of issue ray-project#35472. This PR adds a boost beast based HTTP client, and uses it to talk to runtime env agent as introduced in ray-project#37158. This new library is purely additive, and is not used anywhere. In the next PR it will be used to replace agent_manager. Signed-off-by: harborn <[email protected]>
Why are these changes needed? Breaks runtime env agent from dashboard agent into a standalone process. Part of issue ray-project#35472. The whole initiative is very big (see PR ray-project#36917) so we break it down into smaller pieces. This PR introduces a new standalone runtime_env_agent that is a HTTP server and is independent from dashboard_agent. It uses aiohttp so we vendor it in wheel building time. The vendor path (ray/_private/runtime_env/thirdparty_files) are .gitignored and excluded from lints. The file test_runtime_env_agent.py is almost identical to the pre-existing dashboard/modules/runtime_env/tests/test_runtime_env_agent.py, only with these changes: import paths removal of grpc server extra loggings The main.py is the "driver" which serves arg parsing and http server. To limit the PR size, this change does NOT invoke or run the agent anywhere (other than the unit test). Signed-off-by: Ruiyang Wang <[email protected]>
ray-project#37158 broke `Doctest (CPU)`. It introduced vendored files that are casuing import issues during doctest. This PR fixes the issue by ignoring the vendored files. Signed-off-by: Balaji Veeramani <[email protected]>
…oject#37280) Part of issue ray-project#35472. This PR adds a boost beast based HTTP client, and uses it to talk to runtime env agent as introduced in ray-project#37158. This new library is purely additive, and is not used anywhere. In the next PR it will be used to replace agent_manager.
Why are these changes needed? Breaks runtime env agent from dashboard agent into a standalone process. Part of issue ray-project#35472. The whole initiative is very big (see PR ray-project#36917) so we break it down into smaller pieces. This PR introduces a new standalone runtime_env_agent that is a HTTP server and is independent from dashboard_agent. It uses aiohttp so we vendor it in wheel building time. The vendor path (ray/_private/runtime_env/thirdparty_files) are .gitignored and excluded from lints. The file test_runtime_env_agent.py is almost identical to the pre-existing dashboard/modules/runtime_env/tests/test_runtime_env_agent.py, only with these changes: import paths removal of grpc server extra loggings The main.py is the "driver" which serves arg parsing and http server. To limit the PR size, this change does NOT invoke or run the agent anywhere (other than the unit test). Signed-off-by: Ruiyang Wang <[email protected]> Signed-off-by: e428265 <[email protected]>
ray-project#37158 broke `Doctest (CPU)`. It introduced vendored files that are casuing import issues during doctest. This PR fixes the issue by ignoring the vendored files. Signed-off-by: Balaji Veeramani <[email protected]> Signed-off-by: e428265 <[email protected]>
…oject#37280) Part of issue ray-project#35472. This PR adds a boost beast based HTTP client, and uses it to talk to runtime env agent as introduced in ray-project#37158. This new library is purely additive, and is not used anywhere. In the next PR it will be used to replace agent_manager. Signed-off-by: e428265 <[email protected]>
…oject#37280) Part of issue ray-project#35472. This PR adds a boost beast based HTTP client, and uses it to talk to runtime env agent as introduced in ray-project#37158. This new library is purely additive, and is not used anywhere. In the next PR it will be used to replace agent_manager. Signed-off-by: Victor <[email protected]>
Why are these changes needed?
Breaks runtime env agent from dashboard agent into a standalone process. Part of issue #35472.
The whole initiative is very big (see PR #36917) so we break it down into smaller pieces.
This PR introduces a new standalone runtime_env_agent that is a HTTP server and is independent from dashboard_agent. It uses
aiohttp
so we vendor it in wheel building time. The vendor path (ray/_private/runtime_env/thirdparty_files
) are.gitignore
d and excluded from lints.The file
test_runtime_env_agent.py
is almost identical to the pre-existingdashboard/modules/runtime_env/tests/test_runtime_env_agent.py
, only with these changes:The
main.py
is the "driver" which serves arg parsing and http server.To limit the PR size, this change does NOT invoke or run the agent anywhere (other than the unit test).
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.