[Jobs] Check if runtime env config is None before using it as dict (#44742)

Follow-up to #44405

runtime_env fields can be reset to None at runtime, which causes the following error:

2024-04-15 13:04:56,871	WARNING job_manager.py:1009 -- Failed to start supervisor actor for job raysubmit_UTb99vaR1DmJ9rkw: ''NoneType' object does not support item assignment'. Full traceback:
Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/dashboard/modules/job/job_manager.py", line 989, in submit_job
    runtime_env=self._get_supervisor_runtime_env(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/dashboard/modules/job/job_manager.py", line 818, in _get_supervisor_runtime_env
    config["log_files"] = [self._log_client.get_log_file_path(submission_id)]
TypeError: 'NoneType' object does not support item assignment

This PR explicitly checks for None before using config as a dict, fixing the above error.
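
To illustrate why the default argument alone was not enough, here is a minimal sketch. It assumes a plain `dict` as a stand-in for `RuntimeEnvConfig`, and the two helper names are hypothetical and not part of Ray. `dict.get(key, default)` returns the default only when the key is absent; a key that is present with the value `None` is returned as `None`.

```python
from typing import Any, Dict, Optional


def get_config_with_default(runtime_env: Dict[str, Any]) -> Optional[dict]:
    """Pre-fix pattern: the default applies only when the key is missing."""
    # A plain dict stands in for RuntimeEnvConfig (assumption for this sketch).
    return runtime_env.get("config", {})


def get_config_checked(runtime_env: Dict[str, Any]) -> dict:
    """Post-fix pattern: an explicit None is treated like a missing key."""
    config = runtime_env.get("config")
    if config is None:
        config = {}
    return config


if __name__ == "__main__":
    runtime_env = {"config": None}  # field reset to None at runtime

    # The key exists, so the default is ignored and None comes back.
    print(get_config_with_default(runtime_env))

    # The explicit check yields a dict that supports item assignment.
    config = get_config_checked(runtime_env)
    config["log_files"] = ["/tmp/example.log"]  # hypothetical path
    print(config)
```

With the first helper, the subsequent `config["log_files"] = ...` assignment would raise the `TypeError` shown in the log above; with the second it succeeds.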

It also includes the full traceback in the error log to make this kind of error easier to debug in the future.
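
The logging change can be sketched in isolation as follows. This is an illustrative example rather than the job manager code: `start_or_report` and its `start` callable are hypothetical names, and only the standard-library call `traceback.format_exc()` is taken from the change itself. Called inside an `except` block, it returns the formatted traceback of the exception currently being handled.

```python
import logging
import traceback
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def start_or_report(submission_id: str, start: Callable[[], None]) -> Optional[str]:
    """Run `start`; on failure, return a message that includes the traceback."""
    try:
        start()
    except Exception as e:
        # Must be called while the exception is being handled.
        tb_str = traceback.format_exc()
        message = (
            f"Failed to start supervisor actor {submission_id}: '{e}'"
            f". Full traceback:\n{tb_str}"
        )
        logger.warning(message)
        return message
    return None


def failing_start() -> None:
    """Reproduce the item-assignment error on a None config."""
    config = None
    config["log_files"] = []


if __name__ == "__main__":
    print(start_or_report("example_job", failing_start))
```

Because the traceback names the file and line of the failing statement, the message points directly at the assignment that failed instead of reporting only the exception text.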

---------

Signed-off-by: Archit Kulkarni <[email protected]>
architkulkarni committed Apr 17, 2024
1 parent d07a48c commit fe63d7c
Showing 1 changed file with 11 additions and 2 deletions.
13 changes: 11 additions & 2 deletions dashboard/modules/job/job_manager.py
@@ -814,7 +814,10 @@ def _get_supervisor_runtime_env(
         runtime_env["env_vars"] = env_vars
 
         if os.getenv(RAY_STREAM_RUNTIME_ENV_LOG_TO_JOB_DRIVER_LOG_ENV_VAR, "0") == "1":
-            config = runtime_env.get("config", RuntimeEnvConfig())
+            config = runtime_env.get("config")
+            # Empty fields may be set to None, so we need to check for None explicitly.
+            if config is None:
+                config = RuntimeEnvConfig()
             config["log_files"] = [self._log_client.get_log_file_path(submission_id)]
             runtime_env["config"] = config
         return runtime_env
@@ -1002,13 +1005,19 @@ async def submit_job(
                 self._monitor_job(submission_id, job_supervisor=supervisor)
             )
         except Exception as e:
+            tb_str = traceback.format_exc()
+
             logger.warning(
                 f"Failed to start supervisor actor for job {submission_id}: '{e}'"
+                f". Full traceback:\n{tb_str}"
             )
             await self._job_info_client.put_status(
                 submission_id,
                 JobStatus.FAILED,
-                message=f"Failed to start supervisor actor {submission_id}: '{e}'",
+                message=(
+                    f"Failed to start supervisor actor {submission_id}: '{e}'"
+                    f". Full traceback:\n{tb_str}"
+                ),
             )
 
         return submission_id