Skip to content

Commit

Permalink
[Data] Skip recording memory spilled stats when get_memory_info_reply…
Browse files Browse the repository at this point in the history
… is failed (#42824)

User found the issue the call of `get_memory_info_reply` throws GRPC error if memory load on cluster is heavy.

Signed-off-by: Cheng Su <[email protected]>
  • Loading branch information
c21 committed Jan 30, 2024
1 parent 2e066d1 commit 43631f9
Showing 1 changed file with 16 additions and 8 deletions.
24 changes: 16 additions & 8 deletions python/ray/data/_internal/plan.py
Original file line number Diff line number Diff line change
Expand Up @@ -591,14 +591,22 @@ def execute(
blocks._owned_by_consumer = False

# Retrieve memory-related stats from ray.
reply = get_memory_info_reply(
get_state_from_address(ray.get_runtime_context().gcs_address)
)
if reply.store_stats.spill_time_total_s > 0:
stats.global_bytes_spilled = int(reply.store_stats.spilled_bytes_total)
if reply.store_stats.restore_time_total_s > 0:
stats.global_bytes_restored = int(
reply.store_stats.restored_bytes_total
try:
reply = get_memory_info_reply(
get_state_from_address(ray.get_runtime_context().gcs_address)
)
if reply.store_stats.spill_time_total_s > 0:
stats.global_bytes_spilled = int(
reply.store_stats.spilled_bytes_total
)
if reply.store_stats.restore_time_total_s > 0:
stats.global_bytes_restored = int(
reply.store_stats.restored_bytes_total
)
except Exception as e:
logger.get_logger(log_to_stdout=False).warning(
"Skipping recording memory spilled and restored statistics due to "
f"exception: {e}"
)

stats.dataset_bytes_spilled = 0
Expand Down

0 comments on commit 43631f9

Please sign in to comment.