[release][tune] [build_base] Fix tune_scalability_durable_trainable configuration to force experiment syncs less often (ray-project#31649)

Deflakes the `tune_scalability_durable_trainable` release test by configuring it so that forced experiment checkpoint syncs happen less frequently, cutting down on the test runtime.
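
As a quick sanity check on the new settings (a minimal sketch built from the values in the diff below, not code from the commit itself), the new cadence lines trial checkpoints up one-to-one with reported results:

```python
# Back-of-the-envelope check of the new release-test settings.
# Values are taken from the diff below; this is not part of the commit.
results_per_second = 5 / 60   # one result every 12 seconds
trial_length_s = 300          # each trial runs for 5 minutes
checkpoint_freq_s = 12        # one checkpoint every 12 seconds

results_per_trial = trial_length_s * results_per_second       # 25 results
checkpoints_per_trial = trial_length_s / checkpoint_freq_s    # 25 checkpoints

# One checkpoint per result: checkpoints no longer fire between results.
assert results_per_trial == checkpoints_per_trial == 25
```

With the old values (one result every 6 seconds, one checkpoint every 10 seconds) the two cadences were misaligned, so checkpoints, and the experiment syncs they force, presumably fired off-beat between results; aligning both at 12 seconds removes those extra forced syncs.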

Signed-off-by: Justin Yu <[email protected]>
justinvyu committed Jan 17, 2023
1 parent f37b1d5 commit a205fda
Showing 1 changed file with 3 additions and 3 deletions.
@@ -1,7 +1,7 @@
 """Durable trainable (16 trials, checkpoint to cloud)
 In this run, we will start 16 trials on a cluster. The trials create
-10 MB checkpoints every 10 seconds and should only keep 2 of these. This test
+10 MB checkpoints every 12 seconds and should only keep 2 of these. This test
 ensures that durable checkpoints don't slow down experiment progress too much.
 Cluster: cluster_16x2.yaml
@@ -62,7 +62,7 @@ def main(bucket):
     ray.init(address="auto")
 
     num_samples = 16
-    results_per_second = 10 / 60
+    results_per_second = 5 / 60  # 5 results per minute = 1 every 12 seconds
     trial_length_s = 300
 
     max_runtime = 650
@@ -73,7 +73,7 @@ def main(bucket):
         results_per_second=results_per_second,
         trial_length_s=trial_length_s,
         max_runtime=max_runtime,
-        checkpoint_freq_s=10,  # Once every 10 seconds
+        checkpoint_freq_s=12,  # Once every 12 seconds (once per result)
         checkpoint_size_b=int(10 * 1000**2),  # 10 MB
         keep_checkpoints_num=2,
         resources_per_trial={"cpu": 2},
