[release][tune] [build_base] Fix tune_scalability_durable_trainable configuration to force experiment syncs less often (ray-project#31649)

Deflakes the `tune_scalability_durable_trainable` release test by configuring it so that forced experiment checkpoint syncs happen less frequently, cutting down on the test runtime.
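
As a quick sanity check on the new settings (a minimal sketch built from the values in the diff below, not code from the commit itself), the new cadence lines trial checkpoints up one-to-one with reported results:

```python
# Back-of-the-envelope check of the new release-test settings.
# Values are taken from the diff below; this is not part of the commit.
results_per_second = 5 / 60   # one result every 12 seconds
trial_length_s = 300          # each trial runs for 5 minutes
checkpoint_freq_s = 12        # one checkpoint every 12 seconds

results_per_trial = trial_length_s * results_per_second       # 25 results
checkpoints_per_trial = trial_length_s / checkpoint_freq_s    # 25 checkpoints

# One checkpoint per result: checkpoints no longer fire between results.
assert results_per_trial == checkpoints_per_trial == 25
```

With the old values (one result every 6 seconds, one checkpoint every 10 seconds) the two cadences were misaligned, so checkpoints, and the experiment syncs they force, presumably fired off-beat between results; aligning both at 12 seconds removes those extra forced syncs.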

Signed-off-by: Justin Yu <[email protected]>
justinvyu committed Jan 17, 2023
1 parent f37b1d5 commit a205fda
Showing 1 changed file with 3 additions and 3 deletions.
@@ -1,7 +1,7 @@
 """Durable trainable (16 trials, checkpoint to cloud)
 In this run, we will start 16 trials on a cluster. The trials create
-10 MB checkpoints every 10 seconds and should only keep 2 of these. This test
+10 MB checkpoints every 12 seconds and should only keep 2 of these. This test
 ensures that durable checkpoints don't slow down experiment progress too much.
 Cluster: cluster_16x2.yaml
@@ -62,7 +62,7 @@ def main(bucket):
     ray.init(address="auto")
 
     num_samples = 16
-    results_per_second = 10 / 60
+    results_per_second = 5 / 60  # 5 results per minute = 1 every 12 seconds
     trial_length_s = 300
 
     max_runtime = 650
@@ -73,7 +73,7 @@ def main(bucket):
         results_per_second=results_per_second,
         trial_length_s=trial_length_s,
         max_runtime=max_runtime,
-        checkpoint_freq_s=10,  # Once every 10 seconds
+        checkpoint_freq_s=12,  # Once every 12 seconds (once per result)
         checkpoint_size_b=int(10 * 1000**2),  # 10 MB
         keep_checkpoints_num=2,
         resources_per_trial={"cpu": 2},
