[BugFix] Fix strict length in PRB+SliceSampler #2202

Merged: vmoens merged 17 commits into main from fix-strict-length-prb-slice on Jun 7, 2024

Conversation

vmoens
Contributor

@vmoens vmoens commented Jun 5, 2024

I wrote dedicated tests under test_slice_sampler_prioritized

TODO:

  • Check caching
  • Add left-right span options
  • Aggregate/reduce priorities


pytorch-bot bot commented Jun 5, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2202

Note: Links to docs will display an error until the docs builds have been completed.

❌ 12 New Failures, 5 Unrelated Failures

As of commit caa258f with merge base 726e959:

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Jun 5, 2024

github-actions bot commented Jun 5, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 91. Improved: $\large\color{#35bf28}10$. Worsened: $\large\color{#d91a1a}4$.

Name Max Mean Ops Ops on Repo HEAD Change
test_single 0.1208s 60.2936ms 16.5855 Ops/s 17.5085 Ops/s $\textbf{\color{#d91a1a}-5.27\%}$
test_sync 32.9541ms 31.4586ms 31.7878 Ops/s 30.9688 Ops/s $\color{#35bf28}+2.64\%$
test_async 51.6862ms 28.7645ms 34.7650 Ops/s 32.7835 Ops/s $\textbf{\color{#35bf28}+6.04\%}$
test_simple 0.4663s 0.3988s 2.5075 Ops/s 2.6015 Ops/s $\color{#d91a1a}-3.61\%$
test_transformed 0.5343s 0.5330s 1.8763 Ops/s 1.8496 Ops/s $\color{#35bf28}+1.45\%$
test_serial 1.3462s 1.2768s 0.7832 Ops/s 0.7638 Ops/s $\color{#35bf28}+2.54\%$
test_parallel 1.1475s 1.0701s 0.9345 Ops/s 0.9178 Ops/s $\color{#35bf28}+1.82\%$
test_step_mdp_speed[True-True-True-True-True] 0.1519ms 21.9332μs 45.5930 KOps/s 44.9654 KOps/s $\color{#35bf28}+1.40\%$
test_step_mdp_speed[True-True-True-True-False] 43.7410μs 13.2627μs 75.3995 KOps/s 72.9356 KOps/s $\color{#35bf28}+3.38\%$
test_step_mdp_speed[True-True-True-False-True] 50.7950μs 12.8951μs 77.5489 KOps/s 75.8634 KOps/s $\color{#35bf28}+2.22\%$
test_step_mdp_speed[True-True-True-False-False] 28.5830μs 7.7329μs 129.3171 KOps/s 124.0830 KOps/s $\color{#35bf28}+4.22\%$
test_step_mdp_speed[True-True-False-True-True] 50.2040μs 23.1766μs 43.1469 KOps/s 41.9803 KOps/s $\color{#35bf28}+2.78\%$
test_step_mdp_speed[True-True-False-True-False] 42.7900μs 14.5901μs 68.5395 KOps/s 66.8154 KOps/s $\color{#35bf28}+2.58\%$
test_step_mdp_speed[True-True-False-False-True] 61.5250μs 14.1012μs 70.9159 KOps/s 70.2381 KOps/s $\color{#35bf28}+0.97\%$
test_step_mdp_speed[True-True-False-False-False] 0.1097ms 9.1591μs 109.1805 KOps/s 109.1860 KOps/s $-0.01\%$
test_step_mdp_speed[True-False-True-True-True] 77.8150μs 24.3531μs 41.0626 KOps/s 39.8014 KOps/s $\color{#35bf28}+3.17\%$
test_step_mdp_speed[True-False-True-True-False] 43.4310μs 15.6751μs 63.7954 KOps/s 60.6316 KOps/s $\textbf{\color{#35bf28}+5.22\%}$
test_step_mdp_speed[True-False-True-False-True] 46.3070μs 14.0626μs 71.1108 KOps/s 70.4271 KOps/s $\color{#35bf28}+0.97\%$
test_step_mdp_speed[True-False-True-False-False] 34.5850μs 8.9889μs 111.2481 KOps/s 107.6932 KOps/s $\color{#35bf28}+3.30\%$
test_step_mdp_speed[True-False-False-True-True] 70.9120μs 25.3908μs 39.3843 KOps/s 37.7643 KOps/s $\color{#35bf28}+4.29\%$
test_step_mdp_speed[True-False-False-True-False] 63.1180μs 16.9371μs 59.0421 KOps/s 56.5947 KOps/s $\color{#35bf28}+4.32\%$
test_step_mdp_speed[True-False-False-False-True] 0.1140ms 15.1232μs 66.1237 KOps/s 64.0829 KOps/s $\color{#35bf28}+3.18\%$
test_step_mdp_speed[True-False-False-False-False] 62.8170μs 10.1684μs 98.3439 KOps/s 95.6654 KOps/s $\color{#35bf28}+2.80\%$
test_step_mdp_speed[False-True-True-True-True] 57.5880μs 24.6384μs 40.5870 KOps/s 39.7984 KOps/s $\color{#35bf28}+1.98\%$
test_step_mdp_speed[False-True-True-True-False] 49.3320μs 15.6409μs 63.9350 KOps/s 61.4561 KOps/s $\color{#35bf28}+4.03\%$
test_step_mdp_speed[False-True-True-False-True] 43.3210μs 16.3187μs 61.2793 KOps/s 60.0312 KOps/s $\color{#35bf28}+2.08\%$
test_step_mdp_speed[False-True-True-False-False] 47.6690μs 10.2250μs 97.7997 KOps/s 94.6493 KOps/s $\color{#35bf28}+3.33\%$
test_step_mdp_speed[False-True-False-True-True] 0.1425ms 25.8084μs 38.7471 KOps/s 38.2912 KOps/s $\color{#35bf28}+1.19\%$
test_step_mdp_speed[False-True-False-True-False] 47.5590μs 16.9487μs 59.0016 KOps/s 56.7464 KOps/s $\color{#35bf28}+3.97\%$
test_step_mdp_speed[False-True-False-False-True] 50.3240μs 17.4797μs 57.2092 KOps/s 55.4890 KOps/s $\color{#35bf28}+3.10\%$
test_step_mdp_speed[False-True-False-False-False] 49.1920μs 11.2218μs 89.1125 KOps/s 84.7316 KOps/s $\textbf{\color{#35bf28}+5.17\%}$
test_step_mdp_speed[False-False-True-True-True] 73.1660μs 26.7529μs 37.3792 KOps/s 36.1225 KOps/s $\color{#35bf28}+3.48\%$
test_step_mdp_speed[False-False-True-True-False] 49.6030μs 18.2185μs 54.8893 KOps/s 52.0644 KOps/s $\textbf{\color{#35bf28}+5.43\%}$
test_step_mdp_speed[False-False-True-False-True] 44.0930μs 17.3467μs 57.6480 KOps/s 56.1967 KOps/s $\color{#35bf28}+2.58\%$
test_step_mdp_speed[False-False-True-False-False] 53.7200μs 11.3586μs 88.0387 KOps/s 85.2963 KOps/s $\color{#35bf28}+3.22\%$
test_step_mdp_speed[False-False-False-True-True] 64.5010μs 28.2642μs 35.3804 KOps/s 33.9895 KOps/s $\color{#35bf28}+4.09\%$
test_step_mdp_speed[False-False-False-True-False] 50.0530μs 19.2955μs 51.8255 KOps/s 49.5738 KOps/s $\color{#35bf28}+4.54\%$
test_step_mdp_speed[False-False-False-False-True] 61.0140μs 18.5483μs 53.9134 KOps/s 52.7306 KOps/s $\color{#35bf28}+2.24\%$
test_step_mdp_speed[False-False-False-False-False] 36.4580μs 12.4880μs 80.0772 KOps/s 77.6497 KOps/s $\color{#35bf28}+3.13\%$
test_values[generalized_advantage_estimate-True-True] 9.8847ms 9.5006ms 105.2565 Ops/s 103.7266 Ops/s $\color{#35bf28}+1.47\%$
test_values[vec_generalized_advantage_estimate-True-True] 37.1368ms 33.5250ms 29.8285 Ops/s 28.0170 Ops/s $\textbf{\color{#35bf28}+6.47\%}$
test_values[td0_return_estimate-False-False] 0.2409ms 0.1869ms 5.3492 KOps/s 5.6202 KOps/s $\color{#d91a1a}-4.82\%$
test_values[td1_return_estimate-False-False] 24.1805ms 23.7064ms 42.1828 Ops/s 41.7577 Ops/s $\color{#35bf28}+1.02\%$
test_values[vec_td1_return_estimate-False-False] 34.8153ms 33.5480ms 29.8081 Ops/s 27.8852 Ops/s $\textbf{\color{#35bf28}+6.90\%}$
test_values[td_lambda_return_estimate-True-False] 37.6744ms 34.4483ms 29.0290 Ops/s 28.4918 Ops/s $\color{#35bf28}+1.89\%$
test_values[vec_td_lambda_return_estimate-True-False] 35.4968ms 33.6040ms 29.7584 Ops/s 27.8863 Ops/s $\textbf{\color{#35bf28}+6.71\%}$
test_gae_speed[generalized_advantage_estimate-False-1-512] 10.5303ms 8.3183ms 120.2176 Ops/s 119.2288 Ops/s $\color{#35bf28}+0.83\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 2.4555ms 2.0548ms 486.6554 Ops/s 518.5369 Ops/s $\textbf{\color{#d91a1a}-6.15\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.4698ms 0.3572ms 2.7999 KOps/s 2.8273 KOps/s $\color{#d91a1a}-0.97\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 61.0930ms 40.8234ms 24.4957 Ops/s 23.4359 Ops/s $\color{#35bf28}+4.52\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 4.1481ms 3.1758ms 314.8775 Ops/s 327.8606 Ops/s $\color{#d91a1a}-3.96\%$
test_dqn_speed 1.9663ms 1.3549ms 738.0556 Ops/s 730.6994 Ops/s $\color{#35bf28}+1.01\%$
test_ddpg_speed 3.1856ms 2.8627ms 349.3255 Ops/s 348.8161 Ops/s $\color{#35bf28}+0.15\%$
test_sac_speed 9.8374ms 8.6555ms 115.5339 Ops/s 115.8569 Ops/s $\color{#d91a1a}-0.28\%$
test_redq_speed 15.1003ms 14.2028ms 70.4085 Ops/s 69.4642 Ops/s $\color{#35bf28}+1.36\%$
test_redq_deprec_speed 15.7243ms 14.3517ms 69.6781 Ops/s 71.1845 Ops/s $\color{#d91a1a}-2.12\%$
test_td3_speed 17.8033ms 8.6953ms 115.0052 Ops/s 116.0640 Ops/s $\color{#d91a1a}-0.91\%$
test_cql_speed 39.2086ms 37.5990ms 26.5965 Ops/s 26.9204 Ops/s $\color{#d91a1a}-1.20\%$
test_a2c_speed 8.7024ms 7.9579ms 125.6609 Ops/s 130.7055 Ops/s $\color{#d91a1a}-3.86\%$
test_ppo_speed 9.3618ms 8.2551ms 121.1373 Ops/s 125.9343 Ops/s $\color{#d91a1a}-3.81\%$
test_reinforce_speed 7.4765ms 6.9735ms 143.4001 Ops/s 148.3900 Ops/s $\color{#d91a1a}-3.36\%$
test_iql_speed 35.3114ms 34.3890ms 29.0790 Ops/s 29.7416 Ops/s $\color{#d91a1a}-2.23\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 4.0108ms 3.7705ms 265.2179 Ops/s 266.6471 Ops/s $\color{#d91a1a}-0.54\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.0144ms 0.5163ms 1.9367 KOps/s 1.9385 KOps/s $\color{#d91a1a}-0.09\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7091ms 0.4879ms 2.0494 KOps/s 2.0553 KOps/s $\color{#d91a1a}-0.28\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 4.0112ms 3.7191ms 268.8841 Ops/s 259.8266 Ops/s $\color{#35bf28}+3.49\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 1.0148ms 0.5118ms 1.9539 KOps/s 1.9383 KOps/s $\color{#35bf28}+0.80\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 0.7012ms 0.4835ms 2.0682 KOps/s 2.0343 KOps/s $\color{#35bf28}+1.67\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 2.2814ms 1.7204ms 581.2438 Ops/s 581.8873 Ops/s $\color{#d91a1a}-0.11\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 5.0558ms 1.6452ms 607.8377 Ops/s 613.8492 Ops/s $\color{#d91a1a}-0.98\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.1939ms 3.9316ms 254.3467 Ops/s 256.6209 Ops/s $\color{#d91a1a}-0.89\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.1987ms 0.6328ms 1.5803 KOps/s 1.3562 KOps/s $\textbf{\color{#35bf28}+16.53\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8193ms 0.6125ms 1.6327 KOps/s 1.6377 KOps/s $\color{#d91a1a}-0.30\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 4.2727ms 3.8243ms 261.4838 Ops/s 264.5320 Ops/s $\color{#d91a1a}-1.15\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.0514ms 0.5294ms 1.8889 KOps/s 1.9197 KOps/s $\color{#d91a1a}-1.60\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7031ms 0.4972ms 2.0111 KOps/s 2.0199 KOps/s $\color{#d91a1a}-0.43\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 4.3938ms 3.8791ms 257.7899 Ops/s 266.5017 Ops/s $\color{#d91a1a}-3.27\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.6315ms 0.5161ms 1.9377 KOps/s 1.9552 KOps/s $\color{#d91a1a}-0.90\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 3.7940ms 0.5009ms 1.9964 KOps/s 2.0517 KOps/s $\color{#d91a1a}-2.69\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 5.7865ms 3.8960ms 256.6732 Ops/s 257.0392 Ops/s $\color{#d91a1a}-0.14\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.2009ms 0.6393ms 1.5643 KOps/s 1.5582 KOps/s $\color{#35bf28}+0.39\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.9206ms 0.6181ms 1.6178 KOps/s 1.6517 KOps/s $\color{#d91a1a}-2.05\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.1231s 6.1826ms 161.7436 Ops/s 117.9617 Ops/s $\textbf{\color{#35bf28}+37.12\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 15.4418ms 12.8410ms 77.8759 Ops/s 77.6478 Ops/s $\color{#35bf28}+0.29\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1.2044ms 1.0592ms 944.1233 Ops/s 931.8748 Ops/s $\color{#35bf28}+1.31\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.1179s 8.2859ms 120.6875 Ops/s 164.9117 Ops/s $\textbf{\color{#d91a1a}-26.82\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 15.5485ms 12.7168ms 78.6362 Ops/s 77.5598 Ops/s $\color{#35bf28}+1.39\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 4.6624ms 1.1597ms 862.3105 Ops/s 936.3912 Ops/s $\textbf{\color{#d91a1a}-7.91\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.1187s 6.1988ms 161.3225 Ops/s 159.1374 Ops/s $\color{#35bf28}+1.37\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 0.1251s 15.2164ms 65.7186 Ops/s 64.1972 Ops/s $\color{#35bf28}+2.37\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 3.9092ms 1.2809ms 780.6856 Ops/s 712.6531 Ops/s $\textbf{\color{#35bf28}+9.55\%}$


github-actions bot commented Jun 5, 2024

$\color{#35bf28}\textsf{\Large✔\kern{0.2cm}\normalsize OK}$ Result of GPU Benchmark Tests

Total Benchmarks: 94. Improved: $\large\color{#35bf28}6$. Worsened: $\large\color{#d91a1a}0$.

Name Max Mean Ops Ops on Repo HEAD Change
test_single 0.1166s 0.1161s 8.6124 Ops/s 8.4287 Ops/s $\color{#35bf28}+2.18\%$
test_sync 0.1061s 0.1053s 9.4927 Ops/s 9.5058 Ops/s $\color{#d91a1a}-0.14\%$
test_async 0.1993s 98.0876ms 10.1950 Ops/s 10.2892 Ops/s $\color{#d91a1a}-0.92\%$
test_single_pixels 0.1277s 0.1276s 7.8357 Ops/s 7.7913 Ops/s $\color{#35bf28}+0.57\%$
test_sync_pixels 85.4421ms 84.0634ms 11.8958 Ops/s 12.1972 Ops/s $\color{#d91a1a}-2.47\%$
test_async_pixels 0.1548s 67.1312ms 14.8962 Ops/s 14.5560 Ops/s $\color{#35bf28}+2.34\%$
test_simple 0.8811s 0.8256s 1.2112 Ops/s 1.2082 Ops/s $\color{#35bf28}+0.24\%$
test_transformed 1.1270s 1.0683s 0.9361 Ops/s 0.9229 Ops/s $\color{#35bf28}+1.42\%$
test_serial 2.5263s 2.4739s 0.4042 Ops/s 0.3970 Ops/s $\color{#35bf28}+1.81\%$
test_parallel 2.4255s 2.3646s 0.4229 Ops/s 0.4253 Ops/s $\color{#d91a1a}-0.57\%$
test_step_mdp_speed[True-True-True-True-True] 59.7110μs 34.6468μs 28.8627 KOps/s 29.2076 KOps/s $\color{#d91a1a}-1.18\%$
test_step_mdp_speed[True-True-True-True-False] 38.9610μs 20.2622μs 49.3530 KOps/s 49.1212 KOps/s $\color{#35bf28}+0.47\%$
test_step_mdp_speed[True-True-True-False-True] 36.4211μs 19.3670μs 51.6342 KOps/s 49.8248 KOps/s $\color{#35bf28}+3.63\%$
test_step_mdp_speed[True-True-True-False-False] 38.3200μs 11.5087μs 86.8905 KOps/s 86.6357 KOps/s $\color{#35bf28}+0.29\%$
test_step_mdp_speed[True-True-False-True-True] 58.2710μs 35.8018μs 27.9315 KOps/s 27.8685 KOps/s $\color{#35bf28}+0.23\%$
test_step_mdp_speed[True-True-False-True-False] 37.6810μs 21.9199μs 45.6207 KOps/s 44.6396 KOps/s $\color{#35bf28}+2.20\%$
test_step_mdp_speed[True-True-False-False-True] 37.9210μs 21.3047μs 46.9381 KOps/s 45.9864 KOps/s $\color{#35bf28}+2.07\%$
test_step_mdp_speed[True-True-False-False-False] 30.3800μs 13.3508μs 74.9019 KOps/s 74.5921 KOps/s $\color{#35bf28}+0.42\%$
test_step_mdp_speed[True-False-True-True-True] 54.8610μs 37.4351μs 26.7129 KOps/s 26.0808 KOps/s $\color{#35bf28}+2.42\%$
test_step_mdp_speed[True-False-True-True-False] 51.9010μs 24.1130μs 41.4715 KOps/s 40.6057 KOps/s $\color{#35bf28}+2.13\%$
test_step_mdp_speed[True-False-True-False-True] 42.2010μs 21.0845μs 47.4282 KOps/s 46.5951 KOps/s $\color{#35bf28}+1.79\%$
test_step_mdp_speed[True-False-True-False-False] 31.1410μs 13.3728μs 74.7789 KOps/s 74.5029 KOps/s $\color{#35bf28}+0.37\%$
test_step_mdp_speed[True-False-False-True-True] 60.2710μs 39.4791μs 25.3298 KOps/s 24.8911 KOps/s $\color{#35bf28}+1.76\%$
test_step_mdp_speed[True-False-False-True-False] 56.8200μs 25.7680μs 38.8078 KOps/s 38.5652 KOps/s $\color{#35bf28}+0.63\%$
test_step_mdp_speed[True-False-False-False-True] 50.6800μs 23.0355μs 43.4113 KOps/s 42.7017 KOps/s $\color{#35bf28}+1.66\%$
test_step_mdp_speed[True-False-False-False-False] 37.0700μs 15.1842μs 65.8578 KOps/s 65.0887 KOps/s $\color{#35bf28}+1.18\%$
test_step_mdp_speed[False-True-True-True-True] 57.0610μs 38.0404μs 26.2879 KOps/s 26.1861 KOps/s $\color{#35bf28}+0.39\%$
test_step_mdp_speed[False-True-True-True-False] 41.7210μs 23.8194μs 41.9826 KOps/s 40.7780 KOps/s $\color{#35bf28}+2.95\%$
test_step_mdp_speed[False-True-True-False-True] 46.5010μs 25.6234μs 39.0269 KOps/s 39.1101 KOps/s $\color{#d91a1a}-0.21\%$
test_step_mdp_speed[False-True-True-False-False] 33.2700μs 15.2911μs 65.3974 KOps/s 65.2395 KOps/s $\color{#35bf28}+0.24\%$
test_step_mdp_speed[False-True-False-True-True] 66.6410μs 39.6752μs 25.2047 KOps/s 25.0429 KOps/s $\color{#35bf28}+0.65\%$
test_step_mdp_speed[False-True-False-True-False] 43.4600μs 25.5977μs 39.0660 KOps/s 38.1326 KOps/s $\color{#35bf28}+2.45\%$
test_step_mdp_speed[False-True-False-False-True] 97.8220μs 26.7900μs 37.3274 KOps/s 36.3473 KOps/s $\color{#35bf28}+2.70\%$
test_step_mdp_speed[False-True-False-False-False] 34.7610μs 17.0514μs 58.6462 KOps/s 57.8477 KOps/s $\color{#35bf28}+1.38\%$
test_step_mdp_speed[False-False-True-True-True] 63.8710μs 41.5830μs 24.0483 KOps/s 23.9781 KOps/s $\color{#35bf28}+0.29\%$
test_step_mdp_speed[False-False-True-True-False] 53.4710μs 27.7401μs 36.0489 KOps/s 35.1416 KOps/s $\color{#35bf28}+2.58\%$
test_step_mdp_speed[False-False-True-False-True] 44.7010μs 26.7685μs 37.3573 KOps/s 36.4047 KOps/s $\color{#35bf28}+2.62\%$
test_step_mdp_speed[False-False-True-False-False] 35.5010μs 17.0483μs 58.6570 KOps/s 58.1891 KOps/s $\color{#35bf28}+0.80\%$
test_step_mdp_speed[False-False-False-True-True] 67.8710μs 43.3012μs 23.0940 KOps/s 22.4879 KOps/s $\color{#35bf28}+2.70\%$
test_step_mdp_speed[False-False-False-True-False] 50.0600μs 29.8225μs 33.5317 KOps/s 33.4874 KOps/s $\color{#35bf28}+0.13\%$
test_step_mdp_speed[False-False-False-False-True] 49.9310μs 28.7825μs 34.7434 KOps/s 33.9468 KOps/s $\color{#35bf28}+2.35\%$
test_step_mdp_speed[False-False-False-False-False] 45.4010μs 19.0299μs 52.5488 KOps/s 52.7134 KOps/s $\color{#d91a1a}-0.31\%$
test_values[generalized_advantage_estimate-True-True] 25.2427ms 24.7372ms 40.4250 Ops/s 39.8450 Ops/s $\color{#35bf28}+1.46\%$
test_values[vec_generalized_advantage_estimate-True-True] 91.3658ms 3.3956ms 294.4959 Ops/s 309.4991 Ops/s $\color{#d91a1a}-4.85\%$
test_values[td0_return_estimate-False-False] 91.8820μs 64.6287μs 15.4730 KOps/s 15.4549 KOps/s $\color{#35bf28}+0.12\%$
test_values[td1_return_estimate-False-False] 56.7127ms 53.0992ms 18.8327 Ops/s 18.4039 Ops/s $\color{#35bf28}+2.33\%$
test_values[vec_td1_return_estimate-False-False] 2.0526ms 1.7660ms 566.2365 Ops/s 565.6514 Ops/s $\color{#35bf28}+0.10\%$
test_values[td_lambda_return_estimate-True-False] 89.6315ms 84.6277ms 11.8165 Ops/s 11.4845 Ops/s $\color{#35bf28}+2.89\%$
test_values[vec_td_lambda_return_estimate-True-False] 2.0833ms 1.7632ms 567.1461 Ops/s 566.5285 Ops/s $\color{#35bf28}+0.11\%$
test_gae_speed[generalized_advantage_estimate-False-1-512] 24.0156ms 23.8697ms 41.8940 Ops/s 39.0933 Ops/s $\textbf{\color{#35bf28}+7.16\%}$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 0.8923ms 0.6967ms 1.4354 KOps/s 1.4297 KOps/s $\color{#35bf28}+0.40\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 0.7075ms 0.6509ms 1.5362 KOps/s 1.5261 KOps/s $\color{#35bf28}+0.67\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 1.4914ms 1.4513ms 689.0322 Ops/s 687.1838 Ops/s $\color{#35bf28}+0.27\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 0.9343ms 0.6682ms 1.4966 KOps/s 1.4887 KOps/s $\color{#35bf28}+0.53\%$
test_dqn_speed 1.5873ms 1.4131ms 707.6404 Ops/s 696.6794 Ops/s $\color{#35bf28}+1.57\%$
test_ddpg_speed 3.1244ms 2.9420ms 339.9097 Ops/s 339.8437 Ops/s $\color{#35bf28}+0.02\%$
test_sac_speed 9.4216ms 8.4242ms 118.7051 Ops/s 117.6530 Ops/s $\color{#35bf28}+0.89\%$
test_redq_speed 12.5831ms 10.6424ms 93.9641 Ops/s 84.7764 Ops/s $\textbf{\color{#35bf28}+10.84\%}$
test_redq_deprec_speed 12.1425ms 11.5552ms 86.5412 Ops/s 85.0523 Ops/s $\color{#35bf28}+1.75\%$
test_td3_speed 17.2305ms 8.4392ms 118.4940 Ops/s 118.9092 Ops/s $\color{#d91a1a}-0.35\%$
test_cql_speed 26.1661ms 25.7269ms 38.8698 Ops/s 38.5555 Ops/s $\color{#35bf28}+0.82\%$
test_a2c_speed 5.9002ms 5.6844ms 175.9209 Ops/s 175.9801 Ops/s $\color{#d91a1a}-0.03\%$
test_ppo_speed 6.2760ms 5.9751ms 167.3625 Ops/s 166.5778 Ops/s $\color{#35bf28}+0.47\%$
test_reinforce_speed 4.8890ms 4.6115ms 216.8510 Ops/s 214.5543 Ops/s $\color{#35bf28}+1.07\%$
test_iql_speed 20.7991ms 19.9293ms 50.1775 Ops/s 50.0486 Ops/s $\color{#35bf28}+0.26\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 4.8619ms 4.6641ms 214.4055 Ops/s 211.0310 Ops/s $\color{#35bf28}+1.60\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 0.8282ms 0.5948ms 1.6813 KOps/s 1.6769 KOps/s $\color{#35bf28}+0.26\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 4.5270ms 0.5723ms 1.7473 KOps/s 1.7420 KOps/s $\color{#35bf28}+0.31\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 4.8455ms 4.6329ms 215.8464 Ops/s 214.1774 Ops/s $\color{#35bf28}+0.78\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.7000ms 0.5846ms 1.7105 KOps/s 1.6978 KOps/s $\color{#35bf28}+0.75\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 4.4426ms 0.5616ms 1.7807 KOps/s 1.7557 KOps/s $\color{#35bf28}+1.42\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 2.2017ms 2.0705ms 482.9797 Ops/s 477.2070 Ops/s $\color{#35bf28}+1.21\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 5.8449ms 1.9710ms 507.3608 Ops/s 498.6768 Ops/s $\color{#35bf28}+1.74\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 4.9417ms 4.7774ms 209.3197 Ops/s 206.3803 Ops/s $\color{#35bf28}+1.42\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.7529ms 0.7148ms 1.3990 KOps/s 1.3766 KOps/s $\color{#35bf28}+1.63\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8606ms 0.6879ms 1.4537 KOps/s 1.4337 KOps/s $\color{#35bf28}+1.39\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 4.7542ms 4.6384ms 215.5905 Ops/s 211.9812 Ops/s $\color{#35bf28}+1.70\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 1.4314ms 0.5937ms 1.6843 KOps/s 1.6632 KOps/s $\color{#35bf28}+1.27\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 0.7527ms 0.5667ms 1.7645 KOps/s 1.7237 KOps/s $\color{#35bf28}+2.37\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000] 4.8780ms 4.6029ms 217.2555 Ops/s 213.6178 Ops/s $\color{#35bf28}+1.70\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 0.7056ms 0.5866ms 1.7047 KOps/s 1.6842 KOps/s $\color{#35bf28}+1.22\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 4.7734ms 0.5721ms 1.7479 KOps/s 1.7344 KOps/s $\color{#35bf28}+0.78\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000] 4.8422ms 4.7710ms 209.6008 Ops/s 206.1215 Ops/s $\color{#35bf28}+1.69\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1.6769ms 0.7221ms 1.3848 KOps/s 1.3602 KOps/s $\color{#35bf28}+1.81\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 0.8345ms 0.6956ms 1.4376 KOps/s 1.4199 KOps/s $\color{#35bf28}+1.25\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 0.1286s 7.4051ms 135.0417 Ops/s 131.3146 Ops/s $\color{#35bf28}+2.84\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 17.6501ms 15.5218ms 64.4255 Ops/s 60.4447 Ops/s $\textbf{\color{#35bf28}+6.59\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1.3517ms 1.2615ms 792.6760 Ops/s 754.4610 Ops/s $\textbf{\color{#35bf28}+5.07\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 0.1188s 9.4649ms 105.6537 Ops/s 104.0167 Ops/s $\color{#35bf28}+1.57\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 17.6056ms 15.4722ms 64.6320 Ops/s 61.7308 Ops/s $\color{#35bf28}+4.70\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400] 7.6730ms 1.4016ms 713.4716 Ops/s 702.2936 Ops/s $\color{#35bf28}+1.59\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 0.1188s 7.3807ms 135.4891 Ops/s 131.4956 Ops/s $\color{#35bf28}+3.04\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 18.0116ms 15.6351ms 63.9587 Ops/s 60.3444 Ops/s $\textbf{\color{#35bf28}+5.99\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 2.3989ms 1.4338ms 697.4481 Ops/s 587.7458 Ops/s $\textbf{\color{#35bf28}+18.66\%}$

@vmoens added the bug label Jun 6, 2024
Contributor Author

vmoens commented Jun 7, 2024

@wertyuilife2

Should we reduce the priorities of each traj while we're at it?
I don't think it'd require much compute, and it would make sure that all items are equally weighted within a traj.

Take the following 2 trajs with associated priorities

Item: [0, 1, 2, 3, 4, 5, 6, 7]
Traj: [0, 0, 0, 1, 1, 1, 1, 1]
Priority: [10, 1, 1, 10, 1, 2, 1, 1]

Currently, items 0 and 3 have a higher priority, so they are more likely to be sampled as start points, and you will get more trajs starting with them. If we reduce (here, by averaging within each traj), we get (10 + 1 + 1) / 3 = 4 for the first traj and (10 + 1 + 2 + 1 + 1) / 5 = 3 for the second.

Priority: [4, 4, 4, 3, 3, 3, 3, 3]

At this point, the start point is equally likely within a traj but some trajs have a higher prob of being sampled (which seems to make more sense to me?)
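
As a minimal sketch of the reduction described above, assuming a mean reduction as in the numbers given (`reduce_priorities` is a hypothetical helper for illustration, not a TorchRL function):

```python
from collections import defaultdict


def reduce_priorities(traj_ids, priorities):
    """Replace each item's priority by the mean priority of its trajectory."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for traj, prio in zip(traj_ids, priorities):
        totals[traj] += prio
        counts[traj] += 1
    return [totals[traj] / counts[traj] for traj in traj_ids]


traj = [0, 0, 0, 1, 1, 1, 1, 1]
priority = [10, 1, 1, 10, 1, 2, 1, 1]

reduced = reduce_priorities(traj, priority)
print(reduced)  # [4.0, 4.0, 4.0, 3.0, 3.0, 3.0, 3.0, 3.0]

# Probability of each item being drawn as a start point, before and after.
before = [p / sum(priority) for p in priority]
after = [p / sum(reduced) for p in reduced]
```

With a mean reduction, each traj keeps its total mass (12/27 and 15/27 here), but that mass is spread evenly over the items of the traj.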

I guess any solution will make someone unhappy...


wertyuilife2 commented Jun 7, 2024

@vmoens I believe that when discussing PrioritizedSampler, there is only one correct approach: we should not reduce the priorities of each trajectory while we are at it.

The core idea of PER is that certain samples (not trajectories) are important (such as a critical action) and need to be learned frequently. Reducing the priorities of each trajectory would make it difficult for PER to focus on updating specific important samples.

When discussing PrioritizedSliceSampler, we face the choice of whether to reduce the priorities of each slice while we are at it. My suggestion is to leave this choice to the user, as the calculation of priorities and the calling of update_priority() are both handled by the user. In other words, we still should not reduce the priorities of each slice.

I think your thoughts are more likely associated with an "episodic buffer", but in my view, the current implementation of ReplayBuffer is not episodic, so there is no need to unify the priority of the entire trajectory.

Contributor Author

vmoens commented Jun 7, 2024

> The core idea of PER is that certain samples (not trajectories) are important (such as a critical action) and need to be learned frequently. Reducing the priorities of each trajectory would make it difficult for PER to focus on updating specific important samples.

Got it, thanks for that. Indeed, that's how I edited the docstring (users should be in charge of setting the proper priority).
But when we say "prioritized slice sampler", I can imagine someone thinking: I have a transition with high priority, therefore there is a chance to find it anywhere in my sample (not just at the beginning). Right now, however, there is a higher chance of finding it at the beginning of a slice than at the end.
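
To make the asymmetry concrete, here is a small sketch that computes, for one item, the probability of landing at each position of a sampled slice. It assumes a single trajectory, fixed-length slices, and a start index drawn with probability proportional to its priority among the valid starts. The function name and the data are made up for illustration.

```python
def position_probs(priorities, item, slice_len):
    """P(`item` sits at position j of the sampled slice), for each j.

    Assumes one trajectory, fixed-length slices, and a start index drawn
    with probability proportional to its priority among valid starts.
    """
    n = len(priorities)
    # Only starts that leave room for a full slice are valid.
    valid_starts = range(n - slice_len + 1)
    total = sum(priorities[s] for s in valid_starts)
    probs = []
    for j in range(slice_len):
        start = item - j  # the slice must start here for `item` to be at j
        if start in valid_starts:
            probs.append(priorities[start] / total)
        else:
            probs.append(0.0)
    return probs


# Hypothetical trajectory where only item 2 has a high priority.
priorities = [1, 1, 10, 1, 1]
probs = position_probs(priorities, item=2, slice_len=3)
# Position 0 gets 10/12; positions 1 and 2 get 1/12 each.
```

The high-priority item is mostly seen as the first element of a slice, because only its own priority drives the start-point draw.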

Comment on lines 424 to 454
# if p_sum <= 0:
#     index, *_ = RandomSampler.sample(self, storage, batch_size)
#     device = index[0].device if isinstance(index, tuple) else index.device
#     weight = torch.ones(batch_size, device=device)
# else:
#     if p_min <= 0:
#         p_min = 1
#
#     # For some undefined reason, only np.random works here.
#     # All PT attempts fail, even when subsequently transformed into numpy
#     mass = np.random.uniform(0.0, p_sum, size=batch_size)
#     # mass = torch.zeros(batch_size, dtype=torch.double).uniform_(0.0, p_sum)
#     # mass = torch.rand(batch_size).mul_(p_sum)
#     index = self._sum_tree.scan_lower_bound(mass)
#     index = torch.as_tensor(index)
#     if not index.ndim:
#         index = index.unsqueeze(0)
#     index.clamp_max_(len(storage) - 1)
#     weight = torch.as_tensor(self._sum_tree[index])
#
#     # Importance sampling weight formula:
#     # w_i = (p_i / sum(p) * N) ^ (-beta)
#     # weight_i = w_i / max(w)
#     # weight_i = (p_i / sum(p) * N) ^ (-beta) /
#     # ((min(p) / sum(p) * N) ^ (-beta))
#     # weight_i = ((p_i / sum(p) * N) / (min(p) / sum(p) * N)) ^ (-beta)
#     # weight_i = (p_i / min(p)) ^ (-beta)
#     # weight = np.power(weight / (p_min + self._eps), -self._beta)
#     weight = torch.pow(weight / p_min, -self._beta)
#     if storage.ndim > 1:
#         index = torch.unravel_index(index, storage._storage.shape)
Contributor Author

@wertyuilife2 Happy to hear your thoughts about this

Basically the idea would be to sample uniformly if a priority hasn't been passed. I won't make it part of this PR but we could consider this in the future.
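
The branch logic of the commented-out snippet can be sketched in plain Python as follows. This is only an illustration under stated assumptions: priorities are a non-empty list of non-negative numbers, a prefix-sum list with `bisect` stands in for the sum tree's `scan_lower_bound`, and `sample_prioritized` is a hypothetical name, not the TorchRL implementation.

```python
import bisect
import itertools
import random


def sample_prioritized(priorities, batch_size, beta, rng=random):
    """Uniform fallback when no priority mass exists, else prioritized draw."""
    n = len(priorities)
    prefix = list(itertools.accumulate(priorities))
    p_sum = prefix[-1]
    if p_sum <= 0:
        # No usable priorities: sample uniformly with unit weights.
        index = [rng.randrange(n) for _ in range(batch_size)]
        return index, [1.0] * batch_size

    p_min = min(priorities)
    if p_min <= 0:
        p_min = 1

    index = []
    for _ in range(batch_size):
        # Draw a mass in [0, p_sum) and find the first prefix sum above it.
        mass = rng.random() * p_sum
        i = bisect.bisect_right(prefix, mass)
        index.append(min(i, n - 1))  # defensive clamp, like clamp_max_
    # Simplified importance weight: (p_i / min(p)) ** (-beta)
    weight = [(priorities[i] / p_min) ** (-beta) for i in index]
    return index, weight


# Check the simplification written in the comments:
# w_i / max(w) == (p_i / min(p)) ** (-beta)
p = [10, 1, 1, 10, 1, 2, 1, 1]
beta = 0.5
w = [(p_i / sum(p) * len(p)) ** (-beta) for p_i in p]
normalized = [w_i / max(w) for w_i in w]
shortcut = [(p_i / min(p)) ** (-beta) for p_i in p]
```

Note that the fallback branch never complains when all priorities are zero, which is the behavior being debated in this thread.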


@wertyuilife2 wertyuilife2 Jun 7, 2024

@vmoens I believe we should ensure that priorities are initialized at all times and throw an error when p_sum <= 0 or p_min <= 0. We should design around default_priority and update_priority to handle potential issues rather than ignoring them.

Even from the standpoint of detecting silent bugs alone, I don't recommend such an implementation.

For example, if incorrect usage or a bug causes len(storage) to be larger than it should be, this implementation could let the problem go undetected.
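
A fail-loudly check along these lines might look like the sketch below. `check_priorities` is a hypothetical helper, and the exception type and messages are assumptions for illustration.

```python
def check_priorities(p_sum, p_min):
    """Fail loudly instead of silently falling back to uniform sampling."""
    if p_sum <= 0:
        raise RuntimeError(f"non-positive p_sum: {p_sum}")
    if p_min <= 0:
        raise RuntimeError(f"non-positive p_min: {p_min}")


# Properly initialized priorities pass the check.
check_priorities(p_sum=27, p_min=1)

# An uninitialized (zero) priority is reported rather than masked.
try:
    check_priorities(p_sum=27, p_min=0)
except RuntimeError as err:
    message = str(err)
```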

@vmoens vmoens merged commit 332499a into main Jun 7, 2024
39 of 52 checks passed
@vmoens vmoens deleted the fix-strict-length-prb-slice branch June 7, 2024 12:08
Labels
bug, CLA Signed
3 participants