[BugFix] Fix strict length in PRB+SliceSampler #2202

vmoens · 2024-06-05T16:54:17Z

I wrote dedicated tests under `test_slice_sampler_prioritized`.

TODO:

Check caching
Add left-right span options
~~Aggregate/reduce priorities~~

pytorch-bot · 2024-06-05T16:54:20Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2202

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 12 New Failures, 5 Unrelated Failures

As of commit caa258f with merge base 726e959:

NEW FAILURES - The following jobs have failed:

Habitat Tests on Linux / tests (3.9, 12.1) / linux-job (gh)
RuntimeError: Command docker exec -t bd9b40a89edf1030bfb9fa47d73d81cd2323326f53d7ab9a303f5d9b850d2d11 /exec failed with exit code 1
Libs Tests on Linux / unittests-gym (3.9, 12.1) / linux-job (gh)
RuntimeError: Command docker exec -t 385f35062da772aa27699b3a16718b96fe1d99318dcc6cc84d6f6955b3c1a1e2 /exec failed with exit code 1
Libs Tests on Linux / unittests-sklearn (3.9, 12.1) / linux-job (gh)
RuntimeError: Command docker exec -t 2ce6862202943d37144a99377297a686139fead1154668847367d9096482dd23 /exec failed with exit code 1
RLHF Tests on Linux / unittests (3.9, 12.1) / linux-job (gh)
RuntimeError: Command docker exec -t 659b77bfcaa7450ba47a9aca3dcd6d5226f3fa4a21385f40e89dc76fab1ee83e /exec failed with exit code 1
Unit-tests on Linux / tests-optdeps (3.10, 12.1) / linux-job (gh)
RuntimeError: Command docker exec -t f456252507690dea75b915ee7addfc29e39362dd61da55c82a7fc37506654739 /exec failed with exit code 1
Unit-tests on Windows / unittests-cpu / windows-job (gh)
The process 'C:\Program Files\Git\cmd\git.exe' failed with exit code 128
Wheels / test-wheel (linux, ubuntu-20.04, 3.10) (gh)
Wheels / test-wheel (linux, ubuntu-20.04, 3.11) (gh)
##[error]The operation was canceled.
Wheels / test-wheel (linux, ubuntu-20.04, 3.8) (gh)
##[error]The operation was canceled.
Wheels / test-wheel (linux, ubuntu-20.04, 3.9) (gh)
ModuleNotFoundError: No module named 'dm_env'
Wheels / test-wheel-windows (3.11) (gh)
ModuleNotFoundError: No module named 'dm_env'
Wheels / test-wheel-windows (3.8) (gh)

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

Lint / c-source / linux-job (gh) (matched linux rule in flaky-rules.json)
The process '/usr/bin/git' failed with exit code 128
Lint / python-source-and-configs / linux-job (gh) (matched linux rule in flaky-rules.json)
The process '/usr/bin/git' failed with exit code 128
Wheels / test-wheel-windows (3.10) (gh) (matched win rule in flaky-rules.json)
##[error]The operation was canceled.
Wheels / test-wheel-windows (3.9) (gh) (matched win rule in flaky-rules.json)
##[error]The operation was canceled.

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

Unit-tests on Linux / tests-olddeps (3.8, 11.6) / linux-job (gh) (trunk failure)
test/test_transforms.py::TestVecNorm::test_state_dict_vecnorm

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2024-06-05T17:04:08Z

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 91. Improved: 10. Worsened: 4.


Name	Max	Mean	Ops	Ops on Repo `HEAD`	Change
test_single	0.1208s	60.2936ms	16.5855 Ops/s	17.5085 Ops/s	$\textbf{\color{#d91a1a}-5.27\%}$
test_sync	32.9541ms	31.4586ms	31.7878 Ops/s	30.9688 Ops/s	$\color{#35bf28}+2.64\%$
test_async	51.6862ms	28.7645ms	34.7650 Ops/s	32.7835 Ops/s	$\textbf{\color{#35bf28}+6.04\%}$
test_simple	0.4663s	0.3988s	2.5075 Ops/s	2.6015 Ops/s	$\color{#d91a1a}-3.61\%$
test_transformed	0.5343s	0.5330s	1.8763 Ops/s	1.8496 Ops/s	$\color{#35bf28}+1.45\%$
test_serial	1.3462s	1.2768s	0.7832 Ops/s	0.7638 Ops/s	$\color{#35bf28}+2.54\%$
test_parallel	1.1475s	1.0701s	0.9345 Ops/s	0.9178 Ops/s	$\color{#35bf28}+1.82\%$
test_step_mdp_speed[True-True-True-True-True]	0.1519ms	21.9332μs	45.5930 KOps/s	44.9654 KOps/s	$\color{#35bf28}+1.40\%$
test_step_mdp_speed[True-True-True-True-False]	43.7410μs	13.2627μs	75.3995 KOps/s	72.9356 KOps/s	$\color{#35bf28}+3.38\%$
test_step_mdp_speed[True-True-True-False-True]	50.7950μs	12.8951μs	77.5489 KOps/s	75.8634 KOps/s	$\color{#35bf28}+2.22\%$
test_step_mdp_speed[True-True-True-False-False]	28.5830μs	7.7329μs	129.3171 KOps/s	124.0830 KOps/s	$\color{#35bf28}+4.22\%$
test_step_mdp_speed[True-True-False-True-True]	50.2040μs	23.1766μs	43.1469 KOps/s	41.9803 KOps/s	$\color{#35bf28}+2.78\%$
test_step_mdp_speed[True-True-False-True-False]	42.7900μs	14.5901μs	68.5395 KOps/s	66.8154 KOps/s	$\color{#35bf28}+2.58\%$
test_step_mdp_speed[True-True-False-False-True]	61.5250μs	14.1012μs	70.9159 KOps/s	70.2381 KOps/s	$\color{#35bf28}+0.97\%$
test_step_mdp_speed[True-True-False-False-False]	0.1097ms	9.1591μs	109.1805 KOps/s	109.1860 KOps/s	$-0.01\%$
test_step_mdp_speed[True-False-True-True-True]	77.8150μs	24.3531μs	41.0626 KOps/s	39.8014 KOps/s	$\color{#35bf28}+3.17\%$
test_step_mdp_speed[True-False-True-True-False]	43.4310μs	15.6751μs	63.7954 KOps/s	60.6316 KOps/s	$\textbf{\color{#35bf28}+5.22\%}$
test_step_mdp_speed[True-False-True-False-True]	46.3070μs	14.0626μs	71.1108 KOps/s	70.4271 KOps/s	$\color{#35bf28}+0.97\%$
test_step_mdp_speed[True-False-True-False-False]	34.5850μs	8.9889μs	111.2481 KOps/s	107.6932 KOps/s	$\color{#35bf28}+3.30\%$
test_step_mdp_speed[True-False-False-True-True]	70.9120μs	25.3908μs	39.3843 KOps/s	37.7643 KOps/s	$\color{#35bf28}+4.29\%$
test_step_mdp_speed[True-False-False-True-False]	63.1180μs	16.9371μs	59.0421 KOps/s	56.5947 KOps/s	$\color{#35bf28}+4.32\%$
test_step_mdp_speed[True-False-False-False-True]	0.1140ms	15.1232μs	66.1237 KOps/s	64.0829 KOps/s	$\color{#35bf28}+3.18\%$
test_step_mdp_speed[True-False-False-False-False]	62.8170μs	10.1684μs	98.3439 KOps/s	95.6654 KOps/s	$\color{#35bf28}+2.80\%$
test_step_mdp_speed[False-True-True-True-True]	57.5880μs	24.6384μs	40.5870 KOps/s	39.7984 KOps/s	$\color{#35bf28}+1.98\%$
test_step_mdp_speed[False-True-True-True-False]	49.3320μs	15.6409μs	63.9350 KOps/s	61.4561 KOps/s	$\color{#35bf28}+4.03\%$
test_step_mdp_speed[False-True-True-False-True]	43.3210μs	16.3187μs	61.2793 KOps/s	60.0312 KOps/s	$\color{#35bf28}+2.08\%$
test_step_mdp_speed[False-True-True-False-False]	47.6690μs	10.2250μs	97.7997 KOps/s	94.6493 KOps/s	$\color{#35bf28}+3.33\%$
test_step_mdp_speed[False-True-False-True-True]	0.1425ms	25.8084μs	38.7471 KOps/s	38.2912 KOps/s	$\color{#35bf28}+1.19\%$
test_step_mdp_speed[False-True-False-True-False]	47.5590μs	16.9487μs	59.0016 KOps/s	56.7464 KOps/s	$\color{#35bf28}+3.97\%$
test_step_mdp_speed[False-True-False-False-True]	50.3240μs	17.4797μs	57.2092 KOps/s	55.4890 KOps/s	$\color{#35bf28}+3.10\%$
test_step_mdp_speed[False-True-False-False-False]	49.1920μs	11.2218μs	89.1125 KOps/s	84.7316 KOps/s	$\textbf{\color{#35bf28}+5.17\%}$
test_step_mdp_speed[False-False-True-True-True]	73.1660μs	26.7529μs	37.3792 KOps/s	36.1225 KOps/s	$\color{#35bf28}+3.48\%$
test_step_mdp_speed[False-False-True-True-False]	49.6030μs	18.2185μs	54.8893 KOps/s	52.0644 KOps/s	$\textbf{\color{#35bf28}+5.43\%}$
test_step_mdp_speed[False-False-True-False-True]	44.0930μs	17.3467μs	57.6480 KOps/s	56.1967 KOps/s	$\color{#35bf28}+2.58\%$
test_step_mdp_speed[False-False-True-False-False]	53.7200μs	11.3586μs	88.0387 KOps/s	85.2963 KOps/s	$\color{#35bf28}+3.22\%$
test_step_mdp_speed[False-False-False-True-True]	64.5010μs	28.2642μs	35.3804 KOps/s	33.9895 KOps/s	$\color{#35bf28}+4.09\%$
test_step_mdp_speed[False-False-False-True-False]	50.0530μs	19.2955μs	51.8255 KOps/s	49.5738 KOps/s	$\color{#35bf28}+4.54\%$
test_step_mdp_speed[False-False-False-False-True]	61.0140μs	18.5483μs	53.9134 KOps/s	52.7306 KOps/s	$\color{#35bf28}+2.24\%$
test_step_mdp_speed[False-False-False-False-False]	36.4580μs	12.4880μs	80.0772 KOps/s	77.6497 KOps/s	$\color{#35bf28}+3.13\%$
test_values[generalized_advantage_estimate-True-True]	9.8847ms	9.5006ms	105.2565 Ops/s	103.7266 Ops/s	$\color{#35bf28}+1.47\%$
test_values[vec_generalized_advantage_estimate-True-True]	37.1368ms	33.5250ms	29.8285 Ops/s	28.0170 Ops/s	$\textbf{\color{#35bf28}+6.47\%}$
test_values[td0_return_estimate-False-False]	0.2409ms	0.1869ms	5.3492 KOps/s	5.6202 KOps/s	$\color{#d91a1a}-4.82\%$
test_values[td1_return_estimate-False-False]	24.1805ms	23.7064ms	42.1828 Ops/s	41.7577 Ops/s	$\color{#35bf28}+1.02\%$
test_values[vec_td1_return_estimate-False-False]	34.8153ms	33.5480ms	29.8081 Ops/s	27.8852 Ops/s	$\textbf{\color{#35bf28}+6.90\%}$
test_values[td_lambda_return_estimate-True-False]	37.6744ms	34.4483ms	29.0290 Ops/s	28.4918 Ops/s	$\color{#35bf28}+1.89\%$
test_values[vec_td_lambda_return_estimate-True-False]	35.4968ms	33.6040ms	29.7584 Ops/s	27.8863 Ops/s	$\textbf{\color{#35bf28}+6.71\%}$
test_gae_speed[generalized_advantage_estimate-False-1-512]	10.5303ms	8.3183ms	120.2176 Ops/s	119.2288 Ops/s	$\color{#35bf28}+0.83\%$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512]	2.4555ms	2.0548ms	486.6554 Ops/s	518.5369 Ops/s	$\textbf{\color{#d91a1a}-6.15\%}$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512]	0.4698ms	0.3572ms	2.7999 KOps/s	2.8273 KOps/s	$\color{#d91a1a}-0.97\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512]	61.0930ms	40.8234ms	24.4957 Ops/s	23.4359 Ops/s	$\color{#35bf28}+4.52\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512]	4.1481ms	3.1758ms	314.8775 Ops/s	327.8606 Ops/s	$\color{#d91a1a}-3.96\%$
test_dqn_speed	1.9663ms	1.3549ms	738.0556 Ops/s	730.6994 Ops/s	$\color{#35bf28}+1.01\%$
test_ddpg_speed	3.1856ms	2.8627ms	349.3255 Ops/s	348.8161 Ops/s	$\color{#35bf28}+0.15\%$
test_sac_speed	9.8374ms	8.6555ms	115.5339 Ops/s	115.8569 Ops/s	$\color{#d91a1a}-0.28\%$
test_redq_speed	15.1003ms	14.2028ms	70.4085 Ops/s	69.4642 Ops/s	$\color{#35bf28}+1.36\%$
test_redq_deprec_speed	15.7243ms	14.3517ms	69.6781 Ops/s	71.1845 Ops/s	$\color{#d91a1a}-2.12\%$
test_td3_speed	17.8033ms	8.6953ms	115.0052 Ops/s	116.0640 Ops/s	$\color{#d91a1a}-0.91\%$
test_cql_speed	39.2086ms	37.5990ms	26.5965 Ops/s	26.9204 Ops/s	$\color{#d91a1a}-1.20\%$
test_a2c_speed	8.7024ms	7.9579ms	125.6609 Ops/s	130.7055 Ops/s	$\color{#d91a1a}-3.86\%$
test_ppo_speed	9.3618ms	8.2551ms	121.1373 Ops/s	125.9343 Ops/s	$\color{#d91a1a}-3.81\%$
test_reinforce_speed	7.4765ms	6.9735ms	143.4001 Ops/s	148.3900 Ops/s	$\color{#d91a1a}-3.36\%$
test_iql_speed	35.3114ms	34.3890ms	29.0790 Ops/s	29.7416 Ops/s	$\color{#d91a1a}-2.23\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000]	4.0108ms	3.7705ms	265.2179 Ops/s	266.6471 Ops/s	$\color{#d91a1a}-0.54\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]	1.0144ms	0.5163ms	1.9367 KOps/s	1.9385 KOps/s	$\color{#d91a1a}-0.09\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000]	0.7091ms	0.4879ms	2.0494 KOps/s	2.0553 KOps/s	$\color{#d91a1a}-0.28\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000]	4.0112ms	3.7191ms	268.8841 Ops/s	259.8266 Ops/s	$\color{#35bf28}+3.49\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000]	1.0148ms	0.5118ms	1.9539 KOps/s	1.9383 KOps/s	$\color{#35bf28}+0.80\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000]	0.7012ms	0.4835ms	2.0682 KOps/s	2.0343 KOps/s	$\color{#35bf28}+1.67\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000]	2.2814ms	1.7204ms	581.2438 Ops/s	581.8873 Ops/s	$\color{#d91a1a}-0.11\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000]	5.0558ms	1.6452ms	607.8377 Ops/s	613.8492 Ops/s	$\color{#d91a1a}-0.98\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000]	5.1939ms	3.9316ms	254.3467 Ops/s	256.6209 Ops/s	$\color{#d91a1a}-0.89\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000]	1.1987ms	0.6328ms	1.5803 KOps/s	1.3562 KOps/s	$\textbf{\color{#35bf28}+16.53\%}$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]	0.8193ms	0.6125ms	1.6327 KOps/s	1.6377 KOps/s	$\color{#d91a1a}-0.30\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000]	4.2727ms	3.8243ms	261.4838 Ops/s	264.5320 Ops/s	$\color{#d91a1a}-1.15\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]	1.0514ms	0.5294ms	1.8889 KOps/s	1.9197 KOps/s	$\color{#d91a1a}-1.60\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000]	0.7031ms	0.4972ms	2.0111 KOps/s	2.0199 KOps/s	$\color{#d91a1a}-0.43\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000]	4.3938ms	3.8791ms	257.7899 Ops/s	266.5017 Ops/s	$\color{#d91a1a}-3.27\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000]	0.6315ms	0.5161ms	1.9377 KOps/s	1.9552 KOps/s	$\color{#d91a1a}-0.90\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000]	3.7940ms	0.5009ms	1.9964 KOps/s	2.0517 KOps/s	$\color{#d91a1a}-2.69\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000]	5.7865ms	3.8960ms	256.6732 Ops/s	257.0392 Ops/s	$\color{#d91a1a}-0.14\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000]	1.2009ms	0.6393ms	1.5643 KOps/s	1.5582 KOps/s	$\color{#35bf28}+0.39\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]	0.9206ms	0.6181ms	1.6178 KOps/s	1.6517 KOps/s	$\color{#d91a1a}-2.05\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400]	0.1231s	6.1826ms	161.7436 Ops/s	117.9617 Ops/s	$\textbf{\color{#35bf28}+37.12\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400]	15.4418ms	12.8410ms	77.8759 Ops/s	77.6478 Ops/s	$\color{#35bf28}+0.29\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400]	1.2044ms	1.0592ms	944.1233 Ops/s	931.8748 Ops/s	$\color{#35bf28}+1.31\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400]	0.1179s	8.2859ms	120.6875 Ops/s	164.9117 Ops/s	$\textbf{\color{#d91a1a}-26.82\%}$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400]	15.5485ms	12.7168ms	78.6362 Ops/s	77.5598 Ops/s	$\color{#35bf28}+1.39\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400]	4.6624ms	1.1597ms	862.3105 Ops/s	936.3912 Ops/s	$\textbf{\color{#d91a1a}-7.91\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400]	0.1187s	6.1988ms	161.3225 Ops/s	159.1374 Ops/s	$\color{#35bf28}+1.37\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400]	0.1251s	15.2164ms	65.7186 Ops/s	64.1972 Ops/s	$\color{#35bf28}+2.37\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400]	3.9092ms	1.2809ms	780.6856 Ops/s	712.6531 Ops/s	$\textbf{\color{#35bf28}+9.55\%}$

github-actions · 2024-06-05T17:45:39Z

✔ OK: Result of GPU Benchmark Tests

Total Benchmarks: 94. Improved: 6. Worsened: 0.


Name	Max	Mean	Ops	Ops on Repo `HEAD`	Change
test_single	0.1166s	0.1161s	8.6124 Ops/s	8.4287 Ops/s	$\color{#35bf28}+2.18\%$
test_sync	0.1061s	0.1053s	9.4927 Ops/s	9.5058 Ops/s	$\color{#d91a1a}-0.14\%$
test_async	0.1993s	98.0876ms	10.1950 Ops/s	10.2892 Ops/s	$\color{#d91a1a}-0.92\%$
test_single_pixels	0.1277s	0.1276s	7.8357 Ops/s	7.7913 Ops/s	$\color{#35bf28}+0.57\%$
test_sync_pixels	85.4421ms	84.0634ms	11.8958 Ops/s	12.1972 Ops/s	$\color{#d91a1a}-2.47\%$
test_async_pixels	0.1548s	67.1312ms	14.8962 Ops/s	14.5560 Ops/s	$\color{#35bf28}+2.34\%$
test_simple	0.8811s	0.8256s	1.2112 Ops/s	1.2082 Ops/s	$\color{#35bf28}+0.24\%$
test_transformed	1.1270s	1.0683s	0.9361 Ops/s	0.9229 Ops/s	$\color{#35bf28}+1.42\%$
test_serial	2.5263s	2.4739s	0.4042 Ops/s	0.3970 Ops/s	$\color{#35bf28}+1.81\%$
test_parallel	2.4255s	2.3646s	0.4229 Ops/s	0.4253 Ops/s	$\color{#d91a1a}-0.57\%$
test_step_mdp_speed[True-True-True-True-True]	59.7110μs	34.6468μs	28.8627 KOps/s	29.2076 KOps/s	$\color{#d91a1a}-1.18\%$
test_step_mdp_speed[True-True-True-True-False]	38.9610μs	20.2622μs	49.3530 KOps/s	49.1212 KOps/s	$\color{#35bf28}+0.47\%$
test_step_mdp_speed[True-True-True-False-True]	36.4211μs	19.3670μs	51.6342 KOps/s	49.8248 KOps/s	$\color{#35bf28}+3.63\%$
test_step_mdp_speed[True-True-True-False-False]	38.3200μs	11.5087μs	86.8905 KOps/s	86.6357 KOps/s	$\color{#35bf28}+0.29\%$
test_step_mdp_speed[True-True-False-True-True]	58.2710μs	35.8018μs	27.9315 KOps/s	27.8685 KOps/s	$\color{#35bf28}+0.23\%$
test_step_mdp_speed[True-True-False-True-False]	37.6810μs	21.9199μs	45.6207 KOps/s	44.6396 KOps/s	$\color{#35bf28}+2.20\%$
test_step_mdp_speed[True-True-False-False-True]	37.9210μs	21.3047μs	46.9381 KOps/s	45.9864 KOps/s	$\color{#35bf28}+2.07\%$
test_step_mdp_speed[True-True-False-False-False]	30.3800μs	13.3508μs	74.9019 KOps/s	74.5921 KOps/s	$\color{#35bf28}+0.42\%$
test_step_mdp_speed[True-False-True-True-True]	54.8610μs	37.4351μs	26.7129 KOps/s	26.0808 KOps/s	$\color{#35bf28}+2.42\%$
test_step_mdp_speed[True-False-True-True-False]	51.9010μs	24.1130μs	41.4715 KOps/s	40.6057 KOps/s	$\color{#35bf28}+2.13\%$
test_step_mdp_speed[True-False-True-False-True]	42.2010μs	21.0845μs	47.4282 KOps/s	46.5951 KOps/s	$\color{#35bf28}+1.79\%$
test_step_mdp_speed[True-False-True-False-False]	31.1410μs	13.3728μs	74.7789 KOps/s	74.5029 KOps/s	$\color{#35bf28}+0.37\%$
test_step_mdp_speed[True-False-False-True-True]	60.2710μs	39.4791μs	25.3298 KOps/s	24.8911 KOps/s	$\color{#35bf28}+1.76\%$
test_step_mdp_speed[True-False-False-True-False]	56.8200μs	25.7680μs	38.8078 KOps/s	38.5652 KOps/s	$\color{#35bf28}+0.63\%$
test_step_mdp_speed[True-False-False-False-True]	50.6800μs	23.0355μs	43.4113 KOps/s	42.7017 KOps/s	$\color{#35bf28}+1.66\%$
test_step_mdp_speed[True-False-False-False-False]	37.0700μs	15.1842μs	65.8578 KOps/s	65.0887 KOps/s	$\color{#35bf28}+1.18\%$
test_step_mdp_speed[False-True-True-True-True]	57.0610μs	38.0404μs	26.2879 KOps/s	26.1861 KOps/s	$\color{#35bf28}+0.39\%$
test_step_mdp_speed[False-True-True-True-False]	41.7210μs	23.8194μs	41.9826 KOps/s	40.7780 KOps/s	$\color{#35bf28}+2.95\%$
test_step_mdp_speed[False-True-True-False-True]	46.5010μs	25.6234μs	39.0269 KOps/s	39.1101 KOps/s	$\color{#d91a1a}-0.21\%$
test_step_mdp_speed[False-True-True-False-False]	33.2700μs	15.2911μs	65.3974 KOps/s	65.2395 KOps/s	$\color{#35bf28}+0.24\%$
test_step_mdp_speed[False-True-False-True-True]	66.6410μs	39.6752μs	25.2047 KOps/s	25.0429 KOps/s	$\color{#35bf28}+0.65\%$
test_step_mdp_speed[False-True-False-True-False]	43.4600μs	25.5977μs	39.0660 KOps/s	38.1326 KOps/s	$\color{#35bf28}+2.45\%$
test_step_mdp_speed[False-True-False-False-True]	97.8220μs	26.7900μs	37.3274 KOps/s	36.3473 KOps/s	$\color{#35bf28}+2.70\%$
test_step_mdp_speed[False-True-False-False-False]	34.7610μs	17.0514μs	58.6462 KOps/s	57.8477 KOps/s	$\color{#35bf28}+1.38\%$
test_step_mdp_speed[False-False-True-True-True]	63.8710μs	41.5830μs	24.0483 KOps/s	23.9781 KOps/s	$\color{#35bf28}+0.29\%$
test_step_mdp_speed[False-False-True-True-False]	53.4710μs	27.7401μs	36.0489 KOps/s	35.1416 KOps/s	$\color{#35bf28}+2.58\%$
test_step_mdp_speed[False-False-True-False-True]	44.7010μs	26.7685μs	37.3573 KOps/s	36.4047 KOps/s	$\color{#35bf28}+2.62\%$
test_step_mdp_speed[False-False-True-False-False]	35.5010μs	17.0483μs	58.6570 KOps/s	58.1891 KOps/s	$\color{#35bf28}+0.80\%$
test_step_mdp_speed[False-False-False-True-True]	67.8710μs	43.3012μs	23.0940 KOps/s	22.4879 KOps/s	$\color{#35bf28}+2.70\%$
test_step_mdp_speed[False-False-False-True-False]	50.0600μs	29.8225μs	33.5317 KOps/s	33.4874 KOps/s	$\color{#35bf28}+0.13\%$
test_step_mdp_speed[False-False-False-False-True]	49.9310μs	28.7825μs	34.7434 KOps/s	33.9468 KOps/s	$\color{#35bf28}+2.35\%$
test_step_mdp_speed[False-False-False-False-False]	45.4010μs	19.0299μs	52.5488 KOps/s	52.7134 KOps/s	$\color{#d91a1a}-0.31\%$
test_values[generalized_advantage_estimate-True-True]	25.2427ms	24.7372ms	40.4250 Ops/s	39.8450 Ops/s	$\color{#35bf28}+1.46\%$
test_values[vec_generalized_advantage_estimate-True-True]	91.3658ms	3.3956ms	294.4959 Ops/s	309.4991 Ops/s	$\color{#d91a1a}-4.85\%$
test_values[td0_return_estimate-False-False]	91.8820μs	64.6287μs	15.4730 KOps/s	15.4549 KOps/s	$\color{#35bf28}+0.12\%$
test_values[td1_return_estimate-False-False]	56.7127ms	53.0992ms	18.8327 Ops/s	18.4039 Ops/s	$\color{#35bf28}+2.33\%$
test_values[vec_td1_return_estimate-False-False]	2.0526ms	1.7660ms	566.2365 Ops/s	565.6514 Ops/s	$\color{#35bf28}+0.10\%$
test_values[td_lambda_return_estimate-True-False]	89.6315ms	84.6277ms	11.8165 Ops/s	11.4845 Ops/s	$\color{#35bf28}+2.89\%$
test_values[vec_td_lambda_return_estimate-True-False]	2.0833ms	1.7632ms	567.1461 Ops/s	566.5285 Ops/s	$\color{#35bf28}+0.11\%$
test_gae_speed[generalized_advantage_estimate-False-1-512]	24.0156ms	23.8697ms	41.8940 Ops/s	39.0933 Ops/s	$\textbf{\color{#35bf28}+7.16\%}$
test_gae_speed[vec_generalized_advantage_estimate-True-1-512]	0.8923ms	0.6967ms	1.4354 KOps/s	1.4297 KOps/s	$\color{#35bf28}+0.40\%$
test_gae_speed[vec_generalized_advantage_estimate-False-1-512]	0.7075ms	0.6509ms	1.5362 KOps/s	1.5261 KOps/s	$\color{#35bf28}+0.67\%$
test_gae_speed[vec_generalized_advantage_estimate-True-32-512]	1.4914ms	1.4513ms	689.0322 Ops/s	687.1838 Ops/s	$\color{#35bf28}+0.27\%$
test_gae_speed[vec_generalized_advantage_estimate-False-32-512]	0.9343ms	0.6682ms	1.4966 KOps/s	1.4887 KOps/s	$\color{#35bf28}+0.53\%$
test_dqn_speed	1.5873ms	1.4131ms	707.6404 Ops/s	696.6794 Ops/s	$\color{#35bf28}+1.57\%$
test_ddpg_speed	3.1244ms	2.9420ms	339.9097 Ops/s	339.8437 Ops/s	$\color{#35bf28}+0.02\%$
test_sac_speed	9.4216ms	8.4242ms	118.7051 Ops/s	117.6530 Ops/s	$\color{#35bf28}+0.89\%$
test_redq_speed	12.5831ms	10.6424ms	93.9641 Ops/s	84.7764 Ops/s	$\textbf{\color{#35bf28}+10.84\%}$
test_redq_deprec_speed	12.1425ms	11.5552ms	86.5412 Ops/s	85.0523 Ops/s	$\color{#35bf28}+1.75\%$
test_td3_speed	17.2305ms	8.4392ms	118.4940 Ops/s	118.9092 Ops/s	$\color{#d91a1a}-0.35\%$
test_cql_speed	26.1661ms	25.7269ms	38.8698 Ops/s	38.5555 Ops/s	$\color{#35bf28}+0.82\%$
test_a2c_speed	5.9002ms	5.6844ms	175.9209 Ops/s	175.9801 Ops/s	$\color{#d91a1a}-0.03\%$
test_ppo_speed	6.2760ms	5.9751ms	167.3625 Ops/s	166.5778 Ops/s	$\color{#35bf28}+0.47\%$
test_reinforce_speed	4.8890ms	4.6115ms	216.8510 Ops/s	214.5543 Ops/s	$\color{#35bf28}+1.07\%$
test_iql_speed	20.7991ms	19.9293ms	50.1775 Ops/s	50.0486 Ops/s	$\color{#35bf28}+0.26\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000]	4.8619ms	4.6641ms	214.4055 Ops/s	211.0310 Ops/s	$\color{#35bf28}+1.60\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]	0.8282ms	0.5948ms	1.6813 KOps/s	1.6769 KOps/s	$\color{#35bf28}+0.26\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000]	4.5270ms	0.5723ms	1.7473 KOps/s	1.7420 KOps/s	$\color{#35bf28}+0.31\%$
test_rb_sample[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000]	4.8455ms	4.6329ms	215.8464 Ops/s	214.1774 Ops/s	$\color{#35bf28}+0.78\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000]	0.7000ms	0.5846ms	1.7105 KOps/s	1.6978 KOps/s	$\color{#35bf28}+0.75\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000]	4.4426ms	0.5616ms	1.7807 KOps/s	1.7557 KOps/s	$\color{#35bf28}+1.42\%$
test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000]	2.2017ms	2.0705ms	482.9797 Ops/s	477.2070 Ops/s	$\color{#35bf28}+1.21\%$
test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000]	5.8449ms	1.9710ms	507.3608 Ops/s	498.6768 Ops/s	$\color{#35bf28}+1.74\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000]	4.9417ms	4.7774ms	209.3197 Ops/s	206.3803 Ops/s	$\color{#35bf28}+1.42\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000]	1.7529ms	0.7148ms	1.3990 KOps/s	1.3766 KOps/s	$\color{#35bf28}+1.63\%$
test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]	0.8606ms	0.6879ms	1.4537 KOps/s	1.4337 KOps/s	$\color{#35bf28}+1.39\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000]	4.7542ms	4.6384ms	215.5905 Ops/s	211.9812 Ops/s	$\color{#35bf28}+1.70\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000]	1.4314ms	0.5937ms	1.6843 KOps/s	1.6632 KOps/s	$\color{#35bf28}+1.27\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000]	0.7527ms	0.5667ms	1.7645 KOps/s	1.7237 KOps/s	$\color{#35bf28}+2.37\%$
test_rb_iterate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-4000]	4.8780ms	4.6029ms	217.2555 Ops/s	213.6178 Ops/s	$\color{#35bf28}+1.70\%$
test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000]	0.7056ms	0.5866ms	1.7047 KOps/s	1.6842 KOps/s	$\color{#35bf28}+1.22\%$
test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000]	4.7734ms	0.5721ms	1.7479 KOps/s	1.7344 KOps/s	$\color{#35bf28}+0.78\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-ListStorage-None-4000]	4.8422ms	4.7710ms	209.6008 Ops/s	206.1215 Ops/s	$\color{#35bf28}+1.69\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000]	1.6769ms	0.7221ms	1.3848 KOps/s	1.3602 KOps/s	$\color{#35bf28}+1.81\%$
test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000]	0.8345ms	0.6956ms	1.4376 KOps/s	1.4199 KOps/s	$\color{#35bf28}+1.25\%$
test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400]	0.1286s	7.4051ms	135.0417 Ops/s	131.3146 Ops/s	$\color{#35bf28}+2.84\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400]	17.6501ms	15.5218ms	64.4255 Ops/s	60.4447 Ops/s	$\textbf{\color{#35bf28}+6.59\%}$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400]	1.3517ms	1.2615ms	792.6760 Ops/s	754.4610 Ops/s	$\textbf{\color{#35bf28}+5.07\%}$
test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400]	0.1188s	9.4649ms	105.6537 Ops/s	104.0167 Ops/s	$\color{#35bf28}+1.57\%$
test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400]	17.6056ms	15.4722ms	64.6320 Ops/s	61.7308 Ops/s	$\color{#35bf28}+4.70\%$
test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-400]	7.6730ms	1.4016ms	713.4716 Ops/s	702.2936 Ops/s	$\color{#35bf28}+1.59\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400]	0.1188s	7.3807ms	135.4891 Ops/s	131.4956 Ops/s	$\color{#35bf28}+3.04\%$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400]	18.0116ms	15.6351ms	63.9587 Ops/s	60.3444 Ops/s	$\textbf{\color{#35bf28}+5.99\%}$
test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400]	2.3989ms	1.4338ms	697.4481 Ops/s	587.7458 Ops/s	$\textbf{\color{#35bf28}+18.66\%}$

…-slice

vmoens · 2024-06-07T07:15:51Z

@wertyuilife2

Should we reduce the priorities of each traj while we're at it?
I don't think it would require much compute, and it would ensure that all items within a traj are equally weighted.

Take the following two trajs and their associated priorities:

Item: [0, 1, 2, 3, 4, 5, 6, 7]
Traj: [0, 0, 0, 1, 1, 1, 1, 1]
Priority: [10, 1, 1, 10, 1, 2, 1, 1]

Currently, items 0 and 3 have a higher priority, so they are more likely to be sampled as start points, and hence you will get more trajs starting with them. If we reduce (here, with a mean), we get (10 + 1 + 1)/3 = 4 for the first traj and (10 + 1 + 2 + 1 + 1)/5 = 3 for the second:

Priority: [4, 4, 4, 3, 3, 3, 3, 3]

At this point, the start point is equally likely anywhere within a traj, but some trajs have a higher probability of being sampled (which seems to make more sense to me?).
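A minimal sketch of the mean reduction discussed above, written with plain Python lists so it runs without torchrl. The helpers `reduce_priorities_per_traj` and `mass_per_traj` are hypothetical names for illustration, and the sketch assumes each stored item carries one trajectory id, as in the example.

```python
from collections import defaultdict


def reduce_priorities_per_traj(traj_ids, priorities):
    """Replace each item's priority with the mean priority of its trajectory.

    Hypothetical helper for illustration; not part of torchrl.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for traj, prio in zip(traj_ids, priorities):
        sums[traj] += prio
        counts[traj] += 1
    return [sums[traj] / counts[traj] for traj in traj_ids]


def mass_per_traj(traj_ids, priorities):
    """Total priority of each trajectory (proportional to the chance of starting in it)."""
    mass = defaultdict(float)
    for traj, prio in zip(traj_ids, priorities):
        mass[traj] += prio
    return dict(mass)


traj = [0, 0, 0, 1, 1, 1, 1, 1]
priority = [10, 1, 1, 10, 1, 2, 1, 1]

reduced = reduce_priorities_per_traj(traj, priority)
print(reduced)                        # [4.0, 4.0, 4.0, 3.0, 3.0, 3.0, 3.0, 3.0]
print(mass_per_traj(traj, priority))  # {0: 12.0, 1: 15.0}
print(mass_per_traj(traj, reduced))   # {0: 12.0, 1: 15.0}
```

Note that a mean reduction leaves each trajectory's total mass unchanged (12 vs 15 here), so, assuming every item is an eligible start point, the chance of starting in trajectory 1 stays 15/27 either way; only the distribution of start points inside each trajectory is flattened.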

I guess any solution will make someone unhappy...

wertyuilife2 · 2024-06-07T08:18:36Z

@vmoens I believe that when discussing PrioritizedSampler, there is only one correct approach: we should not reduce the priorities of each trajectory while we are at it.

The core idea of PER is that certain samples (not trajectories) are important (such as a critical action) and need to be learned frequently. Reducing the priorities of each trajectory would make it difficult for PER to focus on updating specific important samples.

When discussing PrioritizedSliceSampler, we face the choice of whether to reduce the priorities of each slice while we are at it. My suggestion is to leave this choice to the user, as the calculation of priorities and the calling of update_priority() are both handled by the user. In other words, the sampler itself still should not reduce the priorities of each slice.

I think your thoughts are more likely associated with an "episodic buffer", but in my view, the current implementation of ReplayBuffer is not episodic, so there is no need to unify the priority of the entire trajectory.

vmoens · 2024-06-07T09:21:29Z

> The core idea of PER is that certain samples (not trajectories) are important (such as a critical action) and need to be learned frequently. Reducing the priorities of each trajectory would make it difficult for PER to focus on updating specific important samples.

Got it, thanks. Indeed, that's how I edited the docstring (users should be in charge of setting the proper priority).
But when we say "prioritized slice sampler", I can picture someone thinking: "I have a transition with high priority, therefore there is a chance to find it anywhere in my sample (not just at the beginning)". Currently, though, it is more likely to be found at the beginning of a slice than at the end.
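To make the start-versus-end asymmetry concrete, here is a simplified model. It is an assumption, not torchrl's actual PrioritizedSliceSampler code: start points are drawn in proportion to their own priority, only slices that fit entirely inside one trajectory are kept (strict length), and trajectory ids are assumed to form contiguous blocks. `slice_position_probs` is a hypothetical helper.

```python
def slice_position_probs(traj_ids, priorities, slice_len):
    """Return probs[i][k]: probability that item i sits at position k of a sampled slice.

    Simplified model (assumption): start points are drawn proportionally to their
    priority, and only slices fully contained in one contiguous trajectory are valid.
    """
    n = len(traj_ids)
    # With contiguous trajectories, [s, s + slice_len) stays in one trajectory
    # iff its first and last items share the same id.
    valid_starts = [
        s for s in range(n - slice_len + 1)
        if traj_ids[s] == traj_ids[s + slice_len - 1]
    ]
    total = sum(priorities[s] for s in valid_starts)
    probs = [[0.0] * slice_len for _ in range(n)]
    for s in valid_starts:
        p_start = priorities[s] / total
        for k in range(slice_len):
            probs[s + k][k] += p_start
    return probs


traj = [0, 0, 0, 1, 1, 1, 1, 1]
priority = [10, 1, 1, 10, 1, 2, 1, 1]
probs = slice_position_probs(traj, priority, slice_len=2)
print(probs[3])  # [0.4, 0.0]   high-priority item 3
print(probs[4])  # [0.04, 0.4]  its successor, item 4
```

In this model, an item's chance of landing at position k > 0 is driven by the priority of the item k steps earlier, not by its own: item 4 (priority 1) shows up at position 1 ten times more often than at position 0, because item 3 (priority 10) precedes it.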

torchrl/data/replay_buffers/replay_buffers.py

vmoens · 2024-06-07T10:12:06Z

torchrl/data/replay_buffers/samplers.py

+ # if p_sum <= 0:
+ # index, *_ = RandomSampler.sample(self, storage, batch_size)
+ # device = index[0].device if isinstance(index, tuple) else index.device
+ # weight = torch.ones(batch_size, device=device)
+ # else:
+ # if p_min <= 0:
+ # p_min = 1
+ #
+ # # For some undefined reason, only np.random works here.
+ # # All PT attempts fail, even when subsequently transformed into numpy
+ # mass = np.random.uniform(0.0, p_sum, size=batch_size)
+ # # mass = torch.zeros(batch_size, dtype=torch.double).uniform_(0.0, p_sum)
+ # # mass = torch.rand(batch_size).mul_(p_sum)
+ # index = self._sum_tree.scan_lower_bound(mass)
+ # index = torch.as_tensor(index)
+ # if not index.ndim:
+ # index = index.unsqueeze(0)
+ # index.clamp_max_(len(storage) - 1)
+ # weight = torch.as_tensor(self._sum_tree[index])
+ #
+ # # Importance sampling weight formula:
+ # # w_i = (p_i / sum(p) * N) ^ (-beta)
+ # # weight_i = w_i / max(w)
+ # # weight_i = (p_i / sum(p) * N) ^ (-beta) /
+ # # ((min(p) / sum(p) * N) ^ (-beta))
+ # # weight_i = ((p_i / sum(p) * N) / (min(p) / sum(p) * N)) ^ (-beta)
+ # # weight_i = (p_i / min(p)) ^ (-beta)
+ # # weight = np.power(weight / (p_min + self._eps), -self._beta)
+ # weight = torch.pow(weight / p_min, -self._beta)
+ # if storage.ndim > 1:
+ # index = torch.unravel_index(index, storage._storage.shape)


@wertyuilife2 Happy to hear your thoughts about this.

Basically, the idea would be to sample uniformly if no priority has been passed. I won't make it part of this PR, but we could consider it in the future.
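A minimal, library-free sketch of the proposed fallback. It assumes a flat list of priorities and uses `random.choices` in place of torchrl's sum tree; `sample_with_fallback` and the default `beta=0.4` are hypothetical. When the total priority is not positive, indices are drawn uniformly with unit weights; otherwise they are drawn in proportion to priority and weighted by $(p_i / p_{\min})^{-\beta}$, mirroring the commented-out code (including its `p_min <= 0` guard).

```python
import random


def sample_with_fallback(priorities, batch_size, beta=0.4, rng=None):
    """Sample indices in proportion to priority, or uniformly if no priority is set.

    Hypothetical sketch: random.choices stands in for torchrl's sum-tree scan.
    Returns (indices, importance-sampling weights).
    """
    if rng is None:
        rng = random.Random()
    n = len(priorities)
    if sum(priorities) <= 0:
        # Proposed fallback: uniform sampling with unit weights.
        return rng.choices(range(n), k=batch_size), [1.0] * batch_size
    p_min = min(priorities)
    if p_min <= 0:
        p_min = 1.0  # same guard as the commented-out code
    indices = rng.choices(range(n), weights=priorities, k=batch_size)
    # Importance-sampling weight: (p_i / p_min) ** (-beta).
    weights = [(priorities[i] / p_min) ** (-beta) for i in indices]
    return indices, weights


idx, w = sample_with_fallback([0.0, 0.0, 0.0], batch_size=4, rng=random.Random(0))
print(w)  # [1.0, 1.0, 1.0, 1.0]
idx, w = sample_with_fallback([1.0, 4.0, 0.0], batch_size=1000, rng=random.Random(0))
print(2 in idx)  # False: the zero-priority item is not drawn
```

The uniform branch is exactly what makes a storage that was never given priorities look healthy, which is the trade-off debated in the reply.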

@vmoens I believe we should ensure that priorities are initialized at all times and throw an error when p_sum <= 0 or p_min <= 0. We should design around default_priority and update_priority to handle potential issues rather than ignoring them.

Even just from the standpoint of detecting silent bugs, I don't recommend such an implementation.

For example, if some user's incorrect behavior or a bug causes len(storage) to be larger than it should be, this implementation could result in undetected silent bugs.
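The alternative argued for here (failing loudly instead of falling back) could look like the following check, run before sampling. `check_priorities` is an illustrative name, not a torchrl function. The error messages assume, for the sake of the example, that slots which were never written or never given a priority hold a priority of zero.

```python
import math


def check_priorities(priorities):
    """Raise instead of silently falling back when priorities are unusable.

    Hypothetical validation helper; returns (p_sum, p_min) when everything is fine.
    """
    if not priorities:
        raise ValueError("Cannot sample from an empty storage.")
    p_sum = math.fsum(priorities)
    p_min = min(priorities)
    if p_sum <= 0:
        raise RuntimeError(
            f"Sum of priorities is {p_sum}; every stored item should have been "
            "given a positive priority (e.g. a default priority on insertion)."
        )
    if p_min <= 0:
        raise RuntimeError(
            f"Minimum priority is {p_min}; some item never received a valid "
            "priority, possibly because len(storage) exceeds the number of "
            "items actually written."
        )
    return p_sum, p_min


print(check_priorities([1.0, 0.5]))  # (1.5, 0.5)
```

Calling it with `[1.0, 0.0]` raises a `RuntimeError` instead of quietly clamping the index or resetting `p_min`, which surfaces the kind of `len(storage)` mismatch described above.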

torchrl/data/replay_buffers/samplers.py

…-slice

init

00b01b5

facebook-github-bot added the CLA Signed label Jun 5, 2024

vmoens mentioned this pull request Jun 6, 2024

[BUG] Multiple Issues in Samplers and Buffers Affecting Stability and Expected Behavior #2205

Open

10 tasks

vmoens added 2 commits June 6, 2024 15:24

tmp fixes

135dc2f

Merge remote-tracking branch 'origin/main' into fix-strict-length-prb…

e7b910b

…-slice

vmoens added the bug label Jun 6, 2024

amend

96ff257

This was referenced Jun 6, 2024

[BUG] Unintended Cross-Trajectory Sampling in PrioritizedSliceSampler.sample() #2208

Closed

[BUG] Segmentation Fault in PrioritizedSliceSampler.sample() #2206

Closed

[BUG] Double Initialization of Priority for New Samples in PrioritizedSampler #2211

Closed

This was linked to issues Jun 6, 2024

[BUG] Segmentation Fault in PrioritizedSliceSampler.sample() #2206

Closed

[BUG] Unintended Cross-Trajectory Sampling in PrioritizedSliceSampler.sample() #2208

Closed

[BUG] Double Initialization of Priority for New Samples in PrioritizedSampler #2211

Closed

Merge remote-tracking branch 'origin/main' into fix-strict-length-prb…

7ef4ff4

…-slice

amend

361c251