[BugFix] Dense stack lazy tds defaults to dense_stack_tds #713

Merged
merged 1 commit into from
Mar 19, 2024

@vmoens commented Mar 19, 2024

@facebook-github-bot added the CLA Signed label Mar 19, 2024

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 127. Improved: 4. Worsened: 14.

Detailed results:

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 32.4510μs | 16.7689μs | 59.6343 KOps/s | 61.5867 KOps/s | -3.17% |
| test_plain_set_stack_nested | 39.5430μs | 17.2201μs | 58.0718 KOps/s | 60.9053 KOps/s | -4.65% |
| test_plain_set_nested_inplace | 52.6290μs | 19.5169μs | 51.2376 KOps/s | 53.3871 KOps/s | -4.03% |
| test_plain_set_stack_nested_inplace | 56.8460μs | 19.6305μs | 50.9413 KOps/s | 53.7197 KOps/s | **-5.17%** |
| test_items | 19.6570μs | 2.5023μs | 399.6399 KOps/s | 392.1389 KOps/s | +1.91% |
| test_items_nested | 0.4531ms | 0.2812ms | 3.5566 KOps/s | 3.6638 KOps/s | -2.93% |
| test_items_nested_locked | 0.4299ms | 0.2784ms | 3.5921 KOps/s | 3.7451 KOps/s | -4.08% |
| test_items_nested_leaf | 0.5272ms | 0.1747ms | 5.7225 KOps/s | 6.0873 KOps/s | **-5.99%** |
| test_items_stack_nested | 0.3974ms | 0.2834ms | 3.5287 KOps/s | 3.7657 KOps/s | **-6.29%** |
| test_items_stack_nested_leaf | 0.3021ms | 0.1757ms | 5.6918 KOps/s | 6.0306 KOps/s | **-5.62%** |
| test_items_stack_nested_locked | 0.4317ms | 0.2824ms | 3.5412 KOps/s | 3.7390 KOps/s | **-5.29%** |
| test_keys | 18.4140μs | 4.2147μs | 237.2661 KOps/s | 262.7548 KOps/s | **-9.70%** |
| test_keys_nested | 2.1981ms | 0.1408ms | 7.1032 KOps/s | 6.8415 KOps/s | +3.82% |
| test_keys_nested_locked | 0.2607ms | 0.1475ms | 6.7804 KOps/s | 6.6442 KOps/s | +2.05% |
| test_keys_nested_leaf | 32.2948ms | 0.1311ms | 7.6300 KOps/s | 7.8473 KOps/s | -2.77% |
| test_keys_stack_nested | 0.2544ms | 0.1457ms | 6.8634 KOps/s | 6.7590 KOps/s | +1.54% |
| test_keys_stack_nested_leaf | 0.2390ms | 0.1262ms | 7.9217 KOps/s | 7.8197 KOps/s | +1.31% |
| test_keys_stack_nested_locked | 0.2725ms | 0.1494ms | 6.6926 KOps/s | 6.6488 KOps/s | +0.66% |
| test_values | 5.7557μs | 1.1841μs | 844.4905 KOps/s | 849.6892 KOps/s | -0.61% |
| test_values_nested | 92.3430μs | 50.7690μs | 19.6971 KOps/s | 19.2871 KOps/s | +2.13% |
| test_values_nested_locked | 0.1277ms | 50.8439μs | 19.6681 KOps/s | 19.2454 KOps/s | +2.20% |
| test_values_nested_leaf | 91.8010μs | 46.0273μs | 21.7262 KOps/s | 21.2276 KOps/s | +2.35% |
| test_values_stack_nested | 0.1053ms | 52.5170μs | 19.0415 KOps/s | 19.3807 KOps/s | -1.75% |
| test_values_stack_nested_leaf | 94.5270μs | 46.4102μs | 21.5470 KOps/s | 21.6106 KOps/s | -0.29% |
| test_values_stack_nested_locked | 87.8040μs | 51.4787μs | 19.4255 KOps/s | 19.3263 KOps/s | +0.51% |
| test_membership | 18.1730μs | 1.3158μs | 759.9866 KOps/s | 731.5704 KOps/s | +3.88% |
| test_membership_nested | 19.8470μs | 3.5375μs | 282.6823 KOps/s | 290.9425 KOps/s | -2.84% |
| test_membership_nested_leaf | 21.9410μs | 3.5737μs | 279.8200 KOps/s | 283.9054 KOps/s | -1.44% |
| test_membership_stacked_nested | 21.5400μs | 3.4468μs | 290.1205 KOps/s | 293.3940 KOps/s | -1.12% |
| test_membership_stacked_nested_leaf | 21.4400μs | 3.5073μs | 285.1216 KOps/s | 293.5607 KOps/s | -2.87% |
| test_membership_nested_last | 29.8560μs | 4.3218μs | 231.3870 KOps/s | 236.4181 KOps/s | -2.13% |
| test_membership_nested_leaf_last | 30.0960μs | 4.3195μs | 231.5077 KOps/s | 236.4525 KOps/s | -2.09% |
| test_membership_stacked_nested_last | 24.1750μs | 4.2843μs | 233.4082 KOps/s | 237.0272 KOps/s | -1.53% |
| test_membership_stacked_nested_leaf_last | 19.7670μs | 4.3586μs | 229.4296 KOps/s | 237.5944 KOps/s | -3.44% |
| test_nested_getleaf | 40.9080μs | 10.9506μs | 91.3192 KOps/s | 91.6583 KOps/s | -0.37% |
| test_nested_get | 53.9400μs | 10.3606μs | 96.5195 KOps/s | 96.8899 KOps/s | -0.38% |
| test_stacked_getleaf | 50.4540μs | 10.7293μs | 93.2026 KOps/s | 92.6183 KOps/s | +0.63% |
| test_stacked_get | 31.3080μs | 10.2181μs | 97.8658 KOps/s | 97.5124 KOps/s | +0.36% |
| test_nested_getitemleaf | 39.8540μs | 11.6230μs | 86.0360 KOps/s | 87.2166 KOps/s | -1.35% |
| test_nested_getitem | 26.3380μs | 10.6838μs | 93.5994 KOps/s | 94.6439 KOps/s | -1.10% |
| test_stacked_getitemleaf | 28.6130μs | 11.2069μs | 89.2306 KOps/s | 88.6031 KOps/s | +0.71% |
| test_stacked_getitem | 27.1510μs | 10.4937μs | 95.2954 KOps/s | 95.9476 KOps/s | -0.68% |
| test_lock_nested | 0.7095ms | 0.3396ms | 2.9448 KOps/s | 2.9601 KOps/s | -0.52% |
| test_lock_stack_nested | 0.4773ms | 0.3038ms | 3.2916 KOps/s | 3.3445 KOps/s | -1.58% |
| test_unlock_nested | 73.6032ms | 0.4205ms | 2.3783 KOps/s | 2.4641 KOps/s | -3.48% |
| test_unlock_stack_nested | 0.4402ms | 0.3114ms | 3.2108 KOps/s | 3.2262 KOps/s | -0.48% |
| test_flatten_speed | 0.5369ms | 0.2632ms | 3.7988 KOps/s | 3.7540 KOps/s | +1.19% |
| test_unflatten_speed | 0.6925ms | 0.4111ms | 2.4325 KOps/s | 2.4124 KOps/s | +0.83% |
| test_common_ops | 5.6286ms | 0.6784ms | 1.4741 KOps/s | 1.5050 KOps/s | -2.06% |
| test_creation | 23.3440μs | 1.7916μs | 558.1757 KOps/s | 548.4405 KOps/s | +1.78% |
| test_creation_empty | 32.9010μs | 9.9751μs | 100.2501 KOps/s | 107.2028 KOps/s | **-6.49%** |
| test_creation_nested_1 | 39.4730μs | 12.5343μs | 79.7811 KOps/s | 83.8246 KOps/s | -4.82% |
| test_creation_nested_2 | 41.7380μs | 15.9139μs | 62.8381 KOps/s | 65.4408 KOps/s | -3.98% |
| test_clone | 80.7010μs | 13.1087μs | 76.2851 KOps/s | 77.9077 KOps/s | -2.08% |
| test_getitem[int] | 42.1680μs | 10.9261μs | 91.5237 KOps/s | 91.4791 KOps/s | +0.05% |
| test_getitem[slice_int] | 57.3870μs | 21.9441μs | 45.5704 KOps/s | 45.8787 KOps/s | -0.67% |
| test_getitem[range] | 0.1686ms | 41.4194μs | 24.1433 KOps/s | 25.4796 KOps/s | **-5.24%** |
| test_getitem[tuple] | 49.4220μs | 18.6828μs | 53.5250 KOps/s | 55.7582 KOps/s | -4.01% |
| test_getitem[list] | 0.1532ms | 35.8408μs | 27.9012 KOps/s | 29.0263 KOps/s | -3.88% |
| test_setitem_dim[int] | 54.4210μs | 29.7900μs | 33.5683 KOps/s | 29.3016 KOps/s | **+14.56%** |
| test_setitem_dim[slice_int] | 0.1062ms | 58.5745μs | 17.0723 KOps/s | 15.6356 KOps/s | **+9.19%** |
| test_setitem_dim[range] | 0.1584ms | 77.9228μs | 12.8332 KOps/s | 13.0244 KOps/s | -1.47% |
| test_setitem_dim[tuple] | 0.1148ms | 48.0313μs | 20.8197 KOps/s | 20.2403 KOps/s | +2.86% |
| test_setitem | 75.5110μs | 19.5864μs | 51.0558 KOps/s | 53.3915 KOps/s | -4.37% |
| test_set | 70.5820μs | 18.7045μs | 53.4631 KOps/s | 54.7727 KOps/s | -2.39% |
| test_set_shared | 3.4906ms | 0.1372ms | 7.2874 KOps/s | 7.2357 KOps/s | +0.71% |
| test_update | 0.1315ms | 20.7100μs | 48.2859 KOps/s | 48.9295 KOps/s | -1.32% |
| test_update_nested | 75.1700μs | 29.5297μs | 33.8642 KOps/s | 35.6915 KOps/s | **-5.12%** |
| test_update__nested | 76.1520μs | 24.7835μs | 40.3494 KOps/s | 40.6721 KOps/s | -0.79% |
| test_set_nested | 70.7020μs | 20.3711μs | 49.0891 KOps/s | 49.0587 KOps/s | +0.06% |
| test_set_nested_new | 68.0070μs | 24.2548μs | 41.2290 KOps/s | 41.8593 KOps/s | -1.51% |
| test_select | 0.1046ms | 39.5434μs | 25.2887 KOps/s | 25.0951 KOps/s | +0.77% |
| test_select_nested | 0.1210ms | 58.9488μs | 16.9639 KOps/s | 16.8435 KOps/s | +0.71% |
| test_exclude_nested | 0.2083ms | 0.1178ms | 8.4873 KOps/s | 8.4071 KOps/s | +0.95% |
| test_empty[True] | 0.6575ms | 0.4144ms | 2.4134 KOps/s | 2.4806 KOps/s | -2.71% |
| test_empty[False] | 4.5024μs | 1.0442μs | 957.6542 KOps/s | 952.4021 KOps/s | +0.55% |
| test_unbind_speed | 0.4801ms | 0.2565ms | 3.8983 KOps/s | 4.0125 KOps/s | -2.85% |
| test_unbind_speed_stack0 | 0.4246ms | 0.2449ms | 4.0827 KOps/s | 4.1576 KOps/s | -1.80% |
| test_unbind_speed_stack1 | 0.1220s | 0.6891ms | 1.4512 KOps/s | 1.4757 KOps/s | -1.67% |
| test_split | 0.1114s | 1.6284ms | 614.0950 Ops/s | 604.9768 Ops/s | +1.51% |
| test_chunk | 2.4034ms | 1.4542ms | 687.6505 Ops/s | 682.3408 Ops/s | +0.78% |
| test_creation[device0] | 0.1859ms | 0.1044ms | 9.5816 KOps/s | 9.9065 KOps/s | -3.28% |
| test_creation_from_tensor | 3.9124ms | 84.5905μs | 11.8217 KOps/s | 12.1088 KOps/s | -2.37% |
| test_add_one[memmap_tensor0] | 92.9340μs | 5.3003μs | 188.6691 KOps/s | 180.6669 KOps/s | +4.43% |
| test_contiguous[memmap_tensor0] | 17.2420μs | 0.6100μs | 1.6395 MOps/s | 1.5999 MOps/s | +2.47% |
| test_stack[memmap_tensor0] | 35.1160μs | 3.5027μs | 285.4937 KOps/s | 294.8732 KOps/s | -3.18% |
| test_memmaptd_index | 4.0722ms | 0.2395ms | 4.1756 KOps/s | 4.3226 KOps/s | -3.40% |
| test_memmaptd_index_astensor | 0.6013ms | 0.3023ms | 3.3083 KOps/s | 3.4249 KOps/s | -3.40% |
| test_memmaptd_index_op | 0.9681ms | 0.5927ms | 1.6872 KOps/s | 1.7623 KOps/s | -4.26% |
| test_serialize_model | 0.2032s | 0.1119s | 8.9389 Ops/s | 8.8074 Ops/s | +1.49% |
| test_serialize_model_pickle | 0.4483s | 0.3750s | 2.6668 Ops/s | 2.6470 Ops/s | +0.75% |
| test_serialize_weights | 0.1029s | 97.1462ms | 10.2938 Ops/s | 10.2214 Ops/s | +0.71% |
| test_serialize_weights_returnearly | 0.2396s | 0.1294s | 7.7302 Ops/s | 7.2094 Ops/s | **+7.22%** |
| test_serialize_weights_pickle | 0.8995s | 0.5725s | 1.7468 Ops/s | 2.4209 Ops/s | **-27.84%** |
| test_serialize_weights_filesystem | 0.1010s | 90.6958ms | 11.0259 Ops/s | 10.6215 Ops/s | +3.81% |
| test_serialize_model_filesystem | 0.1032s | 93.3581ms | 10.7114 Ops/s | 10.7197 Ops/s | -0.08% |
| test_reshape_pytree | 45.0240μs | 21.2518μs | 47.0549 KOps/s | 48.1033 KOps/s | -2.18% |
| test_reshape_td | 79.6980μs | 31.9277μs | 31.3208 KOps/s | 32.3257 KOps/s | -3.11% |
| test_view_pytree | 54.1210μs | 21.0468μs | 47.5132 KOps/s | 48.1436 KOps/s | -1.31% |
| test_view_td | 0.1159s | 58.6393μs | 17.0534 KOps/s | 17.7419 KOps/s | -3.88% |
| test_unbind_pytree | 68.1570μs | 24.9099μs | 40.1446 KOps/s | 43.0985 KOps/s | **-6.85%** |
| test_unbind_td | 99.2550μs | 37.2963μs | 26.8123 KOps/s | 28.0390 KOps/s | -4.37% |
| test_split_pytree | 51.5860μs | 24.1369μs | 41.4304 KOps/s | 43.0760 KOps/s | -3.82% |
| test_split_td | 0.1134ms | 39.2684μs | 25.4658 KOps/s | 25.5922 KOps/s | -0.49% |
| test_add_pytree | 83.3260μs | 29.1311μs | 34.3275 KOps/s | 33.1374 KOps/s | +3.59% |
| test_add_td | 0.1193ms | 51.6153μs | 19.3741 KOps/s | 18.6787 KOps/s | +3.72% |
| test_distributed | 1.3770ms | 0.1016ms | 9.8457 KOps/s | 9.6503 KOps/s | +2.02% |
| test_tdmodule | 62.6570μs | 16.9232μs | 59.0905 KOps/s | 62.3601 KOps/s | **-5.24%** |
| test_tdmodule_dispatch | 52.9790μs | 34.9052μs | 28.6490 KOps/s | 30.5365 KOps/s | **-6.18%** |
| test_tdseq | 35.9870μs | 19.6674μs | 50.8456 KOps/s | 49.3662 KOps/s | +3.00% |
| test_tdseq_dispatch | 55.9350μs | 39.6648μs | 25.2112 KOps/s | 26.7817 KOps/s | **-5.86%** |
| test_instantiation_functorch | 1.9343ms | 1.3288ms | 752.5325 Ops/s | 785.7575 Ops/s | -4.23% |
| test_instantiation_td | 1.8450ms | 1.0218ms | 978.6367 Ops/s | 1.0115 KOps/s | -3.25% |
| test_exec_functorch | 0.3413ms | 0.1587ms | 6.3009 KOps/s | 6.4301 KOps/s | -2.01% |
| test_exec_functional_call | 0.2281ms | 0.1445ms | 6.9195 KOps/s | 6.9411 KOps/s | -0.31% |
| test_exec_td | 0.2159ms | 0.1422ms | 7.0323 KOps/s | 7.1763 KOps/s | -2.01% |
| test_exec_td_decorator | 0.2877ms | 0.1933ms | 5.1740 KOps/s | 5.2835 KOps/s | -2.07% |
| test_vmap_mlp_speed[True-True] | 0.7368ms | 0.4581ms | 2.1831 KOps/s | 2.1489 KOps/s | +1.59% |
| test_vmap_mlp_speed[True-False] | 0.7599ms | 0.4557ms | 2.1946 KOps/s | 2.1520 KOps/s | +1.98% |
| test_vmap_mlp_speed[False-True] | 0.6674ms | 0.3721ms | 2.6876 KOps/s | 2.5236 KOps/s | **+6.50%** |
| test_vmap_mlp_speed[False-False] | 0.6411ms | 0.3770ms | 2.6524 KOps/s | 2.6524 KOps/s | -0.00% |
| test_vmap_mlp_speed_decorator[True-True] | 0.6218ms | 0.4882ms | 2.0484 KOps/s | 2.0843 KOps/s | -1.72% |
| test_vmap_mlp_speed_decorator[True-False] | 0.7906ms | 0.5080ms | 1.9686 KOps/s | 2.0622 KOps/s | -4.54% |
| test_vmap_mlp_speed_decorator[False-True] | 0.5757ms | 0.3951ms | 2.5307 KOps/s | 2.5155 KOps/s | +0.60% |
| test_vmap_mlp_speed_decorator[False-False] | 0.7433ms | 0.3907ms | 2.5594 KOps/s | 2.5560 KOps/s | +0.13% |
| test_to_module_speed[True] | 1.4807ms | 1.3599ms | 735.3558 Ops/s | 720.9602 Ops/s | +2.00% |
| test_to_module_speed[False] | 2.0830ms | 1.3439ms | 744.0945 Ops/s | 742.3716 Ops/s | +0.23% |
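
For readers checking the numbers: the Change column appears to be the relative difference between the Ops and Ops on Repo HEAD columns. The sketch below recomputes it for three rows copied from the table above. The helper name `relative_change` and the 5% cut-off are assumptions made for illustration (the cut-off is inferred from which rows are emphasized); neither is taken from the benchmark tooling.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percent change of the PR's throughput relative to repo HEAD."""
    return (ops_pr / ops_head - 1.0) * 100.0


# (Ops, Ops on Repo HEAD) in KOps/s, copied from the results table.
rows = {
    "test_plain_set_nested": (59.6343, 61.5867),
    "test_keys": (237.2661, 262.7548),
    "test_setitem_dim[int]": (33.5683, 29.3016),
}

# Assumed cut-off for counting a benchmark as improved or worsened.
THRESHOLD = 5.0

for name, (ops_pr, ops_head) in rows.items():
    change = relative_change(ops_pr, ops_head)
    flagged = abs(change) >= THRESHOLD
    print(f"{name}: {change:+.2f}% (flagged: {flagged})")
```

Under that assumed cut-off, only changes of at least 5% in either direction would count toward the Improved and Worsened totals, which would explain why most rows with small negative changes are not counted as worsened.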

@vmoens added the bug label Mar 19, 2024
@vmoens merged commit b8e6c6b into main Mar 19, 2024
44 of 48 checks passed
@vmoens deleted the dense-stack-default branch March 19, 2024 17:22
vmoens added a commit that referenced this pull request Mar 24, 2024
vmoens added a commit that referenced this pull request Mar 25, 2024