[BugFix] Fix name gathering with tensor indices #690

Merged (1 commit), Feb 24, 2024


@vmoens (Contributor) commented Feb 24, 2024

No description provided.

@facebook-github-bot added the CLA Signed label Feb 24, 2024
@vmoens marked this pull request as ready for review February 24, 2024 01:07

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

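Total Benchmarks: 126. Improved: $\large\color{#35bf28}16$. Worsened: $\large\color{#d91a1a}3$. These counts cover only the changes shown in bold in the table, all of which are at least 5% in magnitude.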

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 37.3500μs 16.3749μs 61.0692 KOps/s 61.9580 KOps/s $\color{#d91a1a}-1.43\%$
test_plain_set_stack_nested 50.2140μs 16.5453μs 60.4401 KOps/s 61.1355 KOps/s $\color{#d91a1a}-1.14\%$
test_plain_set_nested_inplace 44.3530μs 18.4361μs 54.2415 KOps/s 53.4918 KOps/s $\color{#35bf28}+1.40\%$
test_plain_set_stack_nested_inplace 52.7380μs 18.4782μs 54.1177 KOps/s 53.3807 KOps/s $\color{#35bf28}+1.38\%$
test_items 33.3120μs 2.4627μs 406.0503 KOps/s 413.8068 KOps/s $\color{#d91a1a}-1.87\%$
test_items_nested 0.8299ms 0.2736ms 3.6552 KOps/s 3.6543 KOps/s $\color{#35bf28}+0.02\%$
test_items_nested_locked 0.3971ms 0.2751ms 3.6350 KOps/s 3.6441 KOps/s $\color{#d91a1a}-0.25\%$
test_items_nested_leaf 0.5192ms 0.1690ms 5.9161 KOps/s 5.9225 KOps/s $\color{#d91a1a}-0.11\%$
test_items_stack_nested 0.5666ms 0.2765ms 3.6166 KOps/s 3.6602 KOps/s $\color{#d91a1a}-1.19\%$
test_items_stack_nested_leaf 0.3347ms 0.1688ms 5.9234 KOps/s 5.9394 KOps/s $\color{#d91a1a}-0.27\%$
test_items_stack_nested_locked 0.9057ms 0.2774ms 3.6050 KOps/s 3.6104 KOps/s $\color{#d91a1a}-0.15\%$
test_keys 18.8050μs 3.8155μs 262.0897 KOps/s 258.6918 KOps/s $\color{#35bf28}+1.31\%$
test_keys_nested 2.1079ms 0.1512ms 6.6130 KOps/s 6.7069 KOps/s $\color{#d91a1a}-1.40\%$
test_keys_nested_locked 0.3322ms 0.1567ms 6.3820 KOps/s 6.5405 KOps/s $\color{#d91a1a}-2.42\%$
test_keys_nested_leaf 32.4515ms 0.1349ms 7.4108 KOps/s 7.7031 KOps/s $\color{#d91a1a}-3.79\%$
test_keys_stack_nested 0.2615ms 0.1521ms 6.5766 KOps/s 6.6750 KOps/s $\color{#d91a1a}-1.47\%$
test_keys_stack_nested_leaf 0.2584ms 0.1309ms 7.6416 KOps/s 7.5636 KOps/s $\color{#35bf28}+1.03\%$
test_keys_stack_nested_locked 0.2708ms 0.1555ms 6.4311 KOps/s 6.4029 KOps/s $\color{#35bf28}+0.44\%$
test_values 5.1745μs 1.1508μs 868.9616 KOps/s 859.0754 KOps/s $\color{#35bf28}+1.15\%$
test_values_nested 0.1082ms 51.8368μs 19.2913 KOps/s 19.2901 KOps/s $+0.01\%$
test_values_nested_locked 95.7790μs 51.6313μs 19.3681 KOps/s 19.0939 KOps/s $\color{#35bf28}+1.44\%$
test_values_nested_leaf 92.9730μs 46.8050μs 21.3652 KOps/s 21.6343 KOps/s $\color{#d91a1a}-1.24\%$
test_values_stack_nested 93.0840μs 52.9033μs 18.9024 KOps/s 19.0528 KOps/s $\color{#d91a1a}-0.79\%$
test_values_stack_nested_leaf 84.8490μs 46.2272μs 21.6323 KOps/s 21.6126 KOps/s $\color{#35bf28}+0.09\%$
test_values_stack_nested_locked 0.1074ms 52.9348μs 18.8912 KOps/s 18.9046 KOps/s $\color{#d91a1a}-0.07\%$
test_membership 33.2020μs 1.3115μs 762.4841 KOps/s 755.6010 KOps/s $\color{#35bf28}+0.91\%$
test_membership_nested 27.0710μs 3.3858μs 295.3506 KOps/s 297.0401 KOps/s $\color{#d91a1a}-0.57\%$
test_membership_nested_leaf 34.8850μs 3.4045μs 293.7265 KOps/s 293.2304 KOps/s $\color{#35bf28}+0.17\%$
test_membership_stacked_nested 20.6980μs 3.3945μs 294.5944 KOps/s 297.0277 KOps/s $\color{#d91a1a}-0.82\%$
test_membership_stacked_nested_leaf 29.8850μs 3.4158μs 292.7531 KOps/s 294.6518 KOps/s $\color{#d91a1a}-0.64\%$
test_membership_nested_last 31.4190μs 6.7341μs 148.4976 KOps/s 152.0650 KOps/s $\color{#d91a1a}-2.35\%$
test_membership_nested_leaf_last 28.3230μs 6.8770μs 145.4133 KOps/s 151.3641 KOps/s $\color{#d91a1a}-3.93\%$
test_membership_stacked_nested_last 31.8900μs 9.6239μs 103.9076 KOps/s 132.5995 KOps/s $\textbf{\color{#d91a1a}-21.64\%}$
test_membership_stacked_nested_leaf_last 32.5110μs 9.5550μs 104.6577 KOps/s 132.2822 KOps/s $\textbf{\color{#d91a1a}-20.88\%}$
test_nested_getleaf 33.5330μs 10.5099μs 95.1481 KOps/s 94.0534 KOps/s $\color{#35bf28}+1.16\%$
test_nested_get 32.6710μs 9.8883μs 101.1293 KOps/s 99.9378 KOps/s $\color{#35bf28}+1.19\%$
test_stacked_getleaf 37.6000μs 10.5526μs 94.7632 KOps/s 94.2998 KOps/s $\color{#35bf28}+0.49\%$
test_stacked_get 36.9290μs 9.9184μs 100.8225 KOps/s 100.5936 KOps/s $\color{#35bf28}+0.23\%$
test_nested_getitemleaf 49.6730μs 11.9106μs 83.9591 KOps/s 82.5527 KOps/s $\color{#35bf28}+1.70\%$
test_nested_getitem 38.7320μs 11.3471μs 88.1284 KOps/s 86.8046 KOps/s $\color{#35bf28}+1.52\%$
test_stacked_getitemleaf 32.5910μs 11.9695μs 83.5458 KOps/s 83.6439 KOps/s $\color{#d91a1a}-0.12\%$
test_stacked_getitem 34.8760μs 11.2546μs 88.8523 KOps/s 87.6650 KOps/s $\color{#35bf28}+1.35\%$
test_lock_nested 0.6940ms 0.3300ms 3.0306 KOps/s 3.0029 KOps/s $\color{#35bf28}+0.92\%$
test_lock_stack_nested 0.5282ms 0.2928ms 3.4149 KOps/s 3.3319 KOps/s $\color{#35bf28}+2.49\%$
test_unlock_nested 76.9517ms 0.4095ms 2.4420 KOps/s 2.4183 KOps/s $\color{#35bf28}+0.98\%$
test_unlock_stack_nested 0.4943ms 0.3021ms 3.3104 KOps/s 3.2416 KOps/s $\color{#35bf28}+2.12\%$
test_flatten_speed 0.6697ms 0.3584ms 2.7900 KOps/s 2.7340 KOps/s $\color{#35bf28}+2.05\%$
test_unflatten_speed 0.7984ms 0.4553ms 2.1963 KOps/s 2.1973 KOps/s $\color{#d91a1a}-0.04\%$
test_common_ops 3.8587ms 0.6529ms 1.5316 KOps/s 1.4726 KOps/s $\color{#35bf28}+4.00\%$
test_creation 18.8150μs 1.8748μs 533.3774 KOps/s 524.9869 KOps/s $\color{#35bf28}+1.60\%$
test_creation_empty 24.0550μs 8.6640μs 115.4203 KOps/s 111.0387 KOps/s $\color{#35bf28}+3.95\%$
test_creation_nested_1 35.7670μs 11.3332μs 88.2362 KOps/s 85.6662 KOps/s $\color{#35bf28}+3.00\%$
test_creation_nested_2 41.0660μs 14.4897μs 69.0144 KOps/s 67.1171 KOps/s $\color{#35bf28}+2.83\%$
test_clone 1.4283ms 12.8330μs 77.9243 KOps/s 76.4794 KOps/s $\color{#35bf28}+1.89\%$
test_getitem[int] 31.1280μs 11.0567μs 90.4425 KOps/s 90.1742 KOps/s $\color{#35bf28}+0.30\%$
test_getitem[slice_int] 64.6000μs 22.3876μs 44.6676 KOps/s 43.3412 KOps/s $\color{#35bf28}+3.06\%$
test_getitem[range] 0.1256ms 41.9495μs 23.8382 KOps/s 24.4231 KOps/s $\color{#d91a1a}-2.39\%$
test_getitem[tuple] 52.0070μs 18.1148μs 55.2035 KOps/s 52.3193 KOps/s $\textbf{\color{#35bf28}+5.51\%}$
test_getitem[list] 0.1411ms 36.0352μs 27.7506 KOps/s 27.0920 KOps/s $\color{#35bf28}+2.43\%$
test_setitem_dim[int] 45.5350μs 27.0684μs 36.9435 KOps/s 34.7113 KOps/s $\textbf{\color{#35bf28}+6.43\%}$
test_setitem_dim[slice_int] 96.9000μs 51.5685μs 19.3917 KOps/s 18.1591 KOps/s $\textbf{\color{#35bf28}+6.79\%}$
test_setitem_dim[range] 99.1650μs 70.3564μs 14.2133 KOps/s 13.9967 KOps/s $\color{#35bf28}+1.55\%$
test_setitem_dim[tuple] 59.6910μs 41.1420μs 24.3061 KOps/s 22.8797 KOps/s $\textbf{\color{#35bf28}+6.23\%}$
test_setitem 68.5680μs 18.2929μs 54.6660 KOps/s 51.6777 KOps/s $\textbf{\color{#35bf28}+5.78\%}$
test_set 80.8510μs 17.8825μs 55.9206 KOps/s 54.4271 KOps/s $\color{#35bf28}+2.74\%$
test_set_shared 3.9820ms 0.1381ms 7.2392 KOps/s 7.0934 KOps/s $\color{#35bf28}+2.06\%$
test_update 0.1012ms 20.1144μs 49.7155 KOps/s 46.4685 KOps/s $\textbf{\color{#35bf28}+6.99\%}$
test_update_nested 86.2010μs 27.2773μs 36.6605 KOps/s 32.8259 KOps/s $\textbf{\color{#35bf28}+11.68\%}$
test_set_nested 63.3890μs 19.8326μs 50.4222 KOps/s 48.0322 KOps/s $\color{#35bf28}+4.98\%$
test_set_nested_new 78.3560μs 23.4701μs 42.6074 KOps/s 39.2378 KOps/s $\textbf{\color{#35bf28}+8.59\%}$
test_select 99.0040μs 36.5105μs 27.3894 KOps/s 25.2921 KOps/s $\textbf{\color{#35bf28}+8.29\%}$
test_select_nested 0.1156ms 57.2549μs 17.4657 KOps/s 17.1736 KOps/s $\color{#35bf28}+1.70\%$
test_exclude_nested 0.1803ms 0.1173ms 8.5243 KOps/s 8.3865 KOps/s $\color{#35bf28}+1.64\%$
test_empty[True] 0.8773ms 0.4000ms 2.4999 KOps/s 2.4980 KOps/s $\color{#35bf28}+0.07\%$
test_empty[False] 6.4520μs 1.0220μs 978.4730 KOps/s 950.8689 KOps/s $\color{#35bf28}+2.90\%$
test_unbind_speed 0.2802ms 0.2411ms 4.1468 KOps/s 4.1058 KOps/s $\color{#35bf28}+1.00\%$
test_unbind_speed_stack0 0.3345ms 0.2376ms 4.2081 KOps/s 4.1542 KOps/s $\color{#35bf28}+1.30\%$
test_unbind_speed_stack1 0.1195s 0.6593ms 1.5167 KOps/s 1.5149 KOps/s $\color{#35bf28}+0.12\%$
test_split 0.1117s 1.6423ms 608.9162 Ops/s 603.5326 Ops/s $\color{#35bf28}+0.89\%$
test_chunk 2.2722ms 1.4792ms 676.0569 Ops/s 679.8521 Ops/s $\color{#d91a1a}-0.56\%$
test_creation[device0] 0.1747ms 0.1012ms 9.8848 KOps/s 9.7935 KOps/s $\color{#35bf28}+0.93\%$
test_creation_from_tensor 4.0933ms 79.8954μs 12.5164 KOps/s 12.2748 KOps/s $\color{#35bf28}+1.97\%$
test_add_one[memmap_tensor0] 0.1739ms 5.6349μs 177.4654 KOps/s 185.1513 KOps/s $\color{#d91a1a}-4.15\%$
test_contiguous[memmap_tensor0] 34.1640μs 0.6359μs 1.5726 MOps/s 1.5617 MOps/s $\color{#35bf28}+0.70\%$
test_stack[memmap_tensor0] 22.0910μs 3.5942μs 278.2278 KOps/s 279.1515 KOps/s $\color{#d91a1a}-0.33\%$
test_memmaptd_index 0.9578ms 0.2367ms 4.2244 KOps/s 4.2329 KOps/s $\color{#d91a1a}-0.20\%$
test_memmaptd_index_astensor 0.6800ms 0.2958ms 3.3802 KOps/s 3.3447 KOps/s $\color{#35bf28}+1.06\%$
test_memmaptd_index_op 0.8198ms 0.5653ms 1.7690 KOps/s 1.7533 KOps/s $\color{#35bf28}+0.89\%$
test_serialize_model 0.2094s 0.1148s 8.7076 Ops/s 8.6174 Ops/s $\color{#35bf28}+1.05\%$
test_serialize_model_pickle 0.5813s 0.3790s 2.6388 Ops/s 2.6226 Ops/s $\color{#35bf28}+0.62\%$
test_serialize_weights 99.4019ms 95.3655ms 10.4860 Ops/s 10.0666 Ops/s $\color{#35bf28}+4.17\%$
test_serialize_weights_returnearly 0.2310s 0.1320s 7.5765 Ops/s 7.0817 Ops/s $\textbf{\color{#35bf28}+6.99\%}$
test_serialize_weights_pickle 0.9932s 0.5854s 1.7082 Ops/s 2.3880 Ops/s $\textbf{\color{#d91a1a}-28.47\%}$
test_serialize_weights_filesystem 95.4222ms 90.7363ms 11.0210 Ops/s 10.7371 Ops/s $\color{#35bf28}+2.64\%$
test_serialize_model_filesystem 96.3376ms 92.3473ms 10.8287 Ops/s 10.7239 Ops/s $\color{#35bf28}+0.98\%$
test_reshape_pytree 57.7880μs 20.6549μs 48.4147 KOps/s 46.2931 KOps/s $\color{#35bf28}+4.58\%$
test_reshape_td 64.1000μs 30.3850μs 32.9110 KOps/s 31.6299 KOps/s $\color{#35bf28}+4.05\%$
test_view_pytree 48.3000μs 20.3592μs 49.1179 KOps/s 46.4275 KOps/s $\textbf{\color{#35bf28}+5.79\%}$
test_view_td 0.1184s 58.3790μs 17.1295 KOps/s 15.8240 KOps/s $\textbf{\color{#35bf28}+8.25\%}$
test_unbind_pytree 52.4680μs 23.8859μs 41.8656 KOps/s 40.1723 KOps/s $\color{#35bf28}+4.22\%$
test_unbind_td 0.1121ms 35.4282μs 28.2261 KOps/s 27.9018 KOps/s $\color{#35bf28}+1.16\%$
test_split_pytree 56.0450μs 23.3412μs 42.8427 KOps/s 40.7435 KOps/s $\textbf{\color{#35bf28}+5.15\%}$
test_split_td 0.1157ms 38.6215μs 25.8923 KOps/s 25.1240 KOps/s $\color{#35bf28}+3.06\%$
test_add_pytree 64.8910μs 29.3850μs 34.0309 KOps/s 31.9599 KOps/s $\textbf{\color{#35bf28}+6.48\%}$
test_add_td 0.1015ms 49.6750μs 20.1309 KOps/s 19.0176 KOps/s $\textbf{\color{#35bf28}+5.85\%}$
test_distributed 0.1847ms 0.1022ms 9.7848 KOps/s 9.7765 KOps/s $\color{#35bf28}+0.08\%$
test_tdmodule 0.1817ms 20.5680μs 48.6192 KOps/s 46.9587 KOps/s $\color{#35bf28}+3.54\%$
test_tdmodule_dispatch 0.1721ms 40.3280μs 24.7967 KOps/s 24.9908 KOps/s $\color{#d91a1a}-0.78\%$
test_tdseq 0.1141ms 23.8358μs 41.9537 KOps/s 41.4335 KOps/s $\color{#35bf28}+1.26\%$
test_tdseq_dispatch 0.4357ms 43.9845μs 22.7353 KOps/s 22.4858 KOps/s $\color{#35bf28}+1.11\%$
test_instantiation_functorch 1.8978ms 1.3103ms 763.1724 Ops/s 755.4559 Ops/s $\color{#35bf28}+1.02\%$
test_instantiation_td 1.4534ms 0.9899ms 1.0102 KOps/s 983.9282 Ops/s $\color{#35bf28}+2.67\%$
test_exec_functorch 0.2214ms 0.1600ms 6.2501 KOps/s 6.2820 KOps/s $\color{#d91a1a}-0.51\%$
test_exec_functional_call 0.2179ms 0.1467ms 6.8151 KOps/s 6.4688 KOps/s $\textbf{\color{#35bf28}+5.35\%}$
test_exec_td 0.2631ms 0.1453ms 6.8828 KOps/s 6.6586 KOps/s $\color{#35bf28}+3.37\%$
test_exec_td_decorator 0.6407ms 0.1941ms 5.1509 KOps/s 5.0394 KOps/s $\color{#35bf28}+2.21\%$
test_vmap_mlp_speed[True-True] 0.7103ms 0.4612ms 2.1683 KOps/s 2.1360 KOps/s $\color{#35bf28}+1.51\%$
test_vmap_mlp_speed[True-False] 0.7139ms 0.4602ms 2.1728 KOps/s 2.1403 KOps/s $\color{#35bf28}+1.52\%$
test_vmap_mlp_speed[False-True] 0.6828ms 0.3766ms 2.6553 KOps/s 2.5660 KOps/s $\color{#35bf28}+3.48\%$
test_vmap_mlp_speed[False-False] 0.5831ms 0.3785ms 2.6418 KOps/s 2.5804 KOps/s $\color{#35bf28}+2.38\%$
test_vmap_mlp_speed_decorator[True-True] 1.0072ms 0.5070ms 1.9722 KOps/s 1.9488 KOps/s $\color{#35bf28}+1.20\%$
test_vmap_mlp_speed_decorator[True-False] 0.9649ms 0.5094ms 1.9630 KOps/s 1.9470 KOps/s $\color{#35bf28}+0.82\%$
test_vmap_mlp_speed_decorator[False-True] 0.6340ms 0.3956ms 2.5281 KOps/s 2.4754 KOps/s $\color{#35bf28}+2.13\%$
test_vmap_mlp_speed_decorator[False-False] 0.7015ms 0.3944ms 2.5353 KOps/s 2.4888 KOps/s $\color{#35bf28}+1.87\%$
test_to_module_speed[True] 1.4771ms 1.3848ms 722.1056 Ops/s 721.4228 Ops/s $\color{#35bf28}+0.09\%$
test_to_module_speed[False] 2.1131ms 1.3778ms 725.7983 Ops/s 732.0167 Ops/s $\color{#d91a1a}-0.85\%$
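
For anyone checking the table by hand: the Change column appears to be the relative difference between Ops and Ops on Repo HEAD, and Ops is roughly the reciprocal of Mean. Here is a minimal sketch that reproduces two CPU rows under that assumption. The function names are illustrative and are not part of the benchmark tooling.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percentage change in throughput of the PR relative to repo HEAD.

    Assumed formula, inferred from the table rows: (PR - HEAD) / HEAD * 100.
    """
    return (ops_pr - ops_head) / ops_head * 100.0


def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean run time (assumed to be 1 / mean)."""
    return 1.0 / mean_seconds


# test_plain_set_nested (CPU): 61.0692 KOps/s vs 61.9580 KOps/s on HEAD
print(f"{relative_change(61.0692, 61.9580):+.2f}%")  # -1.43%

# test_membership_stacked_nested_last (CPU): 103.9076 vs 132.5995 KOps/s
print(f"{relative_change(103.9076, 132.5995):+.2f}%")  # -21.64%

# Mean of 16.3749 microseconds corresponds to about 61.07 KOps/s
print(f"{ops_per_second(16.3749e-6) / 1e3:.2f} KOps/s")  # 61.07 KOps/s
```

Both units must match before comparing (for example, convert `Ops/s` to `KOps/s` for rows such as test_instantiation_td, where the two columns use different units).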


$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

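Total Benchmarks: 134. Improved: $\large\color{#35bf28}28$. Worsened: $\large\color{#d91a1a}3$. These counts cover only the changes shown in bold in the table, all of which are at least 5% in magnitude.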

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 0.7641ms 13.5440μs 73.8333 KOps/s 69.9380 KOps/s $\textbf{\color{#35bf28}+5.57\%}$
test_plain_set_stack_nested 32.5400μs 13.6511μs 73.2541 KOps/s 68.6942 KOps/s $\textbf{\color{#35bf28}+6.64\%}$
test_plain_set_nested_inplace 41.4300μs 14.9247μs 67.0031 KOps/s 63.2544 KOps/s $\textbf{\color{#35bf28}+5.93\%}$
test_plain_set_stack_nested_inplace 32.1110μs 14.9405μs 66.9324 KOps/s 63.1931 KOps/s $\textbf{\color{#35bf28}+5.92\%}$
test_items 19.9300μs 4.7879μs 208.8607 KOps/s 210.9928 KOps/s $\color{#d91a1a}-1.01\%$
test_items_nested 0.4086ms 0.3372ms 2.9652 KOps/s 2.9298 KOps/s $\color{#35bf28}+1.21\%$
test_items_nested_locked 0.3646ms 0.3412ms 2.9308 KOps/s 2.9111 KOps/s $\color{#35bf28}+0.68\%$
test_items_nested_leaf 0.2186ms 0.1998ms 5.0054 KOps/s 4.9534 KOps/s $\color{#35bf28}+1.05\%$
test_items_stack_nested 0.3909ms 0.3403ms 2.9385 KOps/s 2.9131 KOps/s $\color{#35bf28}+0.87\%$
test_items_stack_nested_leaf 0.2390ms 0.1997ms 5.0079 KOps/s 4.9780 KOps/s $\color{#35bf28}+0.60\%$
test_items_stack_nested_locked 0.3800ms 0.3422ms 2.9222 KOps/s 2.9118 KOps/s $\color{#35bf28}+0.36\%$
test_keys 22.2810μs 4.5858μs 218.0637 KOps/s 216.8913 KOps/s $\color{#35bf28}+0.54\%$
test_keys_nested 46.0634ms 0.1001ms 9.9897 KOps/s 10.4616 KOps/s $\color{#d91a1a}-4.51\%$
test_keys_nested_locked 0.1526ms 97.7539μs 10.2298 KOps/s 10.1996 KOps/s $\color{#35bf28}+0.30\%$
test_keys_nested_leaf 0.1508ms 77.1270μs 12.9656 KOps/s 12.7755 KOps/s $\color{#35bf28}+1.49\%$
test_keys_stack_nested 0.1225ms 93.6761μs 10.6751 KOps/s 10.5000 KOps/s $\color{#35bf28}+1.67\%$
test_keys_stack_nested_leaf 0.1006ms 78.0683μs 12.8093 KOps/s 12.8274 KOps/s $\color{#d91a1a}-0.14\%$
test_keys_stack_nested_locked 0.1311ms 98.2117μs 10.1821 KOps/s 9.9606 KOps/s $\color{#35bf28}+2.22\%$
test_values 18.5137μs 1.8839μs 530.8015 KOps/s 527.2698 KOps/s $\color{#35bf28}+0.67\%$
test_values_nested 63.1210μs 45.6480μs 21.9068 KOps/s 21.9437 KOps/s $\color{#d91a1a}-0.17\%$
test_values_nested_locked 93.0010μs 48.0390μs 20.8164 KOps/s 20.8876 KOps/s $\color{#d91a1a}-0.34\%$
test_values_nested_leaf 65.5320μs 40.0714μs 24.9554 KOps/s 25.2353 KOps/s $\color{#d91a1a}-1.11\%$
test_values_stack_nested 77.8700μs 46.6377μs 21.4419 KOps/s 21.5638 KOps/s $\color{#d91a1a}-0.57\%$
test_values_stack_nested_leaf 66.6910μs 40.5254μs 24.6759 KOps/s 25.0295 KOps/s $\color{#d91a1a}-1.41\%$
test_values_stack_nested_locked 72.5920μs 48.5727μs 20.5877 KOps/s 20.5256 KOps/s $\color{#35bf28}+0.30\%$
test_membership 3.9440μs 0.9446μs 1.0586 MOps/s 1.0416 MOps/s $\color{#35bf28}+1.63\%$
test_membership_nested 24.8410μs 2.9016μs 344.6333 KOps/s 347.8517 KOps/s $\color{#d91a1a}-0.93\%$
test_membership_nested_leaf 18.2555μs 2.8384μs 352.3109 KOps/s 346.7337 KOps/s $\color{#35bf28}+1.61\%$
test_membership_stacked_nested 24.6710μs 2.8908μs 345.9263 KOps/s 347.5393 KOps/s $\color{#d91a1a}-0.46\%$
test_membership_stacked_nested_leaf 36.8900μs 2.9086μs 343.8086 KOps/s 342.5794 KOps/s $\color{#35bf28}+0.36\%$
test_membership_nested_last 20.2210μs 5.3291μs 187.6495 KOps/s 188.5812 KOps/s $\color{#d91a1a}-0.49\%$
test_membership_nested_leaf_last 45.4510μs 5.2793μs 189.4175 KOps/s 187.6558 KOps/s $\color{#35bf28}+0.94\%$
test_membership_stacked_nested_last 42.4220μs 6.1280μs 163.1845 KOps/s 108.4949 KOps/s $\textbf{\color{#35bf28}+50.41\%}$
test_membership_stacked_nested_leaf_last 40.2200μs 6.0422μs 165.5030 KOps/s 108.7297 KOps/s $\textbf{\color{#35bf28}+52.22\%}$
test_nested_getleaf 37.1300μs 8.3781μs 119.3591 KOps/s 118.2863 KOps/s $\color{#35bf28}+0.91\%$
test_nested_get 29.3000μs 7.8909μs 126.7287 KOps/s 125.9277 KOps/s $\color{#35bf28}+0.64\%$
test_stacked_getleaf 26.9500μs 8.4081μs 118.9330 KOps/s 118.5441 KOps/s $\color{#35bf28}+0.33\%$
test_stacked_get 30.0110μs 7.9358μs 126.0110 KOps/s 125.5489 KOps/s $\color{#35bf28}+0.37\%$
test_nested_getitemleaf 43.1600μs 9.7105μs 102.9815 KOps/s 102.0100 KOps/s $\color{#35bf28}+0.95\%$
test_nested_getitem 36.7010μs 9.3017μs 107.5073 KOps/s 106.9695 KOps/s $\color{#35bf28}+0.50\%$
test_stacked_getitemleaf 33.0210μs 9.7368μs 102.7034 KOps/s 101.7847 KOps/s $\color{#35bf28}+0.90\%$
test_stacked_getitem 42.9720μs 9.3075μs 107.4403 KOps/s 106.8830 KOps/s $\color{#35bf28}+0.52\%$
test_lock_nested 2.1006ms 0.3520ms 2.8408 KOps/s 2.8173 KOps/s $\color{#35bf28}+0.83\%$
test_lock_stack_nested 0.3647ms 0.3069ms 3.2580 KOps/s 3.3052 KOps/s $\color{#d91a1a}-1.43\%$
test_unlock_nested 0.7317ms 0.3515ms 2.8448 KOps/s 2.8701 KOps/s $\color{#d91a1a}-0.88\%$
test_unlock_stack_nested 0.3558ms 0.3157ms 3.1677 KOps/s 3.2027 KOps/s $\color{#d91a1a}-1.10\%$
test_flatten_speed 0.4775ms 0.2605ms 3.8391 KOps/s 3.8372 KOps/s $\color{#35bf28}+0.05\%$
test_unflatten_speed 0.3931ms 0.3569ms 2.8015 KOps/s 2.8330 KOps/s $\color{#d91a1a}-1.11\%$
test_common_ops 1.0306ms 0.6020ms 1.6611 KOps/s 1.5971 KOps/s $\color{#35bf28}+4.01\%$
test_creation 15.0900μs 1.5633μs 639.6745 KOps/s 636.0905 KOps/s $\color{#35bf28}+0.56\%$
test_creation_empty 39.8500μs 8.2950μs 120.5548 KOps/s 102.1440 KOps/s $\textbf{\color{#35bf28}+18.02\%}$
test_creation_nested_1 72.4620μs 10.0210μs 99.7902 KOps/s 86.7409 KOps/s $\textbf{\color{#35bf28}+15.04\%}$
test_creation_nested_2 27.3810μs 12.4501μs 80.3207 KOps/s 71.2992 KOps/s $\textbf{\color{#35bf28}+12.65\%}$
test_clone 33.5000μs 13.9478μs 71.6958 KOps/s 74.2525 KOps/s $\color{#d91a1a}-3.44\%$
test_getitem[int] 51.6710μs 10.7567μs 92.9649 KOps/s 93.7777 KOps/s $\color{#d91a1a}-0.87\%$
test_getitem[slice_int] 42.6310μs 20.8434μs 47.9769 KOps/s 48.0160 KOps/s $\color{#d91a1a}-0.08\%$
test_getitem[range] 66.8810μs 50.3668μs 19.8543 KOps/s 18.7165 KOps/s $\textbf{\color{#35bf28}+6.08\%}$
test_getitem[tuple] 57.2810μs 18.9783μs 52.6917 KOps/s 53.4325 KOps/s $\color{#d91a1a}-1.39\%$
test_getitem[list] 0.1304ms 36.7984μs 27.1751 KOps/s 27.1474 KOps/s $\color{#35bf28}+0.10\%$
test_setitem_dim[int] 42.0110μs 26.3339μs 37.9739 KOps/s 35.8577 KOps/s $\textbf{\color{#35bf28}+5.90\%}$
test_setitem_dim[slice_int] 64.4320μs 46.9276μs 21.3094 KOps/s 19.9135 KOps/s $\textbf{\color{#35bf28}+7.01\%}$
test_setitem_dim[range] 89.3320μs 67.2526μs 14.8693 KOps/s 14.5317 KOps/s $\color{#35bf28}+2.32\%$
test_setitem_dim[tuple] 66.2510μs 41.0494μs 24.3609 KOps/s 23.6686 KOps/s $\color{#35bf28}+2.92\%$
test_setitem 56.8310μs 18.5393μs 53.9393 KOps/s 51.9098 KOps/s $\color{#35bf28}+3.91\%$
test_set 54.6410μs 18.3410μs 54.5228 KOps/s 53.9159 KOps/s $\color{#35bf28}+1.13\%$
test_set_shared 0.1319s 0.1271ms 7.8674 KOps/s 7.7450 KOps/s $\color{#35bf28}+1.58\%$
test_update 66.9420μs 20.4625μs 48.8698 KOps/s 45.4216 KOps/s $\textbf{\color{#35bf28}+7.59\%}$
test_update_nested 64.2410μs 27.4171μs 36.4736 KOps/s 34.9361 KOps/s $\color{#35bf28}+4.40\%$
test_set_nested 69.2210μs 19.5751μs 51.0852 KOps/s 50.1961 KOps/s $\color{#35bf28}+1.77\%$
test_set_nested_new 66.4120μs 23.1389μs 43.2172 KOps/s 44.6246 KOps/s $\color{#d91a1a}-3.15\%$
test_select 78.5220μs 35.1491μs 28.4502 KOps/s 27.9600 KOps/s $\color{#35bf28}+1.75\%$
test_select_nested 89.2520μs 53.8918μs 18.5557 KOps/s 18.3389 KOps/s $\color{#35bf28}+1.18\%$
test_exclude_nested 0.2111ms 0.1158ms 8.6381 KOps/s 8.7619 KOps/s $\color{#d91a1a}-1.41\%$
test_empty[True] 1.0793ms 0.3905ms 2.5608 KOps/s 2.5533 KOps/s $\color{#35bf28}+0.30\%$
test_empty[False] 2.2271μs 0.8494μs 1.1773 MOps/s 1.1537 MOps/s $\color{#35bf28}+2.05\%$
test_to 72.2100μs 53.0048μs 18.8662 KOps/s 17.3145 KOps/s $\textbf{\color{#35bf28}+8.96\%}$
test_to_nonblocking 66.6920μs 35.4567μs 28.2034 KOps/s 28.5494 KOps/s $\color{#d91a1a}-1.21\%$
test_unbind_speed 0.3037ms 0.2703ms 3.6996 KOps/s 3.8274 KOps/s $\color{#d91a1a}-3.34\%$
test_unbind_speed_stack0 0.3538ms 0.2697ms 3.7081 KOps/s 3.8102 KOps/s $\color{#d91a1a}-2.68\%$
test_unbind_speed_stack1 0.7110ms 0.6884ms 1.4526 KOps/s 1.3111 KOps/s $\textbf{\color{#35bf28}+10.79\%}$
test_split 1.5770ms 1.5070ms 663.5608 Ops/s 663.7582 Ops/s $\color{#d91a1a}-0.03\%$
test_chunk 0.1331s 1.7205ms 581.2345 Ops/s 665.0877 Ops/s $\textbf{\color{#d91a1a}-12.61\%}$
test_creation[device0] 0.1241ms 77.6556μs 12.8774 KOps/s 13.8628 KOps/s $\textbf{\color{#d91a1a}-7.11\%}$
test_creation_from_tensor 0.1452ms 58.1733μs 17.1900 KOps/s 17.8574 KOps/s $\color{#d91a1a}-3.74\%$
test_add_one[memmap_tensor0] 0.1138ms 7.1187μs 140.4755 KOps/s 138.2935 KOps/s $\color{#35bf28}+1.58\%$
test_contiguous[memmap_tensor0] 26.8900μs 0.6279μs 1.5927 MOps/s 1.5574 MOps/s $\color{#35bf28}+2.27\%$
test_stack[memmap_tensor0] 46.9620μs 4.5917μs 217.7864 KOps/s 228.5165 KOps/s $\color{#d91a1a}-4.70\%$
test_memmaptd_index 1.0869ms 0.2658ms 3.7620 KOps/s 3.8621 KOps/s $\color{#d91a1a}-2.59\%$
test_memmaptd_index_astensor 0.6343ms 0.3207ms 3.1178 KOps/s 3.1510 KOps/s $\color{#d91a1a}-1.05\%$
test_memmaptd_index_op 0.8963ms 0.6249ms 1.6003 KOps/s 1.5671 KOps/s $\color{#35bf28}+2.12\%$
test_serialize_model 92.6643ms 88.5790ms 11.2894 Ops/s 8.9798 Ops/s $\textbf{\color{#35bf28}+25.72\%}$
test_serialize_model_pickle 1.3533s 1.2363s 0.8089 Ops/s 0.8085 Ops/s $\color{#35bf28}+0.05\%$
test_serialize_weights 90.7596ms 86.4875ms 11.5624 Ops/s 10.8434 Ops/s $\textbf{\color{#35bf28}+6.63\%}$
test_serialize_weights_returnearly 0.3325s 74.0837ms 13.4982 Ops/s 11.7585 Ops/s $\textbf{\color{#35bf28}+14.80\%}$
test_serialize_weights_pickle 1.3544s 1.2485s 0.8010 Ops/s 0.8094 Ops/s $\color{#d91a1a}-1.05\%$
test_reshape_pytree 56.3010μs 24.8879μs 40.1802 KOps/s 40.7495 KOps/s $\color{#d91a1a}-1.40\%$
test_reshape_td 55.5410μs 30.9682μs 32.2912 KOps/s 32.3960 KOps/s $\color{#d91a1a}-0.32\%$
test_view_pytree 52.1010μs 24.5034μs 40.8107 KOps/s 40.8623 KOps/s $\color{#d91a1a}-0.13\%$
test_view_td 0.1310s 55.6893μs 17.9568 KOps/s 21.8615 KOps/s $\textbf{\color{#d91a1a}-17.86\%}$
test_unbind_pytree 61.7010μs 30.2057μs 33.1063 KOps/s 33.0727 KOps/s $\color{#35bf28}+0.10\%$
test_unbind_td 0.3221ms 40.2161μs 24.8657 KOps/s 25.1915 KOps/s $\color{#d91a1a}-1.29\%$
test_split_pytree 50.9510μs 28.1925μs 35.4705 KOps/s 34.8966 KOps/s $\color{#35bf28}+1.64\%$
test_split_td 0.1084ms 38.0418μs 26.2869 KOps/s 25.6861 KOps/s $\color{#35bf28}+2.34\%$
test_add_pytree 62.8920μs 36.5555μs 27.3557 KOps/s 26.7681 KOps/s $\color{#35bf28}+2.19\%$
test_add_td 82.8620μs 50.8771μs 19.6552 KOps/s 18.4770 KOps/s $\textbf{\color{#35bf28}+6.38\%}$
test_distributed 2.3635ms 80.8410μs 12.3700 KOps/s 10.7188 KOps/s $\textbf{\color{#35bf28}+15.40\%}$
test_tdmodule 62.4310μs 18.3156μs 54.5982 KOps/s 53.3268 KOps/s $\color{#35bf28}+2.38\%$
test_tdmodule_dispatch 0.2157ms 36.6716μs 27.2690 KOps/s 25.5772 KOps/s $\textbf{\color{#35bf28}+6.61\%}$
test_tdseq 42.6600μs 20.5516μs 48.6579 KOps/s 45.9331 KOps/s $\textbf{\color{#35bf28}+5.93\%}$
test_tdseq_dispatch 62.7310μs 38.8393μs 25.7471 KOps/s 24.5122 KOps/s $\textbf{\color{#35bf28}+5.04\%}$
test_instantiation_functorch 1.7597ms 1.6551ms 604.1846 Ops/s 599.3502 Ops/s $\color{#35bf28}+0.81\%$
test_instantiation_td 1.7062ms 1.1598ms 862.2502 Ops/s 857.7037 Ops/s $\color{#35bf28}+0.53\%$
test_exec_functorch 0.1816ms 0.1596ms 6.2659 KOps/s 6.2709 KOps/s $\color{#d91a1a}-0.08\%$
test_exec_functional_call 0.2207ms 0.1612ms 6.2017 KOps/s 6.2738 KOps/s $\color{#d91a1a}-1.15\%$
test_exec_td 0.1770ms 0.1474ms 6.7850 KOps/s 6.6404 KOps/s $\color{#35bf28}+2.18\%$
test_exec_td_decorator 0.7689ms 0.1962ms 5.0964 KOps/s 5.0839 KOps/s $\color{#35bf28}+0.25\%$
test_vmap_mlp_speed[True-True] 0.6859ms 0.5965ms 1.6765 KOps/s 1.5750 KOps/s $\textbf{\color{#35bf28}+6.44\%}$
test_vmap_mlp_speed[True-False] 0.7494ms 0.6053ms 1.6519 KOps/s 1.5743 KOps/s $\color{#35bf28}+4.93\%$
test_vmap_mlp_speed[False-True] 0.5912ms 0.5264ms 1.8998 KOps/s 1.7927 KOps/s $\textbf{\color{#35bf28}+5.97\%}$
test_vmap_mlp_speed[False-False] 0.5789ms 0.5287ms 1.8913 KOps/s 1.8013 KOps/s $\textbf{\color{#35bf28}+5.00\%}$
test_vmap_mlp_speed_decorator[True-True] 0.7322ms 0.6405ms 1.5612 KOps/s 1.4997 KOps/s $\color{#35bf28}+4.10\%$
test_vmap_mlp_speed_decorator[True-False] 0.8656ms 0.6433ms 1.5545 KOps/s 1.4451 KOps/s $\textbf{\color{#35bf28}+7.57\%}$
test_vmap_mlp_speed_decorator[False-True] 0.6775ms 0.5524ms 1.8104 KOps/s 1.7106 KOps/s $\textbf{\color{#35bf28}+5.83\%}$
test_vmap_mlp_speed_decorator[False-False] 0.8468ms 0.5478ms 1.8255 KOps/s 1.7717 KOps/s $\color{#35bf28}+3.03\%$
test_vmap_transformer_speed[True-True] 8.2609ms 8.0654ms 123.9863 Ops/s 122.9843 Ops/s $\color{#35bf28}+0.81\%$
test_vmap_transformer_speed[True-False] 8.3014ms 8.0400ms 124.3785 Ops/s 123.6544 Ops/s $\color{#35bf28}+0.59\%$
test_vmap_transformer_speed[False-True] 8.1447ms 7.9872ms 125.2007 Ops/s 124.5661 Ops/s $\color{#35bf28}+0.51\%$
test_vmap_transformer_speed[False-False] 8.2751ms 7.9972ms 125.0444 Ops/s 124.5428 Ops/s $\color{#35bf28}+0.40\%$
test_vmap_transformer_speed_decorator[True-True] 19.4455ms 19.2050ms 52.0698 Ops/s 51.5052 Ops/s $\color{#35bf28}+1.10\%$
test_vmap_transformer_speed_decorator[True-False] 19.7489ms 19.2238ms 52.0188 Ops/s 51.7326 Ops/s $\color{#35bf28}+0.55\%$
test_vmap_transformer_speed_decorator[False-True] 19.4404ms 18.8594ms 53.0240 Ops/s 52.7744 Ops/s $\color{#35bf28}+0.47\%$
test_vmap_transformer_speed_decorator[False-False] 19.0047ms 18.7590ms 53.3077 Ops/s 52.8523 Ops/s $\color{#35bf28}+0.86\%$
test_to_module_speed[True] 2.9417ms 1.2459ms 802.6521 Ops/s 800.1434 Ops/s $\color{#35bf28}+0.31\%$
test_to_module_speed[False] 1.3115ms 1.2109ms 825.8025 Ops/s 816.9872 Ops/s $\color{#35bf28}+1.08\%$

@vmoens added the bug label Feb 24, 2024
@vmoens merged commit f09f20d into main Feb 24, 2024
47 of 48 checks passed
@vmoens deleted the no-names-in-gather branch February 24, 2024 02:04