[Feature] Store non tensor stacks in a single json #711

Merged
merged 12 commits into from
Mar 18, 2024

Contributor

@vmoens vmoens commented Mar 15, 2024

This is slightly BC-breaking for non-tensor data, as we're still trying to figure out the right API for it.
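To make the idea in the title concrete, here is a minimal sketch of writing a whole stack of non-tensor values to one JSON file instead of one file per element. It uses only the Python standard library; `save_stack`, `load_stack`, the `"data"` key, and the `stack.json` file name are hypothetical names chosen for illustration and are not taken from this PR or from the tensordict API.

```python
import json
import tempfile
from pathlib import Path


def save_stack(values, path):
    """Write every element of a non-tensor stack into a single JSON file.

    Hypothetical helper: the on-disk layout used by the PR is not shown here.
    """
    Path(path).write_text(json.dumps({"data": list(values)}))


def load_stack(path):
    """Read the stack back as a plain Python list."""
    return json.loads(Path(path).read_text())["data"]


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "stack.json"
    save_stack(["a string", 1, {"nested": True}], target)
    restored = load_stack(target)
    # Count the files written for the whole stack.
    n_files = len(list(Path(tmp).iterdir()))
```

The values must be JSON-serializable for this to work, which is one reason a change of this kind can alter what is accepted as non-tensor data.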

@facebook-github-bot facebook-github-bot added the CLA Signed label Mar 15, 2024

github-actions bot commented Mar 15, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 127. Improved: $\large\color{#35bf28}5$. Worsened: $\large\color{#d91a1a}4$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_plain_set_nested | 36.4480μs | 15.8043μs | 63.2739 KOps/s | 62.4281 KOps/s | $\color{#35bf28}+1.35\%$ |
| test_plain_set_stack_nested | 40.8270μs | 16.1954μs | 61.7458 KOps/s | 62.1366 KOps/s | $\color{#d91a1a}-0.63\%$ |
| test_plain_set_nested_inplace | 52.9400μs | 18.0235μs | 55.4831 KOps/s | 54.7810 KOps/s | $\color{#35bf28}+1.28\%$ |
| test_plain_set_stack_nested_inplace | 53.3410μs | 18.1942μs | 54.9626 KOps/s | 54.1597 KOps/s | $\color{#35bf28}+1.48\%$ |
| test_items | 22.3320μs | 2.4351μs | 410.6637 KOps/s | 409.1142 KOps/s | $\color{#35bf28}+0.38\%$ |
| test_items_nested | 0.8223ms | 0.2741ms | 3.6489 KOps/s | 3.6511 KOps/s | $\color{#d91a1a}-0.06\%$ |
| test_items_nested_locked | 0.4661ms | 0.2740ms | 3.6494 KOps/s | 3.6294 KOps/s | $\color{#35bf28}+0.55\%$ |
| test_items_nested_leaf | 0.5324ms | 0.1678ms | 5.9608 KOps/s | 5.8748 KOps/s | $\color{#35bf28}+1.46\%$ |
| test_items_stack_nested | 0.8453ms | 0.2769ms | 3.6113 KOps/s | 3.6234 KOps/s | $\color{#d91a1a}-0.33\%$ |
| test_items_stack_nested_leaf | 0.3489ms | 0.1669ms | 5.9927 KOps/s | 5.9357 KOps/s | $\color{#35bf28}+0.96\%$ |
| test_items_stack_nested_locked | 0.4361ms | 0.2788ms | 3.5865 KOps/s | 3.6334 KOps/s | $\color{#d91a1a}-1.29\%$ |
| test_keys | 23.9350μs | 3.8566μs | 259.2978 KOps/s | 262.9954 KOps/s | $\color{#d91a1a}-1.41\%$ |
| test_keys_nested | 2.1996ms | 0.1433ms | 6.9772 KOps/s | 6.9373 KOps/s | $\color{#35bf28}+0.57\%$ |
| test_keys_nested_locked | 0.2458ms | 0.1484ms | 6.7399 KOps/s | 6.7751 KOps/s | $\color{#d91a1a}-0.52\%$ |
| test_keys_nested_leaf | 32.5339ms | 0.1303ms | 7.6755 KOps/s | 7.9437 KOps/s | $\color{#d91a1a}-3.38\%$ |
| test_keys_stack_nested | 0.2939ms | 0.1457ms | 6.8656 KOps/s | 6.7991 KOps/s | $\color{#35bf28}+0.98\%$ |
| test_keys_stack_nested_leaf | 0.2371ms | 0.1270ms | 7.8740 KOps/s | 7.8843 KOps/s | $\color{#d91a1a}-0.13\%$ |
| test_keys_stack_nested_locked | 0.3077ms | 0.1498ms | 6.6753 KOps/s | 6.6187 KOps/s | $\color{#35bf28}+0.85\%$ |
| test_values | 7.3098μs | 1.1297μs | 885.1984 KOps/s | 863.5295 KOps/s | $\color{#35bf28}+2.51\%$ |
| test_values_nested | 0.1080ms | 50.9599μs | 19.6233 KOps/s | 19.8492 KOps/s | $\color{#d91a1a}-1.14\%$ |
| test_values_nested_locked | 0.1022ms | 50.9892μs | 19.6120 KOps/s | 19.7333 KOps/s | $\color{#d91a1a}-0.61\%$ |
| test_values_nested_leaf | 93.8560μs | 45.8848μs | 21.7937 KOps/s | 21.9484 KOps/s | $\color{#d91a1a}-0.70\%$ |
| test_values_stack_nested | 0.1039ms | 51.6677μs | 19.3545 KOps/s | 19.2895 KOps/s | $\color{#35bf28}+0.34\%$ |
| test_values_stack_nested_leaf | 0.1005ms | 44.9425μs | 22.2506 KOps/s | 21.8205 KOps/s | $\color{#35bf28}+1.97\%$ |
| test_values_stack_nested_locked | 96.4820μs | 51.1440μs | 19.5526 KOps/s | 19.4794 KOps/s | $\color{#35bf28}+0.38\%$ |
| test_membership | 21.4810μs | 1.3321μs | 750.7053 KOps/s | 751.0741 KOps/s | $\color{#d91a1a}-0.05\%$ |
| test_membership_nested | 25.2780μs | 3.3649μs | 297.1898 KOps/s | 292.8493 KOps/s | $\color{#35bf28}+1.48\%$ |
| test_membership_nested_leaf | 17.7840μs | 3.4232μs | 292.1236 KOps/s | 293.7599 KOps/s | $\color{#d91a1a}-0.56\%$ |
| test_membership_stacked_nested | 19.8670μs | 3.4053μs | 293.6605 KOps/s | 280.6825 KOps/s | $\color{#35bf28}+4.62\%$ |
| test_membership_stacked_nested_leaf | 26.3690μs | 3.3642μs | 297.2442 KOps/s | 293.8735 KOps/s | $\color{#35bf28}+1.15\%$ |
| test_membership_nested_last | 38.9160μs | 4.1942μs | 238.4233 KOps/s | 237.8663 KOps/s | $\color{#35bf28}+0.23\%$ |
| test_membership_nested_leaf_last | 26.9500μs | 4.1535μs | 240.7628 KOps/s | 237.5779 KOps/s | $\color{#35bf28}+1.34\%$ |
| test_membership_stacked_nested_last | 29.0650μs | 8.6205μs | 116.0023 KOps/s | 210.3603 KOps/s | $\textbf{\color{#d91a1a}-44.86\%}$ |
| test_membership_stacked_nested_leaf_last | 45.0350μs | 8.7154μs | 114.7392 KOps/s | 208.7821 KOps/s | $\textbf{\color{#d91a1a}-45.04\%}$ |
| test_nested_getleaf | 31.7000μs | 10.5203μs | 95.0544 KOps/s | 93.8291 KOps/s | $\color{#35bf28}+1.31\%$ |
| test_nested_get | 53.6720μs | 9.9213μs | 100.7929 KOps/s | 97.8588 KOps/s | $\color{#35bf28}+3.00\%$ |
| test_stacked_getleaf | 43.8020μs | 10.4459μs | 95.7312 KOps/s | 95.0919 KOps/s | $\color{#35bf28}+0.67\%$ |
| test_stacked_get | 41.1880μs | 9.9371μs | 100.6327 KOps/s | 100.2359 KOps/s | $\color{#35bf28}+0.40\%$ |
| test_nested_getitemleaf | 31.1980μs | 11.0325μs | 90.6411 KOps/s | 90.6514 KOps/s | $\color{#d91a1a}-0.01\%$ |
| test_nested_getitem | 42.2800μs | 10.0658μs | 99.3464 KOps/s | 97.1480 KOps/s | $\color{#35bf28}+2.26\%$ |
| test_stacked_getitemleaf | 48.7320μs | 10.7886μs | 92.6904 KOps/s | 91.0860 KOps/s | $\color{#35bf28}+1.76\%$ |
| test_stacked_getitem | 48.8320μs | 9.9997μs | 100.0029 KOps/s | 99.3059 KOps/s | $\color{#35bf28}+0.70\%$ |
| test_lock_nested | 0.6962ms | 0.3309ms | 3.0225 KOps/s | 2.9681 KOps/s | $\color{#35bf28}+1.83\%$ |
| test_lock_stack_nested | 0.3374ms | 0.2895ms | 3.4540 KOps/s | 3.3403 KOps/s | $\color{#35bf28}+3.40\%$ |
| test_unlock_nested | 82.5033ms | 0.4158ms | 2.4050 KOps/s | 2.4117 KOps/s | $\color{#d91a1a}-0.28\%$ |
| test_unlock_stack_nested | 0.6280ms | 0.2980ms | 3.3552 KOps/s | 3.2617 KOps/s | $\color{#35bf28}+2.87\%$ |
| test_flatten_speed | 0.5710ms | 0.2591ms | 3.8595 KOps/s | 3.7521 KOps/s | $\color{#35bf28}+2.86\%$ |
| test_unflatten_speed | 0.5276ms | 0.3968ms | 2.5201 KOps/s | 2.4729 KOps/s | $\color{#35bf28}+1.91\%$ |
| test_common_ops | 4.2032ms | 0.6667ms | 1.4999 KOps/s | 1.5308 KOps/s | $\color{#d91a1a}-2.02\%$ |
| test_creation | 13.1550μs | 1.7903μs | 558.5761 KOps/s | 545.7989 KOps/s | $\color{#35bf28}+2.34\%$ |
| test_creation_empty | 24.3770μs | 9.1850μs | 108.8737 KOps/s | 112.4051 KOps/s | $\color{#d91a1a}-3.14\%$ |
| test_creation_nested_1 | 32.2410μs | 11.7201μs | 85.3237 KOps/s | 87.6508 KOps/s | $\color{#d91a1a}-2.66\%$ |
| test_creation_nested_2 | 38.5720μs | 15.0965μs | 66.2406 KOps/s | 68.6818 KOps/s | $\color{#d91a1a}-3.55\%$ |
| test_clone | 64.1910μs | 13.4530μs | 74.3330 KOps/s | 77.5163 KOps/s | $\color{#d91a1a}-4.11\%$ |
| test_getitem[int] | 27.3920μs | 10.9335μs | 91.4621 KOps/s | 92.7174 KOps/s | $\color{#d91a1a}-1.35\%$ |
| test_getitem[slice_int] | 53.8920μs | 22.0049μs | 45.4444 KOps/s | 43.4662 KOps/s | $\color{#35bf28}+4.55\%$ |
| test_getitem[range] | 0.1461ms | 40.8466μs | 24.4818 KOps/s | 23.9252 KOps/s | $\color{#35bf28}+2.33\%$ |
| test_getitem[tuple] | 42.6000μs | 18.1542μs | 55.0838 KOps/s | 54.1261 KOps/s | $\color{#35bf28}+1.77\%$ |
| test_getitem[list] | 0.1359ms | 35.3638μs | 28.2775 KOps/s | 26.9814 KOps/s | $\color{#35bf28}+4.80\%$ |
| test_setitem_dim[int] | 75.4020μs | 33.7008μs | 29.6729 KOps/s | 31.3556 KOps/s | $\textbf{\color{#d91a1a}-5.37\%}$ |
| test_setitem_dim[slice_int] | 93.4360μs | 59.4315μs | 16.8261 KOps/s | 16.9972 KOps/s | $\color{#d91a1a}-1.01\%$ |
| test_setitem_dim[range] | 0.1817ms | 77.4397μs | 12.9133 KOps/s | 13.1449 KOps/s | $\color{#d91a1a}-1.76\%$ |
| test_setitem_dim[tuple] | 76.3740μs | 47.2685μs | 21.1557 KOps/s | 20.9040 KOps/s | $\color{#35bf28}+1.20\%$ |
| test_setitem | 96.0210μs | 18.9854μs | 52.6721 KOps/s | 53.2002 KOps/s | $\color{#d91a1a}-0.99\%$ |
| test_set | 69.2600μs | 18.4768μs | 54.1220 KOps/s | 54.8988 KOps/s | $\color{#d91a1a}-1.41\%$ |
| test_set_shared | 2.1640ms | 0.1398ms | 7.1552 KOps/s | 7.2911 KOps/s | $\color{#d91a1a}-1.86\%$ |
| test_update | 81.3530μs | 21.3576μs | 46.8217 KOps/s | 49.1787 KOps/s | $\color{#d91a1a}-4.79\%$ |
| test_update_nested | 76.0740μs | 28.3487μs | 35.2749 KOps/s | 36.5758 KOps/s | $\color{#d91a1a}-3.56\%$ |
| test_update__nested | 0.1134ms | 24.9928μs | 40.0115 KOps/s | 41.7255 KOps/s | $\color{#d91a1a}-4.11\%$ |
| test_set_nested | 66.5260μs | 20.2194μs | 49.4574 KOps/s | 49.3227 KOps/s | $\color{#35bf28}+0.27\%$ |
| test_set_nested_new | 68.2680μs | 23.9076μs | 41.8277 KOps/s | 41.3960 KOps/s | $\color{#35bf28}+1.04\%$ |
| test_select | 0.9170ms | 38.1018μs | 26.2455 KOps/s | 25.9400 KOps/s | $\color{#35bf28}+1.18\%$ |
| test_select_nested | 0.1157ms | 59.1278μs | 16.9125 KOps/s | 17.0651 KOps/s | $\color{#d91a1a}-0.89\%$ |
| test_exclude_nested | 0.2559ms | 0.1173ms | 8.5252 KOps/s | 8.5627 KOps/s | $\color{#d91a1a}-0.44\%$ |
| test_empty[True] | 0.7126ms | 0.4126ms | 2.4238 KOps/s | 2.4812 KOps/s | $\color{#d91a1a}-2.31\%$ |
| test_empty[False] | 6.8550μs | 1.0221μs | 978.3891 KOps/s | 976.2184 KOps/s | $\color{#35bf28}+0.22\%$ |
| test_unbind_speed | 0.4845ms | 0.2473ms | 4.0437 KOps/s | 4.0977 KOps/s | $\color{#d91a1a}-1.32\%$ |
| test_unbind_speed_stack0 | 0.3945ms | 0.2341ms | 4.2711 KOps/s | 4.1414 KOps/s | $\color{#35bf28}+3.13\%$ |
| test_unbind_speed_stack1 | 0.1151s | 0.6664ms | 1.5006 KOps/s | 1.4658 KOps/s | $\color{#35bf28}+2.37\%$ |
| test_split | 0.1224s | 1.6525ms | 605.1614 Ops/s | 618.0591 Ops/s | $\color{#d91a1a}-2.09\%$ |
| test_chunk | 2.3012ms | 1.4640ms | 683.0486 Ops/s | 694.3396 Ops/s | $\color{#d91a1a}-1.63\%$ |
| test_creation[device0] | 0.1703ms | 99.6367μs | 10.0365 KOps/s | 9.8778 KOps/s | $\color{#35bf28}+1.61\%$ |
| test_creation_from_tensor | 5.0115ms | 81.2662μs | 12.3052 KOps/s | 12.0943 KOps/s | $\color{#35bf28}+1.74\%$ |
| test_add_one[memmap_tensor0] | 99.9190μs | 5.2641μs | 189.9671 KOps/s | 185.1686 KOps/s | $\color{#35bf28}+2.59\%$ |
| test_contiguous[memmap_tensor0] | 18.9450μs | 0.6343μs | 1.5766 MOps/s | 1.6141 MOps/s | $\color{#d91a1a}-2.32\%$ |
| test_stack[memmap_tensor0] | 30.2870μs | 3.6302μs | 275.4641 KOps/s | 288.7722 KOps/s | $\color{#d91a1a}-4.61\%$ |
| test_memmaptd_index | 0.9065ms | 0.2431ms | 4.1130 KOps/s | 4.1608 KOps/s | $\color{#d91a1a}-1.15\%$ |
| test_memmaptd_index_astensor | 0.7331ms | 0.3073ms | 3.2537 KOps/s | 3.3200 KOps/s | $\color{#d91a1a}-2.00\%$ |
| test_memmaptd_index_op | 0.9220ms | 0.5750ms | 1.7391 KOps/s | 1.7718 KOps/s | $\color{#d91a1a}-1.84\%$ |
| test_serialize_model | 0.2119s | 0.1111s | 8.9997 Ops/s | 8.5924 Ops/s | $\color{#35bf28}+4.74\%$ |
| test_serialize_model_pickle | 0.4453s | 0.3756s | 2.6628 Ops/s | 2.6274 Ops/s | $\color{#35bf28}+1.34\%$ |
| test_serialize_weights | 0.1010s | 96.4873ms | 10.3641 Ops/s | 9.8020 Ops/s | $\textbf{\color{#35bf28}+5.73\%}$ |
| test_serialize_weights_returnearly | 0.2377s | 0.1333s | 7.5037 Ops/s | 7.9758 Ops/s | $\textbf{\color{#d91a1a}-5.92\%}$ |
| test_serialize_weights_pickle | 0.4479s | 0.4246s | 2.3552 Ops/s | 2.3254 Ops/s | $\color{#35bf28}+1.28\%$ |
| test_serialize_weights_filesystem | 99.0030ms | 92.7998ms | 10.7759 Ops/s | 9.3193 Ops/s | $\textbf{\color{#35bf28}+15.63\%}$ |
| test_serialize_model_filesystem | 99.6371ms | 93.5186ms | 10.6931 Ops/s | 10.4365 Ops/s | $\color{#35bf28}+2.46\%$ |
| test_reshape_pytree | 57.9390μs | 21.0132μs | 47.5891 KOps/s | 44.0209 KOps/s | $\textbf{\color{#35bf28}+8.11\%}$ |
| test_reshape_td | 71.2740μs | 31.4114μs | 31.8356 KOps/s | 30.6949 KOps/s | $\color{#35bf28}+3.72\%$ |
| test_view_pytree | 59.6020μs | 20.8434μs | 47.9769 KOps/s | 48.5233 KOps/s | $\color{#d91a1a}-1.13\%$ |
| test_view_td | 0.1205s | 61.2130μs | 16.3364 KOps/s | 15.4206 KOps/s | $\textbf{\color{#35bf28}+5.94\%}$ |
| test_unbind_pytree | 57.6590μs | 23.9292μs | 41.7900 KOps/s | 40.8831 KOps/s | $\color{#35bf28}+2.22\%$ |
| test_unbind_td | 0.1073ms | 35.8367μs | 27.9044 KOps/s | 27.3140 KOps/s | $\color{#35bf28}+2.16\%$ |
| test_split_pytree | 70.2220μs | 23.8516μs | 41.9259 KOps/s | 42.2429 KOps/s | $\color{#d91a1a}-0.75\%$ |
| test_split_td | 0.1167ms | 39.7695μs | 25.1449 KOps/s | 25.5151 KOps/s | $\color{#d91a1a}-1.45\%$ |
| test_add_pytree | 81.1730μs | 29.1092μs | 34.3534 KOps/s | 33.8144 KOps/s | $\color{#35bf28}+1.59\%$ |
| test_add_td | 0.1788ms | 50.0597μs | 19.9761 KOps/s | 19.8707 KOps/s | $\color{#35bf28}+0.53\%$ |
| test_distributed | 0.2182ms | 99.4136μs | 10.0590 KOps/s | 9.7964 KOps/s | $\color{#35bf28}+2.68\%$ |
| test_tdmodule | 80.3210μs | 16.7316μs | 59.7670 KOps/s | 60.0200 KOps/s | $\color{#d91a1a}-0.42\%$ |
| test_tdmodule_dispatch | 64.9430μs | 33.1649μs | 30.1524 KOps/s | 29.7156 KOps/s | $\color{#35bf28}+1.47\%$ |
| test_tdseq | 48.6920μs | 19.9730μs | 50.0676 KOps/s | 50.7389 KOps/s | $\color{#d91a1a}-1.32\%$ |
| test_tdseq_dispatch | 62.8780μs | 38.4649μs | 25.9978 KOps/s | 25.8055 KOps/s | $\color{#35bf28}+0.75\%$ |
| test_instantiation_functorch | 1.4898ms | 1.2954ms | 771.9439 Ops/s | 767.6616 Ops/s | $\color{#35bf28}+0.56\%$ |
| test_instantiation_td | 1.5420ms | 0.9968ms | 1.0032 KOps/s | 847.3705 Ops/s | $\textbf{\color{#35bf28}+18.39\%}$ |
| test_exec_functorch | 0.3142ms | 0.1561ms | 6.4061 KOps/s | 6.3241 KOps/s | $\color{#35bf28}+1.30\%$ |
| test_exec_functional_call | 0.2838ms | 0.1460ms | 6.8477 KOps/s | 6.5477 KOps/s | $\color{#35bf28}+4.58\%$ |
| test_exec_td | 0.2562ms | 0.1409ms | 7.0959 KOps/s | 6.9034 KOps/s | $\color{#35bf28}+2.79\%$ |
| test_exec_td_decorator | 0.7009ms | 0.1937ms | 5.1639 KOps/s | 4.9332 KOps/s | $\color{#35bf28}+4.68\%$ |
| test_vmap_mlp_speed[True-True] | 0.6684ms | 0.4725ms | 2.1163 KOps/s | 2.1220 KOps/s | $\color{#d91a1a}-0.27\%$ |
| test_vmap_mlp_speed[True-False] | 0.7297ms | 0.4702ms | 2.1268 KOps/s | 2.1513 KOps/s | $\color{#d91a1a}-1.14\%$ |
| test_vmap_mlp_speed[False-True] | 0.5900ms | 0.3864ms | 2.5881 KOps/s | 2.6494 KOps/s | $\color{#d91a1a}-2.31\%$ |
| test_vmap_mlp_speed[False-False] | 0.5835ms | 0.3876ms | 2.5801 KOps/s | 2.6265 KOps/s | $\color{#d91a1a}-1.77\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 0.9609ms | 0.4873ms | 2.0521 KOps/s | 2.0771 KOps/s | $\color{#d91a1a}-1.20\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 0.8006ms | 0.4877ms | 2.0506 KOps/s | 2.0740 KOps/s | $\color{#d91a1a}-1.13\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 0.6245ms | 0.3986ms | 2.5087 KOps/s | 2.5397 KOps/s | $\color{#d91a1a}-1.22\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.6469ms | 0.3982ms | 2.5111 KOps/s | 2.5349 KOps/s | $\color{#d91a1a}-0.94\%$ |
| test_to_module_speed[True] | 2.1652ms | 1.3735ms | 728.0436 Ops/s | 734.4915 Ops/s | $\color{#d91a1a}-0.88\%$ |
| test_to_module_speed[False] | 1.4398ms | 1.3495ms | 741.0337 Ops/s | 745.1109 Ops/s | $\color{#d91a1a}-0.55\%$ |
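
The Change column appears to be the relative difference between the Ops measured on this PR and the Ops on repo HEAD. That reading is inferred from the numbers in the table, not from the benchmark bot's source, so treat the following as a sketch; `relative_change` is a hypothetical helper name.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Relative throughput change of the PR versus repo HEAD, in percent.

    Assumed formula, inferred from the table rather than from the bot's code.
    """
    return (ops_pr / ops_head - 1.0) * 100.0


# (name, Ops on this PR, Ops on repo HEAD), copied from the benchmark results.
# Both values in a row share the same unit, so the ratio is unit-free.
rows = [
    ("test_plain_set_nested", 63.2739, 62.4281),
    ("test_membership_stacked_nested_last", 116.0023, 210.3603),
    ("test_serialize_weights_filesystem", 10.7759, 9.3193),
]

changes = {name: round(relative_change(pr, head), 2) for name, pr, head in rows}

for name, change in changes.items():
    print(f"{name}: {change:+.2f}%")
```

Under this assumption the three rows reproduce the reported +1.35%, -44.86%, and +15.63%. When the two columns use different units (as in test_instantiation_td, reported in KOps/s and Ops/s), convert them to a common unit before taking the ratio.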

@vmoens vmoens added the bug and BC-breaking labels Mar 18, 2024
@vmoens vmoens merged commit ca4256e into main Mar 18, 2024
45 of 48 checks passed
@vmoens vmoens deleted the nontensor-stack-memmap branch March 18, 2024 14:03
vmoens added a commit that referenced this pull request Mar 25, 2024
Labels: BC-breaking, bug, CLA Signed