{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":150154628,"defaultBranch":"main","name":"FBGEMM","ownerLogin":"pytorch","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2018-09-24T19:07:42.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/21003710?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1720719171.0","currentOid":""},"activityList":{"items":[{"before":"635825f3bd8e2e2653f2c25c57e5c76e847e885e","after":"a6a6a5c1e98778bd9f111a43620969522bbb7662","ref":"refs/heads/export-D38010326","pushedAt":"2024-07-11T17:35:18.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Back out \"Back out \"[KT.regroup Ops][1-2/N] benchmark of fbgemm operator\"\"\n\nSummary:\n# context\nreland the diff stack\n\nDifferential Revision: D38010326","shortMessageHtmlLink":"Back out \"Back out \"[KT.regroup Ops][1-2/N] benchmark of fbgemm opera…"}},{"before":null,"after":"635825f3bd8e2e2653f2c25c57e5c76e847e885e","ref":"refs/heads/export-D38010326","pushedAt":"2024-07-11T17:32:51.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Summary: # context\n\nDifferential Revision: D38010326","shortMessageHtmlLink":"Summary: # context"}},{"before":"af511999f480b0b1c269569d84b5d81260db4c38","after":"85a6df1f20b7ae6112fcd0f321717ed8b2c1a4da","ref":"refs/heads/gh-pages","pushedAt":"2024-07-11T17:06:59.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"Deploying to gh-pages from @ pytorch/FBGEMM@8cf6133fd77d3559e6a51c7cf30f09afbd5a373d πŸš€","shortMessageHtmlLink":"Deploying to gh-pages from @ 8cf6133 πŸš€"}},{"before":"9ad57eed50657046303ae8b2921d3b31346140a0","after":"af511999f480b0b1c269569d84b5d81260db4c38","ref":"refs/heads/gh-pages","pushedAt":"2024-07-11T17:00:53.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"Deploying to gh-pages from @ pytorch/FBGEMM@eb97848f3ba0f1a45c057f84e95bffbc61c0a2cb πŸš€","shortMessageHtmlLink":"Deploying to gh-pages from @ eb97848 πŸš€"}},{"before":"eb97848f3ba0f1a45c057f84e95bffbc61c0a2cb","after":"8cf6133fd77d3559e6a51c7cf30f09afbd5a373d","ref":"refs/heads/main","pushedAt":"2024-07-11T16:57:35.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Implement SDPA kernel wrapper to use run_kernel flow for perf (#2820)\n\nSummary:\nX-link: https://github.com/facebookresearch/FBGEMM/pull/21\n\nPull Request resolved: https://github.com/pytorch/FBGEMM/pull/2820\n\nImplments a custom kernel wrapper around scaled_dot_product_attention to directly use scaled_dot_product_attention athena kernel implementation.\n\nReviewed By: jiyuanzFB\n\nDifferential Revision: D59565188\n\nfbshipit-source-id: df79e84576e0d3163d8b97b9eaaf5f50eb1d4071","shortMessageHtmlLink":"Implement SDPA kernel wrapper to use run_kernel flow for perf (#2820)"}},{"before":"8889d445832a6a40cc82011a07b3fdea3f2c5ccb","after":"eb97848f3ba0f1a45c057f84e95bffbc61c0a2cb","ref":"refs/heads/main","pushedAt":"2024-07-11T16:51:19.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Back out \"benchmark of fbgemm operator\" (#2826)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/FBGEMM/pull/2826\n\nPull Request resolved: https://github.com/pytorch/FBGEMM/pull/2825\n\nX-link: https://github.com/pytorch/torchrec/pull/2221\n\nrevert to mitigate S432472\n\nReviewed By: gnahzg, ErickLuo90\n\nDifferential Revision: D59618680\n\nfbshipit-source-id: 41762b15136c93146b7ddf11566452cf739b7540","shortMessageHtmlLink":"Back out \"benchmark of fbgemm operator\" (#2826)"}},{"before":null,"after":"13e66d6a97914f4395aa5df305bb078ef063c9ae","ref":"refs/heads/export-D59618680","pushedAt":"2024-07-11T15:11:25.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Back out \"benchmark of fbgemm o\n\nSummary:\nX-link: https://github.com/pytorch/torchrec/pull/2221\n\nrevert to mitigate S432472\n\nReviewed By: gnahzg, ErickLuo90\n\nDifferential Revision: D59618680","shortMessageHtmlLink":"Back out \"benchmark of fbgemm o"}},{"before":"2d32fcf590606eb696f107cc98eb012df305fb50","after":"3247a36bea973c2e540acb5543ce510532174c03","ref":"refs/heads/nightly","pushedAt":"2024-07-11T11:35:05.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"pytorchbot","name":null,"path":"/pytorchbot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/21957446?s=80&v=4"},"commit":{"message":"2024-07-11 nightly release (8889d445832a6a40cc82011a07b3fdea3f2c5ccb)","shortMessageHtmlLink":"2024-07-11 nightly release (8889d44)"}},{"before":"fd82a85a708d8a3a89c69a8aabcc00c14959a918","after":"9ad57eed50657046303ae8b2921d3b31346140a0","ref":"refs/heads/gh-pages","pushedAt":"2024-07-11T07:10:16.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"Deploying to gh-pages from @ pytorch/FBGEMM@8889d445832a6a40cc82011a07b3fdea3f2c5ccb πŸš€","shortMessageHtmlLink":"Deploying to gh-pages from @ 8889d44 πŸš€"}},{"before":"0285d2855e1cad98d93875c5686773c23eb3fbc2","after":"8889d445832a6a40cc82011a07b3fdea3f2c5ccb","ref":"refs/heads/main","pushedAt":"2024-07-11T07:00:30.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Break up `fbgemm_cuda_utils.cuh`, pt 10 (#2814)\n\nSummary:\nX-link: https://github.com/pytorch/pytorch/pull/130469\n\nX-link: https://github.com/pytorch/pytorch/pull/130468\n\nPull Request resolved: https://github.com/pytorch/FBGEMM/pull/2814\n\nX-link: https://github.com/facebookresearch/FBGEMM/pull/19\n\n- Break up `fbgemm_cuda_utils.cuh`, pt 10\n\nReviewed By: spcyppt\n\nDifferential Revision: D59545097\n\nfbshipit-source-id: a7bd6725afddc3c1bdc0567c02449e7a8fe13d32","shortMessageHtmlLink":"Break up fbgemm_cuda_utils.cuh, pt 10 (#2814)"}},{"before":"579eab4f5ad98c244e7848b63b88821b4cd772ec","after":"fd82a85a708d8a3a89c69a8aabcc00c14959a918","ref":"refs/heads/gh-pages","pushedAt":"2024-07-10T23:30:58.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"Deploying to gh-pages from @ pytorch/FBGEMM@0285d2855e1cad98d93875c5686773c23eb3fbc2 πŸš€","shortMessageHtmlLink":"Deploying to gh-pages from @ 0285d28 πŸš€"}},{"before":"b903979543526edbcde7b9fa512dbe3244f494e9","after":"0285d2855e1cad98d93875c5686773c23eb3fbc2","ref":"refs/heads/main","pushedAt":"2024-07-10T23:21:23.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Revert D54917144: Multisect successfully blamed \"D54917144: [TorchRec] clean up unused permute_duplicate_pooled_embs\" for one test failure\n\nSummary:\nThis diff reverts D54917144\nD54917144: [TorchRec] clean up unused permute_duplicate_pooled_embs by TroyGarden causes the following test failure:\n\nTests affected:\n- [cogwheel:cogwheel_model_import_inference_ads_v0_test#test_ads_v0_inference_model_import](https://www.internalfb.com/intern/test/281475118337004/)\n\nHere's the Multisect link:\nhttps://www.internalfb.com/multisect/6097703\nHere are the tasks that are relevant to this breakage:\nT191380983: 100+ tests unhealthy for ai_test_validation\n\nThe backout may land if someone accepts it.\n\nIf this diff has been generated in error, you can Commandeer and Abandon it.\n\nReviewed By: drdarshan\n\nDifferential Revision: D59582940\n\nfbshipit-source-id: e1676cedb66f387327e40bf32858c91df2b72729","shortMessageHtmlLink":"Revert D54917144: Multisect successfully blamed \"D54917144: [TorchRec…"}},{"before":null,"after":"39b6f46b584f2fe03915a78332a1ae2526b3d2a7","ref":"refs/heads/export-D55277833","pushedAt":"2024-07-10T16:55:56.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"use at::parallel_for in cpu kernel\n\nSummary:\n# context\n* use `at::parallel_for` to run a parallel threading in CPU kernel\n\nNOTE: variables declared before at::parallel_for would be shared by the parallel threads.\n* benchmark\npython generic method (_regroup_keyed_tenors) is actually the best. \n```\n _regroup_keyed_tenors | B: 1024 | F: 1020 | device: cpu | Runtime (P90): 160.37 ms | Memory (P90): 0.0\n KeyedTensor.regroup | B: 1024 | F: 1020 | device: cpu | Runtime (P90): 212.82 ms | Memory (P90): 0.0\n KTRegroupAsDict | B: 1024 | F: 1020 | device: cpu | Runtime (P90): 175.38 ms | Memory (P90): 0.0\n permute_multi_embs | B: 1024 | F: 1020 | device: cpu | Runtime (P90): 553.27 ms | Memory (P90): 0.0\n _regroup_keyed_tenors_dup | B: 1024 | F: 1020 | device: cpu | Runtime (P90): 142.05 ms | Memory (P90): 0.0\n KeyedTensor.regroup_dup | B: 1024 | F: 1020 | device: cpu | Runtime (P90): 141.69 ms | Memory (P90): 0.0\n KTRegroupAsDict_dup | B: 1024 | F: 1020 | device: cpu | Runtime (P90): 133.37 ms | Memory (P90): 0.0\n permute_multi_embs_dup | B: 1024 | F: 1020 | device: cpu | Runtime (P90): 572.00 ms | Memory (P90): 0.0\n```\n\nDifferential Revision: D55277833","shortMessageHtmlLink":"use at::parallel_for in cpu kernel"}},{"before":"5947b1968fa74d352556f0ba2f6dd075e3a36e4b","after":"579eab4f5ad98c244e7848b63b88821b4cd772ec","ref":"refs/heads/gh-pages","pushedAt":"2024-07-10T15:32:17.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"Deploying to gh-pages from @ pytorch/FBGEMM@b903979543526edbcde7b9fa512dbe3244f494e9 πŸš€","shortMessageHtmlLink":"Deploying to gh-pages from @ b903979 πŸš€"}},{"before":"1b63049cc804f0dd311be146900d504b5d6f37a5","after":"b903979543526edbcde7b9fa512dbe3244f494e9","ref":"refs/heads/main","pushedAt":"2024-07-10T15:22:35.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"address reviewer comments (#2815)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/FBGEMM/pull/2815\n\n# context\n* use `at::parallel_for` to run a parallel threading in CPU kernel\n* somehow the results are wrong.\n```\n(Pdb) outputs[0]\ntensor([[-6.4628e-01, -1.9817e+00, -8.4945e-01, 1.3860e+00, 7.4463e-01,\n 1.3079e-01, 8.5881e-01, -7.4804e-01],\n [-2.3989e-01, 1.2933e+00, 1.3789e+00, -1.9305e+00, -5.7734e-01,\n -4.5220e-01, -1.3703e+00, -1.9221e+00],\n [ 1.2582e+00, 1.2426e+00, 2.6749e-01, 6.8250e-01, 7.0065e-45,\n 0.0000e+00, 0.0000e+00, 0.0000e+00],\n [-4.5382e-02, -9.4207e-01, 7.1254e-01, 7.8096e-01, -1.3482e+00,\n -1.2763e+00, 4.2996e-01, -8.9042e-01],\n [-5.8892e-02, 1.1909e+00, -1.4653e+00, 0.0000e+00, 0.0000e+00,\n 0.0000e+00, 0.0000e+00, 0.0000e+00],\n [ 1.0306e-01, 2.2235e-01, 1.6044e+00, -3.0457e-01, 1.6609e+00,\n 4.1478e-43, 0.0000e+00, 0.0000e+00]],\n grad_fn=>)\n(Pdb) refs[0]\ntensor([[-0.6463, -1.9817, -0.8494, 0.8588, -0.7480, -1.0166, 1.3860, 0.7446],\n [ 1.3572, 0.0403, -0.2078, -0.8592, 0.4000, 1.0562, -0.2399, 1.2933],\n [-0.8118, -0.8703, -0.1429, -0.0802, 0.2706, -0.6728, 1.2582, 1.2426],\n [ 0.0026, -1.3482, -1.2763, 0.9094, 1.2502, 0.5035, -0.0454, -0.9421],\n [-1.4653, 0.8384, -0.3290, -1.2008, -0.4272, -1.0376, 1.0920, 0.2197],\n [ 1.4819, 0.1565, -0.1601, -0.8323, -0.0130, 0.4165, 0.1031, 0.2223]],\n grad_fn=)\n```\n\nReviewed By: sryap\n\nDifferential Revision: D38300272\n\nfbshipit-source-id: 74546cf05b619ce8175915d21ba330fbfe7bd513","shortMessageHtmlLink":"address reviewer comments (#2815)"}},{"before":"9fe74438c8fe320202a68deef6ac56497be047bc","after":"2d32fcf590606eb696f107cc98eb012df305fb50","ref":"refs/heads/nightly","pushedAt":"2024-07-10T11:35:24.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"pytorchbot","name":null,"path":"/pytorchbot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/21957446?s=80&v=4"},"commit":{"message":"2024-07-10 nightly release (1b63049cc804f0dd311be146900d504b5d6f37a5)","shortMessageHtmlLink":"2024-07-10 nightly release (1b63049)"}},{"before":"9f4622415c960e0fb541977fc34fe1663a13aa0a","after":"5947b1968fa74d352556f0ba2f6dd075e3a36e4b","ref":"refs/heads/gh-pages","pushedAt":"2024-07-10T00:29:39.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"Deploying to gh-pages from @ pytorch/FBGEMM@1b63049cc804f0dd311be146900d504b5d6f37a5 πŸš€","shortMessageHtmlLink":"Deploying to gh-pages from @ 1b63049 πŸš€"}},{"before":"9f786a1df7d3d63ca5f28135aa58040d68eceeed","after":"1b63049cc804f0dd311be146900d504b5d6f37a5","ref":"refs/heads/main","pushedAt":"2024-07-10T00:18:53.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"clean up unused permute_duplicate_pooled_embs (#2811)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/FBGEMM/pull/2811\n\n# context\n* this `permute_duplicate_pooled_embs` is never used in prod\n* it doesn't support backward\n* there is another op can provide better feature\n* it was introduced by D48090591\n* it adds burden to maintain this op in the codebase.\n\nReviewed By: AGZain\n\nDifferential Revision: D54917144\n\nfbshipit-source-id: 3c77ad732560b3820dc50bfa45ad9aa015b3f83d","shortMessageHtmlLink":"clean up unused permute_duplicate_pooled_embs (#2811)"}},{"before":null,"after":"416d6a29ad27bfe030d751789b5d488e2d092258","ref":"refs/heads/export-D38300272","pushedAt":"2024-07-09T22:12:08.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"address reviewer comments\n\nSummary:\n# context\n* use `at::parallel_for` to run a parallel threading in CPU kernel\n* somehow the results are wrong.\n```\n(Pdb) outputs[0]\ntensor([[-6.4628e-01, -1.9817e+00, -8.4945e-01, 1.3860e+00, 7.4463e-01,\n 1.3079e-01, 8.5881e-01, -7.4804e-01],\n [-2.3989e-01, 1.2933e+00, 1.3789e+00, -1.9305e+00, -5.7734e-01,\n -4.5220e-01, -1.3703e+00, -1.9221e+00],\n [ 1.2582e+00, 1.2426e+00, 2.6749e-01, 6.8250e-01, 7.0065e-45,\n 0.0000e+00, 0.0000e+00, 0.0000e+00],\n [-4.5382e-02, -9.4207e-01, 7.1254e-01, 7.8096e-01, -1.3482e+00,\n -1.2763e+00, 4.2996e-01, -8.9042e-01],\n [-5.8892e-02, 1.1909e+00, -1.4653e+00, 0.0000e+00, 0.0000e+00,\n 0.0000e+00, 0.0000e+00, 0.0000e+00],\n [ 1.0306e-01, 2.2235e-01, 1.6044e+00, -3.0457e-01, 1.6609e+00,\n 4.1478e-43, 0.0000e+00, 0.0000e+00]],\n grad_fn=>)\n(Pdb) refs[0]\ntensor([[-0.6463, -1.9817, -0.8494, 0.8588, -0.7480, -1.0166, 1.3860, 0.7446],\n [ 1.3572, 0.0403, -0.2078, -0.8592, 0.4000, 1.0562, -0.2399, 1.2933],\n [-0.8118, -0.8703, -0.1429, -0.0802, 0.2706, -0.6728, 1.2582, 1.2426],\n [ 0.0026, -1.3482, -1.2763, 0.9094, 1.2502, 0.5035, -0.0454, -0.9421],\n [-1.4653, 0.8384, -0.3290, -1.2008, -0.4272, -1.0376, 1.0920, 0.2197],\n [ 1.4819, 0.1565, -0.1601, -0.8323, -0.0130, 0.4165, 0.1031, 0.2223]],\n grad_fn=)\n```\n\nReviewed By: sryap\n\nDifferential Revision: D38300272","shortMessageHtmlLink":"address reviewer comments"}},{"before":"e0f82ea81bc641c4d2b586632809350bdfe1ba27","after":"9f4622415c960e0fb541977fc34fe1663a13aa0a","ref":"refs/heads/gh-pages","pushedAt":"2024-07-09T22:09:14.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"Deploying to gh-pages from @ pytorch/FBGEMM@9f786a1df7d3d63ca5f28135aa58040d68eceeed πŸš€","shortMessageHtmlLink":"Deploying to gh-pages from @ 9f786a1 πŸš€"}},{"before":"dcd2af503a60cc6c97a61ed108b8b0abddaed6ba","after":"9f786a1df7d3d63ca5f28135aa58040d68eceeed","ref":"refs/heads/main","pushedAt":"2024-07-09T21:59:06.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Break up `fbgemm_cuda_utils.cuh`, pt 9 (#2812)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/FBGEMM/pull/2812\n\nX-link: https://github.com/facebookresearch/FBGEMM/pull/18\n\n- Break up `fbgemm_cuda_utils.cuh`, pt 9\n\nReviewed By: jianyuh\n\nDifferential Revision: D59497725\n\nfbshipit-source-id: 052011c7e56f24c1de810ab9b0722acfd3f38423","shortMessageHtmlLink":"Break up fbgemm_cuda_utils.cuh, pt 9 (#2812)"}},{"before":"51224100f4bd1c4e7b297b5c9bd21fe8785e75aa","after":"e0f82ea81bc641c4d2b586632809350bdfe1ba27","ref":"refs/heads/gh-pages","pushedAt":"2024-07-09T21:25:18.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"Deploying to gh-pages from @ pytorch/FBGEMM@dcd2af503a60cc6c97a61ed108b8b0abddaed6ba πŸš€","shortMessageHtmlLink":"Deploying to gh-pages from @ dcd2af5 πŸš€"}},{"before":"f8021eea2bb3da9baac31a45d16775368b876223","after":"dcd2af503a60cc6c97a61ed108b8b0abddaed6ba","ref":"refs/heads/main","pushedAt":"2024-07-09T21:15:56.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Add missing Pyre mode headers] [batch:27/308] [shard:5/N] (#2809)\n\nSummary:\nX-link: https://github.com/facebookresearch/FBGEMM/pull/16\n\nPull Request resolved: https://github.com/pytorch/FBGEMM/pull/2809\n\nReviewed By: MaggieMoss\n\nDifferential Revision: D59386593\n\nfbshipit-source-id: 9f729ca06b6738703d916225f4b7c89c484bf97a","shortMessageHtmlLink":"Add missing Pyre mode headers] [batch:27/308] [shard:5/N] (#2809)"}},{"before":"ace129882efcd2ce8f60f0fc06ab77da4013f41f","after":"51224100f4bd1c4e7b297b5c9bd21fe8785e75aa","ref":"refs/heads/gh-pages","pushedAt":"2024-07-09T20:21:13.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"Deploying to gh-pages from @ pytorch/FBGEMM@f8021eea2bb3da9baac31a45d16775368b876223 πŸš€","shortMessageHtmlLink":"Deploying to gh-pages from @ f8021ee πŸš€"}},{"before":"87cfbdff45ac661f9d607cf63a39d3e0e0124f86","after":"f8021eea2bb3da9baac31a45d16775368b876223","ref":"refs/heads/main","pushedAt":"2024-07-09T20:11:41.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"benchmark of fbgemm op - permute_multi_embedding (#2771)\n\nSummary:\nX-link: https://github.com/pytorch/torchrec/pull/2158\n\nPull Request resolved: https://github.com/pytorch/FBGEMM/pull/2771\n\n# context\n* added both **op-level** and **fn-level** benchmarks for the KT.regroup implementations\n* analyze the op-level and fn-level performance in runtime and memory usage\n* findings are that:\n**a**. In the fn-level performance, the `permute_multi_embedding` (new op) outperforms both the native-pytorch implementation and the `permute_pooled_embs_auto_grad` (current Prod) by 50% GPU runtime and 33% memory usage\n**b**. In the op-level performance, the new op is slightly slower than the current prod (by ~5% GPU runtime)\n* conclusion: **we should use the new op**\n\n# other considerations\nThe good:\n1. the algorithm is designed in a way that it doesn't need to know in advance whether the 1-to-N mapping exists in the permutes.\n2. `_all_keys_used_once` is no longer needed\n3. no longer need a torch.cat before calling the old operator\n4. no need to use `_pin_and_move` for the meta data (arguments), it will be handled inside the operator, it's more friendly to tracing.\n5. no longer need to fallback to native-pytorch implementation when duplicates existed\n\nThe same bad:\n1. it requires several HtoD communications (move tensor to device):\na) [resolved] 3 tensors, which are `permutes`, `input_lengths`, and `output_lengths`. Those tensors needs to be on the device so that the cuda kernels has access to it.\nb) [resolved] 2 lists of (scalar_t*) pointers, input and output tensor lists.\nc) [resolved] Didn't find a good way to let the kernel knows the address of the lists of input/output tensors, because the lists are also need to be on the device.\n2. tensor.contiguous for the backward function, it looks like the grad from the backward are somehow not contiguous.\n\n# benchmark\n* op-level results: new op is ~5% slower in GPU runtime\n```\nINFO:root:size: 1024 x 136896; permute_multi_embedding: 2.25 ms; permute_pooled_embs: 2.15 ms; delta: 4.5%\nINFO:root:size: 1024 x 108432; permute_multi_embedding: 1.79 ms; permute_pooled_embs: 1.7 ms; delta: 5.3%\nINFO:root:size: 1024 x 277232; permute_multi_embedding: 4.54 ms; permute_pooled_embs: 4.37 ms; delta: 3.9%\nINFO:root:size: 1024 x 244352; permute_multi_embedding: 4.01 ms; permute_pooled_embs: 3.83 ms; delta: 4.9%\nINFO:root:size: 1024 x 524224; permute_multi_embedding: 8.62 ms; permute_pooled_embs: 8.25 ms; delta: 4.5%\nINFO:root:size: 1024 x 564080; permute_multi_embedding: 9.27 ms; permute_pooled_embs: 8.92 ms; delta: 3.9%\n```\n* fn-level results: new op is 50%+ faster in GPU runtime and uses 33% fewer GPU memory\n```\n _regroup_keyed_tenors | B: 1024 | F: 1020 | device: cuda | Runtime (P90): 2.8 ms | Memory (P90): 1011.0\n KeyedTensor.regroup | B: 1024 | F: 1020 | device: cuda | Runtime (P90): 5.0 ms | Memory (P90): 1517.0\n KTRegroupAsDict | B: 1024 | F: 1020 | device: cuda | Runtime (P90): 4.9 ms | Memory (P90): 1517.0\n permute_multi_embs | B: 1024 | F: 1020 | device: cuda | Runtime (P90): 2.2 ms | Memory (P90): 1011.0\n _regroup_keyed_tenors_dup | B: 1024 | F: 1020 | device: cuda | Runtime (P90): 2.5 ms | Memory (P90): 1011.0\n KeyedTensor.regroup_dup | B: 1024 | F: 1020 | device: cuda | Runtime (P90): 2.5 ms | Memory (P90): 1011.0\n KTRegroupAsDict_dup | B: 1024 | F: 1020 | device: cuda | Runtime (P90): 2.5 ms | Memory (P90): 1011.0\n permute_multi_embs_dup | B: 1024 | F: 1020 | device: cuda | Runtime (P90): 3.2 ms | Memory (P90): 1011.0\n```\n\n# traces\n* [files](https://drive.google.com/drive/folders/1_9hOtQUQeFICBVxQtusvpQ_VajduFUmR?usp=sharing)\n```\n[hhy@50836.od /data/sandcastle/boxes/fbsource (ae677c240)]$ ll *.json\n-rw-rw-r-- 1 hhy hhy 8062993 Jun 21 23:26 trace-KeyedTensor.regroup_dup.json\n-rw-rw-r-- 1 hhy hhy 949610 Jun 21 23:26 trace-KeyedTensor.regroup.json\n-rw-rw-r-- 1 hhy hhy 5140143 Jun 21 23:26 trace-KTRegroupAsDict_dup.json\n-rw-rw-r-- 1 hhy hhy 350370 Jun 21 23:26 trace-KTRegroupAsDict.json\n-rw-rw-r-- 1 hhy hhy 581033 Jun 21 23:26 trace-permute_multi_embs_dup.json\n-rw-rw-r-- 1 hhy hhy 582607 Jun 21 23:26 trace-permute_multi_embs.json\n-rw-rw-r-- 1 hhy hhy 8025337 Jun 21 23:26 trace-_regroup_keyed_tenors_dup.json\n-rw-rw-r-- 1 hhy hhy 8041586 Jun 21 23:26 trace-_regroup_keyed_tenors.json\n```\n* native-pytorch\n {F1713052022}\n* current prod\n {F1713052648}\n* new op\n {F1713052907}\n* runtime\n|Operator|CPU runtime|GPU runtime|GPU memory|notes|\n|---|---|---|---|---|\n|**native-pytorch**|3.9 ms|3.1 ms|1.0 K|CPU-bounded, allow duplicates|\n|**prod op**|2.1 ms|4.9 ms|1.5 K|GPU-boudned due to torch.cat, does **NOT** allow duplicates|\n|**new op**|2.0 ms|2.2 ms|1.0 K|both CPU and GPU runtime outperformed, **ALLOW** duplicates|\n\nReviewed By: dstaay-fb\n\nDifferential Revision: D58906839\n\nfbshipit-source-id: 6cb28ca17daf16943b28af9b074d1032e7079912","shortMessageHtmlLink":"benchmark of fbgemm op - permute_multi_embedding (#2771)"}},{"before":"a1c3117f9d7f0ab1c9d4170c9c3f663d843e88db","after":"ace129882efcd2ce8f60f0fc06ab77da4013f41f","ref":"refs/heads/gh-pages","pushedAt":"2024-07-09T18:19:03.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"Deploying to gh-pages from @ pytorch/FBGEMM@87cfbdff45ac661f9d607cf63a39d3e0e0124f86 πŸš€","shortMessageHtmlLink":"Deploying to gh-pages from @ 87cfbdf πŸš€"}},{"before":"bbdd76c5d0fbc97ffd948c458777808bb343b789","after":"87cfbdff45ac661f9d607cf63a39d3e0e0124f86","ref":"refs/heads/main","pushedAt":"2024-07-09T18:09:35.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"implementation of fbgemm op - permute_multi_embedding (#2738)\n\nSummary:\nPull Request resolved: https://github.com/pytorch/FBGEMM/pull/2738\n\nX-link: https://github.com/pytorch/torchrec/pull/2120\n\n# context\n* current we have a working function `permute_pooled_embs_auto_grad` to do a full permute of KTs, including forward and backward\n* it has several limitations:\na) it has to be a full permute, duplicates are not supported;\nb) in the main [use case](https://fburl.com/code/89od0rqm) there has to be a torch.concat on the input KTs, which is not very efficient;\nc) the function output a single KT which requires a split operation\n* there is some attempt to support duplicated outputs, but the backward doesn't work\n* this diff is trying to create a new kernel (named `permute_multi_embedding`) to support a multiple-KT to multiple-KT mapping operation with backward support\n\n# notes\n* this diff focuses on the implemenation and test of the operator\n* performance analysis and benchmark are in the next diff\n\n# operator example usage\n* used in python\n```\n# test inputs: 3 KTs with batch_size=2048\nbatch_size = 2048\nkeys = [[\"f1\", \"f2\"], [\"f3\", \"f4\", \"f5\"], [\"f6\"]]\nlengths = [[96, 256], [512, 128, 768], [1024]]\nvalues = [\n torch.randn(batch_size, sum(lens), device=\"cuda\", requires_grad=True)\n for lens in lengths\n]\n\n# target outputs: 4 KTs with re-arranged keys (features), duplicates are allowed\ngroups = [[\"f1\", \"f3\"], [\"f2\"], [\"f4\", \"f1\", \"f6\"], [\"f1\", \"f5\"]]\n\n# accessorial arguments to the op/kernel\npermutes, in_lengths, out_lengths = _multi_remap_to_groups(\n keys, lengths, groups\n)\n\n# arguments\noutputs = torch.ops.fbgemm.permute_multi_embedding_internal_testing(\n values, permutes, in_lengths, out_lengths\n)\n```\n* permutes\n```\n# each row represents a key (feature) permute move, which consists of the following parameters:\n# [input_tensor_idx, output_tensor_idx, input_key_idx, output_key_idx, key_length, magic_jump]\npermutes = tensor(\n [\n [0, 0, 0, 0, 3, 4], # f1\n [1, 0, 0, 3, 5, 0], # f3\n [0, 1, 3, 0, 4, 0], # f2\n [1, 2, 5, 0, 6, 0], # f4\n [0, 2, 0, 6, 3, -6], # f1\n [2, 2, 0, 9, 8, 0], # f6\n [0, 3, 0, 0, 3, -8], # f1\n [1, 3, 11, 3, 7, 0], # f5\n ]\n)\n```\n\n# details\n1. from the above example usage, we can clearly see that the operatior takes in the following:\na) values: List[torch.Tensor], which represents the input KTs\nb) permutes: torch.Tensor, which contains the permute information, will be explained later\nc) output_lengths_list: List[int], the lengths of the output tensors (KTs), which is needed to allocate memory on device ahead\nd) in_lengths: torch.Tensor, lengths of input tensors, which is on device\ne) out_lengths: torch.Tensor, lengths of output tensors, which is on device\n2. the operator returns a list of tensors, which represents the permuted KTs\n3. `permute` is the most critical argument in this operator:\na) 2-D tensor\nb) each row represents a key (feature) permute move\nc) a permute move = [input_tensor_id, output_tensor_id, input_start_idx, output_start_idx, feature_length, jump]\nd) jump is used in backward when a key (feature) from the input tensor is mapped to multiple places in the output tensors\n4. The magic_jump\na) It's only used in the backward computation\nb) it's usually 0, means no jump\nc) it's non-zero when there is a duplicate in the permute, e.g., the same feature appears more than once in the output\nd) the `magic_jump` is the next index of the very same feature in the permute sequence with some modifications\ne) modification-1: `magic_jump` is positive when it's the first of its kind [Start]\nf) modification-2: `magic_jump` is negative when it's not the first of its kind [Continue]\ng) modification-3: `magic_jump` is the negative value of the length of the permute sequence when it's the last of its kind. [Stop]\n\nReviewed By: sryap\n\nDifferential Revision: D57055616\n\nfbshipit-source-id: 16673d3a2eafab93b08d4ff3c43d54366966064a","shortMessageHtmlLink":"implementation of fbgemm op - permute_multi_embedding (#2738)"}},{"before":"099742c4a0bbfb64f3c654abd7f4792755e12252","after":"9fe74438c8fe320202a68deef6ac56497be047bc","ref":"refs/heads/nightly","pushedAt":"2024-07-09T11:35:07.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"pytorchbot","name":null,"path":"/pytorchbot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/21957446?s=80&v=4"},"commit":{"message":"2024-07-09 nightly release (bbdd76c5d0fbc97ffd948c458777808bb343b789)","shortMessageHtmlLink":"2024-07-09 nightly release (bbdd76c)"}},{"before":"0ddec8d7c7f71a8fcce51a9f062e1d084ef2c396","after":"a1c3117f9d7f0ab1c9d4170c9c3f663d843e88db","ref":"refs/heads/gh-pages","pushedAt":"2024-07-08T23:23:03.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"github-actions[bot]","name":null,"path":"/apps/github-actions","primaryAvatarUrl":"https://avatars.githubusercontent.com/in/15368?s=80&v=4"},"commit":{"message":"Deploying to gh-pages from @ pytorch/FBGEMM@bbdd76c5d0fbc97ffd948c458777808bb343b789 πŸš€","shortMessageHtmlLink":"Deploying to gh-pages from @ bbdd76c πŸš€"}},{"before":"24e6f964a1f4244bd9ddf450d7b87833d9b23fb1","after":"bbdd76c5d0fbc97ffd948c458777808bb343b789","ref":"refs/heads/main","pushedAt":"2024-07-08T23:13:09.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"facebook-github-bot","name":"Facebook Community Bot","path":"/facebook-github-bot","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6422482?s=80&v=4"},"commit":{"message":"Break up `fbgemm_cuda_utils.cuh`, pt 8 (#2807)\n\nSummary:\nX-link: https://github.com/facebookresearch/FBGEMM/pull/14\n\nPull Request resolved: https://github.com/pytorch/FBGEMM/pull/2807\n\n- Break up `fbgemm_cuda_utils.cuh`, pt 8\n\nReviewed By: spcyppt\n\nDifferential Revision: D59412344\n\nfbshipit-source-id: d9acf70a666316e8b0d28726bb147502769313b1","shortMessageHtmlLink":"Break up fbgemm_cuda_utils.cuh, pt 8 (#2807)"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEfRoi4AA","startCursor":null,"endCursor":null}},"title":"Activity Β· pytorch/FBGEMM"}