Activity · pytorch/benchmark

Repository: pytorch/benchmark (public, default branch: main, created 2017-05-26)

- 2024-08-30 16:28 UTC · push to main by facebook-github-bot
  "move loop ordering after fusion (#126254)"
  Restarts the work from https://github.com/pytorch/pytorch/pull/100331 in a new PR, since the old one was hard to rebase; some code is copied from the previous PR and the main idea is the same. Earlier attempts showed a large compilation-time increase because too many loop orders were considered; this PR prunes the search and only considers loop orders known to be relevant (on demand). Manually constructed cases where loop ordering matters are added as unit tests to ensure Inductor does not miss those fusion opportunities. Should solve the fusion failure in https://github.com/pytorch/pytorch/issues/130015. Because compilation time still increases significantly, the feature is disabled by default and will be enabled once that is resolved.
  X-link: https://github.com/pytorch/pytorch/pull/126254 · Approved by: jansel · Reviewed by: ZainRizvi · Differential Revision: D62008970 · Pulled by: shunting314

- 2024-08-30 03:54 UTC · push to main by facebook-github-bot
  "Switch benchmarking to use export non-strict mode (#130977)"
  Switches the export step used by AOTInductor benchmarking from strict to non-strict mode, and from producing torch IR to aten IR.
  X-link: https://github.com/pytorch/pytorch/pull/130977 · Approved by: angelayi · Differential Revision: D62009176 · Pulled by: desertfire

- 2024-08-29 20:08 UTC · push to atalman/r2.4.0 by atalman
  "Rename 2.4.1.yml to 2.4.1.yaml"

- 2024-08-29 19:23 UTC · push to atalman/r2.4.0 by atalman
  "Update setup_env.sh"

- 2024-08-29 19:17 UTC · push to atalman/r2.4.0 by atalman
  "Create 2.4.1.yml"

- 2024-08-29 18:04 UTC · branch atalman/r2.4.1 created by atalman
  Head commit: "Enable flash_v3 backward (#2445)" · Reviewed by: xuzhao9 · Differential Revision: D61924864 · Pulled by: bertmaher

- 2024-08-28 20:12 UTC · push to main (3 commits) by facebook-github-bot
  "Enable flash_v3 backward (#2445)"
  Pull request: https://github.com/pytorch/benchmark/pull/2445 · Reviewed by: xuzhao9 · Differential Revision: D61924864 · Pulled by: bertmaher

- 2024-08-28 15:33 UTC · push to main by facebook-github-bot
  "Add compile time instruction count metric (#133834)"
  Run with `PYTHONPATH=$(pwd) python benchmarks/update_hint_benchmark.py out`. As of this diff, compile_time_instruction_count counts the instructions executed inside convert_frame.compile_inner, e.g.:
  `update_hint_regression,compile_time_instruction_count,10522459165`
  CI results to be added once populated.
  X-link: https://github.com/pytorch/pytorch/pull/133834 · Approved by: aorenste · Differential Revision: D61897875 · Pulled by: laithsakka

- 2024-08-27 09:15 UTC · push to main by facebook-github-bot
  "fix test_functional_call_sequential_params_and_buffers expectation on Windows (#134394)"
  The test's actual output differs between Windows and Linux only by a single empty line (between `linear` and `add`); the content is otherwise correct. Reproduce with `pytest test\dynamo\test_higher_order_ops.py -v -k test_functional_call_sequential_params_and_buffers`, which fails with an AssertionError on the extra blank line in the expected graph dump. Adding an `empty_line_normalizer` fixes it.
  X-link: https://github.com/pytorch/pytorch/pull/134394 · Approved by: jansel · Differential Revision: D61829624 · Co-authored-by: Jason Ansel

- 2024-08-26 21:30 UTC · force-push to export-D61819148 by facebook-github-bot
  "adding vector add to CI (#2440)" · Differential Revision: D61819148

- 2024-08-26 21:26 UTC · branch export-D61819148 created by facebook-github-bot
  "adding vector add to CI" · Differential Revision: D61819148

- 2024-08-26 21:21 UTC · push to main by facebook-github-bot
  "fixing OSS ci for tritonbench (#2439)"
  Two problems found so far in CI runs: (1) the CI file was not added to the target deps, so it could not be found even when the flag was used; (2) `--op` was required, and is now required only when `--ci` is not used.
  Pull request: https://github.com/pytorch/benchmark/pull/2439 · Reviewed by: danzimm, xuzhao9 · Differential Revision: D61809602

- 2024-08-26 19:01 UTC · branch export-D61809602 created by facebook-github-bot
  "fixing OSS ci for tritonbench" · Differential Revision: D61809602

- 2024-08-26 15:41 UTC · push to main by facebook-github-bot
  "PR#4179 (#2435)"
  Dependency of the grid_constant PR: the API for TMA descriptor fill methods changed in https://github.com/triton-lang/triton/pull/4179, so all usages in fbcode were fixed up.
  X-links: https://github.com/facebookresearch/FBGEMM/pull/124, https://github.com/pytorch/FBGEMM/pull/3027 · Reviewed by: minjang · Differential Revision: D61729239

- 2024-08-24 03:00 UTC · push to main by facebook-github-bot
  "Install the benchmark before running. (#2436)"
  Workflows such as https://github.com/pytorch/benchmark/actions/runs/10513238627/job/29128387392 hit `No module named 'torchbenchmark.util.framework.fb'` because the benchmark was run without being installed first; the fix installs it before running the models.
  Pull request: https://github.com/pytorch/benchmark/pull/2436 · Test plan: https://github.com/pytorch/benchmark/actions/runs/10533763643 · Reviewed by: kit1980 · Differential Revision: D61748902 · Pulled by: xuzhao9

- 2024-08-23 23:38 UTC · branch xz9/add-install created by xuzhao9
  "Install the benchmark before running."

- 2024-08-23 22:02 UTC · push to main by facebook-github-bot
  "Enable cublas for FP8 rowwise"
  Enables cuBLAS for FP8 rowwise; also adds support for fp8_fast_accum.
  Reviewed by: xuzhao9 · Differential Revision: D61735547

- 2024-08-23 20:45 UTC · push to main by facebook-github-bot
  "typing for decorators - fx/_compatibility (part 1) (#134202)"
  Part of #134054; corresponds to the pytorch mypy changes from D61493706. Updating takes so long and touches so many files that it cannot land as a whole without conflicting with intermediate changes, so these `type: ignore` annotations land in advance of actually being needed.
  X-link: https://github.com/pytorch/pytorch/pull/134202 · Approved by: Skylion007 · Differential Revision: D61718215 · Pulled by: aorenste

- 2024-08-23 18:11 UTC · push to main by facebook-github-bot
  "Always emit end events even on failure, use thread local storage for stack (#2432)"
  End events are now always emitted in a finally block so the stack stays correct when a unit test or job fails; the stack also moves to thread-local storage so it is maintained correctly in multithreaded scenarios.
  X-link: https://github.com/pytorch/pytorch/pull/134279 · Reviewed by: laithsakka · Differential Revision: D61682556

- 2024-08-23 15:44 UTC · push to main by facebook-github-bot
  "fix PR #2428 (#2433)"
  #2428 was code-reviewed internally on fbsource and not tested on OSS; this PR fixes the changes from that review, and was itself reviewed on GitHub and tested locally before import.
  Pull request: https://github.com/pytorch/benchmark/pull/2433 · Reviewed by: embg · Differential Revision: D61717464 · Pulled by: sfzhu93

- 2024-08-23 07:35 UTC · push to main by facebook-github-bot
  "Collect Export pass rate separately (#134076)"
  Collects the Export pass rate separately when running AOTInductor, for a better isolated signal.
  X-link: https://github.com/pytorch/pytorch/pull/134076 · Approved by: angelayi · Differential Revision: D61670555 · Pulled by: desertfire

- 2024-08-22 22:28 UTC · push to main by facebook-github-bot
  "add support for auto-tune TMA grid constant (#2428)"
  Follows a recent Triton PR (https://github.com/triton-lang/triton/pull/4498) that supports passing TMA descriptors by value using `__grid_constant__`. Updates the `_attn_fwd_inner` kernel to use the new feature; a helper class wraps TMA operations during auto-tune and in-kernel computation, and falls back to the old TMA handling when the installed Triton does not support the feature. Tested on the Triton shipped with the standard conda PyTorch install and on recent Triton including the above PR.
  Experiment results (before removing fences: P1541573348; after: P1541736645), all with CUDA_VISIBLE_DEVICES=5:
  1) old TMA: 138.476
  2) new TMA, with fences: 152-164
  3) new TMA, fences removed: 168.0
  4) no TMA: 187.881
  The result is still behind no-TMA and can be investigated further.
  Pull request: https://github.com/pytorch/benchmark/pull/2428 · Reviewed by: embg · Differential Revision: D61668142 · Pulled by: sfzhu93

- 2024-08-22 21:01 UTC · push to main by facebook-github-bot
  "Remove non-existing arg from docstring (#2431)"
  Pull request: https://github.com/pytorch/benchmark/pull/2431 · Reviewed by: xuzhao9 · Differential Revision: D61677262 · Pulled by: kit1980

- 2024-08-22 19:42 UTC · branch sdym/docstring created by kit1980
  "Remove non-existing arg from docstring"

- 2024-08-22 18:59 UTC · push to main by facebook-github-bot
  "Increase max total number of dynamo partitions to 15 (#134153)"
  Needed to split some of the aarch64 workflows into 15 shards.
  X-link: https://github.com/pytorch/pytorch/pull/134153 · Approved by: seemethere, kit1980, ZainRizvi · Differential Revision: D61670458 · Pulled by: malfet

- 2024-08-22 17:52 UTC · push to main by facebook-github-bot
  "Mark attri…" (truncated commit title)
  To be tested internally before https://github.com/pytorch/pytorch/pull/133713 can go in; separate PR created for ghimport.
  X-link: https://github.com/pytorch/pytorch/pull/134136 · Reviewed by: yanboliang · Differential Revision: D61612768 · Pulled by: anijain2305

- 2024-08-22 14:57 UTC · push to main by facebook-github-bot
  "Log chromium events to scuba (#2429)"
  Resubmission (exporting D61392607 was not working). Implements a set of views for internal Scuba viewing. TODOs possibly deferred to another diff: saving cache stats via counter is questionable, since there is currently no good way to track an fx graph cache hit per compile phase; frame id, compile id, and configs should be logged (configs would enable A/B testing on whether a config is on); compile_uuid's role is still open, though it is useful for drilling down to a single run's chrome-trace or icicle view.
  Pull request: https://github.com/pytorch/benchmark/pull/2429 · X-link: https://github.com/pytorch/pytorch/pull/134118 · Reviewed by: oulgen · Differential Revision: D61603243

- 2024-08-21 18:06 UTC · push to main by facebook-github-bot
  "Correctly get run_id and run_attempt (#2426)"
  Pull request: https://github.com/pytorch/benchmark/pull/2426 · Reviewed by: atalman · Differential Revision: D61606745 · Pulled by: kit1980

- 2024-08-21 17:49 UTC · push to main by facebook-github-bot
  "Add HSTU apply_rope and batched_multihead_index_sum"
  Adds the apply_rope operator from HSTU.
  Reviewed by: int3 · Differential Revision: D61150155

- 2024-08-21 01:02 UTC · push to main by facebook-github-bot
  "Add fbgemm split_embedding_table benchmark"
  Adds the nbit_device test from the fbgemm split_embedding_table benchmark (source: fbcode/deeplearning/fbgemm/fbgemm_gpu/bench/split_table_batched_embeddings_benchmark.py).
  Reviewed by: int3 · Differential Revision: D61215046

(More activity available on previous pages.)