Write dynamo benchmarks performance result to csv when throw exceptions #126764

WeizhuoZhang-intel · 2024-05-21T08:44:31Z

Performance mode issue: When warm-up fails in a dynamo benchmarks performance run, no result is written to the csv file. In accuracy mode, however, a failed run is still written as fail_to_run, even when the dynamo pass fails. As a result, the number of models in the accuracy csv file does not match the number of models in the performance csv file.

Fix: Models that fail during warm-up are now recorded in the csv file.
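A minimal sketch of the idea behind this fix, not the actual change in benchmarks/dynamo/common.py. The names `write_row` and `record_performance`, the two-column layout, and the use of `fail_to_run` as the placeholder in the performance csv are all assumptions made for illustration:

```python
import csv
import io


def write_row(out, headers, row):
    """Append one row to a CSV stream, writing the header first if the stream is empty."""
    writer = csv.writer(out)
    if out.tell() == 0:
        writer.writerow(headers)
    writer.writerow(row)


def record_performance(name, warmup, measure, out):
    """Hypothetical runner: always emit one CSV row per model."""
    headers = ["name", "speedup"]
    try:
        warmup()
    except Exception:
        # Warm-up failed: still write a row so the model is counted.
        # The "fail_to_run" placeholder is an assumption for this sketch.
        write_row(out, headers, [name, "fail_to_run"])
        return None
    speedup = measure()
    write_row(out, headers, [name, f"{speedup:.4f}"])
    return speedup
```

With a row written on both paths, the performance csv lists the same set of models as the accuracy csv, so the two counts can be compared directly.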

Accuracy mode issue: The detectron2_fasterrcnn_r models fail in accuracy mode but pass in performance mode. The accuracy failure is the same as the one addressed in PR ee557d8.

Dynamic Shape:
Traceback (most recent call last):
  File "benchmarks/dynamo/torchbench.py", line 449, in <module>
    torchbench_main()
  File "benchmarks/dynamo/torchbench.py", line 445, in torchbench_main
    main(TorchBenchmarkRunner(), original_dir)
  File "/workspace/pytorch/benchmarks/dynamo/common.py", line 3650, in main
    process_entry(0, runner, original_dir, args)
  File "/workspace/pytorch/benchmarks/dynamo/common.py", line 3582, in process_entry
    return run(runner, args, original_dir)
  File "/workspace/pytorch/benchmarks/dynamo/common.py", line 4163, in run
    assert marked, f"nothing in example_inputs had a dim with {batch_size}"
AssertionError: nothing in example_inputs had a dim with 4

Fix: Same as PR ee557d8: setting batch_size to 4 is skipped when testing dynamic shapes.
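To illustrate why the assertion fires and why skipping the override avoids it, here is a rough sketch. It is not the harness code: input shapes are modelled as plain tuples, and `mark_batch_dims` and `batch_size_for_run` are hypothetical helpers. Only the assertion message is taken from the traceback above.

```python
def mark_batch_dims(shapes, batch_size):
    """Return (input_index, dim_index) pairs whose size equals batch_size."""
    marked = []
    for i, shape in enumerate(shapes):
        for d, size in enumerate(shape):
            if size == batch_size:
                marked.append((i, d))
    # Same message as in the traceback: no dimension matched the batch size.
    assert marked, f"nothing in example_inputs had a dim with {batch_size}"
    return marked


def batch_size_for_run(default_batch_size, override, skip_override):
    """Keep the model's own batch size when the override is skipped (assumed behavior)."""
    return default_batch_size if skip_override else override
```

If a model's example inputs keep their default batch size while the harness assumes 4, no dimension matches and the assertion fails. Leaving the batch size at the model's default keeps the two in agreement.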

Dynamic shapes pass rate improved from 89% to 95%.

| Comp Item | Compiler | Suite      | Before     | After fix  |
|-----------|----------|------------|------------|------------|
| Pass Rate | Inductor | torchbench | 89%, 73/82 | 95%, 79/83 |
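The percentages in the table can be checked from the pass and total counts. A small sketch, where `pass_rate` is a hypothetical helper and rounding to the nearest whole percent is an assumption about how the figures were produced:

```python
def pass_rate(passed, total):
    """Format a pass rate as 'NN%, passed/total', rounded to a whole percent."""
    percent = round(100 * passed / total)
    return f"{percent}%, {passed}/{total}"


before = pass_rate(73, 82)  # 73 / 82 is about 0.8902
after = pass_rate(79, 83)   # 79 / 83 is about 0.9518
```

Note that the total differs between the two columns (82 before, 83 after), so the two rates are computed over slightly different model sets.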

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @d4l3k @mcarilli @ptrblck @leslie-fang-intel @jgong5 @voznesenskym @EikanWang @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire

pytorch-bot · 2024-05-21T08:44:34Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126764

✅ You can merge normally! (2 Unrelated Failures)

As of commit e0c60f9 with merge base 18fdc0a:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 2, 5, linux.g5.4xlarge.nvidia.gpu) (gh) (similar failure)
test_foreach.py::TestForeachCUDA::test_binary_op_list_slow_path__foreach_clamp_max_cuda_complex128

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

inductor / rocm6.1-py3.8-inductor / test (inductor, 1, 1, linux.rocm.gpu.2, unstable) (gh) (#128871)
'test/inductor/test_max_autotune.py::TestTuningProcess::test_tuning_pool_multiple_devices'

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jansel · 2024-06-04T00:27:24Z

@pytorchbot merge

pytorchmergebot · 2024-06-04T00:29:10Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status
here

pytorchmergebot · 2024-06-04T00:34:38Z

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / macos-13-py3-arm64 / build

Details for Dev Infra team

Raised by workflow job

chuanqi129 · 2024-06-04T14:02:58Z

@pytorchbot rebase

pytorchmergebot · 2024-06-04T14:08:45Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2024-06-04T14:08:48Z

Successfully rebased weizhuoz/fix_dynamo_perf_failure onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout weizhuoz/fix_dynamo_perf_failure && git pull --rebase)

WeizhuoZhang-intel · 2024-06-06T01:23:01Z

@pytorchbot merge

pytorchmergebot · 2024-06-06T01:24:54Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

pytorchmergebot · 2024-06-06T01:25:07Z

Merge failed

Reason: 2 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Failing merge rule: Core Maintainers

chuanqi129 · 2024-06-06T17:16:14Z

@pytorchbot rebase

pytorchmergebot · 2024-06-06T17:17:54Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2024-06-06T17:17:57Z

Successfully rebased weizhuoz/fix_dynamo_perf_failure onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout weizhuoz/fix_dynamo_perf_failure && git pull --rebase)

pytorchmergebot · 2024-06-14T05:55:39Z

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 5, 5, linux.g5.4xlarge.nvidia.gpu)

Dig deeper by viewing the failures on hud

Failing merge rule: Core Maintainers

chuanqi129 · 2024-06-14T06:00:43Z

Hi @atalman, it seems this UT failure, pull / linux-focal-cuda12.1-py3.10-gcc9-sm86 / test (default, 5, 5, linux.g5.4xlarge.nvidia.gpu), is not related to the PR change. Could you please help double-check it?

WeizhuoZhang-intel · 2024-06-17T08:21:49Z

@pytorchbot rebase

pytorchmergebot · 2024-06-17T08:23:13Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2024-06-17T08:23:16Z

Successfully rebased weizhuoz/fix_dynamo_perf_failure onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout weizhuoz/fix_dynamo_perf_failure && git pull --rebase)

WeizhuoZhang-intel · 2024-06-17T08:25:01Z

@pytorchbot merge

pytorchmergebot · 2024-06-17T08:26:52Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

pytorchmergebot · 2024-06-17T10:46:38Z

Merge failed

Reason: 12 jobs have failed, first few of them are: inductor / cuda12.1-py3.10-gcc9-sm86 / test (inductor_torchbench, 1, 2, linux.g5.4xlarge.nvidia.gpu), inductor / cuda12.1-py3.10-gcc9-sm86 / test (inductor_torchbench, 2, 2, linux.g5.4xlarge.nvidia.gpu), inductor / cuda12.1-py3.10-gcc9-sm86 / test (dynamic_inductor_torchbench, 1, 2, linux.g5.4xlarge.nvidia.gpu), inductor / cuda12.1-py3.10-gcc9-sm86 / test (dynamic_inductor_torchbench, 2, 2, linux.g5.4xlarge.nvidia.gpu), inductor / cuda12.1-py3.10-gcc9-sm86 / test (aot_inductor_torchbench, 1, 2, linux.g5.4xlarge.nvidia.gpu)

WeizhuoZhang-intel · 2024-06-25T01:34:25Z

@pytorchbot rebase

pytorchmergebot · 2024-06-25T01:35:44Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2024-06-25T01:35:48Z

Successfully rebased weizhuoz/fix_dynamo_perf_failure onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout weizhuoz/fix_dynamo_perf_failure && git pull --rebase)

chuanqi129 · 2024-06-25T17:47:13Z

@pytorchbot merge

pytorchmergebot · 2024-06-25T17:48:47Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

…ns (#126764)

Summary:
**Performance mode Issue**: When dynamo benchmarks performance warm-up failed, the result will be not written into csv file. But the accuracy will be written as `fail_to_run` even when dynamo pass failed. So the accuracy model number is not aligned with performance model number for each of their csv files.
![image](https://github.com/pytorch/pytorch/assets/84730719/9043d215-130b-46b4-a835-f148c225947c)

- **Fix**: The warm-up failed models will be recorded into csv file shown as following:
![image](https://github.com/pytorch/pytorch/assets/84730719/7907a3c2-c942-42bb-b31c-55424a0e8117)

**Accuracy mode issue**: `detectron2_fasterrcnn_r` models failed on accuracy mode, but was tested successfully on performance mode. The accuracy failure is same as PR pytorch/pytorch@ee557d8.

```
Dynamic Shape:
Traceback (most recent call last):
  File "benchmarks/dynamo/torchbench.py", line 449, in <module>
    torchbench_main()
  File "benchmarks/dynamo/torchbench.py", line 445, in torchbench_main
    main(TorchBenchmarkRunner(), original_dir)
  File "/workspace/pytorch/benchmarks/dynamo/common.py", line 3650, in main
    process_entry(0, runner, original_dir, args)
  File "/workspace/pytorch/benchmarks/dynamo/common.py", line 3582, in process_entry
    return run(runner, args, original_dir)
  File "/workspace/pytorch/benchmarks/dynamo/common.py", line 4163, in run
    assert marked, f"nothing in example_inputs had a dim with {batch_size}"
AssertionError: nothing in example_inputs had a dim with 4
```
![image](https://github.com/pytorch/pytorch/assets/84730719/f25392f0-f982-46c8-8e2c-a8a25d85a21a)

- **Fix**: same as PR pytorch/pytorch@ee557d8, the batch_size will be skipped to set as 4 when testing dynamic shapes.

Dynamic shapes passrate improved from 89% -> **95%**

| Comp Item | Compiler | suite      | before     | After fix  |
|-----------|----------|------------|------------|------------|
| Pass Rate | Inductor | torchbench | 89%, 73/82 | 95%, 79/83 |

X-link: pytorch/pytorch#126764
Approved by: https://github.com/jansel

Reviewed By: huydhn

Differential Revision: D59035907

fbshipit-source-id: 03b5abd293bc695621af7ef25a4d5940601c81d4

pytorch-bot bot added the module: dynamo label May 21, 2024

WeizhuoZhang-intel marked this pull request as draft May 21, 2024 08:44

pytorchbot added the open source label May 21, 2024

WeizhuoZhang-intel marked this pull request as ready for review May 23, 2024 03:09

WeizhuoZhang-intel marked this pull request as draft May 23, 2024 03:09

WeizhuoZhang-intel marked this pull request as ready for review June 3, 2024 08:22

cpuhrsch requested a review from jansel June 3, 2024 18:09

cpuhrsch added the triaged label Jun 3, 2024

jansel added the ciflow/inductor and release notes: inductor labels Jun 4, 2024

jansel approved these changes Jun 4, 2024

pytorch-bot bot added the ciflow/trunk label Jun 4, 2024

pytorchmergebot added the merging label Jun 4, 2024

pytorchmergebot removed the merging label Jun 4, 2024

pytorchmergebot force-pushed the weizhuoz/fix_dynamo_perf_failure branch from b84d4ac to 6bbbbb9 Compare June 4, 2024 14:08

pytorchmergebot added the merging label Jun 6, 2024

pytorchmergebot removed the merging label Jun 6, 2024

pytorchmergebot removed the merging label Jun 14, 2024

pytorchmergebot force-pushed the weizhuoz/fix_dynamo_perf_failure branch from 86e2459 to 1be9007 Compare June 17, 2024 08:23

pytorchmergebot added the merging label Jun 17, 2024

pytorchmergebot removed the merging label Jun 17, 2024

WeizhuoZhang-intel added 6 commits June 25, 2024 01:35

Write dynamo benchmarks performance result to csv when throw exceptions

ca27c1d

rebase main

2b4efa3

apply lintrunner patch

8919f29

Update dynamo torchbench accuracy baseline

c827b71

Update dynamo torchbench accuracy baseline

b820cc7

remove redundant acc expect perf

e0c60f9

pytorchmergebot force-pushed the weizhuoz/fix_dynamo_perf_failure branch from 1be9007 to e0c60f9 Compare June 25, 2024 01:35

pytorchmergebot added the merging label Jun 25, 2024

pytorchmergebot added the Merged label Jun 25, 2024

pytorchmergebot closed this in 53f462c Jun 25, 2024

pytorchmergebot removed the merging label Jun 25, 2024
