
[Dynamic Shapes] fixed dynamic shape inference #128807

Closed

Conversation

@zero000064 (Contributor) commented Jun 17, 2024

Constrain a dynamic dimension that is indirectly bound to an integer.
After each ShapeEnv._refine_ranges call, check whether the new ValueRange is a singleton; if it is, replace the symbol with that value.
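The idea can be sketched with plain-Python stand-ins (`ValueRange`, `refine_range`, and the symbol name `"s0"` below are hypothetical simplifications for illustration, not the actual ShapeEnv internals):

```python
# Toy stand-in for torch's ValueRanges; for illustration only.
class ValueRange:
    def __init__(self, lower, upper):
        self.lower, self.upper = lower, upper

    def is_singleton(self):
        # The range pins the symbol to one value when both bounds coincide.
        return self.lower == self.upper


def refine_range(var_to_range, replacements, symbol, new_range):
    """Intersect the symbol's known range with a newly derived one; if the
    result collapses to a single integer, record a replacement so later
    reasoning sees the constant instead of the free symbol."""
    old = var_to_range[symbol]
    refined = ValueRange(max(old.lower, new_range.lower),
                         min(old.upper, new_range.upper))
    var_to_range[symbol] = refined
    if refined.is_singleton():
        replacements[symbol] = refined.lower


# A symbol s0 known to lie in [2, 10] gets refined by a guard to [2, 2]:
var_to_range = {"s0": ValueRange(2, 10)}
replacements = {}
refine_range(var_to_range, replacements, "s0", ValueRange(2, 2))
print(replacements)  # {'s0': 2}
```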

Fixes #122307

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @ezyang


pytorch-bot bot commented Jun 17, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/128807

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit 65099e1 with merge base 8a2fed7:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the `module: dynamo` and `release notes: fx` labels Jun 17, 2024
@zero000064 zero000064 changed the title fixed dynamic shape inference bug [Dynamic Shapes] fixed dynamic shape inference Jun 17, 2024
@mikaylagawarecki mikaylagawarecki added the `triaged` label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Jun 17, 2024
@zero000064

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the `ciflow/trunk` label (trigger trunk jobs on your pull request) Jun 18, 2024
@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@pytorchmergebot

Merge failed

Reason: 1 jobs have failed, first few of them are: inductor / linux-jammy-cpu-py3.8-gcc11-inductor / test (dynamic_cpu_inductor_torchbench, 1, 2, linux.12xlarge)

Details for Dev Infra team Raised by workflow job

@zero000064

Here's the information related to the failure inductor / linux-jammy-cpu-py3.8-gcc11-inductor / test (dynamic_cpu_inductor_torchbench, 1, 2, linux.12xlarge).

The error comes from DALLE2_pytorch being unable to load on CPU, and doesn't look related to my code change.

```
cpu eval DALLE2_pytorch
Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 4149, in run
    ) = runner.load_model(
  File "benchmarks/dynamo/torchbench.py", line 327, in load_model
    benchmark = benchmark_cls(
  File "/var/lib/jenkins/workspace/torchbench/torchbenchmark/util/model.py", line 24, in __call__
    obj = type.__call__(cls, *args, **kwargs)
  File "/var/lib/jenkins/workspace/torchbench/torchbenchmark/models/DALLE2_pytorch/__init__.py", line 21, in __init__
    raise NotImplementedError("DALL-E 2 Not Supported on CPU")
NotImplementedError: DALL-E 2 Not Supported on CPU

DALLE2_pytorch FAIL: accuracy=model_fail_to_load, expected=eager_fail_to_run

Error: 1 models have accuracy status regressed:
  DALLE2_pytorch

Error: Process completed with exit code 1.
```

@zero000064

DALLE2_pytorch doesn't look related to the code change; its error is NotImplementedError: DALL-E 2 Not Supported on CPU.
I'm checking the first and last failures to see whether there is any connection between the change and them.

- [inductor / cuda12.1-py3.12-gcc9-sm86 / test (inductor, 1, 1, linux.g5.4xlarge.nvidia.gpu)](https://hud.pytorch.org/pr/pytorch/pytorch/128807#26346794677) ([gh](https://github.com/pytorch/pytorch/actions/runs/9557946267/job/26346794677)): 'Test'
- [inductor / linux-jammy-cpu-py3.8-gcc11-inductor / test (cpu_inductor_torchbench_freezing, 1, 2, linux.12xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/128807#26346263190) ([gh](https://github.com/pytorch/pytorch/actions/runs/9557946267/job/26346263190)): DALLE2_pytorch
- [inductor / linux-jammy-cpu-py3.8-gcc11-inductor / test (cpu_inductor_torchbench, 1, 2, linux.12xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/128807#26346262516) ([gh](https://github.com/pytorch/pytorch/actions/runs/9557946267/job/26346262516)): DALLE2_pytorch
- [inductor / linux-jammy-cpu-py3.8-gcc11-inductor / test (dynamic_cpu_inductor_torchbench, 1, 2, linux.12xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/128807#26346263926) ([gh](https://github.com/pytorch/pytorch/actions/runs/9557946267/job/26346263926)): DALLE2_pytorch
- [trunk / macos-py3-arm64 / test (default, 1, 3, macos-m1-stable)](https://hud.pytorch.org/pr/pytorch/pytorch/128807#26356646373) ([gh](https://github.com/pytorch/pytorch/actions/runs/9557971266/job/26356646373)): export/test_serdes.py::SerDesExportTestExport::test_derived_dim_1_2_serdes

@zero000064

Just checked export/test_serdes.py::SerDesExportTestExport::test_derived_dim_1_2_serdes. The following code in export/test_export.py gives some flexibility on input shape, but my code change removes that flexibility; I need to think about how to keep it. Luckily, it's Juneteenth tomorrow, so I have some free time to investigate.

```python
dx = Dim("dx", min=1, max=2)
ep = export(
    Bar(),
    (torch.randn(2, 2), torch.randn(3, 2)),
    dynamic_shapes=({0: dx, 1: None}, {0: dx + 1, 1: None}),
)
```

@zero000064

zero000064 commented Jun 20, 2024

After investigation, the change doesn't prevent exporting a graph module in test_derived_dim_1_2 in test_export:

```python
ep = export(
    Bar(),
    (torch.randn(2, 2), torch.randn(3, 2)),
    dynamic_shapes=({0: dx, 1: None}, {0: dx + 1, 1: None}),
)
```

But it only allows using a tensor with dim 2 and blocks tensors with any other dim value. This is not the intention: the change is meant to block graph compiling, not graph usage.

```python
ep.module()(torch.randn(1, 2), torch.randn(2, 2))
```

If the graph module compiles successfully and the usage meets the min-max requirement, the usage should be allowed.
Something may be set incorrectly in the graph module, so it can only be used with dim 2.
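The expected contract can be sketched as a plain range check (`check_dynamic_dim` is a hypothetical helper for illustration, not the exported module's real guard):

```python
# Hypothetical sketch of the guard an exported module should apply to a
# dynamic dimension declared as Dim("dx", min=1, max=2): any size inside
# [min, max] is accepted, not just the size seen at export time.
def check_dynamic_dim(actual, dim_min, dim_max):
    if not (dim_min <= actual <= dim_max):
        raise RuntimeError(
            f"expected a size in [{dim_min}, {dim_max}], got {actual}")
    return actual

print(check_dynamic_dim(1, 1, 2))  # 1: allowed, even though export saw size 2
print(check_dynamic_dim(2, 1, 2))  # 2: allowed
```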

@zero000064

zero000064 commented Jun 21, 2024

Before this code change, the graph's first placeholder node represents a fake tensor, FakeTensor(..., size=(s0, 2)).
After the code change, it represents FakeTensor(..., size=(2, 2)).
The first dimension has turned into a plain number instead of a SymInt, which causes the RuntimeError.
I need to investigate the placeholder node construction to turn it back into a SymInt.
The plan is to catch every SymInt construction whose expression is equivalent to s0, then pick out the fake tensor construction in question and see how the replacement plays out in this process.
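Why the replacement bites can be shown with a toy substitution over the size tuple (`apply_replacements` and the string symbol `"s0"` are illustrative stand-ins for the SymInt machinery, not torch internals):

```python
# Toy illustration: once the singleton replacement s0 -> 2 is applied to
# the placeholder's size, the symbolic dimension becomes a plain int and
# can no longer match inputs of any other size.
def apply_replacements(size, replacements):
    return tuple(replacements.get(d, d) for d in size)

symbolic_size = ("s0", 2)        # before: FakeTensor(..., size=(s0, 2))
fixed_size = apply_replacements(symbolic_size, {"s0": 2})
print(fixed_size)  # (2, 2): the first dim is now a constant, not a SymInt
```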

@zero000064

zero000064 commented Jun 24, 2024

_update_var_to_range() is also called from constrain_symbol_range() when deserializing the graph in _export, which is why the replacement's effect propagates to the ExportedProgram. A potential solution is to avoid changing _update_var_to_range and move the change into ShapeEnv._refine_ranges() instead.

```
File "/home/zero064/Downloads/pytorch_fork/pytorch/test/export/test_serdes.py", line 53, in <module>
    run_tests()
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/_dynamo/test_case.py", line 36, in run_tests
    run_tests()
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/testing/_internal/common_utils.py", line 1174, in run_tests
    unittest.main(argv=argv)
File "/home/zero064/anaconda3/envs/pytorch_dev/lib/python3.9/unittest/main.py", line 101, in __init__
    self.runTests()
File "/home/zero064/anaconda3/envs/pytorch_dev/lib/python3.9/unittest/main.py", line 271, in runTests
    self.result = testRunner.run(self.test)
File "/home/zero064/anaconda3/envs/pytorch_dev/lib/python3.9/unittest/runner.py", line 184, in run
    test(result)
File "/home/zero064/anaconda3/envs/pytorch_dev/lib/python3.9/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
File "/home/zero064/anaconda3/envs/pytorch_dev/lib/python3.9/unittest/suite.py", line 122, in run
    test(result)
File "/home/zero064/anaconda3/envs/pytorch_dev/lib/python3.9/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
File "/home/zero064/anaconda3/envs/pytorch_dev/lib/python3.9/unittest/suite.py", line 122, in run
    test(result)
File "/home/zero064/anaconda3/envs/pytorch_dev/lib/python3.9/unittest/case.py", line 651, in __call__
    return self.run(*args, **kwds)
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/testing/_internal/common_utils.py", line 2883, in run
    self._run_custom(
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/testing/_internal/common_utils.py", line 2855, in _run_custom
    super_run(result=result)
File "/home/zero064/anaconda3/envs/pytorch_dev/lib/python3.9/unittest/case.py", line 592, in run
    self._callTestMethod(testMethod)
File "/home/zero064/anaconda3/envs/pytorch_dev/lib/python3.9/unittest/case.py", line 550, in _callTestMethod
    method()
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/testing/_internal/common_utils.py", line 2754, in wrapper
    method(*args, **kwargs)
File "/home/zero064/Downloads/pytorch_fork/pytorch/test/export/testing.py", line 40, in _fn
    return fn(*args, **kwargs)
File "/home/zero064/Downloads/pytorch_fork/pytorch/test/export/test_export.py", line 1178, in test_derived_dim_1_2
    ep = export(
File "/home/zero064/Downloads/pytorch_fork/pytorch/test/export/test_serdes.py", line 21, in mocked_serder_export
    loaded_ep = load(buffer)
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/export/__init__.py", line 300, in load
    return load(
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/_export/__init__.py", line 336, in load
    ep = deserialize(artifact, expected_opset_version)
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/_export/serde/serialize.py", line 2368, in deserialize
    ExportedProgramDeserializer(expected_opset_version)
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/_export/serde/serialize.py", line 2247, in deserialize
    GraphModuleDeserializer()
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/_export/serde/serialize.py", line 1849, in deserialize
    self.deserialize_graph(serialized_graph_module.graph)
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/_export/serde/serialize.py", line 1586, in deserialize_graph
    meta_val = self.deserialize_tensor_meta(tensor_value)
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/_export/serde/serialize.py", line 1553, in deserialize_tensor_meta
    tuple(self.deserialize_sym_int(val) for val in tensor_meta.sizes),  # type: ignore[misc]
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/_export/serde/serialize.py", line 1553, in <genexpr>
    tuple(self.deserialize_sym_int(val) for val in tensor_meta.sizes),  # type: ignore[misc]
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/_export/serde/serialize.py", line 1500, in deserialize_sym_int
    self.shape_env.constrain_symbol_range(
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/fx/experimental/recording.py", line 245, in wrapper
    return fn(*args, **kwargs)
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/fx/experimental/symbolic_shapes.py", line 5465, in constrain_symbol_range
    self._update_var_to_range(s, upd_vr)
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/fx/experimental/symbolic_shapes.py", line 4722, in _update_var_to_range
    log.warning("traceback:%s", traceback.print_stack())
```
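The distinction between the two hook points can be sketched abstractly (the function names below mirror the ShapeEnv methods, but the bodies are illustrative stand-ins): a replacement hook inside the low-level range update fires for every caller, including deserialization, while a hook in the higher-level refinement path leaves deserialization unaffected.

```python
# Illustrative stand-ins, not the real ShapeEnv.
calls = []

def update_var_to_range(symbol, lo, hi, on_singleton=None):
    # Low-level bound update; used by refinement AND by deserialization.
    if on_singleton and lo == hi:
        on_singleton(symbol, lo)

def refine_ranges(symbol, lo, hi):
    # Guard-driven refinement: the right place to hook the replacement.
    update_var_to_range(symbol, lo, hi,
                        on_singleton=lambda s, v: calls.append((s, v)))

def deserialize_sym_int(symbol, lo, hi):
    # Deserialization just restores recorded bounds; no replacement wanted.
    update_var_to_range(symbol, lo, hi)

deserialize_sym_int("s0", 2, 2)   # restoring a serialized range: no effect
refine_ranges("s0", 2, 2)         # a guard pinned s0 to 2: replacement fires
print(calls)  # [('s0', 2)]
```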

@zero000064

zero000064 commented Jun 24, 2024

Moved the replacement logic to ShapeEnv._refine_ranges.

Tested these cases, including the reported failures:

```shell
TORCH_LOGS="+dynamic,+dynamo" python test/dynamo/test_misc.py -k test_raise_guard_indirect_full_constraint
TORCH_LOGS="+dynamic,+dynamo" python test/dynamo/test_misc.py -k test_guard_failure_fn_shape_control
TORCH_LOGS="+dynamic,+dynamo" python test/dynamo/test_subclasses.py -k test_wrapper_subclass_guards_on_inner_tensor
TORCH_LOGS="+dynamic,+dynamo" python test/export/test_serdes.py -k test_derived_dim_1_2_serdes
```

@zero000064 zero000064 requested a review from ezyang June 24, 2024 15:38
@zero000064

zero000064 commented Jun 24, 2024

@ezyang Could you review the change again? I changed the code in symbolic_shapes.py to fix test failures.

@ezyang

ezyang commented Jun 27, 2024

@pytorchbot merge

@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@zero000064 zero000064 deleted the fix_dynamic_shape_inference branch June 27, 2024 23:36
Labels
`ciflow/inductor`, `ciflow/trunk`, `Merged`, `module: dynamo`, `open source`, `release notes: fx`, `triaged`
Successfully merging this pull request may close these issues.

ConstraintViolationError not thrown when constraining dynamic dim to static int