
[Dynamic Shapes] fixed dynamic shape inference #128807

Closed

Conversation

@zero000064 (Contributor) commented Jun 17, 2024

Constrain a dynamic dimension that is indirectly bound to an integer.
After each ShapeEnv._refine_ranges call, check whether the new ValueRange is a singleton; if it is, replace the symbol with that value.
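The idea can be sketched with plain-Python stand-ins (`ValueRange`, `refine_range`, and the symbol name `"s0"` below are hypothetical simplifications for illustration, not the actual ShapeEnv internals):

```python
# Toy stand-in for torch's ValueRanges; for illustration only.
class ValueRange:
    def __init__(self, lower, upper):
        self.lower, self.upper = lower, upper

    def is_singleton(self):
        # The range pins the symbol to one value when both bounds coincide.
        return self.lower == self.upper


def refine_range(var_to_range, replacements, symbol, new_range):
    """Intersect the symbol's known range with a newly derived one; if the
    result collapses to a single integer, record a replacement so later
    reasoning sees the constant instead of the free symbol."""
    old = var_to_range[symbol]
    refined = ValueRange(max(old.lower, new_range.lower),
                         min(old.upper, new_range.upper))
    var_to_range[symbol] = refined
    if refined.is_singleton():
        replacements[symbol] = refined.lower


# A symbol s0 known to lie in [2, 10] gets refined by a guard to [2, 2]:
var_to_range = {"s0": ValueRange(2, 10)}
replacements = {}
refine_range(var_to_range, replacements, "s0", ValueRange(2, 2))
print(replacements)  # {'s0': 2}
```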

Fixes #122307

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @ezyang


pytorch-bot bot commented Jun 17, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/128807

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit 65099e1 with merge base 8a2fed7:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the `module: dynamo` and `release notes: fx` labels Jun 17, 2024
@zero000064 zero000064 changed the title fixed dynamic shape inference bug [Dynamic Shapes] fixed dynamic shape inference Jun 17, 2024
@mikaylagawarecki mikaylagawarecki added the `triaged` label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Jun 17, 2024
@zero000064

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the `ciflow/trunk` label (trigger trunk jobs on your pull request) Jun 18, 2024
@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@pytorchmergebot

Merge failed

Reason: 1 jobs have failed, first few of them are: inductor / linux-jammy-cpu-py3.8-gcc11-inductor / test (dynamic_cpu_inductor_torchbench, 1, 2, linux.12xlarge)

Details for Dev Infra team Raised by workflow job

@zero000064

Here's the information related to the failure inductor / linux-jammy-cpu-py3.8-gcc11-inductor / test (dynamic_cpu_inductor_torchbench, 1, 2, linux.12xlarge).

The error comes from DALLE2_pytorch being unable to load on CPU, and doesn't look related to my code change.

```
cpu eval DALLE2_pytorch
Traceback (most recent call last):
  File "/var/lib/jenkins/workspace/benchmarks/dynamo/common.py", line 4149, in run
    ) = runner.load_model(
  File "benchmarks/dynamo/torchbench.py", line 327, in load_model
    benchmark = benchmark_cls(
  File "/var/lib/jenkins/workspace/torchbench/torchbenchmark/util/model.py", line 24, in __call__
    obj = type.__call__(cls, *args, **kwargs)
  File "/var/lib/jenkins/workspace/torchbench/torchbenchmark/models/DALLE2_pytorch/__init__.py", line 21, in __init__
    raise NotImplementedError("DALL-E 2 Not Supported on CPU")
NotImplementedError: DALL-E 2 Not Supported on CPU

DALLE2_pytorch FAIL: accuracy=model_fail_to_load, expected=eager_fail_to_run

Error: 1 models have accuracy status regressed:
  DALLE2_pytorch

Error: Process completed with exit code 1.
```

@zero000064

DALLE2_pytorch doesn't look related to the code change; its error is NotImplementedError: DALL-E 2 Not Supported on CPU.
I'm checking the first and last failures to see whether there is any connection between the change and them.

- [inductor / cuda12.1-py3.12-gcc9-sm86 / test (inductor, 1, 1, linux.g5.4xlarge.nvidia.gpu)](https://hud.pytorch.org/pr/pytorch/pytorch/128807#26346794677) ([gh](https://github.com/pytorch/pytorch/actions/runs/9557946267/job/26346794677)): 'Test'
- [inductor / linux-jammy-cpu-py3.8-gcc11-inductor / test (cpu_inductor_torchbench_freezing, 1, 2, linux.12xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/128807#26346263190) ([gh](https://github.com/pytorch/pytorch/actions/runs/9557946267/job/26346263190)): DALLE2_pytorch
- [inductor / linux-jammy-cpu-py3.8-gcc11-inductor / test (cpu_inductor_torchbench, 1, 2, linux.12xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/128807#26346262516) ([gh](https://github.com/pytorch/pytorch/actions/runs/9557946267/job/26346262516)): DALLE2_pytorch
- [inductor / linux-jammy-cpu-py3.8-gcc11-inductor / test (dynamic_cpu_inductor_torchbench, 1, 2, linux.12xlarge)](https://hud.pytorch.org/pr/pytorch/pytorch/128807#26346263926) ([gh](https://github.com/pytorch/pytorch/actions/runs/9557946267/job/26346263926)): DALLE2_pytorch
- [trunk / macos-py3-arm64 / test (default, 1, 3, macos-m1-stable)](https://hud.pytorch.org/pr/pytorch/pytorch/128807#26356646373) ([gh](https://github.com/pytorch/pytorch/actions/runs/9557971266/job/26356646373)): export/test_serdes.py::SerDesExportTestExport::test_derived_dim_1_2_serdes

@zero000064

Just checked export/test_serdes.py::SerDesExportTestExport::test_derived_dim_1_2_serdes. The following code in export/test_export.py gives some flexibility on input shape, but my code change removes that flexibility; I need to think about how to keep it. Luckily, it's Juneteenth tomorrow, so I have some free time to investigate.

```python
dx = Dim("dx", min=1, max=2)
ep = export(
    Bar(),
    (torch.randn(2, 2), torch.randn(3, 2)),
    dynamic_shapes=({0: dx, 1: None}, {0: dx + 1, 1: None}),
)
```

@zero000064

zero000064 commented Jun 20, 2024

After investigation, the change doesn't prevent exporting a graph module in test_derived_dim_1_2 in test_export:

```python
ep = export(
    Bar(),
    (torch.randn(2, 2), torch.randn(3, 2)),
    dynamic_shapes=({0: dx, 1: None}, {0: dx + 1, 1: None}),
)
```

But it only allows using a tensor with dim 2 and blocks tensors with any other dim value. This is not the intention: the change is meant to block graph compiling, not graph usage.

```python
ep.module()(torch.randn(1, 2), torch.randn(2, 2))
```

If the graph module compiles successfully and the usage meets the min-max requirement, the usage should be allowed.
Something may be set incorrectly in the graph module, so it can only be used with dim 2.
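The expected contract can be sketched as a plain range check (`check_dynamic_dim` is a hypothetical helper for illustration, not the exported module's real guard):

```python
# Hypothetical sketch of the guard an exported module should apply to a
# dynamic dimension declared as Dim("dx", min=1, max=2): any size inside
# [min, max] is accepted, not just the size seen at export time.
def check_dynamic_dim(actual, dim_min, dim_max):
    if not (dim_min <= actual <= dim_max):
        raise RuntimeError(
            f"expected a size in [{dim_min}, {dim_max}], got {actual}")
    return actual

print(check_dynamic_dim(1, 1, 2))  # 1: allowed, even though export saw size 2
print(check_dynamic_dim(2, 1, 2))  # 2: allowed
```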

@zero000064

zero000064 commented Jun 21, 2024

Before this code change, the graph's first placeholder node represents a fake tensor, FakeTensor(..., size=(s0, 2)).
After the code change, it represents FakeTensor(..., size=(2, 2)).
The first dimension has turned into a plain number instead of a SymInt, which causes the RuntimeError.
I need to investigate the placeholder node construction to turn it back into a SymInt.
The plan is to catch every SymInt construction whose expression is equivalent to s0, then pick out the fake tensor construction in question and see how the replacement plays out in this process.
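Why the replacement bites can be shown with a toy substitution over the size tuple (`apply_replacements` and the string symbol `"s0"` are illustrative stand-ins for the SymInt machinery, not torch internals):

```python
# Toy illustration: once the singleton replacement s0 -> 2 is applied to
# the placeholder's size, the symbolic dimension becomes a plain int and
# can no longer match inputs of any other size.
def apply_replacements(size, replacements):
    return tuple(replacements.get(d, d) for d in size)

symbolic_size = ("s0", 2)        # before: FakeTensor(..., size=(s0, 2))
fixed_size = apply_replacements(symbolic_size, {"s0": 2})
print(fixed_size)  # (2, 2): the first dim is now a constant, not a SymInt
```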

@zero000064

zero000064 commented Jun 24, 2024

_update_var_to_range() is also called from constrain_symbol_range() when deserializing the graph in _export, which is why the replacement's effect propagates to the ExportedProgram. A potential solution is to avoid changing _update_var_to_range and move the change into ShapeEnv._refine_ranges() instead.

```
File "/home/zero064/Downloads/pytorch_fork/pytorch/test/export/test_serdes.py", line 53, in <module>
    run_tests()
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/_dynamo/test_case.py", line 36, in run_tests
    run_tests()
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/testing/_internal/common_utils.py", line 1174, in run_tests
    unittest.main(argv=argv)
File "/home/zero064/anaconda3/envs/pytorch_dev/lib/python3.9/unittest/main.py", line 101, in __init__
    self.runTests()
File "/home/zero064/anaconda3/envs/pytorch_dev/lib/python3.9/unittest/main.py", line 271, in runTests
    self.result = testRunner.run(self.test)
File "/home/zero064/anaconda3/envs/pytorch_dev/lib/python3.9/unittest/runner.py", line 184, in run
    test(result)
File "/home/zero064/anaconda3/envs/pytorch_dev/lib/python3.9/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
File "/home/zero064/anaconda3/envs/pytorch_dev/lib/python3.9/unittest/suite.py", line 122, in run
    test(result)
File "/home/zero064/anaconda3/envs/pytorch_dev/lib/python3.9/unittest/suite.py", line 84, in __call__
    return self.run(*args, **kwds)
File "/home/zero064/anaconda3/envs/pytorch_dev/lib/python3.9/unittest/suite.py", line 122, in run
    test(result)
File "/home/zero064/anaconda3/envs/pytorch_dev/lib/python3.9/unittest/case.py", line 651, in __call__
    return self.run(*args, **kwds)
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/testing/_internal/common_utils.py", line 2883, in run
    self._run_custom(
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/testing/_internal/common_utils.py", line 2855, in _run_custom
    super_run(result=result)
File "/home/zero064/anaconda3/envs/pytorch_dev/lib/python3.9/unittest/case.py", line 592, in run
    self._callTestMethod(testMethod)
File "/home/zero064/anaconda3/envs/pytorch_dev/lib/python3.9/unittest/case.py", line 550, in _callTestMethod
    method()
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/testing/_internal/common_utils.py", line 2754, in wrapper
    method(*args, **kwargs)
File "/home/zero064/Downloads/pytorch_fork/pytorch/test/export/testing.py", line 40, in _fn
    return fn(*args, **kwargs)
File "/home/zero064/Downloads/pytorch_fork/pytorch/test/export/test_export.py", line 1178, in test_derived_dim_1_2
    ep = export(
File "/home/zero064/Downloads/pytorch_fork/pytorch/test/export/test_serdes.py", line 21, in mocked_serder_export
    loaded_ep = load(buffer)
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/export/__init__.py", line 300, in load
    return load(
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/_export/__init__.py", line 336, in load
    ep = deserialize(artifact, expected_opset_version)
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/_export/serde/serialize.py", line 2368, in deserialize
    ExportedProgramDeserializer(expected_opset_version)
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/_export/serde/serialize.py", line 2247, in deserialize
    GraphModuleDeserializer()
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/_export/serde/serialize.py", line 1849, in deserialize
    self.deserialize_graph(serialized_graph_module.graph)
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/_export/serde/serialize.py", line 1586, in deserialize_graph
    meta_val = self.deserialize_tensor_meta(tensor_value)
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/_export/serde/serialize.py", line 1553, in deserialize_tensor_meta
    tuple(self.deserialize_sym_int(val) for val in tensor_meta.sizes),  # type: ignore[misc]
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/_export/serde/serialize.py", line 1553, in <genexpr>
    tuple(self.deserialize_sym_int(val) for val in tensor_meta.sizes),  # type: ignore[misc]
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/_export/serde/serialize.py", line 1500, in deserialize_sym_int
    self.shape_env.constrain_symbol_range(
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/fx/experimental/recording.py", line 245, in wrapper
    return fn(*args, **kwargs)
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/fx/experimental/symbolic_shapes.py", line 5465, in constrain_symbol_range
    self._update_var_to_range(s, upd_vr)
File "/home/zero064/Downloads/pytorch_fork/pytorch/torch/fx/experimental/symbolic_shapes.py", line 4722, in _update_var_to_range
    log.warning("traceback:%s", traceback.print_stack())
```
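The distinction between the two hook points can be sketched abstractly (the function names below mirror the ShapeEnv methods, but the bodies are illustrative stand-ins): a replacement hook inside the low-level range update fires for every caller, including deserialization, while a hook in the higher-level refinement path leaves deserialization unaffected.

```python
# Illustrative stand-ins, not the real ShapeEnv.
calls = []

def update_var_to_range(symbol, lo, hi, on_singleton=None):
    # Low-level bound update; used by refinement AND by deserialization.
    if on_singleton and lo == hi:
        on_singleton(symbol, lo)

def refine_ranges(symbol, lo, hi):
    # Guard-driven refinement: the right place to hook the replacement.
    update_var_to_range(symbol, lo, hi,
                        on_singleton=lambda s, v: calls.append((s, v)))

def deserialize_sym_int(symbol, lo, hi):
    # Deserialization just restores recorded bounds; no replacement wanted.
    update_var_to_range(symbol, lo, hi)

deserialize_sym_int("s0", 2, 2)   # restoring a serialized range: no effect
refine_ranges("s0", 2, 2)         # a guard pinned s0 to 2: replacement fires
print(calls)  # [('s0', 2)]
```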

@zero000064

zero000064 commented Jun 24, 2024

Moved the replacement logic to ShapeEnv._refine_ranges.

Tested these cases, including the reported failures:

```shell
TORCH_LOGS="+dynamic,+dynamo" python test/dynamo/test_misc.py -k test_raise_guard_indirect_full_constraint
TORCH_LOGS="+dynamic,+dynamo" python test/dynamo/test_misc.py -k test_guard_failure_fn_shape_control
TORCH_LOGS="+dynamic,+dynamo" python test/dynamo/test_subclasses.py -k test_wrapper_subclass_guards_on_inner_tensor
TORCH_LOGS="+dynamic,+dynamo" python test/export/test_serdes.py -k test_derived_dim_1_2_serdes
```

@zero000064 zero000064 requested a review from ezyang June 24, 2024 15:38
@zero000064

zero000064 commented Jun 24, 2024

@ezyang Could you review the change again? I changed the code in symbolic_shapes.py to fix test failures.

@ezyang

ezyang commented Jun 27, 2024

@pytorchbot merge

@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@zero000064 zero000064 deleted the fix_dynamic_shape_inference branch June 27, 2024 23:36
Labels
`ciflow/inductor`, `ciflow/trunk`, `Merged`, `module: dynamo`, `open source`, `release notes: fx`, `triaged`
Successfully merging this pull request may close these issues.

ConstraintViolationError not thrown when constraining dynamic dim to static int