Reduce all tensors to their metadata in AOTAutogradCache; add tests #128583

Closed · wants to merge 20 commits

Conversation

@jamesjwu (Contributor) commented Jun 13, 2024

Stack from ghstack (oldest at bottom):

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

This PR makes it so that all tensors are reduced to their metadata in AOTAutogradCache. Because Dynamo always embeds constant tensors directly into the FX graph, there is no risk of losing a constant tensor whose values are semantically important. AOTAutograd itself may take a constant tensor and set it as an attribute on an FX graph for Inductor, but Dynamo never does this.

One other thing that this diff does is add [pickler.fast](https://docs.python.org/3/library/pickle.html#pickle.Pickler.fast) to our pickling algorithm for cache key generation. Pickle will often memoize/intern strings when pickling, leading to false cache misses due to inconsistent memoization. Turning on pickler.fast removes this behavior.

Technically, `fast` is a "deprecated" feature according to the Python docs. But it's still supported in Python 3.8-3.12, and if it is ever removed, the only downside will be a few more cache misses, so I think it's worth adding here (and removing later as needed).
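As a minimal sketch of the cache-key pickling idea (illustrative names, not the exact code in this PR):

```python
import io
import pickle

def cache_key_bytes(obj) -> bytes:
    """Serialize obj into bytes that get hashed into a cache key."""
    with io.BytesIO() as stream:
        pickler = pickle.Pickler(stream)
        # fast mode disables pickle's memoization/string interning, so
        # equal-but-distinct objects always produce identical byte streams.
        pickler.fast = True
        pickler.dump(obj)
        return stream.getvalue()
```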

pytorch-bot bot commented Jun 13, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/128583

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure

As of commit 081a7f5 with merge base 4cc3fb5:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jamesjwu added a commit that referenced this pull request Jun 13, 2024
ghstack-source-id: 21f51799b5f3c15425fca3f72ca8bd99f5909c57
Pull Request resolved: #128583
jamesjwu added a commit that referenced this pull request Jun 13, 2024
ghstack-source-id: cd72952c676601bd289d9139ed8a6904f67a60d7
Pull Request resolved: #128583
@jamesjwu jamesjwu requested review from a team, oulgen, masnesral and aorenste June 17, 2024 16:37
@jamesjwu jamesjwu marked this pull request as ready for review June 17, 2024 16:42
@@ -526,6 +526,8 @@ def dumps(cls, obj) -> bytes:
        """
        with io.BytesIO() as stream:
            pickler = cls(stream)
            # TODO: pickler.fast is technically deprecated. Will this work on new python versions?
            pickler.fast = True  # Run with pickler.fast so it doesn't intern strings, making the hash result more predictable
Contributor

this feels like a time bomb

Contributor Author

Here's my reasoning why it isn't, but feel free to disagree:

  • The way this breaks is if we upgrade the Python version, right? As of Python 3.12 this feature is still supported, and when we do upgrade, it's not like we'll instantly move to the new version without a PR to fix existing issues. So Python would need to 1) fully remove the feature, 2) make it crash instead of, say, silently doing nothing, and 3) ship that in a new release.
  • Even after all of that, because it's Python, even if they remove the fast feature from picklers, setting fast on an object won't crash: it might do nothing, which would just lead to the behavior of not having it.
  • If we want to be extra safe, we could add a version check here as well (see the sketch after this list).
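For reference, such a version check could look roughly like the following sketch (hypothetical code, not part of this PR; the class name and the 3.13 cutoff are assumptions):

```python
import pickle
import sys

class StableKeyPickler(pickle.Pickler):
    """Pickler used only to produce bytes that get hashed into a cache key."""

    def __init__(self, stream):
        super().__init__(stream)
        # Only set the deprecated `fast` flag on versions where we know it still
        # exists; on newer versions fall back to normal pickling and accept a
        # few extra cache misses.
        if sys.version_info < (3, 13):
            self.fast = True
```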

Contributor

I'm more worried about silent failure and the next steps. We need to be able to detect that we have cache misses, which will be hard because we will still get hits, since this is not a common occurrence. Secondly, we don't actually know how to fix it when this happens, so this will be on the critical path for, say, the Python 3.13 upgrade. I'd be more okay with it, I think, if we knew how to fix it but chose not to do it yet.

Contributor Author

One other thing: this makes it so we can't pickle cyclic objects like:

class MyNode:
    def __init__(self):
        self.name = self

However, on trying to pickle one of these, we'll get a safe Pickle ValueError like so:

*** ValueError: fast mode: can't pickle cyclic objects including object type dict at 0x7f7079dd8980

which will naturally lead to a BypassGraphCache and a cache miss. So if we ever do create cyclic objects (which we don't today), we will still fail fast.
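For illustration, that behavior can be reproduced outside the cache with just the standard library (a standalone sketch, not code from this PR):

```python
import io
import pickle

class MyNode:
    def __init__(self):
        self.name = self  # self-reference makes the object graph cyclic

stream = io.BytesIO()
pickler = pickle.Pickler(stream)
pickler.fast = True  # fast mode disables memoization, so cycles can't be handled

try:
    pickler.dump(MyNode())
except ValueError as e:
    # e.g. "fast mode: can't pickle cyclic objects including object type dict at 0x..."
    print(e)
```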

Contributor Author

That said, my opinion on this isn't super strong, because I feel like in practice no one will hit this cache miss very often (since it looks like pickle will be deterministic anyway for a freshly started process). But it does feel like a free win to me, where the only downside is that we might have to remove the win later.

Contributor Author

To elaborate on why I think it's a "free win": checking https://docs.python.org/es/dev/whatsnew/3.13.html#new-deprecations, it doesn't look like Pickler.fast is being removed up to and including Python 3.16. Second, I'm not sure it's really on the critical path: if, say, Python 3.17 removes Pickler.fast, we just stop doing this optimization and cache miss more, right? At worst, we've reverted back to the existing behavior without it.

Contributor

My 2c. I don't like it because it feels like a hack, but I think it's much less of a hack than the one I already added here:

# The pickle implementation avoids serializing the same object more than once.
# That behavior means the byte stream we create to hash will vary if, for example,
# we see two tensor objects with the same device, but the torch.device object is
# actually the same object vs. merely equivalent. We want to produce the same hash
# value in either situation, so we memoize the device objects and always reference
# the same object for a given device. It's possible other metadata fields deserve
# the same treatment, but so far we've only observed this issue with the device.
if meta.device not in device_map:
    device_map[meta.device] = meta.device

So I think I'd prefer to remove my hack in favor of your hack.
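To make the identity-vs-equality issue concrete, here is an illustrative sketch (not code from this PR; `key_bytes` is a throwaway helper for the demo):

```python
import io
import pickle

import torch

def key_bytes(obj, fast: bool) -> bytes:
    # Serialize obj the way a cache-key pickler would, with or without memoization.
    with io.BytesIO() as stream:
        pickler = pickle.Pickler(stream)
        pickler.fast = fast
        pickler.dump(obj)
        return stream.getvalue()

dev = torch.device("cpu")
same_object = (dev, dev)                                    # one device object, referenced twice
equal_objects = (torch.device("cpu"), torch.device("cpu"))  # two equal but distinct objects

# With memoization, the second reference to `dev` is emitted as a memo lookup, so
# the byte streams (and hence the hashes) differ even though the metadata is equal.
print(key_bytes(same_object, fast=False) == key_bytes(equal_objects, fast=False))  # typically False
# With fast mode, every occurrence is serialized in full, so the streams match.
print(key_bytes(same_object, fast=True) == key_bytes(equal_objects, fast=True))    # True
```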

this will be on the critical path for say python 3.13 upgrade

I wonder if there's precedent for an assertion on the version, just so that we get a "callback" when someone starts work on support for the next Python version? If it "just works", we can bump the version. If not, then I presume support for any new version is kind of a slog anyway, and we can discuss alternatives in parallel with the slog? But I can also see how that'd be obnoxious.

we'll get a safe Pickle ValueError like so:

Not specific to this particular exception, but: what do you think about introducing some kind of "strict" mode that we enable for CI (but not prod) that hard fails on unexpected failures like this, rather than quietly bypassing? It would be nice to know immediately if anyone makes a change that introduces cycles, for example.

Contributor Author

I actually already added that for AOTAutogradCache in #128222, so feel free to add one for strict FXGraphCache as well. I use the strict setting on that specific unit test suite. If you do add one for FXGraphCache, let me know/please also turn it on in TestAOTAutogradWithCache.

jamesjwu added a commit that referenced this pull request Jun 21, 2024
ghstack-source-id: 4275126f2416e4411fbc3fec7e7e3edd46f6903d
Pull Request resolved: #128583
@jamesjwu (Contributor Author) commented Jul 9, 2024

@bdhirsh I think this is ready for review now

@jamesjwu (Contributor Author):

@pytorchbot merge -i

@pytorch-bot pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) on Jul 10, 2024
@pytorchmergebot (Collaborator):

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team: raised by workflow job.

@jamesjwu (Contributor Author):

@pytorchbot merge -i

@pytorchmergebot (Collaborator):

Merge started

Your change will be merged while ignoring the following 1 checks: inductor / cuda12.1-py3.10-gcc9-sm86 / test (inductor_timm, 1, 2, linux.g5.4xlarge.nvidia.gpu)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

francograndegmailcom pushed a commit to francograndegmailcom/pytorch-pytorch that referenced this pull request Jul 23, 2024
ghstack-source-id: 70374dfa690c0e98b18af793d1e470e698227ac0
Pull Request resolved: pytorch/pytorch#128583
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Jul 25, 2024
Pull Request resolved: pytorch#128583
Approved by: https://github.com/oulgen
ghstack dependencies: pytorch#128335