Reduce all tensors to their metadata in AOTAutogradCache; add tests #128583

Closed · wants to merge 20 commits

Conversation

@jamesjwu (Contributor) commented Jun 13, 2024

Stack from ghstack (oldest at bottom):

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

This PR makes it so that all tensors are reduced to their metadata in AOTAutogradCache. Because Dynamo always embeds constant tensors directly into the FX graph, there is no risk of losing a constant tensor whose values are semantically important. AOTAutograd itself may take a constant tensor and set it as an attribute on an FX graph for Inductor, but Dynamo never does this.

One other thing that this diff does is add [pickler.fast](https://docs.python.org/3/library/pickle.html#pickle.Pickler.fast) to our pickling algorithm for cache key generation. Pickle will often memoize/intern strings when pickling, leading to false cache misses due to inconsistent memoization. Turning on pickler.fast removes this behavior.

Technically, `fast` is a "deprecated" feature according to the Python docs. But it's still supported in Python 3.8-3.12, and if it is ever removed, the only downside will be a few more cache misses, so I think it's worth adding here (and removing later as needed).
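As a minimal sketch of the cache-key pickling idea (illustrative names, not the exact code in this PR):

```python
import io
import pickle

def cache_key_bytes(obj) -> bytes:
    """Serialize obj into bytes that get hashed into a cache key."""
    with io.BytesIO() as stream:
        pickler = pickle.Pickler(stream)
        # fast mode disables pickle's memoization/string interning, so
        # equal-but-distinct objects always produce identical byte streams.
        pickler.fast = True
        pickler.dump(obj)
        return stream.getvalue()
```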

pytorch-bot bot commented Jun 13, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/128583

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure

As of commit 081a7f5 with merge base 4cc3fb5:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jamesjwu added a commit that referenced this pull request Jun 13, 2024
ghstack-source-id: 21f51799b5f3c15425fca3f72ca8bd99f5909c57
Pull Request resolved: #128583
jamesjwu added a commit that referenced this pull request Jun 13, 2024
ghstack-source-id: cd72952c676601bd289d9139ed8a6904f67a60d7
Pull Request resolved: #128583
@jamesjwu jamesjwu requested review from a team, oulgen, masnesral and aorenste June 17, 2024 16:37
@jamesjwu jamesjwu marked this pull request as ready for review June 17, 2024 16:42
@@ -526,6 +526,8 @@ def dumps(cls, obj) -> bytes:
        """
        with io.BytesIO() as stream:
            pickler = cls(stream)
            # TODO: pickler.fast is technically deprecated. Will this work on new python versions?
            pickler.fast = True  # Run with pickler.fast so it doesn't intern strings, making the hash result more predictable
Contributor

this feels like a time bomb

Contributor Author

Here's my reasoning why it isn't, but feel free to disagree:

  • The way this breaks is if we upgrade the Python version, right? As of Python 3.12 this feature is still supported, and when we do upgrade, it's not like we'll instantly move to the new version without a PR to fix existing issues. So Python would need to 1) fully remove the feature, 2) make it crash instead of, say, silently doing nothing, and 3) ship that in a new release.
  • Even after all of that, because it's Python, even if they remove the fast feature from picklers, setting fast on an object won't crash: it might do nothing, which would just lead to the behavior of not having it.
  • If we want to be extra safe, we could add a version check here as well (see the sketch after this list).
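For reference, such a version check could look roughly like the following sketch (hypothetical code, not part of this PR; the class name and the 3.13 cutoff are assumptions):

```python
import pickle
import sys

class StableKeyPickler(pickle.Pickler):
    """Pickler used only to produce bytes that get hashed into a cache key."""

    def __init__(self, stream):
        super().__init__(stream)
        # Only set the deprecated `fast` flag on versions where we know it still
        # exists; on newer versions fall back to normal pickling and accept a
        # few extra cache misses.
        if sys.version_info < (3, 13):
            self.fast = True
```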

Contributor

I'm more worried about silent failure and the next steps. We need to be able to detect that we have cache misses, which will be hard because we will still get hits, since this is not a common occurrence. Secondly, we don't actually know how to fix it when this happens, so this will be on the critical path for, say, the Python 3.13 upgrade. I'd be more okay with it, I think, if we knew how to fix it but chose not to do it yet.

Contributor Author

One other thing: this makes it so we can't pickle cyclic objects like:

class MyNode:
    def __init__(self):
        self.name = self

However, on trying to pickle one of these, we'll get a safe Pickle ValueError like so:

*** ValueError: fast mode: can't pickle cyclic objects including object type dict at 0x7f7079dd8980

which will naturally lead to a BypassGraphCache and a cache miss. So if we ever do create cyclic objects (which we don't today), we will still fail fast.
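For illustration, that behavior can be reproduced outside the cache with just the standard library (a standalone sketch, not code from this PR):

```python
import io
import pickle

class MyNode:
    def __init__(self):
        self.name = self  # self-reference makes the object graph cyclic

stream = io.BytesIO()
pickler = pickle.Pickler(stream)
pickler.fast = True  # fast mode disables memoization, so cycles can't be handled

try:
    pickler.dump(MyNode())
except ValueError as e:
    # e.g. "fast mode: can't pickle cyclic objects including object type dict at 0x..."
    print(e)
```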

Contributor Author

That said, my opinion on this isn't super strong, because I feel like in practice no one will hit this cache miss very often (since it looks like pickle will be deterministic anyway for a freshly started process). But it does feel like a free win to me, where the only downside is that we might have to remove the win later.

Contributor Author

To elaborate on why I think it's a "free win": checking https://docs.python.org/es/dev/whatsnew/3.13.html#new-deprecations, it doesn't look like Pickler.fast is being removed up to and including Python 3.16. Second, I'm not sure it's really on the critical path: if, say, Python 3.17 removes Pickler.fast, we just stop doing this optimization and cache miss more, right? At worst, we've reverted back to the existing behavior without it.

Contributor

My 2c. I don't like it because it feels like a hack, but I think it's much less of a hack than the one I already added here:

# The pickle implementation avoids serializing the same object more than once.
# That behavior means the byte stream we create to hash will vary if, for example,
# we see two tensor objects with the same device, but the torch.device object is
# actually the same object vs. merely equivalent. We want to produce the same hash
# value in either situation, so we memoize the device objects and always reference
# the same object for a given device. It's possible other metadata fields deserve
# the same treatment, but so far we've only observed this issue with the device.
if meta.device not in device_map:
    device_map[meta.device] = meta.device

So I think I'd prefer to remove my hack in favor of your hack.
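To make the identity-vs-equality issue concrete, here is an illustrative sketch (not code from this PR; `key_bytes` is a throwaway helper for the demo):

```python
import io
import pickle

import torch

def key_bytes(obj, fast: bool) -> bytes:
    # Serialize obj the way a cache-key pickler would, with or without memoization.
    with io.BytesIO() as stream:
        pickler = pickle.Pickler(stream)
        pickler.fast = fast
        pickler.dump(obj)
        return stream.getvalue()

dev = torch.device("cpu")
same_object = (dev, dev)                                    # one device object, referenced twice
equal_objects = (torch.device("cpu"), torch.device("cpu"))  # two equal but distinct objects

# With memoization, the second reference to `dev` is emitted as a memo lookup, so
# the byte streams (and hence the hashes) differ even though the metadata is equal.
print(key_bytes(same_object, fast=False) == key_bytes(equal_objects, fast=False))  # typically False
# With fast mode, every occurrence is serialized in full, so the streams match.
print(key_bytes(same_object, fast=True) == key_bytes(equal_objects, fast=True))    # True
```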

this will be on the critical path for say python 3.13 upgrade

I wonder if there's precedent for an assertion on the version, just so that we get a "callback" when someone starts work on support for the next Python version? If it "just works", we can bump the version. If not, then I presume support for any new version is kind of a slog anyway, and we can discuss alternatives in parallel with the slog? But I can also see how that'd be obnoxious.

we'll get a safe Pickle ValueError like so:

Not specific to this particular exception, but: what do you think about introducing some kind of "strict" mode that we enable for CI (but not prod) that hard fails on unexpected failures like this, rather than quietly bypassing? It would be nice to know immediately if anyone makes a change that introduces cycles, for example.

Contributor Author

I actually already added that for AOTAutogradCache in #128222, so feel free to add one for strict FXGraphCache as well. I use the strict setting on that specific unit test suite. If you do add one for FXGraphCache, let me know/please also turn it on in TestAOTAutogradWithCache.

jamesjwu added a commit that referenced this pull request Jun 21, 2024
ghstack-source-id: 4275126f2416e4411fbc3fec7e7e3edd46f6903d
Pull Request resolved: #128583
@jamesjwu (Contributor Author) commented Jul 9, 2024

@bdhirsh I think this is ready for review now

@jamesjwu (Contributor Author):

@pytorchbot merge -i

@pytorch-bot pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) on Jul 10, 2024
@pytorchmergebot (Collaborator):

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team: raised by workflow job.

@jamesjwu (Contributor Author):

@pytorchbot merge -i

@pytorchmergebot (Collaborator):

Merge started

Your change will be merged while ignoring the following 1 checks: inductor / cuda12.1-py3.10-gcc9-sm86 / test (inductor_timm, 1, 2, linux.g5.4xlarge.nvidia.gpu)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

francograndegmailcom pushed a commit to francograndegmailcom/pytorch-pytorch that referenced this pull request Jul 23, 2024
ghstack-source-id: 70374dfa690c0e98b18af793d1e470e698227ac0
Pull Request resolved: pytorch/pytorch#128583
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Jul 25, 2024
Pull Request resolved: pytorch#128583
Approved by: https://github.com/oulgen
ghstack dependencies: pytorch#128335