[inductor] Enable FX graph caching in OSS by default #125863

Closed
masnesral wants to merge 7 commits

pytorch-bot bot commented May 9, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125863

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 586b5ad with merge base 58b8704:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

masnesral added a commit that referenced this pull request May 9, 2024
ghstack-source-id: 7e9c446a3a85609783254bdb020ce918a52b98ee
Pull Request resolved: #125863
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
masnesral added a commit that referenced this pull request Jun 7, 2024
ghstack-source-id: c6ffdd3499d05d438551532d3fe4a58b6d69b2da
Pull Request resolved: #125863
@masnesral
Contributor Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

[ghstack-poisoned]
@pytorchmergebot
Collaborator

Successfully rebased gh/masnesral/52/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/125863)

pytorchmergebot pushed a commit that referenced this pull request Jun 10, 2024
ghstack-source-id: 6b05f44fac2ae7badbed91a0ca1a0dc81483c95b
Pull Request resolved: #125863

[ghstack-poisoned]
masnesral added a commit that referenced this pull request Jun 12, 2024
ghstack-source-id: 604f921b8c570201bc1eb10bc3b84669fa364295
Pull Request resolved: #125863
masnesral marked this pull request as ready for review June 13, 2024 02:36
masnesral requested a review from eellison June 13, 2024 02:36
masnesral added the "topic: not user facing" and "ciflow/trunk" labels Jun 13, 2024
Contributor

eellison left a comment

🚀 🚀 🚀

What bugs do we still have remaining? @oulgen was looking into a strides one, right? I guess that's it?

@oulgen
Contributor

oulgen commented Jun 13, 2024

@masnesral yep, I still haven't looked at the stride one, and removing the cache disable from more unit tests revealed bugs too, so perhaps let's hold off on this until we prove the cache internally?

@eellison
Contributor

Okay, good with me. Let's fix the remaining unit test bugs. Although typically, @oulgen, we enable in OSS first and then do it internally.

@masnesral
Contributor Author

@oulgen if you want to point me to any other known failures you've surfaced, I'll gladly investigate.

@oulgen
Contributor

oulgen commented Jun 13, 2024

Clarifying: the only real problem that we are aware of is

[trainer0]:    assert_size_stride(bmm_9, (17, s0, 512), (4081152, 512, 1))
[trainer0]:AssertionError: expected size 17==17, stride 4022784==4081152 at dim=0

I do not yet have a repro.
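
For readers trying to interpret the two stride values in the assertion above, here is a minimal arithmetic sketch. It assumes the tensor is contiguous (row-major), so that the dim-0 stride of a `(17, s0, 512)` tensor equals `s0 * 512`; that assumption, and the helper names `contiguous_strides` and `implied_s0`, are illustrative and not taken from the PR or from Inductor. The generated `assert_size_stride` call passes 4081152 as the expected stride, so 4022784 is taken to be the stride of the tensor seen at runtime.

```python
def contiguous_strides(shape):
    """Return row-major (contiguous) strides, in elements, for a shape."""
    strides = []
    running = 1
    for dim in reversed(shape):
        strides.append(running)
        running *= dim
    return tuple(reversed(strides))


def implied_s0(dim0_stride, inner=512):
    """Solve dim0_stride == s0 * inner for s0 (hypothetical helper).

    Only meaningful under the contiguity assumption stated above.
    """
    s0, remainder = divmod(dim0_stride, inner)
    if remainder:
        raise ValueError("stride is not a multiple of the innermost size")
    return s0


expected_s0 = implied_s0(4081152)  # stride passed to the generated assert
actual_s0 = implied_s0(4022784)    # stride reported in the AssertionError

print("s0 implied by expected stride:", expected_s0)
print("s0 implied by actual stride:  ", actual_s0)
print(contiguous_strides((17, expected_s0, 512)))
print(contiguous_strides((17, actual_s0, 512)))
```

Under that assumption the two strides correspond to `s0 = 7971` and `s0 = 7857`. This would be consistent with a stride computed for one value of the dynamic dimension being checked against a tensor with a different value, but it is only one possible reading and does not replace an actual repro.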


[ghstack-poisoned]
masnesral added a commit that referenced this pull request Jun 20, 2024
ghstack-source-id: 653db6f494c3ac3dfc94f82b61d06e4fd6cb0e4d
Pull Request resolved: #125863

[ghstack-poisoned]
masnesral added a commit that referenced this pull request Jun 24, 2024
ghstack-source-id: 1bd20123b341af5c068725a9ff1a58326af9fa8e
Pull Request resolved: #125863
@masnesral
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

clee2000 added the "ci-no-td" label (Do not run TD on this PR) Jun 25, 2024
@pytorchmergebot
Collaborator

@masnesral your PR has been successfully reverted.

pytorchmergebot added a commit that referenced this pull request Jun 25, 2024
This reverts commit 4c1e4c5.

Reverted #125863 on behalf of https://github.com/clee2000 because one of the PRs in the stack seems to have broken test/distributed/_composable/test_replicate_with_compiler.py::ReplicateTest::test_bucketing_concat_op on distributed (https://github.com/pytorch/pytorch/actions/runs/9653941844/job/26627760340, https://hud.pytorch.org/pytorch/pytorch/commit/4c1e4c5f307f9743014a08cf97d3fa8de7e1ce5f); not tested on this PR due to bad TD (comment on #129257)
@masnesral
Contributor Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Rebase failed due to

Aborting rebase because rebasing the branch resulted in the same sha as the target branch.
This usually happens because the PR has already been merged. Please rebase locally and push.

Raised by https://github.com/pytorch/pytorch/actions/runs/9942955684

@masnesral
Contributor Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Rebase failed due to

Aborting rebase because rebasing the branch resulted in the same sha as the target branch.
This usually happens because the PR has already been merged. Please rebase locally and push.

Raised by https://github.com/pytorch/pytorch/actions/runs/10115112965


[ghstack-poisoned]
masnesral added a commit that referenced this pull request Jul 26, 2024
Pull Request resolved: #125863
ghstack-source-id: 0e7c6dd350277f360986ca9dd1bbe5225b07639f
@masnesral
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


Labels
ci-no-td, ciflow/inductor, ciflow/trunk, Merged, module: inductor, Reverted, topic: not user facing

5 participants