
Add raise_last_usage memory optimization pass to Inductor #125559

Open

yf225 wants to merge 3 commits into main
Conversation

@yf225 (Contributor) commented May 6, 2024

This PR adds a new Inductor memory optimization pass that raises consumer nodes when moving them up frees more memory than it consumes (i.e. when a consumer's output tensor is smaller than the combined memory of all of its last-usage input tensors).

Specifically, for each node in the graph, we raise a consumer node to sit right after the current node if it satisfies all of the following conditions (a sketch of the check follows the list):

  • All of the consumer node's input args have their write sites scheduled at or before the current node.
  • The consumer node writes to only one output tensor.
  • The consumer node's output tensor is smaller than the combined memory of all of its last-usage input args (i.e. raising it yields a net memory saving).
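For readers following along, here is a minimal, self-contained sketch of the hoisting check implied by the conditions above. The `Buffer`/`Node` types and the `writer_pos`/`last_user` maps are simplified stand-ins for illustration only, not the actual Inductor scheduler data structures or APIs:

```python
# A minimal sketch of the raising rule described above.
# Buffer/Node are toy stand-ins, not the real Inductor scheduler types.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Buffer:
    name: str
    size_bytes: int


@dataclass
class Node:
    name: str
    inputs: List[Buffer]   # buffers this node reads
    outputs: List[Buffer]  # buffers this node writes


def should_raise_consumer(
    consumer: Node,
    current_pos: int,
    writer_pos: Dict[str, int],  # buffer name -> schedule index of its writer
    last_user: Dict[str, str],   # buffer name -> name of the node that reads it last
) -> bool:
    """Return True if `consumer` may be hoisted to right after the node
    scheduled at `current_pos`, per the three conditions above."""
    # 1. Every input must already be written at or before the current node.
    if any(writer_pos[buf.name] > current_pos for buf in consumer.inputs):
        return False
    # 2. The consumer must write exactly one output tensor.
    if len(consumer.outputs) != 1:
        return False
    # 3. The memory freed by the consumer's last-usage inputs must exceed
    #    the memory newly allocated for its single output.
    freed = sum(
        buf.size_bytes
        for buf in consumer.inputs
        if last_user.get(buf.name) == consumer.name
    )
    allocated = consumer.outputs[0].size_bytes
    return allocated < freed
```

A consumer that passes this check can be moved up without breaking data dependencies (condition 1), and doing so releases more buffer memory than it allocates (conditions 2 and 3), which is what lowers peak memory.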

This pass is particularly important for reducing peak memory usage for Compiled FSDP2 on the llama-7b model (it decreases memory usage by more than 57GB).

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang


pytorch-bot bot commented May 6, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/125559

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit 5c742dc with merge base da991fa:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following job failed but was already failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@yf225 (Contributor, Author) commented Jun 19, 2024

Do we still need this pass if we set the Inductor config reorder_for_locality=False by default for the Compiled Autograd graph?

github-actions bot commented Aug 18, 2024

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Aug 18, 2024
@yf225 yf225 removed the Stale label Aug 18, 2024