[custom_op] stop using nonlocals to store information (#128547) #128616

zou3519 · 2024-06-13T15:27:40Z

We had a problem with multithreading: the nonlocals were being clobbered, so one call could read information written by another thread. We originally stored information in these nonlocals to ferry it from autograd.Function.apply to autograd.Function.forward.
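The failure mode can be reproduced with a minimal pure-Python sketch that does not use PyTorch. make_op, apply, forward and the before_forward hook are hypothetical names standing in for the custom_op machinery and autograd.Function. threading.Event is used only to force one specific thread interleaving so the outcome is deterministic.

```python
import threading


def make_op():
    # Hypothetical stand-in for the old pattern: information computed in the
    # "apply" step is stashed in a nonlocal and read back in "forward".
    stash = None

    def forward(x):
        return (x, stash)  # reads the shared slot, possibly another call's value

    def apply(x, info, before_forward=None):
        nonlocal stash
        stash = info  # every call through this op writes the same slot
        if before_forward is not None:
            before_forward()  # demo hook: lets us force a thread interleaving
        return forward(x)

    return apply


op = make_op()
results = {}
a_wrote = threading.Event()
b_done = threading.Event()


def thread_a():
    def pause():
        a_wrote.set()  # A has written "info_a" into the stash...
        b_done.wait()  # ...and waits until B has overwritten it
    results["a"] = op("x_a", "info_a", before_forward=pause)


def thread_b():
    a_wrote.wait()
    results["b"] = op("x_b", "info_b")
    b_done.set()


ta = threading.Thread(target=thread_a)
tb = threading.Thread(target=thread_b)
ta.start()
tb.start()
ta.join()
tb.join()

print(results["a"])  # ('x_a', 'info_b'): thread A's forward saw B's info
print(results["b"])  # ('x_b', 'info_b')
```

Because the stash is shared by every call that goes through the same op, thread A's forward reads the value thread B wrote in between.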

Our new approach is:

- Pass the information directly as an input to autograd.Function.apply. This means that autograd.Function.forward receives the information too.
- This messes up ctx.needs_input_grad, which has one element per input to forward. The user should not see the additional information we passed, so we fix this by temporarily overriding ctx.needs_input_grad with the correct value.
- This exposed a bug: ctx.needs_input_grad was not correct for TensorList inputs. This PR fixes that too.
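The steps above can be sketched in plain Python without PyTorch. FakeTensor, Metadata, flatten, unflatten and fake_apply are hypothetical names, and SimpleNamespace plays the role of ctx. The sketch assumes list arguments are flattened into one input per tensor and that a list argument's entry is regrouped as a list of per-tensor flags. It illustrates the idea (metadata as a trailing input, plus a try/finally override of needs_input_grad), not PyTorch's actual implementation.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass
class FakeTensor:
    # Hypothetical stand-in for torch.Tensor; only requires_grad matters here.
    requires_grad: bool


@dataclass
class Metadata:
    # Hypothetical per-call information that used to be stored in a nonlocal.
    keyword_only_args: dict = field(default_factory=dict)


def flatten(args):
    """Expand list arguments so each tensor is its own input; record the layout."""
    flat, spec = [], []
    for a in args:
        if isinstance(a, list):
            flat.extend(a)
            spec.append(len(a))  # this argument was a list of len(a) tensors
        else:
            flat.append(a)
            spec.append(None)  # this argument was a single value
    return flat, spec


def unflatten(flat, spec):
    """Regroup per-input flags into the user's original argument structure."""
    out, i = [], 0
    for n in spec:
        if n is None:
            out.append(flat[i])
            i += 1
        else:
            out.append(list(flat[i:i + n]))
            i += n
    return tuple(out)


def fake_apply(user_backward, *args, metadata):
    flat, spec = flatten(args)
    # Metadata rides along as the last input of this call, so there is no
    # shared slot for another thread to overwrite.
    inputs = (*flat, metadata)
    ctx = SimpleNamespace()
    # Mimic autograd: one needs_input_grad entry per input, metadata included.
    ctx.needs_input_grad = tuple(
        isinstance(x, FakeTensor) and x.requires_grad for x in inputs
    )
    # "forward" reads its metadata from the last input, not from a nonlocal.
    ctx.keyword_only_args = inputs[-1].keyword_only_args

    prev = ctx.needs_input_grad
    try:
        # Temporarily drop the metadata entry and regroup list arguments so the
        # user's backward sees one entry per argument it actually passed.
        ctx.needs_input_grad = unflatten(prev[:-1], spec)
        result = user_backward(ctx)
    finally:
        ctx.needs_input_grad = prev  # restore what "autograd" set
    return result, ctx


seen, ctx = fake_apply(
    lambda c: c.needs_input_grad,
    FakeTensor(True), [FakeTensor(False), FakeTensor(True)],
    metadata=Metadata({"alpha": 2}),
)
print(seen)                  # (True, [False, True])
print(ctx.needs_input_grad)  # (True, False, True, False)
```

The try/finally restores the original tuple even if the user's backward raises, so the override never leaks past the call.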

Test Plan:

- existing and new tests

Pull Request resolved: #128547
Approved by: https://github.com/williamwen42, https://github.com/soulitzer

Fixes #128544
Fixes #128535

pytorch-bot · 2024-06-13T15:27:43Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/128616

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 2ce7865 with merge base b66e3f0:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

pull / linux-focal-py3_8-clang9-xla / test (xla, 1, 1, linux.12xlarge) (gh) (similar failure)
test_repeat_truncated

This comment was automatically generated by Dr. CI and updates every 15 minutes.

zou3519 requested review from albanD and soulitzer as code owners June 13, 2024 15:27

zou3519 mentioned this pull request Jun 13, 2024

[v.2.4.0] Release Tracker #128436

Closed

soulitzer approved these changes Jun 13, 2024

Skylion007 approved these changes Jun 13, 2024

atalman approved these changes Jun 19, 2024

atalman merged commit e7dde73 into release/2.4 Jun 19, 2024
103 of 104 checks passed

github-actions bot deleted the rzou/2.4/fix_custom_op_local branch July 20, 2024 01:54
