Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Add support for device extension autoloading #127074

Closed
wants to merge 31 commits into from

Conversation

shink
Copy link
Contributor

@shink shink commented May 24, 2024

Fixes #122468

  • Load device extensions at the end of torch/__init__.py
  • Enabled by default, or you can disable it with TORCH_DEVICE_BACKEND_AUTOLOAD=0

run test:

python test/run_test.py -i test_autoload_enable
python test/run_test.py -i test_autoload_disable

doc:

https://docs-preview.pytorch.org/pytorch/pytorch/127074/miscellaneous_environment_variables.html

co-author: @jgong5 @bsochack @bkowalskiINTEL @jczaja @FFFrog @hipudding

cc @albanD

Copy link

pytorch-bot bot commented May 24, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/127074

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 45fd7ad with merge base 6b5fbc5 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@shink shink marked this pull request as draft May 24, 2024 09:35
@shink shink marked this pull request as draft May 24, 2024 09:35
@shink shink marked this pull request as draft May 24, 2024 09:35
@jgong5
Copy link
Collaborator

jgong5 commented May 24, 2024

Thanks for the pull request. Could you please add some tests?

@jgong5
Copy link
Collaborator

jgong5 commented May 24, 2024

cc @bsochack

@shink
Copy link
Contributor Author

shink commented May 24, 2024

Thanks for the pull request. Could you please add some tests?

Sure, I'm willing to do this!

@bsochack
Copy link

@shink we pushed a draft PR of a change here: #127228
A few changes (as in review) + unit tests are still required.

@jczaja
Copy link

jczaja commented May 27, 2024

@shink , @jgong5 This is a nice work that directly reflect proposal (RFC). We have tried to test it against our device_extension and there was a crash :
image

Problem seems to be that torch._dynamo is having a code checking all symbols exported from torch/init.py .
So to avoid that it would be good to hide changes proposed here inside a function. This issue may be observed with extensions that are directly importing torch._dynamo . Here is other PR implementing mentioned RFC that works for us: #127228 .

@shink
Copy link
Contributor Author

shink commented May 28, 2024

@bsochack @jczaja Nice work! Thanks so much!

@shink , @jgong5 This is a nice work that directly reflect proposal (RFC). We have tried to test it against our device_extension and there was a crash : image

Problem seems to be that torch._dynamo is having a code checking all symbols exported from torch/init.py . So to avoid that it would be good to hide changes proposed here inside a function. This issue may be observed with extensions that are directly importing torch._dynamo . Here is other PR implementing mentioned RFC that works for us: #127228 .

Yes, we alse met this crash.

@shink shink marked this pull request as ready for review May 31, 2024 08:44
@shink
Copy link
Contributor Author

shink commented May 31, 2024

@jgong5 @bsochack @bkowalskiINTEL @FFFrog @hipudding Please have a look at this unit test. See b0e08d3

@@ -0,0 +1,2 @@
[torch.backends]
device_backend = backend_pkg:autoload
Copy link
Contributor Author

@shink shink May 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in fact, this file should be ignored, and the reason it is here is to declare the entry point of the backend package

try:
# just load the plugin without calling
backend.load()
except Exception:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to show failure information when an exception is encountered.



def is_device_backend_autoload_enabled() -> bool:
var = os.getenv("TORCH_DISABLE_DEVICE_BACKEND_AUTOLOAD")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getenv with default value shoule be better.


class TestAutoload(TestCase):
def test_load_plugins(self):
device_backend_path = os.path.abspath(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to think of this path operation as context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fxied, please take a look again

@bsochack
Copy link

@jgong5 @bsochack @bkowalskiINTEL @FFFrog @hipudding Please have a look at this unit test. See b0e08d3

Please add @bkowalskiINTEL and @jczaja as co-authors. Lets also continue all addressing code review here.

@shink
Copy link
Contributor Author

shink commented May 31, 2024

@jgong5 @bsochack @bkowalskiINTEL @FFFrog @hipudding Please have a look at this unit test. See b0e08d3

Please add @bkowalskiINTEL and @jczaja as co-authors. Lets also continue all addressing code review here.

Sure, thanks so much, and any suggestions are welcome.

def test_autoload(self):
# after importing the extension, the value of this environment variable should be true
torch.import_device_backends()
value = os.getenv("IS_CUSTOM_DEVICE_BACKEND_IMPORTED", "false")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides checking the environment variable, are you going to add test to autoload an extension (maybe a mock one)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review. Please have a look at the code I just updated.

@cpuhrsch cpuhrsch requested a review from albanD June 3, 2024 18:10
@cpuhrsch cpuhrsch added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jun 3, 2024
@cpuhrsch cpuhrsch requested a review from ezyang June 3, 2024 18:10
@cpuhrsch cpuhrsch added the module: python frontend For issues relating to PyTorch's Python frontend label Jun 3, 2024
torch/__init__.py Outdated Show resolved Hide resolved
torch/__init__.py Outdated Show resolved Hide resolved
torch/__init__.py Outdated Show resolved Hide resolved
```bash
test/autoload/device_backend
├── README.md
├── backend_pkg # The backend package
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to reuse the existing modules we already have for testing in test/cpp_extension/setup.py to test this without having to create a brand new extension?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for your review. I'm testing this idea now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please have a look at this commit 1e7cf76

Comment on lines 33 to 36
# Test the function defined in backend_pkg/__init__.py
import backend_pkg
self.assertTrue(hasattr(backend_pkg, "apply_patch"))
self.assertEqual(backend_pkg.apply_patch(), "success")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this is testing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's testing a function defined in the backend extension

@shink shink requested a review from albanD June 5, 2024 10:04
@jgong5
Copy link
Collaborator

jgong5 commented Jun 7, 2024

@shink Seems there is still lint errors. You may repro with "lintrunner -a" locally.

Comment on lines 8 to 14
# When importing this package, set this environment variable to true
os.environ["IS_CUSTOM_DEVICE_BACKEND_IMPORTED"] = "true"


def _autoload():
# Do nothing in this entry point
pass
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do things inside this entrypoint instead to make sure it is called?

@@ -367,5 +367,13 @@ def f(a: bool, b: bool):
self.assertIn("torch_library::logical_and", str(s.graph))


class TestDeviceBackendAutoload(common.TestCase):
def test_autoload(self):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I understand it correctly but when the cpp extension is explicitly imported from this test file, not autoloaded implicitly. Is that correct?

Copy link
Contributor Author

@shink shink Jun 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for putting this test case here is that I want to reuse the existing modules in test/cpp_extension/setup.py.

The extension is installed here:

pytorch/test/run_test.py

Lines 667 to 671 in 7efaeb1

# Build the test cpp extensions modules
shell_env = os.environ.copy()
shell_env["USE_NINJA"] = str(1 if use_ninja else 0)
cmd = [sys.executable, "setup.py", "install", "--root", "./install"]
return_code = shell(cmd, cwd=cpp_extensions_test_dir, env=shell_env)

@shink shink requested a review from a team as a code owner June 7, 2024 12:25
Copy link
Collaborator

@albanD albanD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small things only!


class TestDeviceBackendAutoload(TestCase):
def test_autoload(self):
switch = os.getenv("TORCH_DEVICE_BACKEND_AUTOLOAD", None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
switch = os.getenv("TORCH_DEVICE_BACKEND_AUTOLOAD", None)
switch = os.getenv("TORCH_DEVICE_BACKEND_AUTOLOAD", False)

If you do this, you won't need the if switch below right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review, I will test this change.

return _test_autoload(test_directory, options, enable=False)


def _test_autoload(test_directory, options, enable=True):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI @clee2000 for change to run_test.py

"""
# enabled by default
is_enable = os.getenv("TORCH_DEVICE_BACKEND_AUTOLOAD", "1")
return is_enable.strip().lower() in {"1", "true", "yes", "on", "y"}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we usually do "0" and "1" only for these for simplicity.

@bsochack
Copy link

@albanD is this PR ready to merge?

test/test_autoload.py Outdated Show resolved Hide resolved
test/test_autoload.py Outdated Show resolved Hide resolved
"""
Whether autoloading out-of-the-tree device extensions is enabled.
The switch depends on the value of the environment variable
`TORCH_DEVICE_BACKEND_AUTOLOAD`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drisspg which .rst file should we add this new env variable to for proper documentation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shink shink requested a review from albanD June 28, 2024 01:20
@bsochack
Copy link

bsochack commented Jul 2, 2024

@albanD any comments to this PR?

Copy link
Collaborator

@albanD albanD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit on moving the doc to another file, but SGTM otherwise!

@@ -26,3 +26,4 @@ If you find anything in this documentation that is missing, incorrect, or could
miscellaneous_environment_variables
logging
torch_nccl_environment_variables
privateuse1_environment_variables
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please move this to the miscellaneous section? I do expect all users to be interested in this env variable, not just privateuse1 devs.

- Under some conditions, autograd threads can hang on shutdown, therefore we do not wait for them to shutdown indefinitely but rely on timeout that is default set to ``10`` seconds. This environment variable can be used to set the timeout in seconds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@albanD Sure, it has been moved. Please take a look again. Thanks so much for your patient review.

@jgong5
Copy link
Collaborator

jgong5 commented Jul 9, 2024

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jul 9, 2024
@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team Raised by workflow job

@jgong5 jgong5 added the release notes: python_frontend release notes category label Jul 9, 2024
@jgong5
Copy link
Collaborator

jgong5 commented Jul 9, 2024

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Jul 25, 2024
Fixes pytorch#122468

- Load device extensions at the end of `torch/__init__.py`
- Enabled by default, or you can disable it with `TORCH_DEVICE_BACKEND_AUTOLOAD=0`

run test:

```python
python test/run_test.py -i test_autoload_enable
python test/run_test.py -i test_autoload_disable
```

doc:

https://docs-preview.pytorch.org/pytorch/pytorch/127074/miscellaneous_environment_variables.html

co-author:  @jgong5 @bsochack @bkowalskiINTEL @jczaja @FFFrog @hipudding

Co-authored-by: albanD <[email protected]>
Co-authored-by: Jiong Gong <[email protected]>
Pull Request resolved: pytorch#127074
Approved by: https://github.com/albanD, https://github.com/jgong5
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/trunk Trigger trunk jobs on your pull request Merged module: python frontend For issues relating to PyTorch's Python frontend open source release notes: python_frontend release notes category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RFC] Autoload Device Extension
10 participants