Enable Intel xpu as a new backend of PyTorch-Lightning #16834
Conversation
Commit history:
- [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
- change version comparison to base version number
- enable bf16 for xpu, enable ccl
- switch from deprecated set_default_tensor_type to set_default_dtype
- switch to info to print ipex and torch-ccl version number
- fix set_default_dtype incorrect argument error
- fix crashes when ipex and/or torch-ccl are not installed
- update docs
Hi @jingxu10, thank you for the contribution; lots of work went into that. We are currently looking into making adding devices something you can do externally, so we keep Lightning core lean. This PR looks like a very good case we can look at to enable that kind of mechanism. This will likely happen after 2.0 lands. So we can't merge the PR as is, but you're encouraged to reach out in our Slack at pytorch-lightning.slack.com so we can coordinate on how we can make it happen.
@lantiga Thanks for the info; what is the tentative plan for the 2.0 release? I do see some refactorings that are in progress.
@abhilash1910 to check deepspeed/DDP for workloads.
Any `xpu.empty_cache` calls should be placed in try blocks to ensure safety.
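The suggestion above can be sketched as a small helper. The `safe_empty_cache` name and the module injection are illustrative, not part of the PR; in the actual code the call site would pass `torch.xpu`, which is injected here so the sketch runs without an XPU build of torch:

```python
from types import SimpleNamespace


def safe_empty_cache(xpu_module) -> bool:
    """Release cached device memory, tolerating torch builds where the
    XPU backend (and hence empty_cache) is missing.

    Returns True only if the cache was actually cleared.
    """
    try:
        xpu_module.empty_cache()
        return True
    except AttributeError:
        # xpu_module may be None (no XPU backend at all) or may not
        # expose empty_cache; either way, teardown must not crash.
        return False


# Stand-ins for torch.xpu with and without empty_cache
xpu_present = SimpleNamespace(empty_cache=lambda: None)
xpu_absent = SimpleNamespace()

print(safe_empty_cache(xpu_present))  # True
print(safe_empty_cache(xpu_absent))   # False
```

The same helper would cover both the `setup` and `teardown` call sites flagged in this review.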
        return torch.xpu.memory_stats(device)

    def teardown(self) -> None:
        # clean up memory
try:
    torch.xpu.empty_cache()
except AttributeError:
    pass
To ensure safe teardowns.
    def setup(self, trainer: "pl.Trainer") -> None:
        # TODO refactor input from trainer to local_rank @four4fish
        # self.set_intel_flags(trainer.local_rank)
        # clear cache before training
try:
    torch.xpu.empty_cache()
except AttributeError:
    pass
To ensure a safe setup.
- After some sanity checks, multiprocessing does not appear to work well; a separate function for XPU forking needs to be incorporated.
Multiprocessing issue with XPU backend
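The forking problem noted above is typical of accelerator runtimes that cannot survive `fork()`; the usual remedy is the "spawn" start method, which gives each worker a fresh interpreter. A minimal sketch, assuming the worker body (here a placeholder computation) would initialize `torch.xpu` per rank in real code:

```python
import multiprocessing as mp


def xpu_worker(rank: int, queue) -> None:
    # In real code, the XPU device for `rank` would be initialized here.
    # A spawned child starts from a fresh interpreter, so it does not
    # inherit a runtime state poisoned by forking the parent process.
    queue.put(rank * 2)  # placeholder for per-rank work


def run_workers(world_size: int) -> list:
    # "spawn" (rather than Linux's default "fork") is what accelerator
    # runtimes generally require for child processes.
    ctx = mp.get_context("spawn")
    queue = ctx.Queue()
    procs = [ctx.Process(target=xpu_worker, args=(rank, queue))
             for rank in range(world_size)]
    for p in procs:
        p.start()
    # Drain the queue before joining to avoid blocking on large payloads.
    results = sorted(queue.get() for _ in range(world_size))
    for p in procs:
        p.join()
    return results
```

When run as a script guarded by `if __name__ == "__main__":`, `run_workers(2)` returns `[0, 2]`.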
Will be ported to a separate repo, stay tuned! 🐿️
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. If you need further help see our docs: https://lightning.ai/docs/pytorch/latest/generated/CONTRIBUTING.html#pull-request or ask the assistance of a core contributor here or on Discord. Thank you for your contributions.
This pull request is going to be closed. Please feel free to reopen it or create a new one based on top of the 'master' branch.
Will be implemented as a standalone extension.
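The standalone-extension mechanism the maintainers describe could take the shape of a small accelerator registry. Everything below is a hypothetical sketch: `register_accelerator`, `get_accelerator`, and the placeholder `XPUAccelerator` class are illustrative names, not Lightning's actual API:

```python
from typing import Callable, Dict

# Hypothetical registry: Lightning core stays lean, and an external
# package (e.g. a lightning-xpu extension) registers its backend at
# import time instead of living inside the core repo.
_ACCELERATORS: Dict[str, Callable[[], object]] = {}


def register_accelerator(name: str, factory: Callable[[], object]) -> None:
    if name in _ACCELERATORS:
        raise ValueError(f"accelerator {name!r} is already registered")
    _ACCELERATORS[name] = factory


def get_accelerator(name: str) -> object:
    if name not in _ACCELERATORS:
        raise ValueError(
            f"unknown accelerator {name!r}; registered: {sorted(_ACCELERATORS)}"
        )
    return _ACCELERATORS[name]()


# What the external XPU package would do when imported:
class XPUAccelerator:
    """Placeholder for the XPU backend contributed in this PR."""


register_accelerator("xpu", XPUAccelerator)
print(type(get_accelerator("xpu")).__name__)  # XPUAccelerator
```

With this pattern, core code only looks names up in the registry, so merging a new device never requires touching Lightning itself.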
Enable Intel xpu as a new backend of PyTorch-Lightning.
Contributed by [email protected] and [email protected].
Fixes #<issue_number>