
Enable Intel xpu as a new backend of PyTorch-Lightning #16834

Closed
wants to merge 8 commits into from

Conversation

jingxu10

@jingxu10 jingxu10 commented Feb 21, 2023

Enable Intel xpu as a new backend of PyTorch-Lightning.
Contributed by [email protected] and [email protected].

Fixes #<issue_number>

Before submitting
  • Was this discussed/agreed via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, follow this checklist:

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

@github-actions github-actions bot added fabric lightning.fabric.Fabric pl Generic label for PyTorch Lightning package labels Feb 21, 2023
@jingxu10 jingxu10 force-pushed the jingxu10/ipex branch 2 times, most recently from eae506e to 6dad88e Compare February 23, 2023 20:25
@mergify mergify bot removed the has conflicts label Feb 23, 2023
@github-actions github-actions bot added the app (removed) Generic label for Lightning App package label Feb 25, 2023
@jingxu10 jingxu10 force-pushed the jingxu10/ipex branch 4 times, most recently from 8004d6f to eb2023a Compare February 25, 2023 12:08
@jingxu10 jingxu10 force-pushed the jingxu10/ipex branch 2 times, most recently from ffe2f1b to db300a9 Compare February 25, 2023 23:06
@jingxu10 jingxu10 marked this pull request as draft February 25, 2023 23:28
@jingxu10 jingxu10 changed the title WIP: enable Intel xpu as a new backend of PyTorch-Lightning enable Intel xpu as a new backend of PyTorch-Lightning Feb 25, 2023
weishi-deng and others added 3 commits February 26, 2023 10:57
[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

change version comparison to base version number

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

enable bf16 for xpu, enable ccl

[pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

switch from deprecated set_default_tensor_type to set_default_dtype

switch to info to print ipex and torch-ccl version number

fix set_default_dtype incorrect argument error
fix crashes when ipex and/or torch-ccl are not installed

fix crashes when ipex and/or torch-ccl are not installed

fix crashes when ipex and/or torch-ccl are not installed
update docs

update docs
@jingxu10 jingxu10 changed the title enable Intel xpu as a new backend of PyTorch-Lightning Enable Intel xpu as a new backend of PyTorch-Lightning Feb 26, 2023
@lantiga
Collaborator

lantiga commented Feb 27, 2023

Hi @jingxu10 thank you for the contribution, lots of work went into that.

We are currently looking into making device integrations something you can add externally, so that the Lightning core stays lean. This PR looks like a very good case to study when enabling that kind of mechanism. This will likely happen after 2.0 lands.

So we can't merge the PR as is, but you're encouraged to reach out in our Slack at pytorch-lightning.slack.com so we can coordinate on how we can make it happen.

@abhilash1910

@lantiga Thanks for the info; what is the tentative plan for the 2.0 release? I do see some refactorings in progress.


@abhilash1910 abhilash1910 left a comment


@abhilash1910 to check DeepSpeed/DDP for workloads.

Any torch.xpu.empty_cache() calls should be placed in try blocks to ensure safety.

return torch.xpu.memory_stats(device)

def teardown(self) -> None:
# clean up memory


    try:
        torch.xpu.empty_cache()
    except AttributeError:
        pass

To ensure safe teardowns

def setup(self, trainer: "pl.Trainer") -> None:
# TODO refactor input from trainer to local_rank @four4fish
# self.set_intel_flags(trainer.local_rank)
# clear cache before training


    try:
        torch.xpu.empty_cache()
    except AttributeError:
        pass

To ensure a safe setup
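The suggestion above can be factored into a small helper; a sketch, where the `safe_empty_cache` name and the duck-typed `backend` argument are illustrative rather than part of the PR:

```python
def safe_empty_cache(backend) -> bool:
    """Call backend.empty_cache() if the backend provides it.

    Partially installed or older XPU runtimes may lack empty_cache;
    swallowing AttributeError keeps setup/teardown from crashing.
    Returns True if the cache was cleared, False if unsupported.
    """
    try:
        backend.empty_cache()
        return True
    except AttributeError:
        return False
```

In the PR's context the backend would be `torch.xpu`; any object without an `empty_cache` attribute is simply skipped.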

abhilash1910 and others added 3 commits March 3, 2023 12:15
- After some sanity checks, multiprocessing does not appear to work well.
- Need to incorporate a separate function for XPU forking.
Multiprocessing issue with XPU backend
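The note about XPU forking touches on multiprocessing start methods: `fork` duplicates the parent's initialized device state into the child, which GPU/XPU runtimes often cannot survive, while `spawn` starts workers in a fresh interpreter. A minimal sketch of selecting the safer start method (the helper name is illustrative):

```python
import multiprocessing as mp


def xpu_spawn_context():
    """Return a 'spawn' multiprocessing context for XPU workers.

    'fork' would copy the parent's device runtime into the child,
    which typically deadlocks or crashes; 'spawn' avoids that by
    re-importing the worker module in a clean process.
    """
    return mp.get_context("spawn")
```

Worker processes created from this context (`ctx.Process(...)`) then start from a clean interpreter instead of a forked copy of the parent.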
@stale stale bot added the won't fix This will not be worked on label Mar 18, 2023
@Lightning-AI Lightning-AI deleted a comment from stale bot Mar 18, 2023
@Borda
Member

Borda commented Mar 18, 2023

Will be ported to separate repo, stay tuned! 🐿️

@stale stale bot removed the won't fix This will not be worked on label Mar 18, 2023
@stale

stale bot commented Apr 2, 2023

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. If you need further help see our docs: https://lightning.ai/docs/pytorch/latest/generated/CONTRIBUTING.html#pull-request or ask the assistance of a core contributor here or on Discord. Thank you for your contributions.

@stale stale bot added the won't fix This will not be worked on label Apr 2, 2023
@stale

stale bot commented Apr 13, 2023

This pull request is going to be closed. Please feel free to reopen it or create a new one based on top of the 'master' branch.

@stale stale bot closed this Apr 13, 2023
@Borda
Member

Borda commented Apr 13, 2023

Will be implemented as a standalone extension.

Labels
app (removed) Generic label for Lightning App package fabric lightning.fabric.Fabric pl Generic label for PyTorch Lightning package won't fix This will not be worked on
5 participants