Add testing for PyTorch 2.4 (Trainer) #20010

awaelchli · 2024-06-24T18:13:24Z

What does this PR do?

Adds PyTorch 2.4 testing to the CI matrix.
Previous PR for reference: #19289

The main updates are:

PyTorch plans to flip the default of torch.load(..., weights_only=) to True in a future release. Starting with 2.4, it raises a FutureWarning if this argument is not explicitly set to either True or False, which is currently the case in our code base. For checkpoint loading, we need to set weights_only=False for now, because we have so far allowed users to include arbitrary objects in checkpoints and the user currently has no control over this flag inside the Trainer. In the future, we might want to expose it: Expose weights_only option for loading checkpoints #20058
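As a minimal sketch of the idea (not the actual change in this PR), the call site can pass the flag explicitly. The helper name `load_checkpoint` and the example checkpoint contents are assumptions made for illustration only:

```python
import io
from pathlib import PurePosixPath

import torch


def load_checkpoint(path_or_buffer, weights_only=False):
    """Hypothetical helper: always pass `weights_only` explicitly to `torch.load`."""
    # Setting the flag explicitly (True or False) avoids the FutureWarning in PyTorch 2.4.
    # `weights_only=False` unpickles arbitrary objects, so use it only on trusted files.
    return torch.load(path_or_buffer, map_location="cpu", weights_only=weights_only)


# Round-trip a checkpoint that contains a non-tensor object next to a tensor
buffer = io.BytesIO()
torch.save({"weight": torch.ones(2), "run_dir": PurePosixPath("logs/run")}, buffer)
buffer.seek(0)
checkpoint = load_checkpoint(buffer)
```

The non-tensor entry stands in for the arbitrary objects users may have stored in their checkpoints, which is why the sketch defaults to `weights_only=False`.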

FSDP is deprecating the state-dict APIs and raises a warning:

FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .

Since switching to the new APIs while keeping backward compatibility with older PyTorch versions is a non-trivial effort, I will defer this work to future PRs (Use new state-dict APIs in FSDPStrategy #20060) and for now suppress these warnings (for the user and also in our tests).
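One way to suppress such a warning is a targeted filter from the standard `warnings` module. The sketch below is an illustration under stated assumptions, not the code in this PR: `legacy_state_dict_call` is a stand-in that only emits a warning with the same prefix as the message quoted above, and `call_without_fsdp_warning` is a hypothetical helper name.

```python
import warnings

# Regex built from the start of the FutureWarning message quoted above
FSDP_STATE_DICT_WARNING = (
    r".*FSDP\.state_dict_type\(\) and FSDP\.set_state_dict_type\(\) are being deprecated.*"
)


def legacy_state_dict_call():
    """Stand-in (not a PyTorch API) that emits the same kind of warning."""
    warnings.warn(
        "FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. "
        "Please use APIs, get_state_dict() and set_state_dict()",
        FutureWarning,
    )
    return {"weight": 1.0}


def call_without_fsdp_warning(fn):
    """Run `fn` with only the FSDP state-dict deprecation warning filtered out."""
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore", message=FSDP_STATE_DICT_WARNING, category=FutureWarning
        )
        return fn()


# Record every warning that escapes the helper; the filtered one should not appear
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    state = call_without_fsdp_warning(legacy_state_dict_call)
```

Matching on the message keeps other FutureWarnings visible, so unrelated deprecations are not hidden by accident.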

The PyTorch profiler is deprecating the use_cuda argument. I updated the logic in our Profiler wrapper to avoid passing this argument, while keeping it backward-compatible with older PyTorch versions.
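A version-gated approach could look roughly like the following. This is only a sketch of the general pattern, not the wrapper logic in this PR; both helper names are hypothetical, and the version string is passed in so the example runs without PyTorch installed:

```python
def torch_version_at_least(version: str, minimum: tuple) -> bool:
    """Compare the major/minor part of a version string such as '2.4.0+cu121'."""
    major, minor = version.split("+")[0].split(".")[:2]
    return (int(major), int(minor)) >= minimum


def filter_profiler_kwargs(kwargs: dict, torch_version: str) -> dict:
    """Hypothetical helper: drop `use_cuda` on PyTorch >= 2.4, keep it on older versions."""
    filtered = dict(kwargs)  # copy so the caller's dict is not mutated
    if torch_version_at_least(torch_version, (2, 4)):
        filtered.pop("use_cuda", None)
    return filtered


new_kwargs = filter_profiler_kwargs({"use_cuda": True, "record_shapes": True}, "2.4.0+cu121")
old_kwargs = filter_profiler_kwargs({"use_cuda": True, "record_shapes": True}, "2.3.1")
```

In real code the version string would come from `torch.__version__`, and the filtered dictionary would be forwarded to the profiler constructor.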

cc @Borda @carmocca @justusschock @awaelchli

codecov · 2024-06-24T18:41:29Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91%. Comparing base (5829ef8) to head (455b57d).
Report is 65 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master   #20010   +/-   ##
=======================================
  Coverage      90%      91%           
=======================================
  Files         266      266           
  Lines       22942    22952   +10     
=======================================
+ Hits        20721    20780   +59     
+ Misses       2221     2172   -49

github-actions · 2024-07-08T09:40:24Z

⚡ Required checks status: All passing 🟢

Groups summary

🟢 pytorch_lightning: Tests workflow

Check ID	Status
pl-cpu (macOS-13, lightning, 3.8, 2.1, oldest)	success	✅
pl-cpu (macOS-14, lightning, 3.10, 2.1)	success	✅
pl-cpu (macOS-14, lightning, 3.10, 2.2)	success	✅
pl-cpu (macOS-14, lightning, 3.10, 2.3)	success	✅
pl-cpu (ubuntu-20.04, lightning, 3.8, 2.1, oldest)	success	✅
pl-cpu (ubuntu-20.04, lightning, 3.10, 2.1)	success	✅
pl-cpu (ubuntu-20.04, lightning, 3.10, 2.2)	success	✅
pl-cpu (ubuntu-20.04, lightning, 3.10, 2.3)	success	✅
pl-cpu (windows-2022, lightning, 3.8, 2.1, oldest)	success	✅
pl-cpu (windows-2022, lightning, 3.10, 2.1)	success	✅
pl-cpu (windows-2022, lightning, 3.10, 2.2)	success	✅
pl-cpu (windows-2022, lightning, 3.10, 2.3)	success	✅
pl-cpu (macOS-14, pytorch, 3.8, 2.1)	success	✅
pl-cpu (ubuntu-20.04, pytorch, 3.8, 2.1)	success	✅
pl-cpu (windows-2022, pytorch, 3.8, 2.1)	success	✅
pl-cpu (macOS-12, pytorch, 3.11, 2.1)	success	✅
pl-cpu (ubuntu-22.04, pytorch, 3.11, 2.1)	success	✅
pl-cpu (windows-2022, pytorch, 3.11, 2.1)	success	✅

These checks are required after the changes to .github/workflows/ci-tests-pytorch.yml, src/lightning/fabric/utilities/cloud_io.py, requirements/pytorch/base.txt, requirements/pytorch/examples.txt, src/lightning/pytorch/plugins/precision/amp.py, src/lightning/pytorch/profilers/pytorch.py, src/lightning/pytorch/strategies/launchers/multiprocessing.py, src/lightning/pytorch/strategies/model_parallel.py, tests/tests_pytorch/conftest.py, tests/tests_pytorch/models/test_torchscript.py, tests/tests_pytorch/plugins/precision/test_amp_integration.py, tests/tests_pytorch/profilers/test_profiler.py, tests/tests_pytorch/strategies/test_deepspeed.py, tests/tests_pytorch/strategies/test_fsdp.py, tests/tests_pytorch/strategies/test_model_parallel.py, tests/tests_pytorch/strategies/test_model_parallel_integration.py, tests/tests_pytorch/trainer/connectors/test_accelerator_connector.py, tests/tests_pytorch/utilities/test_compile.py.

🟢 pytorch_lightning: Azure GPU

Check ID	Status
pytorch-lightning (GPUs) (testing Lightning | latest)	success	✅
pytorch-lightning (GPUs) (testing PyTorch | latest)	success	✅

These checks are required after the changes to .azure/gpu-tests-pytorch.yml, requirements/pytorch/base.txt, requirements/pytorch/examples.txt, src/lightning/pytorch/plugins/precision/amp.py, src/lightning/pytorch/profilers/pytorch.py, src/lightning/pytorch/strategies/launchers/multiprocessing.py, src/lightning/pytorch/strategies/model_parallel.py, tests/tests_pytorch/conftest.py, tests/tests_pytorch/models/test_torchscript.py, tests/tests_pytorch/plugins/precision/test_amp_integration.py, tests/tests_pytorch/profilers/test_profiler.py, tests/tests_pytorch/strategies/test_deepspeed.py, tests/tests_pytorch/strategies/test_fsdp.py, tests/tests_pytorch/strategies/test_model_parallel.py, tests/tests_pytorch/strategies/test_model_parallel_integration.py, tests/tests_pytorch/trainer/connectors/test_accelerator_connector.py, tests/tests_pytorch/utilities/test_compile.py, src/lightning/fabric/utilities/cloud_io.py.

🟢 pytorch_lightning: Benchmarks

Check ID	Status
lightning.Benchmarks	success	✅

These checks are required after the changes to requirements/pytorch/base.txt, requirements/pytorch/examples.txt, src/lightning/fabric/utilities/cloud_io.py, src/lightning/pytorch/plugins/precision/amp.py, src/lightning/pytorch/profilers/pytorch.py, src/lightning/pytorch/strategies/launchers/multiprocessing.py, src/lightning/pytorch/strategies/model_parallel.py.

🟢 fabric: Docs

Check ID	Status
docs-make (fabric, doctest)	success	✅
docs-make (fabric, html)	success	✅

These checks are required after the changes to src/lightning/fabric/utilities/cloud_io.py.

🟢 pytorch_lightning: Docs

Check ID	Status
docs-make (pytorch, doctest)	success	✅
docs-make (pytorch, html)	success	✅

These checks are required after the changes to src/lightning/pytorch/plugins/precision/amp.py, src/lightning/pytorch/profilers/pytorch.py, src/lightning/pytorch/strategies/launchers/multiprocessing.py, src/lightning/pytorch/strategies/model_parallel.py, docs/source-pytorch/versioning.rst, requirements/pytorch/base.txt, requirements/pytorch/examples.txt.

🟢 pytorch_lightning: Docker

Check ID	Status
build-cuda (3.10, 2.1, 12.1.0)	success	✅
build-cuda (3.10, 2.2, 12.1.0)	success	✅
build-cuda (3.11, 2.1, 12.1.0)	success	✅
build-cuda (3.11, 2.2, 12.1.0)	success	✅
build-cuda (3.11, 2.3, 12.1.0)	success	✅
build-cuda (3.11, 2.4, 12.1.0)	success	✅
build-pl (3.10, 2.1, 12.1.0)	success	✅
build-pl (3.10, 2.2, 12.1.0)	success	✅
build-pl (3.11, 2.1, 12.1.0)	success	✅
build-pl (3.11, 2.2, 12.1.0)	success	✅
build-pl (3.11, 2.3, 12.1.0)	success	✅
build-pl (3.11, 2.4, 12.1.0)	success	✅

These checks are required after the changes to .github/workflows/docker-build.yml, dockers/base-cuda/Dockerfile, requirements/pytorch/base.txt, requirements/pytorch/examples.txt.

🟢 lightning_fabric: CPU workflow

Check ID	Status
fabric-cpu (macOS-13, lightning, 3.8, 2.1, oldest)	success	✅
fabric-cpu (macOS-14, lightning, 3.11, 2.1)	success	✅
fabric-cpu (macOS-14, lightning, 3.11, 2.2)	success	✅
fabric-cpu (macOS-14, lightning, 3.10, 2.3)	success	✅
fabric-cpu (ubuntu-20.04, lightning, 3.8, 2.1, oldest)	success	✅
fabric-cpu (ubuntu-20.04, lightning, 3.11, 2.1)	success	✅
fabric-cpu (ubuntu-20.04, lightning, 3.11, 2.2)	success	✅
fabric-cpu (ubuntu-20.04, lightning, 3.11, 2.3)	success	✅
fabric-cpu (windows-2022, lightning, 3.8, 2.1, oldest)	success	✅
fabric-cpu (windows-2022, lightning, 3.11, 2.1)	success	✅
fabric-cpu (windows-2022, lightning, 3.11, 2.2)	success	✅
fabric-cpu (windows-2022, lightning, 3.11, 2.3)	success	✅
fabric-cpu (macOS-14, fabric, 3.8, 2.1)	success	✅
fabric-cpu (ubuntu-20.04, fabric, 3.8, 2.1)	success	✅
fabric-cpu (windows-2022, fabric, 3.8, 2.1)	success	✅
fabric-cpu (macOS-12, fabric, 3.11, 2.1)	success	✅
fabric-cpu (ubuntu-22.04, fabric, 3.11, 2.1)	success	✅
fabric-cpu (windows-2022, fabric, 3.11, 2.1)	success	✅

These checks are required after the changes to src/lightning/fabric/utilities/cloud_io.py.

🟢 lightning_fabric: Azure GPU

Check ID	Status
lightning-fabric (GPUs) (testing Fabric | latest)	success	✅
lightning-fabric (GPUs) (testing Lightning | latest)	success	✅

These checks are required after the changes to src/lightning/fabric/utilities/cloud_io.py.

🟢 mypy

Check ID	Status
mypy	success	✅

These checks are required after the changes to requirements/pytorch/base.txt, requirements/pytorch/examples.txt, src/lightning/fabric/utilities/cloud_io.py, src/lightning/pytorch/plugins/precision/amp.py, src/lightning/pytorch/profilers/pytorch.py, src/lightning/pytorch/strategies/launchers/multiprocessing.py, src/lightning/pytorch/strategies/model_parallel.py.

🟢 install

Check ID	Status
install-pkg (ubuntu-22.04, fabric, 3.8)	success	✅
install-pkg (ubuntu-22.04, fabric, 3.11)	success	✅
install-pkg (ubuntu-22.04, pytorch, 3.8)	success	✅
install-pkg (ubuntu-22.04, pytorch, 3.11)	success	✅
install-pkg (ubuntu-22.04, lightning, 3.8)	success	✅
install-pkg (ubuntu-22.04, lightning, 3.11)	success	✅
install-pkg (ubuntu-22.04, notset, 3.8)	success	✅
install-pkg (ubuntu-22.04, notset, 3.11)	success	✅
install-pkg (macOS-12, fabric, 3.8)	success	✅
install-pkg (macOS-12, fabric, 3.11)	success	✅
install-pkg (macOS-12, pytorch, 3.8)	success	✅
install-pkg (macOS-12, pytorch, 3.11)	success	✅
install-pkg (macOS-12, lightning, 3.8)	success	✅
install-pkg (macOS-12, lightning, 3.11)	success	✅
install-pkg (macOS-12, notset, 3.8)	success	✅
install-pkg (macOS-12, notset, 3.11)	success	✅
install-pkg (windows-2022, fabric, 3.8)	success	✅
install-pkg (windows-2022, fabric, 3.11)	success	✅
install-pkg (windows-2022, pytorch, 3.8)	success	✅
install-pkg (windows-2022, pytorch, 3.11)	success	✅
install-pkg (windows-2022, lightning, 3.8)	success	✅
install-pkg (windows-2022, lightning, 3.11)	success	✅
install-pkg (windows-2022, notset, 3.8)	success	✅
install-pkg (windows-2022, notset, 3.11)	success	✅

These checks are required after the changes to src/lightning/fabric/utilities/cloud_io.py, src/lightning/pytorch/plugins/precision/amp.py, src/lightning/pytorch/profilers/pytorch.py, src/lightning/pytorch/strategies/launchers/multiprocessing.py, src/lightning/pytorch/strategies/model_parallel.py, requirements/pytorch/base.txt, requirements/pytorch/examples.txt.

Thank you for your contribution! 💜

Note
This comment is automatically generated and updates for 60 minutes every 180 seconds. If you have any other questions, contact carmocca for help.

awaelchli added this to the 2.4 milestone Jun 24, 2024

github-actions bot added the ci, fabric, pl, dependencies and dockers labels Jun 24, 2024

awaelchli changed the title from "Update CI to test with PyTorch 2.4" to "WIP: Update CI to test with PyTorch 2.4" Jun 25, 2024

awaelchli force-pushed the tests/pytorch-2.4 branch from e12ea4c to 90d61a3 June 28, 2024 23:45

github-actions bot added the package label Jun 29, 2024

awaelchli force-pushed the tests/pytorch-2.4 branch 2 times, most recently from 95c4b36 to 65b86c5 June 30, 2024 12:59

awaelchli changed the title from "WIP: Update CI to test with PyTorch 2.4" to "Add testing for PyTorch 2.4 (Trainer)" Jul 2, 2024

awaelchli force-pushed the tests/pytorch-2.4 branch 2 times, most recently from 808129e to 2b4f413 July 5, 2024 22:41

github-actions bot added the docs label Jul 6, 2024

awaelchli mentioned this pull request Jul 7, 2024

Set weights_only in tests to avoid warnings in PyTorch 2.4 #20057

Merged

Add testing for 2.4

ef7955e

awaelchli force-pushed the tests/pytorch-2.4 branch from 73f9bea to ef7955e July 8, 2024 08:51

awaelchli mentioned this pull request Jul 8, 2024

Use new state-dict APIs in FSDPStrategy #20060

Open

revert draft changes

455b57d

awaelchli marked this pull request as ready for review July 8, 2024 09:39

awaelchli requested review from lantiga, Borda, tchaton, justusschock and ethanwharris as code owners July 8, 2024 09:39

Borda reviewed Jul 8, 2024

.azure/gpu-tests-pytorch.yml (resolved)

Borda reviewed Jul 8, 2024

.github/workflows/docker-build.yml (resolved)

Borda reviewed Jul 8, 2024

.github/workflows/docker-build.yml (resolved)

awaelchli requested a review from Borda July 9, 2024 08:44

lantiga approved these changes Jul 9, 2024

justusschock approved these changes Jul 11, 2024

mergify bot added the ready label Jul 11, 2024

awaelchli merged commit bf25167 into master Jul 11, 2024
115 checks passed

awaelchli deleted the tests/pytorch-2.4 branch July 11, 2024 10:52

ryan597 mentioned this pull request Jul 12, 2024

Expose weights_only option for loading checkpoints #20058

Open

awaelchli mentioned this pull request Jul 13, 2024

Update PyTorch 2.4 tests #20079

Merged
