Add testing for PyTorch 2.4 (Trainer) #20010

Merged
merged 2 commits into from
Jul 11, 2024

@awaelchli awaelchli commented Jun 24, 2024

What does this PR do?

Adds PyTorch 2.4 to the CI test matrix.
Previous PR for reference: #19289

The main updates are:

  1. PyTorch plans to flip the default of torch.load(..., weights_only=) to True in a future release. Starting with 2.4, it raises a FutureWarning if this argument is not explicitly set to either True or False, which is the case in our code base. For checkpoint loading, we need to set weights_only=False for now, because we have so far allowed users to include arbitrary objects in checkpoints and the user currently has no control over this flag inside the Trainer. In the future, we might want to expose this: Expose weights_only option for loading checkpoints #20058

  2. FSDP is deprecating the state-dict APIs and raises a warning:

    FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
    

    Since it is a non-trivial effort to switch to the new APIs while also keeping backward compatibility with older PyTorch versions, I will defer this work to future PRs (Use new state-dict APIs in FSDPStrategy #20060) and for now suppress this warning (for the user and also in our tests).

  3. The PyTorch profiler is deprecating the use_cuda argument. I updated the logic in our Profiler wrapper to avoid passing this argument, while keeping it backward compatible with older PyTorch versions.
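
For reference, the three patterns above can be sketched roughly as follows. This is a minimal illustration, not the code changed in this PR: the helper names `load_trusted_checkpoint`, `call_without_fsdp_warning`, and `build_profiler_kwargs` are hypothetical, and the version check assumes a plain `major.minor...` version string such as `torch.__version__`.

```python
import warnings


def load_trusted_checkpoint(path):
    """Load a checkpoint that may contain arbitrary pickled objects (hypothetical helper)."""
    import torch  # lazy import so the helpers below work without PyTorch installed

    # Setting weights_only explicitly avoids the FutureWarning described in item 1.
    # False allows arbitrary pickled objects, so only load files you trust.
    return torch.load(path, map_location="cpu", weights_only=False)


# Prefix of the FSDP deprecation message quoted in item 2, escaped for use as a regex.
_FSDP_STATE_DICT_WARNING = (
    r"FSDP\.state_dict_type\(\) and FSDP\.set_state_dict_type\(\) are being deprecated"
)


def call_without_fsdp_warning(fn, *args, **kwargs):
    """Call `fn`, hiding only the FSDP state-dict deprecation FutureWarning."""
    with warnings.catch_warnings():
        # The filter is scoped to this block; other FutureWarnings still surface.
        warnings.filterwarnings(
            "ignore", message=_FSDP_STATE_DICT_WARNING, category=FutureWarning
        )
        return fn(*args, **kwargs)


def build_profiler_kwargs(torch_version, user_kwargs):
    """Drop `use_cuda` on PyTorch >= 2.4 and keep it for older versions (assumed policy)."""
    major, minor = (int(part) for part in torch_version.split(".")[:2])
    kwargs = dict(user_kwargs)  # copy so the caller's dict is not mutated
    if (major, minor) >= (2, 4):
        kwargs.pop("use_cuda", None)
    return kwargs
```

With these helpers, `build_profiler_kwargs("2.4.0", {"use_cuda": True, "record_shapes": True})` returns `{"record_shapes": True}`, while passing `"2.3.1"` leaves the dictionary unchanged. Matching the warning by message rather than silencing every `FutureWarning` keeps unrelated deprecations visible.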

cc @Borda @carmocca @justusschock @awaelchli

@awaelchli awaelchli added this to the 2.4 milestone Jun 24, 2024
@github-actions github-actions bot added ci Continuous Integration fabric lightning.fabric.Fabric pl Generic label for PyTorch Lightning package dependencies Pull requests that update a dependency file dockers labels Jun 24, 2024

codecov bot commented Jun 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91%. Comparing base (5829ef8) to head (455b57d).
Report is 65 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #20010   +/-   ##
=======================================
  Coverage      90%      91%           
=======================================
  Files         266      266           
  Lines       22942    22952   +10     
=======================================
+ Hits        20721    20780   +59     
+ Misses       2221     2172   -49     

@awaelchli awaelchli changed the title Update CI to test with PyTorch 2.4 WIP: Update CI to test with PyTorch 2.4 Jun 25, 2024
@awaelchli awaelchli force-pushed the tests/pytorch-2.4 branch 2 times, most recently from 95c4b36 to 65b86c5 Compare June 30, 2024 12:59
@awaelchli awaelchli changed the title WIP: Update CI to test with PyTorch 2.4 Add testing for PyTorch 2.4 (Trainer) Jul 2, 2024
@awaelchli awaelchli force-pushed the tests/pytorch-2.4 branch 2 times, most recently from 808129e to 2b4f413 Compare July 5, 2024 22:41
@github-actions github-actions bot added the docs Documentation related label Jul 6, 2024
@awaelchli awaelchli marked this pull request as ready for review July 8, 2024 09:39

github-actions bot commented Jul 8, 2024

⚡ Required checks status: All passing 🟢

Groups summary

🟢 pytorch_lightning: Tests workflow
Check ID Status
pl-cpu (macOS-13, lightning, 3.8, 2.1, oldest) success
pl-cpu (macOS-14, lightning, 3.10, 2.1) success
pl-cpu (macOS-14, lightning, 3.10, 2.2) success
pl-cpu (macOS-14, lightning, 3.10, 2.3) success
pl-cpu (ubuntu-20.04, lightning, 3.8, 2.1, oldest) success
pl-cpu (ubuntu-20.04, lightning, 3.10, 2.1) success
pl-cpu (ubuntu-20.04, lightning, 3.10, 2.2) success
pl-cpu (ubuntu-20.04, lightning, 3.10, 2.3) success
pl-cpu (windows-2022, lightning, 3.8, 2.1, oldest) success
pl-cpu (windows-2022, lightning, 3.10, 2.1) success
pl-cpu (windows-2022, lightning, 3.10, 2.2) success
pl-cpu (windows-2022, lightning, 3.10, 2.3) success
pl-cpu (macOS-14, pytorch, 3.8, 2.1) success
pl-cpu (ubuntu-20.04, pytorch, 3.8, 2.1) success
pl-cpu (windows-2022, pytorch, 3.8, 2.1) success
pl-cpu (macOS-12, pytorch, 3.11, 2.1) success
pl-cpu (ubuntu-22.04, pytorch, 3.11, 2.1) success
pl-cpu (windows-2022, pytorch, 3.11, 2.1) success

These checks are required after the changes to .github/workflows/ci-tests-pytorch.yml, src/lightning/fabric/utilities/cloud_io.py, requirements/pytorch/base.txt, requirements/pytorch/examples.txt, src/lightning/pytorch/plugins/precision/amp.py, src/lightning/pytorch/profilers/pytorch.py, src/lightning/pytorch/strategies/launchers/multiprocessing.py, src/lightning/pytorch/strategies/model_parallel.py, tests/tests_pytorch/conftest.py, tests/tests_pytorch/models/test_torchscript.py, tests/tests_pytorch/plugins/precision/test_amp_integration.py, tests/tests_pytorch/profilers/test_profiler.py, tests/tests_pytorch/strategies/test_deepspeed.py, tests/tests_pytorch/strategies/test_fsdp.py, tests/tests_pytorch/strategies/test_model_parallel.py, tests/tests_pytorch/strategies/test_model_parallel_integration.py, tests/tests_pytorch/trainer/connectors/test_accelerator_connector.py, tests/tests_pytorch/utilities/test_compile.py.

🟢 pytorch_lightning: Azure GPU
Check ID Status
pytorch-lightning (GPUs) (testing Lightning | latest) success
pytorch-lightning (GPUs) (testing PyTorch | latest) success

These checks are required after the changes to .azure/gpu-tests-pytorch.yml, requirements/pytorch/base.txt, requirements/pytorch/examples.txt, src/lightning/pytorch/plugins/precision/amp.py, src/lightning/pytorch/profilers/pytorch.py, src/lightning/pytorch/strategies/launchers/multiprocessing.py, src/lightning/pytorch/strategies/model_parallel.py, tests/tests_pytorch/conftest.py, tests/tests_pytorch/models/test_torchscript.py, tests/tests_pytorch/plugins/precision/test_amp_integration.py, tests/tests_pytorch/profilers/test_profiler.py, tests/tests_pytorch/strategies/test_deepspeed.py, tests/tests_pytorch/strategies/test_fsdp.py, tests/tests_pytorch/strategies/test_model_parallel.py, tests/tests_pytorch/strategies/test_model_parallel_integration.py, tests/tests_pytorch/trainer/connectors/test_accelerator_connector.py, tests/tests_pytorch/utilities/test_compile.py, src/lightning/fabric/utilities/cloud_io.py.

🟢 pytorch_lightning: Benchmarks
Check ID Status
lightning.Benchmarks success

These checks are required after the changes to requirements/pytorch/base.txt, requirements/pytorch/examples.txt, src/lightning/fabric/utilities/cloud_io.py, src/lightning/pytorch/plugins/precision/amp.py, src/lightning/pytorch/profilers/pytorch.py, src/lightning/pytorch/strategies/launchers/multiprocessing.py, src/lightning/pytorch/strategies/model_parallel.py.

🟢 fabric: Docs
Check ID Status
docs-make (fabric, doctest) success
docs-make (fabric, html) success

These checks are required after the changes to src/lightning/fabric/utilities/cloud_io.py.

🟢 pytorch_lightning: Docs
Check ID Status
docs-make (pytorch, doctest) success
docs-make (pytorch, html) success

These checks are required after the changes to src/lightning/pytorch/plugins/precision/amp.py, src/lightning/pytorch/profilers/pytorch.py, src/lightning/pytorch/strategies/launchers/multiprocessing.py, src/lightning/pytorch/strategies/model_parallel.py, docs/source-pytorch/versioning.rst, requirements/pytorch/base.txt, requirements/pytorch/examples.txt.

🟢 pytorch_lightning: Docker
Check ID Status
build-cuda (3.10, 2.1, 12.1.0) success
build-cuda (3.10, 2.2, 12.1.0) success
build-cuda (3.11, 2.1, 12.1.0) success
build-cuda (3.11, 2.2, 12.1.0) success
build-cuda (3.11, 2.3, 12.1.0) success
build-cuda (3.11, 2.4, 12.1.0) success
build-pl (3.10, 2.1, 12.1.0) success
build-pl (3.10, 2.2, 12.1.0) success
build-pl (3.11, 2.1, 12.1.0) success
build-pl (3.11, 2.2, 12.1.0) success
build-pl (3.11, 2.3, 12.1.0) success
build-pl (3.11, 2.4, 12.1.0) success

These checks are required after the changes to .github/workflows/docker-build.yml, dockers/base-cuda/Dockerfile, requirements/pytorch/base.txt, requirements/pytorch/examples.txt.

🟢 lightning_fabric: CPU workflow
Check ID Status
fabric-cpu (macOS-13, lightning, 3.8, 2.1, oldest) success
fabric-cpu (macOS-14, lightning, 3.11, 2.1) success
fabric-cpu (macOS-14, lightning, 3.11, 2.2) success
fabric-cpu (macOS-14, lightning, 3.10, 2.3) success
fabric-cpu (ubuntu-20.04, lightning, 3.8, 2.1, oldest) success
fabric-cpu (ubuntu-20.04, lightning, 3.11, 2.1) success
fabric-cpu (ubuntu-20.04, lightning, 3.11, 2.2) success
fabric-cpu (ubuntu-20.04, lightning, 3.11, 2.3) success
fabric-cpu (windows-2022, lightning, 3.8, 2.1, oldest) success
fabric-cpu (windows-2022, lightning, 3.11, 2.1) success
fabric-cpu (windows-2022, lightning, 3.11, 2.2) success
fabric-cpu (windows-2022, lightning, 3.11, 2.3) success
fabric-cpu (macOS-14, fabric, 3.8, 2.1) success
fabric-cpu (ubuntu-20.04, fabric, 3.8, 2.1) success
fabric-cpu (windows-2022, fabric, 3.8, 2.1) success
fabric-cpu (macOS-12, fabric, 3.11, 2.1) success
fabric-cpu (ubuntu-22.04, fabric, 3.11, 2.1) success
fabric-cpu (windows-2022, fabric, 3.11, 2.1) success

These checks are required after the changes to src/lightning/fabric/utilities/cloud_io.py.

🟢 lightning_fabric: Azure GPU
Check ID Status
lightning-fabric (GPUs) (testing Fabric | latest) success
lightning-fabric (GPUs) (testing Lightning | latest) success

These checks are required after the changes to src/lightning/fabric/utilities/cloud_io.py.

🟢 mypy
Check ID Status
mypy success

These checks are required after the changes to requirements/pytorch/base.txt, requirements/pytorch/examples.txt, src/lightning/fabric/utilities/cloud_io.py, src/lightning/pytorch/plugins/precision/amp.py, src/lightning/pytorch/profilers/pytorch.py, src/lightning/pytorch/strategies/launchers/multiprocessing.py, src/lightning/pytorch/strategies/model_parallel.py.

🟢 install
Check ID Status
install-pkg (ubuntu-22.04, fabric, 3.8) success
install-pkg (ubuntu-22.04, fabric, 3.11) success
install-pkg (ubuntu-22.04, pytorch, 3.8) success
install-pkg (ubuntu-22.04, pytorch, 3.11) success
install-pkg (ubuntu-22.04, lightning, 3.8) success
install-pkg (ubuntu-22.04, lightning, 3.11) success
install-pkg (ubuntu-22.04, notset, 3.8) success
install-pkg (ubuntu-22.04, notset, 3.11) success
install-pkg (macOS-12, fabric, 3.8) success
install-pkg (macOS-12, fabric, 3.11) success
install-pkg (macOS-12, pytorch, 3.8) success
install-pkg (macOS-12, pytorch, 3.11) success
install-pkg (macOS-12, lightning, 3.8) success
install-pkg (macOS-12, lightning, 3.11) success
install-pkg (macOS-12, notset, 3.8) success
install-pkg (macOS-12, notset, 3.11) success
install-pkg (windows-2022, fabric, 3.8) success
install-pkg (windows-2022, fabric, 3.11) success
install-pkg (windows-2022, pytorch, 3.8) success
install-pkg (windows-2022, pytorch, 3.11) success
install-pkg (windows-2022, lightning, 3.8) success
install-pkg (windows-2022, lightning, 3.11) success
install-pkg (windows-2022, notset, 3.8) success
install-pkg (windows-2022, notset, 3.11) success

These checks are required after the changes to src/lightning/fabric/utilities/cloud_io.py, src/lightning/pytorch/plugins/precision/amp.py, src/lightning/pytorch/profilers/pytorch.py, src/lightning/pytorch/strategies/launchers/multiprocessing.py, src/lightning/pytorch/strategies/model_parallel.py, requirements/pytorch/base.txt, requirements/pytorch/examples.txt.


Thank you for your contribution! 💜

Note
This comment is automatically generated and updates for 60 minutes every 180 seconds. If you have any other questions, contact carmocca for help.

@awaelchli awaelchli requested a review from Borda July 9, 2024 08:44
@mergify mergify bot added the ready PRs ready to be merged label Jul 11, 2024
@awaelchli awaelchli merged commit bf25167 into master Jul 11, 2024
115 checks passed
@awaelchli awaelchli deleted the tests/pytorch-2.4 branch July 11, 2024 10:52
Labels
ci Continuous Integration dependencies Pull requests that update a dependency file dockers docs Documentation related fabric lightning.fabric.Fabric package pl Generic label for PyTorch Lightning package ready PRs ready to be merged