[docs] Update getting-started-pytorch-lightning.rst (ray-project#38563)
* Update getting-started-pytorch-lightning.rst

made some edits for concision

Signed-off-by: angelinalg <[email protected]>

* copy edits to pytorch-based guides

Signed-off-by: angelina <[email protected]>

* copy edits

Signed-off-by: angelina <[email protected]>

---------

Signed-off-by: angelinalg <[email protected]>
Signed-off-by: angelina <[email protected]>
Co-authored-by: angelina <[email protected]>
3 people committed Aug 25, 2023
1 parent 5b6e14d commit adb4545
Showing 3 changed files with 49 additions and 56 deletions.
38 changes: 18 additions & 20 deletions doc/source/train/getting-started-pytorch-lightning.rst
@@ -3,9 +3,9 @@
Getting Started with PyTorch Lightning
======================================

This tutorial will walk you through the process of converting an existing PyTorch Lightning script to use Ray Train.
This tutorial walks through the process of converting an existing PyTorch Lightning script to use Ray Train.

By the end of this, you will learn how to:
Learn how to:

1. Configure your Lightning Trainer so that it runs distributed with Ray and is placed on the correct CPU/GPU device.
2. Configure your training function to report metrics and save checkpoints.
@@ -15,7 +15,7 @@ By the end of this, you will learn how to:
Quickstart
----------

Before we begin, you can expect that the final code will look something like this:
For reference, the final code follows:

.. code-block:: python
@@ -29,11 +29,11 @@ Before we begin, you can expect that the final code will look something like thi
trainer = TorchTrainer(train_func, scaling_config=scaling_config)
result = trainer.fit()
1. Your `train_func` will be the Python code that is executed on each distributed training worker.
2. Your `ScalingConfig` will define the number of distributed training workers and whether to use GPUs.
3. Your `TorchTrainer` will launch the distributed training job.
1. Your `train_func` is the Python code that is executed on each distributed training worker.
2. Your `ScalingConfig` defines the number of distributed training workers and whether to use GPUs.
3. Your `TorchTrainer` launches the distributed training job.
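The quickstart snippet above is partially collapsed in this diff. For orientation, a minimal sketch of the full wiring might look like the following; the two-GPU-worker ``ScalingConfig`` is an assumption for illustration only:

.. code-block:: python

    from ray.train import ScalingConfig
    from ray.train.torch import TorchTrainer


    def train_func(config):
        # Your PyTorch Lightning training code goes here (see the sections below).
        ...


    # Assumed for illustration: 2 workers, each with 1 GPU.
    scaling_config = ScalingConfig(num_workers=2, use_gpu=True)
    trainer = TorchTrainer(train_func, scaling_config=scaling_config)
    result = trainer.fit()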

Let's compare a PyTorch Lightning training script with and without Ray Train.
Compare a PyTorch Lightning training script with and without Ray Train.

.. tabs::

@@ -147,23 +147,21 @@ Let's compare a PyTorch Lightning training script with and without Ray Train.
result = trainer.fit()
Now, let's get started!

Setting up your training function
---------------------------------

First, you'll want to update your training code to support distributed training.
You can begin by wrapping your code in a function:
First, update your training code to support distributed training.
Begin by wrapping your code in a function:

.. code-block:: python
def train_func(config):
# Your PyTorch Lightning training code here.
This function will be executed on each distributed training worker.
This function is executed on each distributed training worker.


Ray Train will set up your distributed process group on each worker. You only need to
Ray Train sets up your distributed process group on each worker. You only need to
make a few changes to your Lightning Trainer definition.

.. code-block:: diff
@@ -191,7 +189,7 @@ make a few changes to your Lightning Trainer definition.
trainer.fit(model, datamodule=datamodule)
We will now go over each change.
We now go over each change.

Configuring distributed strategy
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -275,7 +273,7 @@ GPUs by setting ``devices="auto"`` and ``accelerator="auto"``.
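The body of this section is collapsed in the diff. As a hedged sketch of what a Lightning Trainer configured for Ray Train can look like, assuming the Ray 2.7 Lightning utilities (``RayDDPStrategy``, ``RayLightningEnvironment``, and ``prepare_trainer``); ``MyLightningModule`` and ``build_train_loader`` are hypothetical placeholders:

.. code-block:: python

    import pytorch_lightning as pl

    import ray.train.lightning
    from ray.train.lightning import RayDDPStrategy, RayLightningEnvironment


    def train_func(config):
        model = MyLightningModule()          # hypothetical LightningModule
        train_loader = build_train_loader()  # hypothetical DataLoader factory

        trainer = pl.Trainer(
            max_epochs=config.get("max_epochs", 1),
            devices="auto",                       # let Ray place each worker on its device
            accelerator="auto",
            strategy=RayDDPStrategy(),            # Ray-aware DDP strategy
            plugins=[RayLightningEnvironment()],  # cluster environment managed by Ray Train
        )
        trainer = ray.train.lightning.prepare_trainer(trainer)
        trainer.fit(model, train_dataloaders=train_loader)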
Reporting checkpoints and metrics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To persist your checkpoints and monitor training progress, simply add a
To persist your checkpoints and monitor training progress, add a
:class:`ray.train.lightning.RayTrainReportCallback` utility callback to your Trainer.
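A minimal sketch of adding the callback, assuming the same ``train_func`` structure as above; by default it reports the latest metrics together with a checkpoint to Ray Train at the end of each training epoch:

.. code-block:: python

    import pytorch_lightning as pl
    from ray.train.lightning import RayTrainReportCallback

    trainer = pl.Trainer(
        # ... strategy, plugins, and devices configured as shown above ...
        callbacks=[RayTrainReportCallback()],  # reports metrics and a checkpoint each epoch
    )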


@@ -366,16 +364,16 @@ information about the training run, including the metrics and checkpoints report
Next steps
----------

Congratulations! You have successfully converted your PyTorch Lightningtraining script to use Ray Train.
After you have converted your PyTorch Lightning training script to use Ray Train:

* Head over to the :ref:`User Guides <train-user-guides>` to learn more about how to perform specific tasks.
* See :ref:`User Guides <train-user-guides>` to learn more about how to perform specific tasks.
* Browse the :ref:`Examples <train-examples>` for end-to-end examples of how to use Ray Train.
* Dive into the :ref:`API Reference <train-api>` for more details on the classes and methods used in this tutorial.

Version Compatibility
---------------------

Ray Train is tested with `pytorch_lightning` versions `1.6.5` and `2.0.4`. For full compatibility, we recommend using ``pytorch_lightning>=1.6.5`` .
Ray Train is tested with `pytorch_lightning` versions `1.6.5` and `2.0.4`. For full compatibility, use ``pytorch_lightning>=1.6.5``.
Earlier versions are not prohibited but may result in unexpected issues. If you run into any compatibility issues, consider upgrading your PyTorch Lightning version or
`file an issue <https://github.com/ray-project/ray/issues>`_.

@@ -392,10 +390,10 @@ It then instantiates the model and trainer objects and runs a pre-defined
training loop in a black box.


This version of our LightningTrainer API was constraining and limited
This version of the LightningTrainer API constrained and limited
the users' ability to manage the training functionality.

In Ray 2.7, we're pleased to introduce the newly unified :class:`~ray.train.torch.TorchTrainer` API, which offers
Ray 2.7 introduces the newly unified :class:`~ray.train.torch.TorchTrainer` API, which offers
enhanced transparency, flexibility, and simplicity. This API is more aligned
with standard PyTorch Lightning scripts, ensuring users have better
control over their native Lightning code.
33 changes: 15 additions & 18 deletions doc/source/train/getting-started-pytorch.rst
@@ -3,9 +3,9 @@
Getting Started with PyTorch
============================

This tutorial will walk you through the process of converting an existing PyTorch script to use Ray Train.
This tutorial walks through the process of converting an existing PyTorch script to use Ray Train.

By the end of this, you will learn how to:
Learn how to:

1. Configure your model so that it runs distributed and is placed on the correct CPU/GPU device.
2. Configure your dataloader so that it is sharded across the workers and place data on the correct CPU/GPU device.
@@ -16,7 +16,7 @@ By the end of this, you will learn how to:
Quickstart
----------

Before we begin, you can expect that the final code will look something like this:
For reference, the final code follows:

.. code-block:: python
@@ -30,11 +30,11 @@ Before we begin, you can expect that the final code will look something like thi
trainer = TorchTrainer(train_func, scaling_config=scaling_config)
result = trainer.fit()
1. Your `train_func` will be the Python code that is executed on each distributed training worker.
2. Your `ScalingConfig` will define the number of distributed training workers and whether to use GPUs.
3. Your `TorchTrainer` will launch the distributed training job.
1. Your `train_func` is the Python code that is executed on each distributed training worker.
2. Your `ScalingConfig` defines the number of distributed training workers and whether to use GPUs.
3. Your `TorchTrainer` launches the distributed training job.
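As in the Lightning guide, the quickstart snippet is collapsed here. A small sketch of the ``ScalingConfig`` options it commonly takes; the specific worker count and per-worker resources below are assumptions for illustration:

.. code-block:: python

    from ray.train import ScalingConfig

    # Assumed for illustration: 4 workers, each reserving 2 CPUs and 1 GPU.
    scaling_config = ScalingConfig(
        num_workers=4,
        use_gpu=True,
        resources_per_worker={"CPU": 2, "GPU": 1},
    )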

Let's compare a PyTorch training script with and without Ray Train.
Compare a PyTorch training script with and without Ray Train.

.. tabs::

@@ -131,21 +131,18 @@ Let's compare a PyTorch training script with and without Ray Train.
trainer = TorchTrainer(train_func, scaling_config=scaling_config)
result = trainer.fit()
Now, let's get started!

Setting up your training function
---------------------------------

First, you'll want to update your training code to support distributed training.
First, update your training code to support distributed training.
You can begin by wrapping your code in a function:

.. code-block:: python
def train_func(config):
# Your PyTorch training code here.
This function will be executed on each distributed training worker.
This function is executed on each distributed training worker.

Setting up your model
^^^^^^^^^^^^^^^^^^^^^
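The body of this section is collapsed in the diff. For orientation, a minimal sketch assuming the :func:`ray.train.torch.prepare_model` utility, which wraps the model in ``DistributedDataParallel`` and moves it to the worker's device; the ``nn.Linear`` model is a placeholder:

.. code-block:: python

    import torch.nn as nn

    import ray.train.torch


    def train_func(config):
        model = nn.Linear(10, 1)  # placeholder model for illustration
        # Wraps the model in DDP and moves it to this worker's device.
        model = ray.train.torch.prepare_model(model)
        ...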
@@ -180,10 +177,10 @@ Setting up your dataset

.. TODO: Update this to use Ray Data.
Use the :func:`ray.train.torch.prepare_data_loader` utility function. This will:
Use the :func:`ray.train.torch.prepare_data_loader` utility function, which:

1. Add a ``DistributedSampler`` to your ``DataLoader``.
2. Move the batches to the right device.
1. Adds a ``DistributedSampler`` to your ``DataLoader``.
2. Moves the batches to the right device.
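A minimal sketch of that wrapping, with a toy in-memory dataset standing in for real data; the epoch loop is only illustrative:

.. code-block:: python

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    import ray.train
    import ray.train.torch


    def train_func(config):
        dataset = TensorDataset(torch.randn(128, 10), torch.randn(128, 1))
        data_loader = DataLoader(dataset, batch_size=32, shuffle=True)
        # Adds a DistributedSampler and moves each batch to this worker's device.
        data_loader = ray.train.torch.prepare_data_loader(data_loader)

        for epoch in range(2):
            # With a DistributedSampler present, set the epoch so shuffling differs per epoch.
            if ray.train.get_context().get_world_size() > 1:
                data_loader.sampler.set_epoch(epoch)
            for features, labels in data_loader:
                ...  # forward/backward pass goes here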

Note that this step is not necessary if you are passing in Ray Data to your Trainer
(see :ref:`data-ingest-torch`):
@@ -274,7 +271,7 @@ with a :class:`~ray.train.torch.TorchTrainer`.
Accessing training results
--------------------------

After training completes, a :class:`~ray.train.Result` object will be returned which contains
After training completes, a :class:`~ray.train.Result` object is returned, which contains
information about the training run, including the metrics and checkpoints reported during training.

.. code-block:: python
@@ -289,8 +286,8 @@ information about the training run, including the metrics and checkpoints report
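The snippet above is collapsed in the diff. A small sketch of reading those fields, assuming ``result`` is the object returned by ``trainer.fit()``:

.. code-block:: python

    result = trainer.fit()

    print(result.metrics)     # last reported metrics, e.g. {"loss": ...}
    print(result.checkpoint)  # last reported Checkpoint, if any were saved
    print(result.error)       # the exception, if training failed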
Next steps
----------

Congratulations! You have successfully converted your PyTorch training script to use Ray Train.
After you have converted your PyTorch training script to use Ray Train:

* Head over to the :ref:`User Guides <train-user-guides>` to learn more about how to perform specific tasks.
* See :ref:`User Guides <train-user-guides>` to learn more about how to perform specific tasks.
* Browse the :ref:`Examples <train-examples>` for end-to-end examples of how to use Ray Train.
* Dive into the :ref:`API Reference <train-api>` for more details on the classes and methods used in this tutorial.
34 changes: 16 additions & 18 deletions doc/source/train/getting-started-transformers.rst
@@ -3,9 +3,9 @@
Getting Started with Hugging Face Transformers
==============================================

This tutorial will walk you through the process of converting an existing Hugging Face Transformers script to use Ray Train.
This tutorial walks through the process of converting an existing Hugging Face Transformers script to use Ray Train.

By the end of this, you will learn how to:
Learn how to:

1. Configure your training function to report metrics and save checkpoints.
2. Configure scale and CPU/GPU resource requirements for your training job.
@@ -14,7 +14,7 @@ By the end of this, you will learn how to:
Quickstart
----------

Before we begin, you can expect that the final code will look something like this:
For reference, the final code follows:

.. code-block:: python
@@ -28,11 +28,11 @@ Before we begin, you can expect that the final code will look something like thi
trainer = TorchTrainer(train_func, scaling_config=scaling_config)
result = trainer.fit()
1. Your `train_func` will be the Python code that is executed on each distributed training worker.
2. Your :class:`~ray.train.ScalingConfig` will define the number of distributed training workers and computing resources (e.g. GPUs).
3. Your :class:`~ray.train.torch.TorchTrainer` will launch the distributed training job.
1. Your `train_func` is the Python code that is executed on each distributed training worker.
2. Your :class:`~ray.train.ScalingConfig` defines the number of distributed training workers and computing resources (e.g. GPUs).
3. Your :class:`~ray.train.torch.TorchTrainer` launches the distributed training job.

Let's compare a Hugging Face Transformers training script with and without Ray Train.
Compare a Hugging Face Transformers training script with and without Ray Train.

.. tabs::

@@ -171,20 +171,18 @@ Let's compare a Hugging Face Transformers training script with and without Ray T
ray_trainer.fit()
Now, let's get started!

Setting up your training function
---------------------------------

First, you'll want to update your training code to support distributed training.
First, update your training code to support distributed training.
You can begin by wrapping your code in a function:

.. code-block:: python
def train_func(config):
# Your Transformers training code here.
This function will be executed on each distributed training worker. Ray Train will set up the distributed
This function is executed on each distributed training worker. Ray Train sets up the distributed
process group on each worker before entering this function.

Put all the logic into this function, including dataset construction and preprocessing,
@@ -193,14 +191,14 @@ model initialization, transformers trainer definition and more.
.. note::

If you are using Hugging Face Datasets or Evaluate, make sure to call ``datasets.load_dataset`` and ``evaluate.load``
inside the training function. We do not recommend passing the loaded datasets and metrics from outside of the training
inside the training function. Do not pass the loaded datasets and metrics from outside of the training
function, because it might cause serialization errors while transferring the objects to the workers.
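A minimal sketch of keeping those calls inside the training function; the dataset and metric names below are placeholders:

.. code-block:: python

    import datasets
    import evaluate


    def train_func(config):
        # Load inside the function so each worker builds its own copies,
        # instead of serializing them from the driver process.
        dataset = datasets.load_dataset("yelp_review_full")  # placeholder dataset name
        metric = evaluate.load("accuracy")                   # placeholder metric name
        ...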


Reporting checkpoints and metrics
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To persist your checkpoints and monitor training progress, simply add a
To persist your checkpoints and monitor training progress, add a
:class:`ray.train.huggingface.transformers.RayTrainReportCallback` utility callback to your Trainer.
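A minimal sketch of wiring the callback into a ``transformers.Trainer`` inside ``train_func``; ``build_model_and_data`` is a hypothetical helper standing in for the model, training arguments, and datasets, and ``prepare_trainer`` is the companion utility from the same module:

.. code-block:: python

    import ray.train.huggingface.transformers
    from transformers import Trainer


    def train_func(config):
        model, training_args, train_ds, eval_ds = build_model_and_data()  # hypothetical helper

        trainer = Trainer(
            model=model,
            args=training_args,
            train_dataset=train_ds,
            eval_dataset=eval_ds,
        )
        # Reports the latest metrics and checkpoint to Ray Train whenever the Trainer saves.
        trainer.add_callback(ray.train.huggingface.transformers.RayTrainReportCallback())
        trainer = ray.train.huggingface.transformers.prepare_trainer(trainer)
        trainer.train()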


@@ -270,12 +268,12 @@ with a :class:`~ray.train.torch.TorchTrainer`.
trainer = TorchTrainer(train_func, scaling_config=scaling_config)
result = trainer.fit()
Please also refer to :ref:`train-run-config` for more configuration options for `TorchTrainer`.
Refer to :ref:`train-run-config` for more configuration options for `TorchTrainer`.
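For example, a ``RunConfig`` can name the run and control where results and checkpoints are persisted. A minimal sketch, with the run name and storage path as placeholder values:

.. code-block:: python

    from ray.train import RunConfig, ScalingConfig
    from ray.train.torch import TorchTrainer

    run_config = RunConfig(
        name="my-transformers-run",    # placeholder experiment name
        storage_path="~/ray_results",  # where results and checkpoints are written
    )
    trainer = TorchTrainer(
        train_func,
        scaling_config=ScalingConfig(num_workers=2, use_gpu=True),
        run_config=run_config,
    )
    result = trainer.fit()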

Accessing training results
--------------------------

After training completes, a :class:`~ray.train.Result` object will be returned which contains
After training completes, a :class:`~ray.train.Result` object is returned, which contains
information about the training run, including the metrics and checkpoints reported during training.

.. code-block:: python
Expand All @@ -290,9 +288,9 @@ information about the training run, including the metrics and checkpoints report
Next steps
----------

Congratulations! You have successfully converted your Hugging Face Transformers training script to use Ray Train.
After you have converted your Hugging Face Transformers training script to use Ray Train:

* Head over to the :ref:`User Guides <train-user-guides>` to learn more about how to perform specific tasks.
* See :ref:`User Guides <train-user-guides>` to learn more about how to perform specific tasks.
* Browse the :ref:`Examples <train-examples>` for end-to-end examples of how to use Ray Train.
* Dive into the :ref:`API Reference <train-api>` for more details on the classes and methods used in this tutorial.

@@ -305,7 +303,7 @@ Congratulations! You have successfully converted your Hugging Face Transformers
The `TransformersTrainer` was added in Ray 2.1. It exposes a `trainer_init_per_worker` interface
to define `transformers.Trainer`, then runs a pre-defined training loop in a black box.

In Ray 2.7, we're pleased to introduce the newly unified :class:`~ray.train.torch.TorchTrainer` API,
Ray 2.7 introduces the newly unified :class:`~ray.train.torch.TorchTrainer` API,
which offers enhanced transparency, flexibility, and simplicity. This API is more aligned
with standard Hugging Face Transformers scripts, ensuring users have better control over their
native Transformers training code.
