From a4ad14f84a4aa21d2be09413d71c30b8955bba32 Mon Sep 17 00:00:00 2001 From: angelinalg <122562471+angelinalg@users.noreply.github.com> Date: Thu, 7 Sep 2023 09:59:18 -0700 Subject: [PATCH 01/13] copy editing lightning-mnist and torch-fashion-mnist examples Signed-off-by: angelinalg <122562471+angelinalg@users.noreply.github.com> --- .../lightning/lightning_mnist_example.ipynb | 12 ++++++------ .../pytorch/torch_fashion_mnist_example.py | 14 +++++++------- 2 files changed, 13 insertions(+), 13 deletions(-) diff --git a/doc/source/train/examples/lightning/lightning_mnist_example.ipynb b/doc/source/train/examples/lightning/lightning_mnist_example.ipynb index 850116cf01600..546aca4f0aed8 100644 --- a/doc/source/train/examples/lightning/lightning_mnist_example.ipynb +++ b/doc/source/train/examples/lightning/lightning_mnist_example.ipynb @@ -9,7 +9,7 @@ "\n", "# Train a Pytorch Lightning Image Classifier\n", "\n", - "This example introduces how to train a Pytorch Lightning Module using Ray Train {class}`TorchTrainer `. We will demonstrate how to train a basic neural network on the MNIST dataset with distributed data parallelism.\n" + "This example introduces how to train a Pytorch Lightning Module using Ray Train {class}`TorchTrainer `. It demonstrates how to train a basic neural network on the MNIST dataset with distributed data parallelism.\n" ] }, { @@ -49,9 +49,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Prepare Dataset and Module\n", + "## Prepare a dataset and module\n", "\n", - "The Pytorch Lightning Trainer takes either `torch.utils.data.DataLoader` or `pl.LightningDataModule` as data inputs. You can keep using them without any changes for the Ray AIR LightningTrainer. " + "The Pytorch Lightning Trainer takes either `torch.utils.data.DataLoader` or `pl.LightningDataModule` as data inputs. You can continue using them without any changes for the Ray Train LightningTrainer. " ] }, { @@ -183,15 +183,15 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Define the Training Loop\n", + "## Define the training loop\n", "\n", - "Here we define a training loop for each worker. Compare with the original PyTorch Lightning code, there are 3 main differences:\n", + "This code defines a training loop for each worker. Comparing the training loop with the original PyTorch Lightning code, there are 3 main differences:\n", "\n", "- Distributed strategy: Use {class}`RayDDPStrategy `.\n", "- Cluster environment: Use {class}`RayLightningEnvironment `.\n", "- Parallel devices: Always sets to `devices=\"auto\"` to use all available devices configured by ``TorchTrainer``.\n", "\n", - "Please refer to {ref}`Getting Started with PyTorch Lightning `.\n", + "See {ref}`Getting Started with PyTorch Lightning ` for more information.\n", "\n", "\n", "For checkpoint reportining, Ray Train provides a minimal {class}`RayTrainReportCallback ` that reports metrics and checkpoint on each train epoch end. For more complex checkpoint logic, please implement custom callbacks as described in {ref}`Saving and Loading Checkpoint ` user guide." 
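A minimal, self-contained sketch of how the pieces described in that notebook cell fit together in the per-worker training function that ``TorchTrainer`` launches. This is an illustration only: the toy model, random data, and hyperparameter values are placeholders standing in for the MNIST classifier and datamodule defined in the notebook.

.. code-block:: python

    import pytorch_lightning as pl
    import torch
    import torch.nn.functional as F
    from torch.utils.data import DataLoader, TensorDataset

    from ray.train import ScalingConfig
    from ray.train.lightning import (
        RayDDPStrategy,
        RayLightningEnvironment,
        RayTrainReportCallback,
        prepare_trainer,
    )
    from ray.train.torch import TorchTrainer


    class ToyClassifier(pl.LightningModule):
        """Placeholder for the MNIST classifier defined in the notebook."""

        def __init__(self, lr):
            super().__init__()
            self.lr = lr
            self.layer = torch.nn.Linear(28 * 28, 10)

        def training_step(self, batch, batch_idx):
            x, y = batch
            loss = F.cross_entropy(self.layer(x), y)
            self.log("train_loss", loss)
            return loss

        def configure_optimizers(self):
            return torch.optim.Adam(self.parameters(), lr=self.lr)


    def train_func(config):
        # Placeholder random data standing in for the MNIST datamodule.
        dataset = TensorDataset(torch.randn(256, 28 * 28), torch.randint(0, 10, (256,)))
        train_loader = DataLoader(dataset, batch_size=config["batch_size"])

        trainer = pl.Trainer(
            max_epochs=config["max_epochs"],
            devices="auto",                        # Use all devices configured by TorchTrainer
            accelerator="auto",
            strategy=RayDDPStrategy(),             # Distributed strategy
            plugins=[RayLightningEnvironment()],   # Cluster environment
            callbacks=[RayTrainReportCallback()],  # Report metrics and a checkpoint each epoch
            enable_progress_bar=False,
        )
        trainer = prepare_trainer(trainer)
        trainer.fit(model=ToyClassifier(lr=config["lr"]), train_dataloaders=train_loader)


    trainer = TorchTrainer(
        train_func,
        train_loop_config={"lr": 1e-3, "batch_size": 32, "max_epochs": 2},
        scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
    )
    result = trainer.fit()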
diff --git a/python/ray/train/examples/pytorch/torch_fashion_mnist_example.py b/python/ray/train/examples/pytorch/torch_fashion_mnist_example.py index d5aba832806f3..b6db1451216d9 100644 --- a/python/ray/train/examples/pytorch/torch_fashion_mnist_example.py +++ b/python/ray/train/examples/pytorch/torch_fashion_mnist_example.py @@ -19,7 +19,7 @@ def get_dataloaders(batch_size): transform = transforms.Compose([ToTensor(), Normalize((0.5,), (0.5,))]) with FileLock(os.path.expanduser("~/data.lock")): - # Download training data from open datasets. + # Download training data from open datasets training_data = datasets.FashionMNIST( root="~/data", train=True, @@ -27,7 +27,7 @@ def get_dataloaders(batch_size): transform=transform, ) - # Download test data from open datasets. + # Download test data from open datasets test_data = datasets.FashionMNIST( root="~/data", train=False, @@ -35,7 +35,7 @@ def get_dataloaders(batch_size): transform=transform, ) - # Create data loaders. + # Create data loaders train_dataloader = DataLoader(training_data, batch_size=batch_size) test_dataloader = DataLoader(test_data, batch_size=batch_size) @@ -69,7 +69,7 @@ def train_func_per_worker(config: Dict): epochs = config["epochs"] batch_size = config["batch_size_per_worker"] - # Get dataloaders inside worker training function + # Get dataloaders inside the worker training function train_dataloader, test_dataloader = get_dataloaders(batch_size=batch_size) # [1] Prepare Dataloader for distributed training @@ -81,7 +81,7 @@ def train_func_per_worker(config: Dict): model = NeuralNetwork() # [2] Prepare and wrap your model with DistributedDataParallel - # Move the model the correct GPU/CPU device + # Move the model to the correct GPU/CPU device # ============================================================ model = ray.train.torch.prepare_model(model) @@ -137,9 +137,9 @@ def train_fashion_mnist(num_workers=2, use_gpu=False): scaling_config=scaling_config, ) - # [4] Start Distributed Training + # [4] Start distributed training # Run `train_func_per_worker` on all workers - # ============================================= + # ========================================== result = trainer.fit() print(f"Training result: {result}") From 6991189dbe34f827c6884342e36ad1da5cc65534 Mon Sep 17 00:00:00 2001 From: angelinalg <122562471+angelinalg@users.noreply.github.com> Date: Mon, 11 Sep 2023 09:57:09 -0700 Subject: [PATCH 02/13] rebase 2 Signed-off-by: angelinalg <122562471+angelinalg@users.noreply.github.com> --- .../train/examples/lightning/lightning_mnist_example.ipynb | 4 ---- 1 file changed, 4 deletions(-) diff --git a/doc/source/train/examples/lightning/lightning_mnist_example.ipynb b/doc/source/train/examples/lightning/lightning_mnist_example.ipynb index c5b57c6df26a8..f721884879873 100644 --- a/doc/source/train/examples/lightning/lightning_mnist_example.ipynb +++ b/doc/source/train/examples/lightning/lightning_mnist_example.ipynb @@ -51,11 +51,7 @@ "source": [ "## Prepare a dataset and module\n", "\n", -<<<<<<< HEAD - "The Pytorch Lightning Trainer takes either `torch.utils.data.DataLoader` or `pl.LightningDataModule` as data inputs. You can continue using them without any changes for the Ray Train LightningTrainer. " -======= "The Pytorch Lightning Trainer takes either `torch.utils.data.DataLoader` or `pl.LightningDataModule` as data inputs. You can keep using them without any changes with Ray Train. 
" ->>>>>>> master ] }, { From 9d793abd107358a221466ea2050c8c593874b6f9 Mon Sep 17 00:00:00 2001 From: angelinalg <122562471+angelinalg@users.noreply.github.com> Date: Mon, 11 Sep 2023 16:49:13 -0700 Subject: [PATCH 03/13] polish examples: make titles more consistent, add links to guides Signed-off-by: angelinalg <122562471+angelinalg@users.noreply.github.com> --- .github/styles/Vocab/Train/accept.txt | 1 + doc/source/train/examples.rst | 24 ++-- .../accelerate/accelerate_example.rst | 21 +++- .../lightning/lightning_mnist_example.ipynb | 33 +++-- .../pytorch/dreambooth_finetuning.rst | 119 +++++++++--------- .../pytorch/torch_fashion_mnist_example.rst | 16 ++- .../transformers_torch_trainer_basic.rst | 18 ++- .../getting-started-pytorch-lightning.rst | 12 +- doc/source/train/getting-started-pytorch.rst | 26 ++-- doc/source/train/huggingface-accelerate.rst | 12 +- .../accelerate/accelerate_torch_trainer.py | 4 +- .../transformers_torch_trainer_basic.py | 10 +- python/ray/train/torch/torch_trainer.py | 4 +- 13 files changed, 182 insertions(+), 118 deletions(-) diff --git a/.github/styles/Vocab/Train/accept.txt b/.github/styles/Vocab/Train/accept.txt index d832f7f80e7ce..d0c7e09aaea0c 100644 --- a/.github/styles/Vocab/Train/accept.txt +++ b/.github/styles/Vocab/Train/accept.txt @@ -1,5 +1,6 @@ Horovod Hugging Face +hyperparameters? Keras LightGBM PyTorch diff --git a/doc/source/train/examples.rst b/doc/source/train/examples.rst index 3b2cf585618ce..ac1252e92e9e5 100644 --- a/doc/source/train/examples.rst +++ b/doc/source/train/examples.rst @@ -3,7 +3,7 @@ Ray Train Examples ================== -.. Example .rst files should be organized in the same manner as the +.. Organize example .rst files in the same manner as the .py files in ray/python/ray/train/examples. Below are examples for using Ray Train with a variety of frameworks and use cases. @@ -18,17 +18,17 @@ Beginner * - Framework - Example * - PyTorch - - :ref:`Training an Fashion MNIST Image Classifier with PyTorch ` + - :ref:`Train a Fashion MNIST Image Classifier with PyTorch ` * - Lightning - - :ref:`Training an MNIST Image Classifier with Lightning ` + - :ref:`Train an MNIST Image Classifier with Lightning ` * - Transformers - - :ref:`Fine-tuning a Text Classifier on Yelp Reviews Dataset with HF Transformers ` + - :ref:`Fine-tune a Text Classifier on the Yelp Reviews Dataset with HF Transformers ` * - Accelerate - :ref:`Distributed Data Parallel Training with HF Accelerate ` * - DeepSpeed - - :ref:`Distributed Training with DeepSpeed ZeRO-3 ` + - :ref:`Train with DeepSpeed ZeRO-3 ` * - TensorFlow - - :ref:`TensorFlow MNIST Training Example ` + - :ref:`Train with TensorFlow MNIST ` * - Horovod - :ref:`End-to-end Horovod Training Example ` @@ -42,11 +42,11 @@ Intermediate * - Framework - Example * - PyTorch - - `DreamBooth fine-tuning of Stable Diffusion with Ray Train `_ + - :ref:`Fine-tune of Stable Diffusion with DreamBooth and Ray Train ` * - Lightning - :ref:`Model Training with PyTorch Lightning and Ray Data ` * - Accelerate - - :ref:`Fine-tuning a Text Classifier on GLUE Benchmark with HF Accelerate. 
` + - :ref:`Fine-tune a text classifier on GLUE Benchmark with HF Accelerate ` Advanced @@ -59,10 +59,10 @@ Advanced * - Framework - Example * - Accelerate, DeepSpeed - - `Fine-tuning Llama-2 series models with Deepspeed, Accelerate, and Ray Train TorchTrainer `_ + - `Fine-tune Llama-2 series models with Deepspeed, Accelerate, and Ray Train TorchTrainer `_ * - Transformers, DeepSpeed - - :ref:`Fine-tuning GPT-J-6B with Ray Train and DeepSpeed ` + - :ref:`Fine-tune GPT-J-6B with Ray Train and DeepSpeed ` * - Lightning, DeepSpeed - - :ref:`Fine-tuning vicuna-13b with PyTorch Lightning and DeepSpeed ` + - :ref:`Fine-tune vicuna-13b with PyTorch Lightning and DeepSpeed ` * - Lightning - - :ref:`Fine-tuning dolly-v2-7b with PyTorch Lightning and FSDP ` + - :ref:`Fine-tune dolly-v2-7b with PyTorch Lightning and FSDP ` diff --git a/doc/source/train/examples/accelerate/accelerate_example.rst b/doc/source/train/examples/accelerate/accelerate_example.rst index 6205add5ac48a..e082bf11f2a30 100644 --- a/doc/source/train/examples/accelerate/accelerate_example.rst +++ b/doc/source/train/examples/accelerate/accelerate_example.rst @@ -2,7 +2,24 @@ .. _accelerate_example: -Hugging Face Accelerate Distributed Training Example with Ray Train -=================================================================== +Distributed Training Example with Hugging Face Accelerate +========================================================= + +This example does distributed data parallel training +with Hugging Face (HF) Accelerate, Ray Train, and Ray Data. +It fine-tunes a BERT model and is adapted from +https://github.com/huggingface/accelerate/blob/main/examples/nlp_example.py + + +Code example +------------ .. literalinclude:: /../../python/ray/train/examples/accelerate/accelerate_torch_trainer.py + +See also +-------- + +For a tutorial on using Ray Train and HF Accelerate, +see :ref:`Training with Hugging Face Accelerate `. + +For more Train examples, see :ref:`Ray Train Examples `. diff --git a/doc/source/train/examples/lightning/lightning_mnist_example.ipynb b/doc/source/train/examples/lightning/lightning_mnist_example.ipynb index f721884879873..6686b958f9827 100644 --- a/doc/source/train/examples/lightning/lightning_mnist_example.ipynb +++ b/doc/source/train/examples/lightning/lightning_mnist_example.ipynb @@ -51,7 +51,7 @@ "source": [ "## Prepare a dataset and module\n", "\n", - "The Pytorch Lightning Trainer takes either `torch.utils.data.DataLoader` or `pl.LightningDataModule` as data inputs. You can keep using them without any changes with Ray Train. " + "The Pytorch Lightning Trainer takes either `torch.utils.data.DataLoader` or `pl.LightningDataModule` as data inputs. You can continue using them without any changes with Ray Train. " ] }, { @@ -75,7 +75,7 @@ " self.data_dir, train=True, download=True, transform=self.transform\n", " )\n", "\n", - " # split data into train and val sets\n", + " # Split data into train and val sets\n", " self.mnist_train, self.mnist_val = random_split(mnist, [55000, 5000])\n", "\n", " def train_dataloader(self):\n", @@ -175,7 +175,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "You don't need to make any change to the definition of PyTorch Lightning model and datamodule." + "You don't need to modify the definition of the PyTorch Lightning model or datamodule." ] }, { @@ -183,18 +183,18 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Define the training loop\n", + "## Define a training function\n", "\n", - "This code defines a training loop for each worker. 
Comparing the training loop with the original PyTorch Lightning code, there are 3 main differences:\n", + "This code defines a {ref}`training function ` for each worker. Comparing the training fuction with the original PyTorch Lightning code, notice three main differences:\n", "\n", "- Distributed strategy: Use {class}`RayDDPStrategy `.\n", "- Cluster environment: Use {class}`RayLightningEnvironment `.\n", - "- Parallel devices: Always sets to `devices=\"auto\"` to use all available devices configured by ``TorchTrainer``.\n", + "- Parallel devices: Always set to `devices=\"auto\"` to use all available devices configured by ``TorchTrainer``.\n", "\n", "See {ref}`Getting Started with PyTorch Lightning ` for more information.\n", "\n", "\n", - "For checkpoint reportining, Ray Train provides a minimal {class}`RayTrainReportCallback ` that reports metrics and checkpoint on each train epoch end. For more complex checkpoint logic, please implement custom callbacks as described in {ref}`Saving and Loading Checkpoint ` user guide." + "For checkpoint reporting, Ray Train provides a minimal {class}`RayTrainReportCallback ` class that reports metrics and checkpoints at the end of each train epoch. For more complex checkpoint logic, implement custom callbacks. See {ref}`Saving and Loading Checkpoint `." ] }, { @@ -203,7 +203,7 @@ "metadata": {}, "outputs": [], "source": [ - "use_gpu = True # Set it to False if you want to run without GPUs\n", + "use_gpu = True # Set to False if you want to run without GPUs\n", "num_workers = 4" ] }, @@ -804,7 +804,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Check the Training Results and Checkpoints" + "## Check training results and checkpoints" ] }, { @@ -857,9 +857,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "As we can see, three checkpoints(`checkpoint_000007`, `checkpoint_000008`, `checkpoint_000009`) have been saved in the trial directory. To retrieve the latest checkpoint from the fit results and load it back into the model, follow these steps.\n", + "Ray Train saved three checkpoints(`checkpoint_000007`, `checkpoint_000008`, `checkpoint_000009`) in the trial directory. The following code retrieves the latest checkpoint from the fit results and loads it back into the model.\n", "\n", - "If you lost the in-memory result object, you can also restore the model from the checkpoint file. Here the checkpoint path is: `/tmp/ray_results/ptl-mnist-example/TorchTrainer_eb925_00000_0_2023-08-07_23-15-06/checkpoint_000009/checkpoint.ckpt`." + "If you lost the in-memory result object, you can restore the model from the checkpoint file. The checkpoint path is: `/tmp/ray_results/ptl-mnist-example/TorchTrainer_eb925_00000_0_2023-08-07_23-15-06/checkpoint_000009/checkpoint.ckpt`." ] }, { @@ -903,6 +903,17 @@ "\n", "best_model" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## See also\n", + "\n", + "For a tutorial on using Ray Train and PyTorch Lightning, see {ref}`Getting Started with PyTorch Lightning `.\n", + "\n", + "For more Train examples, see :ref:`Ray Train Examples `." + ] } ], "metadata": { diff --git a/doc/source/train/examples/pytorch/dreambooth_finetuning.rst b/doc/source/train/examples/pytorch/dreambooth_finetuning.rst index 96ea9a5a1f9d8..05bd3e37bf8a1 100644 --- a/doc/source/train/examples/pytorch/dreambooth_finetuning.rst +++ b/doc/source/train/examples/pytorch/dreambooth_finetuning.rst @@ -1,7 +1,9 @@ :orphan: -Fine-tuning DreamBooth with Ray Train -===================================== +.. 
_torch_finetune_dreambooth_ex: + +Fine-tune of Stable Diffusion with DreamBooth and Ray Train +=========================================================== This example shows how to do DreamBooth fine-tuning of a Stable Diffusion model using Ray Train. See the original `DreamBooth project homepage `_ for more details on what this fine-tuning method achieves. @@ -10,41 +12,41 @@ See the original `DreamBooth project homepage `_ :target: https://dreambooth.github.io :alt: DreamBooth fine-tuning overview -This example is built on top of `this HuggingFace 🤗 tutorial `_. -See the HuggingFace tutorial for useful explanations and suggestions on hyperparameters. +This example builds on `this Hugging Face 🤗 tutorial `_. +See the Hugging Face tutorial for useful explanations and suggestions on hyperparameters. **Adapting this example to Ray Train allows you to easily scale up the fine-tuning to an arbitrary number of distributed training workers.** **Compute requirements:** -* Because of the large model sizes, you'll need a machine with at least 1 A10G GPU. -* Each training worker uses 1 GPU. You can use multiple GPUs/workers to leverage data-parallel training to speed up training time. +* Because of the large model sizes, you need a machine with at least 1 A10G GPU. +* Each training worker uses 1 GPU. You can use multiple GPUs or workers to leverage data-parallel training to speed up training time. -This example fine-tunes both the ``text_encoder`` and ``unet`` models used in the Stable Diffusion process, with respect to a prior preserving loss. +This example fine-tunes both the ``text_encoder`` and ``unet`` models used in the stable diffusion process, with respect to a prior preserving loss. .. image:: /templates/05_dreambooth_finetuning/dreambooth/images/dreambooth_example.png :alt: DreamBooth overview -The full code repository can be found here: `https://github.com/ray-project/ray/tree/master/doc/source/templates/05_dreambooth_finetuning `_ +Find the full code repository at `https://github.com/ray-project/ray/tree/master/doc/source/templates/05_dreambooth_finetuning `_ How it works ------------ -This example leverages Ray Data for data loading and Ray Train for distributed training. +This example uses Ray Data for data loading and Ray Train for distributed training. Data loading ^^^^^^^^^^^^ .. note:: - You can find the latest version of the code here: `dataset.py `_ + Find the latest version of the code at `dataset.py `_ The latest version might differ slightly from the code presented here. -We use Ray Data for data loading. The code has three interesting parts. +Use Ray Data for data loading. The code has three interesting parts. -First, we load two datasets using :func:`ray.data.read_images`: +First, load two datasets using :func:`ray.data.read_images`: .. literalinclude:: /templates/05_dreambooth_finetuning/dreambooth/dataset.py :language: python @@ -52,7 +54,7 @@ First, we load two datasets using :func:`ray.data.read_images`: :end-at: class_dataset = read :dedent: 4 -Then, we tokenize the prompt that generated these images: +Then, tokenize the prompt that generated these images: .. literalinclude:: /templates/05_dreambooth_finetuning/dreambooth/dataset.py :language: python @@ -61,7 +63,7 @@ Then, we tokenize the prompt that generated these images: :dedent: 4 -And lastly, we apply a ``torchvision`` preprocessing pipeline to the images: +And lastly, apply a ``torchvision`` preprocessing pipeline to the images: .. 
literalinclude:: /templates/05_dreambooth_finetuning/dreambooth/dataset.py :language: python @@ -69,8 +71,7 @@ And lastly, we apply a ``torchvision`` preprocessing pipeline to the images: :end-before: END: image preprocessing :dedent: 4 -We apply all of this in final step: - +Apply all three parts in a final step: .. literalinclude:: /templates/05_dreambooth_finetuning/dreambooth/dataset.py :language: python @@ -79,29 +80,28 @@ We apply all of this in final step: :dedent: 4 - Distributed training ^^^^^^^^^^^^^^^^^^^^ .. note:: - You can find the latest version of the code here: `train.py `_ + Find the latest version of the code at `train.py `_ The latest version might differ slightly from the code presented here. -The central part of the training code is the *training function*. This function accepts a configuration dict that contains the hyperparameters. It then defines a regular PyTorch training loop. +The central part of the training code is the :ref:`training function `. This function accepts a configuration dict that contains the hyperparameters. It then defines a regular PyTorch training loop. -There are only a few locations where we interact with the Ray Train API. We marked them with in-line comments in the snippet below. +You interact with the Ray Train API in only a few locations, which follow in-line comments in the snippet below. -Remember that we want to do data-parallel training for all our models. +Remember that you want to do data-parallel training for all the models. -#. We load the data shard for each worker with session.get_dataset_shard("train") -#. We iterate over the dataset with train_dataset.iter_torch_batches() -#. We report results to Ray Train with session.report(results) +#. Load the data shard for each worker with `session.get_dataset_shard("train")`` +#. Iterate over the dataset with `train_dataset.iter_torch_batches()`` +#. Report results to Ray Train with `session.report(results)`` -The code was compacted for brevity. The `full code `_ is more thoroughly annotated. +The code is compacted for brevity. The `full code `_ is more thoroughly annotated. .. literalinclude:: /templates/05_dreambooth_finetuning/dreambooth/train.py @@ -109,7 +109,7 @@ The code was compacted for brevity. The `full code ``. +To achieve this, choose a non-word as an identifier, such as ``unqtkn``. When fine-tuning the model with this subject, you teach the model that the prompt is ``A photo of a unqtkn ``. -After fine-tuning we can run inference with this specific prompt. -For instance: ``A photo of a unqtkn `` will create an image of our subject. -Similarly, ``A photo of a unqtkn at the beach`` will create an image of our subject at the beach. +After fine-tuning you can run inference with this specific prompt. +For instance: ``A photo of a unqtkn `` creates an image of the subject. +Similarly, ``A photo of a unqtkn at the beach`` creates an image of the subject at the beach. Step 0: Preparation ^^^^^^^^^^^^^^^^^^^ @@ -216,7 +216,7 @@ Prepare some directories and environment variables. Step 1: Download the pre-trained model ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Download and cache a pre-trained Stable-Diffusion model locally. +Download and cache a pre-trained Stable Diffusion model locally. .. literalinclude:: /templates/05_dreambooth_finetuning/dreambooth_run.sh :language: bash @@ -228,10 +228,10 @@ You can access the downloaded model checkpoint at the ``$ORIG_MODEL_PATH``. 
Step 2: Supply images of your subject ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Use one of the sample datasets (dog, lego car), or provide your own directory +Use one of the sample datasets, like `dog` or `lego car`, or provide your own directory of images, and specify the directory with the ``$INSTANCE_DIR`` environment variable. -Then, we copy these images to ``$IMAGES_OWN_DIR``. +Then, copy these images to ``$IMAGES_OWN_DIR``. .. literalinclude:: /templates/05_dreambooth_finetuning/dreambooth_run.sh :language: bash @@ -247,7 +247,7 @@ Step 3: Create the regularization images ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Create a regularization image set for a class of subjects using the pre-trained -Stable Diffusion model. This is used to regularize the fine-tuning by ensuring that +Stable Diffusion model. This set regularizes the fine-tuning by ensuring that the model still produces decent images for random images of the same class, rather than just optimize for producing good images of the subject. @@ -256,12 +256,12 @@ rather than just optimize for producing good images of the subject. :start-after: Step 3: START :end-before: Step 3: END -We use Ray Data to do batch inference with 4 workers, so more images can be generated in parallel. +Use Ray Data to do batch inference with 4 workers, to generate more images in parallel. Step 4: Fine-tune the model ^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Save a few (4 to 5) images of the subject being fine-tuned +Save a few, like 4 to 5, images of the subject being fine-tuned in a local directory. Then launch the training job with: .. literalinclude:: /templates/05_dreambooth_finetuning/dreambooth_run.sh @@ -269,21 +269,28 @@ in a local directory. Then launch the training job with: :start-after: Step 4: START :end-before: Step 4: END -Step 5: Generate images of our subject +Step 5: Generate images of the subject ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Try your model with the same command line as Step 2, but point -to your own model this time! +to your own model this time. .. literalinclude:: /templates/05_dreambooth_finetuning/dreambooth_run.sh :language: bash :start-after: Step 5: START :end-before: Step 5: END -Next, try replacing the prompt with something more interesting! +Next, try replacing the prompt with something more interesting. For example, for the dog subject, you can try: - "photo of a unqtkn dog in a bucket" - "photo of a unqtkn dog sleeping" -- "photo of a unqtkn dog in a doghouse" \ No newline at end of file +- "photo of a unqtkn dog in a doghouse" + +See also +-------- + +For more Train examples, see :ref:`Ray Train Examples `. + +For how-to guides, see :ref:`Ray Train User Guides `. \ No newline at end of file diff --git a/doc/source/train/examples/pytorch/torch_fashion_mnist_example.rst b/doc/source/train/examples/pytorch/torch_fashion_mnist_example.rst index 2955441efaf08..c3006634b86d6 100644 --- a/doc/source/train/examples/pytorch/torch_fashion_mnist_example.rst +++ b/doc/source/train/examples/pytorch/torch_fashion_mnist_example.rst @@ -2,7 +2,19 @@ .. _torch_fashion_mnist_ex: -Running Distributed Training of a PyTorch Model on Fashion MNIST with Ray Train -=============================================================================== +Train a PyTorch Model on Fashion MNIST +====================================== + +This example runs distributed training of a PyTorch model on Fashion MNIST with Ray Train. + +Code example +------------ .. 
literalinclude:: /../../python/ray/train/examples/pytorch/torch_fashion_mnist_example.py + +See also +-------- + +For a tutorial on using Ray Train and PyTorch, see :ref:`Getting Started with PyTorch `. + +For more Train examples, see :ref:`Ray Train Examples `. diff --git a/doc/source/train/examples/transformers/transformers_torch_trainer_basic.rst b/doc/source/train/examples/transformers/transformers_torch_trainer_basic.rst index d4bb78290cf5b..587fa1673dda7 100644 --- a/doc/source/train/examples/transformers/transformers_torch_trainer_basic.rst +++ b/doc/source/train/examples/transformers/transformers_torch_trainer_basic.rst @@ -2,7 +2,21 @@ .. _transformers_torch_trainer_basic_example : -Ray Train Basic Example for HuggingFace Transformers -==================================================== +Fine-tune a Text Classifier with Hugging Face Transformers +========================================================== + +This basic example of distributed training with Ray Train and Hugging Face (HF) Transformers +fine-tunes a text classifier on the Yelp review dataset using HF Transformers and Ray Train. + +Code example +------------ .. literalinclude:: /../../python/ray/train/examples/transformers/transformers_torch_trainer_basic.py + +See also +-------- + +For a tutorial on using Ray Train and Transformers, +see :ref:`Getting Started with Hugging Face Transformers `. + +For more Train examples, see :ref:`Ray Train Examples `. diff --git a/doc/source/train/getting-started-pytorch-lightning.rst b/doc/source/train/getting-started-pytorch-lightning.rst index 00b8af39828e0..d9ea9ed540ffb 100644 --- a/doc/source/train/getting-started-pytorch-lightning.rst +++ b/doc/source/train/getting-started-pytorch-lightning.rst @@ -29,7 +29,7 @@ For reference, the final code follows: trainer = TorchTrainer(train_func, scaling_config=scaling_config) result = trainer.fit() -1. Your `train_func` is the Python code that is executed on each distributed training worker. +1. Your `train_func` is the Python code that each distributed training worker executes. 2. Your `ScalingConfig` defines the number of distributed training workers and whether to use GPUs. 3. Your `TorchTrainer` launches the distributed training job. @@ -147,8 +147,8 @@ Compare a PyTorch Lightning training script with and without Ray Train. result = trainer.fit() -Setting up your training function ---------------------------------- +Set up the training function +---------------------------- First, update your training code to support distributed training. Begin by wrapping your code in a function: @@ -158,7 +158,7 @@ Begin by wrapping your code in a function: def train_func(config): # Your PyTorch Lightning training code here. -This function is executed on each distributed training worker. +Each distributed training worker executes this function. Ray Train sets up your distributed process group on each worker. You only need to @@ -364,7 +364,7 @@ information about the training run, including the metrics and checkpoints report Next steps ---------- -After you have converted your PyTorch Lightningtraining script to use Ray Train: +After you have converted your PyTorch Lightning training script to use Ray Train: * See :ref:`User Guides ` to learn more about how to perform specific tasks. * Browse the :ref:`Examples ` for end-to-end examples of how to use Ray Train. @@ -374,7 +374,7 @@ Version Compatibility --------------------- Ray Train is tested with `pytorch_lightning` versions `1.6.5` and `2.0.4`. 
For full compatibility, use ``pytorch_lightning>=1.6.5`` . -Earlier versions are not prohibited but may result in unexpected issues. If you run into any compatibility issues, consider upgrading your PyTorch Lightning version or +Earlier versions aren't prohibited but may result in unexpected issues. If you run into any compatibility issues, consider upgrading your PyTorch Lightning version or `file an issue `_. .. _lightning-trainer-migration-guide: diff --git a/doc/source/train/getting-started-pytorch.rst b/doc/source/train/getting-started-pytorch.rst index b903ac7937c13..08b78c6a8de43 100644 --- a/doc/source/train/getting-started-pytorch.rst +++ b/doc/source/train/getting-started-pytorch.rst @@ -7,11 +7,11 @@ This tutorial walks through the process of converting an existing PyTorch script Learn how to: -1. Configure your model so that it runs distributed and is placed on the correct CPU/GPU device. -2. Configure your dataloader so that it is sharded across the workers and place data on the correct CPU/GPU device. -3. Configure your training function to report metrics and save checkpoints. -4. Configure scale and CPU/GPU resource requirements for your training job. -5. Launch your distributed training job with a :class:`~ray.train.torch.TorchTrainer`. +1. Configure a model to run distributed and on the correct CPU/GPU device. +2. Configure a dataloader to shard data across the workers and place data on the correct CPU/GPU device. +3. Configure a training function to report metrics and save checkpoints. +4. Configure scale and CPU/GPU resource requirements for a training job. +5. Launch a distributed training job with a :class:`~ray.train.torch.TorchTrainer` class. Quickstart ---------- @@ -30,9 +30,9 @@ For reference, the final code follows: trainer = TorchTrainer(train_func, scaling_config=scaling_config) result = trainer.fit() -1. Your `train_func` is the Python code that is executed on each distributed training worker. -2. Your `ScalingConfig` defines the number of distributed training workers and whether to use GPUs. -3. Your `TorchTrainer` launches the distributed training job. +1. `train_func` is the Python code that executes on each distributed training worker. +2. `ScalingConfig` defines the number of distributed training workers and whether to use GPUs. +3. `TorchTrainer` launches the distributed training job. Compare a PyTorch training script with and without Ray Train. @@ -135,19 +135,19 @@ Setting up your training function --------------------------------- First, update your training code to support distributed training. -You can begin by wrapping your code in a function: +You can begin by wrapping your code in a :ref:`training function `: .. code-block:: python def train_func(config): # Your PyTorch training code here. -This function is executed on each distributed training worker. +Each distributed training worker executes this function. Setting up your model ^^^^^^^^^^^^^^^^^^^^^ -Use the :func:`ray.train.torch.prepare_model` utility function. This will: +Use the :func:`ray.train.torch.prepare_model` utility function to: 1. Move your model to the right device. 2. Wrap it in ``DistributedDataParallel``. @@ -182,8 +182,8 @@ Use the :func:`ray.train.torch.prepare_data_loader` utility function, which: 1. Adds a ``DistributedSampler`` to your ``DataLoader``. 2. Moves the batches to the right device. 
-Note that this step is not necessary if you are passing in Ray Data to your Trainer -(see :ref:`data-ingest-torch`): +Note that this step isn't necessary if you're passing in Ray Data to your Trainer. +See :ref:`data-ingest-torch`. .. code-block:: diff diff --git a/doc/source/train/huggingface-accelerate.rst b/doc/source/train/huggingface-accelerate.rst index dd4e86dc65090..93dc096dda3ed 100644 --- a/doc/source/train/huggingface-accelerate.rst +++ b/doc/source/train/huggingface-accelerate.rst @@ -1,11 +1,11 @@ .. _train-hf-accelerate: -Training with HuggingFace Accelerate -==================================== +Training with Hugging Face Accelerate +===================================== The :class:`~ray.train.torch.TorchTrainer` can help you easily launch your `Accelelate `_ training across a distributed Ray cluster. -All you need to do is run your existing training code with a TorchTrainer. You can expect the final code to look like this: +You only need to run your existing training code with a TorchTrainer. You can expect the final code to look like this: .. code-block:: python @@ -161,11 +161,11 @@ object in your training function. Below are starter examples for configuring Acc trainer.fit() Note that Accelerate also provides a CLI tool, `"accelerate config"`, to generate a configuration and launch your training -job with `"accelerate launch"`. However, it is not necessary here because Ray's `TorchTrainer` already sets up the Torch +job with `"accelerate launch"`. However, it's not necessary here because Ray's `TorchTrainer` already sets up the Torch distributed environment and launches the training function on all workers. -Next, check these end-to-end examples below for more details: +Next, see these end-to-end examples below for more details: .. tabs:: @@ -211,6 +211,6 @@ Aside from that, the functionality of ``AccelerateTrainer`` is identical to ``To However, this caused confusion around whether this was the *only* way to run Accelerate code. Because the full Accelerate functionality can be expressed with the ``Accelerator`` and ``TorchTrainer`` combination, the ``AccelerateTrainer`` will be deprecated in Ray 2.8, -and it is recommend to run your Accelerate code directly with ``TorchTrainer``. +and it's recommend to run your Accelerate code directly with ``TorchTrainer``. 
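A minimal sketch of that recommendation, assuming a toy model and random data rather than a real fine-tuning workload: construct the ``Accelerator`` inside the training function as usual, and let ``TorchTrainer`` provide the distributed environment.

.. code-block:: python

    import torch
    import torch.nn.functional as F
    from accelerate import Accelerator
    from torch.utils.data import DataLoader, TensorDataset

    from ray.train import ScalingConfig
    from ray.train.torch import TorchTrainer


    def train_func(config):
        # TorchTrainer has already initialized the Torch process group,
        # so the Accelerator picks up the distributed setup automatically.
        accelerator = Accelerator()

        model = torch.nn.Linear(8, 1)
        optimizer = torch.optim.SGD(model.parameters(), lr=config["lr"])
        dataset = TensorDataset(torch.randn(128, 8), torch.randn(128, 1))
        dataloader = DataLoader(dataset, batch_size=16)

        # Accelerate wraps the model, optimizer, and dataloader for distributed execution.
        model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

        for _ in range(config["epochs"]):
            for features, labels in dataloader:
                optimizer.zero_grad()
                loss = F.mse_loss(model(features), labels)
                accelerator.backward(loss)
                optimizer.step()


    trainer = TorchTrainer(
        train_func,
        train_loop_config={"lr": 0.01, "epochs": 2},
        scaling_config=ScalingConfig(num_workers=2, use_gpu=False),
    )
    trainer.fit()

The ``accelerate_torch_trainer.py`` example edited below shows the complete version of this pattern, including evaluation and checkpoint reporting.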
diff --git a/python/ray/train/examples/accelerate/accelerate_torch_trainer.py b/python/ray/train/examples/accelerate/accelerate_torch_trainer.py index 41969a71f0210..64992f0bc2240 100644 --- a/python/ray/train/examples/accelerate/accelerate_torch_trainer.py +++ b/python/ray/train/examples/accelerate/accelerate_torch_trainer.py @@ -25,7 +25,7 @@ def train_func(config): - """Your training function that will be launched on each worker.""" + """Your training function that is launched on each worker.""" # Unpack training configs lr = config["lr"] @@ -116,7 +116,7 @@ def collate_fn(batch): eval_metric = metric.compute() accelerator.print(f"epoch {epoch}:", eval_metric) - # Report Checkpoint and metrics to Ray Train + # Report checkpoint and metrics to Ray Train # ========================================== with TemporaryDirectory() as tmpdir: if accelerator.is_main_process: diff --git a/python/ray/train/examples/transformers/transformers_torch_trainer_basic.py b/python/ray/train/examples/transformers/transformers_torch_trainer_basic.py index 630177424f28c..79d3f993f3d3b 100644 --- a/python/ray/train/examples/transformers/transformers_torch_trainer_basic.py +++ b/python/ray/train/examples/transformers/transformers_torch_trainer_basic.py @@ -14,8 +14,8 @@ from ray.train.torch import TorchTrainer -# [1] Define a training function that includes all your training logics -# ===================================================================== +# [1] Define a training function that includes all your training logic +# ==================================================================== def train_func(config): # Datasets dataset = load_dataset("yelp_review_full") @@ -34,7 +34,7 @@ def tokenize_function(examples): "bert-base-cased", num_labels=5 ) - # Evaluation Metrics + # Evaluation metrics metric = evaluate.load("accuracy") def compute_metrics(eval_pred): @@ -42,7 +42,7 @@ def compute_metrics(eval_pred): predictions = np.argmax(logits, axis=-1) return metric.compute(predictions=predictions, references=labels) - # HuggingFace Trainer + # Hugging Face Trainer training_args = TrainingArguments( output_dir="test_trainer", evaluation_strategy="epoch", report_to="none" ) @@ -59,7 +59,7 @@ def compute_metrics(eval_pred): # =============================================== trainer.add_callback(RayTrainReportCallback()) - # [3] Prepare your trainer for Ray Data Integration + # [3] Prepare your trainer for Ray Data integration # ================================================= trainer = prepare_trainer(trainer) diff --git a/python/ray/train/torch/torch_trainer.py b/python/ray/train/torch/torch_trainer.py index e61d87e3386fd..735c9fad19665 100644 --- a/python/ray/train/torch/torch_trainer.py +++ b/python/ray/train/torch/torch_trainer.py @@ -25,7 +25,9 @@ class TorchTrainer(DataParallelTrainer): 4. Runs the input ``train_loop_per_worker(train_loop_config)`` on all workers. - For more details, see the :ref:`PyTorch User Guide `. + For more details, see the :ref:`PyTorch User Guide `, + :ref:`PyTorch Lightning User Guide `, + or :ref:`PyTorch User Guide `. 
Example: From b7f2a0eef18dc75a87796439f7180b1f85dff54a Mon Sep 17 00:00:00 2001 From: angelinalg <122562471+angelinalg@users.noreply.github.com> Date: Tue, 12 Sep 2023 15:20:10 -0700 Subject: [PATCH 04/13] make remaining examples consistent; add see also's Signed-off-by: angelinalg <122562471+angelinalg@users.noreply.github.com> --- .github/styles/Vocab/Train/accept.txt | 1 - doc/source/train/deepspeed.rst | 12 +- .../train/distributed-tensorflow-keras.rst | 105 +++++++++--------- .../train/distributed-xgboost-lightgbm.rst | 70 ++++++------ doc/source/train/examples.rst | 10 +- .../accelerate/accelerate_example.rst | 9 +- .../examples/deepspeed/deepspeed_example.rst | 20 +++- .../examples/horovod/horovod_example.rst | 16 ++- .../lightning/lightning_cola_advanced.ipynb | 51 ++++++--- .../lightning/lightning_mnist_example.ipynb | 4 +- .../pytorch/dreambooth_finetuning.rst | 14 ++- .../pytorch/torch_fashion_mnist_example.rst | 4 +- .../examples/tf/tensorflow_mnist_example.rst | 17 ++- .../huggingface_text_classification.ipynb | 90 ++++++++------- .../transformers_torch_trainer_basic.rst | 5 +- .../getting-started-pytorch-lightning.rst | 81 +++++++------- doc/source/train/getting-started-pytorch.rst | 42 +++---- .../train/getting-started-transformers.rst | 56 +++++----- doc/source/train/horovod.rst | 38 ++++--- doc/source/train/huggingface-accelerate.rst | 16 +-- doc/source/train/more-frameworks.rst | 4 +- 21 files changed, 368 insertions(+), 297 deletions(-) diff --git a/.github/styles/Vocab/Train/accept.txt b/.github/styles/Vocab/Train/accept.txt index d0c7e09aaea0c..38f7eed079981 100644 --- a/.github/styles/Vocab/Train/accept.txt +++ b/.github/styles/Vocab/Train/accept.txt @@ -1,5 +1,4 @@ Horovod -Hugging Face hyperparameters? Keras LightGBM diff --git a/doc/source/train/deepspeed.rst b/doc/source/train/deepspeed.rst index b9e8e396c5e9c..68160530d3231 100644 --- a/doc/source/train/deepspeed.rst +++ b/doc/source/train/deepspeed.rst @@ -1,11 +1,11 @@ .. _train-deepspeed: -Training with DeepSpeed -======================= +Get Started with DeepSpeed +========================== The :class:`~ray.train.torch.TorchTrainer` can help you easily launch your `DeepSpeed `_ training across a distributed Ray cluster. -All you need to do is run your existing training code with a TorchTrainer. You can expect the final code to look like this: +You only need to run your existing training code with a TorchTrainer. You can expect the final code to look like this: .. code-block:: python @@ -74,12 +74,12 @@ Below is a simple example of ZeRO-3 training with DeepSpeed only. keep using `deepspeed.initialize() `_ as usual to prepare everything for distributed training. -Running DeepSpeed with other frameworks -------------------------------------------- +Run DeepSpeed with other frameworks +----------------------------------- Many deep learning frameworks have integrated with DeepSpeed, including Lightning, Transformers, Accelerate, and more. You can run all these combinations in Ray Train. -Please check the below examples for more details: +Check the below examples for more details: .. list-table:: :header-rows: 1 diff --git a/doc/source/train/distributed-tensorflow-keras.rst b/doc/source/train/distributed-tensorflow-keras.rst index 6febebf8f1821..c8ea915a8d019 100644 --- a/doc/source/train/distributed-tensorflow-keras.rst +++ b/doc/source/train/distributed-tensorflow-keras.rst @@ -1,9 +1,10 @@ .. 
_train-tensorflow-overview: -Distributed Tensorflow & Keras -============================== +Get Started with TensorFlow and Keras +===================================== + Ray Train's `TensorFlow `__ integration enables you -to scale your TensorFlow and Keras training loops to many machines and GPUs. +to scale your TensorFlow and Keras training functions to many machines and GPUs. On a technical level, Ray Train schedules your training workers and configures ``TF_CONFIG`` for you, allowing you to run @@ -11,8 +12,8 @@ your ``MultiWorkerMirroredStrategy`` training script. See `Distributed training with TensorFlow `_ for more information. -Most of the examples in this guide use Tensorflow with Keras, but -Ray Train also works with vanilla Tensorflow. +Most of the examples in this guide use TensorFlow with Keras, but +Ray Train also works with vanilla TensorFlow. Quickstart @@ -23,29 +24,27 @@ Quickstart :end-before: __tf_train_end__ -Updating your training function -------------------------------- +Update your training function +----------------------------- -First, you'll want to update your training function to support distributed +First, update your :ref:`training function to support distributed training. .. note:: The current TensorFlow implementation supports ``MultiWorkerMirroredStrategy`` (and ``MirroredStrategy``). If there are - other strategies you wish to see supported by Ray Train, please let us know - by submitting a `feature request on GitHub `_. + other strategies you wish to see supported by Ray Train, submit a `feature request on GitHub `_. These instructions closely follow TensorFlow's `Multi-worker training with Keras `_ -tutorial. One key difference is that Ray Train will handle the environment +tutorial. One key difference is that Ray Train handles the environment variable set up for you. **Step 1:** Wrap your model in ``MultiWorkerMirroredStrategy``. The `MultiWorkerMirroredStrategy `_ -enables synchronous distributed training. The ``Model`` *must* be built and -compiled within the scope of the strategy. +enables synchronous distributed training. You *must* build and compile the ``Model`` within the scope of the strategy. .. code-block:: python @@ -56,9 +55,8 @@ compiled within the scope of the strategy. **Step 2:** Update your ``Dataset`` batch size to the *global* batch size. -The `batch `_ -will be split evenly across worker processes, so ``batch_size`` should be -set appropriately. +Set ``batch_size`` appropriately because `batch `_ +splits evenly across worker processes. .. code-block:: diff @@ -67,20 +65,20 @@ set appropriately. .. warning:: - Ray will not automatically set any environment variables or configuration - related to local parallelism / threading + Ray doesn't automatically set any environment variables or configuration + related to local parallelism or threading :ref:`aside from "OMP_NUM_THREADS" `. - If you desire greater control over TensorFlow threading, use + If you want greater control over TensorFlow threading, use the ``tf.config.threading`` module (eg. ``tf.config.threading.set_inter_op_parallelism_threads(num_cpus)``) at the beginning of your ``train_loop_per_worker`` function. -Creating a :class:`~ray.train.tensorflow.TensorflowTrainer` ------------------------------------------------------------ +Create a TensorflowTrainer +-------------------------- -``Trainer``\s are the primary Ray Train classes that are used to manage state and +``Trainer``\s are the primary Ray Train classes for managing state and execute training. 
For distributed Tensorflow, -we use a :class:`~ray.train.tensorflow.TensorflowTrainer` +use a :class:`~ray.train.tensorflow.TensorflowTrainer` that you can setup like this: .. code-block:: python @@ -109,38 +107,36 @@ To customize the backend setup, you can pass a ) -For more configurability, please reference the :py:class:`~ray.train.data_parallel_trainer.DataParallelTrainer` API. +For more configurability, see the :py:class:`~ray.train.data_parallel_trainer.DataParallelTrainer` API. -Running your training function ------------------------------- +Run your training function +-------------------------- With a distributed training function and a Ray Train ``Trainer``, you are now -ready to start training! +ready to start training. .. code-block:: python trainer.fit() -Data loading and preprocessing ------------------------------- -Tensorflow per default uses its own internal dataset sharding policy, as described +Load and preprocess data +------------------------ + +TensorFlow by default uses its own internal dataset sharding policy, as described `in the guide `__. -If your tensorflow dataset is compatible with distributed loading, you don't need to +If your TensorFlow dataset is compatible with distributed loading, you don't need to change anything. If you require more advanced preprocessing, you may want to consider using Ray Data -for distributed data ingest. - -There is a guide for using :ref:`Ray Data with Ray Train ` -in our PyTorch guide. Since Ray Data is an independent library, most concepts can -be directly applied to TensorFlow. +for distributed data ingest. See :ref:`Ray Data with Ray Train `. +Because Ray Data is an independent library, you can directly apply most concepts to TensorFlow. The main difference is that you may want to convert your Ray Data dataset shard to a TensorFlow dataset in your training function so that you can use the Keras API for model training. -`Here's a full example you can refer to `__ +`See this example `__ for distributed data loading. The relevant parts are: .. code-block:: python @@ -184,8 +180,8 @@ for distributed data loading. The relevant parts are: -Reporting results ------------------ +Report results +-------------- During training, the training loop should report intermediate results and checkpoints to Ray Train. This reporting logs the results to the console output and appends them to local log files. The logging also triggers :ref:`checkpoint bookkeeping `. @@ -203,30 +199,29 @@ The easiest way to report your results with Keras is by using the model.fit(dataset, callbacks=[ReportCheckpointCallback()]) -This callback will automatically forward all results and checkpoints from the -Keras training loop to Ray Train. +This callback automatically forwards all results and checkpoints from the +Keras training function to Ray Train. -Aggregating results -~~~~~~~~~~~~~~~~~~~ +Aggregate results +~~~~~~~~~~~~~~~~~ TensorFlow Keras automatically aggregates metrics from all workers. If you wish to have more control over that, consider implementing a `custom training loop `__. -Saving and loading checkpoints ------------------------------- +Save and load checkpoints +------------------------- -:class:`Checkpoints ` can be saved by calling ``train.report(metrics, checkpoint=Checkpoint(...))`` in the -training function. This will cause the checkpoint state from the distributed -workers to be saved on the ``Trainer`` (where your python script is executed). 
+You can save :class:`Checkpoints ` by calling ``train.report(metrics, checkpoint=Checkpoint(...))`` in the +training function. This call saves the checkpoint state from the distributed +workers on the ``Trainer``, where you executed your python script. -The latest saved checkpoint can be accessed through the ``checkpoint`` attribute of -the :py:class:`~ray.train.Result`, and the best saved checkpoints can be accessed by the ``best_checkpoints`` +You can access the latest saved checkpoint through the ``checkpoint`` attribute of +the :py:class:`~ray.train.Result`, and access the best saved checkpoints with the ``best_checkpoints`` attribute. -Concrete examples are provided to demonstrate how checkpoints (model weights but not models) are saved -appropriately in distributed training. +These concrete examples demonstrate how Ray Train appropriately saves checkpoints, model weights but not models, in distributed training. .. code-block:: python @@ -275,11 +270,11 @@ appropriately in distributed training. result = trainer.fit() print(result.checkpoint) -By default, checkpoints will be persisted to local disk in the :ref:`log +By default, checkpoints persist to local disk in the :ref:`log directory ` of each run. -Loading checkpoints -~~~~~~~~~~~~~~~~~~~ +Load checkpoints +~~~~~~~~~~~~~~~~ .. code-block:: python diff --git a/doc/source/train/distributed-xgboost-lightgbm.rst b/doc/source/train/distributed-xgboost-lightgbm.rst index 41ba9e0271952..8e72716b5a887 100644 --- a/doc/source/train/distributed-xgboost-lightgbm.rst +++ b/doc/source/train/distributed-xgboost-lightgbm.rst @@ -1,7 +1,7 @@ .. _train-gbdt-guide: -Distributed XGBoost and LightGBM -================================ +Get Started with XGBoost and LightGBM +===================================== Ray Train has built-in support for XGBoost and LightGBM. @@ -58,19 +58,19 @@ Ray-specific params are passed in through the trainer constructors. .. _train-gbdt-checkpoints: -Saving and Loading XGBoost and LightGBM Checkpoints ---------------------------------------------------- +Save and Load XGBoost and LightGBM Checkpoints +---------------------------------------------- -When a new tree is trained on every boosting round, -it's possible to save a checkpoint to snapshot the training progress so far. +When you train a new tree on every boosting round, +you can save a checkpoint to snapshot the training progress so far. :class:`~ray.train.xgboost.XGBoostTrainer` and :class:`~ray.train.lightgbm.LightGBMTrainer` both implement checkpointing out of the box. These checkpoints can be loaded into memory using static methods :meth:`XGBoostTrainer.get_model ` and :meth:`LightGBMTrainer.get_model `. The only required change is to configure :class:`~ray.train.CheckpointConfig` to set -the checkpointing frequency. For example, the following configuration will -save a checkpoint on every boosting round and will only keep the latest checkpoint: +the checkpointing frequency. For example, the following configuration +saves a checkpoint on every boosting round and only keeps the latest checkpoint: .. literalinclude:: doc_code/key_concepts.py :language: python @@ -79,7 +79,7 @@ save a checkpoint on every boosting round and will only keep the latest checkpoi .. tip:: - Once checkpointing is enabled, you can follow :ref:`this guide ` + Once you enable checkpointing, you can follow :ref:`this guide ` to enable fault tolerance. @@ -90,15 +90,15 @@ The benefit of using Ray Train is that you can seamlessly scale up your training adjusting the :class:`ScalingConfig `. 
.. note:: - Ray Train does not modify or otherwise alter the working - of the underlying XGBoost / LightGBM distributed training algorithms. + Ray Train doesn't modify or otherwise alter the working + of the underlying XGBoost or LightGBM distributed training algorithms. Ray only provides orchestration, data ingest and fault tolerance. For more information on GBDT distributed training, refer to `XGBoost documentation `__ and `LightGBM documentation `__. -Here are some examples for common use-cases: +Following are some examples of common use-cases: .. tab-set:: @@ -138,44 +138,44 @@ Here are some examples for common use-cases: :start-after: __scaling_gpumulti_start__ :end-before: __scaling_gpumulti_end__ - Note that you just have to adjust the number of workers - everything else - will be handled by Ray automatically. + Note that you just have to adjust the number of workers. Ray handles everything else + automatically. -How many remote actors should I use? ------------------------------------- +How many remote actors should you use? +-------------------------------------- This depends on your workload and your cluster setup. Generally there is no inherent benefit of running more than one remote actor per node for CPU-only training. This is because -XGBoost can already leverage multiple CPUs via threading. +XGBoost can already leverage multiple CPUs with threading. -However, there are some cases when you should consider starting +However, in some cases, you should consider some starting more than one actor per node: * For **multi GPU training**, each GPU should have a separate remote actor. Thus, if your machine has 24 CPUs and 4 GPUs, - you will want to start 4 remote actors with 6 CPUs and 1 GPU + you want to start 4 remote actors with 6 CPUs and 1 GPU each * In a **heterogeneous cluster** , you might want to find the `greatest common divisor `_ for the number of CPUs. - E.g. for a cluster with three nodes of 4, 8, and 12 CPUs, respectively, + For example, for a cluster with three nodes of 4, 8, and 12 CPUs, respectively, you should set the number of actors to 6 and the CPUs per actor to 4. How to use GPUs for training? ----------------------------- -Ray Train enables multi GPU training for XGBoost and LightGBM. The core backends -will automatically leverage NCCL2 for cross-device communication. -All you have to do is to start one actor per GPU and set GPU-compatible parameters, -e.g. XGBoost's ``tree_method`` to ``gpu_hist`` (see XGBoost -documentation for more details.) +Ray Train enables multi-GPU training for XGBoost and LightGBM. The core backends +automatically leverage NCCL2 for cross-device communication. +All you have to do is to start one actor per GPU and set GPU-compatible parameters. +For example, XGBoost's ``tree_method`` to ``gpu_hist``. See XGBoost +documentation for more details. -For instance, if you have 2 machines with 4 GPUs each, you will want +For instance, if you have 2 machines with 4 GPUs each, you want to start 8 workers, and set ``use_gpu=True``. There is usually -no benefit in allocating less (e.g. 0.5) or more than one GPU per actor. +no benefit in allocating less (for example, 0.5) or more than one GPU per actor. You should divide the CPUs evenly across actors per machine, so if your machines have 16 CPUs in addition to the 4 GPUs, each actor should have @@ -229,7 +229,7 @@ results. Total size: 5,000,000 KiB * XGBoost DMatrix size: ~10,000,000 KiB -This dataset will fit exactly on this node for training. +This dataset fits exactly on this node for training. 
Note that the DMatrix size might be lower on a 32 bit system.

@@ -239,10 +239,10 @@ Generally, the same memory requirements exist for
 GPU-based training. Additionally, the GPU must have enough memory
 to hold the dataset.

-In the example above, the GPU must have at least
+In the preceding example, the GPU must have at least
 10,000,000 KiB (about 9.6 GiB) memory. However,
-empirically we found that using a ``DeviceQuantileDMatrix``
-seems to show more peak GPU memory usage, possibly
+empirical data shows that using a ``DeviceQuantileDMatrix``
+seems to result in more peak GPU memory usage, possibly
 for intermediate storage when loading data (about 10%).

 **Best practices**

@@ -251,9 +251,9 @@ In order to reduce peak memory usage, consider
 the following suggestions:

-* Store data as ``float32`` or less. More precision is often
-  not needed, and keeping data in a smaller format will
-  help reduce peak memory usage for initial data loading.
+* Store data as ``float32`` or less. You often don't need
+  more precision, and keeping data in a smaller format
+  helps reduce peak memory usage for initial data loading.
 * Pass the ``dtype`` when loading data from CSV. Otherwise,
-  floating point values will be loaded as ``np.float64``
+  floating point values are loaded as ``np.float64``
   per default, increasing peak memory usage by 33%.
diff --git a/doc/source/train/examples.rst b/doc/source/train/examples.rst
index ac1252e92e9e5..0a1811364e67c 100644
--- a/doc/source/train/examples.rst
+++ b/doc/source/train/examples.rst
@@ -22,15 +22,15 @@ Beginner
   * - Lightning
     - :ref:`Train an MNIST Image Classifier with Lightning `
   * - Transformers
-    - :ref:`Fine-tune a Text Classifier on the Yelp Reviews Dataset with HF Transformers `
+    - :ref:`Fine-tune a Text Classifier on the Yelp Reviews Dataset with Hugging Face Transformers `
   * - Accelerate
-    - :ref:`Distributed Data Parallel Training with HF Accelerate `
+    - :ref:`Distributed Data Parallel Training with Hugging Face Accelerate `
   * - DeepSpeed
     - :ref:`Train with DeepSpeed ZeRO-3 `
   * - TensorFlow
     - :ref:`Train with TensorFlow MNIST `
   * - Horovod
-    - :ref:`End-to-end Horovod Training Example `
+    - :ref:`Train with Horovod and PyTorch `

 Intermediate
 ------------
@@ -44,9 +44,9 @@ Intermediate
   * - PyTorch
     - :ref:`Fine-tune of Stable Diffusion with DreamBooth and Ray Train `
   * - Lightning
-    - :ref:`Model Training with PyTorch Lightning and Ray Data `
+    - :ref:`Train with PyTorch Lightning and Ray Data `
   * - Accelerate
-    - :ref:`Fine-tune a text classifier on GLUE Benchmark with HF Accelerate `
+    - :ref:`Fine-tune a text classifier on GLUE Benchmark with Hugging Face Accelerate `

 Advanced
diff --git a/doc/source/train/examples/accelerate/accelerate_example.rst b/doc/source/train/examples/accelerate/accelerate_example.rst
index e082bf11f2a30..086082d090f15 100644
--- a/doc/source/train/examples/accelerate/accelerate_example.rst
+++ b/doc/source/train/examples/accelerate/accelerate_example.rst
@@ -2,8 +2,8 @@
 .. _accelerate_example:

-Distributed Training Example with Hugging Face Accelerate
-=========================================================
+Distributed Training with Hugging Face Accelerate
+=================================================

 This example does distributed data parallel training
 with Hugging Face (HF) Accelerate, Ray Train, and Ray Data.
@@ -19,7 +19,6 @@ Code example

 See also
 --------

-For a tutorial on using Ray Train and HF Accelerate,
-see :ref:`Training with Hugging Face Accelerate `.
+* :ref:`Get Started with Hugging Face Accelerate ` for a tutorial on using Ray Train and HF Accelerate -For more Train examples, see :ref:`Ray Train Examples `. +* :ref:`Ray Train Examples ` for more use cases diff --git a/doc/source/train/examples/deepspeed/deepspeed_example.rst b/doc/source/train/examples/deepspeed/deepspeed_example.rst index b35311546dec9..5ed89be69d7a7 100644 --- a/doc/source/train/examples/deepspeed/deepspeed_example.rst +++ b/doc/source/train/examples/deepspeed/deepspeed_example.rst @@ -2,7 +2,23 @@ .. _deepspeed_example: -DeepSpeed ZeRO-3 Distributed Training Example with Ray Train -============================================================ +Train with DeepSpeed ZeRO-3 and Ray Train +========================================= + +This is an intermediate example that shows how to do distributed training with DeepSpeed ZeRO-3 and Ray Train. +It demonstrates how to use :ref:`Ray Dataset ` with DeepSpeed ZeRO-3 and Ray Train. +If you just want to quickly convert your existing TorchTrainer scripts into Ray Train, you can refer to the :ref:`Train with DeepSpeed `. + + +Code example +------------ .. literalinclude:: /../../python/ray/train/examples/deepspeed/deepspeed_torch_trainer.py + + +See also +-------- + +* :ref:`Ray Train Examples ` for more use cases. + +* :ref:`Get Started with DeepSpeed ` for a tutorial. diff --git a/doc/source/train/examples/horovod/horovod_example.rst b/doc/source/train/examples/horovod/horovod_example.rst index 0593a275be095..5a88fc22c0616 100644 --- a/doc/source/train/examples/horovod/horovod_example.rst +++ b/doc/source/train/examples/horovod/horovod_example.rst @@ -2,7 +2,19 @@ .. _horovod_example: -Horovod Distributed Training Example with PyTorch & Ray Train -============================================================= +Run Horovod Distributed Training with PyTorch and Ray Train +=========================================================== + +This basic example demonstrates how to run Horovod distributed training with PyTorch and Ray Train. + +Code example +------------ .. literalinclude:: /../../python/ray/train/examples/horovod/horovod_example.py + + +See also +-------- + +* :ref:`Get Started with Horovod ` for a tutorial on using Horovod with Ray Train +* :ref:`Ray Train Examples ` for more use cases diff --git a/doc/source/train/examples/lightning/lightning_cola_advanced.ipynb b/doc/source/train/examples/lightning/lightning_cola_advanced.ipynb index 13b2697343559..e2e007d0f7961 100644 --- a/doc/source/train/examples/lightning/lightning_cola_advanced.ipynb +++ b/doc/source/train/examples/lightning/lightning_cola_advanced.ipynb @@ -11,13 +11,13 @@ "\n", ":::{note}\n", "\n", - "This is an intermediate example demonstrates how to use {ref}`Ray Dataset ` with PyTorch Lightning in Ray Train.\n", + "This is an intermediate example demonstrates how to use [Ray Dataset](data) with PyTorch Lightning in Ray Train.\n", "\n", - "If you just want to quickly convert your existing PyTorch Lightning scripts into Ray Train, you can refer to the {ref}`Lightning Quick Start Guide `.\n", + "If you just want to quickly convert your existing PyTorch Lightning scripts into Ray Train, you can refer to the [Lightning Quick Start Guide](train-pytorch-lightning).\n", "\n", ":::\n", "\n", - "In this demo, we will introduce how to fine-tune a text classifier on the [CoLA(The Corpus of Linguistic Acceptability)](https://nyu-mll.github.io/CoLA/) dataset using a pre-trained BERT model. 
In particular, we will:\n", + "This demo introduces how to fine-tune a text classifier on the [CoLA(The Corpus of Linguistic Acceptability)](https://nyu-mll.github.io/CoLA/) dataset using a pre-trained BERT model. In particular, it follows three steps:\n", "- Preprocess the CoLA dataset with Ray Data.\n", "- Define a training function with PyTorch Lightning.\n", "- Launch distributed training with Ray Train's TorchTrainer." @@ -58,7 +58,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Let's start by importing the needed libraries:" + "Start by importing the needed libraries:" ] }, { @@ -90,6 +90,11 @@ "from datasets import load_dataset, load_metric" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + }, { "attachments": {}, "cell_type": "markdown", @@ -97,7 +102,7 @@ "source": [ "## Pre-process CoLA Dataset\n", "\n", - "CoLA is a dataset for binary sentence classification with 10.6K training examples. First, we download the dataset and metrics using the Hugging Face datasets API, and create a Ray Dataset for each split accordingly." + "CoLA is a dataset for binary sentence classification with 10.6K training examples. First, download the dataset and metrics using the Hugging Face datasets API, and create a Ray Dataset for each split accordingly." ] }, { @@ -117,9 +122,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Next, tokenize the input sentences and pad the ID sequence to length 128 using the `bert-base-uncased` tokenizer. The {meth}`map_batches ` will apply this preprocessing function on all data samples." + "Next, tokenize the input sentences and pad the ID sequence to length 128 using the `bert-base-uncased` tokenizer. The {meth}`map_batches ` applies this preprocessing function on all data samples." ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + }, { "cell_type": "code", "execution_count": 5, @@ -148,9 +158,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Define a PyTorch Lightning Model\n", + "## Define a PyTorch Lightning model\n", "\n", - "You don't have to make any change of your `LightningModule` definition. Just copy and paste your code here:" + "You don't have to make any changes to your `LightningModule` definition. Just copy and paste your code here:" ] }, { @@ -214,9 +224,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Define your Training Function\n", + "## Define a training function\n", "\n", - "Define a function that includes all your lightning training logics. This function will be launched by {class}`TorchTrainer ` on each worker in parallel. \n" + "Define a [training function](train-overview-training-function) that includes all of your lightning training logic. {class}`TorchTrainer ` launches this function on each worker in parallel. 
\n" ] }, { @@ -284,11 +294,11 @@ "- {class}`~ray.train.lightning.RayTrainReportCallback`\n", "\n", "\n", - "To ingest Ray Data with Lightning Trainer, we need to take the following 3 steps:\n", + "To ingest Ray Data with Lightning Trainer, follow these three steps:\n", "\n", "- Feed the full Ray dataset to Ray `TorchTrainer` (details in the next section).\n", "- Use {meth}`ray.train.get_dataset_shard ` to fetch the sharded dataset on each worker.\n", - "- Use {meth}`ds.iter_torch_batches ` to create a Ray data Loader for Lightning Trainer.\n", + "- Use {meth}`ds.iter_torch_batches ` to create a Ray data loader for Lightning Trainer.\n", "\n", ":::{seealso}\n", "\n", @@ -318,11 +328,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Distributed training with Ray TorchTrainer\n", + "## Distributed training with Ray TorchTrainer\n", "\n", "Next, define a {class}`TorchTrainer ` to launch your training function on 4 GPU workers. \n", "\n", - "Here, you can pass the full Ray dataset to the `datasets` argument of ``TorchTrainer``. TorchTrainer automatically shards the datasets among multiple workers." + "You can pass the full Ray dataset to the `datasets` argument of ``TorchTrainer``. TorchTrainer automatically shards the datasets among multiple workers." ] }, { @@ -1050,7 +1060,7 @@ "metadata": {}, "source": [ ":::{note}\n", - "Note that we are using Ray Data for data ingestion for faster preprocessing here, but you can also continue to use the native `PyTorch DataLoader` or `LightningDataModule`. See {ref}`this example `. \n", + "Note that this examples uses Ray Data for data ingestion for faster preprocessing, but you can also continue to use the native `PyTorch DataLoader` or `LightningDataModule`. See {ref}`Train a Pytorch Lightning Image Classifier `. \n", "\n", ":::" ] @@ -1087,6 +1097,17 @@ "source": [ "result" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## See also\n", + "\n", + "* [Ray Train Examples](train-examples) for more use cases\n", + "\n", + "* [Ray Train User Guides](train-user-guides) for how-to guides" + ] } ], "metadata": { diff --git a/doc/source/train/examples/lightning/lightning_mnist_example.ipynb b/doc/source/train/examples/lightning/lightning_mnist_example.ipynb index 6686b958f9827..508ad2eeda457 100644 --- a/doc/source/train/examples/lightning/lightning_mnist_example.ipynb +++ b/doc/source/train/examples/lightning/lightning_mnist_example.ipynb @@ -910,9 +910,9 @@ "source": [ "## See also\n", "\n", - "For a tutorial on using Ray Train and PyTorch Lightning, see {ref}`Getting Started with PyTorch Lightning `.\n", + "* {ref}`Getting Started with PyTorch Lightning ` for a tutorial on using Ray Train and PyTorch Lightning \n", "\n", - "For more Train examples, see :ref:`Ray Train Examples `." + "* {ref}`Ray Train Examples ` for more use cases" ] } ], diff --git a/doc/source/train/examples/pytorch/dreambooth_finetuning.rst b/doc/source/train/examples/pytorch/dreambooth_finetuning.rst index 05bd3e37bf8a1..46a1097627219 100644 --- a/doc/source/train/examples/pytorch/dreambooth_finetuning.rst +++ b/doc/source/train/examples/pytorch/dreambooth_finetuning.rst @@ -5,7 +5,11 @@ Fine-tune of Stable Diffusion with DreamBooth and Ray Train =========================================================== -This example shows how to do DreamBooth fine-tuning of a Stable Diffusion model using Ray Train. +This is an intermediate example that shows how to do DreamBooth fine-tuning of a Stable Diffusion model using Ray Train. 
+It demonstrates how to use :ref:`Ray Dataset ` with PyTorch Lightning in Ray Train. +If you just want to quickly convert your existing Transformer scripts into Ray Train, you can refer to the :ref:`Getting Started with Transformers `. + + See the original `DreamBooth project homepage `_ for more details on what this fine-tuning method achieves. .. image:: https://dreambooth.github.io/DreamBooth_files/high_level.png @@ -118,8 +122,8 @@ You can then run this training function with Ray Train's TorchTrainer: :end-at: trainer.fit() :dedent: 4 -Configuring the scale -^^^^^^^^^^^^^^^^^^^^^ +Configure the scale +^^^^^^^^^^^^^^^^^^^ In the TorchTrainer, you can easily configure the scale. The preceding example uses the ``num_workers`` argument to specify the number @@ -291,6 +295,6 @@ For example, for the dog subject, you can try: See also -------- -For more Train examples, see :ref:`Ray Train Examples `. +* :ref:`Ray Train Examples ` for more use cases -For how-to guides, see :ref:`Ray Train User Guides `. \ No newline at end of file +* :ref:`Ray Train User Guides ` for how-to guides \ No newline at end of file diff --git a/doc/source/train/examples/pytorch/torch_fashion_mnist_example.rst b/doc/source/train/examples/pytorch/torch_fashion_mnist_example.rst index c3006634b86d6..860fb745d864e 100644 --- a/doc/source/train/examples/pytorch/torch_fashion_mnist_example.rst +++ b/doc/source/train/examples/pytorch/torch_fashion_mnist_example.rst @@ -15,6 +15,6 @@ Code example See also -------- -For a tutorial on using Ray Train and PyTorch, see :ref:`Getting Started with PyTorch `. +* :ref:`Get Started with PyTorch ` for a tutorial on using Ray Train and PyTorch -For more Train examples, see :ref:`Ray Train Examples `. +* :ref:`Ray Train Examples ` for more use cases diff --git a/doc/source/train/examples/tf/tensorflow_mnist_example.rst b/doc/source/train/examples/tf/tensorflow_mnist_example.rst index 0a03a9462d761..1c7a04a97d016 100644 --- a/doc/source/train/examples/tf/tensorflow_mnist_example.rst +++ b/doc/source/train/examples/tf/tensorflow_mnist_example.rst @@ -2,7 +2,20 @@ .. _tensorflow_mnist_example: -Running Distributed Training of a TensorFlow Model on MNIST with Ray Train -========================================================================== +Training with TensorFlow and Ray Train +====================================== + +This basic example runs distributed training of a TensorFlow model on MNIST with Ray Train. + +Code example +------------ .. literalinclude:: /../../python/ray/train/examples/tf/tensorflow_mnist_example.py + + +See also +-------- + +* :ref:`Ray Train Examples ` for more use cases. + +* :ref:`Distributed Tensorflow & Keras ` for a tutorial. 
\ No newline at end of file diff --git a/doc/source/train/examples/transformers/huggingface_text_classification.ipynb b/doc/source/train/examples/transformers/huggingface_text_classification.ipynb index 15cc0c916d188..9551e0ee543e7 100644 --- a/doc/source/train/examples/transformers/huggingface_text_classification.ipynb +++ b/doc/source/train/examples/transformers/huggingface_text_classification.ipynb @@ -6,7 +6,7 @@ "source": [ "(train_transformers_accelerate_example)=\n", "\n", - "# Fine-tune a 🤗 Transformers model" + "# Fine-tune a Hugging Face Transformers Model" ] }, { @@ -15,9 +15,9 @@ "id": "VaFMt6AIhYbK" }, "source": [ - "This notebook is based on [an official 🤗 notebook - \"How to fine-tune a model on text classification\"](https://github.com/huggingface/notebooks/blob/6ca682955173cc9d36ffa431ddda505a048cbe80/examples/text_classification.ipynb). The main aim of this notebook is to show the process of conversion from vanilla 🤗 to Ray Train without changing the training logic unless necessary.\n", + "This notebook is based on an official Hugging Face (HF) notebook, [How to fine-tune a model on text classification](https://github.com/huggingface/notebooks/blob/6ca682955173cc9d36ffa431ddda505a048cbe80/examples/text_classification.ipynb). This notebook shows the process of conversion from vanilla HF to Ray Train without changing the training logic unless necessary.\n", "\n", - "In this notebook, we will:\n", + "This notebook consists of the following steps:\n", "1. [Set up Ray](#setup)\n", "2. [Load the dataset](#load)\n", "3. [Preprocess the dataset with Ray Data](#preprocess)\n", @@ -31,7 +31,7 @@ "id": "sQbdfyWQhYbO" }, "source": [ - "Uncomment and run the following line in order to install all the necessary dependencies (this notebook is being tested with `transformers==4.19.1`):" + "Uncomment and run the following line to install all the necessary dependencies. (This notebook is being tested with `transformers==4.19.1`.):" ] }, { @@ -60,7 +60,7 @@ "id": "LRdL3kWBhYbQ" }, "source": [ - "We will use `ray.init()` to initialize a local cluster. By default, this cluster will be comprised of only the machine you are running this notebook on. You can also run this notebook on an Anyscale cluster." + "Use `ray.init()` to initialize a local cluster. By default, this cluster contains only the machine you are running this notebook on. You can also run this notebook on an [Anyscale](https://www.anyscale.com/) cluster." ] }, { @@ -88,7 +88,7 @@ "id": "oJiSdWy2hYbR" }, "source": [ - "We can check the resources our cluster is composed of. If you are running this notebook on your local machine or Google Colab, you should see the number of CPU cores and GPUs available on the said machine." + "Check the resources our cluster is composed of. If you are running this notebook on your local machine or Google Colab, you should see the number of CPU cores and GPUs available on the your machine." ] }, { @@ -127,9 +127,9 @@ "id": "uS6oeJELhYbS" }, "source": [ - "In this notebook, we will see how to fine-tune a [🤗 Transformers](https://github.com/huggingface/transformers) model for one of the text classification task of the [GLUE Benchmark](https://gluebenchmark.com/). We will be running the training using Ray Train.\n", + "This notebook fine-tunes a [HF Transformers](https://github.com/huggingface/transformers) model for one of the text classification task of the [GLUE Benchmark](https://gluebenchmark.com/). 
It runs the training using Ray Train.\n", "\n", - "You can change those two variables to control whether the training (which we will get to later) uses CPUs or GPUs, and how many workers should be spawned. Each worker will claim one CPU or GPU. Make sure not to request more resources than the resources present! By default, we will run the training with one GPU worker." + "You can change these two variables to control whether the training, which happens later, uses CPUs or GPUs, and how many workers to spawn. Each worker claims one CPU or GPU. Make sure to not request more resources than the resources present. By default, the training runs with one GPU worker." ] }, { @@ -142,7 +142,7 @@ "outputs": [], "source": [ "use_gpu = True # set this to False to run on CPUs\n", - "num_workers = 1 # set this to number of GPUs/CPUs you want to use" + "num_workers = 1 # set this to number of GPUs or CPUs you want to use" ] }, { @@ -151,7 +151,7 @@ "id": "rEJBSTyZIrIb" }, "source": [ - "## Fine-tuning a model on a text classification task" + "## Fine-tune a model on a text classification task" ] }, { @@ -160,9 +160,9 @@ "id": "kTCFado4IrIc" }, "source": [ - "The GLUE Benchmark is a group of nine classification tasks on sentences or pairs of sentences. If you would like to learn more, refer to the [original notebook](https://github.com/huggingface/notebooks/blob/6ca682955173cc9d36ffa431ddda505a048cbe80/examples/text_classification.ipynb).\n", + "The GLUE Benchmark is a group of nine classification tasks on sentences or pairs of sentences. To learn more, see the [original notebook](https://github.com/huggingface/notebooks/blob/6ca682955173cc9d36ffa431ddda505a048cbe80/examples/text_classification.ipynb).\n", "\n", - "Each task is named by its acronym, with `mnli-mm` standing for the mismatched version of MNLI (so same training set as `mnli` but different validation and test sets):" + "Each task has a name that is its acronym, with `mnli-mm` to indicate that it is a mismatched version of MNLI. Each one has the same training set as `mnli` but different validation and test sets." ] }, { @@ -194,7 +194,7 @@ "id": "4RRkXuteIrIh" }, "source": [ - "This notebook is built to run on any of the tasks in the list above, with any model checkpoint from the [Model Hub](https://huggingface.co/models) as long as that model has a version with a classification head. Depending on your model and the GPU you are using, you might need to adjust the batch size to avoid out-of-memory errors. Set those three parameters, then the rest of the notebook should run smoothly:" + "This notebook runs on any of the tasks in the list above, with any model checkpoint from the [Model Hub](https://huggingface.co/models) as long as that model has a version with a classification head. Depending on the model and the GPU you are using, you might need to adjust the batch size to avoid out-of-memory errors. Set these three parameters, and the rest of the notebook should run smoothly:" ] }, { @@ -226,11 +226,11 @@ "id": "W7QYTpxXIrIl" }, "source": [ - "We will use the [🤗 Datasets](https://github.com/huggingface/datasets) library to download the data and get the metric we need to use for evaluation (to compare our model to the benchmark). This can be easily done with the functions `load_dataset` and `load_metric`.\n", + "Use the [HF Datasets](https://github.com/huggingface/datasets) library to download the data and get the metric to use for evaluation and to compare your model to the benchmark. 
You can do this comparison easily with the `load_dataset` and `load_metric` functions.\n", "\n", - "Apart from `mnli-mm` being a special code, we can directly pass our task name to those functions.\n", + "Apart from `mnli-mm` being special code, you can directly pass the task name to those functions.\n", "\n", - "We will run the normal 🤗 Datasets code to load the dataset from the Hub." + "Run the normal HF Datasets code to load the dataset from the Hub." ] }, { @@ -281,7 +281,7 @@ "id": "RzfPtOMoIrIu" }, "source": [ - "The `dataset` object itself is [`DatasetDict`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasetdict), which contains one key for the training, validation, and test set (with more keys for the mismatched validation and test set in the special case of `mnli`)." + "The `dataset` object itself is a [`DatasetDict`](https://huggingface.co/docs/datasets/package_reference/main_classes.html#datasetdict), which contains one key for the training, validation, and test set, with more keys for the mismatched validation and test set in the special case of `mnli`." ] }, { @@ -299,12 +299,12 @@ "id": "YVx71GdAIrJH" }, "source": [ - "Before we can feed those texts to our model, we need to preprocess them. This is done by a 🤗 Transformers' `Tokenizer`, which will (as the name indicates) tokenize the inputs (including converting the tokens to their corresponding IDs in the pretrained vocabulary) and put it in a format the model expects, as well as generate the other inputs that model requires.\n", + "Before you can feed these texts to the model, you need to preprocess them. Preprocess them with a HF Transformers' `Tokenizer`, which tokenizes the inputs, including converting the tokens to their corresponding IDs in the pretrained vocabulary, and puts them in a format the model expects. It also generates the other inputs that the model requires.\n", "\n", - "To do all of this, we instantiate our tokenizer with the `AutoTokenizer.from_pretrained` method, which will ensure that:\n", + "To do all of this preprocessing, instantiate your tokenizer with the `AutoTokenizer.from_pretrained` method, which ensures that you:\n", "\n", - "- we get a tokenizer that corresponds to the model architecture we want to use,\n", - "- we download the vocabulary used when pretraining this specific checkpoint." + "- Get a tokenizer that corresponds to the model architecture you want to use.\n", + "- Download the vocabulary used when pretraining this specific checkpoint." ] }, { @@ -332,7 +332,7 @@ "id": "Vl6IidfdIrJK" }, "source": [ - "We pass along `use_fast=True` to the call above to use one of the fast tokenizers (backed by Rust) from the 🤗 Tokenizers library. Those fast tokenizers are available for almost all models, but if you got an error with the previous call, remove that argument." + "Pass `use_fast=True` to the preceding call to use one of the fast tokenizers, backed by Rust, from the HF Tokenizers library. These fast tokenizers are available for almost all models, but if you get an error with the previous call, remove the argument." ] }, { @@ -341,7 +341,7 @@ "id": "qo_0B1M2IrJM" }, "source": [ - "To preprocess our dataset, we will thus need the names of the columns containing the sentence(s). The following dictionary keeps track of the correspondence task to column names:" + "To preprocess the dataset, you need the names of the columns containing the sentence(s). 
The following dictionary keeps track of the correspondence task to column names:" ] }, { @@ -373,7 +373,7 @@ "id": "256fOuzjhYbY" }, "source": [ - "Instead of using 🤗 Dataset objects directly, we will convert them to [Ray Data](data). Both are backed by Arrow tables, so the conversion is straightforward. We will use the built-in {meth}`~ray.data.from_huggingface` function." + "Instead of using HF Dataset objects directly, convert them to [Ray Data](data). Arrow tables back both of them, so the conversion is straightforward. Use the built-in {meth}`~ray.data.from_huggingface` function." ] }, { @@ -425,7 +425,7 @@ "id": "2C0hcmp9IrJQ" }, "source": [ - "We can then write the function that will preprocess our samples. We just feed them to the `tokenizer` with the argument `truncation=True`. This will ensure that an input longer than what the model selected can handle will be truncated and pad to the longest sequence in the batch." + "You can then write the function that preprocesses the samples. Feed them to the `tokenizer` with the argument `truncation=True`. This configuration ensures that the `tokenizer` truncates and pads to the longest sequence in the batch, any input longer than what the model selected can handle." ] }, { @@ -484,11 +484,11 @@ "id": "FBiW8UpKIrJW" }, "source": [ - "Now that our data is ready, we can download the pretrained model and fine-tune it.\n", + "Now that the data is ready, download the pretrained model and fine-tune it.\n", "\n", - "Since all of our tasks involve sentence classification, we will use the `AutoModelForSequenceClassification` class. We will not delve into the specifics of each individual training component. For more information, see the [original notebook](https://github.com/huggingface/notebooks/blob/6ca682955173cc9d36ffa431ddda505a048cbe80/examples/text_classification.ipynb). The tokenizer used is the same one we used to encode the dataset previously.\n", + "Because all of the tasks involve sentence classification, use the `AutoModelForSequenceClassification` class. For more specifics about each individual training component, see the [original notebook](https://github.com/huggingface/notebooks/blob/6ca682955173cc9d36ffa431ddda505a048cbe80/examples/text_classification.ipynb). The original notebook uses the same tokenizer used to encode the dataset in this notebook's preceding example.\n", "\n", - "The main difference when using Ray Train is that we need to define our training logic as a function (`train_func`). This function will be passed to the {class}`~ray.train.torch.TorchTrainer` and will run on every Ray worker. The training will then proceed using PyTorch DDP.\n", + "The main difference when using Ray Train is that you need to define the training logic as a function (`train_func`). You pass this [training function](train-overview-training-function) to the {class}`~ray.train.torch.TorchTrainer` to on every Ray worker. The training then proceeds using PyTorch DDP.\n", "\n", "\n", "```{note}\n", @@ -622,7 +622,7 @@ "id": "CdzABDVcIrJg" }, "source": [ - "With our `train_func` complete, we can now instantiate the {class}`~ray.train.torch.TorchTrainer`. Aside from the function, we set the `scaling_config`, controlling the amount of workers and resources used, and the `datasets` we will use for training and evaluation." + "With your `train_func` complete, you can now instantiate the {class}`~ray.train.torch.TorchTrainer`. 
Aside from calling the function, set the `scaling_config`, which controls the amount of workers and resources used, and the `datasets` to use for training and evaluation." ] }, { @@ -660,7 +660,7 @@ "id": "XvS136zKhYba" }, "source": [ - "Finally, we call the `fit` method to start training with Ray Train. We will save the `Result` object to a variable so we can access metrics and checkpoints." + "Finally, call the `fit` method to start training with Ray Train. Save the `Result` object to a variable so you can access metrics and checkpoints." ] }, { @@ -1061,9 +1061,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "If we would like to tune any hyperparameters of the model, we can do so by simply passing our `TorchTrainer` into a `Tuner` and defining the search space.\n", + "To tune any hyperparameters of the model, pass your `TorchTrainer` into a `Tuner` and define the search space.\n", "\n", - "We can also take advantage of the advanced search algorithms and schedulers provided by Ray Tune. In this example, we will use an `ASHAScheduler` to aggresively terminate underperforming trials." + "You can also take advantage of the advanced search algorithms and schedulers from Ray Tune. This example uses an `ASHAScheduler` to aggresively terminate underperforming trials." ] }, { @@ -1744,7 +1744,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "We can view the results of the tuning run as a dataframe, and obtain the best result." + "View the results of the tuning run as a dataframe, and find the best result." ] }, { @@ -1969,11 +1969,11 @@ "id": "mS8PId_NhYbb" }, "source": [ - "To be able to share your model with the community, there are a few more steps to follow.\n", + "To share the model with the community, a few more steps follow.\n", "\n", - "We have conducted the training on the Ray cluster, but share the model from the local enviroment - this will allow us to easily authenticate.\n", + "You conducted the training on the Ray cluster, but want share the model from the local enviroment. This configuration allows you to easily authenticate.\n", "\n", - "First you have to store your authentication token from the Hugging Face website (sign up [here](https://huggingface.co/join) if you haven't already!) then execute the following cell and input your username and password:" + "First, store your authentication token from the Hugging Face website. Sign up [here](https://huggingface.co/join) if you haven't already. Then execute the following cell and input your username and password:" ] }, { @@ -2021,7 +2021,7 @@ "id": "5fr6E0e8hYbb" }, "source": [ - "Now, load the model and tokenizer locally, and recreate the 🤗 Transformers `Trainer`:" + "Now, load the model and tokenizer locally, and recreate the HF Transformers `Trainer`:" ] }, { @@ -2047,9 +2047,16 @@ "id": "tgV2xKfFhYbc" }, "source": [ - "You can now upload the result of the training to the Hub, just execute this instruction:" + "You can now upload the result of the training to the Hub. Execute this instruction:" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, { "cell_type": "code", "execution_count": null, @@ -2070,7 +2077,7 @@ "id": "UL-Boc4dhYbc" }, "source": [ - "You can now share this model with all your friends, family, favorite pets: they can all load it with the identifier `\"your-username/the-name-you-picked\"` so for instance:\n", + "You can now share this model. Others can load it with the identifier `\"your-username/the-name-you-picked\"`. 
For example:\n", "\n", "```python\n", "from transformers import AutoModelForSequenceClassification\n", @@ -2083,9 +2090,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Next steps\n", + "## See also\n", "\n", - "- {ref}`End-to-end: Offline Batch Inference `" + "* {ref}`Ray Train Examples ` for more use cases\n", + "* {ref}`Ray Train User Guides ` for how-to guides\n" ] } ], diff --git a/doc/source/train/examples/transformers/transformers_torch_trainer_basic.rst b/doc/source/train/examples/transformers/transformers_torch_trainer_basic.rst index 587fa1673dda7..c7259be27f33e 100644 --- a/doc/source/train/examples/transformers/transformers_torch_trainer_basic.rst +++ b/doc/source/train/examples/transformers/transformers_torch_trainer_basic.rst @@ -16,7 +16,6 @@ Code example See also -------- -For a tutorial on using Ray Train and Transformers, -see :ref:`Getting Started with Hugging Face Transformers `. +* :ref:`Get Started with Hugging Face Transformers ` for a tutorial -For more Train examples, see :ref:`Ray Train Examples `. +* :ref:`Ray Train Examples ` for more use cases diff --git a/doc/source/train/getting-started-pytorch-lightning.rst b/doc/source/train/getting-started-pytorch-lightning.rst index d9ea9ed540ffb..8349c25324463 100644 --- a/doc/source/train/getting-started-pytorch-lightning.rst +++ b/doc/source/train/getting-started-pytorch-lightning.rst @@ -1,21 +1,21 @@ .. _train-pytorch-lightning: -Getting Started with PyTorch Lightning -====================================== +Get Started with PyTorch Lightning +================================== This tutorial walks through the process of converting an existing PyTorch Lightning script to use Ray Train. Learn how to: -1. Configure your Lightning Trainer so that it runs distributed with Ray and is placed on the correct CPU/GPU device. -2. Configure your training function to report metrics and save checkpoints. -3. Configure scale and CPU/GPU resource requirements for your training job. -4. Launch your distributed training job with a :class:`~ray.train.torch.TorchTrainer`. +1. Configure the Lightning Trainer so that it runs distributed with Ray and on the correct CPU or GPU device. +2. Configure the training function to report metrics and save checkpoints. +3. Configure scale and CPU or GPU resource requirements for a training job. +4. Launch a distributed training job with a :class:`~ray.train.torch.TorchTrainer`. Quickstart ---------- -For reference, the final code follows: +For reference, the final code is as follows: .. code-block:: python @@ -147,11 +147,11 @@ Compare a PyTorch Lightning training script with and without Ray Train. result = trainer.fit() -Set up the training function ----------------------------- +Set up a training function +-------------------------- First, update your training code to support distributed training. -Begin by wrapping your code in a function: +Begin by wrapping your code in a :ref:`training function `: .. code-block:: python @@ -189,12 +189,12 @@ make a few changes to your Lightning Trainer definition. trainer.fit(model, datamodule=datamodule) -We now go over each change. +The following sections discuss each change. -Configuring distributed strategy -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Configure the distributed strategy +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Ray Train offers several subclassed distributed strategies for Lightning. +Ray Train offers several sub-classed distributed strategies for Lightning. These strategies retain the same argument list as their base strategy classes. 
Internally, they configure the root device and the distributed sampler arguments. @@ -220,11 +220,11 @@ sampler arguments. ) ... -Configuring Ray cluster environment plugin -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Configure the Ray cluster environment plugin +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Ray Train also provides :class:`~ray.train.lightning.RayLightningEnvironment` -as a specification for Ray Cluster. This utility class configures the worker's +Ray Train also provides a :class:`~ray.train.lightning.RayLightningEnvironment` class +as a specification for the Ray Cluster. This utility class configures the worker's local, global, and node rank and world size. @@ -245,8 +245,8 @@ local, global, and node rank and world size. ... -Configuring parallel devices -^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Configure parallel devices +^^^^^^^^^^^^^^^^^^^^^^^^^^ In addition, Ray TorchTrainer has already configured the correct ``CUDA_VISIBLE_DEVICES`` for you. One should always use all available @@ -270,8 +270,8 @@ GPUs by setting ``devices="auto"`` and ``acelerator="auto"``. -Reporting checkpoints and metrics -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Report checkpoints and metrics +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ To persist your checkpoints and monitor training progress, add a :class:`ray.train.lightning.RayTrainReportCallback` utility callback to your Trainer. @@ -293,10 +293,10 @@ To persist your checkpoints and monitor training progress, add a Reporting metrics and checkpoints to Ray Train enables you to support :ref:`fault-tolerant training ` and :ref:`hyperparameter optimization `. -Note that the :class:`ray.train.lightning.RayTrainReportCallback` only provides a simple implementation, and can be :ref:`further customized `. +Note that the :class:`ray.train.lightning.RayTrainReportCallback` class only provides a simple implementation, and can be :ref:`further customized `. -Preparing your Lightning Trainer -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Prepare your Lightning Trainer +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Finally, pass your Lightning Trainer into :meth:`~ray.train.lightning.prepare_trainer` to validate @@ -315,8 +315,8 @@ your configurations. ... -Configuring scale and GPUs ---------------------------- +Configure scale and GPUs +------------------------ Outside of your training function, create a :class:`~ray.train.ScalingConfig` object to configure: @@ -331,8 +331,8 @@ Outside of your training function, create a :class:`~ray.train.ScalingConfig` ob For more details, see :ref:`train_scaling_config`. -Launching your training job ---------------------------- +Launch a training job +--------------------- Tying this all together, you can now launch a distributed training job with a :class:`~ray.train.torch.TorchTrainer`. @@ -344,12 +344,12 @@ with a :class:`~ray.train.torch.TorchTrainer`. trainer = TorchTrainer(train_func, scaling_config=scaling_config) result = trainer.fit() -Please also refer to :ref:`train-run-config` for more configuration options for `TorchTrainer`. +See :ref:`train-run-config` for more configuration options for `TorchTrainer`. -Accessing training results --------------------------- +Access training results +----------------------- -After training completes, a :class:`~ray.train.Result` object will be returned which contains +After training completes, Ray Train returns a :class:`~ray.train.Result` object, which contains information about the training run, including the metrics and checkpoints reported during training. .. 
code-block:: python @@ -368,7 +368,7 @@ After you have converted your PyTorch Lightning training script to use Ray Train * See :ref:`User Guides ` to learn more about how to perform specific tasks. * Browse the :ref:`Examples ` for end-to-end examples of how to use Ray Train. -* Dive into the :ref:`API Reference ` for more details on the classes and methods used in this tutorial. +* Consult the :ref:`API Reference ` for more details on the classes and methods from this tutorial. Version Compatibility --------------------- @@ -379,21 +379,20 @@ Earlier versions aren't prohibited but may result in unexpected issues. If you r .. _lightning-trainer-migration-guide: -``LightningTrainer`` Migration Guide ------------------------------------- +LightningTrainer Migration Guide +-------------------------------- -The `LightningTrainer` was added in Ray 2.4, and exposes a +Ray 2.4 introduced the `LightningTrainer`, and exposed a `LightningConfigBuilder` to define configurations for `pl.LightningModule` and `pl.Trainer`. It then instantiates the model and trainer objects and runs a pre-defined -training loop in a black box. - +training function in a black box. This version of the LightningTrainer API was constraining and limited -the users' ability to manage the training functionality. +your ability to manage the training functionality. -Ray 2.7 introduces the newly unified :class:`~ray.train.torch.TorchTrainer` API, which offers +Ray 2.7 introduced the newly unified :class:`~ray.train.torch.TorchTrainer` API, which offers enhanced transparency, flexibility, and simplicity. This API is more aligned with standard PyTorch Lightning scripts, ensuring users have better control over their native Lightning code. diff --git a/doc/source/train/getting-started-pytorch.rst b/doc/source/train/getting-started-pytorch.rst index 08b78c6a8de43..4ffa26fa8ade7 100644 --- a/doc/source/train/getting-started-pytorch.rst +++ b/doc/source/train/getting-started-pytorch.rst @@ -1,22 +1,22 @@ .. _train-pytorch: -Getting Started with PyTorch -============================ +Get Started with PyTorch +======================== This tutorial walks through the process of converting an existing PyTorch script to use Ray Train. Learn how to: 1. Configure a model to run distributed and on the correct CPU/GPU device. -2. Configure a dataloader to shard data across the workers and place data on the correct CPU/GPU device. +2. Configure a dataloader to shard data across the workers and place data on the correct CPU or GPU device. 3. Configure a training function to report metrics and save checkpoints. -4. Configure scale and CPU/GPU resource requirements for a training job. +4. Configure scale and CPU or GPU resource requirements for a training job. 5. Launch a distributed training job with a :class:`~ray.train.torch.TorchTrainer` class. Quickstart ---------- -For reference, the final code follows: +For reference, the final code is as follows: .. code-block:: python @@ -131,11 +131,11 @@ Compare a PyTorch training script with and without Ray Train. trainer = TorchTrainer(train_func, scaling_config=scaling_config) result = trainer.fit() -Setting up your training function ---------------------------------- +Set up a training function +-------------------------- First, update your training code to support distributed training. -You can begin by wrapping your code in a :ref:`training function `: +Begin by wrapping your code in a :ref:`training function `: .. 
code-block:: python @@ -144,12 +144,12 @@ You can begin by wrapping your code in a :ref:`training function `. -Note that the :class:`ray.train.huggingface.transformers.RayTrainReportCallback` only provides a simple implementation, and can be :ref:`further customized `. +Note that the :class:`ray.train.huggingface.transformers.RayTrainReportCallback` only provides a simple implementation, and you can :ref:`further customize ` it. -Preparing your Transformers Trainer -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Prepare a Transformers Trainer +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Finally, pass your Transformers Trainer into :meth:`~ray.train.huggingface.transformers.prepare_trainer` to validate @@ -239,8 +239,8 @@ your configurations and enable Ray Data Integration. ... -Configuring scale and GPUs ---------------------------- +Configure scale and GPUs +------------------------ Outside of your training function, create a :class:`~ray.train.ScalingConfig` object to configure: @@ -255,8 +255,8 @@ Outside of your training function, create a :class:`~ray.train.ScalingConfig` ob For more details, see :ref:`train_scaling_config`. -Launching your training job ---------------------------- +Launch a training job +--------------------- Tying this all together, you can now launch a distributed training job with a :class:`~ray.train.torch.TorchTrainer`. @@ -270,8 +270,8 @@ with a :class:`~ray.train.torch.TorchTrainer`. Refer to :ref:`train-run-config` for more configuration options for `TorchTrainer`. -Accessing training results --------------------------- +Access training results +----------------------- After training completes, a :class:`~ray.train.Result` object is returned which contains information about the training run, including the metrics and checkpoints reported during training. @@ -297,15 +297,15 @@ After you have converted your Hugging Face Transformers training script to use R .. _transformers-trainer-migration-guide: -``TransformersTrainer`` Migration Guide ---------------------------------------- +TransformersTrainer Migration Guide +----------------------------------- -The `TransformersTrainer` was added in Ray 2.1. It exposes a `trainer_init_per_worker` interface -to define `transformers.Trainer`, then runs a pre-defined training loop in a black box. +Ray 2.1 introduced the `TransformersTrainer`, which exposes a `trainer_init_per_worker` interface +to define `transformers.Trainer`, then runs a pre-defined training function in a black box. -Ray 2.7 introduces the newly unified :class:`~ray.train.torch.TorchTrainer` API, -which offers enhanced transparency, flexibility, and simplicity. This API is more aligned -with standard Hugging Face Transformers scripts, ensuring users have better control over their +Ray 2.7 introduced the newly unified :class:`~ray.train.torch.TorchTrainer` API, +which offers enhanced transparency, flexibility, and simplicity. This API aligns more +with standard Hugging Face Transformers scripts, ensuring that you have better control over your native Transformers training code. diff --git a/doc/source/train/horovod.rst b/doc/source/train/horovod.rst index 1165eaccd5274..6632c8f9164a0 100644 --- a/doc/source/train/horovod.rst +++ b/doc/source/train/horovod.rst @@ -1,9 +1,12 @@ -Horovod -======= + +.. _train-horovod: + +Get Started with Horovod +======================== Ray Train configures the Horovod environment and Rendezvous server for you, allowing you to run your ``DistributedOptimizer`` training -script. See `Horovod documentation `_ +script. 
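As a hedged sketch of what such a ``DistributedOptimizer`` training function can look like with PyTorch (the model, data, and hyperparameters below are placeholders, not the example script this guide includes):

.. code-block:: python

    import horovod.torch as hvd
    import torch
    import torch.nn as nn

    import ray.train


    def train_loop_per_worker(config):
        hvd.init()  # Ray Train has already set up the Horovod environment.

        # Placeholder model and optimizer; replace with your own.
        model = nn.Linear(10, 1)
        optimizer = torch.optim.SGD(model.parameters(), lr=config.get("lr", 1e-3))

        # Wrap the optimizer and synchronize initial state across workers.
        optimizer = hvd.DistributedOptimizer(
            optimizer, named_parameters=model.named_parameters()
        )
        hvd.broadcast_parameters(model.state_dict(), root_rank=0)
        hvd.broadcast_optimizer_state(optimizer, root_rank=0)

        for epoch in range(config.get("epochs", 3)):
            # Placeholder random batch; replace with your real dataloader.
            inputs, targets = torch.randn(32, 10), torch.randn(32, 1)
            loss = nn.functional.mse_loss(model(inputs), targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            ray.train.report({"epoch": epoch, "loss": loss.item()})

You would pass this function as ``train_loop_per_worker`` to the ``HorovodTrainer`` shown in the next section.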
See the `Horovod documentation `_ for more information. Quickstart @@ -13,10 +16,10 @@ Quickstart -Updating your training function -------------------------------- +Update your training function +----------------------------- -First, update your training function to support distributed +First, update your :ref:`training function ` to support distributed training. If you have a training function that already runs with the `Horovod Ray @@ -27,11 +30,11 @@ To onboard onto Horovod, visit the `Horovod guide `_. -Creating a :class:`~ray.train.horovod.HorovodTrainer` ------------------------------------------------------ +Create a HorovodTrainer +----------------------- -``Trainer``\s are the primary Ray Train classes that are used to manage state and -execute training. For Horovod, we use a :class:`~ray.train.horovod.HorovodTrainer` +``Trainer``\s are the primary Ray Train classes to use to manage state and +execute training. For Horovod, use a :class:`~ray.train.horovod.HorovodTrainer` that you can setup like this: .. code-block:: python @@ -45,7 +48,7 @@ that you can setup like this: scaling_config=ScalingConfig(use_gpu=use_gpu, num_workers=2) ) -When training with Horovod, we will always use a HorovodTrainer, +When training with Horovod, always use a HorovodTrainer, irrespective of the training framework, for example, PyTorch or TensorFlow. To customize the backend setup, you can pass a @@ -64,8 +67,8 @@ To customize the backend setup, you can pass a For more configurability, see the :py:class:`~ray.train.data_parallel_trainer.DataParallelTrainer` API. -Running your training function ------------------------------- +Run a training function +----------------------- With a distributed training function and a Ray Train ``Trainer``, you are now ready to start training. @@ -77,6 +80,7 @@ ready to start training. Further reading --------------- + Ray Train's :class:`~ray.train.horovod.HorovodTrainer` replaces the distributed communication backend of the native libraries with its own implementation. Thus, the remaining integration points remain the same. If you're using Horovod @@ -85,6 +89,8 @@ refer to the respective guides for further configuration and information. If you are implementing your own Horovod-based training routine without using any of -the training libraries, we still encourage you to read through the -:ref:`User Guides `, as many of the contents are applicable -to generic use cases and can be easily adapted. +the training libraries, read through the +:ref:`User Guides `, as you can apply much of the content +to generic use cases and adapt them easily. + + diff --git a/doc/source/train/huggingface-accelerate.rst b/doc/source/train/huggingface-accelerate.rst index 93dc096dda3ed..320fff684c893 100644 --- a/doc/source/train/huggingface-accelerate.rst +++ b/doc/source/train/huggingface-accelerate.rst @@ -1,7 +1,7 @@ .. _train-hf-accelerate: -Training with Hugging Face Accelerate -===================================== +Get Started with Hugging Face Accelerate +======================================== The :class:`~ray.train.torch.TorchTrainer` can help you easily launch your `Accelelate `_ training across a distributed Ray cluster. @@ -50,11 +50,11 @@ You only need to run your existing training code with a TorchTrainer. You can ex Model and data preparation for distributed training is completely handled by the `Accelerator `_ object and its `Accelerator.prepare() `_ method. 
- Unlike with native PyTorch, PyTorch Lightning, or HuggingFace Transformers, you do **not** call any additional Ray Train utilities + Unlike with native PyTorch, PyTorch Lightning, or HuggingFace Transformers, **don't** call any additional Ray Train utilities like :meth:`~ray.train.torch.prepare_model` or :meth:`~ray.train.torch.prepare_data_loader` in your training function. -Configuring Accelerate ------------------------ +Configure Accelerate +-------------------- In Ray Train, you can set configurations through the `accelerate.Accelerator `_ object in your training function. Below are starter examples for configuring Accelerate. @@ -201,8 +201,8 @@ You may also find these user guides helpful: - :ref:`How to use Ray Data with Ray Train ` -`AccelerateTrainer` Migration Guide ------------------------------------ +AccelerateTrainer Migration Guide +--------------------------------- Before Ray 2.7, Ray Train's :class:`AccelerateTrainer ` API was the recommended way to run Accelerate code. As a subclass of :class:`TorchTrainer `, @@ -210,7 +210,7 @@ the AccelerateTrainer takes in a configuration file generated by ``accelerate co Aside from that, the functionality of ``AccelerateTrainer`` is identical to ``TorchTrainer``. However, this caused confusion around whether this was the *only* way to run Accelerate code. -Because the full Accelerate functionality can be expressed with the ``Accelerator`` and ``TorchTrainer`` combination, the ``AccelerateTrainer`` will be deprecated in Ray 2.8, +Because you can express the full Accelerate functionality with the ``Accelerator`` and ``TorchTrainer`` combination, the plan is to deprecate the ``AccelerateTrainer`` in Ray 2.8, and it's recommend to run your Accelerate code directly with ``TorchTrainer``. diff --git a/doc/source/train/more-frameworks.rst b/doc/source/train/more-frameworks.rst index 1f2dd89ff64ec..dce706c1d5368 100644 --- a/doc/source/train/more-frameworks.rst +++ b/doc/source/train/more-frameworks.rst @@ -29,7 +29,7 @@ More Frameworks .. button-ref:: distributed-tensorflow-keras - TensorFlow & Keras + TensorFlow and Keras .. grid-item-card:: :img-top: /images/xgboost_logo.png @@ -37,7 +37,7 @@ More Frameworks .. button-ref:: distributed-xgboost-lightgbm - XGBoost & LightGBM + XGBoost and LightGBM .. grid-item-card:: :img-top: /images/horovod.png From 7904767070ad1e962ba195d3d5411fa23879172c Mon Sep 17 00:00:00 2001 From: angelinalg <122562471+angelinalg@users.noreply.github.com> Date: Tue, 12 Sep 2023 15:46:57 -0700 Subject: [PATCH 05/13] fixed missing inline end string Signed-off-by: angelinalg <122562471+angelinalg@users.noreply.github.com> --- doc/source/train/distributed-tensorflow-keras.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/source/train/distributed-tensorflow-keras.rst b/doc/source/train/distributed-tensorflow-keras.rst index c8ea915a8d019..01687b916d267 100644 --- a/doc/source/train/distributed-tensorflow-keras.rst +++ b/doc/source/train/distributed-tensorflow-keras.rst @@ -27,7 +27,7 @@ Quickstart Update your training function ----------------------------- -First, update your :ref:`training function to support distributed +First, update your :ref:`training function ` to support distributed training. 
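The hunk above touches the TensorFlow/Keras guide's instruction to update your training function for distributed training. As a hedged sketch of what such a function can look like with ``MultiWorkerMirroredStrategy`` (the model and data are placeholders, and this is not the example file the guide itself includes):

.. code-block:: python

    import numpy as np
    import tensorflow as tf

    import ray.train
    from ray.train import ScalingConfig
    from ray.train.tensorflow import TensorflowTrainer


    def train_func(config):
        # Ray Train's TensorFlow backend sets up TF_CONFIG on each worker,
        # so the strategy can discover its peers without extra setup.
        strategy = tf.distribute.MultiWorkerMirroredStrategy()
        with strategy.scope():
            # Placeholder Keras model; replace with your own.
            model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])
            model.compile(optimizer="sgd", loss="mse")

        # Placeholder data; replace with your real dataset.
        x, y = np.random.rand(256, 10), np.random.rand(256, 1)
        history = model.fit(x, y, epochs=config.get("epochs", 3), verbose=0)
        ray.train.report({"loss": history.history["loss"][-1]})


    trainer = TensorflowTrainer(
        train_func,
        train_loop_config={"epochs": 3},
        scaling_config=ScalingConfig(num_workers=2),
    )
    result = trainer.fit()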
From 828d20295cee21b3c874c6e5591194ba6d2555f3 Mon Sep 17 00:00:00 2001 From: angelinalg <122562471+angelinalg@users.noreply.github.com> Date: Tue, 12 Sep 2023 16:07:24 -0700 Subject: [PATCH 06/13] trying to fix premerge error Signed-off-by: angelinalg <122562471+angelinalg@users.noreply.github.com> --- python/ray/train/torch/torch_trainer.py | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/python/ray/train/torch/torch_trainer.py b/python/ray/train/torch/torch_trainer.py index 735c9fad19665..072fdfa08c19f 100644 --- a/python/ray/train/torch/torch_trainer.py +++ b/python/ray/train/torch/torch_trainer.py @@ -25,9 +25,9 @@ class TorchTrainer(DataParallelTrainer): 4. Runs the input ``train_loop_per_worker(train_loop_config)`` on all workers. - For more details, see the :ref:`PyTorch User Guide `, - :ref:`PyTorch Lightning User Guide `, - or :ref:`PyTorch User Guide `. + For more details, see the :ref:`PyTorch Guide `, + :ref:`PyTorch Lightning Guide `, + or :ref:`Hugging Face Transformers Guide `. Example: From 6446a30f2ae315a6ae0379cb745512ac7d0ecdf2 Mon Sep 17 00:00:00 2001 From: angelinalg <122562471+angelinalg@users.noreply.github.com> Date: Tue, 12 Sep 2023 16:21:14 -0700 Subject: [PATCH 07/13] trying to fix premerge error again with removing trailing spaces Signed-off-by: angelinalg <122562471+angelinalg@users.noreply.github.com> --- python/ray/train/torch/torch_trainer.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/python/ray/train/torch/torch_trainer.py b/python/ray/train/torch/torch_trainer.py index 072fdfa08c19f..e3f593ae89c2e 100644 --- a/python/ray/train/torch/torch_trainer.py +++ b/python/ray/train/torch/torch_trainer.py @@ -25,8 +25,8 @@ class TorchTrainer(DataParallelTrainer): 4. Runs the input ``train_loop_per_worker(train_loop_config)`` on all workers. - For more details, see the :ref:`PyTorch Guide `, - :ref:`PyTorch Lightning Guide `, + For more details, see :ref:`PyTorch Guide `, + :ref:`PyTorch Lightning Guide `, or :ref:`Hugging Face Transformers Guide `. Example: From 87097bab94ac5a257aa9878dda243186e1d0235a Mon Sep 17 00:00:00 2001 From: angelinalg <122562471+angelinalg@users.noreply.github.com> Date: Tue, 12 Sep 2023 17:35:34 -0700 Subject: [PATCH 08/13] adding some links from guides to the overview pages; fix typos Signed-off-by: angelinalg <122562471+angelinalg@users.noreply.github.com> --- doc/source/train/deepspeed.rst | 3 +++ doc/source/train/distributed-tensorflow-keras.rst | 4 ++-- doc/source/train/distributed-xgboost-lightgbm.rst | 12 ++++++------ .../train/getting-started-pytorch-lightning.rst | 6 +++--- doc/source/train/getting-started-pytorch.rst | 6 +++--- doc/source/train/getting-started-transformers.rst | 8 ++++---- doc/source/train/huggingface-accelerate.rst | 2 +- 7 files changed, 22 insertions(+), 19 deletions(-) diff --git a/doc/source/train/deepspeed.rst b/doc/source/train/deepspeed.rst index 68160530d3231..704c6a2b48ef2 100644 --- a/doc/source/train/deepspeed.rst +++ b/doc/source/train/deepspeed.rst @@ -5,6 +5,9 @@ Get Started with DeepSpeed The :class:`~ray.train.torch.TorchTrainer` can help you easily launch your `DeepSpeed `_ training across a distributed Ray cluster. +Code example +------------ + You only need to run your existing training code with a TorchTrainer. You can expect the final code to look like this: .. 
code-block:: python

diff --git a/doc/source/train/distributed-tensorflow-keras.rst b/doc/source/train/distributed-tensorflow-keras.rst
index 01687b916d267..29d58c9cc4c5f 100644
--- a/doc/source/train/distributed-tensorflow-keras.rst
+++ b/doc/source/train/distributed-tensorflow-keras.rst
@@ -110,8 +110,8 @@ To customize the backend setup, you can pass a
 For more configurability, see the :py:class:`~ray.train.data_parallel_trainer.DataParallelTrainer` API.
 
-Run your training function
---------------------------
+Run a training function
+-----------------------
 
 With a distributed training function and a Ray Train ``Trainer``, you are now
 ready to start training.
 
diff --git a/doc/source/train/distributed-xgboost-lightgbm.rst b/doc/source/train/distributed-xgboost-lightgbm.rst
index 8e72716b5a887..7f1ec33db81d5 100644
--- a/doc/source/train/distributed-xgboost-lightgbm.rst
+++ b/doc/source/train/distributed-xgboost-lightgbm.rst
@@ -25,7 +25,7 @@ Quickstart
    :end-before: __lightgbm_end__
 
 
-Basic Training with Tree-Based Models in Train
+Basic training with tree-based models in Train
 ----------------------------------------------
 
 Just as in the original `xgboost.train() `__ and
@@ -53,12 +53,12 @@ training parameters are passed as the ``params`` dictionary.
    :end-before: __lightgbm_end__
 
 
-Ray-specific params are passed in through the trainer constructors.
+Pass Ray-specific parameters through the trainer constructors.
 
 .. _train-gbdt-checkpoints:
 
-Save and Load XGBoost and LightGBM Checkpoints
+Save and load XGBoost and LightGBM checkpoints
 ----------------------------------------------
 
 When you train a new tree on every boosting round,
@@ -209,13 +209,13 @@ How to optimize XGBoost memory usage?
 XGBoost uses a compute-optimized datastructure, the ``DMatrix``,
 to hold training data. When converting a dataset to a ``DMatrix``,
 XGBoost creates intermediate copies and ends up
-holding a complete copy of the full data. The data will be converted
-into the local dataformat (on a 64 bit system these are 64 bit floats.)
+holding a complete copy of the full data. XGBoost converts the data
+into the local data format. On a 64-bit system the format is 64-bit floats.
 Depending on the system and original dataset dtype, this matrix can thus
 occupy more memory than the original dataset.
 
 The **peak memory usage** for CPU-based training is at least
-**3x** the dataset size (assuming dtype ``float32`` on a 64bit system)
+**3x** the dataset size, assuming dtype ``float32`` on a 64-bit system,
 plus about **400,000 KiB** for other resources, like operating system
 requirements and storing of intermediate results.
 
diff --git a/doc/source/train/getting-started-pytorch-lightning.rst b/doc/source/train/getting-started-pytorch-lightning.rst
index 8349c25324463..fa198c4d3c6bc 100644
--- a/doc/source/train/getting-started-pytorch-lightning.rst
+++ b/doc/source/train/getting-started-pytorch-lightning.rst
@@ -8,8 +8,8 @@ This tutorial walks through the process of converting an existing PyTorch Lightn
 Learn how to:
 
 1. Configure the Lightning Trainer so that it runs distributed with Ray and on the correct CPU or GPU device.
-2. Configure the training function to report metrics and save checkpoints.
-3. Configure scale and CPU or GPU resource requirements for a training job.
+2. Configure a :ref:`training function ` to report metrics and save checkpoints.
+3. Configure :ref:`scaling ` and CPU or GPU resource requirements for a training job.
 4. Launch a distributed training job with a :class:`~ray.train.torch.TorchTrainer`.
Quickstart @@ -29,7 +29,7 @@ For reference, the final code is as follows: trainer = TorchTrainer(train_func, scaling_config=scaling_config) result = trainer.fit() -1. Your `train_func` is the Python code that each distributed training worker executes. +1. Your `train_func` is the Python code that each distributed training :ref:`worker ` executes. 2. Your `ScalingConfig` defines the number of distributed training workers and whether to use GPUs. 3. Your `TorchTrainer` launches the distributed training job. diff --git a/doc/source/train/getting-started-pytorch.rst b/doc/source/train/getting-started-pytorch.rst index 4ffa26fa8ade7..aa9d891bbeeb1 100644 --- a/doc/source/train/getting-started-pytorch.rst +++ b/doc/source/train/getting-started-pytorch.rst @@ -8,9 +8,9 @@ This tutorial walks through the process of converting an existing PyTorch script Learn how to: 1. Configure a model to run distributed and on the correct CPU/GPU device. -2. Configure a dataloader to shard data across the workers and place data on the correct CPU or GPU device. -3. Configure a training function to report metrics and save checkpoints. -4. Configure scale and CPU or GPU resource requirements for a training job. +2. Configure a dataloader to shard data across the :ref:`workers ` and place data on the correct CPU or GPU device. +3. Configure a :ref:`training function ` to report metrics and save checkpoints. +4. Configure :ref:`scaling ` and CPU or GPU resource requirements for a training job. 5. Launch a distributed training job with a :class:`~ray.train.torch.TorchTrainer` class. Quickstart diff --git a/doc/source/train/getting-started-transformers.rst b/doc/source/train/getting-started-transformers.rst index 3bc38c592f7b3..95bb99fafd108 100644 --- a/doc/source/train/getting-started-transformers.rst +++ b/doc/source/train/getting-started-transformers.rst @@ -7,8 +7,8 @@ This tutorial walks through the process of converting an existing Hugging Face T Learn how to: -1. Configure your training function to report metrics and save checkpoints. -2. Configure scale and CPU/GPU resource requirements for your training job. +1. Configure a :ref:`training function ` to report metrics and save checkpoints. +2. Configure :ref:`scaling ` and CPU or GPU resource requirements for your training job. 3. Launch your distributed training job with a :class:`~ray.train.torch.TorchTrainer`. Quickstart @@ -28,7 +28,7 @@ For reference, the final code follows: trainer = TorchTrainer(train_func, scaling_config=scaling_config) result = trainer.fit() -1. `train_func` is the Python code that executes on each distributed training worker. +1. `train_func` is the Python code that executes on each distributed training :ref:`worker `. 2. :class:`~ray.train.ScalingConfig` defines the number of distributed training workers and computing resources (e.g. GPUs). 3. :class:`~ray.train.torch.TorchTrainer` launches the distributed training job. @@ -175,7 +175,7 @@ Set up a training function -------------------------- First, update your training code to support distributed training. -You can begin by wrapping your code in a function: +You can begin by wrapping your code in a :ref:`training function `: .. 
code-block:: python diff --git a/doc/source/train/huggingface-accelerate.rst b/doc/source/train/huggingface-accelerate.rst index 320fff684c893..9966256f2e956 100644 --- a/doc/source/train/huggingface-accelerate.rst +++ b/doc/source/train/huggingface-accelerate.rst @@ -3,7 +3,7 @@ Get Started with Hugging Face Accelerate ======================================== -The :class:`~ray.train.torch.TorchTrainer` can help you easily launch your `Accelelate `_ training across a distributed Ray cluster. +The :class:`~ray.train.torch.TorchTrainer` can help you easily launch your `Accelerate `_ training across a distributed Ray cluster. You only need to run your existing training code with a TorchTrainer. You can expect the final code to look like this: From 0d35ba9037b2b8fa4f8871597dea03cc4c680783 Mon Sep 17 00:00:00 2001 From: angelinalg <122562471+angelinalg@users.noreply.github.com> Date: Tue, 12 Sep 2023 18:14:22 -0700 Subject: [PATCH 09/13] Apply suggestions from code review Reverting changes to docstrings Signed-off-by: angelinalg <122562471+angelinalg@users.noreply.github.com> --- .../accelerate/accelerate_torch_trainer.py | 4 ++-- .../pytorch/torch_fashion_mnist_example.py | 14 +++++++------- .../transformers_torch_trainer_basic.py | 10 +++++----- python/ray/train/torch/torch_trainer.py | 4 +--- 4 files changed, 15 insertions(+), 17 deletions(-) diff --git a/python/ray/train/examples/accelerate/accelerate_torch_trainer.py b/python/ray/train/examples/accelerate/accelerate_torch_trainer.py index 64992f0bc2240..41969a71f0210 100644 --- a/python/ray/train/examples/accelerate/accelerate_torch_trainer.py +++ b/python/ray/train/examples/accelerate/accelerate_torch_trainer.py @@ -25,7 +25,7 @@ def train_func(config): - """Your training function that is launched on each worker.""" + """Your training function that will be launched on each worker.""" # Unpack training configs lr = config["lr"] @@ -116,7 +116,7 @@ def collate_fn(batch): eval_metric = metric.compute() accelerator.print(f"epoch {epoch}:", eval_metric) - # Report checkpoint and metrics to Ray Train + # Report Checkpoint and metrics to Ray Train # ========================================== with TemporaryDirectory() as tmpdir: if accelerator.is_main_process: diff --git a/python/ray/train/examples/pytorch/torch_fashion_mnist_example.py b/python/ray/train/examples/pytorch/torch_fashion_mnist_example.py index b6db1451216d9..d5aba832806f3 100644 --- a/python/ray/train/examples/pytorch/torch_fashion_mnist_example.py +++ b/python/ray/train/examples/pytorch/torch_fashion_mnist_example.py @@ -19,7 +19,7 @@ def get_dataloaders(batch_size): transform = transforms.Compose([ToTensor(), Normalize((0.5,), (0.5,))]) with FileLock(os.path.expanduser("~/data.lock")): - # Download training data from open datasets + # Download training data from open datasets. training_data = datasets.FashionMNIST( root="~/data", train=True, @@ -27,7 +27,7 @@ def get_dataloaders(batch_size): transform=transform, ) - # Download test data from open datasets + # Download test data from open datasets. test_data = datasets.FashionMNIST( root="~/data", train=False, @@ -35,7 +35,7 @@ def get_dataloaders(batch_size): transform=transform, ) - # Create data loaders + # Create data loaders. 
train_dataloader = DataLoader(training_data, batch_size=batch_size) test_dataloader = DataLoader(test_data, batch_size=batch_size) @@ -69,7 +69,7 @@ def train_func_per_worker(config: Dict): epochs = config["epochs"] batch_size = config["batch_size_per_worker"] - # Get dataloaders inside the worker training function + # Get dataloaders inside worker training function train_dataloader, test_dataloader = get_dataloaders(batch_size=batch_size) # [1] Prepare Dataloader for distributed training @@ -81,7 +81,7 @@ def train_func_per_worker(config: Dict): model = NeuralNetwork() # [2] Prepare and wrap your model with DistributedDataParallel - # Move the model to the correct GPU/CPU device + # Move the model the correct GPU/CPU device # ============================================================ model = ray.train.torch.prepare_model(model) @@ -137,9 +137,9 @@ def train_fashion_mnist(num_workers=2, use_gpu=False): scaling_config=scaling_config, ) - # [4] Start distributed training + # [4] Start Distributed Training # Run `train_func_per_worker` on all workers - # ========================================== + # ============================================= result = trainer.fit() print(f"Training result: {result}") diff --git a/python/ray/train/examples/transformers/transformers_torch_trainer_basic.py b/python/ray/train/examples/transformers/transformers_torch_trainer_basic.py index 79d3f993f3d3b..630177424f28c 100644 --- a/python/ray/train/examples/transformers/transformers_torch_trainer_basic.py +++ b/python/ray/train/examples/transformers/transformers_torch_trainer_basic.py @@ -14,8 +14,8 @@ from ray.train.torch import TorchTrainer -# [1] Define a training function that includes all your training logic -# ==================================================================== +# [1] Define a training function that includes all your training logics +# ===================================================================== def train_func(config): # Datasets dataset = load_dataset("yelp_review_full") @@ -34,7 +34,7 @@ def tokenize_function(examples): "bert-base-cased", num_labels=5 ) - # Evaluation metrics + # Evaluation Metrics metric = evaluate.load("accuracy") def compute_metrics(eval_pred): @@ -42,7 +42,7 @@ def compute_metrics(eval_pred): predictions = np.argmax(logits, axis=-1) return metric.compute(predictions=predictions, references=labels) - # Hugging Face Trainer + # HuggingFace Trainer training_args = TrainingArguments( output_dir="test_trainer", evaluation_strategy="epoch", report_to="none" ) @@ -59,7 +59,7 @@ def compute_metrics(eval_pred): # =============================================== trainer.add_callback(RayTrainReportCallback()) - # [3] Prepare your trainer for Ray Data integration + # [3] Prepare your trainer for Ray Data Integration # ================================================= trainer = prepare_trainer(trainer) diff --git a/python/ray/train/torch/torch_trainer.py b/python/ray/train/torch/torch_trainer.py index f21c8c0cbd744..2629a1623269e 100644 --- a/python/ray/train/torch/torch_trainer.py +++ b/python/ray/train/torch/torch_trainer.py @@ -23,9 +23,7 @@ class TorchTrainer(DataParallelTrainer): 4. Runs the input ``train_loop_per_worker(train_loop_config)`` on all workers. - For more details, see :ref:`PyTorch Guide `, - :ref:`PyTorch Lightning Guide `, - or :ref:`Hugging Face Transformers Guide `. + For more details, see the :ref:`PyTorch User Guide `. 
Example: From 29c93583db5a10221004b26798f44d84dc54cc03 Mon Sep 17 00:00:00 2001 From: angelinalg <122562471+angelinalg@users.noreply.github.com> Date: Tue, 12 Sep 2023 21:05:10 -0700 Subject: [PATCH 10/13] Apply suggestions from code review Co-authored-by: Yunxuan Xiao Signed-off-by: angelinalg <122562471+angelinalg@users.noreply.github.com> --- doc/source/train/distributed-tensorflow-keras.rst | 1 - doc/source/train/examples/pytorch/dreambooth_finetuning.rst | 2 +- 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/doc/source/train/distributed-tensorflow-keras.rst b/doc/source/train/distributed-tensorflow-keras.rst index 29d58c9cc4c5f..0326930ef3039 100644 --- a/doc/source/train/distributed-tensorflow-keras.rst +++ b/doc/source/train/distributed-tensorflow-keras.rst @@ -130,7 +130,6 @@ change anything. If you require more advanced preprocessing, you may want to consider using Ray Data for distributed data ingest. See :ref:`Ray Data with Ray Train `. -Because Ray Data is an independent library, you can directly apply most concepts to TensorFlow. The main difference is that you may want to convert your Ray Data dataset shard to a TensorFlow dataset in your training function so that you can use the Keras diff --git a/doc/source/train/examples/pytorch/dreambooth_finetuning.rst b/doc/source/train/examples/pytorch/dreambooth_finetuning.rst index 46a1097627219..23a6e05d23110 100644 --- a/doc/source/train/examples/pytorch/dreambooth_finetuning.rst +++ b/doc/source/train/examples/pytorch/dreambooth_finetuning.rst @@ -251,7 +251,7 @@ Step 3: Create the regularization images ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Create a regularization image set for a class of subjects using the pre-trained -Stable Diffusion model. This set regularizes the fine-tuning by ensuring that +Stable Diffusion model. This regularization set ensures that the model still produces decent images for random images of the same class, rather than just optimize for producing good images of the subject. From 2fc77b707ea9861487594203447c32c1916e2eba Mon Sep 17 00:00:00 2001 From: angelinalg <122562471+angelinalg@users.noreply.github.com> Date: Tue, 12 Sep 2023 21:05:32 -0700 Subject: [PATCH 11/13] Update doc/source/train/examples/transformers/huggingface_text_classification.ipynb Co-authored-by: Yunxuan Xiao Signed-off-by: angelinalg <122562471+angelinalg@users.noreply.github.com> --- .../examples/transformers/huggingface_text_classification.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/source/train/examples/transformers/huggingface_text_classification.ipynb b/doc/source/train/examples/transformers/huggingface_text_classification.ipynb index 9551e0ee543e7..da4267beb2df6 100644 --- a/doc/source/train/examples/transformers/huggingface_text_classification.ipynb +++ b/doc/source/train/examples/transformers/huggingface_text_classification.ipynb @@ -15,7 +15,7 @@ "id": "VaFMt6AIhYbK" }, "source": [ - "This notebook is based on an official Hugging Face (HF) notebook, [How to fine-tune a model on text classification](https://github.com/huggingface/notebooks/blob/6ca682955173cc9d36ffa431ddda505a048cbe80/examples/text_classification.ipynb). 
This notebook shows the process of conversion from vanilla HF to Ray Train without changing the training logic unless necessary.\n", + "This notebook is based on an official Hugging Face example, [How to fine-tune a model on text classification](https://github.com/huggingface/notebooks/blob/6ca682955173cc9d36ffa431ddda505a048cbe80/examples/text_classification.ipynb). This notebook shows the process of conversion from vanilla HF to Ray Train without changing the training logic unless necessary.\n", "\n", "This notebook consists of the following steps:\n", "1. [Set up Ray](#setup)\n", From 82db390501b22e6d173f9c72b3550b23e380a47f Mon Sep 17 00:00:00 2001 From: angelinalg <122562471+angelinalg@users.noreply.github.com> Date: Wed, 13 Sep 2023 10:00:07 -0700 Subject: [PATCH 12/13] Apply suggestions from code review feedback from code review Co-authored-by: matthewdeng Signed-off-by: angelinalg <122562471+angelinalg@users.noreply.github.com> --- doc/source/train/examples.rst | 2 +- doc/source/train/examples/accelerate/accelerate_example.rst | 2 +- doc/source/train/examples/deepspeed/deepspeed_example.rst | 4 ++-- .../train/examples/lightning/lightning_cola_advanced.ipynb | 2 +- doc/source/train/examples/pytorch/dreambooth_finetuning.rst | 3 +-- doc/source/train/huggingface-accelerate.rst | 2 +- 6 files changed, 7 insertions(+), 8 deletions(-) diff --git a/doc/source/train/examples.rst b/doc/source/train/examples.rst index 0a1811364e67c..b06d8fe5659af 100644 --- a/doc/source/train/examples.rst +++ b/doc/source/train/examples.rst @@ -28,7 +28,7 @@ Beginner * - DeepSpeed - :ref:`Train with DeepSpeed ZeRO-3 ` * - TensorFlow - - :ref:`Train with TensorFlow MNIST ` + - :ref:`Train an MNIST Image Classifier with TensorFlow ` * - Horovod - :ref:`Train with Horovod and PyTorch ` diff --git a/doc/source/train/examples/accelerate/accelerate_example.rst b/doc/source/train/examples/accelerate/accelerate_example.rst index 086082d090f15..140312ce90bfc 100644 --- a/doc/source/train/examples/accelerate/accelerate_example.rst +++ b/doc/source/train/examples/accelerate/accelerate_example.rst @@ -6,7 +6,7 @@ Distributed Training with Hugging Face Accelerate ================================================= This example does distributed data parallel training -with Hugging Face (HF) Accelerate, Ray Train, and Ray Data. +with Hugging Face Accelerate, Ray Train, and Ray Data. It fine-tunes a BERT model and is adapted from https://github.com/huggingface/accelerate/blob/main/examples/nlp_example.py diff --git a/doc/source/train/examples/deepspeed/deepspeed_example.rst b/doc/source/train/examples/deepspeed/deepspeed_example.rst index 5ed89be69d7a7..15cab93e30ba9 100644 --- a/doc/source/train/examples/deepspeed/deepspeed_example.rst +++ b/doc/source/train/examples/deepspeed/deepspeed_example.rst @@ -6,7 +6,7 @@ Train with DeepSpeed ZeRO-3 and Ray Train ========================================= This is an intermediate example that shows how to do distributed training with DeepSpeed ZeRO-3 and Ray Train. -It demonstrates how to use :ref:`Ray Dataset ` with DeepSpeed ZeRO-3 and Ray Train. +It demonstrates how to use :ref:`Ray Data ` with DeepSpeed ZeRO-3 and Ray Train. If you just want to quickly convert your existing TorchTrainer scripts into Ray Train, you can refer to the :ref:`Train with DeepSpeed `. @@ -21,4 +21,4 @@ See also * :ref:`Ray Train Examples ` for more use cases. -* :ref:`Get Started with DeepSpeed ` for a tutorial. +* :ref:`Get Started with DeepSpeed ` for a tutorial. 
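For reference, the "with Ray Data" integration that these example pages describe comes down to passing datasets to the trainer and reading a per-worker shard inside the training function. Below is a minimal sketch of that flow; the toy in-memory dataset is an assumption for illustration, and the model code is omitted.

.. code-block:: python

    # Minimal sketch: feeding a Ray Data dataset into a Ray Train training function.
    import ray.data
    import ray.train
    from ray.train import ScalingConfig
    from ray.train.torch import TorchTrainer


    def train_func():
        # Each worker reads its own shard of the "train" dataset.
        shard = ray.train.get_dataset_shard("train")
        for epoch in range(2):
            for batch in shard.iter_torch_batches(batch_size=32):
                features, labels = batch["x"], batch["y"]  # torch.Tensors
                # ... forward/backward pass on the model would go here ...
        ray.train.report({"epochs_done": epoch + 1})


    train_ds = ray.data.from_items([{"x": float(i), "y": 2.0 * i} for i in range(1024)])
    trainer = TorchTrainer(
        train_func,
        datasets={"train": train_ds},
        scaling_config=ScalingConfig(num_workers=2),
    )
    result = trainer.fit()
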
diff --git a/doc/source/train/examples/lightning/lightning_cola_advanced.ipynb b/doc/source/train/examples/lightning/lightning_cola_advanced.ipynb
index e2e007d0f7961..4bd8e7c427b7c 100644
--- a/doc/source/train/examples/lightning/lightning_cola_advanced.ipynb
+++ b/doc/source/train/examples/lightning/lightning_cola_advanced.ipynb
@@ -11,7 +11,7 @@
     "\n",
     ":::{note}\n",
     "\n",
-    "This is an intermediate example demonstrates how to use [Ray Dataset](data) with PyTorch Lightning in Ray Train.\n",
+    "This is an intermediate example that demonstrates how to use [Ray Data](data) with PyTorch Lightning in Ray Train.\n",
     "\n",
     "If you just want to quickly convert your existing PyTorch Lightning scripts into Ray Train, you can refer to the [Lightning Quick Start Guide](train-pytorch-lightning).\n",
     "\n",
diff --git a/doc/source/train/examples/pytorch/dreambooth_finetuning.rst b/doc/source/train/examples/pytorch/dreambooth_finetuning.rst
index 23a6e05d23110..7db3e96c82124 100644
--- a/doc/source/train/examples/pytorch/dreambooth_finetuning.rst
+++ b/doc/source/train/examples/pytorch/dreambooth_finetuning.rst
@@ -6,8 +6,7 @@ Fine-tune of Stable Diffusion with DreamBooth and Ray Train
 ===========================================================
 
 This is an intermediate example that shows how to do DreamBooth fine-tuning of a Stable Diffusion model using Ray Train.
-It demonstrates how to use :ref:`Ray Dataset ` with PyTorch Lightning in Ray Train.
-If you just want to quickly convert your existing Transformer scripts into Ray Train, you can refer to the :ref:`Getting Started with Transformers `.
+It demonstrates how to use :ref:`Ray Data ` with PyTorch Lightning in Ray Train.
 
 See the original `DreamBooth project homepage `_ for more details on what this fine-tuning method achieves.
 
diff --git a/doc/source/train/huggingface-accelerate.rst b/doc/source/train/huggingface-accelerate.rst
index 9966256f2e956..480ae9b148a9b 100644
--- a/doc/source/train/huggingface-accelerate.rst
+++ b/doc/source/train/huggingface-accelerate.rst
@@ -50,7 +50,7 @@ You only need to run your existing training code with a TorchTrainer. You can ex
 
     Model and data preparation for distributed training is completely handled by the `Accelerator `_ object and its `Accelerator.prepare() `_ method.
 
-    Unlike with native PyTorch, PyTorch Lightning, or HuggingFace Transformers, **don't** call any additional Ray Train utilities
+    Unlike with native PyTorch, PyTorch Lightning, or Hugging Face Transformers, **don't** call any additional Ray Train utilities
     like :meth:`~ray.train.torch.prepare_model` or :meth:`~ray.train.torch.prepare_data_loader` in your training function.
 
 Configure Accelerate

From 264223cf3e9faaef24c23dc2745274b78fd09718 Mon Sep 17 00:00:00 2001
From: angelinalg <122562471+angelinalg@users.noreply.github.com>
Date: Wed, 13 Sep 2023 13:30:20 -0700
Subject: [PATCH 13/13] change button text to with instead of and to be consistent with other buttons

Signed-off-by: angelinalg <122562471+angelinalg@users.noreply.github.com>
---
 doc/source/train/train.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/train/train.rst b/doc/source/train/train.rst
index d779cf297dcf1..2ceb336d77354 100644
--- a/doc/source/train/train.rst
+++ b/doc/source/train/train.rst
@@ -97,7 +97,7 @@ Get started
       :outline:
       :expand:
 
-      Try Ray Train and Lightning
+      Try Ray Train with Lightning
 
 .. grid-item-card::