polish examples: make titles more consistent, add links to guides
Signed-off-by: angelinalg <[email protected]>
angelinalg committed Sep 11, 2023
1 parent 6991189 commit 9d793ab
Showing 13 changed files with 182 additions and 118 deletions.
1 change: 1 addition & 0 deletions .github/styles/Vocab/Train/accept.txt
@@ -1,5 +1,6 @@
Horovod
Hugging Face
+hyperparameters?
Keras
LightGBM
PyTorch
24 changes: 12 additions & 12 deletions doc/source/train/examples.rst
@@ -3,7 +3,7 @@
Ray Train Examples
==================

-.. Example .rst files should be organized in the same manner as the
+.. Organize example .rst files in the same manner as the
.py files in ray/python/ray/train/examples.
Below are examples for using Ray Train with a variety of frameworks and use cases.
@@ -18,17 +18,17 @@ Beginner
* - Framework
- Example
* - PyTorch
- - :ref:`Training an Fashion MNIST Image Classifier with PyTorch <torch_fashion_mnist_ex>`
+ - :ref:`Train a Fashion MNIST Image Classifier with PyTorch <torch_fashion_mnist_ex>`
* - Lightning
- - :ref:`Training an MNIST Image Classifier with Lightning <lightning_mnist_example>`
+ - :ref:`Train an MNIST Image Classifier with Lightning <lightning_mnist_example>`
* - Transformers
- - :ref:`Fine-tuning a Text Classifier on Yelp Reviews Dataset with HF Transformers <transformers_torch_trainer_basic_example>`
+ - :ref:`Fine-tune a Text Classifier on the Yelp Reviews Dataset with HF Transformers <transformers_torch_trainer_basic_example>`
* - Accelerate
- :ref:`Distributed Data Parallel Training with HF Accelerate <accelerate_example>`
* - DeepSpeed
- - :ref:`Distributed Training with DeepSpeed ZeRO-3 <deepspeed_example>`
+ - :ref:`Train with DeepSpeed ZeRO-3 <deepspeed_example>`
* - TensorFlow
- - :ref:`TensorFlow MNIST Training Example <tensorflow_mnist_example>`
+ - :ref:`Train with TensorFlow MNIST <tensorflow_mnist_example>`
* - Horovod
- :ref:`End-to-end Horovod Training Example <horovod_example>`

@@ -42,11 +42,11 @@ Intermediate
* - Framework
- Example
* - PyTorch
- - `DreamBooth fine-tuning of Stable Diffusion with Ray Train <https://github.com/ray-project/ray/tree/master/doc/source/templates/05_dreambooth_finetuning>`_
+ - :ref:`Fine-tune Stable Diffusion with DreamBooth and Ray Train <torch_finetune_dreambooth_ex>`
* - Lightning
- :ref:`Model Training with PyTorch Lightning and Ray Data <lightning_advanced_example>`
* - Accelerate
- - :ref:`Fine-tuning a Text Classifier on GLUE Benchmark with HF Accelerate. <train_transformers_accelerate_example>`
+ - :ref:`Fine-tune a Text Classifier on the GLUE Benchmark with HF Accelerate <train_transformers_accelerate_example>`


Advanced
@@ -59,10 +59,10 @@ Advanced
* - Framework
- Example
* - Accelerate, DeepSpeed
- - `Fine-tuning Llama-2 series models with Deepspeed, Accelerate, and Ray Train TorchTrainer <https://github.com/ray-project/ray/tree/master/doc/source/templates/04_finetuning_llms_with_deepspeed>`_
+ - `Fine-tune Llama-2 series models with DeepSpeed, Accelerate, and Ray Train TorchTrainer <https://github.com/ray-project/ray/tree/master/doc/source/templates/04_finetuning_llms_with_deepspeed>`_
* - Transformers, DeepSpeed
- - :ref:`Fine-tuning GPT-J-6B with Ray Train and DeepSpeed <gptj_deepspeed_finetune>`
+ - :ref:`Fine-tune GPT-J-6B with Ray Train and DeepSpeed <gptj_deepspeed_finetune>`
* - Lightning, DeepSpeed
- - :ref:`Fine-tuning vicuna-13b with PyTorch Lightning and DeepSpeed <vicuna_lightning_deepspeed_finetuning>`
+ - :ref:`Fine-tune vicuna-13b with PyTorch Lightning and DeepSpeed <vicuna_lightning_deepspeed_finetuning>`
* - Lightning
- - :ref:`Fine-tuning dolly-v2-7b with PyTorch Lightning and FSDP <dolly_lightning_fsdp_finetuning>`
+ - :ref:`Fine-tune dolly-v2-7b with PyTorch Lightning and FSDP <dolly_lightning_fsdp_finetuning>`
21 changes: 19 additions & 2 deletions doc/source/train/examples/accelerate/accelerate_example.rst
@@ -2,7 +2,24 @@

.. _accelerate_example:

-Hugging Face Accelerate Distributed Training Example with Ray Train
-===================================================================
+Distributed Training Example with Hugging Face Accelerate
+=========================================================

+This example runs distributed data parallel training
+with Hugging Face (HF) Accelerate, Ray Train, and Ray Data.
+It fine-tunes a BERT model and is adapted from
+https://github.com/huggingface/accelerate/blob/main/examples/nlp_example.py


+Code example
+------------

.. literalinclude:: /../../python/ray/train/examples/accelerate/accelerate_torch_trainer.py

+See also
+--------

+For a tutorial on using Ray Train and HF Accelerate,
+see :ref:`Training with Hugging Face Accelerate <train-hf-accelerate>`.

+For more Train examples, see :ref:`Ray Train Examples <train-examples>`.
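
For orientation, the pattern this new page documents is an ordinary Accelerate training loop handed to a Ray Train TorchTrainer. The following is a minimal sketch of that pattern under Ray 2.x APIs, not the accelerate_torch_trainer.py file the page pulls in; the toy model and synthetic data are placeholders for the example's BERT fine-tuning:

import torch
from accelerate import Accelerator

import ray.train
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer


def train_func(config):
    accelerator = Accelerator()  # picks up the devices Ray Train provisioned

    model = torch.nn.Linear(10, 2)  # placeholder; the real example fine-tunes BERT
    optimizer = torch.optim.AdamW(model.parameters(), lr=config["lr"])
    dataloader = torch.utils.data.DataLoader(
        [(torch.randn(10), torch.tensor(0)) for _ in range(64)], batch_size=8
    )
    # Accelerate handles device placement and distributed (DDP) wrapping.
    model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

    for epoch in range(config["epochs"]):
        model.train()
        for features, labels in dataloader:
            loss = torch.nn.functional.cross_entropy(model(features), labels)
            accelerator.backward(loss)
            optimizer.step()
            optimizer.zero_grad()
        # Surface per-epoch metrics to Ray Train.
        ray.train.report({"epoch": epoch, "loss": loss.item()})


trainer = TorchTrainer(
    train_func,
    train_loop_config={"lr": 2e-5, "epochs": 2},
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
)
result = trainer.fit()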
33 changes: 22 additions & 11 deletions doc/source/train/examples/lightning/lightning_mnist_example.ipynb
@@ -51,7 +51,7 @@
"source": [
"## Prepare a dataset and module\n",
"\n",
"The Pytorch Lightning Trainer takes either `torch.utils.data.DataLoader` or `pl.LightningDataModule` as data inputs. You can keep using them without any changes with Ray Train. "
"The Pytorch Lightning Trainer takes either `torch.utils.data.DataLoader` or `pl.LightningDataModule` as data inputs. You can continue using them without any changes with Ray Train. "
]
},
{
@@ -75,7 +75,7 @@
" self.data_dir, train=True, download=True, transform=self.transform\n",
" )\n",
"\n",
" # split data into train and val sets\n",
" # Split data into train and val sets\n",
" self.mnist_train, self.mnist_val = random_split(mnist, [55000, 5000])\n",
"\n",
" def train_dataloader(self):\n",
@@ -175,26 +175,26 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"You don't need to make any change to the definition of PyTorch Lightning model and datamodule."
"You don't need to modify the definition of the PyTorch Lightning model or datamodule."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define the training loop\n",
"## Define a training function\n",
"\n",
"This code defines a training loop for each worker. Comparing the training loop with the original PyTorch Lightning code, there are 3 main differences:\n",
"This code defines a {ref}`training function <train-overview-training-function>` for each worker. Comparing the training fuction with the original PyTorch Lightning code, notice three main differences:\n",
"\n",
"- Distributed strategy: Use {class}`RayDDPStrategy <ray.train.lightning.RayDDPStrategy>`.\n",
"- Cluster environment: Use {class}`RayLightningEnvironment <ray.train.lightning.RayLightningEnvironment>`.\n",
"- Parallel devices: Always sets to `devices=\"auto\"` to use all available devices configured by ``TorchTrainer``.\n",
"- Parallel devices: Always set to `devices=\"auto\"` to use all available devices configured by ``TorchTrainer``.\n",
"\n",
"See {ref}`Getting Started with PyTorch Lightning <train-pytorch-lightning>` for more information.\n",
"\n",
"\n",
"For checkpoint reportining, Ray Train provides a minimal {class}`RayTrainReportCallback <ray.train.lightning.RayTrainReportCallback>` that reports metrics and checkpoint on each train epoch end. For more complex checkpoint logic, please implement custom callbacks as described in {ref}`Saving and Loading Checkpoint <train-checkpointing>` user guide."
"For checkpoint reporting, Ray Train provides a minimal {class}`RayTrainReportCallback <ray.train.lightning.RayTrainReportCallback>` class that reports metrics and checkpoints at the end of each train epoch. For more complex checkpoint logic, implement custom callbacks. See {ref}`Saving and Loading Checkpoint <train-checkpointing>`."
]
},
{
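
Put together, the three substitutions above plus the report callback look roughly like the following sketch. MNISTClassifier and MNISTDataModule are illustrative stand-ins for the model and datamodule classes this notebook defines earlier, and use_gpu/num_workers are the variables set in the next cell:

import pytorch_lightning as pl

from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer
from ray.train.lightning import (
    RayDDPStrategy,
    RayLightningEnvironment,
    RayTrainReportCallback,
    prepare_trainer,
)


def train_func():
    # Placeholder names: the notebook defines its own LightningModule/DataModule.
    model = MNISTClassifier(lr=1e-3)
    datamodule = MNISTDataModule(batch_size=128)

    trainer = pl.Trainer(
        max_epochs=10,
        devices="auto",                        # use all devices TorchTrainer assigns
        accelerator="auto",
        strategy=RayDDPStrategy(),             # distributed strategy
        plugins=[RayLightningEnvironment()],   # cluster environment
        callbacks=[RayTrainReportCallback()],  # report metrics + checkpoint per epoch
        enable_progress_bar=False,
    )
    trainer = prepare_trainer(trainer)  # validate the Ray-specific configuration
    trainer.fit(model, datamodule=datamodule)


trainer = TorchTrainer(
    train_func,
    scaling_config=ScalingConfig(num_workers=num_workers, use_gpu=use_gpu),
)
result = trainer.fit()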
@@ -203,7 +203,7 @@
"metadata": {},
"outputs": [],
"source": [
"use_gpu = True # Set it to False if you want to run without GPUs\n",
"use_gpu = True # Set to False if you want to run without GPUs\n",
"num_workers = 4"
]
},
@@ -804,7 +804,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Check the Training Results and Checkpoints"
"## Check training results and checkpoints"
]
},
{
@@ -857,9 +857,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"As we can see, three checkpoints(`checkpoint_000007`, `checkpoint_000008`, `checkpoint_000009`) have been saved in the trial directory. To retrieve the latest checkpoint from the fit results and load it back into the model, follow these steps.\n",
"Ray Train saved three checkpoints(`checkpoint_000007`, `checkpoint_000008`, `checkpoint_000009`) in the trial directory. The following code retrieves the latest checkpoint from the fit results and loads it back into the model.\n",
"\n",
"If you lost the in-memory result object, you can also restore the model from the checkpoint file. Here the checkpoint path is: `/tmp/ray_results/ptl-mnist-example/TorchTrainer_eb925_00000_0_2023-08-07_23-15-06/checkpoint_000009/checkpoint.ckpt`."
"If you lost the in-memory result object, you can restore the model from the checkpoint file. The checkpoint path is: `/tmp/ray_results/ptl-mnist-example/TorchTrainer_eb925_00000_0_2023-08-07_23-15-06/checkpoint_000009/checkpoint.ckpt`."
]
},
{
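
Condensed, that retrieval pattern reads roughly as follows. Here `result` is the object returned by trainer.fit(), MNISTClassifier stands in for the notebook's LightningModule, and checkpoint.ckpt is the filename visible in the path above:

import os

# Latest checkpoint reported during training.
checkpoint = result.checkpoint

# Materialize the checkpoint locally and load the weights back into the model.
with checkpoint.as_directory() as checkpoint_dir:
    ckpt_path = os.path.join(checkpoint_dir, "checkpoint.ckpt")
    best_model = MNISTClassifier.load_from_checkpoint(ckpt_path)

# Without the in-memory result object, load from the persisted path instead:
# best_model = MNISTClassifier.load_from_checkpoint(
#     "/tmp/ray_results/ptl-mnist-example/TorchTrainer_eb925_00000_0_2023-08-07_23-15-06/checkpoint_000009/checkpoint.ckpt"
# )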
@@ -903,6 +903,17 @@
"\n",
"best_model"
]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"## See also\n",
+"\n",
+"For a tutorial on using Ray Train and PyTorch Lightning, see {ref}`Getting Started with PyTorch Lightning <train-pytorch-lightning>`.\n",
+"\n",
+"For more Train examples, see {ref}`Ray Train Examples <train-examples>`."
+]
}
],
"metadata": {