[Train] Restructure Ray Train Example Page #38814

woshiyyya · 2023-08-24T00:03:23Z

Why are these changes needed?

Rendered doc: https://anyscale-ray--38814.com.readthedocs.build/en/38814/train/examples.html

Related issue number

Checks

- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
  - I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(


matthewdeng · 2023-08-28T02:02:42Z

doc/source/train/examples.rst

I tried two other ways of organizing it. LMK what you think @woshiyyya, @angelinalg. I can push a commit if one of these makes sense to you!

[Screenshots of two layouts: "Table" and "Tabs + Table"]

woshiyyya · 2023-08-28

Thanks Matt, that looks nice 👍 For me, the first one (table only) seems better because people can see all examples at once without another click.
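The table-only layout preferred here puts every example on one page. As a rough illustration, the sketch below generates such a page section as a reST `list-table`. It is a hypothetical helper, not how `examples.rst` was actually written; the framework column and the `:doc:` targets (assumed relative to `doc/source/train/`) are illustrative, while the titles and paths come from the `_toc.yml` entries quoted in this PR.

```python
# Hypothetical example metadata: (framework, title, docname).
# Titles and paths are taken from the _toc.yml entries quoted in this PR;
# the framework grouping is an illustrative assumption.
EXAMPLES = [
    ("PyTorch", "PyTorch Fashion MNIST Example",
     "examples/pytorch/torch_fashion_mnist_example"),
    ("Hugging Face Transformers", "Hugging Face Transformers Basic Example",
     "examples/transformers/transformers_torch_trainer_basic"),
    ("PyTorch Lightning", "PyTorch Lightning Basic Example",
     "examples/lightning/lightning_mnist_example"),
    ("TensorFlow", "TensorFlow MNIST Example",
     "examples/tf/tensorflow_mnist_example"),
]


def render_list_table(examples):
    """Render all examples as one reST list-table (no tabs, no extra clicks)."""
    lines = [
        ".. list-table::",
        "   :header-rows: 1",
        "",
        "   * - Framework",
        "     - Example",
    ]
    for framework, title, docname in examples:
        # One row per example: framework name, then a cross-reference to its page.
        lines.append(f"   * - {framework}")
        lines.append(f"     - :doc:`{title} <{docname}>`")
    return "\n".join(lines)


print(render_list_table(EXAMPLES))
```

Keeping the metadata in one list means the table and any sidebar entries could be generated from the same source, which is one way to avoid the inconsistent example names raised later in this thread.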


matthewdeng · 2023-08-28T19:21:24Z

doc/source/train/examples.rst

@woshiyyya should we update the ToC to match these examples?

ray/doc/source/_toc.yml

Lines 85 to 109 in 0562409

```yaml
sections:
  - file: train/examples/pytorch/torch_fashion_mnist_example
    title: "PyTorch Fashion MNIST Example"
  - file: train/examples/transformers/transformers_torch_trainer_basic
    title: "Hugging Face Transformers Basic Example"
  - file: train/examples/lightning/lightning_mnist_example
    title: "PyTorch Lightning Basic Example"
  - file: train/examples/lightning/lightning_cola_advanced
    title: "PyTorch Lightning Advanced Example"
  - file: train/examples/lightning/lightning_exp_tracking
    title: "PyTorch Lightning with Experiment Tracking Tools"
  - file: train/examples/tf/tensorflow_mnist_example
    title: "TensorFlow MNIST Example"
  - file: train/examples/horovod/horovod_example
    title: "Horovod Example"
  - file: train/examples/tf/tune_tensorflow_mnist_example
    title: "Tune & TensorFlow Example"
  - file: train/examples/pytorch/tune_cifar_torch_pbt_example
    title: "Tune & PyTorch Example"
  - file: train/examples/pytorch/torch_data_prefetch_benchmark/benchmark_example
    title: "Torch Data Prefetching Benchmark"
  - file: train/examples/pytorch/pytorch_resnet_finetune
    title: "PyTorch Finetuning ResNet Example"
  - file: train/examples/lightning/vicuna_13b_lightning_deepspeed_finetune
    title: "Fine-tune Vicuna-13B with DeepSpeed and PyTorch Lightning"
```

woshiyyya · 2023-08-28

I have an even more aggressive idea. Should we remove this second level entirely? Listing the full name of every example in the sidebar makes the menu too long, and the narrow sidebar width hurts readability.
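Dropping the second level amounts to deleting the nested `sections` list under the Train examples entry of `_toc.yml`. A minimal sketch of that transformation on a hand-written Python mirror of the YAML follows; the parent `file` value is an assumption (the quoted snippet starts at `sections:`), only two child entries are reproduced, and `drop_second_level` is a hypothetical helper, not part of Sphinx or Ray.

```python
import copy

# Python mirror of the Train examples entry in doc/source/_toc.yml.
# The parent "file" value is a hypothetical placeholder; the children
# are two of the entries quoted above.
toc_entry = {
    "file": "train/examples",  # hypothetical parent page
    "sections": [
        {"file": "train/examples/pytorch/torch_fashion_mnist_example",
         "title": "PyTorch Fashion MNIST Example"},
        {"file": "train/examples/lightning/vicuna_13b_lightning_deepspeed_finetune",
         "title": "Fine-tune Vicuna-13B with DeepSpeed and PyTorch Lightning"},
    ],
}


def drop_second_level(entry):
    """Return (copy of entry without 'sections', files no longer linked from the sidebar)."""
    flattened = copy.deepcopy(entry)  # leave the caller's structure untouched
    children = flattened.pop("sections", [])
    unlinked = [child["file"] for child in children]
    return flattened, unlinked


flat, unlinked = drop_second_level(toc_entry)
print(flat)      # {'file': 'train/examples'}
print(unlinked)  # the two example pages removed from the sidebar
```

The `unlinked` list matters because pages that appear in no toctree typically make Sphinx warn that the document isn't included in any toctree; marking such notebooks as orphan pages, as the later commits in this PR do, is the usual way to silence that.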

doc/source/train/examples.rst


angelinalg

I like the non-table version as well. Thanks for doing this. This is such an improvement. It's actually beautiful!


@justinvyu

* [train] enable new persistence mode for core and serve tests (#38938)
  Signed-off-by: Matthew Deng <[email protected]>
* [train] New persistence mode: Update 🐠 `ML Libraries w/ Ray Client Examples (Python 3.7)` (#38923)
  Signed-off-by: Justin Yu <[email protected]>
* [train] remove non-URI assertion (#38944)
  Signed-off-by: Matthew Deng <[email protected]>
* [train] New persistence mode: Update 📖 `Doc tests and examples (excluding Ray AIR examples)` (#38940)
  Signed-off-by: Justin Yu <[email protected]>
  Signed-off-by: Matthew Deng <[email protected]>
  Co-authored-by: Matthew Deng <[email protected]>
* disable legacy sync config logic in trainable (#38952)
  Signed-off-by: Justin Yu <[email protected]>
* [2.7 CI][New Persistent Mode][6/n] 📖 ✈️ Ray AIR examples (#38918)
  Signed-off-by: woshiyyya <[email protected]>
* [2.7 CI][New Persistent Mode][2/n] 📺 📖 Doc GPU tests and examples (#38905)
  Signed-off-by: woshiyyya <[email protected]>
* [2.7 CI][New Persistent Mode][4/n] 📺 🚂 Train GPU tests & 🚂 Datasets Train Integration GPU Tests and Examples (#38910)
  Signed-off-by: woshiyyya <[email protected]>
  Signed-off-by: Justin Yu <[email protected]>
  Co-authored-by: Justin Yu <[email protected]>
* [2.7 CI][New Persistent Mode][1/n] 📺 ✈️ AIR GPU tests (ray/air) & ⚡ :python: Lightning 2.0 Train GPU tests (#38903)
  Signed-off-by: woshiyyya <[email protected]>
  Signed-off-by: Yunxuan Xiao <[email protected]>
* [train] Fix broken tune tests and support ray storage (#38950)
  This PR re-introduces support for ray storage `ray.init(storage="s3://...")` and fixes a broken tune controller test.
  Signed-off-by: Justin Yu <[email protected]>
* [train] New persistence mode: Finish migrating `xgb`, `lgbm` and `sklearn` trainers, checkpoints + tests (#38959)
  Signed-off-by: Justin Yu <[email protected]>
* [2.7 CI][New Persistent Mode][5/n] 📖 Doc examples for external code (#38915)
  Signed-off-by: woshiyyya <[email protected]>
* [train][rllib] temporarily disable new persistence mode for rllib tests (#38965)
  Signed-off-by: Matthew Deng <[email protected]>
* [2.7 CI][New Persistent Mode][8/n] ✈️ AIR tests (ray/air) (#38932)
  Signed-off-by: woshiyyya <[email protected]>
* [tune] Storage: 🐙 🧠 Tune tests and examples {using RLlib} migration (#38895)
  Signed-off-by: Kai Fricke <[email protected]>
  Co-authored-by: matthewdeng <[email protected]>
* [train] Fix MosaicTrainer example and unit test (#38970)
  Signed-off-by: Justin Yu <[email protected]>
* [air/release] Fix dreambooth example image preprocessing logic (#39020)
  Signed-off-by: Justin Yu <[email protected]>
* [train] clean up ray.train._checkpoint imports (#38951)
  Signed-off-by: Matthew Deng <[email protected]>
* [train] high level cleanup of Ray Train docs (#38971)
  Signed-off-by: Matthew Deng <[email protected]>
* [wip][docs] update FrameworkPredictor examples (#38634)
  Signed-off-by: Matthew Deng <[email protected]>
  Signed-off-by: matthewdeng <[email protected]>
* [train] Add documentation for using metadata argument to save preprocessors (#38701)
* [Train] Restructure Ray Train Example Page (#38814)
  Signed-off-by: woshiyyya <[email protected]>
* [air] Deprecate some fields/classes that are supposed to be gone in 2.6. (#38794)
  Signed-off-by: xwjiang2010 <[email protected]>
* [tune/storage] Fix Tune multinode tests (#39050)
  Fixes multinode tests by using the new train.report() API.
  Signed-off-by: Kai Fricke <[email protected]>
* [tune] Fix BOHB example for new storage (#38983)
  The new storage path does not create "empty" checkpoints per default anymore. Previously, when no checkpoint is saved, PAUSEing a trial would create a dummy checkpoint that only contains trial metadata (such as the iteration number). This is not the case anymore. Examples now have to implement checkpointing to properly restore previous state. This was also true previously - but some of our simple examples (e.g. the one in this PR) didn't implement it and still "worked". I think it's fine to keep the functionality as is and require our examples to show checkpointing implementations. This will ensure that users don't shoot their feet trying to use e.g. BOHB.
  Separately, BOHB was malfunctioning as trials were repeatedly PAUSED and restarted as they've never been removed from `bracket.trials_to_unpause`. @justinvyu mentioned this in the review where it was introduced and I believed at the time it wasn't necessary - turns out it is, as we can end up in a situation where a bracket is never finished because trials are constantly running. This was not caught by any tests. We should add one in a follow-up - for now we can proceed with this PR to pick onto Ray 2.7.
  Signed-off-by: Kai Fricke <[email protected]>
* [Release Test] Fix `long_running_horovod_tune_test`. (#39012)
  Signed-off-by: Yunxuan Xiao <[email protected]>
  Signed-off-by: Yunxuan Xiao <[email protected]>
* [train] New persistence mode: `StorageContext` unit tests (#39023)
  Signed-off-by: Justin Yu <[email protected]>
* [train] enable train + tune tests and examples (#39021)
  Signed-off-by: Matthew Deng <[email protected]>
* [rllib] Fix storage-path related tests (#38947)
  This PR fixes rllib-related tests that didn't pass changes related to the new storage context.
  Signed-off-by: Kai Fricke <[email protected]>
  Signed-off-by: matthewdeng <[email protected]>
  Co-authored-by: matthewdeng <[email protected]>
* [train] New persistence mode: Migrate 🐙 `Tune tests and examples (medium)` (#39081)
  Signed-off-by: Justin Yu <[email protected]>

---------

Signed-off-by: Matthew Deng <[email protected]>
Signed-off-by: Justin Yu <[email protected]>
Signed-off-by: woshiyyya <[email protected]>
Signed-off-by: Yunxuan Xiao <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: matthewdeng <[email protected]>
Signed-off-by: xwjiang2010 <[email protected]>
Signed-off-by: Yunxuan Xiao <[email protected]>
Co-authored-by: Justin Yu <[email protected]>
Co-authored-by: Yunxuan Xiao <[email protected]>
Co-authored-by: Kai Fricke <[email protected]>
Co-authored-by: Eric Liang <[email protected]>
Co-authored-by: xwjiang2010 <[email protected]>
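The #38983 note says examples now have to implement checkpointing themselves to restore state after a trial is PAUSEd, since no dummy checkpoint is created anymore. The resume pattern is sketched below in plain Python so it runs without Ray: state is persisted after every iteration and reloaded on restart. In Ray 2.7 and later, to our understanding, the equivalent calls are `ray.train.report(metrics, checkpoint=...)` to save and `ray.train.get_checkpoint()` to restore; `train_with_resume` and the `state.json` layout are hypothetical.

```python
import json
import os
import tempfile


def train_with_resume(checkpoint_dir, total_iters):
    """Run (or resume) a toy training loop, persisting state after every iteration.

    Hypothetical stand-in for a Tune/Train function. In Ray the state would be
    saved through a reported checkpoint rather than a local JSON file.
    """
    state_path = os.path.join(checkpoint_dir, "state.json")
    # Restore previous state if a checkpoint exists; otherwise start fresh.
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
    else:
        state = {"iteration": 0, "loss": 1.0}

    while state["iteration"] < total_iters:
        state["iteration"] += 1
        state["loss"] *= 0.9  # stand-in for one optimization step
        # Persist after each step so a paused or restarted trial resumes here.
        with open(state_path, "w") as f:
            json.dump(state, f)
    return state


with tempfile.TemporaryDirectory() as ckpt_dir:
    train_with_resume(ckpt_dir, total_iters=3)            # runs iterations 1-3
    resumed = train_with_resume(ckpt_dir, total_iters=5)  # resumes, runs 4-5
    print(resumed["iteration"])  # 5
```

Without the restore branch, the second call would start again from iteration 0, which is the silent "still worked" behavior the BOHB note describes.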


woshiyyya added 4 commits August 23, 2023 17:02

init

031a9e2

Signed-off-by: woshiyyya <[email protected]>

Merge remote-tracking branch 'upstream/master' into restructure_train…

e284059

…_example_page

wip

9f5b0b2

Signed-off-by: woshiyyya <[email protected]>

wip

dcce4d6

Signed-off-by: woshiyyya <[email protected]>

woshiyyya mentioned this pull request Aug 21, 2023

[Train][Ray 2.7] Revamp Ray Train examples with new APIs #38681

Closed

woshiyyya marked this pull request as ready for review August 24, 2023 17:18

woshiyyya requested review from richardliaw, gjoliver, krfricke, xwjiang2010, amogkam, matthewdeng, Yard1, maxpumperla and a team as code owners August 24, 2023 17:18

woshiyyya assigned woshiyyya, matthewdeng and angelinalg and unassigned woshiyyya Aug 24, 2023

matthewdeng reviewed Aug 28, 2023


matthewdeng added 2 commits August 27, 2023 19:06

table

ab6baca

Signed-off-by: Matthew Deng <[email protected]>

update to table format

c4fe8d6

Signed-off-by: Matthew Deng <[email protected]>

matthewdeng reviewed Aug 28, 2023


Three review threads on doc/source/train/examples.rst (outdated, resolved)

matthewdeng and others added 2 commits August 28, 2023 12:23

Deepspeed -> DeepSpeed

4596a64

Signed-off-by: matthewdeng <[email protected]>

rm inconsistent example names in the side bar

25f6240

Signed-off-by: woshiyyya <[email protected]>

angelinalg approved these changes Aug 28, 2023


woshiyyya and others added 3 commits August 28, 2023 15:31

tag some notebooks as orphan page

df4a553

Signed-off-by: woshiyyya <[email protected]>

Merge remote-tracking branch 'upstream/master' into restructure_train…

56a10d2

…_example_page

Merge branch 'master' into restructure_train_example_page

d89607b

woshiyyya added 5 commits August 28, 2023 16:37

wip

f8969c3

Signed-off-by: woshiyyya <[email protected]>

Merge remote-tracking branch 'origin/restructure_train_example_page' …

52d1a14

…into restructure_train_example_page

add orphan to notebook metadata

0eedad3

Signed-off-by: woshiyyya <[email protected]>

add back smoke test flag

4665f13

Signed-off-by: woshiyyya <[email protected]>

Merge remote-tracking branch 'upstream/master' into restructure_train…

627a3bc

…_example_page
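The "tag some notebooks as orphan page" and "add orphan to notebook metadata" commits mark example notebooks that are presumably no longer listed in the sidebar ToC. One way to script that is sketched below, assuming the docs build honors a top-level `orphan` key in notebook metadata (MyST-NB is understood to forward notebook metadata to Sphinx, where `orphan` suppresses the "document isn't included in any toctree" warning). `mark_notebook_orphan` is a hypothetical helper; the PR's actual edits may have been made by hand.

```python
import json
import tempfile
from pathlib import Path


def mark_notebook_orphan(path):
    """Set "orphan": true in a notebook's top-level metadata.

    Returns True if the file was changed, False if it was already tagged.
    """
    path = Path(path)
    nb = json.loads(path.read_text(encoding="utf-8"))
    metadata = nb.setdefault("metadata", {})
    if metadata.get("orphan") is True:
        return False  # already tagged; leave the file untouched
    metadata["orphan"] = True
    # indent=1 keeps the output close to the layout Jupyter writes for .ipynb files
    path.write_text(json.dumps(nb, indent=1, ensure_ascii=False) + "\n", encoding="utf-8")
    return True


# Demo on a throwaway notebook; real usage would loop over the example .ipynb files.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.ipynb"
    demo.write_text(json.dumps({"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}))
    print(mark_notebook_orphan(demo))  # True
    print(mark_notebook_orphan(demo))  # False (already tagged)
```

Returning whether anything changed makes the helper safe to rerun, for example from a pre-commit hook, without producing spurious diffs.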

matthewdeng added the v2.7.0-pick label Aug 29, 2023

matthewdeng approved these changes Aug 29, 2023


matthewdeng merged commit 47c84c6 into ray-project:master Aug 29, 2023
2 checks passed

matthewdeng pushed a commit to matthewdeng/ray that referenced this pull request Aug 30, 2023

[Train] Restructure Ray Train Example Page (ray-project#38814)

43e8db1

Signed-off-by: woshiyyya <[email protected]>

arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023

[Train] Restructure Ray Train Example Page (ray-project#38814)

2a263e1

Signed-off-by: woshiyyya <[email protected]> Signed-off-by: e428265 <[email protected]>

LeonLuttenberger pushed a commit to jaidisido/ray that referenced this pull request Sep 5, 2023

[Train] Restructure Ray Train Example Page (ray-project#38814)

4235476

Signed-off-by: woshiyyya <[email protected]>

jimthompson5802 pushed a commit to jimthompson5802/ray that referenced this pull request Sep 12, 2023

[Train] Restructure Ray Train Example Page (ray-project#38814)

7292225

Signed-off-by: woshiyyya <[email protected]> Signed-off-by: Jim Thompson <[email protected]>

vymao pushed a commit to vymao/ray that referenced this pull request Oct 11, 2023

[Train] Restructure Ray Train Example Page (ray-project#38814)

c9bb596

Signed-off-by: woshiyyya <[email protected]> Signed-off-by: Victor <[email protected]>

