forked from ray-project/ray
[Datasets] Add basic e2e Datasets example on NYC taxi dataset (ray-project#24874)

This PR adds a dedicated docs page for examples and adds a basic e2e tabular data processing example on the NYC taxi dataset. The example demonstrates basic data reading, inspection, transformations, and shuffling, along with ingestion into dummy model trainers and dummy batch inference, for tabular (Parquet) data.
1 parent 399334d, commit 6c0a457
Showing 7 changed files with 1,286 additions and 2 deletions.
@@ -210,3 +210,6 @@ workflow_data/

# vscode java extention generated
.factorypath

# Jupyter Notebooks
**/.ipynb_checkpoints/
@@ -1,5 +1,22 @@
load("//bazel:python.bzl", "py_test_run_all_notebooks")

filegroup(
    name = "data_examples",
    srcs = glob(["*.ipynb"]),
    visibility = ["//doc:__subpackages__"]
)

# --------------------------------------------------------------------
# Test all doc/source/data/examples notebooks.
# --------------------------------------------------------------------

# big_data_ingestion.ipynb is not tested right now due to large resource requirements
# and a need of a general overhaul.

py_test_run_all_notebooks(
    size = "medium",
    include = ["*.ipynb"],
    exclude = ["big_data_ingestion.ipynb"],
    data = ["//doc/source/data/examples:data_examples"],
    tags = ["exclusive", "team:ml"],
)
@@ -0,0 +1,52 @@
.. _datasets-examples-ref:

========
Examples
========

.. tip:: Check out the Datasets :ref:`User Guide <data_user_guide>` to learn more about
    Datasets' features in-depth.

.. _datasets-recipes:

Simple Data Processing Examples
-------------------------------

Ray Datasets is a data processing engine that supports multiple data
modalities and types. Here you will find a few end-to-end examples of some basic data
processing with Ray Datasets on tabular data, text (coming soon!), and imagery (coming
soon!).

.. panels::
    :container: container pb-4
    :column: col-md-4 px-2 py-2
    :img-top-cls: pt-5 w-75 d-block mx-auto

    ---
    :img-top: /images/taxi.png

    +++
    .. link-button:: nyc_taxi_basic_processing
        :type: ref
        :text: Processing NYC taxi data using Ray Datasets
        :classes: btn-link btn-block stretched-link

Scaling Out Datasets Workloads
------------------------------

These examples demonstrate using Ray Datasets on large-scale data over a multi-node Ray
cluster.

.. panels::
    :container: container pb-4
    :column: col-md-4 px-2 py-2
    :img-top-cls: pt-5 w-75 d-block mx-auto

    ---
    :img-top: /images/dataset-repeat-2.svg

    +++
    .. link-button:: big_data_ingestion
        :type: ref
        :text: Large-scale ML Ingest
        :classes: btn-link btn-block stretched-link