[doc] Bump dataset to beta for 1.8 and add backlink to SGD (ray-proje…
ericl committed Oct 13, 2021
1 parent df6d06b commit 430a5f4
Showing 4 changed files with 5 additions and 5 deletions.
2 changes: 1 addition & 1 deletion README.rst
@@ -22,7 +22,7 @@ Ray is packaged with the following libraries for accelerating machine learning w
- `Tune`_: Scalable Hyperparameter Tuning
- `RLlib`_: Scalable Reinforcement Learning
- `RaySGD <https://docs.ray.io/en/master/raysgd/raysgd.html>`__: Distributed Training Wrappers
- - `Datasets`_: Flexible Distributed Data Loading (alpha)
+ - `Datasets`_: Flexible Distributed Data Loading (beta)

As well as libraries for taking ML and distributed apps to production:

4 changes: 2 additions & 2 deletions doc/source/data/dataset.rst
@@ -5,7 +5,7 @@ Datasets: Flexible Distributed Data Loading

.. tip::

- Datasets is available as **alpha** in Ray 1.6+. Please file feature requests and bug reports on GitHub Issues or join the discussion on the `Ray Slack <https://forms.gle/9TSdDYUgxYs8SA9e8>`__.
+ Datasets is available as **beta** in Ray 1.8+. Please file feature requests and bug reports on GitHub Issues or join the discussion on the `Ray Slack <https://forms.gle/9TSdDYUgxYs8SA9e8>`__.

Ray Datasets are the standard way to load and exchange data in Ray libraries and applications. Datasets provide basic distributed data transformations such as ``map``, ``filter``, and ``repartition``, and are compatible with a variety of file formats, datasources, and distributed frameworks.

@@ -16,7 +16,7 @@ Ray Datasets are the standard way to load and exchange data in Ray libraries and
Concepts
--------
- Ray Datasets implement `Distributed Arrow <https://arrow.apache.org/>`__. A Dataset consists of a list of Ray object references to *blocks*. Each block holds a set of items in either an `Arrow table <https://arrow.apache.org/docs/python/data.html#tables>`__ or a Python list (for Arrow incompatible objects). Having multiple blocks in a dataset allows for parallel transformation and ingest of the data.
+ Ray Datasets implement `Distributed Arrow <https://arrow.apache.org/>`__. A Dataset consists of a list of Ray object references to *blocks*. Each block holds a set of items in either an `Arrow table <https://arrow.apache.org/docs/python/data.html#tables>`__ or a Python list (for Arrow incompatible objects). Having multiple blocks in a dataset allows for parallel transformation and ingest of the data (e.g., into :ref:`Ray SGD <sgd-v2-docs>` for ML training).

The following figure visualizes a Dataset that has three Arrow table blocks, each block holding 1000 rows each:

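The block concept described in the hunk above can be illustrated with a minimal plain-Python sketch. This is not part of the commit and does not use the Ray API: ``make_blocks`` and ``map_blocks`` are hypothetical helpers, plain Python lists stand in for Arrow tables, and a thread pool stands in for Ray's distributed execution. It only shows why splitting data into several blocks allows each block to be transformed independently.

```python
from concurrent.futures import ThreadPoolExecutor


def make_blocks(items, num_blocks):
    """Split items into num_blocks contiguous blocks of near-equal size.

    Hypothetical helper: a real Dataset holds object references to
    blocks, not the blocks themselves.
    """
    size, extra = divmod(len(items), num_blocks)
    blocks, start = [], 0
    for i in range(num_blocks):
        # The first `extra` blocks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        blocks.append(items[start:end])
        start = end
    return blocks


def map_blocks(blocks, fn):
    """Apply fn to every item, processing each block as a separate task."""
    with ThreadPoolExecutor() as pool:
        # pool.map preserves block order in its results.
        return list(pool.map(lambda block: [fn(x) for x in block], blocks))


# Three blocks of 1000 rows each, mirroring the figure described above.
blocks = make_blocks(list(range(3000)), 3)
doubled = map_blocks(blocks, lambda x: x * 2)
```

Because each block is transformed without reference to the others, the number of blocks bounds how many tasks can run at once in this sketch.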
2 changes: 1 addition & 1 deletion doc/source/ray-libraries.rst
@@ -10,7 +10,7 @@ Ray also comes packaged with several libraries solving problems in the machine l
- :doc:`../tune/index`
- :ref:`rllib-index`
- :ref:`sgd-index`
- - :ref:`datasets` (alpha)
+ - :ref:`datasets` (beta)

As well as libraries for taking ML and distributed apps to production:

2 changes: 1 addition & 1 deletion doc/source/ray-overview/basics.rst
@@ -16,7 +16,7 @@ On top of **Ray Core** are several libraries for solving problems in machine lea
- :doc:`../tune/index`
- :ref:`rllib-index`
- :ref:`sgd-index`
- - :ref:`datasets` (alpha)
+ - :ref:`datasets` (beta)

As well as libraries for taking ML and distributed apps to production:
