[Data] [Docs] Adding in references to explain how to use credentials with Ray Data (ray-project#44205)

Signed-off-by: Matthew Owen <[email protected]>
omatthew98 committed Mar 21, 2024
1 parent 4ee4a69 commit e1d7025
Showing 2 changed files with 26 additions and 3 deletions.
14 changes: 13 additions & 1 deletion doc/source/data/loading-data.rst
@@ -209,6 +209,10 @@ To read formats other than Parquet, see the :ref:`Input/Output reference <input-output>`.
petal.width double
variety string

Ray Data relies on PyArrow for authentication with Amazon S3. For more on how to configure
your credentials to be compatible with PyArrow, see their
`S3 Filesystem docs <https://arrow.apache.org/docs/python/filesystems.html#s3>`_.
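
As a minimal sketch, you can also pass an explicit S3 filesystem object with credentials to the read call; the bucket name, key values, and region below are hypothetical placeholders:

import ray
from pyarrow import fs

# Hypothetical credentials; if you omit them, PyArrow falls back to
# the standard AWS credential chain (environment variables,
# ~/.aws/credentials, or instance metadata).
filesystem = fs.S3FileSystem(
    access_key="YOUR_ACCESS_KEY_ID",
    secret_key="YOUR_SECRET_ACCESS_KEY",
    region="us-west-2",
)
ds = ray.data.read_parquet(
    "s3://my-bucket/iris.parquet",
    filesystem=filesystem,
)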

.. tab-item:: GCS

To read files from Google Cloud Storage, install the
@@ -227,7 +231,7 @@ To read formats other than Parquet, see the :ref:`Input/Output reference <input-output>`.

filesystem = gcsfs.GCSFileSystem(project="my-google-project")
ds = ray.data.read_parquet(
"s3:https://anonymous@ray-example-data/iris.parquet",
"gcs:https://anonymous@ray-example-data/iris.parquet",
filesystem=filesystem
)

@@ -243,6 +247,10 @@ To read formats other than Parquet, see the :ref:`Input/Output reference <input-output>`.
petal.width double
variety string

Ray Data relies on PyArrow for authentication with Google Cloud Storage. For more on how
to configure your credentials to be compatible with PyArrow, see their
`GCS Filesystem docs <https://arrow.apache.org/docs/python/filesystems.html#google-cloud-storage-file-system>`_.
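
As a minimal sketch, gcsfs can also take an explicit service-account key file; the project name and key path below are hypothetical:

import gcsfs
import ray

# Hypothetical key path; if token is omitted, gcsfs uses your
# application-default credentials.
filesystem = gcsfs.GCSFileSystem(
    project="my-google-project",
    token="/path/to/service-account-key.json",
)
ds = ray.data.read_parquet(
    "gcs://my-bucket/iris.parquet",
    filesystem=filesystem,
)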

.. tab-item:: ABS

To read files from Azure Blob Storage, install the
@@ -277,6 +285,10 @@ To read formats other than Parquet, see the :ref:`Input/Output reference <input-output>`.
petal.width double
variety string

Ray Data relies on PyArrow for authentication with Azure Blob Storage. For more on how
to configure your credentials to be compatible with PyArrow, see their
`fsspec-compatible filesystems docs <https://arrow.apache.org/docs/python/filesystems.html#using-fsspec-compatible-filesystems-with-arrow>`_.
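
As a minimal sketch, adlfs accepts explicit account credentials; the account name, key, and container below are hypothetical:

import adlfs
import ray

# Hypothetical credentials; adlfs alternatively accepts a SAS token
# or a connection string.
filesystem = adlfs.AzureBlobFileSystem(
    account_name="my-storage-account",
    account_key="YOUR_ACCOUNT_KEY",
)
ds = ray.data.read_parquet(
    "az://my-container/iris.parquet",
    filesystem=filesystem,
)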

Reading files from NFS
~~~~~~~~~~~~~~~~~~~~~~

15 changes: 13 additions & 2 deletions doc/source/data/saving-data.rst
@@ -47,6 +47,8 @@ with your cloud service provider. Then, call a method like
:meth:`Dataset.write_parquet <ray.data.Dataset.write_parquet>` and specify a URI with
the appropriate scheme. The URI can point to buckets or folders.

To write data to formats other than Parquet, read the :ref:`Input/Output reference <input-output>`.

.. tab-set::

.. tab-item:: S3
@@ -62,6 +64,10 @@

ds.write_parquet("s3:https://my-bucket/my-folder")

Ray Data relies on PyArrow for authentication with Amazon S3. For more on how to configure
your credentials to be compatible with PyArrow, see their
`S3 Filesystem docs <https://arrow.apache.org/docs/python/filesystems.html#s3>`_.
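
As a minimal sketch, writes can also go through an explicit S3 filesystem object; the bucket and key values below are hypothetical placeholders:

import ray
from pyarrow import fs

ds = ray.data.from_items([{"x": 1}, {"x": 2}])

# Hypothetical credentials; omit them to use the default AWS
# credential chain instead.
filesystem = fs.S3FileSystem(
    access_key="YOUR_ACCESS_KEY_ID",
    secret_key="YOUR_SECRET_ACCESS_KEY",
)
ds.write_parquet("s3://my-bucket/my-folder", filesystem=filesystem)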

.. tab-item:: GCS

To save data to Google Cloud Storage, install the
@@ -83,6 +89,10 @@
filesystem = gcsfs.GCSFileSystem(project="my-google-project")
ds.write_parquet("gcs:https://my-bucket/my-folder", filesystem=filesystem)

Ray Data relies on PyArrow for authentication with Google Cloud Storage. For more on how
to configure your credentials to be compatible with PyArrow, see their
`GCS Filesystem docs <https://arrow.apache.org/docs/python/filesystems.html#google-cloud-storage-file-system>`_.
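
As a minimal sketch, the same gcsfs filesystem works for writes; the key path below is hypothetical:

import gcsfs
import ray

ds = ray.data.from_items([{"x": 1}, {"x": 2}])

filesystem = gcsfs.GCSFileSystem(
    project="my-google-project",
    token="/path/to/service-account-key.json",  # hypothetical key file
)
ds.write_parquet("gcs://my-bucket/my-folder", filesystem=filesystem)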

.. tab-item:: ABS

To save data to Azure Blob Storage, install the
@@ -104,8 +114,9 @@
filesystem = adlfs.AzureBlobFileSystem(account_name="azureopendatastorage")
ds.write_parquet("az:https://my-bucket/my-folder", filesystem=filesystem)

-To write data to formats other than Parquet, read the
-:ref:`Input/Output reference <input-output>`.
+Ray Data relies on PyArrow for authentication with Azure Blob Storage. For more on how
+to configure your credentials to be compatible with PyArrow, see their
+`fsspec-compatible filesystems docs <https://arrow.apache.org/docs/python/filesystems.html#using-fsspec-compatible-filesystems-with-arrow>`_.
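
As a minimal sketch, the same adlfs filesystem works for writes; the account name and key below are hypothetical:

import adlfs
import ray

ds = ray.data.from_items([{"x": 1}, {"x": 2}])

filesystem = adlfs.AzureBlobFileSystem(
    account_name="my-storage-account",
    account_key="YOUR_ACCOUNT_KEY",  # hypothetical credentials
)
ds.write_parquet("az://my-container/my-folder", filesystem=filesystem)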

Writing data to NFS
~~~~~~~~~~~~~~~~~~~
