[train+tune][doc] Remove docs sections recommending `RAY_AIR_LOCAL_CACHE_DIR` (#44284)

Removes docs that recommend using the `RAY_AIR_LOCAL_CACHE_DIR` env variable.

---------

Signed-off-by: Justin Yu <[email protected]>
justinvyu committed Apr 1, 2024
1 parent 834eb29 commit 731b53d
Showing 4 changed files with 61 additions and 58 deletions.
72 changes: 52 additions & 20 deletions doc/source/train/user-guides/persistent-storage.rst
@@ -337,10 +337,11 @@ In the example above, we saved some artifacts within the training loop to the wo
If you were training a stable diffusion model, you could save
some sample generated images every so often as a training artifact.

By default, the worker's current working directory is set to the local version of the "trial directory."
For example, this would be ``~/ray_results/experiment_name/TorchTrainer_46367_00000_0_...`` in the example above.
See :ref:`below <train-working-directory>` for how to disable this change in the working directory,
if you want your training workers to keep their original working directories.
By default, Ray Train changes the current working directory of each worker to be inside the run's
:ref:`local staging directory <train-local-staging-dir>`.
This way, all distributed training workers share the same absolute path as the working directory.
See :ref:`below <train-working-directory>` for how to disable this default behavior,
which is useful if you want your training workers to keep their original working directories.

If :class:`RunConfig(SyncConfig(sync_artifacts=True)) <ray.train.SyncConfig>` is set, then
all artifacts saved in this directory will be persisted to storage.
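
A minimal sketch of what this configuration could look like (the storage path here is illustrative):

.. code-block:: python

    from ray.train import RunConfig, SyncConfig

    # Sketch: persist worker artifacts (files written to the working directory)
    # to the run's storage path along with checkpoints.
    run_config = RunConfig(
        storage_path="s3://my_bucket/train_results",  # illustrative path
        sync_config=SyncConfig(sync_artifacts=True),
    )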
@@ -378,25 +379,57 @@ Note that this behavior is off by default.
...


.. _train-storage-advanced:

Advanced configuration
----------------------

Setting the intermediate local directory
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. _train-local-staging-dir:

When a ``storage_path`` is specified, training outputs are saved to an
*intermediate local directory*, then persisted (copied/uploaded) to the ``storage_path``.
By default, this intermediate local directory is a sub-directory of ``~/ray_results``.
Setting the local staging directory
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Customize this intermediate local directory with the ``RAY_AIR_LOCAL_CACHE_DIR`` environment variable:
.. warning::

.. testcode::
:skipif: True
Prior to 2.10, the ``RAY_AIR_LOCAL_CACHE_DIR`` environment variable and ``RunConfig(local_dir)``
were ways to configure the local staging directory to be outside of the home directory (``~/ray_results``).

import os
os.environ["RAY_AIR_LOCAL_CACHE_DIR"] = "/tmp/custom/"
**These configurations are no longer used to configure the local staging directory.
Please instead use** ``RunConfig(storage_path)`` **to configure where your
run's outputs go.**
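
For example, a minimal sketch of the replacement configuration (the storage path and experiment name are illustrative):

.. code-block:: python

    from ray.train import RunConfig

    # Sketch: configure where run outputs (checkpoints, results) are persisted,
    # instead of the removed `RAY_AIR_LOCAL_CACHE_DIR` environment variable.
    run_config = RunConfig(
        storage_path="s3://my_bucket/train_results",  # illustrative path
        name="experiment_name",
    )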


Apart from files such as checkpoints written directly to the ``storage_path``,
Ray Train also writes some logfiles and metadata files to an intermediate
*local staging directory* before they get persisted (copied/uploaded) to the ``storage_path``.
The current working directory of each worker is set within this local staging directory.

By default, the local staging directory is a sub-directory of the Ray session
directory (e.g., ``/tmp/ray/session_latest``), which is also where other temporary Ray files are dumped.

Customize the location of the staging directory by :ref:`setting the location of the
temporary Ray session directory <temp-dir-log-files>`.
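
As a hedged sketch, assuming a single-node setup where you call ``ray.init()`` yourself (the path is illustrative, and ``_temp_dir`` is the parameter described in the linked section):

.. code-block:: python

    import ray

    # Sketch: relocate the Ray session directory, which also moves the
    # local staging directory that lives underneath it.
    ray.init(_temp_dir="/mnt/large_disk/ray_tmp")  # must be an absolute path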

Here's an example of what the local staging directory looks like:

.. code-block:: text

    /tmp/ray/session_latest/artifacts/<ray-train-job-timestamp>/
    └── experiment_name
        ├── driver_artifacts    <- These are all uploaded to storage periodically
        │   ├── Experiment state snapshot files needed for resuming training
        │   └── Metrics logfiles
        └── working_dirs        <- These are uploaded to storage if `SyncConfig(sync_artifacts=True)`
            └── Current working directory of training workers, which contains worker artifacts
.. warning::

You should not need to look into the local staging directory.
The ``storage_path`` should be the only path that you need to interact with.

The structure of the local staging directory is subject to change
in future versions of Ray Train -- do not rely on these local staging files in your application.

...

.. _train-working-directory:

@@ -439,9 +472,8 @@ directory you launched the training script from.
# NOTE: The working directory is copied to each worker and is read only.
assert os.path.exists("./data.txt"), os.getcwd()

# If `SyncConfig(sync_artifacts=True)`, write artifacts that you want to
# persist in the trial directory.
# Artifacts written in the current working directory will NOT be persisted.
# To use artifact syncing with `SyncConfig(sync_artifacts=True)`,
# write artifacts here, instead of the current working directory:
ray.train.get_context().get_trial_dir()

trainer = TorchTrainer(
@@ -466,5 +498,5 @@ environment variable.
For instance, if you set ``RAY_STORAGE="s3://my_bucket/train_results"``, your
results will automatically be persisted there.

If you manually set a :attr:`RunConfig.storage_path <ray.train.RunConfig.storage_path>`, it
will take precedence over this environment variable.
If you manually set a :attr:`RunConfig.storage_path <ray.train.RunConfig.storage_path>`,
it will take precedence over this environment variable.
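
A brief sketch of setting this from Python (set it before constructing the trainer so it is picked up; the bucket path comes from the example above):

.. code-block:: python

    import os

    # Sketch: default storage path for runs launched from this process.
    # An explicit `RunConfig(storage_path=...)` still takes precedence.
    os.environ["RAY_STORAGE"] = "s3://my_bucket/train_results"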
11 changes: 0 additions & 11 deletions doc/source/tune/doc_code/faq.py
@@ -224,17 +224,6 @@ def f(config, data=None):
tuner.fit()
# __large_data_end__

MyTrainableClass = None

if not MOCK:
    # __log_1_start__
    tuner = tune.Tuner(
        MyTrainableClass,
        run_config=train.RunConfig(storage_path="s3://my-log-dir"),
    )
    tuner.fit()
    # __log_1_end__


import ray

12 changes: 1 addition & 11 deletions doc/source/tune/faq.rst
@@ -570,17 +570,7 @@ be automatically fetched and passed to your trainable as a parameter.
How can I upload my Tune results to cloud storage?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If an upload directory is provided, Tune will automatically sync results from the ``RAY_AIR_LOCAL_CACHE_DIR`` to the given directory,
natively supporting standard URIs for systems like S3, gsutil or HDFS. You can add more filesystems by installing
`fs-spec <https://filesystem-spec.readthedocs.io/en/latest/>`_-compatible filesystems e.g. using pip.

Here is an example of uploading to S3, using a bucket called ``my-log-dir``:

.. literalinclude:: doc_code/faq.py
:dedent:
:language: python
:start-after: __log_1_start__
:end-before: __log_1_end__
See :ref:`tune-cloud-checkpointing`.

Make sure that worker nodes have write access to the cloud storage.
Failing to do so would cause error messages like ``Error message (1): fatal error: Unable to locate credentials``.
24 changes: 8 additions & 16 deletions doc/source/tune/tutorials/tune-storage.rst
@@ -169,13 +169,6 @@ that implements saving and loading checkpoints.
from ray import train, tune
from your_module import my_trainable
# Look for the existing cluster and connect to it
ray.init()
# Set the local caching directory. Results will be stored here
# before they are synced to remote storage. This env variable is ignored
# if `storage_path` below is set to a local directory.
os.environ["RAY_AIR_LOCAL_CACHE_DIR"] = "/tmp/mypath"
tuner = tune.Tuner(
my_trainable,
@@ -198,14 +191,7 @@ that implements saving and loading checkpoints.
# This starts the run!
results = tuner.fit()
In this example, here's how trial checkpoints will be saved:

- On head node where we are running from:
- ``/tmp/mypath/my-tune-exp/<trial_name>/checkpoint_<step>`` (but only for trials running on this node)
- On worker nodes:
- ``/tmp/mypath/my-tune-exp/<trial_name>/checkpoint_<step>`` (but only for trials running on this node)
- S3:
- ``s3://my-checkpoints-bucket/path/my-tune-exp/<trial_name>/checkpoint_<step>`` (all trials)
In this example, trial checkpoints will be saved to: ``s3://my-checkpoints-bucket/path/my-tune-exp/<trial_name>/checkpoint_<step>``

.. _tune-syncing-restore-from-uri:

@@ -218,7 +204,7 @@ you can resume it any time starting from the experiment state saved in the cloud
tuner = tune.Tuner.restore(
"s3:https://my-checkpoints-bucket/path/my-tune-exp",
trainable=my_trainable,
resume_errored=True
resume_errored=True,
)
tuner.fit()
@@ -228,3 +214,9 @@ There are a few options for restoring an experiment:
Please see the documentation of
:meth:`Tuner.restore() <ray.tune.tuner.Tuner.restore>` for more details.
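
For instance, a hedged sketch that restores only when a previous experiment exists at the path, otherwise starting fresh (assuming ``my_trainable`` and the bucket path from the snippet above, and that ``Tuner.can_restore`` is available in your Ray version):

.. code-block:: python

    from ray import tune

    experiment_path = "s3://my-checkpoints-bucket/path/my-tune-exp"

    # Sketch: resume the experiment if its state is found at the path,
    # otherwise start a new run.
    if tune.Tuner.can_restore(experiment_path):
        tuner = tune.Tuner.restore(
            experiment_path,
            trainable=my_trainable,
            resume_errored=True,
        )
    else:
        tuner = tune.Tuner(my_trainable)
    tuner.fit()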


Advanced configuration
----------------------

See :ref:`Ray Train's section on advanced storage configuration <train-storage-advanced>`.
All of these configurations also apply to Ray Tune.
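
As a brief sketch (mirroring the Train examples above; ``my_trainable`` and the storage path are illustrative):

.. code-block:: python

    from ray import train, tune

    # Sketch: the same `storage_path` configuration works for Tune runs.
    tuner = tune.Tuner(
        my_trainable,
        run_config=train.RunConfig(storage_path="s3://my-checkpoints-bucket/path"),
    )
    results = tuner.fit()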
