
Commit

[docs] fixing broken references, links, note (ray-project#35694)
angelinalg committed May 25, 2023
1 parent 55315e8 commit 98a446b
Showing 11 changed files with 76 additions and 70 deletions.
1 change: 0 additions & 1 deletion doc/source/_toc.yml
@@ -407,7 +407,6 @@ parts:
- file: ray-observability/user-guides/debug-apps/debug-failures
- file: ray-observability/user-guides/debug-apps/optimize-performance
- file: ray-observability/user-guides/debug-apps/ray-debugging
- file: ray-observability/user-guides/debug-apps/ray-core-profiling
- file: ray-observability/user-guides/cli-sdk
- file: ray-observability/user-guides/configure-logging
- file: ray-observability/user-guides/add-app-metrics
42 changes: 0 additions & 42 deletions doc/source/cluster/kubernetes/user-guides/logging.md
@@ -144,48 +144,6 @@ kubectl logs raycluster-complete-logs-head-xxxxx -c fluentbit
[KubDoc]: https://kubernetes.io/docs/concepts/cluster-administration/logging/
[ConfigLink]: https://raw.githubusercontent.com/ray-project/ray/releases/2.4.0/doc/source/cluster/kubernetes/configs/ray-cluster.log.yaml

## Customizing Worker Loggers

When using Ray, all tasks and actors are executed remotely in Ray's worker processes.

:::{note}
To stream logs to a driver, they should be flushed to stdout and stderr.
:::

```python
import ray
import logging
# Initiate a driver.
ray.init()

@ray.remote
class Actor:
    def __init__(self):
        # Basic config automatically configures logs to
        # be streamed to stdout and stderr.
        # Set the severity to INFO so that info logs are printed to stdout.
        logging.basicConfig(level=logging.INFO)

    def log(self, msg):
        logger = logging.getLogger(__name__)
        logger.info(msg)

actor = Actor.remote()
ray.get(actor.log.remote("A log message for an actor."))

@ray.remote
def f(msg):
    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger(__name__)
    logger.info(msg)

ray.get(f.remote("A log message for a task."))
```

```bash
(Actor pid=179641) INFO:__main__:A log message for an actor.
(f pid=177572) INFO:__main__:A log message for a task.
```
## Using structured logging

The metadata of tasks or actors may be obtained by Ray's :ref:`runtime_context APIs <runtime-context-apis>`.
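
As a rough sketch of that pattern (the `get_job_id` and `get_task_id` accessors are assumptions about the runtime-context API, not taken from this commit), a task can fold the metadata into its log lines:

```python
import logging

import ray

ray.init()

@ray.remote
def structured_log_task():
    # Query metadata about the currently running task from the runtime context.
    ctx = ray.get_runtime_context()
    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger(__name__)
    # Emit the metadata alongside the message so each line can be traced
    # back to the job and task that produced it.
    logger.info(
        "job_id=%s task_id=%s msg=%s",
        ctx.get_job_id(),
        ctx.get_task_id(),
        "processing batch",
    )

ray.get(structured_log_task.remote())
```
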
2 changes: 1 addition & 1 deletion doc/source/ray-air/getting-started.rst
@@ -216,4 +216,4 @@ Next Steps
- :ref:`air-examples-ref`
- :ref:`API reference <air-api-ref>`
- :ref:`Technical whitepaper <whitepaper>`
- To check how your application is doing, you can use the :ref:`Ray dashboard<robservability-getting-started>`.
- To check how your application is doing, you can use the :ref:`Ray dashboard<observability-getting-started>`.
13 changes: 8 additions & 5 deletions doc/source/ray-observability/key-concepts.rst
@@ -66,7 +66,7 @@ internal stats (e.g., number of actors in the cluster, number of worker failures
and custom metrics (e.g., metrics defined by users). All stats can be exported as time series data (to Prometheus by default) and used
to monitor the cluster over time.

See :ref:`Ray Metrics <ray-metrics>` for more details.
See :ref:`Ray Metrics <dash-metrics-view>` for more details.

Exceptions
----------
@@ -93,9 +93,9 @@ See :ref:`Ray Debugger <ray-debugger>` for more details.

Profiling
---------
Ray is compatible with Python profiling tools such as ``CProfile``. It also supports its built-in profiling tool such as :ref:```ray timeline`` <ray-timeline-doc>`.
Ray is compatible with Python profiling tools such as ``CProfile``. It also supports its built-in profiling tool such as :ref:`ray timeline <ray-timeline-doc>`.

See :ref:`Profiling <ray-core-profiling>` for more details.
See :ref:`Profiling <dashboard-cprofile>` for more details.

Tracing
-------
@@ -166,13 +166,16 @@ Actor log messages look like the following by default.
(MyActor pid=480956) actor log message
.. _logging-directory-structure:

Logging directory structure
~~~~~~~~~~~~~~~~~~~~~~~~~~~

By default, Ray logs are stored in a ``/tmp/ray/session_*/logs`` directory.

..{note}:
The default temp directory is ``/tmp/ray`` (for Linux and MacOS). To change the temp directory, specify it when you call ``ray start`` or ``ray.init()``.
.. note::

The default temp directory is ``/tmp/ray`` (for Linux and MacOS). To change the temp directory, specify it when you call ``ray start`` or ``ray.init()``.

A new Ray instance creates a new session ID in the temp directory. The latest session ID is symlinked to ``/tmp/ray/session_latest``.
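
A minimal sketch of redirecting the session directory (the ``_temp_dir`` argument is an assumption based on current ``ray.init`` options, not part of this commit):

.. code-block:: python

    import ray

    # Store session folders, and therefore the logs, under a custom root
    # instead of the default /tmp/ray.
    ray.init(_temp_dir="/data/ray_tmp")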

@@ -8,7 +8,7 @@ There are currently three metrics supported: Counter, Gauge, and Histogram.
These metrics correspond to the same `Prometheus metric types <https://prometheus.io/docs/concepts/metric_types/>`_.
Below is a simple example of an actor that exports metrics using these APIs:

.. literalinclude:: doc_code/metrics_example.py
.. literalinclude:: ../doc_code/metrics_example.py
:language: python

While the script is running, the metrics are exported to ``localhost:8080`` (this is the endpoint that Prometheus would be configured to scrape).
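
The included ``metrics_example.py`` is not reproduced in this diff; a minimal sketch of such an actor might look like the following (metric names, tags, and the export-port argument are illustrative assumptions):

.. code-block:: python

    import time

    import ray
    from ray.util.metrics import Counter, Gauge, Histogram

    # _metrics_export_port pins the endpoint Prometheus scrapes (assumed knob).
    ray.init(_metrics_export_port=8080)

    @ray.remote
    class RequestHandler:
        def __init__(self):
            self.num_requests = Counter(
                "num_requests_total",
                description="Total number of requests handled.",
                tag_keys=("handler",),
            )
            self.queue_size = Gauge(
                "queue_size",
                description="Current size of the request queue.",
                tag_keys=("handler",),
            )
            self.latency = Histogram(
                "request_latency_s",
                description="Request latency in seconds.",
                boundaries=[0.01, 0.1, 1.0],
                tag_keys=("handler",),
            )
            for metric in (self.num_requests, self.queue_size, self.latency):
                metric.set_default_tags({"handler": "example"})

        def handle(self):
            start = time.time()
            time.sleep(0.05)  # stand-in for real work
            self.num_requests.inc()
            self.queue_size.set(0)
            self.latency.observe(time.time() - start)

    handler = RequestHandler.remote()
    ray.get(handler.handle.remote())
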
52 changes: 49 additions & 3 deletions doc/source/ray-observability/user-guides/configure-logging.rst
@@ -6,7 +6,7 @@ Configuring Logging
This guide helps you modify the default configuration of Ray's logging system.


Internal Ray Logging Configuration
Internal Ray logging configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When ``import ray`` is executed, Ray's logger is initialized, generating a sensible configuration given in ``python/ray/_private/log.py``. The default logging level is ``logging.INFO``.
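
For example, a sketch of lowering the verbosity of Ray's own logger after import (the ``"ray"`` logger name is an assumption about Ray's internal logger hierarchy):

.. code-block:: python

    import logging

    import ray  # importing ray installs its default logging configuration

    # Quiet Ray's internal log output without touching the root logger.
    logging.getLogger("ray").setLevel(logging.WARNING)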

@@ -40,7 +40,7 @@ Similarly, to modify the logging configuration for any Ray subcomponent, specify
# Here's how to add an additional file handler for ray tune:
ray_tune_logger.addHandler(logging.FileHandler("extra_ray_tune_log.log"))
For more information about logging in workers, see :ref:`Customizing worker loggers`.
For more information about logging in workers, see :ref:`Customizing worker loggers <customize-worker-loggers>`.

Disabling logging to the driver
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -106,7 +106,7 @@ By default Ray prints Actor logs prefixes in light blue:
Users may instead activate multi-color prefixes by setting the environment variable ``RAY_COLOR_PREFIX=1``.
This will index into an array of colors modulo the PID of each process.

.. image:: ./images/coloring-actor-log-prefixes.png
.. image:: ../images/coloring-actor-log-prefixes.png
:align: center

Distributed progress bars (tqdm)
@@ -129,3 +129,49 @@ Limitations:

By default, the builtin print will also be patched to use `ray.experimental.tqdm_ray.safe_print` when `tqdm_ray` is used.
This avoids progress bar corruption on driver print statements. To disable this, set `RAY_TQDM_PATCH_PRINT=0`.
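
A minimal sketch of the pattern (assuming ``ray.experimental.tqdm_ray.tqdm`` mirrors the usual ``tqdm`` constructor and ``update``/``close`` methods):

.. code-block:: python

    import time

    import ray
    from ray.experimental.tqdm_ray import tqdm

    @ray.remote
    def process(chunk_id, n=100):
        # Each remote task gets its own bar; the driver renders them without
        # corrupting one another.
        bar = tqdm(total=n, desc=f"chunk {chunk_id}")
        for _ in range(n):
            time.sleep(0.01)
            bar.update(1)
        bar.close()

    ray.get([process.remote(i) for i in range(4)])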

.. _customize-worker-loggers:

Customizing worker loggers
~~~~~~~~~~~~~~~~~~~~~~~~~~

When using Ray, all tasks and actors are executed remotely in Ray's worker processes.

.. note::

To stream logs to a driver, they should be flushed to stdout and stderr.

.. code-block:: python

    import ray
    import logging

    # Initiate a driver.
    ray.init()

    @ray.remote
    class Actor:
        def __init__(self):
            # Basic config automatically configures logs to
            # be streamed to stdout and stderr.
            # Set the severity to INFO so that info logs are printed to stdout.
            logging.basicConfig(level=logging.INFO)

        def log(self, msg):
            logger = logging.getLogger(__name__)
            logger.info(msg)

    actor = Actor.remote()
    ray.get(actor.log.remote("A log message for an actor."))

    @ray.remote
    def f(msg):
        logging.basicConfig(level=logging.INFO)
        logger = logging.getLogger(__name__)
        logger.info(msg)

    ray.get(f.remote("A log message for a task."))

.. code-block:: bash

    (Actor pid=179641) INFO:__main__:A log message for an actor.
    (f pid=177572) INFO:__main__:A log message for a task.
@@ -96,7 +96,7 @@ it will raise an exception with one of the following error messages (which indic
Also, you can use the `dmesg <https://phoenixnap.com/kb/dmesg-linux#:~:text=The%20dmesg%20command%20is%20a,take%20place%20during%20system%20startup.>`_ CLI command to verify the processes are killed by the Linux out-of-memory killer.

.. image:: ../images/dmsg.png
.. image:: ../../images/dmsg.png
:align: center

If a worker is killed by Ray's memory monitor, it is automatically retried (see the :ref:`link <ray-oom-retry-policy>` for details).
@@ -130,10 +130,10 @@ Ray memory monitor also periodically prints the aggregated out-of-memory killer
Ray Dashboard's :ref:`metrics page <dash-metrics-view>` and :ref:`event page <dash-event>` also provides the out-of-memory killer-specific events and metrics.

.. image:: ../images/oom-metrics.png
.. image:: ../../images/oom-metrics.png
:align: center

.. image:: ../images/oom-events.png
.. image:: ../../images/oom-events.png
:align: center

.. _troubleshooting-out-of-memory-task-actor-mem-usage:
@@ -150,7 +150,7 @@ The memory usage from the per component graph uses RSS - SHR. See the below for

Alternatively, you can also use the CLI command `htop <https://htop.dev/>`_.

.. image:: ../images/htop.png
.. image:: ../../images/htop.png
:align: center

See the ``allocate_memory`` row. See two columns, RSS and SHR.
@@ -173,12 +173,12 @@ Head Node Out-of-Memory Error

First, check the head node memory usage from the metrics page. Find the head node address from the cluster page.

.. image:: ../images/head-node-addr.png
.. image:: ../../images/head-node-addr.png
:align: center

And then check the memory usage from the head node from the node memory usage view inside the Dashboard :ref:`metrics view <dash-metrics-view>`.

.. image:: ../images/metrics-node-view.png
.. image:: ../../images/metrics-node-view.png
:align: center

Ray head node has more memory-demanding system components such as GCS or the dashboard.
@@ -201,10 +201,10 @@ You can verify it by looking at the :ref:`per task and actor memory usage graph
First, see the memory usage of the ``allocate_memory`` task. It totals 18 GB.
At the same time, you can verify that 15 concurrent tasks are running.

.. image:: ../images/component-memory.png
.. image:: ../../images/component-memory.png
:align: center

.. image:: ../images/tasks-graph.png
.. image:: ../../images/tasks-graph.png
:align: center

It means each task uses about 18GB / 15 == 1.2 GB. To reduce the parallelism,
@@ -1,5 +1,3 @@
.. _ray-core-profiling:

.. _ray-core-mem-profiling:

Debugging Memory Issues
@@ -22,7 +20,7 @@ This will allow you to download profiling files from other nodes.

.. tab-item:: Actors

.. literalinclude:: ../doc_code/memray_profiling.py
.. literalinclude:: ../../doc_code/memray_profiling.py
:language: python
:start-after: __memray_profiling_start__
:end-before: __memray_profiling_end__
@@ -31,19 +29,19 @@

Note that tasks have a shorter lifetime, so there could be lots of memory profiling files.

.. literalinclude:: ../doc_code/memray_profiling.py
.. literalinclude:: ../../doc_code/memray_profiling.py
:language: python
:start-after: __memray_profiling_task_start__
:end-before: __memray_profiling_task_end__
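
The included ``memray_profiling.py`` is not reproduced in this diff; a rough sketch of the task variant might look like this (the output location and the ``memray.Tracker`` usage are assumptions, not taken from the commit):

.. code-block:: python

    import uuid

    import memray
    import ray

    @ray.remote
    def allocate_lots():
        # One profile per invocation; a unique file name keeps repeated runs
        # of the task from clobbering each other's results.
        profile = f"/tmp/ray/session_latest/logs/memray_task_{uuid.uuid4().hex}.bin"
        with memray.Tracker(profile):
            return len([b"x" * 1024 for _ in range(10_000)])

    ray.get(allocate_lots.remote())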

Once the task or actor runs, go to the :ref:`Logs View <dash-logs-view>` of the dashboard. Find and click the log file name.

.. image:: ../images/memory-profiling-files.png
.. image:: ../../images/memory-profiling-files.png
:align: center

Click the download button.

.. image:: ../images/download-memory-profiling-files.png
.. image:: ../../images/download-memory-profiling-files.png
:align: center

Now, you have the memory profiling file. Running
@@ -1,4 +1,4 @@
(observability-user-guides)=
(observability-debug-apps)=

# Troubleshooting Applications

@@ -83,7 +83,7 @@ Then open `chrome://tracing`_ in the Chrome web browser, and load
Python CPU Profiling in the Dashboard
-------------------------------------

The :ref:`ray-dashboard` lets you profile Ray worker processes by clicking on the "Stack Trace" or "CPU Flame Graph"
The :ref:`Ray dashboard <observability-getting-started>` lets you profile Ray worker processes by clicking on the "Stack Trace" or "CPU Flame Graph"
actions for active workers, actors, and jobs.

.. image:: /images/profile.png
@@ -119,6 +119,8 @@ not have root permissions, the dashboard will prompt with instructions on how to
Alternatively, you can start Ray with passwordless sudo / root permissions.
.. _dashboard-cprofile:

Profiling Using Python's CProfile
---------------------------------
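
The body of this section is collapsed in this view; as a generic illustration of driving ``cProfile`` from Python (not necessarily the snippet in the source file):

.. code-block:: python

    import cProfile
    import pstats

    def work():
        # Stand-in for the code being profiled.
        return sum(i * i for i in range(1_000_000))

    cProfile.run("work()", "work.prof")
    pstats.Stats("work.prof").sort_stats("cumulative").print_stats(10)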

2 changes: 1 addition & 1 deletion doc/source/ray-observability/user-guides/index.md
Expand Up @@ -6,7 +6,7 @@ These guides help you monitor and debug your Ray applications and clusters.

The guides include:
* {ref}`observability-general-troubleshoot`
* {ref}`observability-user-guides`
* {ref}`observability-debug-apps`
* {ref}`observability-programmatic`
* {ref}`configure-logging`
* {ref}`application-level-metrics`
