diff --git a/doc/source/cluster/faq.rst b/doc/source/cluster/faq.rst
index a44cc38177f17..f476995292ba2 100644
--- a/doc/source/cluster/faq.rst
+++ b/doc/source/cluster/faq.rst
@@ -91,4 +91,15 @@ reported:
 starting ray to verify that the allocations are as expected.
 
 For more detailed information see :ref:`ray-slurm-deploy`.
-.. _`known OpenBLAS limitation`: https://github.com/xianyi/OpenBLAS/wiki/faq#how-can-i-use-openblas-in-multi-threaded-applications
+.. _`known OpenBLAS limitation`: https://github.com/xianyi/OpenBLAS/wiki/faq#how-can-i-use-openblas-in-multi-threaded-applications
+
+Where does my Ray Job entrypoint script run? On the head node or worker nodes?
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+By default, jobs submitted using the :ref:`Ray Job API ` run
+their `entrypoint` script on the head node. To change this, specify any of
+the options `--entrypoint-num-cpus`, `--entrypoint-num-gpus`,
+`--entrypoint-resources`, or `--entrypoint-memory` to `ray job submit`, or the
+corresponding arguments if using the Python SDK. If any of these options are
+specified, the job entrypoint is scheduled on a node that has the requested
+resources available.
\ No newline at end of file
diff --git a/doc/source/cluster/running-applications/job-submission/quickstart.rst b/doc/source/cluster/running-applications/job-submission/quickstart.rst
index 53001cbc08838..78ee0bca61b7d 100644
--- a/doc/source/cluster/running-applications/job-submission/quickstart.rst
+++ b/doc/source/cluster/running-applications/job-submission/quickstart.rst
@@ -111,12 +111,28 @@ Make sure to specify the path to the working directory in the ``--working-dir``
     # Job 'raysubmit_inB2ViQuE29aZRJ5' succeeded
     # ------------------------------------------
 
-This command will run the script on the Ray Cluster and wait until the job has finished. Note that it also streams the stdout of the job back to the client (``hello world`` in this case). Ray will also make the contents of the directory passed as `--working-dir` available to the Ray job by downloading the directory to all nodes in your cluster.
+This command runs the entrypoint script on the Ray Cluster's head node and waits until the job has finished. Note that it also streams the `stdout` and `stderr` of the entrypoint script back to the client (``hello world`` in this case). Ray also makes the contents of the directory passed as `--working-dir` available to the Ray job by downloading the directory to all nodes in your cluster.
 
 .. note::
 
   The double dash (`--`) separates the arguments for the entrypoint command (e.g. `python script.py --arg1=val1`) from the arguments to `ray job submit`.
 
+.. note::
+
+  By default, the entrypoint script runs on the head node. To override this, specify any of the arguments
+  `--entrypoint-num-cpus`, `--entrypoint-num-gpus`, `--entrypoint-resources`, or
+  `--entrypoint-memory` to the `ray job submit` command.
+  See :ref:`Specifying CPU and GPU resources <ray-job-cpu-gpu-resources>` for more details.
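+
+  For example, the following command (for illustration; the working directory path and the GPU count
+  are placeholders) reserves 1 GPU for the entrypoint script, so Ray schedules it on a node with at
+  least 1 GPU available instead of on the head node:
+
+  .. code-block:: bash
+
+    # Reserve 1 GPU for the entrypoint script itself (placeholder working directory and GPU count).
+    $ ray job submit --working-dir your_working_directory --entrypoint-num-gpus 1 -- python script.py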
+
 Interacting with Long-running Jobs
 ----------------------------------
diff --git a/doc/source/cluster/running-applications/job-submission/sdk.rst b/doc/source/cluster/running-applications/job-submission/sdk.rst
index 2517fafa56baa..b3d71af46c830 100644
--- a/doc/source/cluster/running-applications/job-submission/sdk.rst
+++ b/doc/source/cluster/running-applications/job-submission/sdk.rst
@@ -183,15 +183,19 @@ Using the Python SDK, the syntax looks something like this:
 
 For full details, see the :ref:`API Reference `.
 
+.. _ray-job-cpu-gpu-resources:
+
 Specifying CPU and GPU resources
 --------------------------------
 
-We recommend doing heavy computation within Ray tasks, actors, or Ray libraries, not directly in the top level of your entrypoint script.
+By default, the job entrypoint script runs on the head node. We recommend doing heavy computation within Ray tasks, actors, or Ray libraries, not directly in the top level of your entrypoint script; this requires no extra configuration.
 
 However, if you need to do computation directly in the entrypoint script and would like to reserve CPU and GPU resources for the entrypoint script, you may specify the ``entrypoint_num_cpus``, ``entrypoint_num_gpus``, ``entrypoint_memory`` and ``entrypoint_resources`` arguments to ``submit_job``. These arguments function identically to the ``num_cpus``, ``num_gpus``, ``resources``, and ``_memory`` arguments to ``@ray.remote()`` decorator for tasks and actors as described in :ref:`resource-requirements`.
 
+If any of these arguments are specified, the entrypoint script is scheduled on a node with at least the requested resources instead of on the head node. For example, the following code schedules the entrypoint script on a node with at least 1 GPU:
+
 .. code-block:: python
 
     job_id = client.submit_job(