[Doc] [Job] Add notes about where Ray Job entrypoint runs and how to specify it #41319

Merged
merged 6 commits into from
Nov 22, 2023

architkulkarni
Contributor

Why are these changes needed?

There is recurring user confusion about where the job entrypoint script runs and how to make it run on a worker node.

This PR adds the missing information to the doc in relevant places in the tutorials, and includes it in the FAQ.

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Contributor

@GeneDer GeneDer left a comment

:shipit:

@@ -111,12 +111,19 @@ Make sure to specify the path to the working directory in the ``--working-dir``
# Job 'raysubmit_inB2ViQuE29aZRJ5' succeeded
# ------------------------------------------

-This command will run the script on the Ray Cluster and wait until the job has finished. Note that it also streams the stdout of the job back to the client (``hello world`` in this case). Ray will also make the contents of the directory passed as `--working-dir` available to the Ray job by downloading the directory to all nodes in your cluster.
+This command will run the entrypoint script on the Ray Cluster's head node and wait until the job has finished. Note that it also streams the stdout of the job back to the client (``hello world`` in this case). Ray will also make the contents of the directory passed as `--working-dir` available to the Ray job by downloading the directory to all nodes in your cluster.
Member

It not only streams STDOUT but also STDERR. You can do a simple experiment:

# example.py
import sys

# Print a message to stdout
print("This is a message to stdout")

# Print a message to stderr
print("This is an error message to stderr", file=sys.stderr)
[Screenshot: Screen Shot 2023-11-21 at 3 39 49 PM]
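
The difference between the two streams can also be checked locally, without a Ray cluster. The following is only a sketch of that idea, not how Ray Jobs collects logs: it runs the same two print statements in a child interpreter and captures each stream separately. `CHILD` and `run_and_capture` are hypothetical names introduced for this illustration.

```python
import subprocess
import sys
from typing import Tuple

# Child program mirroring example.py above: one line per stream.
CHILD = (
    "import sys\n"
    "print('This is a message to stdout')\n"
    "print('This is an error message to stderr', file=sys.stderr)\n"
)


def run_and_capture(code: str) -> Tuple[str, str]:
    """Run `code` in a fresh interpreter and return (stdout, stderr)."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,  # collect both streams instead of inheriting them
        text=True,            # decode bytes to str
        check=True,           # raise if the child exits with a non-zero code
    )
    return result.stdout, result.stderr


out, err = run_and_capture(CHILD)
print("stdout:", out.strip())
print("stderr:", err.strip())
```

A tool that forwards only `result.stdout` would silently drop the second message, which is why naming both streams in the doc is the less ambiguous wording.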

Contributor Author

Good point. I just want to convey that it would stream whatever is normally output to the terminal when you run a command in your local terminal. Should I say "streams the stdout and stderr" or "streams the output"?

Member

Maybe the former? A lot of commands/tools do not stream stderr by default.

Contributor Author

Updated to "streams the output of the entrypoint script", which should be clear. 61d65ff

Contributor Author

Oh sorry, didn't see your message. Updated to "streams the stdout and stderr" ab7950c


.. note::

    The double dash (`--`) separates the arguments for the entrypoint command (e.g. `python script.py --arg1=val1`) from the arguments to `ray job submit`.

.. note::

    By default the entrypoint script is run on the head node. To override this, specify any of the arguments
Member

"entrypoint script is run on the head node" => Do you mean the driver process would be running on the head node by default?

Contributor Author

Yes, we say "entrypoint script" here to convey that it runs whatever the user specifies as the entrypoint. Typically this is a script that starts a Ray driver process (ray.init()), but it could be any command at all, like echo hello && pip install something. It technically doesn't have to involve Ray.
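
To make the `--` convention from the doc note concrete, here is a minimal sketch of how an argument list can be split into the submitter's own options and an opaque entrypoint command. `split_at_double_dash` is a hypothetical helper written for this illustration; it is not taken from Ray's CLI code, which may parse arguments differently.

```python
import shlex
from typing import List, Tuple


def split_at_double_dash(argv: List[str]) -> Tuple[List[str], str]:
    """Hypothetical helper: split argv at the first `--`.

    Returns the options that precede `--` and the entrypoint command
    that follows it, joined back into a single shell-style string.
    """
    if "--" not in argv:
        raise ValueError("expected `--` before the entrypoint command")
    idx = argv.index("--")
    return argv[:idx], shlex.join(argv[idx + 1:])


options, entrypoint = split_at_double_dash(
    ["--working-dir", ".", "--", "python", "script.py", "--arg1=val1"]
)
print(options)     # the submitter's own options
print(entrypoint)  # the command that is run as the entrypoint
```

Everything after `--` is treated as one command, so `--arg1=val1` reaches `script.py` rather than being interpreted as an option of the submit command.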

Contributor Author

Short answer: yes, the driver runs on the head node by default.
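
As a rough illustration of overriding that default, the sketch below assembles a `ray job submit` argument list with and without a resource request for the entrypoint. `build_submit_command` is a hypothetical helper. The `--entrypoint-num-cpus` flag is an assumption based on the Ray Jobs CLI as I understand it (the quoted doc line is cut off before it lists the arguments), so check `ray job submit --help` for your Ray version before relying on it.

```python
from typing import List, Optional


def build_submit_command(
    entrypoint: List[str],
    working_dir: Optional[str] = None,
    entrypoint_num_cpus: Optional[float] = None,
) -> List[str]:
    """Hypothetical helper: assemble a `ray job submit` argument list."""
    cmd = ["ray", "job", "submit"]
    if working_dir is not None:
        cmd += ["--working-dir", working_dir]
    if entrypoint_num_cpus is not None:
        # Assumed flag name: requests CPUs for the entrypoint command itself,
        # so it is scheduled on a node with those resources available.
        cmd += ["--entrypoint-num-cpus", str(entrypoint_num_cpus)]
    # `--` separates the submit options from the entrypoint command.
    return cmd + ["--"] + entrypoint


default_cmd = build_submit_command(["python", "script.py"], working_dir=".")
reserved_cmd = build_submit_command(
    ["python", "script.py"], working_dir=".", entrypoint_num_cpus=1
)
print(" ".join(default_cmd))
print(" ".join(reserved_cmd))
```

The node that satisfies the request is not necessarily a worker node; if the head node advertises the requested resources, the entrypoint may still be placed there.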

Member

Thanks!

architkulkarni and others added 5 commits November 21, 2023 15:58
…art.rst

Co-authored-by: Kai-Hsun Chen <[email protected]>
Signed-off-by: Archit Kulkarni <[email protected]>
Signed-off-by: Archit Kulkarni <[email protected]>
Signed-off-by: Archit Kulkarni <[email protected]>
@architkulkarni architkulkarni merged commit 80a1770 into ray-project:master Nov 22, 2023
9 of 10 checks passed
architkulkarni added a commit that referenced this pull request Nov 23, 2023
Quick follow to #41319

---------

Signed-off-by: angelinalg <[email protected]>
Co-authored-by: Archit Kulkarni <[email protected]>
ujjawal-khare pushed a commit to ujjawal-khare-27/ray that referenced this pull request Nov 29, 2023
…specify it (ray-project#41319)
