Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Metrics] Add easy-install script for Prometheus #42359

Merged

Conversation

architkulkarni
Copy link
Contributor

@architkulkarni architkulkarni commented Jan 12, 2024

Why are these changes needed?

This PR simplifies some of the manual steps currently required for first-time users setting up Prometheus with Ray.

This PR adds an easy install script that automatically downloads and runs Prometheus with the appropriate configuration to scrape from Ray. It's for first-time users to run a quick demo of metrics and is not intended for production use.

TODO before merge:

  • Figure out the best place to host the install script. Telling the user to copy from site-packages/ray/... is a bit weird. [Solution: Add it to the Ray CLI as ray metrics launch-prometheus]
  • Figure out why prometheus metrics query doesn't work after ray stop. I would expect this to just work. [Solution: It does work, you just have to view "Graph", or view "Table" after choosing a timestamp from when Ray was up.]
  • Testing strategy

Possible followups:

  • Verify checksum
  • Detect if prometheus is already running and gracefully handle it

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Archit Kulkarni <[email protected]>
Signed-off-by: Archit Kulkarni <[email protected]>
Signed-off-by: Archit Kulkarni <[email protected]>
@architkulkarni architkulkarni marked this pull request as ready for review January 12, 2024 01:18
@architkulkarni
Copy link
Contributor Author

@alanwguo This PR is ready for some early feedback. The remaining work is listed in the PR description.

Signed-off-by: Archit Kulkarni <[email protected]>
Signed-off-by: Archit Kulkarni <[email protected]>
Signed-off-by: Archit Kulkarni <[email protected]>
Signed-off-by: Archit Kulkarni <[email protected]>
Signed-off-by: Archit Kulkarni <[email protected]>
Signed-off-by: Archit Kulkarni <[email protected]>
@architkulkarni architkulkarni linked an issue Jan 18, 2024 that may be closed by this pull request
Comment on lines +37 to +56
response = request_method(url, stream=True)
response.raise_for_status()

total_size_in_bytes = int(response.headers.get("content-length", 0))
total_size_in_mb = total_size_in_bytes / (1024 * 1024)

downloaded_size_in_mb = 0
block_size = DOWNLOAD_BLOCK_SIZE

with open(filename, "wb") as file:
for chunk in response.iter_content(chunk_size=block_size):
file.write(chunk)
downloaded_size_in_mb += len(chunk) / (1024 * 1024)
print(
f"Downloaded: {downloaded_size_in_mb:.2f} MB / "
f"{total_size_in_mb:.2f} MB",
end="\r",
)

print("\nDownload completed.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a higher level library for downloading byte data into files?
This feels more manual than necessary.

I would think somethign like library.download_file(url, out_path) would already exist in one of the libraries we already have in our environment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is one used by Ray Data, but it doesn't provide a percent loading indicator, and I'd like to avoid the Ray Data dependency. I think the percent indicator is useful otherwise it looks like the script is hanging (50mb takes seconds or 1-2 minutes depending on the internet speed). I couldn't find a less manual way of getting a basic progress indicator.

config_file = Path(PROMETHEUS_CONFIG_INPUT_PATH)

if not config_file.exists():
raise FileNotFoundError(f"Prometheus config file not found: {config_file}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This configuration only gets created once ray starts. It might make sense to see if we can move the config generation logic to this file instead since this runs before Ray starts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline:

  • Currently in Ray, "creating the configuration" just consists of copying a hardcoded config from the site-packages Ray directory into some location in /tmp/ray. In this PR, we pull the config file from the first location, so we don't actually have to run Ray first.
  • However, if in the future we make a change to Ray that makes updates to the hardcoded config upon ray start before copying it to /tmp/ray, then install_and_start_prometheus.py won't have the updated config. So we should at the very least add code comments in the install script and in the "Ray creating the config" step to make sure future maintainers take this into account. I'll update the PR with these comments.

logging.error("Installation failed.")
sys.exit(1)

# TODO: Add a check to see if Prometheus is already running
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's do this TODO. Maybe just a simple check of the prometheus port?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will file a followup PR for this. I remember Prometheus prints a reasonable error if the port is already in use, so it should be okay.

@architkulkarni architkulkarni added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Jan 23, 2024
Copy link
Contributor

@angelinalg angelinalg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great writing!

@architkulkarni architkulkarni merged commit 44db192 into ray-project:master Jan 25, 2024
9 checks passed
khluu pushed a commit that referenced this pull request Jan 27, 2024
This PR simplifies some of the manual steps currently required for first-time users setting up Prometheus with Ray.

This PR adds an easy install script that automatically downloads and runs Prometheus with the appropriate configuration to scrape from Ray. It's for first-time users to run a quick demo of metrics and is not intended for production use.

---------

Signed-off-by: Archit Kulkarni <[email protected]>
Signed-off-by: khluu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Easy metrics setup: install prometheus
3 participants