-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Metrics] Add easy-install script for Prometheus #42359
[Metrics] Add easy-install script for Prometheus #42359
Conversation
Signed-off-by: Archit Kulkarni <[email protected]>
Signed-off-by: Archit Kulkarni <[email protected]>
Signed-off-by: Archit Kulkarni <[email protected]>
Signed-off-by: Archit Kulkarni <[email protected]>
@alanwguo This PR is ready for some early feedback. The remaining work is listed in the PR description. |
Signed-off-by: Archit Kulkarni <[email protected]>
Signed-off-by: Archit Kulkarni <[email protected]>
Signed-off-by: Archit Kulkarni <[email protected]>
Signed-off-by: Archit Kulkarni <[email protected]>
Signed-off-by: Archit Kulkarni <[email protected]>
Signed-off-by: Archit Kulkarni <[email protected]>
Signed-off-by: Archit Kulkarni <[email protected]>
Signed-off-by: Archit Kulkarni <[email protected]>
Signed-off-by: Archit Kulkarni <[email protected]>
…lkarni/ray into easy-install-prometheus Signed-off-by: Archit Kulkarni <[email protected]>
response = request_method(url, stream=True) | ||
response.raise_for_status() | ||
|
||
total_size_in_bytes = int(response.headers.get("content-length", 0)) | ||
total_size_in_mb = total_size_in_bytes / (1024 * 1024) | ||
|
||
downloaded_size_in_mb = 0 | ||
block_size = DOWNLOAD_BLOCK_SIZE | ||
|
||
with open(filename, "wb") as file: | ||
for chunk in response.iter_content(chunk_size=block_size): | ||
file.write(chunk) | ||
downloaded_size_in_mb += len(chunk) / (1024 * 1024) | ||
print( | ||
f"Downloaded: {downloaded_size_in_mb:.2f} MB / " | ||
f"{total_size_in_mb:.2f} MB", | ||
end="\r", | ||
) | ||
|
||
print("\nDownload completed.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a higher level library for downloading byte data into files?
This feels more manual than necessary.
I would think somethign like library.download_file(url, out_path)
would already exist in one of the libraries we already have in our environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is one used by Ray Data, but it doesn't provide a percent loading indicator, and I'd like to avoid the Ray Data dependency. I think the percent indicator is useful otherwise it looks like the script is hanging (50mb takes seconds or 1-2 minutes depending on the internet speed). I couldn't find a less manual way of getting a basic progress indicator.
config_file = Path(PROMETHEUS_CONFIG_INPUT_PATH) | ||
|
||
if not config_file.exists(): | ||
raise FileNotFoundError(f"Prometheus config file not found: {config_file}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This configuration only gets created once ray starts. It might make sense to see if we can move the config generation logic to this file instead since this runs before Ray starts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline:
- Currently in Ray, "creating the configuration" just consists of copying a hardcoded config from the site-packages Ray directory into some location in
/tmp/ray
. In this PR, we pull the config file from the first location, so we don't actually have to run Ray first. - However, if in the future we make a change to Ray that makes updates to the hardcoded config upon
ray start
before copying it to/tmp/ray
, theninstall_and_start_prometheus.py
won't have the updated config. So we should at the very least add code comments in the install script and in the "Ray creating the config" step to make sure future maintainers take this into account. I'll update the PR with these comments.
logging.error("Installation failed.") | ||
sys.exit(1) | ||
|
||
# TODO: Add a check to see if Prometheus is already running |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's do this TODO. Maybe just a simple check of the prometheus port?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will file a followup PR for this. I remember Prometheus prints a reasonable error if the port is already in use, so it should be okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great writing!
Signed-off-by: Archit Kulkarni <[email protected]>
Signed-off-by: Archit Kulkarni <[email protected]>
…lkarni/ray into easy-install-prometheus Signed-off-by: Archit Kulkarni <[email protected]>
This PR simplifies some of the manual steps currently required for first-time users setting up Prometheus with Ray. This PR adds an easy install script that automatically downloads and runs Prometheus with the appropriate configuration to scrape from Ray. It's for first-time users to run a quick demo of metrics and is not intended for production use. --------- Signed-off-by: Archit Kulkarni <[email protected]> Signed-off-by: khluu <[email protected]>
Why are these changes needed?
This PR simplifies some of the manual steps currently required for first-time users setting up Prometheus with Ray.
This PR adds an easy install script that automatically downloads and runs Prometheus with the appropriate configuration to scrape from Ray. It's for first-time users to run a quick demo of metrics and is not intended for production use.
TODO before merge:
site-packages/ray/...
is a bit weird. [Solution: Add it to the Ray CLI asray metrics launch-prometheus
]ray stop
. I would expect this to just work. [Solution: It does work, you just have to view "Graph", or view "Table" after choosing a timestamp from when Ray was up.]Possible followups:
Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.