Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Observability] Fix --follow lost connection when it is used for > 30 seconds #26080

Merged
merged 2 commits into from
Jun 28, 2022

Conversation

rkooo567
Copy link
Contributor

Why are these changes needed?

This PR fixes the issue where --follow lost connection when it is used for > 30 seconds because the gRPC timeout is configured to be 30 seconds, and we don't reset it when --follow is set.

This fixes the issue by setting timeout=None when keepalive==True

Related issue number

Closes #25721

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

# If we keepalive logs connection, we shouldn't have timeout
# otherwise the stream will be terminated forcefully
# after the deadline is expired.
timeout=options.timeout if not keep_alive else None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think there are multiple timeout: connection time/receive timeout. Does the client allow us to specify connection timeout? For streaming we probably want to handle connection timeout?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I think we could specify (connect timeout, read timeout) as a pair. https://requests.readthedocs.io/en/latest/user/advanced/#timeouts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is using gRPC, not requests. I think connection timeout should be automatically handled by gRPC

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry, I think I intended to ask about adding timeout here:

with requests.get(
f"{api_server_url}/api/v0/logs/{media_type}?"
f"{urllib.parse.urlencode(options_dict)}",
stream=True,
) as r:

Or the stream=True makes it blocking forever?

@rkooo567
Copy link
Contributor Author

Ping for the follow up request or approval!

Copy link
Contributor

@rickyyx rickyyx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# If we keepalive logs connection, we shouldn't have timeout
# otherwise the stream will be terminated forcefully
# after the deadline is expired.
timeout=options.timeout if not keep_alive else None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry, I think I intended to ask about adding timeout here:

with requests.get(
f"{api_server_url}/api/v0/logs/{media_type}?"
f"{urllib.parse.urlencode(options_dict)}",
stream=True,
) as r:

Or the stream=True makes it blocking forever?

@rkooo567 rkooo567 merged commit 2d58bd5 into ray-project:master Jun 28, 2022
stephanie-wang added a commit that referenced this pull request Jun 28, 2022
stephanie-wang added a commit that referenced this pull request Jun 28, 2022
rkooo567 added a commit to rkooo567/ray that referenced this pull request Jun 28, 2022
rkooo567 added a commit that referenced this pull request Jun 28, 2022
…s used for > 30 seconds" #26162 (#26163)

* Revert "Revert "[Observability] Fix --follow lost connection when it is used for > 30 seconds (#26080)" (#26162)"

This reverts commit 3017128.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Core] [State Observability] --follow lost connection when it is used for > 30 seconds.
3 participants