Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix connect timeout for non-linux targets, read readiness of socket connection, Read readiness to detect errors. Fixes #127018 #127300

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

biabbas
Copy link

@biabbas biabbas commented Jul 4, 2024

Fixes #127018
Connect_timeout would call poll and check pollfd.revents for POLLHUP error, rather that checking readiness. This behavior was meant for Linux as it returns POLLHUP | POLLOUT | POLLERR in case of errors. But on targets that do not return POLLHUP in pollfd.revents, this would indicate a false success and result in this issue. To resolve this we will check readiness of socket using getsockopt(): and return success from connect_timeout when there are no errors.
Changes were tested on Linux and an rtos.
Screenshot 2024-07-04 105820
Thank you.

@rustbot
Copy link
Collaborator

rustbot commented Jul 4, 2024

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Nilstrieb (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added O-unix Operating system: Unix-like S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jul 4, 2024
@rustbot rustbot assigned jhpratt and unassigned Nilstrieb Jul 4, 2024
@jhpratt
Copy link
Member

jhpratt commented Jul 4, 2024

Not sure why I was explicitly requested as a reviewer. I don't have the necessary context/knowledge in any situation.

Edit: sorry for the spam. GitHub mobile didn't show the reassignments for whatever reason :/

@Nilstrieb
Copy link
Member

I don't know enough about this area either. maybe you do?
r? the8472

@rustbot rustbot assigned the8472 and unassigned Nilstrieb Jul 14, 2024
Comment on lines +216 to 220
// Check if the connnection actually succeeded and return ok only when
// the socket is ready and no errors were found
if let Some(e) = self.take_error()? {
return Err(e);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense to change this for all platforms, since #127272 (precursor) and #127018 seem to indicate this is only on vxworks (or are there others you know of?)

Suggested change
// Check if the connnection actually succeeded and return ok only when
// the socket is ready and no errors were found
if let Some(e) = self.take_error()? {
return Err(e);
}
if cfg!(target_os = "vxworks") {
// [comment about vxworks not using POLLHUP]
if let Some(e) = self.take_error()? {
return Err(e);
}
} else {
/* original impl */
}

Also just for reference, if you put Fixes #127018 in a line of your PR description then it will close the issue when this merges.

Copy link
Author

@biabbas biabbas Jul 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tgross35 I don't think that changes I made, will impact other platforms. For code simplicity, I did not make these changes specific for Vxworks. Not sure which approach is better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also do not understand the details here but to me it seems like the difference between returning Err on all errors vs. only on hangups must be intentional. So I am hesitant to change this without having a better picture of what is going on.

@sfackler it looks like you authored this bit of code in #43062 and updated it in #45269. What is the reasoning to check for POLLOUT/POLLHUP rather than just always returning an error?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@the8472 might have a better idea here too

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote that code 7 years ago - I do not remember the specific context around it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth a shot, thanks for taking a look.

Copy link
Contributor

@tgross35 tgross35 Jul 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few relevant things from https://man7.org/linux/man-pages/man2/poll.2.html:

The field events is an input parameter ...
This field may be specified as zero, in which case the only
events that can be returned in revents are POLLHUP, POLLERR, and
POLLNVAL (see below).

We probably don't need to set POLLOUT anymore as we aren't checking on it since #45269.

       POLLERR
              Error condition (only returned in revents; ignored in
              events).  This bit is also set for a file descriptor
              referring to the write end of a pipe when the read end has
              been closed.

       POLLHUP
              Hang up (only returned in revents; ignored in events).
              Note that when reading from a channel such as a pipe or a
              stream socket, this event merely indicates that the peer
              closed its end of the channel.  Subsequent reads from the
              channel will return 0 (end of file) only after all
              outstanding data in the channel has been consumed.

       POLLNVAL
              Invalid request: fd not open (only returned in revents;
              ignored in events).

From #45265 (comment), it sounds like in reality it sets all three.

The manpage doesn't say anything about setting an error to SO_ERROR. I would assume this is always true, but it does seem safer to have the fallback in place. It also seems a bit weird to me that we are checking only against POLLHUP and not a POLLHUP | POLLNVAL | POLLERR mask, since these are all documented error conditions (as you pointed out below. Possibly relevant only for files, but without checking kernel source seems uncertain).

Perhaps I just don't have all the context here, if you find docs on the poll / SO_ERROR relationship then please do link it.


So: I wouldn't really want to change the existing Linux handling to only check take_error because it means going from documented behavior to empirical behavior, even if has proven correct. In which case I would prefer the cfg suggestion above to the current proposal.

However I think this code could be cleaned up a bit with the following, which should work on your system too:

  1. Change the libc::pollfd struct constructed above to have events as 0 rather than POLLOUT, since we are no longer checking POLLOUT (since Fix TcpStream::connect_timeout on linux #45269)

  2. Because we are no longer have a possible POLLOUT revent, the possible results of libc::poll are:

    • -1: error via errno
    • 0: success
    • 3 (Linux), something else (vxworks): emitted revents, POLLHUP | POLLNVAL | POLLERR
  3. Since we can no longer have any successful revents (previously POLLOUT), the last match arm can be changed to unconditionally emit an error (without checking revents or Some(e) = take_error). So just remove the if from the existing code so it always emits self.take_error()?.unwrap_or_else(|| { ... }).

    That is: that branch of the match will now always try to read the SO_ERROR, which works on both Linux and vxworks. If that fails, meaning revents were set but there is no SO_ERROR for some unexpected reason, it will emit an Uncategorized error (can never happen on vxworks per std::net::TcpStream::connect_timeout on vxworks #127018, but maybe possible on linux?).

  4. If you feel like it, debug_assert! that the revents are something we expect so we learn if this isn't accurate.

If that tests well, then I think it might be a more robust option on all systems. If you make the changes please comment it well, somebody else is going to have to reverse engineer this one day...


Doesn't matter if the above works, but does Vxworks set POLLNVAL or POLLERR rather than a mask including POLLHUP?

@tgross35
Copy link
Contributor

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 19, 2024
@biabbas
Copy link
Author

biabbas commented Jul 22, 2024

The original implementation also calls take_error call incase it detects error in revents. But does not call self.take_error() when POLLHUP is not reflected in revents. Reading readiness of the socket would actually detect more errors. The previous implementation would only detect refused connections(POLLHUP). This seems better for all platforms including Linux.

@rustbot review

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 22, 2024
@biabbas biabbas changed the title Fix connect timeout for non-linux targets, read readiness of socket connection, Read readiness to detect errors. Fix connect timeout for non-linux targets, read readiness of socket connection, Read readiness to detect errors. fixes#127018 Jul 22, 2024
@biabbas biabbas changed the title Fix connect timeout for non-linux targets, read readiness of socket connection, Read readiness to detect errors. fixes#127018 Fix connect timeout for non-linux targets, read readiness of socket connection, Read readiness to detect errors. Fixes #127018 Jul 22, 2024
@biabbas biabbas changed the title Fix connect timeout for non-linux targets, read readiness of socket connection, Read readiness to detect errors. Fixes #127018 Fix connect timeout for non-linux targets, read readiness of socket connection, Read readiness to detect errors. Fixes #127018 Jul 22, 2024
@biabbas
Copy link
Author

biabbas commented Jul 22, 2024

r? @tgross35

@rustbot rustbot assigned tgross35 and unassigned the8472 Jul 22, 2024
@biabbas
Copy link
Author

biabbas commented Jul 22, 2024

r? @tgross35

@rustbot
Copy link
Collaborator

rustbot commented Jul 22, 2024

Error: Parsing assign command in comment failed: ...' ' | error: specify user to assign to at >| ''...

Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #t-infra on Zulip.

@rustbot rustbot assigned joboet and tgross35 and unassigned tgross35 and joboet Jul 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
O-unix Operating system: Unix-like S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

std::net::TcpStream::connect_timeout on vxworks
9 participants