Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better handle ECONNRESET on connected datagram sockets. #2979

Merged
merged 2 commits into from
Nov 20, 2024

Conversation

Lukasa
Copy link
Contributor

@Lukasa Lukasa commented Nov 19, 2024

Motivation:

When a connected datagram socket sends a datagram to a port or host that is not listening, an ICMP Destination Unreachable message may be returned. That message triggers an ECONNRESET to be produced at the socket layer.

On Darwin we handled this well, but on Linux it turned out that this would push us into connection teardown and, eventually, into a crash. Not so good.

To better handle this, we need to distinguish EPOLLERR from EPOLLHUP on datagram sockets. In these cases, we should check whether the socket error was fatal and, if it was not, we should continue our execution having fired the error down the pipeline.

Modifications:

Modify the selector code to distinguish reset and error. Add support for our channels to handle errors.
Have most channels handle errors as resets.
Override the logic for datagram channels to duplicate the logic in readable.
Add a unit test.

Result:

Better datagrams for all.

@Lukasa Lukasa added the semver/patch No public API change. label Nov 19, 2024
Motivation:

When a connected datagram socket sends a datagram to a port or
host that is not listening, an ICMP Destination Unreachable
message may be returned. That message triggers an ECONNRESET
to be produced at the socket layer.

On Darwin we handled this well, but on Linux it turned out that
this would push us into connection teardown and, eventually,
into a crash. Not so good.

To better handle this, we need to distinguish EPOLLERR from
EPOLLHUP on datagram sockets. In these cases, we should check
whether the socket error was fatal and, if it was not, we
should continue our execution having fired the error down the
pipeline.

Modifications:

Modify the selector code to distinguish reset and error.
Add support for our channels to handle errors.
Have most channels handle errors as resets.
Override the logic for datagram channels to duplicate the
logic in readable.
Add a unit test.

Result:

Better datagrams for all.
Comment on lines +1762 to +1767
// Write again.
XCTAssertNoThrow(try self.firstChannel.writeAndFlush(writeData).wait())

// This should trigger an error.
let errors = try self.firstChannel.waitForErrors(count: 1)
XCTAssertEqual((errors[0] as? IOError)?.errnoCode, ECONNREFUSED)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really interesting. So the write succeeds since we send out the data on the socket however when the subsequently get the ECONNREFUSED since there was no reachable destination for the packet.

@Lukasa Lukasa merged commit 2a3a333 into apple:main Nov 20, 2024
41 of 42 checks passed
@Lukasa Lukasa deleted the cb-datagrams-are-crimes branch November 20, 2024 10:28
Lukasa added a commit to Lukasa/swift-nio that referenced this pull request Nov 22, 2024
Motivation:

When a connected datagram socket sends a datagram to a port or host that
is not listening, an ICMP Destination Unreachable message may be
returned. That message triggers an ECONNRESET to be produced at the
socket layer.

On Darwin we handled this well, but on Linux it turned out that this
would push us into connection teardown and, eventually, into a crash.
Not so good.

To better handle this, we need to distinguish EPOLLERR from EPOLLHUP on
datagram sockets. In these cases, we should check whether the socket
error was fatal and, if it was not, we should continue our execution
having fired the error down the pipeline.

Modifications:

Modify the selector code to distinguish reset and error. Add support for
our channels to handle errors.
Have most channels handle errors as resets.
Override the logic for datagram channels to duplicate the logic in
readable.
Add a unit test.

Result:

Better datagrams for all.

(cherry picked from commit 2a3a333)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
semver/patch No public API change.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants