make Path MTU Discovery resilient to random packet loss #4545

marten-seemann · 2024-06-02T05:36:00Z

We now tolerate the random (i.e. not MTU-related) loss of up to probe packets. This should make DPLPMTUD resilient to packet loss. This is relevant since MTU discovery is started at the beginning of the connection, where packet loss is likely once the congestion slow start period reaches the bottleneck bandwidth.

See code for a detailed description of the algorithm used.

The algorithm contains a tunable parameter describing how many random packet losses it tolerates. For the special case of 0, the new algorithm is equivalent to the old algorithm.

For a search of randomly chosen MTUs between 1200 and 1500 bytes (no random packet loss), this is how this parameter influences the progression of the algorithm:

tolerance to N packet losses	probe packets sent	difference from real MTU
0	4	8.9
1	6	5.27
2	8.5	1.87
3	11.2	0.85

In this PR, we choose the N=2 variant.

It would be desirable to decrease the number of probe packets sent, and the average difference from the real MTU would justify doing that. However, I don't see an obvious way to do so without increasing the worst case behavior (the worst case difference from the real MTU).

codecov · 2024-06-02T05:41:07Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.19%. Comparing base (375fc59) to head (b593c5e).

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #4545      +/-   ##
==========================================
+ Coverage   85.15%   85.19%   +0.04%     
==========================================
  Files         154      154              
  Lines       14800    14830      +30     
==========================================
+ Hits        12602    12634      +32     
+ Misses       1690     1689       -1     
+ Partials      508      507       -1

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

bt90 · 2024-06-02T14:23:08Z

mtu_discoverer.go

+//
+// We can't conclude that the path doesn't support this packet size, since the loss of the probe
+// packet could have been unrelated to the packet size. A larger probe packet will be sent later on.
+// After a loss, the next probe packet has size (min+lost[0])/2.


Recommendation from the RFC:

Implementations could optimize the search procedure by selecting step sizes from a table of common PMTU sizes.

Maybe we could select the nearest common PMTU value for the result of this calculation?

I'm not sure if there's much to be gained from this. We definitely don't want to jump to a higher value than we have confirmed, just because it's a common value. This might blackhole the entire connection.

And if I understand https://www.hb.fh-muenster.de/opus4/frontdoor/deliver/index/docId/14965/file/dplpmtudQuicPaper.pdf correctly, there's not too much benefit, compared to a binary search.

Also keep in mind that as more and more traffic is proxied (via MASQUE and friends), the value of "common values" will decrease further.

And if I understand https://www.hb.fh-muenster.de/opus4/frontdoor/deliver/index/docId/14965/file/dplpmtudQuicPaper.pdf correctly, there's not too much benefit, compared to a binary search.

True. But they also mention limiting the precision to a multiple of four bytes:

Thus, considering only PMTU candidates that
are multiples of four is a reduction without losing precision in most
cases.

That's a good idea. Very similar, we could skip steps that would lead to increases smaller than 4 bytes.

I'll go ahead and merge this PR. We can do this as a follow up.

bt90 · 2024-06-03T10:13:18Z

mtu_discoverer.go

+// |------------------------------------------------------------------------------|
+// min max
+//
+// The first MTU probe packet will have size (min+max)/2.


This follows the Binary Candidate Sequence as outlined in the QUIC Path MTU paper. While it performs good, the authors conclude that the OptBinary variant is preferable:

We prefer the optimistic variant of Binary, because it selects
the upper bound of PMTU candidates first and, therewith, has the
potential to immediately either find the best PMTU estimation or
trigger a PTB message. The results of the simulations show that
both algorithms perform well even with external events like data
transmission or packet loss.

bt90 · 2024-06-03T11:15:00Z

I assume that caching PMTU estimates is not currently feasible as long as we leave endpoint IP resolution to the go stdlib as outlined in #3772 (comment) ?

bt90 · 2024-06-03T11:25:47Z

We could also try to derive the upper limit of the MTU by examining the network interfaces. The highest MTU of a non-loopback interface that is considered up would be the upper bound. If that fails, we can always fall back on the fixed upper bound of, say, 1500.

e.g: the server has a single NIC with a MTU of 1400

marten-seemann · 2024-06-04T09:09:59Z

We could also try to derive the upper limit of the MTU by examining the network interfaces. The highest MTU of a non-loopback interface that is considered up would be the upper bound. If that fails, we can always fall back on the fixed upper bound of, say, 1500.

e.g: the server has a single NIC with a MTU of 1400

We currently don't support any packet sizes above 1452 bytes anyway. This mostly due to DoS protection of the STREAM frame sorter, but we'll need to find a better solution for that once we implement GRO support (#3838).

bt90 · 2024-06-04T13:35:01Z

We currently don't support any packet sizes above 1452 bytes anyway.

The intention is rather to avoid starting the discovery with a max that's too high because it won't even pass the NIC of the server.

bt90 · 2024-06-12T09:10:02Z

This mostly due to DoS protection of the STREAM frame sorter,

Is this related to #3471 ?

marten-seemann force-pushed the loss-resistant-mtu-discovery branch from dd12c30 to f6496a0 Compare June 2, 2024 06:20

make Path MTU Discovery resilient to random packet loss

b593c5e

marten-seemann force-pushed the loss-resistant-mtu-discovery branch from f6496a0 to b593c5e Compare June 2, 2024 06:34

marten-seemann linked an issue Jun 2, 2024 that may be closed by this pull request

Path MTU Discovery is highly affected by packet loss #4520

Closed

marten-seemann mentioned this pull request Jun 2, 2024

Path MTU Discovery is highly affected by packet loss #4520

Closed

bt90 reviewed Jun 2, 2024

View reviewed changes

bt90 reviewed Jun 3, 2024

View reviewed changes

marten-seemann merged commit 0db3544 into master Jun 4, 2024
34 checks passed

marten-seemann deleted the loss-resistant-mtu-discovery branch June 4, 2024 09:16

marten-seemann mentioned this pull request Jun 4, 2024

introduce a minimum step size of Path MTU Discovery #4552

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

make Path MTU Discovery resilient to random packet loss #4545

make Path MTU Discovery resilient to random packet loss #4545

marten-seemann commented Jun 2, 2024 •

edited

codecov bot commented Jun 2, 2024 •

edited

bt90 Jun 2, 2024

marten-seemann Jun 3, 2024

bt90 Jun 3, 2024

marten-seemann Jun 4, 2024

marten-seemann Jun 4, 2024

bt90 Jun 3, 2024

bt90 commented Jun 3, 2024

bt90 commented Jun 3, 2024 •

edited

marten-seemann commented Jun 4, 2024

bt90 commented Jun 4, 2024

bt90 commented Jun 12, 2024

make Path MTU Discovery resilient to random packet loss #4545

make Path MTU Discovery resilient to random packet loss #4545

Conversation

marten-seemann commented Jun 2, 2024 • edited

codecov bot commented Jun 2, 2024 • edited

Codecov Report

bt90 Jun 2, 2024

Choose a reason for hiding this comment

marten-seemann Jun 3, 2024

Choose a reason for hiding this comment

bt90 Jun 3, 2024

Choose a reason for hiding this comment

marten-seemann Jun 4, 2024

Choose a reason for hiding this comment

marten-seemann Jun 4, 2024

Choose a reason for hiding this comment

bt90 Jun 3, 2024

Choose a reason for hiding this comment

bt90 commented Jun 3, 2024

bt90 commented Jun 3, 2024 • edited

marten-seemann commented Jun 4, 2024

bt90 commented Jun 4, 2024

bt90 commented Jun 12, 2024

marten-seemann commented Jun 2, 2024 •

edited

codecov bot commented Jun 2, 2024 •

edited

bt90 commented Jun 3, 2024 •

edited