Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore responses with unexpected IDs #1155

Merged
merged 6 commits into from
Oct 18, 2020
Merged

Ignore responses with unexpected IDs #1155

merged 6 commits into from
Oct 18, 2020

Conversation

AGWA
Copy link
Contributor

@AGWA AGWA commented Sep 2, 2020

This fixes the following problem:

At time 0, we send a query with ID X from port P.

At time T, we time out the query due to lack of response, and then send a different query with ID Y. By coincidence, the new query is sent from the same port number P (since port numbers are only 16 bits, this can happen with non-negligible probability when making queries at a high rate).

At time T+epsilon, we receive a response to the original query. Since the ID in this response is X, not Y, exchange returns ErrId, preventing the second query from succeeding.

Since UDP does not have connections, there's no guarantee that a response that we receive is actually associated with the query that we sent - the ID is needed to make the association. Therefore, it's more appropriate to ignore replies with unexpected IDs that to consider them an error. Accordingly, this pull request changes exchange to ignore responses with mismatched IDs. exchange continues waiting for a response with the expected ID and returns it when it arrives.

@codecov-commenter
Copy link

codecov-commenter commented Sep 2, 2020

Codecov Report

Merging #1155 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1155   +/-   ##
=======================================
  Coverage   55.89%   55.89%           
=======================================
  Files          41       41           
  Lines       10141    10142    +1     
=======================================
+ Hits         5668     5669    +1     
  Misses       3468     3468           
  Partials     1005     1005           
Impacted Files Coverage Δ
client.go 70.20% <100.00%> (+0.15%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update efdec21...ddbee29. Read the comment docs.

@miekg
Copy link
Owner

miekg commented Sep 3, 2020 via email

client_test.go Show resolved Hide resolved
@codecov-io
Copy link

codecov-io commented Oct 11, 2020

Codecov Report

Merging #1155 into master will decrease coverage by 0.76%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1155      +/-   ##
==========================================
- Coverage   55.89%   55.12%   -0.77%     
==========================================
  Files          41       42       +1     
  Lines       10141     9581     -560     
==========================================
- Hits         5668     5282     -386     
+ Misses       3468     3274     -194     
- Partials     1005     1025      +20     
Impacted Files Coverage Δ
client.go 68.92% <100.00%> (-1.13%) ⬇️
update.go 58.97% <0.00%> (-12.96%) ⬇️
duplicate.go 28.57% <0.00%> (-8.93%) ⬇️
msg_helpers.go 59.29% <0.00%> (-8.92%) ⬇️
serve_mux.go 70.27% <0.00%> (-5.29%) ⬇️
listen_go111.go 73.33% <0.00%> (-4.45%) ⬇️
zmsg.go 36.11% <0.00%> (-3.93%) ⬇️
acceptfunc.go 42.85% <0.00%> (-3.81%) ⬇️
xfr.go 29.45% <0.00%> (-3.17%) ⬇️
msg.go 73.80% <0.00%> (-2.73%) ⬇️
... and 33 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update efdec21...aad691a. Read the comment docs.

client_test.go Outdated Show resolved Hide resolved
client.go Outdated Show resolved Hide resolved
@AGWA
Copy link
Contributor Author

AGWA commented Oct 13, 2020

I've made the requested changes. Please take another look.

Copy link
Collaborator

@tmthrgd tmthrgd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty much perfect to me and definitely the right approach IMO. LGTM.

@tmthrgd
Copy link
Collaborator

tmthrgd commented Oct 14, 2020

errors.As breaks the really ancient Travis CI build. I'll open a PR to bump the tested version and then this can be merged after that.

@tmthrgd tmthrgd linked an issue Oct 14, 2020 that may be closed by this pull request
@miekg
Copy link
Owner

miekg commented Oct 14, 2020

this needs a rebase to pick up the travis changes

Copy link
Collaborator

@tmthrgd tmthrgd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebase + fix minor nit please.

client_test.go Outdated Show resolved Hide resolved
This fixes the following problem:

At time 0, we send a query with ID X from port P.

At time T, we time out the query due to lack of response, and then send
a different query with ID Y.  By coincidence, the new query is sent from
the same port number P (since port numbers are only 16 bits, this can happen
with non-negligible probability when making queries at a high rate).

At time T+epsilon, we receive a response to the original query.
Since the ID in this response is X, not Y, we would previously return
ErrId, preventing the second query from succeeding.

With this commit, we simply ignore the response with the mismatched ID
and return once we receive the response with the correct ID.
The new test sends two replies: the first one has a bad ID, which should
be ignored, and the second one has the correct ID.
@AGWA
Copy link
Contributor Author

AGWA commented Oct 18, 2020

Should be good now. Please take another look.

Copy link
Collaborator

@tmthrgd tmthrgd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@miekg miekg merged commit a433fbe into miekg:master Oct 18, 2020
aanm pushed a commit to cilium/dns that referenced this pull request Jul 29, 2022
* Ignore replies with unexpected IDs

This fixes the following problem:

At time 0, we send a query with ID X from port P.

At time T, we time out the query due to lack of response, and then send
a different query with ID Y.  By coincidence, the new query is sent from
the same port number P (since port numbers are only 16 bits, this can happen
with non-negligible probability when making queries at a high rate).

At time T+epsilon, we receive a response to the original query.
Since the ID in this response is X, not Y, we would previously return
ErrId, preventing the second query from succeeding.

With this commit, we simply ignore the response with the mismatched ID
and return once we receive the response with the correct ID.

* Update test for bad ID

The new test sends two replies: the first one has a bad ID, which should
be ignored, and the second one has the correct ID.

* Add test to ensure query times out when server returns bad ID

* Avoid use of error string matching in test case

* Check for mismatched query IDs when using TCP

* Reduce timeout in TestClientSyncBadID
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Spurious "id mismatch" errors when making high volume of requests
5 participants