Add TLS close_notify handler to http server #109899
Conversation
Pinging @elastic/es-distributed (Team:Distributed)
```java
var cert = getDataPath("/org/elasticsearch/xpack/security/transport/ssl/certs/simple/testnode.crt");
var pk = getDataPath("/org/elasticsearch/xpack/security/transport/ssl/certs/simple/testnode.pem");
```
Aha, nice find! Although it seems hardly ideal to be using a cert which was exposed in 2019...
I tried to use `ssc = new SelfSignedCertificate();` from netty, but it relies on bouncy-castle, which we don't have in our dependencies. Should I update the deps or keep the static cert?
I think we have good reasons not to rely on bouncy castle in prod code but it seems like it should be ok for tests IMO. If it's easy to add then please do so and we can raise it for discussion as the PR nears completion.
Made netty's self-signed cert work with a few security suppressions and a bouncy-castle test dependency:
```java
@ESTestCase.WithoutSecurityManager
@SuppressForbidden(reason = "requires java.io.File for netty self-signed certificate")
```
```java
import javax.net.ssl.SSLException;

public class SecurityNetty4HttpServerTransportCloseNotifyTests extends AbstractHttpServerTransportTestCase {
```
Tests make sense, but I think we'll also need some integ tests showing how this interacts with the rest of the system, particularly the task cancellation mechanism (which it doesn't today, but it'll need to).
I'm still thinking about whether we should cancel or not. If we serve all requests while the channel is writable, then we don't need additional handling for cancellation. If the channel is not writable, the client has decided not to wait, and it will go through the exception path, which exists today.
Although it is certainly possible to keep serving responses as you propose, it's also legitimate to cancel tasks and send the resulting exception responses, or indeed to just tear down the connection without sending any of the remaining responses, so this is more of a design question than a technical absolute.
I couldn't find any clients which might send a close_notify while awaiting responses - AFAICT this only happens on the abort path or when dropping an idle connection from a pool.
You are right; looking further into the HTTP and TLS RFCs, closing the connection on close_notify makes sense. Serving all responses and then closing the connection should be handled with a `Connection: close` header in HTTP/1. In TLS prior to v1.3, close_notify should discard all pending writes and close the connection. TLS v1.3 tries to address the truncation attack by letting the application layer figure out how to close the connection; we probably want to handle logout requests (oidc, saml) fully, without truncation. I think in the scope of this PR closing the connection is a good step forward.
I don't think truncation attacks are a possibility here: we always send either a `Content-Length` header or else a properly-terminated chunked-encoded message, so clients can tell if a response is incomplete at the HTTP layer. But this isn't the right place to discuss this more deeply - if you think there's a security issue in this area then please mail [email protected] with details.
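As a hedged illustration of the point above (plain JDK, not Elasticsearch code; class and method names here are made up), an HTTP/1.1 client can detect a truncated response either by comparing the received body length against `Content-Length`, or by checking for the zero-length terminator chunk of a chunked-encoded message:

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only: why truncation is detectable at the HTTP layer.
public class TruncationCheck {

    // Fixed-length framing: the body is complete iff exactly Content-Length bytes arrived.
    static boolean fixedLengthComplete(long contentLength, byte[] body) {
        return body.length == contentLength;
    }

    // Chunked framing: the message is complete iff it ends with the
    // zero-length terminator chunk ("0\r\n\r\n").
    static boolean chunkedComplete(byte[] rawBody) {
        String s = new String(rawBody, StandardCharsets.ISO_8859_1);
        return s.endsWith("0\r\n\r\n");
    }

    public static void main(String[] args) {
        System.out.println(fixedLengthComplete(5, "hello".getBytes(StandardCharsets.ISO_8859_1))); // true
        System.out.println(fixedLengthComplete(5, "hel".getBytes(StandardCharsets.ISO_8859_1)));   // false
        System.out.println(chunkedComplete("3\r\nfoo\r\n0\r\n\r\n".getBytes(StandardCharsets.ISO_8859_1))); // true
        System.out.println(chunkedComplete("3\r\nfoo\r\n".getBytes(StandardCharsets.ISO_8859_1)));          // false
    }
}
```

Either way, a connection torn down mid-response leaves the client with an identifiably incomplete message rather than a silently shortened one.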
Since there is no cancellation now, I think task cancellation should be hooked up by default with this implementation. Since you close the channel, that will propagate through the listener framework and cancel tasks (I think): see `Netty4HttpChannel#addCloseListener`.
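A plain-JDK sketch of that idea (this is not the actual `Netty4HttpChannel` API; everything here is illustrative): a close listener registered on a "channel" close future cancels the in-flight task when the future completes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: a close listener propagating to task cancellation.
public class CloseListenerSketch {

    public static boolean demo() throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch interrupted = new CountDownLatch(1);
        Future<?> task = executor.submit(() -> {
            started.countDown();
            try {
                Thread.sleep(60_000); // stand-in for a long-running request handler
            } catch (InterruptedException e) {
                interrupted.countDown(); // the task observes its cancellation
            }
        });

        CompletableFuture<Void> closeFuture = new CompletableFuture<>();
        // stand-in for a close listener: channel close cancels the task
        closeFuture.whenComplete((v, e) -> task.cancel(true));

        started.await();
        closeFuture.complete(null); // simulate the channel closing
        boolean sawCancel = interrupted.await(10, TimeUnit.SECONDS);
        executor.shutdownNow();
        return sawCancel;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // true: closing the "channel" cancelled the task
    }
}
```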
Nice! Is it worth testing cancellation in an integ test? If yes, what REST operation would be a good example?
> Nice! Is it worth testing cancellation in an integ test?

I will take another look at this PR later today. But IMO, if "HTTP close triggers task cancellation" is already tested, then this PR only needs to validate that an SSL client's close_notify leads to channel close. However, I don't know if that test exists. My initial perusal of the code suggests we do not have this test at the HTTP level (which we should). I would say this PR should implement it if it is missing.
I would look in `RestActionCancellationIT` and other classes in that qa package (`SearchRestCancellationIT`, `ClusterHealthRestCancellationIT`, etc). It looks like these are testing task cancellation through the http layer. I would do a bit of investigation and look at how the "cancellation" is implemented. I think it might be through closing the http connection, but that appears to be a bit deep in the apache code and we should confirm it. If those tests are sufficient then I think we are covered at the moment.
I added an action cancellation test. `RestActionCancellationIT` and `SearchRestCancellationIT` call `request.abort()`, which indeed closes the connection. With the help of the neat `CancellableActionTestPlugin` I found it's pretty easy to add an integ test here for action cancellation - `testSendCloseNotifyCancelAction`.
...ransport-netty4/src/main/java/org/elasticsearch/http/netty4/Netty4HttpPipeliningHandler.java
```java
if (evt instanceof SslCloseCompletionEvent closeEvent) {
    if (closeEvent.isSuccess() && ctx.channel().isActive()) {
        logger.trace("received TLS close_notify, closing connection {}", ctx.channel());
        ctx.close();
    }
}
```
I have a minor preference to use `ctx.channel().close();` to ensure that the close initiates from the tail of the pipeline (should we add handlers after this one in the future).
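A toy model of the distinction (plain Java, not Netty; the handler names are made up): outbound operations such as close travel from their starting point toward the head of the pipeline, so a handler's own `ctx.close()` skips any handlers sitting between it and the tail, while `ctx.channel().close()` starts from the tail and visits them all.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative toy model of Netty's outbound traversal; not real Netty code.
public class PipelineCloseSketch {

    // Pipeline listed head -> tail; outbound events travel tail -> head.
    static final List<String> PIPELINE = List.of("codec", "pipeliningHandler", "laterHandler");

    // channel.close(): starts at the tail, so every outbound handler sees it.
    static List<String> closeFromChannel() {
        List<String> visited = new ArrayList<>(PIPELINE);
        Collections.reverse(visited);
        return visited;
    }

    // ctx.close() from a given handler: only handlers closer to the head see it.
    static List<String> closeFromContext(String handler) {
        List<String> visited = new ArrayList<>(PIPELINE.subList(0, PIPELINE.indexOf(handler)));
        Collections.reverse(visited);
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(closeFromChannel());                    // [laterHandler, pipeliningHandler, codec]
        System.out.println(closeFromContext("pipeliningHandler")); // [codec] -- laterHandler is skipped
    }
}
```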
```java
@Override
public void dispatchRequest(RestRequest request, RestChannel channel, ThreadContext threadContext) {
    reqQueue.add(new ReqCtx(request, channel, threadContext));
}
```
I have not investigated the test failure much. But I'm pretty sure these are going to need to be released at the end of the test.
Or else the leak detector will fire.
Looks pretty good. Just one small comment and a note about a failing test.
Thanks @Tim-Brooks for the feedback! I fixed the leaking buffers.
LGTM
Some suggestions from a post-merge review of elastic#109899:
- Reduce scope of `SuppressForbidden` to a single method.
- Reinstate security manager.
- Remove hard-coded BouncyCastle version.
- Sometimes run on multi-node cluster.
- Use `SAFE_AWAIT_TIMEOUT` for waits.
- Remove unnecessary `assertBusy()`.
This PR addresses an issue where an HTTP client sends a TLS close_notify alert and waits for a response from our HTTP server. We don't handle close_notify alerts today, only full connection termination, so some clients might hang for a while until other timeouts fire.

In this change I introduce an event listener for `SslCloseCompletionEvent` in `Netty4HttpPipeliningHandler` that handles close_notify alerts: when we receive the alert we close the connection immediately. This might cause server response truncation, but we rely on the client to send close_notify only when no more data is expected from the server.

Added warning logging, which can now be spotted in unit and integ tests.
Fixes #76642