
Add TLS close_notify handler to http server #109899

Merged
merged 19 commits into from
Jun 28, 2024

Conversation

mhl-b
Contributor

@mhl-b mhl-b commented Jun 19, 2024

"close_notify" alert is used to indicate orderly closure of one
direction of the connection. Upon receiving such an alert, the TLS
implementation SHOULD indicate end-of-data to the application.

This PR addresses an issue where an HTTP client sends a TLS close_notify alert and then waits for a response from our HTTP server. Today we don't handle close_notify alerts, only full connection termination, so some clients might hang for a while until other timeouts fire.

In this change I introduce an event listener for SslCloseCompletionEvent in the Netty4HttpPipeliningHandler that handles close_notify alerts. When we receive the alert, we close the connection immediately. This might truncate the server's response, but we rely on the client to send close_notify only when it expects no more data from the server.
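For reference, this is what a close_notify looks like on the wire before encryption. The helper below is illustrative only and not part of this PR (the class and method names are hypothetical); after the handshake the alert payload is encrypted, and in TLS 1.3 the record type is disguised as application_data, which is exactly why we rely on the TLS stack (Netty's SslHandler surfacing an SslCloseCompletionEvent) rather than inspecting bytes ourselves.

```java
// Hypothetical helper, for illustration only: recognizes a *plaintext* TLS
// close_notify alert record. Real servers cannot do this after the handshake,
// because alerts are encrypted (and masked as application_data in TLS 1.3).
public final class CloseNotifyCheck {

    public static boolean isCloseNotify(byte[] record) {
        return record.length == 7
            && record[0] == 0x15    // content type 21: alert
            && record[3] == 0x00    // payload length: 2 bytes (high byte)
            && record[4] == 0x02    // payload length: 2 bytes (low byte)
            && record[5] == 0x01    // alert level 1: warning
            && record[6] == 0x00;   // alert description 0: close_notify
    }

    public static void main(String[] args) {
        // TLS 1.2 record header (version bytes 0x03 0x03) + close_notify alert.
        byte[] closeNotify = {0x15, 0x03, 0x03, 0x00, 0x02, 0x01, 0x00};
        System.out.println(isCloseNotify(closeNotify)); // prints "true"
    }
}
```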

Added warning logging, which can now be seen in unit and integration tests:

[2024-06-25T21:08:49,310][WARN ][o.e.h.n.Netty4HttpPipeliningHandler] [node_t1] received TLS close_notify, closing connection [id: 0xfb96648b, L:/[0:0:0:0:0:0:0:1]:13484 - R:/[0:0:0:0:0:0:0:1]:63937]

Fixes #76642

@mhl-b mhl-b added v8.15.0 :Distributed/Network Http and internode communication implementations Team:Distributed Meta label for distributed team >bug labels Jun 19, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

Comment on lines 89 to 90
var cert = getDataPath("/org/elasticsearch/xpack/security/transport/ssl/certs/simple/testnode.crt");
var pk = getDataPath("/org/elasticsearch/xpack/security/transport/ssl/certs/simple/testnode.pem");
Contributor

Aha, nice find! Although it seems hardly ideal to be using a cert which was exposed in 2019...

Contributor Author

I tried to use ssc = new SelfSignedCertificate(); from Netty; it relies on bouncy-castle, but we don't have it in our dependencies. Should I update the deps or use a static cert?

Contributor

I think we have good reasons not to rely on bouncy castle in prod code but it seems like it should be ok for tests IMO. If it's easy to add then please do so and we can raise it for discussion as the PR nears completion.

Contributor Author

@mhl-b mhl-b Jun 21, 2024


Made Netty's self-signed certificate work with a few security suppressions and a bouncy-castle test dependency:

@ESTestCase.WithoutSecurityManager
@SuppressForbidden(reason = "requires java.io.File for netty self-signed certificate")


import javax.net.ssl.SSLException;

public class SecurityNetty4HttpServerTransportCloseNotifyTests extends AbstractHttpServerTransportTestCase {
Contributor

Tests make sense, but I think we'll also need some integ tests showing how this interacts with the rest of the system, particularly the task cancellation mechanism (which it doesn't today, but it'll need to).

Contributor Author

I'm still considering whether we should cancel or not. If we serve all requests while the channel is writable, then we don't need additional handling for cancellation. If the channel is not writable, the client decided not to wait, and it will go through the exception path that exists today.

Contributor

Although it is certainly possible to keep serving responses as you propose, it's also legitimate to cancel tasks and send the resulting exception responses, or indeed to just tear down the connection without sending any of the remaining responses, so this is more of a design question than a technical absolute.

I couldn't find any clients which might send a close_notify while awaiting responses - AFAICT this only happens on the abort path or when dropping an idle connection from a pool.

Contributor Author

@mhl-b mhl-b Jun 20, 2024


You're right; looking further into the HTTP and TLS RFCs, closing the connection on close_notify makes sense. Serving all responses and then closing the connection should be handled with the Connection: close header in HTTP/1.

In TLS prior to v1.3, close_notify requires discarding all pending writes and closing the connection. TLS v1.3 tries to address the truncation attack by letting the application layer decide how to close the connection; we probably want to handle logout requests (OIDC, SAML) fully without truncation. Within the scope of this PR, closing the connection is a good step forward.
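To illustrate the Connection: close point above, a minimal sketch of the HTTP/1.x keep-alive defaults (the class and method names are hypothetical, not Elasticsearch code): HTTP/1.1 defaults to keeping the connection open unless "Connection: close" is sent, while HTTP/1.0 defaults to closing unless "Connection: keep-alive" is sent.

```java
// Illustrative sketch, assuming simplified header handling: decide whether
// an HTTP/1.x exchange implies closing the connection afterwards.
public final class ConnectionPolicy {

    public static boolean shouldClose(String httpVersion, String connectionHeader) {
        String value = connectionHeader == null ? "" : connectionHeader.toLowerCase();
        if ("HTTP/1.0".equals(httpVersion)) {
            return !value.contains("keep-alive"); // HTTP/1.0 defaults to close
        }
        return value.contains("close");           // HTTP/1.1 defaults to keep-alive
    }

    public static void main(String[] args) {
        System.out.println(shouldClose("HTTP/1.1", "close")); // prints "true"
        System.out.println(shouldClose("HTTP/1.1", null));    // prints "false"
        System.out.println(shouldClose("HTTP/1.0", null));    // prints "true"
    }
}
```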

Contributor

I don't think truncation attacks are a possibility here; we always send either a Content-Length header or else a properly-terminated chunked-encoded message, so clients can tell if a response is incomplete at the HTTP layer. But this isn't the right place to discuss this more deeply - if you think there's a security issue in this area then please mail [email protected] with details.
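The point about HTTP-layer framing can be sketched as follows (hypothetical helper names, not from the PR): a client can detect truncation without TLS-level help, because the body is framed either by Content-Length or by the chunked terminator "0\r\n\r\n".

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch of HTTP-layer truncation detection.
public final class TruncationCheck {

    // Fixed-length framing: complete iff we received the declared byte count.
    public static boolean isCompleteFixedLength(long contentLength, long bytesReceived) {
        return bytesReceived >= contentLength;
    }

    // Chunked framing: complete iff the body ends with the zero-length
    // terminating chunk (ignoring trailers for simplicity).
    public static boolean isCompleteChunked(byte[] body) {
        String tail = new String(body, StandardCharsets.US_ASCII);
        return tail.endsWith("0\r\n\r\n");
    }

    public static void main(String[] args) {
        System.out.println(isCompleteFixedLength(10, 10)); // prints "true"
        System.out.println(isCompleteFixedLength(10, 7));  // prints "false": truncated
        byte[] chunked = "5\r\nhello\r\n0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(isCompleteChunked(chunked));    // prints "true"
    }
}
```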

Contributor

Since there is no cancellation now,

I think task cancellation should be hooked up by default with this implementation. Since you close the channel, that will propagate through the listener framework and cancel tasks (I think).

See Netty4HttpChannel#addCloseListener.
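The pattern the reviewer describes can be sketched generically with JDK futures (names here are hypothetical; the real path goes through Netty4HttpChannel#addCloseListener and the task manager, with Netty's channel close future in place of the CompletableFuture below): register a listener on connection close so in-flight work is cancelled when the channel goes away.

```java
import java.util.concurrent.CompletableFuture;

// Generic sketch, assuming CompletableFuture stands in for both Netty's
// channel close future and the in-flight task.
public final class CloseListenerSketch {

    public static void cancelOnClose(CompletableFuture<Void> channelClosed, CompletableFuture<?> task) {
        // When the "channel" closes, cancel the outstanding "task".
        channelClosed.thenRun(() -> task.cancel(true));
    }

    public static void main(String[] args) {
        CompletableFuture<Void> channelClosed = new CompletableFuture<>();
        CompletableFuture<String> task = new CompletableFuture<>();
        cancelOnClose(channelClosed, task);

        channelClosed.complete(null); // simulate TLS close_notify tearing down the channel
        System.out.println(task.isCancelled()); // prints "true"
    }
}
```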

Contributor Author

Nice! Is it worth testing cancellation in an integration test? If so, what REST operation would be a good example?

Contributor

Nice! Is it worth testing cancellation in an integration test?

I will take another look at this PR later today.

But IMO, if "HTTP close triggers task cancellation" is already tested, then this PR only needs to validate that an SSL client's close_notify leads to a channel close.

However, I don't know if that test exists. My initial perusal of the code suggests that we do not have this test at the HTTP level (which we should). I would say this PR should implement it if it is missing.

Contributor

I would look in RestActionCancellationIT and other classes in that qa package (SearchRestCancellationIT, ClusterHealthRestCancellationIT, etc).

It looks like these are testing task cancellation through the http layer. I would do a bit of investigation and look at how the "cancellation" is implemented. I think it might be through closing the http connection. But that appears to be a bit deep in the apache code and we should confirm it.

If those tests are sufficient then I think we are covered at the moment.

Contributor Author

I added action cancellation test.

RestActionCancellationIT and SearchRestCancellationIT call request.abort(), which indeed closes the connection.
With the help of the neat CancellableActionTestPlugin, it was pretty easy to add an integration test for action cancellation here: testSendCloseNotifyCancelAction.

@mhl-b mhl-b requested a review from a team as a code owner June 21, 2024 23:43
@mhl-b mhl-b removed the request for review from a team June 22, 2024 00:14
if (evt instanceof SslCloseCompletionEvent closeEvent) {
    if (closeEvent.isSuccess() && ctx.channel().isActive()) {
        logger.trace("received TLS close_notify, closing connection {}", ctx.channel());
        ctx.close();
    }
}
Contributor

I have a minor preference to use:

ctx.channel().close();

to ensure that the close initiates from the tail of the pipeline (should we add handlers after this one in the future).


@Override
public void dispatchRequest(RestRequest request, RestChannel channel, ThreadContext threadContext) {
    reqQueue.add(new ReqCtx(request, channel, threadContext));
}
Contributor

I have not investigated the test failure much. But I'm pretty sure these are going to need to be released at the end of the test.

Contributor

Or else the leak detector will fire.

Contributor

@Tim-Brooks Tim-Brooks left a comment

Looks pretty good. Just one small comment and a note about a failing test.

@mhl-b
Contributor Author

mhl-b commented Jun 28, 2024

Looks pretty good. Just one small comment and a note about a failing test.

Thanks @Tim-Brooks for the feedback! I fixed leaking buffers.

Contributor

@Tim-Brooks Tim-Brooks left a comment

LGTM

@mhl-b mhl-b added the auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Jun 28, 2024
@elasticsearchmachine elasticsearchmachine merged commit 9b6cca1 into elastic:main Jun 28, 2024
15 checks passed
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Jun 30, 2024
Some suggestions from a post-merge review of elastic#109899:

- Reduce scope of `SuppressForbidden` to a single method.
- Reinstate security manager.
- Remove hard-coded BouncyCastle version.
- Sometimes run on multi-node cluster.
- Use `SAFE_AWAIT_TIMEOUT` for waits.
- Remove unnecessary `assertBusy()`
ywangd added a commit to ywangd/elasticsearch that referenced this pull request Jul 2, 2024
ywangd added a commit that referenced this pull request Jul 2, 2024
elasticsearchmachine pushed a commit that referenced this pull request Jul 2, 2024
Some suggestions from a post-merge review of #109899:

- Sometimes run on multi-node cluster.
- Use `SAFE_AWAIT_TIMEOUT` for waits.
- Remove unnecessary `assertBusy()`

Successfully merging this pull request may close these issues.

close_notify not sent when incoming connection is half-closed and quiescent when using TLS1.3