Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Content-length validation does not handle spaces #3321

Open
chrisstaite-menlo opened this issue Sep 13, 2023 · 7 comments
Open

Content-length validation does not handle spaces #3321

chrisstaite-menlo opened this issue Sep 13, 2023 · 7 comments

Comments

@chrisstaite-menlo
Copy link

chrisstaite-menlo commented Sep 13, 2023

Additional validation of Content-Length parsing was introduced in bf90f3a however, the value is not striped and therefore a value of '0 ' causes a ValueError.

This is a particular issue because if Content-Length is the last header in a request and parse_line is being called then the \r\n of the end of the header is interpreted as a multi-line continuation and appends the space to the end in HTTPHeaders.parse_line: new_part = " " + line.lstrip().

@bdarnell

@bdarnell
Copy link
Member

Can you say more about how exactly this happens? It's true that we don't strip the value when parsing content-length, but it's supposed to already be stripped in the last line of HTTPHeaders.parse_line.

The \r\n is not supposed to make it to parse_line; those characters are handled in parse(). I don't see an issue when Content-Length is the last header: we have a test for this case at

>>> h = HTTPHeaders.parse("Content-Type: text/html\\r\\nContent-Length: 42\\r\\n")
.

I do see a couple of potential issues in edge cases, though.

  • Content-Length: 42\r\n \r\n (with a space between the CRLF pairs) will add a space to the value "42 "
  • Content-Length:\r\n 42\r\n (with the whole value in a continuation line) adds a leading space, " 42"

Both of these cases are errors now although they were accepted prior to bf90f3a. I think they're both technically legal although I'd have to go back to the RFCs to be sure.

@chrisstaite-menlo
Copy link
Author

We had some code that was manually proxying headers from an upstream request to a response that was pushing all of the lines passed to a AsyncHTTPClient.fetch header_callback to parse_line that triggered this.

@kenballus
Copy link

I just tested sending a request with a Content-Length of 0 , and it worked totally fine. Can you enter an example of a request that causes the problem?

@chrisstaite-menlo
Copy link
Author

The Content-Length needs to be the last header which then gets interpreted as a multi-line continuation and then adds a space itself, as stated in the first message.

@kenballus
Copy link

kenballus commented Jan 31, 2024

Got it; now I can reproduce the bug. Agreed that this is a problem.

Also, it turns out that gunicorn and fasthttp also have this exact same bug.

@bdarnell
Copy link
Member

bdarnell commented Mar 3, 2024

Got it; now I can reproduce the bug. Agreed that this is a problem.

I'm still not clear on what exactly the problem is. Is there an issue with HTTPHeaders.parse() or only with parse_line()? Internally, Tornado only uses parse_line() inside parse() and in curl_httpclient's header callback.

I see that there's a design mismatch in the interfaces of header_callback and parse_line: the former gives you the newlines, while parse_line expects them to be removed (this isn't formally specified but it's implied by the doctest). So you can't actually pass the values from header_callback directly to parse_line, even though this is superficially a reasonable thing to do.

There's also a couple of weird edge cases I noted at the bottom of #3321 (comment)

Does that cover everything or am I missing something?

Solutions to the design mismatch include:

  • Working as intended, just needs better docs
  • Deprecate header_callback in AsyncHTTPClient.fetch and replace it with a separate callback that gives you a pre-parsed HTTPHeaders object. We need a callback that gives you headers before the first streaming chunk, but doing it with raw header lines just pushes unnecessary work into the application.
  • Make parse_line able to handle newlines. This almost works (by accident) because simple headers get stripped, but continuation lines can cause extraneous whitespace.

@bdarnell
Copy link
Member

bdarnell commented Jun 5, 2024

Aha, now I see the problem. Single-line headers have leading and trailing whitespace stripped, while continuation lines make it possible to construct a header with trailing whitespace, potentially confusing users of that header. RFC 9110 is clear that trailing whitespace should be stripped from header values. I'm going to:

  1. Make continuation lines containing only whitespace an error. The parse_line interface doesn't let us handle this properly (we must preserve internal space but strip trailing space, and we can't tell in the line-by-line interface whether we're looking at a middle line or the last one of a header)
  2. Handle newlines in parse_line, specifically so that lines containing only newlines are no-ops. This fixes the way that the last header gets a trailing space if you use parse_line directly instead of parse
  3. Emit a deprecation warning on continuation lines. There should be no reason to support this feature any more and we should get rid of it in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants