Content-length validation does not handle spaces #3321

chrisstaite-menlo · 2023-09-13T10:02:58Z

Additional validation of Content-Length parsing was introduced in bf90f3a. However, the value is not stripped, so a value of '0 ' causes a ValueError.

This is a particular issue because if Content-Length is the last header in a request and parse_line is being called, then the \r\n that ends the headers is interpreted as a multi-line continuation, and HTTPHeaders.parse_line appends a space to the value: new_part = " " + line.lstrip().

@bdarnell
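A minimal sketch of the failure mode described above. This is a stand-in model, not Tornado's code: `toy_parse_line` and `strict_content_length` are hypothetical helpers that mimic only the behavior described in this report (single-line values are stripped, continuation lines append `" " + line.lstrip()`, and Content-Length is assumed to be validated as digits only).

```python
import re


def toy_parse_line(headers: dict, state: dict, line: str) -> None:
    """Hypothetical model of the parse_line behavior described in this issue."""
    if line[0].isspace():
        # A line starting with whitespace is treated as a continuation
        # of the previous header.
        new_part = " " + line.lstrip()
        headers[state["last_key"]] += new_part
    else:
        name, value = line.split(":", 1)
        state["last_key"] = name
        headers[name] = value.strip()


def strict_content_length(value: str) -> int:
    """Assumed digits-only check, standing in for the stricter validation."""
    if re.fullmatch(r"[0-9]+", value) is None:
        raise ValueError(f"not an integer: {value!r}")
    return int(value)


headers, state = {}, {}
# Raw lines passed with their CRLF intact, including the blank line
# that terminates the header block.
for raw in ["Content-Length: 0\r\n", "\r\n"]:
    toy_parse_line(headers, state, raw)

print(repr(headers["Content-Length"]))  # '0 '
```

In this model the final `"\r\n"` starts with a whitespace character, so it is taken as a continuation and the stored value becomes `'0 '`, which the digits-only check then rejects.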


bdarnell · 2023-09-24T18:27:01Z

Can you say more about how exactly this happens? It's true that we don't strip the value when parsing content-length, but it's supposed to already be stripped in the last line of HTTPHeaders.parse_line.

The \r\n is not supposed to make it to parse_line; those characters are handled in parse(). I don't see an issue when Content-Length is the last header: we have a test for this case at tornado/tornado/httputil.py, line 188 in a48d634:

    >>> h = HTTPHeaders.parse("Content-Type: text/html\\r\\nContent-Length: 42\\r\\n")

I do see a couple of potential issues in edge cases, though.

- Content-Length: 42\r\n \r\n (with a space between the CRLF pairs) adds a trailing space, giving the value "42 "
- Content-Length:\r\n 42\r\n (with the whole value in a continuation line) adds a leading space, giving the value " 42"

Both of these cases are errors now although they were accepted prior to bf90f3a. I think they're both technically legal although I'd have to go back to the RFCs to be sure.
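The two edge cases can be traced with a small model. This is a sketch only: `toy_parse` is a hypothetical function that assumes parse() splits on newlines, drops the trailing CR, skips empty lines, and handles continuation lines as `" " + line.lstrip()`; it is not Tornado's implementation.

```python
def toy_parse(text: str) -> dict:
    """Hypothetical model: split on LF, drop the CR, skip empty lines."""
    headers = {}
    last_key = None
    for line in text.split("\n"):
        if line.endswith("\r"):
            line = line[:-1]
        if not line:
            continue
        if line[0].isspace():
            # Continuation line: one space plus the left-stripped text.
            headers[last_key] += " " + line.lstrip()
        else:
            name, value = line.split(":", 1)
            last_key = name
            headers[name] = value.strip()
    return headers


trailing = toy_parse("Content-Length: 42\r\n \r\n")
leading = toy_parse("Content-Length:\r\n 42\r\n")
print(repr(trailing["Content-Length"]))  # '42 '
print(repr(leading["Content-Length"]))   # ' 42'
```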

chrisstaite-menlo · 2023-09-27T08:09:53Z

We had some code that was manually proxying headers from an upstream request to a response. It passed every line received by an AsyncHTTPClient.fetch header_callback to parse_line, which triggered this.

kenballus · 2024-01-31T09:11:34Z

I just tested sending a request with a Content-Length of '0 ' (with a trailing space), and it worked totally fine. Can you provide an example of a request that causes the problem?

chrisstaite-menlo · 2024-01-31T09:17:48Z

The Content-Length needs to be the last header which then gets interpreted as a multi-line continuation and then adds a space itself, as stated in the first message.

kenballus · 2024-01-31T16:45:41Z

Got it; now I can reproduce the bug. Agreed that this is a problem.

Also, it turns out that gunicorn and fasthttp also have this exact same bug.

bdarnell · 2024-03-03T16:01:47Z

> Got it; now I can reproduce the bug. Agreed that this is a problem.

I'm still not clear on what exactly the problem is. Is there an issue with HTTPHeaders.parse() or only with parse_line()? Internally, Tornado only uses parse_line() inside parse() and in curl_httpclient's header callback.

I see that there's a design mismatch in the interfaces of header_callback and parse_line: the former gives you the newlines, while parse_line expects them to be removed (this isn't formally specified but it's implied by the doctest). So you can't actually pass the values from header_callback directly to parse_line, even though this is superficially a reasonable thing to do.

There's also a couple of weird edge cases I noted at the bottom of #3321 (comment)

Does that cover everything or am I missing something?

Solutions to the design mismatch include:

- Working as intended, just needs better docs
- Deprecate header_callback in AsyncHTTPClient.fetch and replace it with a separate callback that gives you a pre-parsed HTTPHeaders object. We need a callback that gives you headers before the first streaming chunk, but doing it with raw header lines just pushes unnecessary work into the application.
- Make parse_line able to handle newlines. This almost works (by accident) because simple headers get stripped, but continuation lines can cause extraneous whitespace.
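Until the interfaces agree, an application can bridge the mismatch itself. A minimal sketch, assuming the callback delivers raw lines with their CRLF, possibly including a status line and a final blank line: `make_header_forwarder` is a hypothetical helper, and the `parse_line` argument is any callable with the parse_line contract (one line, no newline).

```python
from typing import Callable


def make_header_forwarder(parse_line: Callable[[str], None]) -> Callable[[str], None]:
    """Wrap a parse_line-style callable so it tolerates raw lines with CRLF."""

    def forward(raw_line: str) -> None:
        line = raw_line.rstrip("\r\n")
        if not line:
            # Blank line marking the end of the headers: nothing to parse.
            return
        if line.startswith("HTTP/"):
            # Assumption: the status line may also be delivered; skip it.
            return
        parse_line(line)

    return forward


# Usage with a plain list standing in for a headers object.
received = []
forward = make_header_forwarder(received.append)
for raw in ["HTTP/1.1 200 OK\r\n", "Content-Length: 0\r\n", "\r\n"]:
    forward(raw)
print(received)  # ['Content-Length: 0']
```

In real use the wrapped callable would be the `parse_line` method of an HTTPHeaders instance, and `forward` would be passed as the header callback.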

bdarnell · 2024-06-05T14:37:21Z

Aha, now I see the problem. Single-line headers have leading and trailing whitespace stripped, while continuation lines make it possible to construct a header with trailing whitespace, potentially confusing users of that header. RFC 9110 is clear that trailing whitespace should be stripped from header values. I'm going to:

- Make continuation lines containing only whitespace an error. The parse_line interface doesn't let us handle this properly (we must preserve internal space but strip trailing space, and we can't tell in the line-by-line interface whether we're looking at a middle line or the last one of a header)
- Handle newlines in parse_line, specifically so that lines containing only newlines are no-ops. This fixes the way that the last header gets a trailing space if you use parse_line directly instead of parse
- Emit a deprecation warning on continuation lines. There should be no reason to support this feature any more and we should get rid of it in the future.
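The three planned changes could look roughly like this. It is a sketch under stated assumptions, not the actual patch: `SketchHeaders` is a hypothetical class, and the error type and warning text are placeholders.

```python
import warnings


class SketchHeaders:
    """Hypothetical model of the three planned changes; not Tornado's code."""

    def __init__(self) -> None:
        self.values: dict = {}
        self._last_key = None

    def parse_line(self, line: str) -> None:
        # Second change: tolerate trailing CR/LF, so a line containing
        # only newline characters is a no-op.
        line = line.rstrip("\r\n")
        if not line:
            return
        if line[0].isspace():
            # First change: a continuation with only whitespace is an error.
            if not line.strip():
                raise ValueError("continuation line contains only whitespace")
            if self._last_key is None:
                raise ValueError("first header line cannot be a continuation")
            # Third change: continuation lines still work but warn.
            warnings.warn(
                "header continuation lines are deprecated", DeprecationWarning
            )
            self.values[self._last_key] += " " + line.lstrip()
        else:
            name, value = line.split(":", 1)
            self._last_key = name
            self.values[name] = value.strip()


h = SketchHeaders()
h.parse_line("Content-Length: 0\r\n")
h.parse_line("\r\n")
print(repr(h.values["Content-Length"]))  # '0'
```

With these rules the raw lines from the original report no longer produce a trailing space, while a line such as `" \r\n"` is rejected instead of silently extending the previous value.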

bdarnell added the http1connection label Jun 13, 2024

Nirab123456 mentioned this issue Oct 20, 2024: Enhanced Trailing Whitespace Handling in HTTP Headers #3429 (open)
