fix(fetch): Correctly decode `multipart/form-data` names and filenames #19145

andreubotella · 2023-05-16T11:56:37Z

Currently the `multipart/form-data` parser in `Request.prototype.formData` and `Response.prototype.formData` decodes non-ASCII names and filenames incorrectly, as if they were encoded in Latin-1 rather than UTF-8. This happens because the header section of each `multipart/form-data` entry is decoded as Latin-1 in order to be parsed with `Headers`, which only allows `ByteString`s, and the names and filenames extracted from those headers are never re-decoded as UTF-8 afterwards. This PR fixes this as a post-processing step.

Note that the `multipart/form-data` parsing for these APIs in the Fetch spec is very much underspecified, and it does not specify that names and filenames must be decoded as UTF-8. However, it does require that the bodies of non-`File` entries are decoded as UTF-8, and in browsers, names and filenames always use the same encoding as the body.
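For illustration, the post-processing step described above can be sketched roughly as follows. This is not the code from this PR: `decodeLatin1` and `latin1ToUtf8` are hypothetical helper names, and the sketch assumes the name or filename has already been extracted from a header that was decoded as Latin-1.

```typescript
// Hypothetical helpers for illustration only; not the implementation in this PR.

// Decode bytes as Latin-1: each byte becomes the code point with the same value.
// (TextDecoder("latin1") is not used here because the Encoding Standard maps
// that label to windows-1252, which differs in the 0x80-0x9F range.)
function decodeLatin1(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

// Post-processing: map a ByteString back to its bytes, then decode as UTF-8.
function latin1ToUtf8(byteString: string): string {
  const bytes = new Uint8Array(byteString.length);
  for (let i = 0; i < byteString.length; i++) {
    // Every code unit of a ByteString is <= 0xFF, so this is lossless.
    bytes[i] = byteString.charCodeAt(i);
  }
  return new TextDecoder("utf-8").decode(bytes);
}

const original = "文件.txt";
const wireBytes = new TextEncoder().encode(original); // UTF-8 bytes on the wire
const asParsed = decodeLatin1(wireBytes); // what the header parser sees
const fixed = latin1ToUtf8(asParsed); // the re-decoded filename
```

Because Latin-1 decoding maps every byte to exactly one code unit, the round trip back to bytes loses nothing, which is why the fix can be applied after the headers have been parsed.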

Closes #19142.


bartlomieju

LGTM, thanks Andreu!


bartlomieju approved these changes May 16, 2023


bartlomieju merged commit 9ba2c4c into denoland:main May 16, 2023

andreubotella deleted the multipart-utf8-names branch May 16, 2023 15:50
