fix(fetch): Correctly decode multipart/form-data names and filenames #19145

Merged
merged 1 commit into from
May 16, 2023

Contributor

@andreubotella andreubotella commented May 16, 2023

Currently the multipart/form-data parser in Request.prototype.formData and Response.prototype.formData decodes non-ASCII filenames incorrectly, as if they were encoded in Latin-1 rather than UTF-8. This happens because the header section of each multipart/form-data entry is decoded as Latin-1 in order to be parsed with Headers, which only allows ByteStrings, but the names and filenames are never decoded correctly. This PR fixes this as a post-processing step.
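The post-processing step described above can be sketched roughly as follows. This is only an illustration of the idea, not the code from this PR: `latin1ToBytes` and `redecodeAsUtf8` are hypothetical names, and the mis-decoding is simulated by hand rather than by calling the real parser.

```typescript
// Hypothetical helpers for illustration; not taken from the Deno source.

// Turn a ByteString (every code unit <= 0xFF) back into its raw bytes.
function latin1ToBytes(byteString: string): Uint8Array {
  const bytes = new Uint8Array(byteString.length);
  for (let i = 0; i < byteString.length; i++) {
    const code = byteString.charCodeAt(i);
    if (code > 0xff) {
      throw new TypeError("Input is not a ByteString");
    }
    bytes[i] = code;
  }
  return bytes;
}

// Re-decode a name or filename that was decoded as Latin-1 but was
// actually UTF-8 on the wire.
function redecodeAsUtf8(byteString: string): string {
  return new TextDecoder("utf-8").decode(latin1ToBytes(byteString));
}

// Simulate the bug: the UTF-8 bytes of the filename are mapped one byte
// to one code unit, which is what a Latin-1 decode of the header does.
const original = "文書.txt";
const wireBytes = new TextEncoder().encode(original);
const misdecoded = Array.from(wireBytes, (b) => String.fromCharCode(b)).join("");

// Apply the post-processing step to recover the intended filename.
const fixed = redecodeAsUtf8(misdecoded);
```

The Latin-1 step is written by hand because the `"latin1"` label of `TextDecoder` maps to windows-1252, which does not preserve every byte value one-to-one.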

Note that the Fetch spec leaves multipart/form-data parsing for these APIs heavily underspecified, and it does not say that names and filenames must be decoded as UTF-8. However, it does require that the bodies of non-File entries are decoded as UTF-8, and in browsers, names and filenames always use the same encoding as the body.
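To illustrate why decoding names as UTF-8 matches what browsers send, here is a minimal sketch of a serializer for a single text entry. `encodeEntry` and the boundary value are hypothetical, and the escaping of `"`, CR, and LF in the name follows my reading of the HTML form submission algorithm; treat it as an assumption rather than a reference implementation.

```typescript
// Hypothetical serializer for one non-File entry; for illustration only.
function encodeEntry(boundary: string, name: string, value: string): Uint8Array {
  // Escape characters that would break the quoted header parameter.
  const escapedName = name
    .replace(/\n/g, "%0A")
    .replace(/\r/g, "%0D")
    .replace(/"/g, "%22");

  const text =
    `--${boundary}\r\n` +
    `Content-Disposition: form-data; name="${escapedName}"\r\n` +
    `\r\n` +
    `${value}\r\n`;

  // A single encoder covers the whole entry, so the name in the header
  // and the value in the body end up in the same encoding (UTF-8).
  return new TextEncoder().encode(text);
}

const entryBytes = encodeEntry("example-boundary", "名前", "値");

// Decoding the whole entry as UTF-8 recovers both the name and the value.
const roundTripped = new TextDecoder("utf-8").decode(entryBytes);
```

A parser that decodes the body as UTF-8 but the header as Latin-1 would recover `値` correctly while corrupting `名前`, which is the mismatch this PR addresses.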

Closes #19142.

Member

@bartlomieju bartlomieju left a comment

LGTM, thanks Andreu!

@bartlomieju bartlomieju merged commit 9ba2c4c into denoland:main May 16, 2023
@andreubotella andreubotella deleted the multipart-utf8-names branch May 16, 2023 15:50
levex pushed a commit that referenced this pull request May 18, 2023
Successfully merging this pull request may close these issues.

Request#formData corrupts non-ASCII file names