Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How are conflicts between allowReserved and RFC 3986 handled? #3759

Closed
handrews opened this issue Apr 26, 2024 · 7 comments
Closed

How are conflicts between allowReserved and RFC 3986 handled? #3759

handrews opened this issue Apr 26, 2024 · 7 comments
Assignees
Labels
clarification requests to clarify, but not change, part of the spec param serialization Issues related to parameter and/or header serialization
Milestone

Comments

@handrews
Copy link
Member

The allowReserved keyword in the Parameter Object is specified as follows (from 3.1.0):

Determines whether the parameter value SHOULD allow reserved characters, as defined by [RFC3986] :/?#[]@!$&'()*+,;= to be included without percent-encoding. This property only applies to parameters with an in value of query. The default value is false.

This is strange in several ways. Most of those characters are not forbidden in query strings. Here is the relevant ABNF:

   pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

   query         = *( pchar / "/" / "?" )

   fragment      = *( pchar / "/" / "?" )

   pct-encoded   = "%" HEXDIG HEXDIG

   unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
   reserved      = gen-delims / sub-delims
   gen-delims    = ":" / "/" / "?" / "#" / "[" / "]" / "@"
   sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="

What this means is that the actual set of characters not allowed in the query string by RFC 3986 is just #, [, and ]. Everything else is allowed by either the pchar production or the query production.

It does not make sense to allow # in a query string value because it will cause everything after it to parse as a fragment. The OAS cannot change that behavior, and implying that we can is problematic at best.

While the failure mode of including [ or ] is less clear (they're only used for IPv6 or IPvFuture literals in the ABNF), it's still strange to encourage something that correctly implemented URI parsers will reject (admittedly, the percentage of URI parsers that are implemented correctly is depressingly low).

So...

  • What is the use case for allowing unencoded [ or ] in a query parameter value? How are correct URI parsers expected to handle this? Are we requiring tools to parse URIs incorrectly? We might need some guidance here - I would have no idea what to do with this, or whether to validate it in OASComply
  • I don't think we should include # here at all as it's not going to work even in poorly-implemented URI parsers. Since we can't technically remove the feature, we should put a strongly worded SHOULD NOT around it
  • Is there a reason we need to mention the other characters at all? They don't need to be encoded anyway (although sometimes people do). Are we just using this as a "no, really, please don't encode these, you don't need to anyway"?
@handrews handrews added clarification requests to clarify, but not change, part of the spec param serialization Issues related to parameter and/or header serialization labels Apr 26, 2024
@handrews handrews added this to the v3.0.4 milestone Apr 26, 2024
@handrews
Copy link
Member Author

handrews commented Apr 27, 2024

@OAI/tsc review request: see above (never mind, see below)

@karenetheridge
Copy link
Member

karenetheridge commented Apr 28, 2024

My $0.02: dump allowReserved entirely and go with the character set specified in the RFC.

How I've handled this in my implementation: I parse the URL string after it has been normalized (i.e. any characters that should become encoded are encoded; any that are encoded needlessly are decoded).

@karenetheridge
Copy link
Member

Also, using deepObject with allowReserved: true appears to be problematic: #1942 (comment)

@miqui
Copy link
Contributor

miqui commented Apr 28, 2024

Here is one of the many examples I have seen APIs being designed using RFC3986: https://developers.google.com/maps/documentation/aerial-view/web-api-best-practices?hl=en

Ofcourse there are always exceptions, but I am going IMHO for the greater good.

So, my $0.02 as well would be to comply with RC3986 and eliminate alloweReserved. Perhaps long term we should just gradually reduce these conflicts opting more for established RFC (wherever it makes more sense)

@handrews
Copy link
Member Author

handrews commented Apr 28, 2024

Ugh. Digging through the history I found OAI/learn.openapis.org#100 that indicates allowReserved corresponds to the + operator in RFC 6570. This operator controls what needs to be encoded.

However, instead of relying on RFC 3986's pchar production, which governs path segments, query strings, and fragments, they only make the distinction between allowing unreserved or (unreserved+reserved)

   segment       = *pchar
   segment-nz    = 1*pchar
   segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
                 ; non-zero-length segment without any colon ":"
   query         = *( pchar / "/" / "?" )
   fragment      = *( pchar / "/" / "?" )

   pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

   pct-encoded   = "%" HEXDIG HEXDIG
   unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
   reserved      = gen-delims / sub-delims
   gen-delims    = ":" / "/" / "?" / "#" / "[" / "]" / "@"
   sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="

This seems to be because some of the sub-delims are used in the syntax for expanded values (notably ,, ;, =, and &), requiring those characters to be percent-encoded to distinguish between use in values or names vs as internal expansion delimiters (or application/x-www-form-urlencoded delimiters).

But allowing reserved characters allows all sub-delims and all gen-delims, and the spec requires the variable value substitution process to perform percent-encoding, and you never want to run percent-encoding twice.

However, in §3.2.1 Variable expansion, we find this as the third paragraph:

The allowed set for a given expansion depends on the expression type:
reserved ("+") and fragment ("#") expansions allow the set of
characters in the union of ( unreserved / reserved / pct-encoded ) to
be passed through without pct-encoding, whereas all other expression
types allow only unreserved characters to be passed through without
pct-encoding. Note that the percent character ("%") is only allowed
as part of a pct-encoded triplet and only for reserved/fragment
expansion: in all other cases, a value character of "%" MUST be pct-
encoded as "%25" by variable expansion.

I find this a bit hard to parse, but I think the key is that last sentence: when using + (equivalent to allowReserved: true), you are expected to percent-encode all reserved (and optionally non-reserved) characters that you do not want in the result prior to template expansion, as template expansion will not do it for you. This is the opposite of the normal behavior, in which you MUST NOT percent-encode in advance, or else the template expansion will double-encode, and turn your % signs that mark any pre-encoded characters into %25, causing the whole thing to be gibberish.

I don't see any examples of this in RFC 6570, and this is pretty buried in the text. I also notice that while we correlate some other things with RFC 6570 syntax, we do not do so for allowReserved, which has left it unclear.

Anyway, @karenetheridge @miqui we can't remove this keyword because it's the only way to get certain valid parameter values serialized correctly. But we need to be a lot more clear on the different times at which encoding happens in various serialization configurations.

@handrews
Copy link
Member Author

So what I've settled on is that this is intended to be used for when you need to pass through some allowed reserved characters (and most of them are allowed in the query string). But when you do that, it is your responsibility to make sure that any illegal reserved characters are encoded separately. Since RFC6570 reserved expansion leaves percent-encoded sequences untouched, the safest and easiest way to do this is to do your percent-encoding manually before RFC 6570 value expansion.

allowReserved doesn't magically makes things work - it gives you the option of doing the work instead so you can customize it.

@handrews
Copy link
Member Author

PRs merged for 3.0.4 and ported to 3.1.1 via PR #3921!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clarification requests to clarify, but not change, part of the spec param serialization Issues related to parameter and/or header serialization
Projects
None yet
Development

No branches or pull requests

3 participants