Decode paired surrogates in unicode escapes #657

Open
SimonSapin opened this issue Sep 25, 2023 · 0 comments


Split off from #608

The draft GraphQL spec adds a new feature:

https://spec.graphql.org/draft/#sec-String-Value.Escape-Sequences

For legacy reasons, a supplementary character may be escaped by two fixed-width unicode escape sequences forming a surrogate pair. For example the input "\uD83D\uDCA9" is a valid StringValue which represents the same Unicode text as "\u{1F4A9}". While this legacy form is allowed, it should be avoided as a variable-width unicode escape sequence is a clearer way to encode such code points.

(The variable-width unicode escape sequence mentioned here is another new feature, tracked at #640.)

https://spec.graphql.org/draft/#sec-String-Value.Static-Semantics specifies the precise algorithm: a leading surrogate immediately followed by a trailing surrogate is decoded as one char, while any surrogate not in such a pair is a parse error.
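To make the pairing rule concrete, here is a minimal sketch of how it could look in Rust. It is not the parser's actual code: `decode_fixed_width_escapes` and `SurrogateError` are hypothetical names, and the function assumes the `\uXXXX` escapes have already been lexed into their 16-bit values and are adjacent in the source.

```rust
/// Hypothetical error type: a surrogate that is not part of a well-formed pair.
#[derive(Debug, PartialEq)]
enum SurrogateError {
    LoneLeading(u16),
    LoneTrailing(u16),
}

fn is_leading(unit: u16) -> bool {
    (0xD800..=0xDBFF).contains(&unit)
}

fn is_trailing(unit: u16) -> bool {
    (0xDC00..=0xDFFF).contains(&unit)
}

/// Decodes the values of consecutive fixed-width `\uXXXX` escapes.
/// Assumption: each element is the value of one escape, in source order.
fn decode_fixed_width_escapes(units: &[u16]) -> Result<String, SurrogateError> {
    let mut out = String::new();
    let mut iter = units.iter().copied().peekable();
    while let Some(unit) = iter.next() {
        if is_leading(unit) {
            match iter.peek().copied() {
                Some(next) if is_trailing(next) => {
                    iter.next(); // consume the trailing surrogate
                    // Combine the pair into one supplementary code point.
                    let code_point = 0x10000
                        + ((unit as u32 - 0xD800) << 10)
                        + (next as u32 - 0xDC00);
                    // Always in 0x10000..=0x10FFFF, so this cannot fail.
                    out.push(char::from_u32(code_point).expect("valid supplementary code point"));
                }
                // A leading surrogate with no trailing surrogate after it.
                _ => return Err(SurrogateError::LoneLeading(unit)),
            }
        } else if is_trailing(unit) {
            // A trailing surrogate that was not preceded by a leading one.
            return Err(SurrogateError::LoneTrailing(unit));
        } else {
            // Any other 16-bit value is a code point on its own.
            out.push(char::from_u32(unit as u32).expect("non-surrogate value"));
        }
    }
    Ok(out)
}
```

With this sketch, the values `[0xD83D, 0xDCA9]` decode to the single char `'\u{1F4A9}'`, while `[0xD83D]` or `[0xDCA9, 0xD83D]` produce an error. The standard library's `char::decode_utf16` performs the same pairing and could serve as an alternative building block.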

Until we implement this feature, we’ll fix the panic in #608 by making all escaped surrogates parse errors, whether in a well-formed pair or not.
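The interim behavior can be sketched as below. This is only an illustration under assumptions, not the actual fix for #608: `decode_escape_interim` is a hypothetical name, and the error is a plain `String` rather than the parser's real error type. It relies on `char::from_u32` returning `None` for every surrogate value, so both halves of a well-formed pair are rejected.

```rust
/// Hypothetical interim decoder for one unicode escape value.
/// Every surrogate is rejected, whether or not it is part of a pair.
fn decode_escape_interim(value: u32) -> Result<char, String> {
    // `char::from_u32` yields `None` for 0xD800..=0xDFFF and for values
    // above 0x10FFFF, so no unwrap on an invalid value is needed.
    char::from_u32(value)
        .ok_or_else(|| format!("invalid unicode escape value: {value:#X}"))
}
```

Under this sketch, `"\uD83D\uDCA9"` is reported as an error at the first escape instead of being decoded, which is stricter than the draft spec but never panics.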
