Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make overlong Chars !isvalid; throw error on UInt32 of overlong Char #25559

Merged
merged 1 commit into from
Jan 23, 2018

Conversation

JeffBezanson
Copy link
Sponsor Member

This takes care of most of #25452.

  • Handle an extra case of overlong encoding.
  • Make isvalid(::Char) return false on overlongs.
  • Make UInt32(::Char) throw an error on overlongs.

@JeffBezanson JeffBezanson added the domain:unicode Related to unicode characters and encodings label Jan 14, 2018
@nalimilan
Copy link
Member

CI failures look legitimate.

Copy link
Sponsor Member

@StefanKarpinski StefanKarpinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this pushes the decoding function checks to the point where bit-twiddling will be less effective than a set of range checks. I can take a crack at that version and see how it goes. I'm not sure about introducing this isoverlong predicate. Why not just have isvalid? I guess it's somewhat informative.

@StefanKarpinski
Copy link
Sponsor Member

Another thing: since we're going to throw for all invalid characters, we may as well replace MalformedChar with InvalidChar and just lump all forms of invalidness together. The point of the previous distinction that since we didn't throw for all invalid characters, calling it InvalidChar would be misleading – it was only truly malformed byte sequences that caused an error before.

@JeffBezanson
Copy link
Sponsor Member Author

Updated.

I think isoverlong is fine for now; it's not exported (wasn't introduced by this PR either).

Copy link
Sponsor Member

@StefanKarpinski StefanKarpinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. We can make performance tweaks in any time – this is the behavior we want.

@JeffBezanson JeffBezanson merged commit 70e3430 into master Jan 23, 2018
@JeffBezanson JeffBezanson deleted the jb/overlong branch January 23, 2018 05:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:unicode Related to unicode characters and encodings
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants