support malformed char literals, the second #44989

Merged (1 commit) on May 25, 2022

simeonschaub
Member

Alternative to #44765. This disallows character literals that can not be
created from iterating a UTF-8 string.

fixes #25072

@simeonschaub simeonschaub added parser Language parsing and surface syntax domain:unicode Related to unicode characters and encodings labels Apr 15, 2022
@simeonschaub simeonschaub changed the title support malformed characters, the second support malformed char literals, the second Apr 15, 2022
@test_parseerror "'\\xff\\xff\\xff\\xff'" "character literal contains multiple characters" # == reinterpret(Char, 0xffffffff)
@test '\uffff' == Char(0xffff)
@test '\U00002014' == Char(0x2014)
@test '\100' == reinterpret(Char, UInt32(0o100) << 24)
Sponsor Member

This is just @, right?

Member Author

Yeah, perhaps this makes it more obvious:

Suggested change
- @test '\100' == reinterpret(Char, UInt32(0o100) << 24)
+ @test '\100' == reinterpret(Char, UInt32(0o100) << 24) == '@'
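
For readers following the exchange above, here is a small sketch of why `'\100'` is just `@`. It relies only on Base functions and on the fact that a `Char` keeps its UTF-8 bytes left-aligned in a 32-bit word; the variable name `c` is purely illustrative.

```julia
# '\100' is an octal escape: 0o100 == 64 == 0x40, the ASCII code of '@'.
c = '\100'
@assert c == '@'

# A Char stores its UTF-8 encoding in the high bytes of a UInt32,
# so the single byte 0x40 sits in the top 8 bits.
@assert reinterpret(UInt32, c) == UInt32(0o100) << 24
@assert reinterpret(Char, UInt32(0o100) << 24) == '@'
```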

@test_parseerror "'\\U00002014a'" "character literal contains multiple characters"
@test_parseerror "'\\1000'" "character literal contains multiple characters"
@test Meta.isexpr(Meta.parse("'a"), :incomplete)
@test ''' == "'"[1]
Sponsor Member

Huh, I never realized that would work, but it already does so carry on...

@StefanKarpinski
Sponsor Member

I think this implements exactly what I had in mind. My only quibble is with the commit message, which says:

This disallows character literals that can not be created from iterating a UTF-8 string.

That isn't strictly true, since it allows invalid UTF-8, so I would perhaps describe it this way:

This makes the syntax for character literals the same as what is allowed when writing a single-character String literal, except, of course, with a different quote character.
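
One way to picture that description is a round-trip check: parse the same body once between single quotes and once between double quotes, then compare. The helper `same_as_string` below is hypothetical (it is not part of Base or of this PR), and it assumes the body contains no `"`, `'`, or `$`.

```julia
# Hypothetical helper: does 'body' parse to the same character that the
# single-character string literal "body" contains?
function same_as_string(body::AbstractString)
    ch  = Meta.parse("'" * body * "'")    # a Char when the literal is accepted
    str = Meta.parse("\"" * body * "\"")  # a String for plain literals
    return ch isa Char && str isa String && length(str) == 1 && ch == only(str)
end

# These bodies are valid in both forms regardless of this PR.
@assert same_as_string("a")
@assert same_as_string("\\u2014")
@assert same_as_string("\\100")

# On a build that includes this PR, a body such as "\\xc0\\x80" should also
# pass; on older builds, Meta.parse is expected to reject the char literal.
```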

simeonschaub added a commit that referenced this pull request Apr 25, 2022
This is pulled out from #44989. Making `test_parseerror` a macro leads
to much more helpful stacktraces if something fails.
src/ast.c (outdated review thread, resolved)
@JeffBezanson
Sponsor Member

Tidied up and added news, should be ready to merge.

Make the syntax for character literals the same as what is allowed in
single-character string literals.

Alternative to #44765

fixes #25072
@PallHaraldsson
Contributor

I see this changed the flisp parser, so I guess JuliaSyntax.jl needs to be updated too; I assume that was forgotten.

I'm also wondering: invalid UTF-8 means overlong, and it can be overlong to an arbitrary degree (in Strings), while Char is fixed-size. Was that (unsolvable?) problem thought of?
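
As a rough illustration of the size question, the sketch below looks at how String iteration in Base splits such byte sequences, as far as I understand it. It is an observation about iteration, not a statement of what the PR authors intended, and the variable names are illustrative.

```julia
# "\xc0\x80" is an overlong 2-byte encoding of U+0000.
two = "\xc0\x80"
@assert length(two) == 1                      # iterates as one character
c = only(two)
@assert reinterpret(UInt32, c) == 0xc0800000  # both bytes fit in the Char
@assert Base.isoverlong(c)
@assert !isvalid(c)

# A would-be 5-byte overlong form: 0xf8 is not treated as a lead byte, so
# iteration yields five separate single-byte characters.
five = "\xf8\x80\x80\x80\x80"
@assert ncodeunits(five) == 5
@assert length(five) == 5
@assert all(ch -> ncodeunits(ch) == 1, five)
```

Under that reading, a sequence longer than four bytes never comes out of a String as a single `Char`, so a character literal limited to what a single-character string can hold stays within the fixed size.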

Labels
domain:unicode Related to unicode characters and encodings parser Language parsing and surface syntax
Linked issue (may be closed by merging this pull request):

'\xc0\x80' should either error or make an overlong Char
5 participants