support malformed char literals, the second #44989

simeonschaub · 2022-04-15T12:57:09Z

Alternative to #44765. This disallows character literals that cannot be created from iterating a UTF-8 string.

StefanKarpinski · 2022-04-19T19:15:56Z

test/syntax.jl

+ @test_parseerror "'\\xff\\xff\\xff\\xff'" "character literal contains multiple characters" # == reinterpret(Char, 0xffffffff)
+ @test '\uffff' == Char(0xffff)
+ @test '\U00002014' == Char(0x2014)
+ @test '\100' == reinterpret(Char, UInt32(0o100) << 24)


This is just `'@'`, right?

Yeah, perhaps this makes it more obvious:

Suggested change

- @test '\100' == reinterpret(Char, UInt32(0o100) << 24)
+ @test '\100' == reinterpret(Char, UInt32(0o100) << 24) == '@'
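Why the shifted value equals `'@'`: a `Char` is (as I understand Julia's layout) a 32-bit word holding its UTF-8 code units left-aligned in the high bytes, so the single code unit 0o100 (0x40) is stored as 0x40000000. A minimal sketch assuming that layout; `pack_codeunits` is a hypothetical helper for illustration, not a Base function:

```julia
# Sketch, assuming Julia's Char layout: UTF-8 code units left-aligned in a
# 32-bit word. `pack_codeunits` is hypothetical, not part of Base.
function pack_codeunits(bytes::AbstractVector{UInt8})
    1 <= length(bytes) <= 4 || throw(ArgumentError("a Char holds 1 to 4 code units"))
    u = UInt32(0)
    for (i, b) in enumerate(bytes)
        # first code unit goes into bits 31:24, the second into 23:16, and so on
        u |= UInt32(b) << (32 - 8i)
    end
    return reinterpret(Char, u)
end

pack_codeunits([0x40])                    # '@', stored as UInt32(0o100) << 24 == 0x40000000
pack_codeunits(codeunits("\u2014"))       # Char(0x2014), stored as 0xe2809400
pack_codeunits([0xff, 0xff, 0xff, 0xff])  # == reinterpret(Char, 0xffffffff)
```

The last line shows that `reinterpret(Char, 0xffffffff)` is a representable `Char` value even though the first test in `test/syntax.jl` rejects `'\xff\xff\xff\xff'` as a literal containing multiple characters.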

StefanKarpinski · 2022-04-19T19:19:34Z

test/syntax.jl

+ @test_parseerror "'\\U00002014a'" "character literal contains multiple characters"
+ @test_parseerror "'\\1000'" "character literal contains multiple characters"
+ @test Meta.isexpr(Meta.parse("'a"), :incomplete)
+ @test ''' == "'"[1]


Huh, I never realized that would work, but it already does, so carry on...

StefanKarpinski · 2022-04-19T19:23:20Z

I think this implements exactly what I had in mind. My only quibble is with the commit message, which says:

This disallows character literals that cannot be created from iterating a UTF-8 string.

That isn't strictly true, since it allows invalid UTF-8, so I would perhaps describe it this way:

This makes the syntax for character literals the same as what is allowed when writing a single-character String literal, except, of course, with a different quote character.
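The rule in that wording can be modeled as: decode the escapes to code units, read them as a String, and accept the literal only if iterating that String yields exactly one `Char`. A minimal sketch under that reading, reusing Julia's own String iteration; `single_char_literal` is a hypothetical helper and not how the parser implements the check:

```julia
# Sketch: a char literal is accepted iff its escape-decoded code units,
# read as a String, iterate to exactly one Char.
# `single_char_literal` is hypothetical, not the parser's code.
function single_char_literal(bytes::Vector{UInt8})
    s = String(copy(bytes))         # String(::Vector{UInt8}) may take ownership, so pass a copy
    isempty(s) && return nothing    # '' is not a character literal
    c, next = iterate(s)
    # code units left over after the first Char => multiple characters
    return next > ncodeunits(s) ? c : nothing
end

single_char_literal([0x40])                    # '@', like '\100'
single_char_literal([0xc0, 0xa0])              # malformed, but still a single Char
single_char_literal([0xff, 0xff, 0xff, 0xff])  # nothing: iterates as four Chars
single_char_literal([0x40, 0x30])              # nothing: like '\1000', i.e. '\100' then '0'
```

Under this model malformed sequences such as `\xc0\xa0` are accepted because a String iterates them as one `Char`, which matches the point that the PR does allow invalid UTF-8.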

src/ast.c

JeffBezanson · 2022-05-19T19:20:39Z

Tidied up and added news, should be ready to merge.

PallHaraldsson · 2023-04-14T08:27:08Z

I see this changed the flisp parser, so I guess JuliaSyntax.jl needs to be updated too; I assume that was forgotten.

I'm also wondering: invalid UTF-8 can be overlong, and in Strings it can be overlong to an arbitrary degree, while a Char has a fixed size. Was that (possibly unsolvable?) problem considered?

simeonschaub added the parser (Language parsing and surface syntax) and domain:unicode (Related to unicode characters and encodings) labels Apr 15, 2022

simeonschaub requested review from JeffBezanson and StefanKarpinski April 15, 2022 12:57

simeonschaub changed the title from "support malformed characters, the second" to "support malformed char literals, the second" Apr 15, 2022

StefanKarpinski reviewed Apr 19, 2022

simeonschaub mentioned this pull request Apr 25, 2022

refactor syntax tests slightly #45081

Closed

simeonschaub added a commit that referenced this pull request Apr 25, 2022

refactor syntax tests slightly

0db5f48

This is pulled out from #44989. Making `test_parseerror` a macro leads to much more helpful stacktraces if something fails.

JeffBezanson reviewed Apr 26, 2022

src/ast.c (outdated)

JeffBezanson mentioned this pull request Apr 26, 2022

properly support malformed char literals #44765

Closed

JeffBezanson force-pushed the sds/char_literals2 branch from 71683df to 16e37c2 May 19, 2022 19:20

JeffBezanson approved these changes May 19, 2022

JeffBezanson closed this May 24, 2022

JeffBezanson reopened this May 24, 2022

support malformed chars in char literal syntax

a1ce793

Make the syntax for character literals the same as what is allowed in single-character string literals. Alternative to #44765. Fixes #25072.

JeffBezanson force-pushed the sds/char_literals2 branch from 16e37c2 to a1ce793 May 24, 2022 19:37

KristofferC merged commit 991190f into master May 25, 2022

KristofferC deleted the sds/char_literals2 branch May 25, 2022 14:19

stevengj mentioned this pull request Jan 15, 2023

incorrect search results for malformed Char #48283

Open
