Crystal doesn't UTF-8-Validate first byte of input #14579

Closed
BlobCodes opened this issue May 8, 2024 · 4 comments · Fixed by #14750
Labels
good first issue This is an issue suited for newcomers to become acquainted with working on the codebase. kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:compiler:parser

Comments

@BlobCodes
Contributor

BlobCodes commented May 8, 2024

Bug Report

The following code compiles fine, even though the macro generates invalid UTF-8:

{{ "\xFF = 2".id }}

The invalid byte only goes undetected when it is the first byte of the input. If any later byte is invalid UTF-8, an exception is raised:

{{ "\xFF\xFE = 2".id }}
# Unexpected byte 0xfe at position 1, malformed UTF-8 (InvalidByteSequenceError)
#   from /crystal/src/compiler/crystal/syntax/lexer.cr:2759:9 in '??'
#   from /crystal/src/compiler/crystal/syntax/lexer.cr:1057:11 in 'next_token'
#   from /crystal/src/enum.cr:361:3 in 'parse_macro_source'
#   from /crystal/src/compiler/crystal/semantic/semantic_visitor.cr:359:23 in 'expand_inline_macro'
#   from /crystal/src/compiler/crystal/semantic/semantic_visitor.cr:431:3 in 'accept'
#   from /crystal/src/enumerable.cr:510:7 in '??'
#   from /crystal/src/compiler/crystal/syntax/visitor.cr:27:12 in 'accept'
#   from /crystal/src/compiler/crystal/semantic.cr:70:7 in 'semantic:cleanup'
#   from /crystal/src/compiler/crystal/compiler.cr:201:14 in 'compile:combine_rpath'
#   from /crystal/src/compiler/crystal/compiler.cr:195:56 in 'compile:combine_rpath'
#   from /crystal/src/compiler/crystal/command/eval.cr:30:5 in 'eval'
#   from /crystal/src/compiler/crystal/command.cr:126:12 in 'run'
#   from /crystal/src/compiler/crystal.cr:11:1 in '__crystal_main'
#   from /crystal/src/crystal/main.cr:129:5 in 'main'
#   from src/env/__libc_start_main.c:95:2 in 'libc_start_main_stage2'
# Error: you've found a bug in the Crystal compiler. Please open an issue, including source code that will allow us to reproduce the bug: https://github.com/crystal-lang/crystal/issues

Oh, and this "you've found a bug in the Crystal compiler" message should probably also be fixed.


$ crystal -v

Crystal 1.12.1 [4cea10199] (2024-04-11)

LLVM: 15.0.7
Default target: x86_64-unknown-linux-gnu
@BlobCodes added the kind:bug label May 8, 2024
@straight-shoota
Member

> Oh, and this "you've found a bug in the Crystal compiler" message should probably also be fixed.

What fixing does it need?

@BlobCodes
Contributor Author

I just meant that macros generating invalid UTF-8 shouldn't result in a "compiler bug" message, because that is a user error.

@straight-shoota
Member

Note that the same problem appears when the first byte of the source file is invalid UTF-8.

$ echo '\xFF' | bin/crystal eval
Using compiled compiler at .build/crystal
Regex match error: UTF-8 error: illegal byte (0xfe or 0xff) (ArgumentError)
  from src/regex/pcre2.cr:275:9 in 'match_data'
  from src/regex/pcre2.cr:207:18 in 'match_impl'
  from src/regex.cr:672:12 in 'match_at_byte_index'
  from src/regex.cr:621:12 in 'match:options'
  from src/string.cr:3227:13 in '=~'
  from src/compiler/crystal/semantic/suggestions.cr:41:25 in 'lookup_similar_def'
  from src/compiler/crystal/semantic/suggestions.cr:73:7 in 'lookup_similar_def_name'
  from src/compiler/crystal/semantic/call_error.cr:594:5 in 'raise_undefined_method'
  from src/compiler/crystal/semantic/call_error.cr:98:7 in 'raise_matches_not_found'
  from src/compiler/crystal/semantic/call.cr:291:9 in 'lookup_matches_in_type'
  from src/compiler/crystal/semantic/call.cr:254:3 in 'lookup_matches_in_type:search_in_parents:with_autocast'
  from src/compiler/crystal/semantic/call.cr:210:5 in 'lookup_matches_in'
  from src/compiler/crystal/semantic/call.cr:209:3 in 'lookup_matches_in:with_autocast'
  from src/compiler/crystal/semantic/call.cr:197:7 in 'lookup_matches_without_splat'
  from src/compiler/crystal/semantic/call.cr:124:17 in 'lookup_matches:with_autocast'
  from src/compiler/crystal/semantic/call.cr:113:5 in 'lookup_matches'
  from src/compiler/crystal/semantic/call.cr:90:15 in 'recalculate'
  from src/compiler/crystal/semantic/main_visitor.cr:1380:7 in 'recalculate_call'
  from src/compiler/crystal/semantic/main_visitor.cr:1359:7 in 'visit'
  from src/compiler/crystal/syntax/visitor.cr:27:12 in 'accept'
  from src/compiler/crystal/semantic/main_visitor.cr:688:11 in 'visit'
  from src/compiler/crystal/syntax/visitor.cr:27:12 in 'accept'
  from src/compiler/crystal/semantic/main_visitor.cr:6:7 in 'visit_main:process_finished_hooks:cleanup:visitor'
  from src/compiler/crystal/progress_tracker.cr:22:7 in 'semantic:cleanup'
  from src/compiler/crystal/compiler.cr:219:14 in 'compile:combine_rpath'
  from src/compiler/crystal/compiler.cr:213:56 in 'compile:combine_rpath'
  from src/compiler/crystal/command/eval.cr:29:5 in 'eval'
  from src/compiler/crystal/command.cr:101:7 in 'run'
  from src/compiler/crystal/command.cr:55:5 in 'run'
  from src/compiler/crystal/command.cr:54:3 in 'run'
  from src/compiler/crystal.cr:11:1 in '__crystal_main'
  from src/crystal/main.cr:129:5 in 'main_user_code'
  from src/crystal/main.cr:115:7 in 'main'
  from src/crystal/main.cr:141:3 in 'main'
  from /lib/x86_64-linux-gnu/libc.so.6 in '??'
  from /lib/x86_64-linux-gnu/libc.so.6 in '__libc_start_main'
  from /home/johannes/src/crystal-lang/crystal/.build/crystal in '_start'
  from ???
Error: you've found a bug in the Crystal compiler. Please open an issue, including source code that will allow us to reproduce the bug: https://github.com/crystal-lang/crystal/issues

It's very peculiar that the compiler gets as far as a regex match while looking for similar names before it notices that something is wrong.

If an invalid encoding is in any later byte, the compiler errors gracefully:

$ echo '-\xFF' | crystal eval
Error: file 'eval' is not a valid Crystal source file: Unexpected byte 0xff at position 1, malformed UTF-8

@HertzDevil
Contributor

The expected error is only raised inside Crystal::Lexer#next_char_no_column_increment after a call to Char::Reader#next_char; the same check needs to be done in Crystal::Lexer#initialize as well.
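
To illustrate the point, here is a minimal sketch of the idea. It is not the compiler's actual code: TinyLexer and check_reader_error are hypothetical names made up for this example, and the error message simply copies the wording from the traces above. It relies on Char::Reader decoding the first character when it is constructed and exposing any invalid byte through Char::Reader#error, so the constructor can run the same check that already happens after each next_char call.

```crystal
# Hypothetical stand-in for Crystal::Lexer, only meant to show where the check goes.
class TinyLexer
  def initialize(source : String)
    # Char::Reader decodes the character at position 0 right away.
    @reader = Char::Reader.new(source)
    # Assumption: this is the check the constructor was missing.
    check_reader_error
  end

  def current_char : Char
    @reader.current_char
  end

  def next_char : Char
    char = @reader.next_char
    # The check that already existed for every byte after the first.
    check_reader_error
    char
  end

  # Raises if the reader flagged the current position as malformed UTF-8.
  private def check_reader_error
    if error = @reader.error
      raise InvalidByteSequenceError.new(
        "Unexpected byte 0x#{error.to_s(16)} at position #{@reader.pos}, malformed UTF-8"
      )
    end
  end
end

# Usage: with the check in the constructor, an invalid first byte is
# rejected up front instead of reaching later compiler stages.
begin
  TinyLexer.new("\xFF = 2")
rescue ex : InvalidByteSequenceError
  puts ex.message
end
```

With the guard in place, the caller can rescue InvalidByteSequenceError for byte 0 the same way it already does for later bytes, which would also let it report a regular error rather than the "compiler bug" message.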

@straight-shoota added the good first issue label Jun 4, 2024