Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

printer, scanner: Don't produce unparsable output. #241

Merged
merged 3 commits into from
Mar 20, 2018

Conversation

octo
Copy link
Contributor

@octo octo commented Mar 20, 2018

This changes fix two cases in which printer.Format() produced output that was not accepted by parser.Parse():

  1. The input ends in \x00: the scanner accepted this, because nothing is following the null byte. The formatter, however, decides to add a newline at the end of the file, assuming that this would be a no-op. Upon re-parsing, the null byte is no longer the last character in the stream and produces an error.
  2. The input contains the U+E123 unicode codepoint: this codepoint is used internally by the formatter to do some postprocessing of the generated output. If the input contains that codepoint, the postprocessing removes semantically relevant information, for example the pound sign signifying the beginning of a comment. This leads to unparsable output.

I think that it's sane to refuse those inputs as "invalid" and I think it's very unlikely that this will break existing use-cases.

The formatter will append a newline at the end of file, causing the output
of printer.Format() to be invalid.
This (invalid) Unicode codepoint is used by the printer package to fix up
the indentation of generated files. If this codepoint is present in the
input, the package gets confused and removes more than it should,
producing unparsable output.
@mitchellh
Copy link
Contributor

Thanks, this looks good. Note on point #2 that technically we probably should disallow any of the private use code points. I chose U+E123 because its slightly hinted at being human-made (1-2-3), but there are ~18,000 private use code points (HCL only uses one).

Thanks!

@mitchellh mitchellh merged commit adef769 into hashicorp:master Mar 20, 2018
@octo octo deleted the scanner-null branch March 22, 2018 13:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants