Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Unicode strings (--unicode-char) #3016

Merged
merged 118 commits into from
Nov 24, 2022

Conversation

robin-aws
Copy link
Member

@robin-aws robin-aws commented Nov 8, 2022

Implementation of the design from dafny-lang/rfcs#13.

Resolves #413. Fixes #2928. Fixes #818. Fixes #3058. Fixes #1293. Fixes #3001.

Depends on #2976 to be fixed for the tests to pass consistently across platforms.

I've documented some of the less obvious compilation strategy decisions in docs/Compilation/StringsAndChars.md.

Ready for review, but will still add the fix to #2928 in the meantime (since it's easy to add as another test case in this set and most of the work is already done).

By submitting this pull request, I confirm that my contribution is made under the terms of the MIT license.

I guess it helps to have no actual character type. Or type checking for that matter. :)
Can’t just use string directly as a sequence without extra customization when —unicode-char
@robin-aws
Copy link
Member Author

robin-aws commented Nov 22, 2022

I've read through the compilers, and the approach looks reasonable to me. My only concern is whether we need a (reference) type to hold codepoints rather than just using a native int. IOW, why can't we define char to be a newtype with 16 bits in the non-unicode mode and 32 bits in the unicode type? Is it to make it easier to track which sequence is a string and which is not?

Just to close the loop, this is justified in "Printing Strings and Characters" in docs/Compilation/StringsAndChars.md (which it turns out Clement had accidentally missed :) ), with more background in #2582.

Source/DafnyCore/Compilers/Compiler-java.cs Outdated Show resolved Hide resolved
Source/DafnyCore/Compilers/SinglePassCompiler.cs Outdated Show resolved Hide resolved
Source/DafnyCore/Dafny.atg Show resolved Hide resolved
Test/comp/UnicodeStrings.dfy Show resolved Hide resolved
Test/unicodechars/dafny0/Char.dfy Show resolved Hide resolved
docs/DafnyRef/Grammar.md Show resolved Hide resolved
docs/DafnyRef/Types.md Show resolved Hide resolved
docs/Compilation/StringsAndChars.md Outdated Show resolved Hide resolved
| Java | `char` | `int` / `dafny.CodePoint` |
| JavaScript | `string` of length 1 | `_dafny.CodePoint` (`number` wrapper) |
| Python | `str` of length 1 | `_dafny.CodePoint` (`str` of length 1 wrapper) |
| Go | `_dafny.Char` (`rune` wrapper) | `_dafny.CodePoint` (`rune` wrapper) |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is a wrapper needed around Go's rune?

docs/Compilation/StringsAndChars.md Outdated Show resolved Hide resolved
Source/DafnyRuntime/DafnyRuntime.cs Outdated Show resolved Hide resolved
Source/DafnyRuntime/DafnyRuntime.cs Show resolved Hide resolved
Source/DafnyRuntime/DafnyRuntime.py Show resolved Hide resolved
Copy link
Collaborator

@RustanLeino RustanLeino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job! This is a good feature and a clear stand on the issue of what Dafny strings are.

@robin-aws robin-aws merged commit f65c111 into dafny-lang:master Nov 24, 2022
@keyboardDrummer
Copy link
Member

Awesome to see this process of careful design followed by implementation!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment