Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type test for char #3814

Open
RustanLeino opened this issue Mar 28, 2023 · 2 comments
Open

Type test for char #3814

RustanLeino opened this issue Mar 28, 2023 · 2 comments
Labels
kind: enhancement Enhancements, feature requests, etc. These are NOT bugs, but ways we can improve Dafny

Comments

@RustanLeino
Copy link
Collaborator

Summary

For an integer u, the expression u is char is currently not supported.

Many variations of of types are supported for is, but testing whether an int is a char is currently missing.

Background and Motivation

Stating u is char as an integer range is awkward and error prone.

Also, for code that is migrating from before-Unicode support to full Unicode support, it would be nicer to have an expression that takes on different meanings depending on what Unicode mode Dafny is invoked in.

Proposed Feature

With --unicode-char:false, u is char (where u has type int`) should be equivalent to

0 <= u < 0x1_0000

With --unicode-char:true, u is char (where u has type int`) should be equivalent to

0 <= u < 0xD800 || 0xE000 <= u < 0x10_0000

Alternatives

No response

@RustanLeino RustanLeino added the kind: enhancement Enhancements, feature requests, etc. These are NOT bugs, but ways we can improve Dafny label Mar 28, 2023
@robin-aws
Copy link
Member

I'm not sure I love this idea, as much as I agree with the motivation. Currently is will allow the RHS to be a subset type but not a newtype, so something similar like x is uint8 is rejected (type test for type 'uint8' must be from an expression assignable to it (got 'int')).

I usually think of is as a test of "is this value already in fact of this type", which lines up well with the semantics of subset types w.r.t. their base types, and classes w.r.t. their traits, for example. char behaves like a newtype rather than a subset type, so I feel like using is to mean "is convertable to" would be inconsistent.

@mattmccutchen-amazon
Copy link

With --unicode-char:true, u is char (where u has type int`) should be equivalent to

0 <= u < 0xD800 || 0xE000 <= u < 0x10_0000

This should be:

0 <= u < 0xD800 || 0xE000 <= u <= 0x10_FFFF

Valid Unicode scalar values go up to 0x10FFFF. It looks like Dafny implements this correctly and only the comment above was wrong. Of course, the <= 0x10_FFFF can be replaced with < 0x11_0000 at the taste of the user. I personally find it easier to remember that 0x10FFFF is the largest valid value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: enhancement Enhancements, feature requests, etc. These are NOT bugs, but ways we can improve Dafny
Projects
None yet
Development

No branches or pull requests

3 participants