Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Output encoding issues on Windows #2976

Closed
cpitclaudel opened this issue Nov 3, 2022 · 3 comments
Closed

Output encoding issues on Windows #2976

cpitclaudel opened this issue Nov 3, 2022 · 3 comments
Labels
has-workaround: yes There is a known workaround kind: bug Crashes, unsoundness, incorrect output, etc. If possible, add a `part:` label lang: java Dafny's Java transpiler and its runtime part: CLI interacting with Dafny on the command line

Comments

@cpitclaudel
Copy link
Member

cpitclaudel commented Nov 3, 2022

On Windows, the following program prints (with dafny run) different outputs depending on the compiler that it gets executed on:

method Main() {
  print [8364 as char];
}

Specifically:

On Windows in cmd.exe:

C:\>FOR %t in (cs py java js) DO Dafny run --no-verify --target=%t encodings.dfy

C>Dafny run --no-verify --target=cs encodings.dfy

Dafny program verifier did not attempt verification
?
C:\>Dafny run --no-verify --target=py encodings.dfy

Dafny program verifier did not attempt verification

C:\>Dafny run --no-verify --target=java encodings.dfy

Dafny program verifier did not attempt verification
Ç

C:\>Dafny run --no-verify --target=js encodings.dfy

Dafny program verifier did not attempt verification
Γé¼

Arguably, this is the user's fault: the above happens because the default cmd.exe codepage doesn't even have the € character. So, let's fix that:

On windows with chcp 65001 (the utf-8 codepage):

C:\>FOR %t in (cs py java js) DO Dafny run --no-verify --target=%t encodings.dfy

C:\>Dafny run --no-verify --target=cs encodings.dfy

Dafny program verifier did not attempt verification
€
C:\>Dafny run --no-verify --target=py encodings.dfy

Dafny program verifier did not attempt verification

C:\>Dafny run --no-verify --target=java encodings.dfy

Dafny program verifier did not attempt verification
�

C:\>Dafny run --no-verify --target=js encodings.dfy

Dafny program verifier did not attempt verification
€

C# and JS are good. Python and Java are not.

Python can be fixed with PYTHONIOENCODING=utf8.
The only way to fix java is to use JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF-8, but unfortunately that also pollutes the output:

C:\>Dafny run --no-verify --target=java encodings.dfy

Dafny program verifier did not attempt verification
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
€
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8

On Ubuntu/WSL:

$ for t in cs py java js; do Dafny run --target=$t encodings.dfy; done

Dafny program verifier finished with 1 verified, 0 errors

Dafny program verifier finished with 1 verified, 0 errors

Dafny program verifier finished with 1 verified, 0 errors

Dafny program verifier finished with 1 verified, 0 errors

Much better (in particular, java respects LANG, and that defaults to <something>.UTF-8)

macOS

(from @robin-aws)

% dafny run --no-verify --target=cs encodings.dfy 

Dafny program verifier did not attempt verification
€% 
% dafny run --no-verify --target=py encodings.dfy

Dafny program verifier did not attempt verification
€%
% dafny run --no-verify --target=java encodings.dfy

Dafny program verifier did not attempt verification

% dafny run --no-verify --target=js encodings.dfy

Dafny program verifier did not attempt verification
€%             

Note the % symbols show up as highlighted in my console. I haven't yet inspected the raw bytes of the output to figure those out. If I change the program to print [8364 as char], "\n"; they go away.

The problem

When using dafny build users can configure the environment and invoke the runtime however they want (e.g. pass -Dfile.encoding=UTF-8 to java on the command line, which doesn't produce "picked up …" warnings).

For dafny run, things are not so simple. For some of our languages there are ways to configure things (chcp for C# and node, PYTHONIOENCODING for Python), but for Java there is only JAVA_TOOL_OPTIONS, which pollutes the output of every Java program.

Should we change dafny run to pass -Dfile.encoding=UTF-8 to java directly?

PS: Here's a useful small program to test things out in Java; use javac Encoding.java and java Encoding to run it:

public class Encoding {
    public static void main(String[] args) {
        System.out.println(System.getProperty("file.encoding"));
        System.out.println((char)8364L);
    }
}
@cpitclaudel cpitclaudel added kind: bug Crashes, unsoundness, incorrect output, etc. If possible, add a `part:` label lang: java Dafny's Java transpiler and its runtime has-workaround: yes There is a known workaround part: CLI interacting with Dafny on the command line labels Nov 3, 2022
@cpitclaudel
Copy link
Member Author

Note that for Python, the better environment variable to use would be PYTHONLEGACYWINDOWSSTDIO, as documented here: https://docs.python.org/3/using/cmdline.html#envvar-PYTHONLEGACYWINDOWSSTDIO

If we later decide to change all of our target languages to produce UTF-8 always, then we would move to PYTHONIOENCODING.

@cpitclaudel
Copy link
Member Author

Note that for Python, the better environment variable to use would be PYTHONLEGACYWINDOWSSTDIO, as documented here: https://docs.python.org/3/using/cmdline.html#envvar-PYTHONLEGACYWINDOWSSTDIO

Unfortunately, that Python documentation is correct for Python 3.7, but wrong for Python 3.8+, as documented in python/cpython#86427 , so the situation is even more complicated for Python than it is for Java. Here is the issue:

# As expected: default is utf-8
C:\Users\cpitcla>python3 -c "import sys; print(sys.stdout.encoding); print('€')"
utf-8
€

# … even though the codepage is 437: that's because Python ignores it by default
C:\Users\cpitcla>chcp
Active code page: 437

# Now we set PYTHONLEGACYWINDOWSSTDIO to have Python obey chcp:
C:\Users\cpitcla>set PYTHONLEGACYWINDOWSSTDIO=1

C:\Users\cpitcla>chcp 65001
Active code page: 65001

# But it ignores it and instead uses cp1252, causing the console to get confused
C:\Users\cpitcla>python3 -c "import sys; print(sys.stdout.encoding); print('€')"
cp1252
�

# And indeed if we switch to cp1252 we then get the right display, but Python should have respected 65001, above!
C:\Users\cpitcla>chcp 1252
Active code page: 1252

C:\Users\cpitcla>python3 -c "import sys; print(sys.stdout.encoding); print('€')"
cp1252
€

cpitclaudel added a commit that referenced this issue Nov 8, 2022
This is intended to make it possible to test Unicode output. I skipped
Java because Java needs a separate patch (since it doesn't respect the
current encoding / codepage, see
#2976)
@cpitclaudel
Copy link
Member Author

Fixed in 2976.

robin-aws added a commit that referenced this issue Nov 24, 2022
Implementation of the design from
dafny-lang/rfcs#13.

Resolves #413. Fixes #2928. Fixes #818. Fixes #3058. Fixes #1293. Fixes
#3001.

Depends on #2976 to be fixed for the tests to pass consistently across
platforms.

I've documented some of the less obvious compilation strategy decisions
in `docs/Compilation/StringsAndChars.md`.

<small>By submitting this pull request, I confirm that my contribution
is made under the terms of the [MIT
license](https://github.com/dafny-lang/dafny/blob/master/LICENSE.txt).</small>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
has-workaround: yes There is a known workaround kind: bug Crashes, unsoundness, incorrect output, etc. If possible, add a `part:` label lang: java Dafny's Java transpiler and its runtime part: CLI interacting with Dafny on the command line
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant