DafnyRef character encoding issues #121

Closed
wilcoxjay opened this issue Jun 21, 2017 · 3 comments · Fixed by #122

@wilcoxjay
Collaborator

The copy of DafnyRef checked into the repository has some strange issues that appear to be related to character encoding.

For example, on page 7 of the PDF, in the third paragraph of the introduction, there are several uses of the characters "â€" that are out of place. These also appear in the source Madoko file.

It is possible that this is an issue with my machines, but I have tried viewing the PDF on several machines and operating systems, all of which display other Unicode characters correctly. So my current hypothesis is that the problem is in the file itself rather than on my end.

These misencoded characters go all the way back to commit af22a120, which added the current version of the reference manual.

Using this handy table, one can deduce the correct characters from their misencoded counterparts. I have started to go through the reference manual and replace them as I find them, but wanted to check first:

Is this a known issue? Is it all in my head?
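To make the deduction concrete, here is a minimal sketch of how one character ends up misencoded and how the mapping can be reversed. It assumes the damage came from UTF-8 bytes being read as windows-1252, which is a guess based on the "â€" pattern and not something confirmed for this file; `garble` and `ungarble` are hypothetical helper names.

```python
# Illustration only: how one UTF-8 character becomes "â€"-style mojibake.
# Assumption: the file's UTF-8 bytes were at some point read as windows-1252.

def garble(text: str) -> str:
    """Misread UTF-8 bytes as windows-1252 (the suspected original mistake)."""
    return text.encode("utf-8").decode("windows-1252")

def ungarble(text: str) -> str:
    """Reverse the mistake: recover the original bytes, then decode as UTF-8."""
    return text.encode("windows-1252").decode("utf-8")

original = "\u2019"          # RIGHT SINGLE QUOTATION MARK, bytes E2 80 99 in UTF-8
garbled = garble(original)   # three characters, one per original byte

print(repr(garbled))
print(ungarble(garbled) == original)
```

Each byte of the original character turns into its own windows-1252 character, which is why a single punctuation mark shows up as a short run starting with "â€".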

@cpitclaudel
Member

Certainly not in your head: I see the same issue.

Looks like the following Python 3 program fixes it?

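import codecs

def pass_through(error):
    # windows-1252 leaves a few code points (e.g. U+0081, U+009D) unmapped.
    # Emit them as raw bytes of the same value so the UTF-8 decode can use them.
    faulty_chars = error.object[error.start:error.end]
    # The encoder may report a run of several characters, so avoid ord().
    return (faulty_chars.encode("latin-1"), error.end)

codecs.register_error("pass_through", pass_through)

# Undo the double encoding: recover the original bytes, then decode as UTF-8.
with open("DafnyRef.mdk", mode="r", encoding="utf-8") as infile:
    with open("DafnyRef.fixed.mdk", mode="w", encoding="utf-8") as outfile:
        outfile.write(infile.read().encode("windows-1252", errors="pass_through").decode("utf-8"))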
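For anyone wondering why the custom error handler is needed at all, the following sketch runs the same idea on an in-memory string instead of the file. It assumes the garbled text contains the control character U+009D standing in for the raw byte 0x9D, which windows-1252 leaves undefined; the handler name `pass_through_demo` and the sample string are made up for illustration.

```python
import codecs

def pass_through_demo(error):
    # Assumption: every character the encoder rejects stands in for a raw
    # byte of the same value (code point <= U+00FF).
    chars = error.object[error.start:error.end]
    return (chars.encode("latin-1"), error.end)

codecs.register_error("pass_through_demo", pass_through_demo)

# U+201D (right double quotation mark) is E2 80 9D in UTF-8. Byte 0x9D has no
# windows-1252 mapping, so here it appears as the control character U+009D.
garbled = "\u00e2\u20ac\u009d"

try:
    garbled.encode("windows-1252")
    strict_ok = True
except UnicodeEncodeError:
    # The default "strict" handler cannot encode U+009D.
    strict_ok = False

fixed = garbled.encode("windows-1252", errors="pass_through_demo").decode("utf-8")
print(strict_ok, repr(fixed))
```

Without the handler, the encode step stops at the first unmapped character; with it, the byte is carried through unchanged and the UTF-8 decode can reassemble the original character.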

@wilcoxjay
Collaborator Author

The output looks good to me! Do you want to submit a PR?

@cpitclaudel
Member

Happy to leave that to you, if you want :) I really don't have much time atm :/
