Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiline string normalization on windows is inconsistent #11988

Closed
c42f opened this issue Jul 2, 2015 · 18 comments · Fixed by #14073
Closed

Multiline string normalization on windows is inconsistent #11988

c42f opened this issue Jul 2, 2015 · 18 comments · Fixed by #14073
Labels
domain:strings "Strings!" system:windows Affects only Windows

Comments

@c42f
Copy link
Member

c42f commented Jul 2, 2015

The following code results in inconsistent line endings and first line normalization on windows:

open("test.txt", "w") do f
    # Write a header
    write(f, """
    This is the first few lines of a text file
    created using a multiline string.  A header, say.
    """
    )
    # Now write some data
    for i=1:5
        write(f, "line $i\n")
    end
end

Result (explicitly writing all nonprintable characters):

\r\n
This is the first few lines of a text file\r\n
created using a multiline string.  A header, say.\r\n
line 1\n
line 2\n
line 3\n
line 4\n
line 5\n

The fact that the first blank line isn't removed as it would be on linux is inconsistent. The underlying issue seems to be that a multiline string gets whatever line endings happen to exist in the source file. Unfortunately this makes multiline strings a bit useless for writing consistent text files on windows, and can lead to some of the more pedantic file parsers rejecting files outright (As recently encountered in the wild by @visr - see discussion on c42f/displaz#40).

Of course, if the julia code above is distributed to windows users with unix line endings the problem won't occur, but windows programs have an annoying habit of "helpfully" converting things (eg, see git.autocrlf). Given this, I think the only sensible option is to normalize multiline strings to the One True Line Ending of "\n" during parsing.

See also #3946, which was the only other place I found this mentioned.

@tkelman
Copy link
Contributor

tkelman commented Jul 2, 2015

This first line bug should probably be fixed. Otherwise, use core.autocrlf input and don't use notepad. There is some honesty in using the line ending that's actually in the file, but it does lead to some inconsistencies as you saw.

@tkelman tkelman added the system:windows Affects only Windows label Jul 2, 2015
@StefanKarpinski
Copy link
Sponsor Member

Should the Julia parser normalize source code so that line endings in string literals are always \n? Otherwise having people working on code bases on different platforms seems like a nightmare.

@ScottPJones
Copy link
Contributor

I would think so. If you need \r\n, that should be done in your output code (which really isn't just OS dependent, it is more device dependent, and even for Windows, many times you just want to write things out in a Unix/Linux compatible way, or you are writing to a file on network file system that lives on a Unix box)

@c42f
Copy link
Member Author

c42f commented Jul 2, 2015

Yes I prefer that julia normalize multiline string literals so that lines always end in \n, regardless of which platform you're on. Otherwise the kind of code snippet I have above acts inconsistently when an unwary user (probably not the original author) edits a file with whatever randomly broken windows editor.

@ScottPJones
Copy link
Contributor

This is not just """, but also " (because those can be multiline also, """ is just about dedenting).
It will have to be fixed in the Scheme parser.

@c42f
Copy link
Member Author

c42f commented Jul 2, 2015

True.

@visr
Copy link
Sponsor Contributor

visr commented Jul 2, 2015

Agree on the normalization. Just wanted to clarify that it's not just a problem because of broken editors on Windows. In this case I just copied the code from the browser to quickly try it out, so there was no core.autocrlf involved. Copying code with string literals with newlines is probably quite common.

@StefanKarpinski
Copy link
Sponsor Member

Git's core.autocrlf feature is good and should be used, but Julia can't depend on a git feature for this.

@tkelman
Copy link
Contributor

tkelman commented Jul 2, 2015

ref

julia/Makefile

Lines 413 to 414 in b6ae1df

echo "[core] eol = lf" >> "$(DESTDIR)$(prefix)/Git/etc/gitconfig" && \
sed -i "s/\bautocrlf = true$$/autocrlf = input/" "$(DESTDIR)$(prefix)/Git/etc/gitconfig" && \
- we actually modify our embedded copy of git to always use unix newlines for Pkg

nolta added a commit that referenced this issue Jul 2, 2015
Strip a leading "\r\n" or "\n" from triple-quoted strings.
@nolta
Copy link
Member

nolta commented Jul 2, 2015

I've pushed a fix for the leading newline issue.

There is some honesty in using the line ending that's actually in the file

Agreed.

@ScottPJones
Copy link
Contributor

However, this is program source, not an arbitrary text file.
Program source might not even be stored as a text file, but rather as a vector of variable-length strings (this is what VMS does, for example). What is important for program source is that it is a logical "newline", if you want to specifically have \r and/or \n characters, use those explicitly.

@stevengj
Copy link
Member

stevengj commented Jul 2, 2015

Python seems to normalize all newlines to \n in multiline literal strings:

julia> using PyCall

julia> pyeval("'''foo\nbar'''")
"foo\nbar"

julia> pyeval("'''foo\rbar'''")
"foo\nbar"

julia> pyeval("'''foo\r\nbar'''")
"foo\nbar"

I agree that it seems sensible for us to do the same.

@StefanKarpinski
Copy link
Sponsor Member

Bump. This came up on julia-users: https://groups.google.com/forum/#!topic/julia-users/N3dgCaT1WYo.

@stevengj
Copy link
Member

I should have a PR shortly.

@stevengj
Copy link
Member

For completeness, do we also want to interpret all of the Unicode newlines as \n?

I've never seen any of the others "in the wild", so I'm not sure what to do about them. (Python doesn't touch anything but CR and CRLF as far as I can tell; the others are left unchanged.)

@StefanKarpinski
Copy link
Sponsor Member

I would just make the occurrence of any of the other newlines in Julia source a syntax error. If you want one, you can always use an escape sequence, and anything else just seems to ask for confusion. I would require \n endings if \r\n weren't so common on Windows systems.

@stevengj
Copy link
Member

In addition, it might be nice to throw an exception for any iscntrl character in Julia source, too, other than a small whitelist (\t, \n, \r). Or only in string literals?

@StefanKarpinski
Copy link
Sponsor Member

Seems like a good idea.

stevengj added a commit that referenced this issue Nov 29, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:strings "Strings!" system:windows Affects only Windows
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants