display something useful for text/plain output of invalid String #18296

Merged
stevengj merged 1 commit into JuliaLang:master from invalid-string-show
Aug 31, 2016

Conversation

stevengj
Member

Rather than throwing an exception in the REPL when it tries to show a String containing invalid UTF-8 data, this displays something useful.

@stevengj
Member Author

Sample output:

julia> String(rand(UInt8, 10))
10-byte String of invalid UTF-8 data:
 0x17
 0xc6
 0x08
 0xde
 0x53
 0x2e
 0xae
 0xc3
 0x1b
 0x49
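
For readers who want to see how a display like this could be produced, here is a minimal sketch. It is not the code from the merged commit: `show_plain` is a hypothetical helper name, and the sketch assumes a recent Julia 1.x, where `isvalid`, `ncodeunits` and `codeunits` are available in Base.

```julia
# Hypothetical helper, NOT the implementation from this pull request.
# Shows valid strings as usual, and falls back to a byte listing otherwise.
function show_plain(io::IO, s::String)
    if isvalid(s)
        # Valid UTF-8: use the ordinary quoted representation.
        show(io, s)
    else
        # Invalid UTF-8: print a header, then one byte per line in hex.
        print(io, ncodeunits(s), "-byte String of invalid UTF-8 data:")
        for b in codeunits(s)
            print(io, "\n 0x", string(b, base = 16, pad = 2))
        end
    end
end

# 0xc6 starts a two-byte sequence, but 0x08 is not a continuation byte,
# so this string is not valid UTF-8.
show_plain(stdout, String(UInt8[0x17, 0xc6, 0x08]))
```

The sketch branches on validity once and never tries to decode the invalid data as characters, which matches the intent described above of treating such a string as raw bytes.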

@nalimilan
Member

Good idea. Though it would be even more useful if you could also print the beginning of the string up to the first invalid character. That would make it easier to spot where the problem is (especially in long strings).
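
As an illustration of the suggestion above, the following sketch locates the first invalid character and extracts the valid text before it. The names `first_invalid_index` and `valid_prefix` are hypothetical and not part of Base or of this pull request; the sketch assumes a recent Julia 1.x, where iterating a `String` yields malformed `Char` values for invalid data and `isvalid(::Char)` reports them.

```julia
# Hypothetical helpers for illustration only.

# Return the byte index of the first invalid character, or `nothing`
# if the whole string is valid UTF-8.
function first_invalid_index(s::String)
    for (i, c) in pairs(s)
        isvalid(c) || return i
    end
    return nothing
end

# Return the part of the string that precedes the first invalid character.
function valid_prefix(s::String)
    i = first_invalid_index(s)
    return i === nothing ? s : s[1:prevind(s, i)]
end

# "café" encoded as ISO-8859-1: the final byte 0xe9 is not valid UTF-8 here.
latin1 = String(UInt8[0x63, 0x61, 0x66, 0xe9])
println(first_invalid_index(latin1))
println(valid_prefix(latin1))
```

A display method could print such a prefix before listing the remaining bytes, although whether that extra output is worthwhile is exactly the question discussed in this thread.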

@stevengj
Member Author

@nalimilan, my thinking was that strings containing invalid UTF-8 are most likely because the user was stuffing arbitrary binary data into a String, in which case there is no point in printing any of it as a string.

@stevengj
Member Author

stevengj commented Aug 31, 2016

(We can always do this for now, and decide later if we need more verbose output to explain where the data stops being valid UTF-8 if that turns out to be useful in practice.)

@nalimilan
Member

> @nalimilan, my thinking was that strings containing invalid UTF-8 are most likely because the user was stuffing arbitrary binary data into a String, in which case there is no point in printing any of it as a string.

One can also frequently get partially invalid strings, e.g. when reading text in the wrong encoding (ISO-8859-* as UTF-8), or with corrupt filenames. In that case it's very useful to know where the invalid character is.

> (We can always do this for now, and decide later if we need more verbose output to explain where the data stops being valid UTF-8 if that turns out to be useful in practice.)

Sure.

@stevengj stevengj merged commit 2b2894c into JuliaLang:master Aug 31, 2016
@stevengj stevengj deleted the invalid-string-show branch August 31, 2016 15:39
@StefanKarpinski
Sponsor Member

This will get re-revised in the 0.6 release as part of #16107 but for now this is fine.

@tkelman tkelman added this to the 0.5.x milestone Sep 7, 2016
tkelman pushed a commit that referenced this pull request Feb 22, 2017
Labels
domain:strings "Strings!"