RFC: Deprecate Int-Char comparisons, e.g. 'x' == 120 #16024

StefanKarpinski · 2016-04-23T20:54:45Z

StefanKarpinski · 2016-04-23T21:02:57Z

What's going on with the CI here?

yuyichao · 2016-04-23T21:06:13Z

I don't think the PR CI will run with a merge conflict (since it runs on the merge commit).

nalimilan · 2016-04-23T21:50:47Z

base/markdown/parse/parse.jl

@@ -51,7 +51,9 @@ function parseinline(stream::IO, md::MD, config::Config)
 content = []
 buffer = IOBuffer()
 while !eof(stream)
- char = peek(stream)
+ # FIXME: this is broken if we're looking for non-ASCII


Wouldn't peekchar fix this problem?

We need peek to match the current read interface:

read(io, UInt8)
read(io, Char)
peek(io, UInt8)
peek(io, Char)

with matching defaults.

Unfortunately, peekchar only works for the IOStream type, so this fails.
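A character-level peek does not have to be tied to IOStream: any IO that supports `mark`/`reset` can peek by reading and then rewinding. The sketch below assumes Julia 1.x (not the 0.5 API discussed here), and `peek_char`/`peek_byte` are hypothetical helper names, not Base functions. It shows why `peek(io, UInt8)` and `peek(io, Char)` must be distinct for non-ASCII input.

```julia
# Hypothetical helper (not part of Base): return the next Char of any IO that
# supports mark/reset, without consuming it. Assumes Julia 1.x.
function peek_char(io::IO)
    mark(io)                   # remember the current position
    try
        return read(io, Char)  # decodes one UTF-8 character (1 to 4 bytes)
    finally
        reset(io)              # rewind to the marked position
    end
end

# Hypothetical byte-level counterpart, for comparison.
function peek_byte(io::IO)
    mark(io)
    try
        return read(io, UInt8)
    finally
        reset(io)
    end
end

io = IOBuffer("\u00e9!")  # "é!": 'é' is encoded as the two bytes 0xc3 0xa9
peek_byte(io)             # 0xc3, only the first byte of 'é'
peek_char(io)             # 'é', the decoded character
position(io)              # 0, nothing has been consumed
```

Because the Char version decodes a full character, a parser comparing its result against a non-ASCII literal can match, while the byte version can only ever match ASCII.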

stevengj · 2016-04-23T22:52:42Z

I have to say that I'm not convinced that this change is a good idea. Comparison of Char with integers is a pretty well-defined operation, thanks to Unicode, is familiar from many other languages, and is convenient (witness all of the annoying conversions required by this patch, when the original code was perfectly clear).

Tetralux · 2016-04-24T03:32:53Z

@stevengj Isn't c == 'x' clearer than c == 120?

stevengj · 2016-04-24T04:15:50Z

@H-225, that's not the problem. You can already write c == 'x', and any sane programmer would do so. But if you look at Stefan's patch, you'll see that there are lots of cases where you read in a single UInt8 byte b, and you want to check b == '\n' and similar. And b == '\n' is much clearer than b == 10 and more convenient than b == UInt8('\n'), which are what this patch requires.
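For reference, here is how the byte-comparison case looks once Char-integer comparisons are gone. This is a sketch assuming Julia 1.x, where `b == '\n'` for a byte `b` is simply `false`; `NEWLINE` and `count_newlines` are hypothetical names. Converting the Char once, outside the loop, keeps the comparison cheap and the intent visible.

```julia
# Assumes Julia 1.x: a UInt8 and a Char never compare equal, so the Char is
# converted to its byte value explicitly. NEWLINE is a hypothetical name.
const NEWLINE = UInt8('\n')   # 0x0a, converted once rather than per comparison

# Hypothetical helper: count newline bytes in a byte vector.
function count_newlines(bytes::AbstractVector{UInt8})
    n = 0
    for b in bytes
        n += (b == NEWLINE)   # the Bool result promotes to Int when added
    end
    return n
end

count_newlines(codeunits("one\ntwo\nthree"))   # 2
```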

Tetralux · 2016-04-24T11:58:36Z

@stevengj My bad; I totally should have noticed that.
Devil's advocate: maybe some kind of Char prefix is in order? b'x' -> UInt8('x') @StefanKarpinski @JeffBezanson
Or would that be considered unnecessary?

nalimilan · 2016-04-24T13:52:27Z

Adding a new prefix system just to avoid typing a few chars? That doesn't sound worth it.

stevengj · 2016-04-24T14:23:23Z

To put it another way: has this change (a) caught any errors or (b) made the Base code cleaner? Looks like "no" on both counts?

Tetralux · 2016-04-25T03:28:36Z

I think these are the important points here:

c = Char('x'); c == 120 is bad; I don't want to have to know that UInt8('x') == 120.
n = UInt8('x'); n == 'x' is good; the intent is clear. Plus, it covers the quote below.

> there are lots of cases where you read in a single UInt8 byte b, and you want to check b == '\n' and similar.

I therefore agree with @stevengj; we should not remove UInt8/Char comparisons.
However, I would be OK with removing Int/Char comparisons.

nalimilan · 2016-04-25T09:06:18Z

@H-225 These two examples don't make sense. They should be c = UInt8('x'); c == 120 and n = Char(120); n == 'x'. Both can still be written with or without this PR. What the PR does is force you to use one of these two forms, while before that you could have written 'x' == 120 directly.
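The two explicit forms described above, written out as a sketch against Julia 1.x (where this deprecation has long since taken effect), alongside `codepoint`, which returns a Char's Unicode code point as a `UInt32`:

```julia
# Assumes Julia 1.x.
c = UInt8('x')          # convert the Char to its byte value, 0x78
c == 120                # true: both sides are integers

n = Char(120)           # convert the integer code point to a Char
n == 'x'                # true: both sides are Chars

codepoint('x') == 120   # true: compare code points explicitly
'x' == 120              # false: the direct form this PR deprecated
```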

stevengj · 2016-04-25T14:12:32Z

@nalimilan, the point is that there are lots of cases (e.g. see this PR!) where you have already read in a byte as a b::UInt8, and you want to check b == '\n' or similar. This PR forces you to do an explicit cast somewhere first. What do we gain by it?

@H-225, it would seem rather odd and arbitrary if you could compare Char to UInt8 but not to other Integer types. (Also a bit short-sighted -- the same reasoning applies to UInt16 for processing UTF-16 data, and also to UInt32 for processing UTF-32 data, especially if Stefan goes ahead with #14383.)

StefanKarpinski · 2016-04-25T14:46:08Z

@stevengj: there are actually several places where there were actual or potential bugs here due to comparing UInt8 values with Char and assuming that if the byte value equals the char value (Unicode code point) then you have a match. This isn't true, of course, unless the stream is encoded as Latin-1. Converting from UInt8 to Char doesn't actually fix that issue; it just makes it a little more obvious. The correct solution is to make sure that I/O APIs like peek return chars correctly, not bytes.
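The bug class described above can be reproduced in a few lines. This sketch assumes Julia 1.x and UTF-8 data: treating each byte as a code point works for ASCII (and would work for Latin-1 input), but breaks for any multi-byte UTF-8 character.

```julia
# Assumes Julia 1.x; strings are UTF-8 encoded.
s = "caf\u00e9"                   # "café": 'é' is stored as the two bytes 0xc3 0xa9
bytes = codeunits(s)              # 5 bytes for 4 characters

# Wrong: convert each byte to a Char as if it were a code point.
naive = [Char(b) for b in bytes]  # 'c', 'a', 'f', 'Ã', '©'
'\u00e9' in naive                 # false: no single byte equals U+00E9

# Right: decode characters from the string (or stream) itself.
'\u00e9' in collect(s)            # true
```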

That was also not the motivation for this change. Currently we have isequal('x', 120), which means that Set(Any[120, 'x']) == Set(['x']); that is clearly not good. The only way to fix this while keeping 'x' == 120 is to make 'x' == 120 but !isequal('x', 120), which would be the only case aside from NaNs and ±0.0 where == and isequal currently differ. We can do that, but it's a biggish step semantically and begins to widen the crack between == and isequal.
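For comparison, a sketch of the resulting behavior in Julia 1.x (after this change), next to the IEEE 754 cases where == and isequal already disagree:

```julia
# Assumes Julia 1.x, where isequal('x', 120) is false.
s = Set(Any[120, 'x'])
length(s)              # 2: 120 and 'x' are distinct keys
isequal('x', 120)      # false

# The existing IEEE 754 exceptions where == and isequal differ:
NaN == NaN             # false
isequal(NaN, NaN)      # true
0.0 == -0.0            # true
isequal(0.0, -0.0)     # false
```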

stevengj · 2016-04-25T15:41:41Z

@StefanKarpinski, I don't see how having peek etcetera return Char will help with the problem of accidentally reading a stream in some non-Unicode-compatible encoding.

Making == and isequal differ here makes a lot of sense to me, actually, but I can see why you would be reluctant.

StefanKarpinski · 2016-04-25T16:12:55Z

> @StefanKarpinski, I don't see how having peek etcetera return Char will help with the problem of accidentally reading a stream in some non-Unicode-compatible encoding.

Because it decodes a Char value in whatever way is appropriate for the stream. Currently we don't support streams with different encodings, but that's something we should support in the future. For now it at least fixes the common problem for UTF-8 streams.

nalimilan · 2016-04-25T16:19:11Z

> Currently we don't support streams with different encodings, but that's something we should support in the future.

See https://github.com/nalimilan/StringEncodings.jl#advanced-usage-stringencoder-and-stringdecoder

StefanKarpinski · 2016-04-25T16:22:55Z

@stevengj: I'm ok with making == and isequal more different, but that requires a bit of thought about meaning. Why is this the one case, aside from IEEE 754 weirdness, where it's ok for == and isequal to differ? In general, I've been contemplating whether == should raise an error when you try to compare objects of "incomparable types"; whereas isequal must support that (and === does too).

stevengj · 2016-04-25T19:15:21Z

Because comparison of chars to integers is well-defined (Unicode), convenient, and a long-standing convention in many languages? They aren't really "incomparable types".

StefanKarpinski · 2016-04-25T19:28:25Z

So it's ok to compare 'x' == 120 but !isequal('x', 120). What about 'x' == 120.0? Is that ok too? So Char is not a numeric type, but it has some canonical numerical embedding? Just deciding to do this feels very ad-hoc unless we have some broader concept of why it's ok to do this.

JeffBezanson · 2016-04-25T19:29:10Z

The recent issue about comparing Exprs containing NaN is a good example. :(print(65)) and :(print('A')) clearly should not be equal in any sense. For this particular case, separating chars and ints with isequal would be sufficient, but like Stefan I worry about having many ad-hoc differences between == and isequal.
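A quick check of the Expr example in Julia 1.x, where Expr equality compares the arguments structurally and an Int literal and a Char literal are no longer considered equal:

```julia
# Assumes Julia 1.x.
ex1 = :(print(65))
ex2 = :(print('A'))
ex1 == ex2         # false
ex1.args[2]        # 65, an Int literal
ex2.args[2]        # 'A', a Char literal
```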

stevengj · 2016-04-27T13:33:36Z

@StefanKarpinski, having 'x' == 120 return false seems like an invitation for future bugs. Deprecating it is one thing, but taking a construct that has a widespread clear meaning in many other programming languages and turning it on its head seems far worse than the problem you are trying to fix here. I think the only sane alternatives here are either to make == different from isequal or to plan on a permanent deprecation warning.

@ararslan, the mapping of Char to Int is unambiguous, literally standardized, and widespread in programming languages. Automatic mapping of strings to integers is none of those things.

stevengj · 2016-04-27T13:35:58Z

@ararslan, defining zero(Char) seems like a hack to me: the zero function is supposed to return the additive identity, but + is not defined for Char and so it has no additive identity.

vtjnash · 2016-05-05T19:25:46Z

I have to say that I think we should do this. We clearly need !isequal('x', 120), and there are two approaches: this, or making == and isequal different.

While reviewing and fixing up tests, I found one that asserts '/' != "/home/me" (julia/test/path.jl, line 3 in e0e93fc):

@unix_only @test expanduser("~")[1] != ENV["HOME"]

It might have been helpful if that had been a NotComparableError.

StefanKarpinski · 2016-05-05T20:31:38Z

> it might have been helpful if that was a NotComparableError.

If we're going to go down the road of having NotComparableError, then we should really go all in. I'm all for it, but I don't think it belongs in this fairly minimal PR to make Set(Any['x',120]) work right.

tkelman · 2016-05-05T20:34:17Z

base/precompile.jl

@@ -478,6 +478,6 @@ precompile(Base.launch, (Base.LocalManager, Dict, Array{Base.WorkerConfig, 1}, B
 precompile(Base.set_valid_processes, (Array{Int, 1}, ))

 # Speed up repl help
-sprint(Markdown.term, @doc mean)
+# sprint(Markdown.term, @doc mean)


What was wrong here?

Oh, good catch. I commented it out because bootstrap was failing and then forgot to try putting it back in.

See #15983 (comment)

StefanKarpinski · 2016-05-09T07:48:36Z

Any idea why we're not getting AppVeyor CI on this?

StefanKarpinski · 2016-05-09T08:09:19Z

Never mind, it kicked in after a while.

tkelman · 2016-05-09T17:31:44Z

Needs a NEWS.md item.

ararslan · 2016-05-09T17:47:12Z

My understanding is that once things are deprecated, they'll be removed in a future release. So if char-int comparison is deprecated, what are we envisioning the eventual behavior to be once it's removed entirely? A MethodError from ==?

JeffBezanson · 2016-05-09T17:55:03Z

Saying that deprecation implies removal is a bit too narrow. Sometimes we need to change a behavior, e.g. changing a true result to false as here, and the deprecation mechanism is also useful for this.

ararslan · 2016-05-09T18:17:57Z

@JeffBezanson That's good to know, thanks! So the eventual behavior here is that 0 == '\0' (for example) will be false?

StefanKarpinski · 2016-05-09T23:01:46Z

It's also possible that it will be an error. Either way, this is the intermediate step.

StefanKarpinski mentioned this pull request Apr 23, 2016

modify Base.peek to return a Char and export it #16025

Closed

yuyichao added the kind:breaking label (This change will break code) Apr 23, 2016

nalimilan reviewed Apr 23, 2016

nalimilan mentioned this pull request Apr 24, 2016

Feature request: Remove array type assertion for find to allow other iterables #16022

Closed

StefanKarpinski force-pushed the sk/char_neq_int branch from 434a745 to ed29d76 April 25, 2016 15:00

StefanKarpinski changed the title from "Deprecate Int-Char comparisons, e.g. 'x' == 120" to "RFC: Deprecate Int-Char comparisons, e.g. 'x' == 120" Apr 25, 2016

StefanKarpinski force-pushed the sk/char_neq_int branch from ed29d76 to 496ebd3 April 25, 2016 16:14

StefanKarpinski force-pushed the sk/char_neq_int branch from f3abb9e to 3de874e April 27, 2016 13:23

StefanKarpinski added this to the 0.5.0 milestone May 5, 2016

StefanKarpinski self-assigned this May 5, 2016

StefanKarpinski force-pushed the sk/char_neq_int branch from 3de874e to 013457b May 5, 2016 18:03

StefanKarpinski added the needs decision label (A decision on this change is needed) May 5, 2016

tkelman reviewed May 5, 2016

StefanKarpinski force-pushed the sk/char_neq_int branch from 013457b to a248e1d May 5, 2016 21:09

This was referenced May 6, 2016

RFC: Remove find type assertion to allow other iterables #16110

Merged

figure what to do about find("string") #16269

Closed

StefanKarpinski added 2 commits May 9, 2016 09:43

comment out test find("julia") == 1:5 pending #16269

23ea0bd

make !isequal('x', 120) and eprecate Int-Char comparisons, 'x' == 120

84349f2

See #15983 (comment)

StefanKarpinski force-pushed the sk/char_neq_int branch from a248e1d to 84349f2 May 9, 2016 07:43

StefanKarpinski merged commit bf73102 into master May 9, 2016

StefanKarpinski deleted the sk/char_neq_int branch May 9, 2016 08:46

StefanKarpinski mentioned this pull request Jun 12, 2016

Manually hoist conversion of end-of-line character to UInt8 #16886

Merged

tkelman mentioned this pull request Jul 24, 2016

NEWS: some improvements and additions #17585

Merged

stevengj mentioned this pull request Jul 27, 2016

fix Julia 0.5 compatibility JuliaStrings/TinySegmenter.jl#19

Merged

ararslan mentioned this pull request Feb 5, 2018

isdigit(Char(::UInt8)) performance regression? #25883

Open

stevengj mentioned this pull request Mar 2, 2018

add AbstractChar supertype of Char #26286

Merged

