RFC: Preserve type on integer arithmetic #8028

tknopp · 2014-08-16T20:54:55Z

This is my attempt to make Int8 + Int8 = Int8. It would fix #3759, which now has the 0.4-projects tag, indicating that there is hope that this gets accepted :-)

I had to fix some tests, and the unicode test still fails with:

exception on 2: ERROR: test failed: utf8(u16) == u8
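For reference, here is a minimal sketch of the semantics this PR argues for. It is written in current Julia 1.x syntax, where same-type integer arithmetic does return the same type. The values are chosen to show wraparound. Widening to `Int` has to be requested explicitly.

```julia
# Julia 1.x: arithmetic on two values of the same fixed-width type keeps that type.
a = Int8(100)
b = Int8(100)
s = a + b                       # Int8: 200 wraps modulo 2^8 to -56
d = typemin(Int8) - Int8(1)     # Int8: -128 - 1 wraps to 127
u = UInt8(200) * UInt8(2)       # UInt8: 400 wraps modulo 2^8 to 144

# Widening is explicit: convert the operands first.
w = Int(a) + Int(b)             # 200, of type Int (Int64 on 64-bit systems)

println(typeof(s), " ", s)      # prints: Int8 -56
```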

StefanKarpinski · 2014-08-16T21:19:10Z

This would be a good time to "genericize" the code in int.jl. @vtjnash once had a pull request to do this, but there was some objection and it didn't get merged. It would also be good to spec out the precise rules before implementing.

JeffBezanson · 2014-08-16T21:23:40Z

The inevitable fight during bootstrap makes "boxing" an eerily appropriate term :)

tknopp · 2014-08-16T21:31:58Z

@StefanKarpinski: Do you mean by "genericize" to have for loops over the integer types generating the respective functions?

Regarding the spec: Well, +,-,* should keep the types stable. And I have also adapted the promotion rules.

StefanKarpinski · 2014-08-16T22:38:01Z

@StefanKarpinski: Do you mean by "genericize" to have for loops over the integer types generating the respective functions?

Yes, that's what I meant. Glad you understood my very vague description of what I meant :-)

quinnj · 2014-08-17T00:07:19Z

I think "genericize" also meant using type parameters. Last I looked, you could have collapsed a lot of the methods down to just one with a T<:Signed parameter.
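The two styles under discussion look roughly like this in Julia 1.x syntax. The 2014 code used the older `f{T<:Signed}(x::T)` form. `double_loop` and `double_param` are hypothetical names used only for illustration. Note that the parametric version also accepts `BigInt`, because `BigInt <: Signed`.

```julia
# Style 1: generate one method per concrete type with a for loop + @eval.
for T in (Int8, Int16, Int32, Int64, Int128)
    @eval double_loop(x::$T) = x + x
end

# Style 2: a single method whose type parameter is constrained by an upper bound.
double_param(x::T) where {T<:Signed} = x + x

double_loop(Int16(3))    # Int16(6), from the generated Int16 method
double_param(Int16(3))   # Int16(6), from the single parametric method
```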

tknopp · 2014-08-17T07:20:57Z

OK, I generalized some of the code, which mostly amounted to hitting dd in vim. Some things are still missing.

I have not yet looked at the issue with the unicode functions. If a Unicode expert here has a hint, that would be great.

ivarne · 2014-08-17T11:27:38Z

I might have missed something, but do we really want to use

-{T<:Integer}(a::T) = [...]

instead of

for T in (Int8, Int16, Int32, Int64, Int128, [...])
    -(a::T) = [...]
end

It seems to me like these definitions rely on implementation details of the built-in integers rather than on a generic Integer interface, so they are unlikely to apply to user-defined Integers. I think it is better to get ERROR: - has no method matching -(::MyInt) than ERROR: error compiling -: box: expected bits type as first argument when you try to negate an immutable MyInt <: Integer.
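The point can be sketched in Julia 1.x with a hypothetical `MyInt` and two hypothetical negation functions. Both compute two's-complement negation as `~x + 1`, which stands in for the representation-dependent `box`/`unbox` code. The generic method accepts `MyInt` and then fails somewhere inside its body. The loop-generated methods simply have no method for it, which gives the clearer error.

```julia
# Hypothetical user-defined integer that is not a fixed-width bits type.
struct MyInt <: Integer
    v::Int
end

# Generic method: accepts every Integer, but its body assumes two's-complement ops.
neg_generic(x::T) where {T<:Integer} = ~x + one(T)

# Loop-generated methods: exist only for the built-in signed types.
for T in (Int8, Int16, Int32, Int64, Int128)
    @eval neg_loop(x::$T) = ~x + one($T)
end

# Helper that returns the exception thrown by f(x), or nothing.
function error_of(f, x)
    try
        f(x)
        return nothing
    catch e
        return e
    end
end

e1 = error_of(neg_generic, MyInt(1))   # fails inside the body, not at dispatch
e2 = error_of(neg_loop, MyInt(1))      # MethodError: no method matching neg_loop(::MyInt)
```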

tknopp · 2014-08-17T11:49:18Z

@ivarne: No, you have not missed anything. This is just me digging into a new area of base where I have no experience yet.

Would it be an alternative to define unions like the existing Integer64 one (i.e. Integer128, Signed128, Unsigned128) and change the implementation accordingly? Or is the for loop encouraged?

tknopp · 2014-08-17T13:15:38Z

OK, I have fixed the utf16 code, and the entire test suite now runs successfully. I have also introduced the Integer128, Signed128, and Unsigned128 union types.
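The union approach can be sketched in Julia 1.x. The alias names follow the later "...Upto..." renaming in this branch's commits, but these definitions are illustrative and are not Base's. A single parametric method then covers all ten fixed-width types and returns the argument's own type. `BigInt`, `Bool`, and user types are left without a method.

```julia
# Hypothetical union aliases, not Base definitions.
const SignedUpto128   = Union{Int8, Int16, Int32, Int64, Int128}
const UnsignedUpto128 = Union{UInt8, UInt16, UInt32, UInt64, UInt128}
const IntegerUpto128  = Union{SignedUpto128, UnsignedUpto128}

# One method for all fixed-width integers; the result keeps the argument's type.
halve(x::T) where {T<:IntegerUpto128} = x >> 1

halve(Int8(10))     # Int8(5)
halve(UInt64(9))    # UInt64(4)
```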

tknopp · 2014-08-18T09:19:45Z

I have changed the title from WIP to RFC because the test suite passes and, besides the utf16 change, the transition was mostly straightforward.

Implementing the sum(Int, a) reduction can probably be done in a separate PR.

timholy · 2014-08-18T09:49:03Z

I wouldn't merge this, though, until the issues with sum and prod are fixed. Otherwise you're leaving people with broken systems and no way to fix them. In particular, we have to wait until Julia 0.3 is officially out, so that we can advertise more broadly that users who don't want breakage should stick with release-0.3.

Also: while I think this is great and I am truly excited about it, strategically I wonder if we should wait a bit before merging this. This could well be the most painful breaking change in 0.4. With the possible exception of @quinnj 's awesome Dates addition (and llvmcall, but that's more esoteric) and probably some other cool stuff I'm forgetting, IMO 0.4 doesn't yet contain enough goodies for most users to make it compelling to switch. At least for me personally, once the new GC gets merged and we get stagedfunctions, that will be more than enough to make it worth the effort.

tknopp · 2014-08-18T10:14:03Z

Tim, sure, there is absolutely no haste in doing so, and I am fine letting this lie around for some time. I did this for fun and because I have argued for this change in several places in the past. So why just argue and not make a PR?

However, independent of this PR, let's please merge things earlier in the 0.4 cycle. For the past 8 months master was basically on hold (with the exception of the REPL) because many people thought that 0.3 would be released soon. So I would say: once 0.3 is out, merge all the important things to master, even if master is unstable for two months and breaks every package out there. Once we move from the dev to the pre phase, there should be an announcement to port the packages.

It is of course great to keep master stable, but effectively 0.3 was a little like a rolling release, and people were encouraged to use 0.3pre instead of 0.2 because the latter was outdated and some packages were not functional on 0.2.

tknopp · 2014-08-18T10:26:27Z

And it would indeed be a good idea to point to a stable branch in the "Source Download and Compilation" section of the README.md.

ivarne · 2014-08-18T10:33:50Z

I added this to 0.4-projects because the consensus seems to be that this should be merged before 0.4-dev becomes 0.4-pre. I would second Tim's request that the potentially overflowing reduction functions (sum, prod, ...) get an implementation that converts elements to a specified type to avoid overflow before this is merged.
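In Julia 1.x, "convert elements to a specified type" can be written with Base's own `sum`. You can pass a conversion function as the first argument. You can also supply an `Int` initial value through the `init` keyword, which requires Julia 1.6 or later. Plain `sum` in Julia 1.x also widens small integer element types by default.

```julia
A = UInt8[200, 200, 200, 200]

# Map each element to Int before accumulating (sum(f, A) applies f elementwise).
s1 = sum(Int, A)

# Or start the accumulation from an Int initial value (init keyword, Julia 1.6+).
s2 = sum(A; init = 0)

# Plain sum in Julia 1.x widens small integer element types as well.
s3 = sum(A)
```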

quinnj · 2014-08-18T10:36:51Z

Ignoring sum and prod, there was discussion at one point of being able to run @IainNZ's package evaluator on a given branch to test just how breaking a change would be. This seems like it could be a good candidate since a lot of the breaks would be quite subtle and not easily grepped.

ivarne · 2014-08-18T10:39:52Z

It is also a change that is likely to cause silent errors that do not trigger test failures. If you don't explicitly test for overflow, you often won't get a test failure.

quinnj · 2014-08-18T10:41:11Z

Good point.

tknopp · 2014-08-18T10:47:40Z

@ivarne: Is my implementation with the Integer128 union ok now?

Regarding the reduction methods: this is about sum and prod, right? Or can someone point me to a complete list of the functions that would have to be implemented?

ivarne · 2014-08-18T11:00:16Z

I'm not sure it is complete but reading base/reduce.jl suggests:

sum
prod
sumabs
sumabs2
sum_kbn? (can you sum with a BigFloat accumulator?)
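`sumabs` and `sumabs2` later became `sum(abs, A)` and `sum(abs2, A)` in Julia 1.x. The sketch below shows one subtlety for that list. The mapped function runs in the element type, so each term can already wrap before any wider accumulator sees it. Widening inside the mapped function avoids this.

```julia
A = Int8[100, -100, 100]

# abs2 is computed in the element type, so each term is an Int8 (and wraps).
term = abs2(A[1])

# Widen inside the mapped function to get the mathematically expected results.
sabs  = sum(x -> abs(Int(x)), A)    # 300
sabs2 = sum(x -> abs2(Int(x)), A)   # 30000
```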

tknopp · 2014-08-18T12:06:08Z

Thanks! sum_kbn is not defined for integers right now, so this seems independent.

Regarding the "danger" of this PR, I don't expect major breakage. We currently have

Int8[255, 255] + Int8[1, 1] == Int8[0, 0]

and this PR is mostly about consistency. And if I am wrong and everything breaks and nobody likes the change it is better to determine this early rather than later.
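Here is the array case in Julia 1.x. Elementwise arithmetic keeps the element type and wraps modulo 2^8. The sketch uses `UInt8` because 255 is not an `Int8` value (`typemax(Int8) == 127`), and Julia 1.x throws an `InexactError` for the literal `Int8[255]`.

```julia
x = UInt8[255, 255]
y = UInt8[1, 1]
z = x .+ y                      # elementwise; stays UInt8 and wraps to UInt8[0, 0]

# 255 is out of range for Int8, so a checked Int8 array literal rejects it.
int8_rejects_255 = try
    Int8[255]
    false
catch e
    e isa InexactError
end
```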

timholy · 2014-08-18T14:02:12Z

No, it's going to be hugely breaking. I'm not arguing we shouldn't do it, I'm just saying we need to be realistic.

The real problem is that (1) people probably haven't written tests to catch this, and (2) tests mostly look like this:

A = [1 2; 3 4]
@test sum(A) == 10

Even if you had written that test with A = uint8([1 2; 3 4]), it would pass. But now load a real image from a .png and call sum on it: you're going to get junk. So this will affect a lot of real-world usage but hardly trigger any test failures.
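The testing problem can be reproduced in Julia 1.x with a hypothetical hand-written `naive_sum` that accumulates in the element type. A small test matrix gives the right answer. Four bright 8-bit pixels silently wrap. Converting to `Int` while summing gives the expected value.

```julia
# Hypothetical helper: accumulates in the element type, as a hand-written loop might.
function naive_sum(A::AbstractArray{T}) where {T}
    s = zero(T)
    for a in A
        s += a
    end
    return s
end

small = UInt8[1 2; 3 4]        # the kind of matrix a unit test uses
A     = UInt8[200 200; 200 200] # e.g. four bright pixels from an 8-bit image

naive_sum(small)   # 0x0a == 10: the test passes
naive_sum(A)       # 0x20 == 800 mod 256: silently wrong
sum(Int, A)        # 800
```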

timholy · 2014-08-18T14:21:19Z

@tknopp, you raise some good points. As long as we have new versions of the reduction functions in place, I won't object to merging this.

StefanKarpinski · 2014-08-18T14:22:44Z

Reductions should default to using Int or larger for accumulation. Reducing mod 2^8 is not a commonly useful operation. The type stability arguments for + don't really apply to sum.

StefanKarpinski · 2014-08-18T14:32:39Z

Let me expand on that reasoning a bit: the reason you want Array{Uint8} + Array{Uint8} to produce Array{Uint8} is because otherwise if you have large arrays, you may blow out your memory when the result array is 4x larger than its combined inputs. In the case of reduction, this is much less likely to happen. If you reduce a::Array{Uint8} even along only one dimension, unless that dimension is small – size(a,d) < 8 – you will still produce a result that is smaller than a. When adding two Uint8 values, it's reasonable to think that the result may still fit into a Uint8 (although it may well not); when accumulating over any non-trivial number of Uint8 values, it's unlikely that the result will fit in a Uint8.
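The memory arithmetic can be made concrete. Assume two 1000×1000 `UInt8` inputs, and explicitly widen to `Int64` to stand in for "promote to a 64-bit Int". The widened elementwise result is 4x the combined input. A reduction along one dimension with a 64-bit accumulator is still far smaller than the input.

```julia
a = rand(UInt8, 1000, 1000)
b = rand(UInt8, 1000, 1000)

same    = a .+ b                    # stays UInt8: 1 byte per element
widened = Int64.(a) .+ Int64.(b)    # what widening to a 64-bit Int would produce

sizeof(a) + sizeof(b)   # 2_000_000 bytes of input
sizeof(same)            # 1_000_000 bytes
sizeof(widened)         # 8_000_000 bytes, 4x the combined input

# Reducing along dimension 1 with a 64-bit accumulator: a 1x1000 Int64 matrix.
r = sum(Int64, a; dims = 1)
sizeof(r)               # 8_000 bytes
```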

timholy · 2014-08-18T14:48:02Z

@StefanKarpinski that's how I felt too, but then I started changing my thinking in #8028 (comment). Naturally I still find your arguments pretty compelling. What do you think?

StefanKarpinski · 2014-08-18T14:57:48Z

Why not just implement reductions with an arbitrary accumulator type and then default the accumulator type to Int or larger based on the array element type? I'm not clear on why the complexity of the internals of sum, et al. is relevant to what the default accumulator type should be.
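One way to sketch this suggestion in Julia 1.x uses the hypothetical functions `sum_acc` and `default_acc`. The accumulator type is an explicit first argument, in the spirit of the `sum(Int, a)` form discussed in this thread. The default is chosen from the element type. Using `promote_type(T, Int)` / `promote_type(T, UInt)` is one reading of "Int or larger", not Base's actual rule.

```julia
# Hypothetical default: at least as wide as the platform Int/UInt, never narrower than T.
default_acc(::Type{T}) where {T<:Signed}   = promote_type(T, Int)
default_acc(::Type{T}) where {T<:Unsigned} = promote_type(T, UInt)

# Reduction with an explicit accumulator type S.
function sum_acc(::Type{S}, A::AbstractArray) where {S}
    s = zero(S)
    for a in A
        s += convert(S, a)
    end
    return s
end

# Default accumulator chosen from the element type (Bool has no default here).
sum_acc(A::AbstractArray{T}) where {T<:Integer} = sum_acc(default_acc(T), A)

sum_acc(UInt8[200, 200, 200, 200])   # 800, accumulated as UInt
sum_acc(Int16, Int8[100, 100, 100])  # Int16(300)
```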

tknopp · 2014-08-18T15:05:02Z

Please note that I am also unsure what the right thing is for the reductions. When I prepared the PR the aim was to remove all special casing to reach consistency. And type stability is indeed the main motivation behind all this.

I am also fine with questioning the whole idea again. Now that we have this branch, people can try it out and see whether they are fine with the idea.

timholy · 2014-08-18T15:12:00Z

No, we should definitely do this.

@StefanKarpinski, the reason I made those points is that there seemed to be some thought that it would be OK to merge these changes before fixing the reduction functions.

tknopp · 2014-08-18T15:16:29Z

I am fine reverting the reduction changes for the moment. They were not even intentional: I had removed SmallInteger from int.jl and had to cope with that in reducedim.jl.

vtjnash · 2014-08-21T03:23:43Z

base/int.jl

+convert{T<:Signed64}(::Type{T}, x::Float32) = box(T,checked_fptosi(T,unbox(Float32,x)))
+convert{T<:Signed64}(::Type{T}, x::Float64) = box(T,checked_fptosi(T,unbox(Float64,x)))
+convert{T<:Unsigned64}(::Type{T}, x::Float32) = box(T,checked_fptoui(T,unbox(Float32,x)))
+convert{T<:Unsigned64}(::Type{T}, x::Float64) = box(T,checked_fptoui(T,unbox(Float64,x)))


these have an oddity (shared with many BLAS methods) that they don't specify quite what you intended, which could lead to incorrect answers / incorrect dispatch, since the type signature Type{TypeVar(:T, Signed64)} also matches any union-of-types Type. For example:

julia> Type{Union(Int8,Int16)} <: Type{TypeVar(:T,Base.SmallSigned)}
true

Instead, what you really want is:

typealias Signed64Types Union([Type{T} for T in Signed64.types]...)
convert{T<:Signed64Types}(::T, x::Float32) = ...

(this change also needs to be made to many of the BLAS/linalg functions in base – replacing Type with Array – to avoid dispatching to the wrong method)
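The same subtyping can be checked in Julia 1.x syntax. `SmallSigned`, `param_T`, and `bits_exact` are hypothetical names. The parametric `Type{T}` method also matches the union type itself. A union of singleton `Type{...}` types matches only the concrete types, which mirrors the proposed `Union([Type{T} for T in ...]...)` remedy.

```julia
const SmallSigned = Union{Int8, Int16}

# Parametric form, analogous to convert{T<:Signed64}(::Type{T}, ...) above.
param_T(::Type{T}) where {T<:SmallSigned} = T

# Union of singleton types: only Int8 and Int16 themselves can match.
bits_exact(T::Union{Type{Int8}, Type{Int16}}) = 8 * sizeof(T)

U = Union{Int8, Int16}
union_matches_param = Type{U} <: Type{<:SmallSigned}   # true, because U <: SmallSigned
param_T(U)                  # returns the Union itself, not a concrete type
applicable(bits_exact, U)   # false
```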

@vtjnash: OK, I don't understand this. But this may be because I do not understand the difference between Int32 and Type{Int32}. Is the difference described in the manual?

@vtjnash I don't think that way of specifying the type in convert is specific to BLAS. grep -r "::Type{T}" base gives quite a lot of output, so I'd say it has been pretty standard.

Is the problem you mention something you have seen in practice? The old, but wrong, way is a bit nicer to the eyes.

It is frequently correct. However, sometimes, such as when interacting with ccall or other IntrinsicFunctions, it is probably not. For example, the following method match was probably not expected by the author:

julia> @which convert(Union(Int32, Int64), float16(2))
convert{T<:Integer}(::Type{T<:Integer},x::Float16) at float16.jl:103

And the author almost certainly was not expecting this:

a = Array(Union(Float32,Float64),1)
@which copy!(a,1:1,a,1:1)
copy!{T<:Union(Float32,Complex{Float64},Float64,Complex{Float32}),Ti<:Integer}(dest::Array{T<:Union(Float32,Complex{Float64},Float64,Complex{Float32}),N},rdest::Union(UnitRange{Ti<:Integer},Range{Ti<:Integer}),src::Array{T<:Union(Float32,Complex{Float64},Float64,Complex{Float32}),N},rsrc::Union(UnitRange{Ti<:Integer},Range{Ti<:Integer})) at linalg/blas.jl:828

Generally, these eventually get caught somewhere else before turning into a ccall parameter.

@tknopp x::Int32 asserts that x is of type 32-bit integer. x::Type{Int32} asserts that x is Int32, the type of a thing that is a 32-bit integer.
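The distinction can be shown in Julia 1.x with a hypothetical `describe` function. A method on `::Int32` is selected for values of that type. A method on `::Type{Int32}` is selected only when the type object `Int32` itself is passed.

```julia
describe(x::Int32) = "a value of type Int32"
describe(::Type{Int32}) = "the type Int32 itself"

v = Int32(7)
describe(v)        # "a value of type Int32"
describe(Int32)    # "the type Int32 itself"
```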

tknopp · 2014-08-21T11:00:12Z

@quinnj Could you elaborate on your comment in #7291 (comment) and whether this PR is a step in the right direction? According to @ivarne, your tests pass on this branch.

quinnj · 2014-08-21T11:52:04Z

Sure, it basically came down to

promote_type(Uint8,Uint16) == Uint64

on 64-bit while on 32-bit

promote_type(Uint8,Uint16) == Uint32

On your branch, both promote to Uint16. Having a consistent return type just makes it easier to not get tripped up by WORD_SIZE differences in this case.
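For comparison, in Julia 1.x the promotion of two small unsigned types is independent of the machine word size. The sketch also shows `Sys.WORD_SIZE`, which is the Julia 1.x name for the word-size constant.

```julia
promote_type(UInt8, UInt16)     # UInt16 on both 32- and 64-bit systems
typeof(UInt8(1) + UInt16(1))    # UInt16
Sys.WORD_SIZE                   # 32 or 64; does not affect the two lines above
```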

Additionally, these tests were kind of particular because I specifically needed to compare the promoting operations to Uint64, so 32-bit was failing on upcasting the values on comparison.

On another note, it seems like your branch hasn't incorporated the Dates module yet. It'd be great if you could rebase it to make sure everything's happy over there, though I would imagine it'll be OK since everything gets converted to Int64 pretty quickly.

tknopp · 2014-08-21T13:41:14Z

Thanks @quinnj. I had observed a similar issue when looking at the factorial code (see #6579 (comment)). When one has a pattern like

function my_func(A)
  x = zero(A[1])
  for a in A
      x += a
  end
  x
end

the function would become unstable if an Array{Int16} is passed. One would thus have to write

function my_func(A)
  x = zero(A[1])+zero(A[1])
  for a in A
      x += a
  end
  x
end
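Here is a variant of the same pattern in Julia 1.x, using the hypothetical name `my_func_eltype`. It takes the accumulator type from the element type parameter rather than from `A[1]`, so it also works on empty arrays, where `A[1]` would throw a `BoundsError`. In Julia 1.x, `Int16 + Int16` is `Int16`. The `zero(T) + zero(T)` trick still matters for `Bool`, where `Bool + Bool` is `Int`.

```julia
function my_func_eltype(A::AbstractArray{T}) where {T}
    x = zero(T) + zero(T)   # accumulator has the type that T + T produces
    for a in A
        x += a
    end
    return x
end

my_func_eltype(Int16[1, 2, 3])      # Int16(6)
my_func_eltype(Int16[])             # Int16(0), no BoundsError
my_func_eltype([true, true, false]) # 2, an Int, since Bool + Bool is Int
```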

vtjnash · 2014-08-21T14:11:34Z

base/int.jl

+convert{T<:SignedUpto64}(::Type{T}, x::Float32) = box(T,checked_fptosi(T,unbox(Float32,x)))
+convert{T<:SignedUpto64}(::Type{T}, x::Float64) = box(T,checked_fptosi(T,unbox(Float64,x)))
+convert{T<:UnsignedUpto64}(::Type{T}, x::Float32) = box(T,checked_fptoui(T,unbox(Float32,x)))
+convert{T<:UnsignedUpto64}(::Type{T}, x::Float64) = box(T,checked_fptoui(T,unbox(Float64,x)))


cross-reference: #8028 (comment)

@JeffBezanson @StefanKarpinski any thoughts on this observation?

I have partly understood the issue, but I think it is worth opening a new issue dedicated to this problem, especially if this pattern is used more often in base.

From this dedicated issue, it would be great if we could get a recommendation on how to implement this: using the for loop with eval, or using what Jameson proposed in #8028 (comment).

The motivation for changing these lines was actually to remove the eval, because I have read several messages by @vtjnash saying that eval should rarely be used.

tknopp · 2014-08-21T18:45:34Z

@quinnj I have rebased and with Dates included it still passes the test.

JeffBezanson · 2014-09-20T21:42:12Z

superseded by #8420

pao mentioned this pull request Aug 18, 2014

Unary + and * promote numbers #7338

Closed

tknopp changed the title ~~WIP: Preserve type on integer arithmetic~~ RFC: Preserve type on integer arithmetic Aug 18, 2014

ivarne added this to the 0.4-projects milestone Aug 18, 2014

ivarne added the breaking label Aug 18, 2014

ivarne mentioned this pull request Aug 19, 2014

Grisu #7291

Merged

stevengj mentioned this pull request Aug 20, 2014

Pure Julia implementation of SobolSeq JuliaMath/Sobol.jl#3

Merged

vtjnash reviewed Aug 21, 2014

tknopp added 5 commits August 21, 2014 19:51

Preserve type on integer arithmetic

8d4949a

Implement integer functions in a generic way

b054afd

fix utf16 code and introduce Integer128 type

9ed2efa

Rename union types to use ...Upto... for clarity

4f24e82

Fix missing renames

2921e30

tknopp force-pushed the intpromotion branch from 3382f31 to 2921e30 on August 21, 2014 18:21

timholy mentioned this pull request Sep 20, 2014

WIP: checked integer conversions #8420

Merged

JeffBezanson closed this Sep 20, 2014

Conversation

tknopp commented Aug 16, 2014

StefanKarpinski commented Aug 16, 2014

JeffBezanson commented Aug 16, 2014

tknopp commented Aug 16, 2014

StefanKarpinski commented Aug 16, 2014

quinnj commented Aug 17, 2014

tknopp commented Aug 17, 2014

ivarne commented Aug 17, 2014

tknopp commented Aug 17, 2014

tknopp commented Aug 17, 2014

tknopp commented Aug 18, 2014

timholy commented Aug 18, 2014

tknopp commented Aug 18, 2014

tknopp commented Aug 18, 2014

ivarne commented Aug 18, 2014

quinnj commented Aug 18, 2014

ivarne commented Aug 18, 2014

quinnj commented Aug 18, 2014

tknopp commented Aug 18, 2014

ivarne commented Aug 18, 2014

tknopp commented Aug 18, 2014

timholy commented Aug 18, 2014

timholy commented Aug 18, 2014

StefanKarpinski commented Aug 18, 2014

StefanKarpinski commented Aug 18, 2014

timholy commented Aug 18, 2014

StefanKarpinski commented Aug 18, 2014

tknopp commented Aug 18, 2014

timholy commented Aug 18, 2014

tknopp commented Aug 18, 2014

vtjnash Aug 21, 2014

tknopp Aug 21, 2014

andreasnoack Aug 21, 2014

vtjnash Aug 21, 2014

tknopp commented Aug 21, 2014

quinnj commented Aug 21, 2014

tknopp commented Aug 21, 2014

vtjnash Aug 21, 2014

tknopp Aug 21, 2014

tknopp commented Aug 21, 2014

JeffBezanson commented Sep 20, 2014