RFC: Preserve type on integer arithmetic #8028

Closed
wants to merge 5 commits into from

Conversation

tknopp
Contributor

@tknopp tknopp commented Aug 16, 2014

This is my attempt to make Int8 + Int8 = Int8. It would fix #3759, which now has the 0.4-projects tag, indicating that there is hope that this gets accepted :-)

I had to fix some tests and the unicode test still fails with

exception on 2: ERROR: test failed: utf8(u16) == u8

@StefanKarpinski
Sponsor Member

This would be a good time to "genericize" the code in int.jl. @vtjnash once had a pull request to do this, but there was some objection and it didn't get merged. It would also be good to spec out the precise rules before implementing.

@JeffBezanson
Sponsor Member

The inevitable fight during bootstrap makes "boxing" an eerily appropriate term :)

@tknopp
Contributor Author

tknopp commented Aug 16, 2014

@StefanKarpinski: By "genericize", do you mean having for loops over the integer types that generate the respective functions?

Regarding the spec: well, +, - and * should keep the types stable. I have also adapted the promotion rules.

@StefanKarpinski
Sponsor Member

> @StefanKarpinski: Do you mean by "genericize" to have for loops over the integer types generating the respective functions?

Yes, that's what I meant. Glad you understood my very vague description of what I meant :-)

@quinnj
Member

quinnj commented Aug 17, 2014

I think "genericize" also meant using type parameters. Last I looked, you could have collapsed a lot of the methods down to just one with a T<:Signed parameter.

@tknopp
Contributor Author

tknopp commented Aug 17, 2014

Ok, I generalized some of the code, which mostly meant hitting dd in vim. Some things are still missing.

I have not yet looked at the issue with the unicode functions. If some unicode expert here has a hint, that would be great.

@ivarne
Sponsor Member

ivarne commented Aug 17, 2014

I might have missed something, but do we really want to use

-{T<:Integer}(a::T) = [...]

instead of

for T in (Int8, Int16, Int32, Int64, Int128, [...])
    -(a::T) = [...]
end

It seems to me that these definitions rely on implementation details of the built-in integers rather than on a generic Integer interface, and it is uncommon for these implementations to apply to user-defined Integers. I think it is better to get ERROR: - has no method matching -(::MyInt) than an ERROR: error compiling -: box: expected bits type as first argument when you try to negate immutable MyInt <: Integer.
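The trade-off described above can be sketched in current Julia syntax (not the 0.3-era syntax used in this thread). All names here (`MyInt`, `bitlength_loop`, `bitlength_generic`) are hypothetical and chosen only for illustration; the function body stands in for any code that quietly assumes a built-in bits type.

```julia
# Hypothetical user-defined integer type, as in the comment above.
struct MyInt <: Integer
    value::Int
end

# Variant 1: one method per concrete built-in type, generated in a loop.
for T in (Int8, Int16, Int32, Int64, Int128)
    @eval bitlength_loop(a::$T) = 8 * sizeof($T) - leading_zeros(a)
end

# Variant 2: a single method for every T <: Integer. The body still
# assumes that `leading_zeros` works for T, which holds for the built-in
# types but not for MyInt.
bitlength_generic(a::T) where {T<:Integer} = 8 * sizeof(T) - leading_zeros(a)

# The loop variant rejects MyInt at the call boundary ...
loop_accepts = applicable(bitlength_loop, MyInt(3))

# ... while the generic variant accepts it and only fails inside the body.
generic_accepts = applicable(bitlength_generic, MyInt(3))
generic_failed_inside = try
    bitlength_generic(MyInt(3))
    false
catch
    true
end
```

With the loop variant the error points at the function the user actually called; with the generic variant it surfaces from whatever internal call happens to be missing.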

@tknopp
Contributor Author

tknopp commented Aug 17, 2014

@ivarne: No, you have not missed anything. This is just me digging into a new area of base where I have no experience yet.

Would it be an alternative to define some union like the Integer64 one (i.e. Integer128, Signed128, Unsigned128) and change the implementation accordingly? Or is the for-loop encouraged?

@tknopp
Contributor Author

tknopp commented Aug 17, 2014

ok I have fixed the utf16 code and the entire test suite runs successfully now. I have further introduced the Integer128, Signed128, Unsigned128 union types.

@tknopp tknopp changed the title from "WIP: Preserve type on integer arithmetic" to "RFC: Preserve type on integer arithmetic" Aug 18, 2014
@tknopp
Contributor Author

tknopp commented Aug 18, 2014

I have changed the title from WIP to RFC because the test suite passes and, besides the utf16 change, the transition was mostly straightforward.

Implementing the sum(Int, a) reduction can probably be done in a separate PR.

@timholy
Sponsor Member

timholy commented Aug 18, 2014

I wouldn't merge this, though, until the issues with sum and prod are fixed. Otherwise you're leaving people with broken systems and no way to fix them. Especially, we have to wait until julia 0.3 is officially out, and we can advertise more broadly that users who don't want breakage should stick with release-0.3.

Also: while I think this is great and I am truly excited about it, strategically I wonder if we should wait a bit before merging this. This could well be the most painful breaking change in 0.4. With the possible exception of @quinnj 's awesome Dates addition (and llvmcall, but that's more esoteric) and probably some other cool stuff I'm forgetting, IMO 0.4 doesn't yet contain enough goodies for most users to make it compelling to switch. At least for me personally, once the new GC gets merged and we get stagedfunctions, that will be more than enough to make it worth the effort.

@tknopp
Contributor Author

tknopp commented Aug 18, 2014

Tim, sure, there is absolutely no haste in doing so and I am fine with letting this lie around for some time. I did this for fun and because I have argued for this change in several places in the past. So why just argue and not make a PR?

However, independent of this PR, let's please merge things earlier in the 0.4 cycle. For the past 8 months master was basically on hold (with the exception of the REPL) because many people thought that 0.3 would be released soon. So I would say once 0.3 is out: merge all the important things to master even if master is unstable for two months and breaks every package out there. Once we move from the dev to the pre phase, there should be an announcement to port the packages.

It is of course great to keep master stable, but effectively 0.3 was a little like a rolling release, and people were encouraged to use 0.3pre instead of 0.2 because the latter is outdated and some packages were not functional on 0.2.

@ivarne ivarne added this to the 0.4-projects milestone Aug 18, 2014
@tknopp
Contributor Author

tknopp commented Aug 18, 2014

And it would indeed be a good idea to point to a stable branch in the "Source Download and Compilation" section of the README.md.

@ivarne
Sponsor Member

ivarne commented Aug 18, 2014

I added this to 0.4-projects because the consensus seems to be that this should be merged before 0.4-dev becomes 0.4-pre. I would second Tim's request that, before this is merged, the potentially overflowing reduction functions (sum, prod, ...) get an implementation that converts elements to a specified type to avoid overflow.

@quinnj
Member

quinnj commented Aug 18, 2014

Ignoring sum and prod, there was discussion at one point of being able to run @IainNZ's package evaluator on a given branch to test just how breaking a change would be. This seems like it could be a good candidate since a lot of the breaks would be quite subtle and not easily grepped.

@ivarne
Sponsor Member

ivarne commented Aug 18, 2014

It is also a change that is likely to cause silent errors that do not trigger test failures. If you don't explicitly test for overflow, you will often not get a test failure.

@quinnj
Member

quinnj commented Aug 18, 2014

Good point.

@tknopp
Contributor Author

tknopp commented Aug 18, 2014

@ivarne: Is my implementation with the Integer128 union ok now?

Regarding the reduction methods: this is about sum and prod, right? Or can someone point me to a complete list of the functions that would have to be implemented?

@ivarne
Sponsor Member

ivarne commented Aug 18, 2014

I'm not sure this list is complete, but reading base/reduce.jl suggests:

  • sum
  • prod
  • sumabs
  • sumabs2
  • sum_kbn? (can you sum with a BigFloat accumulator?)

@tknopp
Contributor Author

tknopp commented Aug 18, 2014

Thanks! sum_kbn is not defined for integers right now, so this seems independent.

Regarding the "danger" of this PR, I don't expect major breakage. We currently have

Uint8[255, 255] + Uint8[1, 1] == Uint8[0, 0]

and this PR is mostly about consistency. And if I am wrong, everything breaks, and nobody likes the change, it is better to find that out early rather than late.
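For readers on a current Julia release, where small integer types keep their type under arithmetic, the elementwise wraparound mentioned above can be checked with a short sketch. It uses today's `UInt8` spelling rather than the 0.3-era `Uint8`; the variable names are illustrative only.

```julia
# Elementwise addition of UInt8 arrays stays in UInt8 and wraps mod 2^8.
a = UInt8[255, 255]
b = UInt8[1, 1]
c = a + b

# The scalar case behaves the same way on current releases.
s = UInt8(255) + UInt8(1)

wrapped_array = (c == UInt8[0, 0])
wrapped_scalar = (s == 0x00)
```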

@timholy
Sponsor Member

timholy commented Aug 18, 2014

No, it's going to be hugely breaking. I'm not arguing we shouldn't do it, I'm just saying we need to be realistic.

The real problem is that (1) people probably haven't written tests to catch this, and (2) tests mostly look like this:

A = [1 2; 3 4]
@test sum(A) == 10

Even if you had written that test with A = uint8([1 2; 3 4]), it would pass. But now load a real image from a .png and call sum on it: you're going to get junk. So this will affect a lot of real-world usage but hardly trigger any test failures.
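The failure mode described above can be sketched as follows, in current Julia syntax. `naive_sum` is a hypothetical reduction whose accumulator has the element type, which is what a type-preserving `sum` would amount to; the 4×4 array of 200s is a made-up stand-in for pixel data, not a real image.

```julia
# Hypothetical reduction that accumulates in the element type itself.
function naive_sum(A)
    acc = zero(eltype(A))
    for a in A
        acc += a          # UInt8 + UInt8 wraps mod 2^8
    end
    return acc
end

# The typical test uses tiny values, so it passes even for UInt8 input.
small = UInt8[1 2; 3 4]
test_passes = (naive_sum(small) == 10)

# Image-like data overflows silently: the true total is 16 * 200 = 3200,
# but the UInt8 accumulator holds 3200 mod 256 = 128.
image_like = fill(UInt8(200), 4, 4)
wrapped_total = naive_sum(image_like)
true_total = sum(Int.(image_like))
```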

@timholy
Sponsor Member

timholy commented Aug 18, 2014

@tknopp, you raise some good points. As long as we have new versions of the reduction functions in place, I won't object to merging this.

@StefanKarpinski
Sponsor Member

Reductions should default to using Int or larger for accumulation. Reducing mod 2^8 is not a commonly useful operation. The type stability arguments for + don't really apply to sum.

@StefanKarpinski
Sponsor Member

Let me expand on that reasoning a bit: the reason you want Array{Uint8} + Array{Uint8} to produce Array{Uint8} is because otherwise if you have large arrays, you may blow out your memory when the result array is 4x larger than its combined inputs. In the case of reduction, this is much less likely to happen. If you reduce a::Array{Uint8} even along only one dimension, unless that dimension is small – size(a,d) < 8 – you will still produce a result that is smaller than a. When adding two Uint8 values, it's reasonable to think that the result may still fit into a Uint8 (although it may well not); when accumulating over any non-trivial number of Uint8 values, it's unlikely that the result will fit in a Uint8.
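The size argument above is simple arithmetic, sketched here in current Julia syntax. The helper names are hypothetical, and an 8-byte accumulator (`Int64`) is assumed, which is where the threshold of 8 comes from.

```julia
# Bytes occupied by a UInt8 array with the given dimensions.
input_bytes(sz) = prod(sz) * sizeof(UInt8)

# Bytes occupied by the result of reducing along dimension d when each
# output element is stored as Acc (dimension d collapses to length 1).
reduced_bytes(sz, d, ::Type{Acc}) where {Acc} = (prod(sz) ÷ sz[d]) * sizeof(Acc)

# Reducing a 16×100 UInt8 array along dimension 1 into Int64:
# 1600 input bytes, 800 output bytes, so the result is smaller.
tall_in  = input_bytes((16, 100))
tall_out = reduced_bytes((16, 100), 1, Int64)

# With only 4 elements along the reduced dimension the result is larger:
# 400 input bytes, 800 output bytes.
short_in  = input_bytes((4, 100))
short_out = reduced_bytes((4, 100), 1, Int64)
```

At exactly 8 elements along the reduced dimension the two sizes are equal; below that, the widened result is larger than the input.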

@timholy
Sponsor Member

timholy commented Aug 18, 2014

@StefanKarpinski that's how I felt too, but then I started changing my thinking in #8028 (comment). Naturally I still find your arguments pretty compelling. What do you think?

@StefanKarpinski
Sponsor Member

Why not just implement reductions with an arbitrary accumulator type and then default the accumulator type to Int or larger based on the array element type? I'm not clear on why the complexity of the internals of sum, et al. is relevant to what the default accumulator type should be.
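A minimal sketch of that proposal, in current Julia syntax, might look like the following. `default_acc` and `sum_as` are hypothetical names, not functions from Base, and the widening rule shown (at least word size, same signedness) is only one plausible reading of "Int or larger".

```julia
# Hypothetical rule for the default accumulator type: small signed and
# unsigned integers widen to the machine word; everything else is kept.
default_acc(::Type{T}) where {T<:Signed}   = sizeof(T) < sizeof(Int)  ? Int  : T
default_acc(::Type{T}) where {T<:Unsigned} = sizeof(T) < sizeof(UInt) ? UInt : T
default_acc(::Type{T}) where {T}           = T

# Reduction with an explicit accumulator type.
function sum_as(::Type{Acc}, A) where {Acc}
    acc = zero(Acc)
    for a in A
        acc += convert(Acc, a)
    end
    return acc
end

# Convenience method that picks the accumulator from the element type.
sum_as(A) = sum_as(default_acc(eltype(A)), A)

pixels = fill(UInt8(200), 4, 4)
widened = sum_as(pixels)          # accumulates in UInt
modular = sum_as(UInt8, pixels)   # explicitly requested mod-2^8 sum
```

The caller who really wants the modular result can still ask for it, while the default avoids the silent overflow.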

@tknopp
Contributor Author

tknopp commented Aug 18, 2014

Please note that I am also unsure what the right thing is for the reductions. When I prepared the PR, the aim was to remove all special casing to reach consistency. And type stability is indeed the main motivation behind all this.

I am also fine with questioning the whole idea again. Now that we have this branch, people can try it out and see whether they are happy with it.

@timholy
Sponsor Member

timholy commented Aug 18, 2014

No, we should definitely do this.

@StefanKarpinski, the reason I made those points is that there seemed to be some thought that it would be OK to merge these changes before fixing the reduction functions.

@tknopp
Contributor Author

tknopp commented Aug 18, 2014

I am fine with reverting the reduction changes for the moment. They were not even intentional: I had removed SmallInteger from int.jl and had to cope with that in reducedim.jl.

convert{T<:Signed64}(::Type{T}, x::Float32) = box(T,checked_fptosi(T,unbox(Float32,x)))
convert{T<:Signed64}(::Type{T}, x::Float64) = box(T,checked_fptosi(T,unbox(Float64,x)))
convert{T<:Unsigned64}(::Type{T}, x::Float32) = box(T,checked_fptoui(T,unbox(Float32,x)))
convert{T<:Unsigned64}(::Type{T}, x::Float64) = box(T,checked_fptoui(T,unbox(Float64,x)))
Sponsor Member

These have an oddity (shared with many BLAS methods): they don't specify quite what you intended, which could lead to incorrect answers or incorrect dispatch, because the type signature Type{TypeVar(:T, Signed64)} also matches any union-of-types Type. For example:

julia> Type{Union(Int8,Int16)} <: Type{TypeVar(:T,Base.SmallSigned)}
true

Instead, what you really want is:

typealias Signed64Types Union([Type{T} for T in Signed64.types]...)
convert{T<:Signed64Types}(::T, x::Float32) = ...

(this change also needs to be made to many of the BLAS/linalg functions in base – replacing Type with Array – to avoid dispatching to the wrong method)
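The same observation can be reproduced in current Julia syntax (`Union{...}` and `where` instead of the 0.3-era `Union(...)` and `TypeVar`). This is only a sketch: `SmallSigned`, `SmallSignedTypes`, `f` and `g` are hypothetical stand-ins for the aliases and `convert` methods discussed above.

```julia
# Hypothetical stand-in for a union alias such as Signed64.
const SmallSigned = Union{Int8, Int16}

# Signature in the style of the PR: T ranges over subtypes of the union,
# and the union itself is one of those subtypes.
f(::Type{T}) where {T<:SmallSigned} = "matched"

# Alternative in the style of the suggestion: enumerate the type objects.
const SmallSignedTypes = Union{Type{Int8}, Type{Int16}}
g(::SmallSignedTypes) = "matched"

# Both accept the concrete types.
f_concrete = f(Int8)
g_concrete = g(Int16)

# Only f accepts the union-of-types object.
f_union = f(Union{Int8, Int16})
g_rejects_union = try
    g(Union{Int8, Int16})
    false
catch err
    err isa MethodError
end
```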

Contributor Author


@vtjnash: Ok, I don't understand this. But this may be because I do not understand the difference between Int32 and Type{Int32}. Is the difference described in the manual?

Member


@vtjnash I don't think that way of specifying the type in convert is specific to BLAS. grep -r "::Type{T}" base gives quite a bit of output, so I'd say it has been pretty standard.

Is the problem you mention something you have seen in practice? The old, but wrong, way is a bit nicer to the eyes.

Sponsor Member


It is frequently correct. However, sometimes, such as when interacting with ccall or other IntrinsicFunctions, it is probably not. For example, the following method match was probably not expected by the author:

julia> @which convert(Union(Int32, Int64), float16(2))
convert{T<:Integer}(::Type{T<:Integer},x::Float16) at float16.jl:103

And the author almost certainly was not expecting this:

a = Array(Union(Float32,Float64),1)
@which copy!(a,1:1,a,1:1)
copy!{T<:Union(Float32,Complex{Float64},Float64,Complex{Float32}),Ti<:Integer}(dest::Array{T<:Union(Float32,Complex{Float64},Float64,Complex{Float32}),N},rdest::Union(UnitRange{Ti<:Integer},Range{Ti<:Integer}),src::Array{T<:Union(Float32,Complex{Float64},Float64,Complex{Float32}),N},rsrc::Union(UnitRange{Ti<:Integer},Range{Ti<:Integer})) at linalg/blas.jl:828

Generally, these eventually get caught somewhere else before turning into a ccall parameter.

@tknopp x::Int32 asserts that x is of type 32-bit integer. x::Type{Int32} asserts that x is Int32, the type of a thing that is a 32-bit integer.
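The distinction can be checked directly with `isa`, shown here in current Julia syntax. `describe` is a hypothetical function used only to show how the two annotations dispatch.

```julia
x = Int32(7)

# A value of type Int32 versus the type object Int32 itself.
value_is_int32      = x isa Int32             # true: x is a 32-bit integer
type_is_type_int32  = Int32 isa Type{Int32}   # true: Int32 is the type object
type_is_int32_value = Int32 isa Int32         # false: the type is not an integer
value_is_type       = x isa Type{Int32}       # false: 7 is not a type

# Dispatch follows the same distinction.
describe(::Int32)       = "a 32-bit integer value"
describe(::Type{Int32}) = "the Int32 type itself"
```

This is why `convert(Int32, x)` methods are written with a `::Type{Int32}` first argument: the caller passes the type object, not a value of that type.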

@tknopp
Contributor Author

tknopp commented Aug 21, 2014

@quinnj Could you elaborate on your comment in #7291 (comment) and say whether this PR is a step in the right direction? According to @ivarne, your tests pass on this branch.

@quinnj
Member

quinnj commented Aug 21, 2014

Sure, it basically came down to

promote_type(Uint8,Uint16) == Uint64

on 64-bit while on 32-bit

promote_type(Uint8,Uint16) == Uint32

While on your branch both promote to Uint16. Having a consistent return type just makes it easier to not get tripped up by WORD_SIZE differences in this case.
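On current Julia releases the promotion behaves the way this branch proposed, which can be checked with a short sketch (modern `UInt8`/`UInt16` spelling; the variable names are illustrative).

```julia
# Promotion of two small unsigned types picks the larger of the two,
# independent of the platform word size.
promoted = promote_type(UInt8, UInt16)

# Mixed arithmetic follows the same rule.
mixed = UInt8(1) + UInt16(1)

# Same-type arithmetic keeps the type.
same = Int8(1) + Int8(1)
```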

Additionally, these tests were kind of particular because I specifically needed to compare the promoting operations to Uint64, so 32-bit was failing on upcasting the values on comparison.

On another note, it seems like your branch hasn't incorporated the Dates module yet. It'd be great if you could rebase to make sure everything's happy over there, though I would imagine it'll be ok since everything gets pushed to Int64 pretty quickly.

@tknopp
Contributor Author

tknopp commented Aug 21, 2014

Thanks @quinnj. I had observed a similar issue when looking at the factorial code (see #6579 (comment)). When one has a pattern like

function my_func(A)
  x = zero(A[1])
  for a in A
      x += a
  end
  x
end

the function becomes type-unstable if an Array{Int16} is passed, because x starts as an Int16 but is widened by the first addition. One would thus have to write

function my_func(A)
  x = zero(A[1])+zero(A[1])
  for a in A
      x += a
  end
  x
end
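Because current Julia already preserves the type, the old behaviour has to be emulated to reproduce this instability. In the sketch below `widening_add` is a hypothetical stand-in for the pre-change rule (small integers widen to Int on addition), and `accumulate_with` is an illustrative variant of `my_func` that uses `zero(eltype(A))` so that empty input also works.

```julia
# Hypothetical emulation of the old rule: the result is always an Int.
widening_add(a::Integer, b::Integer) = Int(a) + Int(b)

# Variant of my_func with the addition passed in as a function.
function accumulate_with(add, A)
    x = zero(eltype(A))
    for a in A
        x = add(x, a)
    end
    return x
end

# With the widening rule the result type depends on whether the loop ran:
# Int16 for empty input, Int otherwise.
t_empty_old = typeof(accumulate_with(widening_add, Int16[]))
t_full_old  = typeof(accumulate_with(widening_add, Int16[1, 2]))

# With type-preserving + the result type is Int16 in both cases.
t_empty_new = typeof(accumulate_with(+, Int16[]))
t_full_new  = typeof(accumulate_with(+, Int16[1, 2]))
```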

convert{T<:SignedUpto64}(::Type{T}, x::Float32) = box(T,checked_fptosi(T,unbox(Float32,x)))
convert{T<:SignedUpto64}(::Type{T}, x::Float64) = box(T,checked_fptosi(T,unbox(Float64,x)))
convert{T<:UnsignedUpto64}(::Type{T}, x::Float32) = box(T,checked_fptoui(T,unbox(Float32,x)))
convert{T<:UnsignedUpto64}(::Type{T}, x::Float64) = box(T,checked_fptoui(T,unbox(Float64,x)))
Sponsor Member


cross-reference: #8028 (comment)

@JeffBezanson @StefanKarpinski any thoughts on this observation?

Contributor Author


I have partly understood the issue, but I think it is worth opening a new issue dedicated to this problem, especially if this pattern is used more often in base.

It would be great if that dedicated issue produced a recommendation on how to implement this: using the for loop with eval, or using what Jameson proposed in #8028 (comment)?

The motivation for changing these lines was actually to remove the eval, because I have read several messages by @vtjnash saying that eval should rarely be used.

@tknopp
Contributor Author

tknopp commented Aug 21, 2014

@quinnj I have rebased, and with Dates included the test suite still passes.

@JeffBezanson
Sponsor Member

superseded by #8420

Labels
kind:breaking This change will break code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Type changes in scalar computation
8 participants