Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Int64*Uint64 #9292

Closed
Keno opened this issue Dec 10, 2014 · 50 comments
Closed

Int64*Uint64 #9292

Keno opened this issue Dec 10, 2014 · 50 comments
Assignees
Labels
domain:maths Mathematical functions needs decision A decision on this change is needed
Milestone

Comments

@Keno
Copy link
Member

Keno commented Dec 10, 2014

This doesn't seem like the intended behavior. Are we accidentally trying to do the multiplication in Uint64?

julia> -1*uint64(1)
ERROR: InexactError()
 in * at promotion.jl:159
```?
@stevengj
Copy link
Member

promote_type(Int,UInt) == UInt.

I personally dislike this behavior, similar to the C behavior that unsigned*signed == unsigned (which is even worse — C unsigned is downright dangerous because there is no error if you multiply an unsigned integer by a negative one, it just wraps around to a positive result).

@StefanKarpinski
Copy link
Sponsor Member

@stevengj, what would you prefer? I'm pretty open to any kind of persuasive argument here and now is a good time to change this kind of thing.

@stevengj
Copy link
Member

I would prefer Signed * Unsigned = Signed. Argument: in ordinary math, negative * positive = negative. Deviating from this just to get an extra 1 bit of width in the case of positive values seems unpleasant to me.

(In the days of 16-bit integers, 1 bit was a big deal and maybe this was worth it. Nowadays, it seems a poor tradeoff.)

@stevengj
Copy link
Member

Google "mix unsigned signed C" for lots of examples of how following C's promotion rules leads to grief. As a result, a lot of people recommend that you just don't use unsigned types in C (google "don't use unsigned").

@Keno
Copy link
Member Author

Keno commented Dec 10, 2014

At the very least the current behavior is inconsistent:

julia> -1*uint32(1)
-1

@stevengj stevengj added the needs decision A decision on this change is needed label Dec 10, 2014
@ViralBShah
Copy link
Member

The argument presented earlier was that when you use unsigned integers, you have a good reason for doing so and you want them to persist.

I personally prefer Steve's suggestion here.

@Keno
Copy link
Member Author

Keno commented Dec 10, 2014

I think that argument is fine but it doesn't work well with the checked conversion from integers to unsigned integers.

@StefanKarpinski
Copy link
Sponsor Member

What I suggested elsewhere in passing was having Int * UInt = Int and checked for overflow. Thus, if you want to do unchecked arithmetic you'd have to explicitly cast one of the arguments, as in (a % UInt) * b or a * (b % Int). That's not too terribly inconvenient to write. The advantage I can see for choosing to return an Int rather than UInt is that it can be value-correct for smaller more common arguments, including when the signed argument is negative – at the cost of not being able to represent large positive products. I think that's a good tradeoff, personally.

@rfourquet
Copy link
Member

UInt64 are integers modulo 64 with positive representatives, so a positive Int64 can be directly interpreted as one of them, it's less clear for a negative Int64 (but I would prefer unchecked conversion if this is faster).
I like that promote_type(Int,UInt) == UInt but would want also promote_type(Int,UInt32) == UInt32.

@StefanKarpinski
Copy link
Sponsor Member

@stevengj, I googled "mix unsigned signed C" as you said and found a few relevant links:

This article http:https://critical.eschertech.com/2010/04/07/danger-unsigned-types-used-here/ phrases my argument quite nicely:

It is very easy to underflow the minimum value of an unsigned int (i.e. zero). It is much more difficult to underflow or overflow the minimum or maximum value of a signed int

So in general, I do agree that it is probably better to produced an Int for mixed UInt/Int arithmetic, but I'm not sure that most of the C horror stories are really relevant to the argument.

@StefanKarpinski
Copy link
Sponsor Member

I think that performance concerns are a bit of a false problem here – mixed type operations are not very common in high-performance code and if you really need that speed, you can just use a % Int to convert a UInt to Int with wrapping explicitly.

@StefanKarpinski
Copy link
Sponsor Member

Which would lead to a slightly philosophical question of why it's ok for Int/Int arithmetic to overflow but not Uint/Int arithmetic.

@toivoh
Copy link
Contributor

toivoh commented Dec 10, 2014

+1 for Int * UInt to produce an Int.

@stevengj
Copy link
Member

Where I have personally seen the current behavior as being undesirable is that often a Uint is used to represent the length of an array or a portion thereof, e.g. when an external C library returns a length as size_t. Suppose you want to access the array as A[offset + i * stride] where i = 0:length-1. i is a UInt if length is, so the code won't work if stride is negative. At least in Julia it will throw an exception. (In C, if you have 32-bit i and stride on a 64-bit machine, then the indices will wrap around to positive values and a segfault will ensue.)

(This came up in implementing FFTW, and is why we ended up using ptrdiff_t/intptr_t rather than size_t to represent transform sizes.)

@StefanKarpinski
Copy link
Sponsor Member

This seems to me to mostly be an argument for not using UInt to represent array lengths and offsets. As I've noted before, it has the unfortunate property that if you make a mistake somewhere and compute a negative length, there's no way to tell that you've erred. The very large value just has to be taken at face value and used, resulting in a segfault instead of an easy-to-check-for negative value. With 64-bit words, you really don't need all of those bits for array lengths or indices so giving up half of your range is not a big deal.

@nalimilan
Copy link
Member

Yet it's true, as in @stevengj's example, that external C libraries may use size_t for array lengths, in part for historical reasons.

@StefanKarpinski
Copy link
Sponsor Member

So far the main argument that seems compelling for mixed UInt/Int arithmetic producing Int is that it allows value-correctness for more common arguments – non-huge positive and negative values – whereas returning UInt makes value-correctness impossible for all negative arguments.

@StefanKarpinski
Copy link
Sponsor Member

That is, however, quite a good argument, imo.

@toivoh
Copy link
Contributor

toivoh commented Dec 10, 2014

Agreed. I feel there's an analogy with /(::Int, ::Int) producing a Float64 here - avoiding that functions produce unexpected results because you pad them an unexpected type of real. I guess the full analogy would include -(::UInt,::UInt) producing an Int.

@toivoh toivoh closed this as completed Dec 10, 2014
@toivoh toivoh reopened this Dec 10, 2014
@eschnett
Copy link
Contributor

When you are using uint, you often have bit masks etc. where any bit may be set. Explicit overflow checks don't make sense since these would slow things down (and prevent vectorization), and converting e.g. 0x8000000U to signed int doesn't make sense either.

When mixed-signedness operations always convert to signed, then bit mask operations look very cumbersome:

((mask - uint(1)) << uint(3)) | uint(3)

since every literal has to be converted to unsigned.

If you are worried about unsigned ints coming from C, and if you want to treat them like signed ints -- then you may as well convert them explicitly to signed ints.

@stevengj
Copy link
Member

stevengj commented Jan 6, 2015

@eschnett, I think it makes sense for bitwise operations (&, <<, |, $) to continue to promote to unsigned. To unset a bit I would normally recommend mask & ~0x1 rather than mask - 0x1, as the former does the correct thing even if the bit is already unset.

@eschnett
Copy link
Contributor

eschnett commented Jan 6, 2015

@stevengj I was thinking more of bitpos = uint(4); value &= (1<<bitpos)-1, i.e. the ``-1` would create a mask, not modify a value.

@eschnett
Copy link
Contributor

eschnett commented Jan 6, 2015

If a-b returns Int for UInt arguments, then (for symmetry) so should a+b. It would be quite surprising if a-b (for integers!) leads to a different result than a+(-b), in particular when people re-write expressions or introduce temporaries. Taking this argument to conclusion, a*b should also return Int.

No matter what convention, there's the danger of an overflow. What about simply disallowing all mixed, "dangerous" operations? Adding Int and UInt is probably rare, and people may as well have to explicitly specify what they expect to happen. If we add a special rule for literal constants (they can be used either with Int or UInt), then this should take all ambiguity and most of the inconvenience out of the game.

@StefanKarpinski
Copy link
Sponsor Member

No matter what convention, there's the danger of an overflow. What about simply disallowing all mixed, "dangerous" operations? Adding Int and UInt is probably rare, and people may as well have to explicitly specify what they expect to happen.

That's why I was suggesting that the results of all of these be Int but that they check for overflows. Although there is an argument to be made for disabling them altogether.

@eschnett
Copy link
Contributor

eschnett commented Jan 8, 2015

My proposal also included treating integer literals in a special way, probably similar to the way math constants are currently handled: UInt + Literal = UInt and Int + Literal = Int. Such a rule should cover most cases of Int + UInt, and the conversions from a literal to either Int or UInt can be checked at compile-time.

@StefanKarpinski StefanKarpinski added this to the 0.6.0 milestone Sep 13, 2016
@ararslan
Copy link
Member

ararslan commented Oct 5, 2016

I'm with @stevengj here. I think if anywhere in your calculation you have an idea of sign, that should be preserved.

@JuliaLang JuliaLang deleted a comment from nalimilan Jun 22, 2017
@stevengj
Copy link
Member

(deleted offtopic question from @udion)

I looked into implementing this, and I agree with @JeffBezanson's comment from #15489. Implementing this as a change to the promotion rules turns out to be a big mess, because it screws up bitwise manipulations (e.g. in float.jl) where you want e.g. unsigned << signed to be unsigned. It looks like it is best implemented as methods for *, + etc.

@rfourquet
Copy link
Member

if anywhere in your calculation you have an idea of sign, that should be preserved.

It's not totally true, as the simplest way to write a litteral is as an Int. And the same argument could be inversed: "if anywhere you have an idea of unsignedness, that should be preserved", my main use case being using unsigned numbers as bit fields, and incrementing an unsigned (x+1), or constructing more complex masks with multiplications, etc.

I see two main use cases:

  1. an unsigned appears in a computation, coming e.g. from a library (as a length for example), when an Int would be more appropriate: in this case it seems preferable to have UInt*Int -> Int (etc.).
  2. an unsigned is used on purpose in a computation: I would say mixed computations with signed integers should result in unsigned integers. If you have a class ModInt (cf. the examples directory), it wouldn't make sense to have Int*ModInt -> Int; and UInt precisely represents modular integers.

I would favor use case 2 (for use case 1, a conversion to Int can be explicitly written after a ccall to adapt to Julia style, like in "gmp.jl" for example, after all writing the ccall forces the user to know that an unsigned integer is being returned).

@StefanKarpinski
Copy link
Sponsor Member

Implementing this as a change to the promotion rules turns out to be a big mess, because it screws up bitwise manipulations (e.g. in float.jl) where you want e.g. unsigned << signed to be unsigned. It looks like it is best implemented as methods for *, + etc.

The result of a bitshift should have the same type as the left argument – where does promotion come into it? E.g.:

julia> 0x1 << 1
0x02

If there's somewhere we're not doing this, it should be fixed.

@stevengj
Copy link
Member

@StefanKarpinski, it turned out that the problem I was initially running into was not due to bitshift. But I still think we probably want UInt results for bitwise operators like |?

@StefanKarpinski
Copy link
Sponsor Member

This is the core "interesting" part of the promotion table:

Ts = sort!(Base.uniontypes(Base.BitInteger), by=T->(!(T<:Signed),sizeof(T)));

julia> for (i, T) in enumerate(Ts), (j, S) in enumerate(Ts)
           i < j || continue
           (T <: Signed)  (S <: Signed) || continue
           P = promote_type(T, S)
           @printf "%-7s + %-7s => %-7s\n" T S P
       end
Int8    + UInt8   => Int64
Int8    + UInt16  => Int64
Int8    + UInt32  => Int64
Int8    + UInt64  => UInt64
Int8    + UInt128 => UInt128
Int16   + UInt8   => Int64
Int16   + UInt16  => Int64
Int16   + UInt32  => Int64
Int16   + UInt64  => UInt64
Int16   + UInt128 => UInt128
Int32   + UInt8   => Int64
Int32   + UInt16  => Int64
Int32   + UInt32  => Int64
Int32   + UInt64  => UInt64
Int32   + UInt128 => UInt128
Int64   + UInt8   => Int64
Int64   + UInt16  => Int64
Int64   + UInt32  => Int64
Int64   + UInt64  => UInt64
Int64   + UInt128 => UInt128
Int128  + UInt8   => Int128
Int128  + UInt16  => Int128
Int128  + UInt32  => Int128
Int128  + UInt64  => Int128
Int128  + UInt128 => UInt128

@JeffBezanson, could you run this on your little 32-bit machine for the 32-bit version of the table?

@omus
Copy link
Member

omus commented Sep 20, 2017

@StefanKarpinski here's is what I get on 32-bit:

Int8    + UInt8   => Int32
Int8    + UInt16  => Int32
Int8    + UInt32  => UInt32
Int8    + UInt64  => UInt64
Int8    + UInt128 => UInt128
Int16   + UInt8   => Int32
Int16   + UInt16  => Int32
Int16   + UInt32  => UInt32
Int16   + UInt64  => UInt64
Int16   + UInt128 => UInt128
Int32   + UInt8   => Int32
Int32   + UInt16  => Int32
Int32   + UInt32  => UInt32
Int32   + UInt64  => UInt64
Int32   + UInt128 => UInt128
Int64   + UInt8   => Int64
Int64   + UInt16  => Int64
Int64   + UInt32  => Int64
Int64   + UInt64  => UInt64
Int64   + UInt128 => UInt128
Int128  + UInt8   => Int128
Int128  + UInt16  => Int128
Int128  + UInt32  => Int128
Int128  + UInt64  => Int128
Int128  + UInt128 => UInt128

@StefanKarpinski
Copy link
Sponsor Member

Here's the complete picture then:

type A type B 32-bit 64-bit
Int8 UInt8 Int32 Int64
Int8 UInt16 Int32 Int64
Int8 UInt32 UInt32 Int64
Int8 UInt64 UInt64 UInt64
Int8 UInt128 UInt128 UInt128
Int16 UInt8 Int32 Int64
Int16 UInt16 Int32 Int64
Int16 UInt32 UInt32 Int64
Int16 UInt64 UInt64 UInt64
Int16 UInt128 UInt128 UInt128
Int32 UInt8 Int32 Int64
Int32 UInt16 Int32 Int64
Int32 UInt32 UInt32 Int64
Int32 UInt64 UInt64 UInt64
Int32 UInt128 UInt128 UInt128
Int64 UInt8 Int64 Int64
Int64 UInt16 Int64 Int64
Int64 UInt32 Int64 Int64
Int64 UInt64 UInt64 UInt64
Int64 UInt128 UInt128 UInt128
Int128 UInt8 Int128 Int128
Int128 UInt16 Int128 Int128
Int128 UInt32 Int128 Int128
Int128 UInt64 Int128 Int128
Int128 UInt128 UInt128 UInt128

@StefanKarpinski
Copy link
Sponsor Member

StefanKarpinski commented Sep 21, 2017

One observation is that this is a place where the dependence on Int makes the behavior very hard to reason about across 32-bit and 64-bit systems. I would be appealing to make these rules independent of platform word size so that there's just a single table for this kind of thing. Certainly the cases where whether the result is signed or unsigned depends on the system word size seem pathological and we should at the very least avoid those.

@JeffBezanson
Copy link
Sponsor Member

Agreed that the cases where signedness depends on system word size are most ripe for revision.

@StefanKarpinski
Copy link
Sponsor Member

The only reasonable platform-independent rules that I can come up with are:

  1. Require explicit casting for mixed signedness operations.
  2. Promote to a signed type that is big enough to hold both inputs.
  3. Promote to a signed type that is as big as the larger of the two inputs.
  4. Promote to an unsigned type that is as big as the larger of the two inputs.

@StefanKarpinski
Copy link
Sponsor Member

One observation is that a common pattern to deal with mixed-signedness code is to put local type annotations on the variables and then let implicit conversion ensure that the types are right, so we should consider that pattern when choosing a behavior and ideally pick a design that allows the storage "locations" to be typed and from there let the compiler work things out and generate efficient, correct code.

@StefanKarpinski
Copy link
Sponsor Member

From triage, we've decided to give a try to option 4.

@StefanKarpinski
Copy link
Sponsor Member

Another option:

  1. Pick the type of the bigger argument, preferring the unsigned one if they're the same size.

@JeffBezanson JeffBezanson self-assigned this Sep 21, 2017
JeffBezanson added a commit that referenced this issue Sep 21, 2017
- for different-size arguments, the larger type wins
- otherwise the unsigned type wins
JeffBezanson added a commit that referenced this issue Sep 22, 2017
- for different-size arguments, the larger type wins
- otherwise the unsigned type wins
@JeffBezanson
Copy link
Sponsor Member

The remainder of this is part of #15489.

@StefanKarpinski
Copy link
Sponsor Member

See also #15489 and PR #23827 fixing it.

@StefanKarpinski
Copy link
Sponsor Member

StefanKarpinski commented Oct 14, 2021

Since there was a complaint today that not enough detail about the motivation for this decision was recorded, I thought I'd write a bit about it. There are annoying cases no matter which rule you pick, so we picked a rule that was simple to understand and works most of the time. We had previously tried to design rules so that the result value was always possible to represent in the promoted type, but that led to very complex rules that were quite hard for people to remember or reason about. So we picked a rule that was simple to understand and works most of the time:

  • The result type is the larger of the two argument types
  • If both argument types are the same size and differ in signedness, then the unsigned type wins

The thinking for the last rule is that since Int is the default integer type, if you went to the trouble of introducing UInt presumably you want to continue with that. There are cases where that’s not what you want, but there are also cases where that is what you want.

robertfeldt added a commit to robertfeldt/julia that referenced this issue Mar 11, 2023
…ly in signedness. This choice was explained by Karpinski in JuliaLang#9292 (comment) but wasn't fully clear in the documentation.
vtjnash pushed a commit that referenced this issue Mar 28, 2023
Clarify that the unsigned type is promoted to, if the types differ only
in signedness. This choice was explained by Karpinski in
#9292 (comment)
but wasn't fully clear in the documentation.
Xnartharax pushed a commit to Xnartharax/julia that referenced this issue Apr 19, 2023
Clarify that the unsigned type is promoted to, if the types differ only
in signedness. This choice was explained by Karpinski in
JuliaLang#9292 (comment)
but wasn't fully clear in the documentation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:maths Mathematical functions needs decision A decision on this change is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.