Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ranges terribly sensitive to floating point #2333

Closed
alanedelman opened this issue Feb 17, 2013 · 64 comments
Closed

ranges terribly sensitive to floating point #2333

alanedelman opened this issue Feb 17, 2013 · 64 comments
Milestone

Comments

@alanedelman
Copy link
Contributor

This one is kind of funny

julia> v=.1:.1:.3
0.1:0.1:0.2
@JeffBezanson
Copy link
Sponsor Member

We are using the formula floor((stop-start)/step)+1 to determine the length of the range. Is there a better one we should use? (in base/range.jl)

@JeffBezanson
Copy link
Sponsor Member

Ah yes:

julia> .1 + 2*.1
0.30000000000000004

So our current formula is consistent with how we compute the elements. I suppose we'd have to change both. I don't suppose it would be ok to adjust the endpoint to 0.30000000000000004.

@kmsquire
Copy link
Member

More specifically, the issue is here:

julia> a = (.3-.1)/.1
1.9999999999999998

Adding eps(x) works, but I'm not sure it's justified:

julia> eps(a)
2.220446049250313e-16

julia> a+eps(a)
2.0

@JeffBezanson
Copy link
Sponsor Member

In this particular case, it turns out you can get the exact values you want (0.1, 0.2, 0.3) by adjusting the step size to 0.09999999999999999. I don't know how this generalizes, but it seems promising.

@toivoh
Copy link
Contributor

toivoh commented Feb 17, 2013

I'm not sure that you can reasonably expect to avoid this kind of off-by-one errors for floating point ranges. The definition is intrinsically sensitive to roundoff errors.

@timholy
Copy link
Sponsor Member

timholy commented Feb 17, 2013

We are using the formula floor((stop-start)/step)+1 to determine the length of the range. Is there a better one we should use? (in base/range.jl)

How about floor((stop-start)/step + sqrt(eps(T)))+1?

@timholy
Copy link
Sponsor Member

timholy commented Feb 17, 2013

...or maybe you want to avoid the sqrt for performance reasons, in which case a better choice might be

nf = (stop-start)/step + 1
nf += abs(nf)*eps(T)
n = ifloor(nf)

@alanedelman
Copy link
Contributor Author

I'll bet most of you didn't know you could almost write a book on colon alone. :-)

There are two quesions for start:delta:stop

  1. How to compute the length and the truestop
  2. How to compute the numbers in between

Regarding 2 a few comments

Strategy a) [start+i_delta for i=0:n] <--- terrible strategy
Strategy b) Go halfway forwards from start and halfway backwards from stop
(middle may have an odd length -- and odd/even considerations)
Maybe MATLAB does this
Strategy c) Figure out how to compute the closest floating point number to
start + (i/n)_(stop-start) for all numerical ranges
so start and stop are in the sequence without any rounding
[I prefer c]

Regarding 1

You'll have to decide whether to preserve the user's stop without roundng,
or to modify it drastically == depending on whether stop "snaps" to the grid
well enough.
I'll probably regret this as not being good enough but as something quick and dirty
I'd compute

nf = (stop-start)/step + 1

and

n=round(nf)

if abs(n-nf) < eps(n) (* 3 or 4 or 5)

I would declare stop as official and not to be modified. Otherwise, I would
use start + (n-1)*delta as the official stop

@StefanKarpinski
Copy link
Sponsor Member

I think that dismissing [start+i*delta for i=0:n] as a terrible strategy is a bit undue. It has the enormous advantage of being simple, easy to understand, and fast.

@JeffBezanson
Copy link
Sponsor Member

Excellent, thank you.
In our representation the degrees of freedom are the step and the length, so I'm trying this logic:

    nf = (stop-start)/step + 1
    n = round(nf)
    if abs(n-nf) < eps(n)*3
        step = (stop-start)/(n-1)
    end

@stevengj
Copy link
Member

In the case where you modify step, you also might want to change the rule to i < (n-1)-i ? start+i*step : end - ((n-1)-i)*step, according to Alan's strategy 2(b). The main purpose of this seems to be to ensure that the final point is exactly end. A slightly simpler rule for that purpose would be i == n-1 ? end : start + i*step.

Also, shouldn't the rule use n = floor(nf), so that you never step beyond end (plus roundoff)? e.g. length(1:1:1.8) should be 1, not 2.

JeffBezanson added a commit that referenced this issue Feb 17, 2013
@JeffBezanson
Copy link
Sponsor Member

Ok, I pushed a new implementation that is probably no worse than the old one. I think the best thing to do now is find more test cases, and if the behavior for any of them seems bad enough I can try further improvements.

@stevengj
Copy link
Member

Should be something like:

if abs(n-nf) < eps(n)*3
    step = (stop-start)/(n-1)
else
    n = floor(nf)
end

I think. Your current patch gives

julia> [1:1:1.8]
2-element Float64 Array:
 1.0
 2.0

@stevengj
Copy link
Member

Current patch still doesn't necessarily preserve the endpoint specified by the user:

julia> 1:1/49:27
1.0:0.02040816326530612:26.999999999999996

Note that there is no step::Float64 such that 1 + step*(length(1:1/49:27)-1) == 27, so the algorithm start + step*i is fundamentally flawed if one wants to honor the user's endpoints.

@timholy
Copy link
Sponsor Member

timholy commented Feb 17, 2013

I don't think it should be round, because you can get beyond the end point in
a non-roundoff way very easily.

On Sunday, February 17, 2013 12:35:42 PM Jeff Bezanson wrote:

Excellent, thank you.
In our representation the degrees of freedom are the step and the length, so
I'm trying this logic:

    nf = (stop-start)/step + 1
    n = round(nf)
    if abs(n-nf) < eps(n)*3
        step = (stop-start)/(n-1)
    end

Reply to this email directly or view it on GitHub:
#2333 (comment)

@timholy
Copy link
Sponsor Member

timholy commented Feb 17, 2013

Oh shoot, I should have checked for updates, @stevengj made the same point already.

@timholy
Copy link
Sponsor Member

timholy commented Feb 17, 2013

I do think all this if...else logic can be skipped with the version I proposed, which (slightly improved) is just two lines:

nf = (stop-start)/step + 1
n = ifloor(nf + 3*eps(nf))

@JeffBezanson
Copy link
Sponsor Member

The step adjustment does help in cases where a good step value exists, though. When no good step value exists we're no worse than before.

@timholy
Copy link
Sponsor Member

timholy commented Feb 17, 2013

The end point is going to be start + (n-1)*step, where n is an integer, no matter what. Since the only fiddling of n is by +/-1, this is not going to help with issues at the level of machine precision.

I'm pretty sure the only role for the adjustment is to save you from mistakes due to the usage of round (e.g., @stevengj 's 1:1:1.8 example). But if you don't use round in the first place, there is nothing you need to correct.

@JeffBezanson
Copy link
Sponsor Member

If start=.1;step=.1;stop=.3, and we determine that n=3 but don't adjust the step, then start + 2*step is 0.30000000000000004.

@timholy
Copy link
Sponsor Member

timholy commented Feb 17, 2013

You're talking about 1 eps() away, and in general you can't hit it precisely on that level. A one-liner for adjusting the step would be

step -= sign((start+(n-1)*step) - stop)*eps(step)  # adjusts step by 1 eps()

which happens to work for the original example:

julia> start=0.1
0.1

julia> stop = 0.3
0.3

julia> step = 0.1
0.1

julia> nf = (stop-start)/step
1.9999999999999998

julia> n = ifloor(nf+3*eps(nf))+1
3

julia> start + (n-1)*step
0.30000000000000004

julia> step -= sign((start+(n-1)*step) - stop)*eps(step)
0.09999999999999999

julia> start + (n-1)*step
0.3

However, even this doesn't solve @stevengj's second example:

julia> start = 1.0
1.0

julia> stop = 27.0
27.0

julia> step = 1/49
0.02040816326530612

julia> nf = (stop-start)/step
1274.0

julia> n = ifloor(nf+3*eps(nf))+1
1275

julia> start + (n-1)*step
26.999999999999996

julia> step -= sign((start+(n-1)*step) - stop)*eps(step)
0.020408163265306124

julia> start + (n-1)*step
27.000000000000004

So adjusting step by 1 eps(), the minimum possible, still might not hit the endpoint exactly.

@JeffBezanson
Copy link
Sponsor Member

I understand that as long as we use start + (i-1)*step we will not always be able to hit the endpoint exactly. Until we decide to change that, a 1-ulp adjustment that gives a better answer in at least some cases is better than nothing.

@StefanKarpinski
Copy link
Sponsor Member

This is crazy. We should not be changing the step that is given because we feel like it. The problem with the original example is that 1/10 cannot be represented exactly, but it is one of the real values that the binary number 0.1 could be used to represent. So it would be reasonable to infer that what the user wants is 1/10 rather than the closest representable float. But doesn't involve changing the step value – it involves inferring which of the possible step values that a floating point number "represents" was really intended.

@JeffBezanson
Copy link
Sponsor Member

No, this is based on the idea that the end value should be taken more seriously than the step value. In general, the step value is not observed directly, but the start and end values are. We're not changing the step because we feel like it, but to try to hit the endpoint exactly:

julia> [.1:.1:.3]
3-element Float64 Array:
 0.1
 0.2
 0.3

I would argue that's better than what we did before.

@timholy
Copy link
Sponsor Member

timholy commented Feb 18, 2013

Regardless of the other issues, I would argue that 3 lines are better than 13.

@JeffBezanson
Copy link
Sponsor Member

@timholy no argument there, certainly. Unfortunately we still need the special case for FloatingPoint, since for example Rational does not have eps.

I think we need some more examples.

I wouldn't want to change the step unconditionally, since in cases like 1.1:1.3:3 where we can't hit the endpoint no matter what, we might as well use the given step. But I'm still not sure what the best overall definition is.

@pao
Copy link
Member

pao commented Feb 18, 2013

Guido van Rossum asked for an algorithm for this a couple of years ago. It's telling that it was intended as a trick question, and he considers linspace() to be the correct answer.

@JeffBezanson
Copy link
Sponsor Member

We should probably make a bigger deal about the fact that you can avoid these issues using Range(start, step, length).

Python seems to be making its life a bit harder by excluding the end point from ranges. If the last returned value is supposed to be < stop, is a value 1 ulp less than stop "legitimately" less than it, or was the user expecting a range with fewer elements? It's hard to say whether [0, 0.7, 1.4, 2.0999999999999996] or [0, 0.7, 1.4] is the better answer. But if the end point is to be included, then it's clear the first answer is better, and ending with 2.1 instead would have been slightly better still. So I think specifying that the end point can be included also makes the API easier to use.

@StefanKarpinski
Copy link
Sponsor Member

You're not really late as we haven't implemented anything in response to this. I've been considering how best to do it. My current thought is to have a FloatRange type that stores A, B, and n and d and then define ref something like this:

function ref(r::FloatRange, k::Int)
    k -= 1; (0 <= k <= r.n) || error(BoundsError)
    ((r.n-k)*r.A + k*r.B)/r.d
end

However, four numbers is a lot of state to keep track of here, so I'm not entirely sure. That also leaves unclear how to handle the [ a+k*s for k=0:n ] case, but I suspect that one's not all that hard.

@timholy
Copy link
Sponsor Member

timholy commented Aug 6, 2013

Possibly-related:

julia> 1.0:1.0:1.0
ERROR: Range: step cannot be NaN
 in Range at range.jl:13

as reported on julia-users.

@GunnarFarneback
Copy link
Contributor

Corner case. The lines

            if abs(n-nf) < eps(n)*3
                # adjust step to try to hit stop exactly                                                                                                                       
                step = (stop-start)/(n-1)
                len = itrunc(n)

make no sense when n = 1. Fixed by pull request #3961.

@StefanKarpinski
Copy link
Sponsor Member

Just to leave a record here, these are some good tricky test cases for (a,b,s):

frange(0.1, 0.3, 0.1)
frange(0.0, 0.3, 0.1)
frange(0.3, -0.1, -0.1)
frange(0.1, -0.3, -0.1)
frange(1.0, 27.0, 1/49)[end]
frange(0.0, 2.1, 0.7)
frange(0.0, 5.5, 1.0)
frange(prevfloat(0.1), 0.3, 0.1)
frange(nextfloat(0.1), 0.3, 0.1)
frange(prevfloat(0.0), 0.3, 0.1)
frange(nextfloat(0.0), 0.3, 0.1)
frange(0.1, prevfloat(0.3), 0.1)
frange(0.1, nextfloat(0.3), 0.1)
frange(0.0, prevfloat(0.3), 0.1)
frange(0.0, nextfloat(0.3), 0.1)
frange(0.1, 0.3, prevfloat(0.1))
frange(0.1, 0.3, nextfloat(0.1))
frange(0.0, 0.3, nextfloat(0.1))
frange(0.0, 0.3, prevfloat(0.1))

@StefanKarpinski
Copy link
Sponsor Member

This is a fairly remarkable frange algorithm:

function frange(a,b,s)
    @show m = 1/s
    @show A = m * a
    @show n = floor(m*b - A)
    @show A = (A + n) - n
    @show m = a + s == 0 ? A/a : (A+1)/(a+s)
    @show A = m * a
    [ (A + k)/m for k = 0:n ]
end

@assert frange(0.1, 0.3, 0.1) == [1:3]./10
@assert frange(0.0, 0.3, 0.1) == [0:3]./10
@assert frange(0.3, -0.1, -0.1) == [3:-1:-1]./10
@assert frange(0.1, -0.3, -0.1) == [1:-1:-3]./10
@assert frange(0.0, 1.0, 0.1) == [0:10]./10
@assert frange(0.0, 1.0, -0.1) == []
@assert frange(0.0, -1.0, 0.1) == []
@assert frange(0.0, -1.0, -0.1) == [0:-1:-10]./10
@assert frange(1.0, 27.0, 1/49) == [49:1323]./49
@assert frange(0.0, 2.1, 0.7) == [0:7:21]./10
@assert frange(0.0, 5.5, 1.0) == [0:10:55]./10
@assert frange(0.0, 0.5, -1.0) == []
@assert frange(0.0, 0.5, 1.0) == [0.0]

@assert frange(prevfloat(0.1), 0.3, 0.1) == [prevfloat(0.1), 0.2, 0.3]
@assert frange(nextfloat(0.1), 0.3, 0.1) == [nextfloat(0.1), 0.2]
@assert frange(prevfloat(0.0), 0.3, 0.1) == [prevfloat(0.0), 0.1, 0.2, 0.3]
@assert frange(nextfloat(0.0), 0.3, 0.1) == [nextfloat(0.0), 0.1, 0.2, 0.3]
@assert frange(0.1, prevfloat(0.3), 0.1) == [0.1, 0.2]
@assert frange(0.1, nextfloat(0.3), 0.1) == [0.1, 0.2, 0.3]
@assert frange(0.0, prevfloat(0.3), 0.1) == [0.0, 0.1, 0.2]
@assert frange(0.0, nextfloat(0.3), 0.1) == [0.0, 0.1, 0.2, 0.3]
@assert frange(0.1, 0.3, prevfloat(0.1)) == [0.1, 0.2, 0.3]
@assert frange(0.1, 0.3, nextfloat(0.1)) == [0.1, 0.2]
@assert frange(0.0, 0.3, prevfloat(0.1)) == [0.0, 0.1, 0.2, 0.3]
@assert frange(0.0, 0.3, nextfloat(0.1)) == [0.0, nextfloat(0.1), nextfloat(0.2)]

It still has some issues with non-integral m values, however, so I'm not entirely satisfied yet, but it's getting there.

@StefanKarpinski
Copy link
Sponsor Member

Ok, this version passes every test I can throw at it:

function rat(s)
    b, d, y = zero(s), one(s), abs(s)
    while true
        f = trunc(y); y -= f
        d, b = b, f*b + d
        a = round(b*s)
        (y == 0 || a/b == s) && return a, b
        y = inv(y)
    end
end

function frange(a,b,s)
    x = (b-a)/s
    A, n, d, N, r = a, s, one(s), floor(x), round(x)
    if prevfloat(r) <= x <= nextfloat(r)
        T, D = rat(s)
        if (D*b - D*a)/r/D == s
            A, n, d, N = D*a, T, D, r
        end
    end
    [ (A+k*n)/d for k = 0:N ]
end

@assert frange(0.1, 0.3, 0.1)   == [1:3]./10
@assert frange(0.0, 0.3, 0.1)   == [0:3]./10
@assert frange(0.3, -0.1, -0.1) == [3:-1:-1]./10
@assert frange(0.1, -0.3, -0.1) == [1:-1:-3]./10
@assert frange(0.0, 1.0, 0.1)   == [0:10]./10
@assert frange(0.0, 1.0, -0.1)  == []
@assert frange(0.0, -1.0, 0.1)  == []
@assert frange(0.0, -1.0, -0.1) == [0:-1:-10]./10
@assert frange(1.0, 27.0, 1/49) == [49:1323]./49
@assert frange(0.0, 2.1, 0.7)   == [0:7:21]./10
@assert frange(0.0, 3.3, 1.1)   == [0:11:33]./10
@assert frange(0.1, 3.4, 1.1)   == [1:11:34]./10
@assert frange(0.0, 3.9, 1.3)   == [0:13:39]./10
@assert frange(0.1, 4.0, 1.3)   == [1:13:40]./10

@assert frange(0.0, 5.5, 1.0)   == [0:10:55]./10
@assert frange(0.0, 0.5, -1.0)  == []
@assert frange(0.0, 0.5, 1.0)   == [0.0]

@assert frange(prevfloat(0.1), 0.3, 0.1) == [prevfloat(0.1), 0.2, 0.3]
@assert frange(nextfloat(0.1), 0.3, 0.1) == [nextfloat(0.1), 0.2]
@assert frange(prevfloat(0.0), 0.3, 0.1) == [prevfloat(0.0), 0.1, 0.2, 0.3]
@assert frange(nextfloat(0.0), 0.3, 0.1) == [nextfloat(0.0), 0.1, 0.2, 0.3]
@assert frange(0.1, prevfloat(0.3), 0.1) == [0.1, 0.2]
@assert frange(0.1, nextfloat(0.3), 0.1) == [0.1, 0.2, nextfloat(0.3)]
@assert frange(0.0, prevfloat(0.3), 0.1) == [0.0, 0.1, 0.2]
@assert frange(0.0, nextfloat(0.3), 0.1) == [0.0, 0.1, 0.2, nextfloat(0.3)]
@assert frange(0.1, 0.3, prevfloat(0.1)) == [0.1, 0.2, 0.3]
@assert frange(0.1, 0.3, nextfloat(0.1)) == [0.1, 0.2]
@assert frange(0.0, 0.3, prevfloat(0.1)) == [0.0, prevfloat(0.1), prevfloat(0.2), 0.3]
@assert frange(0.0, 0.3, nextfloat(0.1)) == [0.0, nextfloat(0.1), nextfloat(0.2)]

Some comments:

  1. It defaults to the plain [ a + k*s for k = 0:N ] behavior by setting n = s and d = 1; this is what happens if (b-a)/s isn't within 1 ULP of an integer.
  2. It attempts to "lift" the iteration to a level where the step is integral by rationalizing s, using a customized rationalize function.
  3. If lifted form matches the given (a,b,s) tuple, then you can do the iteration lifted, stepping by an integer amount and projecting back down to get a perfect floating point range. Otherwise, it falls through to the default behavior.
  4. Once the parameters A, N, n, d are determined, iteration and indexing are quite simple: ref(r,k) = (r.A+k*r.n)/r.d; the N parameter is only needed to know the length of the range.

@JeffBezanson JeffBezanson added this to the 0.3 milestone Feb 17, 2014
StefanKarpinski added a commit that referenced this issue Feb 21, 2014
This addresses the core behvaioral problems of #2333 but doesn't yet
hook up the colon syntax to constructing FloatRange objects [#5885].
StefanKarpinski added a commit that referenced this issue Feb 23, 2014
This addresses the core behvaioral problems of #2333 but doesn't yet
hook up the colon syntax to constructing FloatRange objects [#5885].
StefanKarpinski added a commit that referenced this issue Feb 24, 2014
This addresses the core behvaioral problems of #2333 but doesn't yet
hook up the colon syntax to constructing FloatRange objects [#5885].
StefanKarpinski added a commit that referenced this issue Feb 24, 2014
FloatRange: new type for accurate floating-point ranges [fix #2333]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants