
Make LinSpace generic #18777

Merged
merged 2 commits into master from teh/linspace on Jan 26, 2017

Conversation

@timholy commented Oct 3, 2016

This is my proposed alternative to #18492 for fixing the issues with LinSpace. The main attractions of this PR are laid out below.

Given the large amount of work & discussion that has gone into linspace and friends with respect to numerical precision (at least #2333, #9637, #18492), it would be remiss not to comment (extensively 😉) on how this does and does not address previous concerns or implementations. Let me address the issues one-by-one.

Mind-reading

First and foremost, this implementation does not try to read your mind, whereas the implementation currently in master does. The easiest way to see what I mean is to compare master's behavior:

julia> r = linspace(0.0, 0.3, 4)
4-element LinSpace{Float64}:
 0.0,0.1,0.2,0.3

julia> r[2]
0.1

with the behavior of this PR:

julia> r = linspace(0, 0.3, 4)
4-element LinSpace{Float64}:
 0.0,0.1,0.2,0.3

julia> r[2]
0.09999999999999999

Note that, given the specified endpoints, that's the correct answer:

julia> 0.3/3
0.09999999999999999

julia> Float64(BigFloat(0.3)/3)
0.09999999999999999

The version in master returns 0.1 essentially because it interprets the right endpoint 0.3 as 3//10. This is what I refer to as mind-reading, and while it has some undeniable attractions, it's (1) only applicable to special types (e.g., Float32/64), (2) fairly fragile (it's the origin of #14420), and (3) arguably wrong even in principle (the user said 0.3, not 3//10).

Note that, now, people who want mind reading can be told to use Rationals directly:

julia> r = linspace(0, 3//10, 4)
4-element LinSpace{Rational{Int64}}:
 0//1,1//10,1//5,3//10

julia> r[2]
1//10

Precision

All that said, there's still a valid concern about precision: linear interpolation is famously subject to roundoff error. This implementation returns results with full numeric precision by using two calls to fma. The key function (which other types can specialize) is:

# High-precision interpolation. Accurate for t ∈ [0.5,1], so that 1-t is exact.
function lerpi{T}(j::Integer, d::Integer, a::T, b::T)
    t = j/d
    # computes (1-t)*a + t*b
    T(fma(t, b, fma(-t, a, a)))
end

As long as j >= d/2, it turns out you don't need to compute t to higher precision for this to return a numerically precise result (provided you use fma).

All the (new) tests compare the result against what happens when you take the two endpoints, convert them to BigFloats, perform all interpolation at BigFloat precision, and then cast back to the element type. That's what I mean by "full precision" results---there is no roundoff error to within the precision of the numerical type, as determined by the inputs upon construction.
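
To make that criterion concrete, here is a minimal sketch of the reference computation for floating-point element types (this is not the PR's actual test code, and lerpi_bigref is a made-up name):

# Reference: interpolate entirely in BigFloat, then round back to T at the end.
function lerpi_bigref{T}(j::Integer, d::Integer, a::T, b::T)
    t = BigFloat(j)/d
    T((1-t)*BigFloat(a) + t*BigFloat(b))
end

# "Full precision" then means lerpi(j, d, a, b) agrees with
# lerpi_bigref(j, d, a, b) up to roundoff in T.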

Test changes and deletions

First, I have to say that whoever wrote the old tests (mostly Stefan?) did an amazing job. While my final implementation is dirt-simple, I confess it was not the first (or second, or third...) thing I tried; it was very demanding to find an implementation that could pass the "good" ones (spelled out below). The extensive tests saved me from many implementations that seemed equivalent but ran into trouble in corner cases. 💯

That said, anyone who looks at this PR will see many test changes and a few deletions, because the new implementation does not pass many of our old tests. For the vast majority of these I am unconcerned and changed them guilt-free, because most of these turned out to rely on mind-reading: in particular, they don't pass the "convert endpoints to BigFloat before performing operations" test. Even the so-called "additive identity" tests (e.g., (r - 1) + 1) fall in that category when combined with collect, due to the non-associativity of floating-point addition. (The old tests happened to pass for the provided inputs, but it doesn't take long at all to discover that tweaking the endpoints in those tests, in just about any way, breaks them. So we were testing for an "identity" that doesn't actually hold in general.)

However, a few of the test deletions make me sad or are worthy of public confession:

  • Behavior at u = eps(0.0): most tests pass, but ones that specifically test for generation of -0.0 don't. I'm basically unbothered by this---we have several concerns about -0.0 anyway (e.g., Should zero(::Float) = -0.0? #18341). If you think of it as a bug, it's probably best thought of as an fma bug, not a problem with this implementation of linspace.
  • linspace(-3u,3u,7) doesn't pass. I'd rather this one passed, because it's seemingly unrelated to nonsense like -0.0. I suspect this is an fma bug, so I kept the test and marked it @test_skip. Note this test passes if u = 2*eps(0.0), so the failure is specific to the smallest-magnitude floating point number.

Performance

I'm hoping that on machines with CPU support for fma, this will be competitive with our current implementation. fma support was introduced around 2012-2013, so there may be a good number of machines that don't have it. Without hardware support, this ends up ccalling C code for the fma and is therefore glacially slow.

If we are concerned about performance on older machines, one potential fix would be to Dekker-split the two endpoints upon construction; then you'd only have to split t at runtime, and with some additional tweaks that take advantage of the symmetry of the problem you might be able to largely close the gap.
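
For concreteness, the splitting step referred to here is the classic Veltkamp/Dekker split; a minimal sketch (dekker_split is a made-up name, not part of this PR):

# Veltkamp split: x == hi + lo exactly, with hi and lo each carrying few enough
# significand bits that products of such halves are exact in Float64 (the
# building block of Dekker's double-double arithmetic).
function dekker_split(x::Float64)
    c  = 134217729.0*x        # 134217729 = 2^27 + 1
    hi = c - (c - x)
    lo = x - hi
    hi, lo
end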

We might also consider adding a @fastmath specialization.

But before getting too speculative about performance, let's see what @nanosoldier runbenchmarks(ALL, vs=:master) thinks.

Future changes to Ranges

One source of concern is that LinSpace and FloatRange now have different behavior. I think the right way to handle this is to cause start:step:stop to return a LinSpace when any of those entries are AbstractFloats, and move FloatRange to a package. That way there won't be any inconsistencies. StepRange can still exist, only now (with this more generic LinSpace) I think it can safely be confined to Integer (thus fixing all kinds of problems like those I'm attempting to address in #18744).

@timholy commented Oct 3, 2016

Oops, put my quotes in the wrong place. @nanosoldier runbenchmarks(ALL, vs=:master).

@yuyichao commented Oct 3, 2016

I think nanosoldier runs on Haswell hardware. Does it compile julia with a generic arch that disables FMA or does it compile with native arch? @jrevels

@timholy commented Oct 3, 2016

Argh, a last-minute untested tweak introduced a build bug. I'll re-launch nanosoldier once CI passes.

@jrevels commented Oct 3, 2016

No, it doesn't disable FMA AFAIK. I'm assuming that would be an option/flag I'd have to enable before running make (which is all nanosoldier does).

Also, for when you relaunch: runbenchmarks(ALL, vs=:master) --> runbenchmarks(ALL, vs=":master") Nanosoldier picked up your other submission, but rejected it because the vs argument wasn't valid. I don't know why it didn't throw an invalid job error, though - it should have...

@_inline_meta
t = j/d
# computes (1-t)*a + t*b
T(fma(t, b, fma(-t, a, a)))

A reviewer (contributor) commented on this code:

I'd put T(j/d) and remove the T here.

@timholy (author) replied:

That will fail if a, b = rand(5), rand(5).

@timholy (author) added:

To address the "confused": the point being this is the fallback implementation, so should support any type that implements fma. T might not be a number. Or, T might be a number type like Fixed{Int8,7} (from FixedPointNumbers) which can't represent 1. Linear interpolation between two valid numbers is still perfectly fine in such circumstances, though, so there's no need to fail. That's why I'm not going to take your suggestion 😄.

@musm commented Oct 3, 2016

Unfortunately, non-hardware fma is about 40 times slower on my machine. Also, anyone who downloads the Windows binaries will not have access to hardware fma unless they manually build a system image, even if their CPU supports it.

@timholy commented Oct 3, 2016

Yeah, it's really bad without the hardware acceleration. I suspect the approach I outlined would make things hugely better, but it's hard to know what the final result would be.

Let's see whether enough people like the overall direction, though, before deciding to tackle the performance question.

@timholy commented Oct 3, 2016

Though cross-linking to JuliaMath/DoubleDouble.jl#18 in case anyone gets excited about tackling the performance question now 😄.

@timholy commented Oct 3, 2016

Most of this is passing, so let's @nanosoldier runbenchmarks(ALL, vs=":master"). (Thanks @jrevels, I was going from memory but I must be out of practice!)

Looks like we have a problem on i686 in the nearly-overflowing-numbers tests. Why would overflow result in an InexactError, when on x86_64 it just gives you NaN? Perhaps this is an fma bug?

@StefanKarpinski commented Oct 3, 2016

That was indeed me who wrote the linspace tests (and float ranges tests). I'm glad they were useful. It's basically impossible to get this stuff working without a very thorough test suite – including some randomly generated test cases, since you can't (or at least I couldn't) think of all the nasty things that can actually happen. I agree that the cases that are "broken" by this PR are marginal enough to ignore – and maybe we can fix fma somehow to avoid them.

I'm sold on this change except the last part about a:s:b syntax returning a LinSpace for floating point. Occasionally, when someone writes a:s:b, we have the pleasant situation that (b-a)/s is exactly an integer, n, and a + n*s == b, also exactly. In that case, we could indeed return linspace(a,b,n). But that's an excruciatingly rare case. Often (b-a)/s is not an integer, and even when it is, sometimes a + n*s is not exactly equal to b. In such cases, there are two possibilities:

  • The user has some real numbers A, B and S in mind such that (B-A)/S is exactly an integer, n, and therefore, A + n*S = B, exactly, and convert(T,A) == a, convert(T,B) == b and convert(T,S) == s (using Julia syntax as if real numbers were somehow Julia values).
  • The user has no such real numbers in mind and simply intends for you to do the naive thing and produce the values a, a + s, a + 2s, ..., a + n*s for n maximal such that a + n*s ≤ b.

The "no guessing" version of a:s:b is to ignore the former case and do the naive thing all the time – i.e. return a + (i-1)*s for the ith element of a:s:b. We used to do that. TL;DR: that behavior is so broken that we would be better served to simply remove a:s:b for floating point numbers altogether. An example, in case it's not clear:

julia> showall(collect(0:6)*0.1)
[0.0,0.1,0.2,0.30000000000000004,0.4,0.5,0.6000000000000001]

With naive behavior collect(0:0.1:0.6) would be [0.0,0.1,0.2,0.30000000000000004,0.4,0.5] – the collection is too short by a value (0.6000000000000001 gets excluded because it's too big) and has an "off" value in the middle – 0.30000000000000004. There's an endless supply of these broken cases.
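
Concretely, the naive behavior amounts to something like the following sketch (naive_range is a made-up name, not Base code):

# The i-th element is a + (i-1)*s, stopping at the largest n with a + n*s <= b.
function naive_range(a, s, b)
    n = 0
    while a + (n+1)*s <= b
        n += 1
    end
    [a + i*s for i = 0:n]
end

# naive_range(0.0, 0.1, 0.6) == [0.0, 0.1, 0.2, 0.30000000000000004, 0.4, 0.5]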

The only other option is to guess at values of A, B and N, and by implication S = (B-A)/N, such that

  • convert(T,A) == a
  • convert(T,B) == b
  • convert(T,(B-A)/N) == s

That's what my "mind reading" code does – it finds rational values that project down onto a, b and s when converted to floating point. If it doesn't find any such values, it gives up and does the naive a + (i-1)*s iteration – which is sometimes exactly what the user intended.

People are uneasy about this "guessing" – I suspect, largely because they don't understand what it's doing or why, not because it's doing something they don't like. (There haven't been any complaints about a:s:b not doing what someone wanted in a long while.) So let's consider what the potential failure modes of this behavior are:

  • The user actually wanted naive behavior and accidentally chose a, b and s such that lifting just happened to work. No one has ever reported this. I would be shocked (shocked, I tell you!) if anyone ever encountered such a case.
  • The user has some A, B and N in mind that are too complicated for the algorithm to figure out. This does happen sometimes, but it's really rare. The current algorithm figures out any reasonable rational-valued endpoints, so the cases that it misses tend to be ones where a rational range is shifted by a random amount – i.e. they wrote o+a:s:o+b where o is not a nice even rational value but a, b and s are. In these cases, the user is best advised to use linspace or to rewrite the range as o+(a:s:b).
  • The user had a different A, B and N in mind than the ones the algorithm figured out. This is plausible, but again has never actually been reported. I haven't been able to construct a plausible case of this either. Keep in mind that for many such cases there's no discernible difference, since the set of values yielded by the resulting range object is the same.

The current algorithm, although it's technically guessing, is justified in doing so – because some amount of guesswork is inherent in the a:s:b syntax. Matlab also guesses, for what it's worth – it's just a lot less principled about how it guesses. So if we're going to change the behavior of a:s:b to something else, then the only viable option in my view is to remove it for floats entirely.

@nanosoldier:

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@timholy commented Oct 4, 2016

OK, I had temporarily forgotten about cases like 1:0.5:3.75, and with that one example I agree it's a no-brainer that something like FloatRange/StepRange is here to stay---indeed, they're perfectly aligned with the maxim I'm promoting here, which is to respect the user's parametrization (which is in terms of start, step, and stop rather than start, stop, length).

So I'd amend my statement about x:y:z returning a LinSpace (you're absolutely right, it definitely should not), but perhaps range(start, step, len) should? I tend to think that when in conflict, length is more "primary" than step? And in this case the two are commensurate: up to roundoff error, this API does not permit disagreement among the parameters.

Along these lines: I do suspect something in the FloatRange/StepRange part of the world will also have to change. Right now, StepRange is "mostly" generic (the exception being that you can't use it for AbstractFloat) whereas FloatRange is specific to AbstractFloat. For doing stuff like making ranges of Unitful quantities, that means we have to use StepRange. But because the "barrier" against floating point was placed for a good reason, we have the following situation (aka #18744):

julia> using Unitful

julia> const mm = u"mm"
mm

julia> r = (1:800)*(0.8mm);

julia> typeof(r)
Array{Quantity{Float64, Dimensions:{𝐋}, Units:{mm}},1}

julia> last(r)
640.0 mm

julia> r = 0.8mm:0.8mm:(800*0.8mm);

julia> typeof(r)
StepRange{Quantity{Float64, Dimensions:{𝐋}, Units:{mm}},Quantity{Float64, Dimensions:{𝐋}, Units:{mm}}}

julia> last(r)
640.0 mm

julia> length(r)
799                          # <--- from the above you know it's lying here, really there are 800 elements

julia> rc = convert(Array, r);

julia> typeof(rc)
Array{Quantity{Float64, Dimensions:{𝐋}, Units:{mm}},1}

julia> length(rc)
799

julia> last(rc)
639.1999999999979 mm    # but now there are not

So we have this situation where the last element isn't a reliable guide to the length (computed from div(r.stop+r.step - r.start, r.step)). Moreover, stop gets "recomputed" from the user's input so that it becomes an integral multiple of the length. Consequently one might specify a range with what appears to be a safe API and then witness it truncated downward:

julia> r1 = 0.0mm:0.1mm:0.9mm
0.0 mm:0.1 mm:0.8 mm

julia> r2 = range(0.0mm, 0.1mm, 10) # hmm, OK, try specifying 10 elements
0.0 mm:0.1 mm:0.8 mm     # <---- it gives me 9

julia> r1 == r2
true

# And this is just lovely:
julia> length(r2)    # ... and claims I have only 8
8

julia> rc = convert(Array, r2)   # ...which is all it keeps here
8-element Array{Quantity{Float64, Dimensions:{𝐋}, Units:{mm}},1}:
 0.0 mm
 0.1 mm
 0.2 mm
 0.3 mm
 0.4 mm
 0.5 mm
 0.6 mm
 0.7 mm

so now we're down by two elements from where we intended to start.

As an argument that FloatRange is more robust than StepRange, this makes a pretty compelling case. But its restriction to AbstractFloat is problematic: really, you'd like for its logic to be used for any type subject to problems from rounding, but thanks to Julia's wonderful polymorphism that's very hard to encapsulate with a requirement like T <: AbstractFloat. (To solve this well, we probably need ArithmeticExact and ArithmeticRounds traits, aka IntegerLike and FloatLike.)
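
A purely hypothetical sketch of what such traits might look like, modeled on the TypeOrder/HasOrder pattern (none of these names exist in Base):

abstract ArithmeticStyle
immutable ArithmeticExact  <: ArithmeticStyle end   # Integer, Rational, ...
immutable ArithmeticRounds <: ArithmeticStyle end   # AbstractFloat, float-backed quantities, ...

# Range code could then dispatch on the trait rather than on T <: AbstractFloat:
arithmeticstyle{T<:Integer}(::Type{T})       = ArithmeticExact()
arithmeticstyle{T<:Rational}(::Type{T})      = ArithmeticExact()
arithmeticstyle{T<:AbstractFloat}(::Type{T}) = ArithmeticRounds()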

Overall, I think we're very much on the same page, it's just a question of figuring out what to do about the "step" ranges. It occurs to me that one possible solution would be to unify them into a single type which preserves the user's original inputs in addition to whatever twiddled parameters we compute from them. That would seem to be a way to help ensure that length always returns the same thing it did in the beginning, because it will be based on what was used to construct the range rather than on some quantity which might have been modified by roundoff error.

@timholy commented Oct 4, 2016

Alright, so it seems this has a future. In addition to the general approval, I notice that the nanosoldier run worked out fairly well, with most changes being improvements and the worst regressions being ~1.5x.

So I think this moves into "what do we need to do to finish this off?" mode. Here are items I see:

  • decide whether we want range(start, len) and/or range(start, step, len) to return a LinSpace.
  • figure out what's up with the i686 architecture failure. fma bugs are presumably openlibm bugs?
  • figure out how to solve the terrible performance when you don't have hardware fma support. Possibilities:
    • say "sorry, them's the breaks---consider buying a newer computer"
    • when hardware support is unavailable, use lower accuracy
    • be accurate/slow by default, but give people the option to accept lower accuracy via new extensions to @fastmath
    • (EDITed): come up with a faster way of implementing fma in software. I think this almost requires that linspace(0.0, 0.3, 4) will (on such platforms) return a SplitFloat{T} which has been pre-processed so that most of this logic is done at construction time rather than evaluation time (at construction time the endpoints are known, and it's also known that the scalar factor will be between 0.5 and 1, so we have several advantages). Again, this would ideally be done for any element type subject to roundoff error, but since linspace has historically only supported AbstractFloat I suspect we could (if necessary) temporarily get away with just doing this for such types.

Of those latter options, I kind of think we want both of the last items, and that at least one must be in place before we can merge this.

@tkelman commented Oct 4, 2016

It's not just windows binaries that are built without fma support. Mac binaries are currently built for core2 (the earliest Intel macs), i686 Linux and Windows binaries are built for pentium4, and x86_64 Linux and Windows binaries are built for x86-64 (though requiring cmpxchg16b recently, not without causing problems - #18701). Building binaries that don't start because you don't have recent hardware isn't very nice. Retuning for the native hardware after install time might work, but is a bit time consuming and requires some toolchain components that aren't necessarily always present.

We can start trying to do what openblas does and pre-build multiple copies of the system image for different instruction set generations, switching at runtime, but that requires some infrastructure that we currently don't have (see #14995 for a not totally general prototype). It also doesn't help with all of the C libraries other than openblas (e.g. openlibm) that don't implement their own optimized kernel runtime dispatch.

@tkelman commented Oct 4, 2016

There was a somewhat regrettable discussion on hacker news a few days back, the conclusion of which (from most onlookers) was that we probably don't need to export a range function since it's mostly redundant and can be confusing since different languages choose different conventions for the inputs.

end

"""
linspace(start::Real, stop::Real, n::Real=50)
linspace(start, stop, n=50)

@tkelman commented Oct 4, 2016:

signature needs adjusting in rst too

(possibly unless the manual conversion gets merged before this - not sure) it did

@simonbyrne:

I guess it would be good to write down exactly what we want. From what I understand, the intention is that x = linspace(a,b,n) should give a vector-ish object (should it be a Range or AbstractVector?) such that:

  1. length(x) == n
  2. x[1] == a
  3. x[n] == b
  4. for integer 1 <= i <= n, x[i] should be equal to the correctly rounded (a*(n-i) + b*(i-1))/(n-1) under the relevant type.

Points 1-3 seem reasonable. 4 is trickier (at least for floating point), and this PR doesn't quite provide that, e.g.

function lerpi{T}(j::Integer, d::Integer, a::T, b::T)
    t = j/d
    # computes (1-t)*a + t*b
    T(fma(t, b, fma(-t, a, a)))
end

function lerpi_big(j,d,a,b)
    ba = big(a); bb = big(b); bj = big(j); bd = big(d)
    oftype(a,(ba*(bd-bj) + bb*bj)/bd)
end

a = 2.0693042755264957
b = 0.18522296289341592
j = 6
d = 11
julia> lerpi(j,d,a,b)
1.041623559544816

julia> lerpi_big(j,d,a,b)
1.0416235595448158

I'm not sure how easy it would be to actually implement. Alternatively, we could weaken condition 4; in that case it might be possible to dispense with the fmas.

# This is separate to make it useful even when running with --check-bounds=yes
function unsafe_getindex(r::LinSpace, i::Integer)
    d = r.lendiv
    j, a, b = ifelse(2i >= length(r), (i-1, r.start, r.stop), (length(r)-i, r.stop, r.start))

A reviewer (contributor) commented on this code:

I'm pretty sure you want the branch the other way, as you want t = j/d < 0.5 (otherwise the inner fma buys you nothing, since 1-t is exact).

@simonbyrne:

I've tried out a couple of different strategies here: https://gist.github.com/simonbyrne/45c5924940746d628e94de38a78ae37c

The tldr is that with one small modification, this is the best approach; the next best is a fairly simple one.

Now all these methods fail catastrophically if a ≈ -b and j ≈ d/2, in which case we would need a better approach.

@StefanKarpinski:

Out of curiosity, where does the name lerpi come from?

@timholy commented Oct 4, 2016

@simonbyrne, really glad to see you chiming in here. Thanks for the help. I haven't had time to tackle this yet today, but I will check this out carefully ASAP.

@StefanKarpinski, lerp is somewhat standard in certain fields, and because I was specifying the point using a pair of integers I added the i. No better reason than that. (I thought about lerpr for Rational, but we're not actually passing a Rational which is why I went with i.)

I'm not remotely wedded to the name, alternatives are welcome.

@timholy commented Oct 5, 2016

Again, thanks @simonbyrne for getting involved. I was sort of hoping you might find this interesting, and was looking forward to your input.

So, first, you're absolutely right that I misspoke above, in two ways:

  • the answer is not precise to all digits. I should have said the answer is precise to within an absolute tolerance Δ that "matters" (I'll define this below)
  • even that is not quite right, because my test used <= Δ, and even for the revised statement to be true the tolerance should have been slightly larger than Δ. Oops.

OK, so what tolerance matters? You measured the precision of the answer against itself. My tests measured the precision of the answer against the endpoints, using Δ = max(eps(a), eps(b)). Both criteria matter, and I didn't even think to measure yours. However, I would argue that using the "endpoints" is perhaps the more relevant of the two: the endpoints can't be specified any more precisely than Δ, even if the user wished it were so. So Δ determines the length scale that matters for the answer, too.

Using a slight modification of your test script, here are the answers I get:

function  ans_worst  ans_median      end_worst   end_median
lerp0       2.131      0.351           2.131       0.224
lerp1       1.809      0.333           1.702       0.212
lerp2       2.193      0.352           2.083       0.224
lerp3       1.394      0.273           1.172       0.172
lerp4     132.738      0.292           1.098       0.193
lerp5       2.174      0.291           1.840       0.189

The ans results are using your criterion (the precision of the answer measured against itself), the end results are using my criterion (the precision of the answer relative to Δ).

The most interesting case is naturally lerp4, the one I proposed in this PR. As you noticed (and I didn't), it has enormous worst-case "self" (ans) error. However, it's the best when it comes to "endpoint" error, and very close to 1 which (mostly) explains why my <= Δ criterion passed (your tests presumably hit some values that our linspace tests do not).

The reason I suspect lerp4 exhibits the best "endpoint" error is that with t >= 0.5, t + (1-t) is always exactly equal to 1.0, whereas for t < 0.5 that's true only up to some roundoff error. Consequently, in the t >= 0.5 case you're doing "true" linear interpolation (perhaps at a slightly different point than you intended), whereas in the t < 0.5 case you're not always performing true "interpolation."
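
A quick way to check that claim (this is just the Sterbenz lemma at work; the particular values of t are arbitrary):

julia> t = 0.7; big(1) - big(t) == big(1 - t)   # 1 - t is exact for t in [0.5, 1]
true

julia> t = 0.1; big(1) - big(t) == big(1 - t)   # but is rounded for smaller t
false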

Now, interestingly, this reveals that lerp4 is not the best in terms of median behavior; that honor indeed goes to lerp3. So in fact I think lerp3 could be the better choice; I'd say the choice between the two is not entirely straightforward.

It is interesting, though, how much better worst-case behavior the two fma algorithms have (almost one Δ-unit) compared to "regular" arithmetic.

@timholy commented Oct 5, 2016

I added two more to the mix, lerp6 and lerp7 posted in that gist. lerp7 uses DoubleDouble for its computations, and lerp6 computes a correction Δt which also uses the same technique (but performs all other computations using normal precision). Results:

lerp6       1.474      0.276           1.119       0.179
lerp7       0.500      0.250           0.500       0.154

lerp7 is (unsurprisingly) the best of the bunch, but I'm not sure I understand why it shows any error.

@simonbyrne:

> lerp7 is (unsurprisingly) the best of the bunch, but I'm not sure I understand why it shows any error.

I'm comparing the error to the result computed in BigFloat precision, so you would expect the minimum error to be 0.5 ulps (when the BigFloat answer is halfway between 2 floating point numbers).

@simonbyrne:

Here's how one could do it without DoubleDouble.jl:
https://gist.github.com/simonbyrne/45c5924940746d628e94de38a78ae37c#file-lerptest-jl-L41-L66

Which seems to be correctly rounded (i.e. accurate to within 0.5 ulp): the other nice feature is that the division and fma can be done at the creation of the LinSpace object. I'll do a bit more tuning and compare the performance.
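
One standard building block for that kind of construction-time precomputation is the error-free product via a single fma; a generic sketch (not code from the gist):

# twoprod: hi is the rounded product, lo its exact rounding error,
# so hi + lo == x*y exactly (barring overflow).
function twoprod(x::Float64, y::Float64)
    hi = x*y
    lo = fma(x, y, -hi)
    hi, lo
end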

@timholy commented Oct 5, 2016

Ah, yes, I just figured that out too. If you round the result to Float64 before doing the comparison, here's my current status:

function  ans_worst  ans_mean   ans_median      end_worst   end_mean   end_median
lerp0       2.000      0.337      0.000           2.000       0.233      0.000
lerp1       2.000      0.309      0.000           2.000       0.215      0.000
lerp2       2.000      0.338      0.000           2.000       0.234      0.000
lerp3       1.000      0.199      0.000           1.000       0.140      0.000
lerp4      83.000      0.250      0.000           1.000       0.131      0.000
lerp5       2.000      0.222      0.000           2.000       0.141      0.000
lerp6       1.000      0.194      0.000           1.000       0.119      0.000
lerp7       1.000      0.000      0.000           1.000       0.000      0.000
lerp8       1.000      0.001      0.000           1.000       0.000      0.000

lerp8 is very impressive, and blows my feeble attempt at something similar with lerp6 out of the water 😄.

My new laptop had a motherboard problem and is out for repair, so I don't have easy access to a machine with hardware fma support. Assuming you do, have you benchmarked these to see how they perform? EDIT: oh, I see you're working on that. The fact that you only need fma on the length (i.e., can be done at construction time) is awesome.

@timholy commented Jan 26, 2017

Some kind of rounding is needed, but one could probably get sensible behavior "cheaply" like this:

function colon(a, s, b)
    len_est = (b-a)/s + 1
    StepRangeLen(a, s, floor(Int, len_est+10*eps(len_est)))
end

This doesn't make the returned values "pretty," of course. I just think that @andreasnoack is saying that, to him, -1.11022e-16 is just as beautiful as 0.
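
For instance (illustrative numbers only), the 10*eps(len_est) guard is what rescues a case like 0:0.1:0.6:

julia> len_est = (0.6 - 0.0)/0.1 + 1
6.999999999999999

julia> floor(Int, len_est), floor(Int, len_est + 10*eps(len_est))
(6, 7)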

@timholy commented Jan 26, 2017

AppVeyor failure is unrelated. Glad to see this emerge from the other side!

@timholy merged commit 32d0645 into master Jan 26, 2017
@timholy deleted the teh/linspace branch January 26, 2017 19:55
@StefanKarpinski:

I don't see how getting within 10 ulps (or any other simplistic criterion) is actually better than what we do now – it's just a dumber way to not take the user's input literally, and will often result in a step and/or endpoint which do not round to the given floating-point values. That's the opposite of taking the given inputs at face value.

I do, however, think that something more comprehensive and principled might be possible. In particular, the main objection to "guessing" can only be that there might be multiple, observably different interpretations of the user's input. In other words, for a given a:s:b with a, s, b floating-point values, there's a set of true real values α, σ, β that round to those floating-point values, a subset of which have n = (β - α)/σ an integer and α + n*σ == β exactly. Currently, we're searching for smallish rational α, σ, β such that this is true and giving up if we can't find such. We could instead consider all real values α, σ, β and choose the best in some general sense, keeping in mind that if all admissible α, σ, β result in the same observable values, any choice is fine.
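
In code, the admissibility condition being described is roughly the following (a sketch with made-up names, shown for rational candidates):

# Do candidate real (here rational) values α, σ, β project back onto the given
# floats, with (β - α)/σ an integer?
function admissible(a::Float64, s::Float64, b::Float64,
                    α::Rational, σ::Rational, β::Rational)
    isinteger((β - α)/σ) &&
        Float64(α) == a && Float64(σ) == s && Float64(β) == b
end

# admissible(0.1, 0.1, 0.9, 1//10, 1//10, 9//10)  # true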

@timholy commented Jan 26, 2017

Sure. Let me clarify that I'm fine with having it stay like it is now (pending real-world feedback). But also keep in mind that even now there's no guarantee that a:s:b hits b exactly, for example 1:1:3.5. (Hitting both endpoints exactly seems to be the job of LinSpace.) One could take that as a license to mess up more broadly 😄.

I think an easier simplification might be to consider ditching StepRange in favor of StepRangeLen. But not for 0.6.

@StefanKarpinski commented Jan 26, 2017

Right, there's the complication that the user's intention might not be to hit b exactly – the syntax is overloaded like that, which complicates things further. This corresponds to the case where no possible choice of α, σ, β is admissible. If anything, we currently don't search hard enough for possible real triples and conclude that some ranges are inexact when they aren't actually. E.g.:

julia> d = rand()
0.3359637381398932

julia> length(d+0.1:0.1:d+0.9), length(0.1:0.1:0.9)
(8,9)

julia> mean((d = rand(); length(d+0.1:0.1:d+0.9) != 9) for _=1:10^7)
0.4471468

This fails about 45% of the time, but it should succeed every time if we were searching thoroughly. Of course it's possible that some of these are actually impossible to satisfy, since d+0.1 and d+0.9 as floating-point values are not the same as d+1/10 and d+9/10 as reals.

/(r::FloatRange, x::Real) = FloatRange(r.start/x, r.step/x, r.len, r.divisor)
/(r::LinSpace, x::Real) = LinSpace(r.start / x, r.stop / x, r.len, r.divisor)
# For #18336 we need to prevent promotion of the step type:
+(x::Number, r::AbstractUnitRange) = range(x + first(r), step(r), length(r))

A reviewer (contributor) commented on this code:

I must not have noticed these before, but I'm assuming it's intentional that you've widened these from Real to Number? It should probably be tested if we're sure we want to keep that.

@timholy (author) replied:

I'm almost certain there's a test something like (1:3)+im, although I don't know whether the output type is tested.

But yes, there's no particular reason to restrict to Real anymore.

@andreasnoack:

> Are you advocating for giving up on a:s:b syntax for ranges, @andreasnoack?

Yes.

> I just think that @andreasnoack is saying that, to him, -1.11022e-16 is just as beautiful as 0.

Beautiful and floating point numbers usually don't go together, but yes, I don't think you should care if it is -1.11022e-16 or 0.0.

@StefanKarpinski:

Empirically, people do and we don't get bug reports about this sort of thing anymore 😬

@timholy commented Jan 27, 2017

I've noticed that, 👍. This suggests that even though many of us are happy to tolerate ulp-level differences, expunging them is a good thing. Fewer bug reports (less email) is priceless.

Will be interesting to see if someone comes up with a simpler approach, but for now we may have to live with surprisingly-complex code for ranges. Even if it's complex, getting both generality AND better accuracy out of this most recent change suggests we're doing quite a lot right.

@ajkeller34:

Amazing work here. I am very excited about ranges supporting generic types!

I just noticed, but is this method for _colon actually reachable? This method for colon (which is the only one that calls _colon, afaict) is restricted to T<:Real, for which TypeOrder(T) is always HasOrder(). Perhaps it is just a stub for if/when these traits are meant to be adopted by types outside of Base? I could implement traits for Unitful quantities but since physical quantities are not <:Real I may still need to write my own methods for colon and possibly range, correct?

The most important thing is that the range types themselves now work with generic types, so I'm very excited about that. I just want to make sure I'm implementing things sensibly and in line with any ideas @timholy has about how this should all fit together.

@timholy commented Jan 29, 2017

Right, I suspect you should write your own colon and range methods, rather than extending the internal ones. I agree that method looks unnecessary for the reasons you state.

StefanKarpinski pushed a commit that referenced this pull request Feb 8, 2018
Make trunc, round, floor, ceil testing more precise
tlnagy added a commit to tlnagy/KernelDensity.jl that referenced this pull request Jun 3, 2018
KernelDensity.jl rolled its own range generator which suffers from
similar flaws as Base's linspace prior to being fixed in
JuliaLang/julia#18777. This closes JuliaStats#39.
ararslan pushed a commit to JuliaStats/KernelDensity.jl that referenced this pull request Jun 10, 2018
KernelDensity.jl rolled its own range generator which suffers from
similar flaws as Base's linspace prior to being fixed in
JuliaLang/julia#18777. This closes #39.

Successfully merging this pull request may close these issues.

linspace(...)[1] can differ from linspace start value