Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use Tang's algorithm for log and log1p #10008

Merged
merged 3 commits into from
Apr 28, 2015
Merged

Conversation

simonbyrne
Copy link
Contributor

This implements log and log1p in native Julia, using Tang's table lookup algorithm (see #8869).

Depending on architecture, it seems to be about 5-20% faster than openlibm, and is more accurate (relative error is below 0.56 ulps).

Some issues:

  • It doesn't include any tests: any suggestions for a good way to do this?
  • It uses hex float literals, which will cause problems with MSVC (strtod issue for MSVC #6349): if this issue is likely to get resolved soon (with VS2015) I'd prefer to wait, otherwise I can change the values to decimal.

If we're happy to use this, it should be fairly straightforward to modify this code for log2 and log10 (as well as their 1p versions, see #6148).

@tknopp
Copy link
Contributor

tknopp commented Feb 1, 2015

Is there any advantage of using the hex float representation?

@simonbyrne
Copy link
Contributor Author

It ensures we get exactly the floats we want, which is critical for these types of functions: occasionally weird bugs do arise in strtod functions (and since we depend on the system libc for this, we're at their mercy): see for example http:https://www.exploringbinary.com/tag/bug/.

It is also typically faster to parse, but that's not really a concern here.

@tknopp
Copy link
Contributor

tknopp commented Feb 1, 2015

does this mean we cannot reliably parse a string decimal float with full accuracy?

#6349 was really hard to discover and IMHO we should make @tkelman's life as easy as possible when it comes to MSVC compatibility

@simonbyrne
Copy link
Contributor Author

The other issue is what to do when USE_SYSTEM_LIBM is enabled.

@tkelman
Copy link
Contributor

tkelman commented Feb 1, 2015

as easy as possible when it comes to MSVC compatibility

MSVC compatibility is and has been broken since #9266 added some inline assembly in Julia's C code. Still not fixed because I didn't understand a possible workaround using longjmp that Jameson briefly mentioned on IRC a while ago. I think we're broken on anything that isn't GCC or Clang at the moment, according to the last issue from someone trying to use the Intel compiler.

@tknopp
Copy link
Contributor

tknopp commented Feb 1, 2015

Yes, until we have this fixed and MSVC CI the compatibility can always be silently broken. Still, I think we should try the best to keep MSVC compatibility where possible.

@simonbyrne
Copy link
Contributor Author

@tknopp I guess it depends on your definition of reliable: I don't think there is any implementation that has been proven correct, but the problem cases that arise typically involve incorrect rounding when using a lot of decimal places and extreme exponents, e.g. this one.

We would be extremely unlikely to hit such issues here (at least if using a modern libc), so it shouldn't be an issue to change them. It's more that it makes the algorithm clearer: e.g. some numbers are chosen so that they have zeros in the last few bits.

@tknopp
Copy link
Contributor

tknopp commented Feb 1, 2015

If its worth it then it should be used. In the end we need to clearly define anyway if hex floats are part of the language or not.

@eschnett
Copy link
Contributor

eschnett commented Feb 1, 2015

To test it, you can e.g. compare a thousand (random?) values against openlibm with a certain accuracy. You could also compare to BigFloat's log function.

For Float32, you can even "prove" correctness by testing the function for all 2^32 possible Float32 values. (It's not a real proof since you'd be relying e.g. on BigFloat which hasn't been proven correct.)

Is this log function vectorizable? I assume it is not, given that there is a table lookup and an if statement. For some users, vectorizability is more important than even a 50% speed improvement. (openlibm's log function is not vectorizable.)

On what architectures does this give what of speed-up? For example, Intel's new AVX-512 instruction set contains a log helper instruction that may be more efficient than a manual implementation. There are also helper instruction to fix up results depending on sign, inf, nan, etc. This instruction set may not be important now, but if we implement special functions ourselves, then we need at some point a mechanism to switch between different implementations for different architectures.

It would be interesting to keep the old log function around under a different name (not exported from Base), so that one can compare performance more easily.

I am planning a similar approach than you for @fastmath, which would replace the regular log function with a faster one if that makes sense. I like your idea very much. For @fastmath, a vectorizable log implementation is probably the way to go.

@simonbyrne
Copy link
Contributor Author

For Float32, you can even "prove" correctness by testing the function for all 2^32 possible Float32 values. (It's not a real proof since you'd be relying e.g. on BigFloat which hasn't been proven correct.)

True, but the cost of computing 2^32 BigFloat logs is likely to be prohibitive. I think having few thousand values spaced out over the range might be the way to go.

Is this log function vectorizable?

No, as you point out I don't think this is a pattern that could be easily made to vectorise.

On what architectures does this give what of speed-up?

It varies: on a Xeon Westmere (no AVX), I saw about a 20% boost. On a Sandy Bridge i7, about 5%, and
on a Haswell machine, it was basically a wash about 5-10% (after incorporating #9881).

It would be interesting to keep the old log function around under a different name (not exported from Base), so that one can compare performance more easily.

We could create a Base.Math.Libm submodule?

I am planning a similar approach than you for @fastmath, which would replace the regular log function with a faster one if that makes sense. I like your idea very much. For @fastmath, a vectorizable log implementation is probably the way to go.

Agreed, I think having specialised @fastmath functions is a good idea.

@ViralBShah
Copy link
Member

I think we should not worry about USE_SYSTEM_LIBM if we are using an internal function. This can be noted in the documentation for log, and one should be able to get to it with Base.Math.log.

Coming to think of it, I really would just like to have different namespaces for math functions - Base.Math.OpenLibm, Base.Math.SysLibm, Base.Math.Native (julia implementations). Then, we can have all of these co-exist, and export different defaults as our implementations improve, while still having the ability to call a specific version. It would also make it easy then, to have a CrLibm.jl package.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Feb 3, 2015

fwiw, i'm pushing for removal of the hex float parsing support entirely and require using reinterpret(Float64, 0xABC) instead

@eschnett
Copy link
Contributor

eschnett commented Feb 3, 2015

@vtjnash Distributing a working strtod with Julia should be straightforward. I like hex floats, since they are sometimes easier to understand than decimal ones. Do you want to have them removed from Julia, or just from this patch?

@StefanKarpinski
Copy link
Sponsor Member

I'm pretty sure that the double-conversion library includes a strtod implementation.

@tkelman
Copy link
Contributor

tkelman commented Feb 3, 2015

I'm pretty sure that came up in #6349 and/or #5988, but a) we don't use that library any more, and more importantly b) the C++ implementation there is a very different API from the C version we use.

@ViralBShah
Copy link
Member

It would be nice to get this merged.

@ViralBShah ViralBShah added the performance Must go faster label Apr 16, 2015
@simonbyrne
Copy link
Contributor Author

Okay, I can change the hex floats to decimal floats (we can revert later if Windows support improves).

It did occur to me that we could generate the tables via stagedfunctions, though perhaps that's taking it a bit too far (it would also make it more difficult to create a "julia-lite" without MPFR).

I don't have time to look at the log2 or log10 versions for a bit, but I'll try them out once I get a chance.


# determine if hardware FMA is available
# should probably check with LLVM, see #9855.
const FMA_NATIVE = muladd(nextfloat(1.0),nextfloat(1.0),-nextfloat(1.0,2)) == -4.930380657631324e-32
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit of a hack until #9855 is solved.

@tkelman
Copy link
Contributor

tkelman commented Apr 16, 2015

The MSVC build is mostly working again (some small issues remaining, ref #10775) so using decimal floats would be preferred. I'll check VS 2015 at some point after it gets officially released, so far I've only been using the prerelease versions remotely on appveyor.

@simonbyrne
Copy link
Contributor Author

Okay, I've updated and added some tests.

@ViralBShah
Copy link
Member

Should we retain a way to call the Openlibm log? The exported one can be the new one.

@simonbyrne
Copy link
Contributor Author

That's probably a good idea. Any ideas on a good way to do this?

@tkelman
Copy link
Contributor

tkelman commented Apr 22, 2015

Libm module?

@ViralBShah
Copy link
Member

How about OpenLibm? In the future this could co-exist with the system Libm module too then.

@ViralBShah
Copy link
Member

I am thinking that we release the new log then in a new module that contains fast julia implementations, and after it has been in the wild for a little bit, with more people kicking the tyres, we make it the default - kind of in two steps. Just being a bit paranoid.

@tkelman
Copy link
Contributor

tkelman commented Apr 22, 2015

Since the API's are more or less compatible between openlibm and system libm, wouldn't the code on the Julia side be the same between the two, just with different library names in ccall?

@ViralBShah
Copy link
Member

Yes - just some macros to generate both are needed.

@ViralBShah ViralBShah added this to the 0.4.0 milestone Apr 22, 2015
@simonbyrne
Copy link
Contributor Author

So is the idea that we have 3 modules:

Base.Math.SystemLibm
Base.Math.OpenLibm
Base.Math.JuliaLibm

and have a build variable to switch between them?

@ViralBShah
Copy link
Member

Perhaps eventually. What do you think? Is it overkill? Perhaps the SystemLibm could just be a package, and we have just two then.

@tkelman
Copy link
Contributor

tkelman commented Apr 23, 2015

One thing that I found when trying to wrap an alternate blas implementation as a package is that library bindings for things already in base would pretty much just be copying code straight out of base, and would be fairly messy to keep up to date over time (I didn't really bother trying). For libraries like blas, lapack, libm, (MKL's FFTW maybe?), it might be worth trying to make the base wrapper code a little more modular, so we expose a "wrap this library" entry point and base just uses one instance of that.

@ViralBShah
Copy link
Member

Could we allow users to change the libname that gets passed to ccall, so that a different library gets picked up? That way we don't have to replicate code for every library and bloat the system image, but users can switch.

In this particular case, I still prefer JuliaLibm to contain the new julia implementations as they come.

@simonbyrne
Copy link
Contributor Author

Okay, so should I just wrap this PR in a module? Or would we better off keeping it in a package?

@ViralBShah
Copy link
Member

My preference is to reinstate the original log, and create a JuliaLibm module into which this and other julia implementations go. Then we ask people to test it out a bit more, and make the new one default in a few days.

ViralBShah pushed a commit that referenced this pull request Apr 28, 2015
use Tang's algorithm for log and log1p
@ViralBShah ViralBShah merged commit af7a8c2 into JuliaLang:master Apr 28, 2015
@ViralBShah
Copy link
Member

I just realized that it would be nice to update the documentation to mention the new faster log functions as well, and how to access them.

@simonbyrne simonbyrne deleted the log-tang branch April 28, 2015 15:01
@simonbyrne
Copy link
Contributor Author

Any suggestion for where would be appropriate?

@ViralBShah
Copy link
Member

I was thinking of having a note in the documentation of the existing log functions about the faster variants. Perhaps even link to Yeppp.jl?

@simonbyrne
Copy link
Contributor Author

I've added a note in 939e9e2 (though I didn't mention Yeppp.jl)

@ViralBShah
Copy link
Member

Thanks!

@stevengj
Copy link
Member

This has been here for a while... time to update the default log and log1p functions?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants