
Openlibm's pow, hypot, exp and log are considerably slower than Glibc's #234

Open · jessymilare opened this issue Jun 2, 2021 · 10 comments

@jessymilare commented Jun 2, 2021

I just ran benchmarks of openlibm (compiled by gcc and clang) and system libm (Glibc), and these were the results:

            openlibm(gcc)       syslibm       openlibm(clang)
  pow     :  17.4427 MPS   :  69.7010 MPS   :  18.1047 MPS
  hypot   :  72.7657 MPS   :  107.5491 MPS  :  76.4914 MPS
  exp     :  132.5313 MPS  :  197.3627 MPS  :  145.4958 MPS
  log     :  129.2455 MPS  :  216.8139 MPS  :  130.2769 MPS
  log10   :  103.1023 MPS  :  118.6334 MPS  :  96.9040 MPS
  sin     :  176.9966 MPS  :  157.4333 MPS  :  150.2365 MPS
  cos     :  162.1996 MPS  :  166.1857 MPS  :  161.8407 MPS
  tan     :  93.6082 MPS   :  76.9096 MPS   :  90.7496 MPS
  asin    :  124.9897 MPS  :  114.8927 MPS  :  128.0259 MPS
  acos    :  140.7629 MPS  :  110.8343 MPS  :  129.6873 MPS
  atan    :  120.1154 MPS  :  79.6637 MPS   :  130.1246 MPS
  atan2   :  54.1519 MPS   :  43.4398 MPS   :  58.3252 MPS

System libm is considerably faster than openlibm on pow (4x), hypot (1.5x), exp (1.5x), and log (1.7x), while openlibm outperforms system libm on almost all trigonometric functions (cos being the exception).

Running on Kubuntu 21.04 (kernel 5.11.0-17-generic) on an Asus ROG Strix G531GT with an Intel Core i7-9750H CPU @ 2.60GHz.

Edit:
GNU libc version: 2.33
GNU libc release: release

@zimmermann6

Maybe you should mention the version of glibc you are comparing against: starting with glibc 2.28, the slow paths have been removed from some functions, improving their speed at the expense of less accurate results. In the next release (2.34) the slow paths will also be removed from tan, asin, acos, atan, and atan2 (all for double precision).

You can query it with:

```c
#include <gnu/libc-version.h>
#include <stdio.h>

printf("GNU libc version: %s\n", gnu_get_libc_version());
printf("GNU libc release: %s\n", gnu_get_libc_release());
```

@jessymilare (Author)

> Maybe you should mention the version of glibc you are comparing against [...]

Got this:

```
GNU libc version: 2.33
GNU libc release: release
```

@jessymilare (Author)

> What does MPS mean? I usually see speed comparisons reported as microseconds per call or cycles per call. Also, what are the compiler options used when building system libm and openlibm with both compilers?

The benchmark code is test/libm-bench.cpp. MPS means millions of calls per second (see #36).

@jessymilare (Author)

> The benchmark code is test/libm-bench.cpp. MPS means millions of calls per second (see #36).

Thanks for pointing to the benchmark. What I'm curious about are the compilation options for the libraries and for the benchmark itself. For example, the system libm may have been compiled for the lowest common denominator of x86_64 systems while openlibm is compiled with -march=native. In addition, at least with GCC it is important to use the -fno-builtin option when linking against libm, to ensure the math library functions are actually called.

Openlibm was built with the flags `-fno-gnu89-inline -fno-builtin -O3 -fPIC -m64 -std=c99 -Wall`, and the output of `make bench` was:

```
jessica@jessica-Kubuntu-ROG-Strix-G531GT:~/Dev/openlibm-0.7.5$ cd test/
jessica@jessica-Kubuntu-ROG-Strix-G531GT:~/Dev/openlibm-0.7.5/test$ make bench
cc   -fno-gnu89-inline -fno-builtin -O3 -fPIC -m64 -std=c99 -Wall -I/home/jessica/Dev/openlibm-0.7.5 -I/home/jessica/Dev/openlibm-0.7.5/include -I/home/jessica/Dev/openlibm-0.7.5/amd64 -I/home/jessica/Dev/openlibm-0.7.5/src -DASSEMBLER -D__BSD_VISIBLE -Wno-implicit-function-declaration -I/home/jessica/Dev/openlibm-0.7.5/ld80  -m64 libm-bench.cpp -lm -o bench-syslibm
cc1plus: warning: command-line option ‘-Wno-implicit-function-declaration’ is valid for C/ObjC but not for C++
cc1plus: warning: command-line option ‘-std=c99’ is valid for C/ObjC but not for C++
cc1plus: warning: command-line option ‘-fno-gnu89-inline’ is valid for C/ObjC but not for C++
cc   -fno-gnu89-inline -fno-builtin -O3 -fPIC -m64 -std=c99 -Wall -I/home/jessica/Dev/openlibm-0.7.5 -I/home/jessica/Dev/openlibm-0.7.5/include -I/home/jessica/Dev/openlibm-0.7.5/amd64 -I/home/jessica/Dev/openlibm-0.7.5/src -DASSEMBLER -D__BSD_VISIBLE -Wno-implicit-function-declaration -I/home/jessica/Dev/openlibm-0.7.5/ld80  -m64 libm-bench.cpp -L.. -lopenlibm -Wl,-rpath=/home/jessica/Dev/openlibm-0.7.5 -o bench-openlibm
cc1plus: warning: command-line option ‘-Wno-implicit-function-declaration’ is valid for C/ObjC but not for C++
cc1plus: warning: command-line option ‘-std=c99’ is valid for C/ObjC but not for C++
cc1plus: warning: command-line option ‘-fno-gnu89-inline’ is valid for C/ObjC but not for C++
```

@t-bltg commented Sep 24, 2021

Also reproduced with glibc 2.34, Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:

```
$ cd test; make bench
$ diff -y <(echo bench-syslibm; ./bench-syslibm) <(echo bench-openlibm; ./bench-openlibm)
bench-syslibm						      |	bench-openlibm
  pow     :  30.8371 MPS				      |	  pow     :  12.1320 MPS
  hypot   :  82.2698 MPS				      |	  hypot   :  56.0207 MPS
  exp     :  98.3850 MPS				      |	  exp     :  79.9451 MPS
  log     :  101.1390 MPS				      |	  log     :  71.0276 MPS
  log10   :  63.9289 MPS				      |	  log10   :  57.6193 MPS
  sin     :  80.4622 MPS				      |	  sin     :  103.3160 MPS
  cos     :  85.3477 MPS				      |	  cos     :  92.5908 MPS
  tan     :  65.5768 MPS				      |	  tan     :  53.8914 MPS
  asin    :  88.9095 MPS				      |	  asin    :  64.9857 MPS
  acos    :  81.0402 MPS				      |	  acos    :  70.5646 MPS
  atan    :  82.0650 MPS				      |	  atan    :  70.4644 MPS
  atan2   :  28.3463 MPS				      |	  atan2   :  34.4602 MPS
```

@zimmermann6

See https://members.loria.fr/PZimmermann/papers/accuracy.pdf (which uses Intel's compiler from 2023).

@zimmermann6 commented Mar 13, 2023

> Do standard functions use other standard functions internally?

No, as far as I know.

> Also, shouldn't it be possible to write a wrapper around the double-precision worst cases? Would that decrease the maximal error?

I believe the answer is no, since the worst cases lie in regions where the algorithm used has bad accuracy: if you add a special case for one input, another input near it will also give a large error.

Yes, all CRlibm functions have a maximal error of 0.5 ulp, as do the functions from CORE-MATH (https://core-math.gitlabpages.inria.fr/).

@zimmermann6

> Why possible? Can't you trace the memory in IDA Pro/radare2 to say that for sure?

I don't have the knowledge to do that.

> Also, reading your paper I see there is no mention of complex numbers at all (why?)

The paper is already quite long; feel free to do the same for complex numbers!

> and I see there are no speed comparisons

We prefer to focus on accuracy.

> which THIS issue is all about

Sorry, I answered your question about worst cases.

@zimmermann6

> Are you aware of any evidence of Windows vs Linux issues?

Yes, the next update of https://members.loria.fr/PZimmermann/papers/accuracy.pdf will include the Microsoft math library.

> BTW, is it possible to fix fdlibm with regard to pow(10, -5) printing 0.000009999999999999999 instead of just 0.00001?

1/10^5 is not exactly representable in binary. I prefer 0.000009999999999999999, which reminds you of this fact. But OpenJDK now has a new algorithm for printing floating-point numbers.

@zimmermann6

> Microsoft just uses AMD

I'm not sure, since for the binary64 pow function the largest error I get with AMD Libm 4.0 is 0.762 ulp (for x=0x1.00a000205d461p+1 and y=-0x1.fd35c41fc20bbp+9), whereas with Visual Studio 2022 I get 91.3 ulps (with x=0x1.fffff9c61ce40p-1 and y=0x1.c4e304ed4c734p+31).
