
Openlibm's pow, hypot, exp and log are considerably slower than Glibc's #234

Open · jessymilare opened this issue Jun 2, 2021 · 10 comments

@jessymilare commented Jun 2, 2021

I just ran benchmarks of openlibm (compiled by gcc and clang) and system libm (Glibc), and these were the results:

            openlibm(gcc)       syslibm       openlibm(clang)
  pow     :  17.4427 MPS   :  69.7010 MPS   :  18.1047 MPS
  hypot   :  72.7657 MPS   :  107.5491 MPS  :  76.4914 MPS
  exp     :  132.5313 MPS  :  197.3627 MPS  :  145.4958 MPS
  log     :  129.2455 MPS  :  216.8139 MPS  :  130.2769 MPS
  log10   :  103.1023 MPS  :  118.6334 MPS  :  96.9040 MPS
  sin     :  176.9966 MPS  :  157.4333 MPS  :  150.2365 MPS
  cos     :  162.1996 MPS  :  166.1857 MPS  :  161.8407 MPS
  tan     :  93.6082 MPS   :  76.9096 MPS   :  90.7496 MPS
  asin    :  124.9897 MPS  :  114.8927 MPS  :  128.0259 MPS
  acos    :  140.7629 MPS  :  110.8343 MPS  :  129.6873 MPS
  atan    :  120.1154 MPS  :  79.6637 MPS   :  130.1246 MPS
  atan2   :  54.1519 MPS   :  43.4398 MPS   :  58.3252 MPS

System libm is considerably faster than openlibm on pow (4x), hypot (1.5x), exp (1.5x), and log (1.7x), while openlibm outperforms system libm on almost all trigonometric functions (cos being the exception).

Running on Kubuntu 21.04 (kernel 5.11.0-17-generic) on an Asus ROG Strix G531GT with an Intel Core i7-9750H CPU @ 2.60GHz.

Edit:
GNU libc version: 2.33
GNU libc release: release

@zimmermann6

Maybe you should mention the version of glibc you are comparing against: starting with glibc 2.28, the slow paths have been removed from some functions, improving their speed at the expense of less accurate results. In the next release (2.34) the slow paths will also be removed from tan, asin, acos, atan, and atan2 (all for double precision).

You can query it with:

```c
#include <gnu/libc-version.h>
#include <stdio.h>

printf("GNU libc version: %s\n", gnu_get_libc_version());
printf("GNU libc release: %s\n", gnu_get_libc_release());
```

@jessymilare (Author)

> Maybe you should mention the version of glibc you are comparing against [...]

Got this:

```
GNU libc version: 2.33
GNU libc release: release
```

@jessymilare (Author)

> What does MPS mean? I usually see speed comparisons reported as microseconds per call or cycles per call. Also, what are the compiler options used when building system libm and openlibm with both compilers?

The benchmark code is test/libm-bench.cpp. MPS means millions of calls per second (see #36).

@jessymilare (Author)

> The benchmark code is test/libm-bench.cpp. MPS means millions of calls per second (see #36).

Thanks for pointing to the benchmark. What I'm curious about are the compilation options for the libraries and for the benchmark itself. For example, the system libm may have been compiled for the lowest common denominator of x86_64 systems while openlibm is compiled with -march=native. In addition, at least with GCC it is important to use the -fno-builtin option when linking against libm, to ensure the math library functions are actually called.

Openlibm was built with the flags `-fno-gnu89-inline -fno-builtin -O3 -fPIC -m64 -std=c99 -Wall`, and the output of `make bench` was:

```
jessica@jessica-Kubuntu-ROG-Strix-G531GT:~/Dev/openlibm-0.7.5$ cd test/
jessica@jessica-Kubuntu-ROG-Strix-G531GT:~/Dev/openlibm-0.7.5/test$ make bench
cc   -fno-gnu89-inline -fno-builtin -O3 -fPIC -m64 -std=c99 -Wall -I/home/jessica/Dev/openlibm-0.7.5 -I/home/jessica/Dev/openlibm-0.7.5/include -I/home/jessica/Dev/openlibm-0.7.5/amd64 -I/home/jessica/Dev/openlibm-0.7.5/src -DASSEMBLER -D__BSD_VISIBLE -Wno-implicit-function-declaration -I/home/jessica/Dev/openlibm-0.7.5/ld80  -m64 libm-bench.cpp -lm -o bench-syslibm
cc1plus: warning: command-line option ‘-Wno-implicit-function-declaration’ is valid for C/ObjC but not for C++
cc1plus: warning: command-line option ‘-std=c99’ is valid for C/ObjC but not for C++
cc1plus: warning: command-line option ‘-fno-gnu89-inline’ is valid for C/ObjC but not for C++
cc   -fno-gnu89-inline -fno-builtin -O3 -fPIC -m64 -std=c99 -Wall -I/home/jessica/Dev/openlibm-0.7.5 -I/home/jessica/Dev/openlibm-0.7.5/include -I/home/jessica/Dev/openlibm-0.7.5/amd64 -I/home/jessica/Dev/openlibm-0.7.5/src -DASSEMBLER -D__BSD_VISIBLE -Wno-implicit-function-declaration -I/home/jessica/Dev/openlibm-0.7.5/ld80  -m64 libm-bench.cpp -L.. -lopenlibm -Wl,-rpath=/home/jessica/Dev/openlibm-0.7.5 -o bench-openlibm
cc1plus: warning: command-line option ‘-Wno-implicit-function-declaration’ is valid for C/ObjC but not for C++
cc1plus: warning: command-line option ‘-std=c99’ is valid for C/ObjC but not for C++
cc1plus: warning: command-line option ‘-fno-gnu89-inline’ is valid for C/ObjC but not for C++
```

@t-bltg commented Sep 24, 2021

Also reproduced with glibc 2.34, Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz:

```
$ cd test; make bench
$ diff -y <(echo bench-syslibm; ./bench-syslibm) <(echo bench-openlibm; ./bench-openlibm)
bench-syslibm						      |	bench-openlibm
  pow     :  30.8371 MPS				      |	  pow     :  12.1320 MPS
  hypot   :  82.2698 MPS				      |	  hypot   :  56.0207 MPS
  exp     :  98.3850 MPS				      |	  exp     :  79.9451 MPS
  log     :  101.1390 MPS				      |	  log     :  71.0276 MPS
  log10   :  63.9289 MPS				      |	  log10   :  57.6193 MPS
  sin     :  80.4622 MPS				      |	  sin     :  103.3160 MPS
  cos     :  85.3477 MPS				      |	  cos     :  92.5908 MPS
  tan     :  65.5768 MPS				      |	  tan     :  53.8914 MPS
  asin    :  88.9095 MPS				      |	  asin    :  64.9857 MPS
  acos    :  81.0402 MPS				      |	  acos    :  70.5646 MPS
  atan    :  82.0650 MPS				      |	  atan    :  70.4644 MPS
  atan2   :  28.3463 MPS				      |	  atan2   :  34.4602 MPS
```

@zimmermann6

See https://members.loria.fr/PZimmermann/papers/accuracy.pdf (which uses Intel's compiler from 2023).

@zimmermann6 commented Mar 13, 2023

> Do standard functions use other standard functions internally?

No, as far as I know.

> Also, shouldn't it be possible to write a wrapper around the double-precision worst cases? Would that decrease the maximal error?

I believe the answer is no, since the worst cases lie in regions where the algorithm used has bad accuracy: if you add a special case for one input, another input near it will also give a large error.

Yes, all CRlibm functions have a maximal error of 0.5 ulp, as do the functions from CORE-MATH (https://core-math.gitlabpages.inria.fr/).

@zimmermann6

> Why possible? Can't you trace the memory in IDA Pro/radare2 to say that for sure?

I don't have the knowledge to do that.

> Also, reading your paper I see there is no mention of complex numbers at all (why?)

The paper is already quite long; feel free to do the same for complex numbers!

> and I see there are no speed comparisons

We prefer to focus on accuracy.

> which THIS issue is all about

Sorry, I answered your question about worst cases.

@zimmermann6

> Are you aware of any evidence of Windows vs Linux issues?

Yes, the next update of https://members.loria.fr/PZimmermann/papers/accuracy.pdf will include the Microsoft math library.

> BTW, is it possible to fix fdlibm with regard to pow(10, -5) printing 0.000009999999999999999 instead of just 0.00001?

1/10^5 is not exactly representable in binary. I prefer 0.000009999999999999999, which reminds you of this fact. But OpenJDK now has a new algorithm for printing floating-point numbers.

@zimmermann6

> Microsoft just uses AMD

I'm not sure, since for the binary64 pow function the largest error I get with AMD Libm 4.0 is 0.762 ulp (for x=0x1.00a000205d461p+1 and y=-0x1.fd35c41fc20bbp+9), whereas with Visual Studio 2022 I get 91.3 ulps (with x=0x1.fffff9c61ce40p-1 and y=0x1.c4e304ed4c734p+31).
