Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance hit with Honer's algorithm #2219

Closed
Dapid opened this issue Jul 1, 2024 · 4 comments · Fixed by #2222
Closed

Performance hit with Honer's algorithm #2219

Dapid opened this issue Jul 1, 2024 · 4 comments · Fixed by #2222

Comments

@Dapid
Copy link
Contributor

Dapid commented Jul 1, 2024

Consider this code to evaluate polynomials (and make pretty fractals):

import sys

import numpy as np
import matplotlib.pyplot as plt

import tqdm

accelerate = '--aot' in sys.argv

if accelerate:
    from extern import newton_step

else:
    def eval_poly(coeffs, x):
        # Taken from numpy.polynomial.polynomial._val
        c0 = coeffs[-1] + x*0
        for i in range(2, len(coeffs) + 1):
            c0 = coeffs[-i] + c0*x
        return c0

    def newton_step(z, poly_coeffs, der_coeffs):
        return z - eval_poly(poly_coeffs, z) / eval_poly(der_coeffs, z)

N = 1024
iter = 4

extent = 1.5
real = np.linspace(-extent, extent, N)
imag = np.linspace(-extent, extent, N)

xv, yv = np.meshgrid(real, imag)
z = xv + 1j * yv

roots = [1, np.exp(2j * np.pi / 3), np.exp(-2j * np.pi / 3)]

polynomial = np.polynomial.Polynomial.fromroots(roots)
deriv = polynomial.deriv()

poly_coeffs = polynomial.coef
der_coeffs = deriv.coef

for it in tqdm.trange(iter):
    z = newton_step(z, poly_coeffs, der_coeffs)

# -- This is just to make it pretty
frame = np.zeros((N, N, 3), dtype=np.float64)

for i, root in enumerate(roots):
    dist = np.abs(z - root)
    frame[:, :, i] = 1/(dist ** 2 + 1)

frame_norm = frame / np.linalg.norm(frame, axis=-1)[:, :, None]
plt.imshow(frame_norm)
plt.show()

with extern.py:

    c0 = coeffs[-1] + x*0
    for i in range(2, len(coeffs) + 1):
        c0 = coeffs[-i] + c0*x
    return c0

#pythran export newton_step(complex128[:, :], complex128[:], complex128[:])
def newton_step(z, poly_coeffs, der_coeffs):
    return z - eval_poly(poly_coeffs, z) / eval_poly(der_coeffs, z)

Running in pure Python mode, I get over 7 it/s, but with Pythran I get over 4 s/it. This is with both release and master, Python 3.11, on Fedora 40 (GCC 14.1.1).

I don't know what is going on, but this should be the kind of problems Pythran shines at.

Restricting ourselves to reals we get an even larger slow down, from 17 it/s in pure Python, down to 3.8 s/it with Pythran:

z = xv + yv

roots = [1, np.exp(2j * np.pi / 3), np.exp(-2j * np.pi / 3)]

polynomial = np.polynomial.Polynomial.fromroots(roots)
deriv = polynomial.deriv()


poly_coeffs = np.real(polynomial.coef).copy()
der_coeffs = np.real(deriv.coef).copy()

for it in tqdm.trange(iter):
    z = newton_step(z, poly_coeffs, der_coeffs)

and #pythran export newton_step(float64[:, :], float64[:], float64[:])

@serge-sans-paille
Copy link
Owner

I can reproduce the issue. Investigating...

@serge-sans-paille
Copy link
Owner

Can you confirm that passing -O2 to pythran fixes the issue? It looks like the python-config --cflags no longer includes any optimization flag o_O

serge-sans-paille added a commit that referenced this issue Jul 13, 2024
… step

This used to be enforced by distutils but it seems to no longer be the
case.

Fix #2219
@serge-sans-paille
Copy link
Owner

#2222 may do the trick. @Dapid I'll wait for your confirmation to merge.

@Dapid
Copy link
Contributor Author

Dapid commented Jul 13, 2024

It does work, now I have a 4.5x over pure Python, and a 6.5x on the real case.

Thanks!

serge-sans-paille added a commit that referenced this issue Jul 14, 2024
… step

This used to be enforced by distutils but it seems to no longer be the
case.

Fix #2219
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants