Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discuss typing of Cython API #236

Open
kylebarron opened this issue Apr 25, 2022 · 2 comments
Open

Discuss typing of Cython API #236

kylebarron opened this issue Apr 25, 2022 · 2 comments

Comments

@kylebarron
Copy link
Contributor

Making a new issue to come back to what the Cython API should look like for best performance.

I added a couple commits to show how the vectorized functions could call our internal cython API instead of calling in to h3lib directly. But note this benchmark comparing against the master branch:

Calling internal cython functions (commit 798da86 (#224) )

geo_to_h3 bench
76.5 ms ± 941 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
h3_to_parent bench
6.52 ms ± 111 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
h3_get_resolution bench
2.39 ms ± 11.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

On master branch (commit d1a5a3b)

geo_to_h3 bench
75.8 ms ± 611 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
h3_to_parent bench
371 µs ± 52.8 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
h3_get_resolution bench
188 µs ± 11.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

geo_to_h3 is the same speed in each, but h3_to_parent and h3_get_resolution is roughly an order of magnitude on this branch at 798da86 (#224).

I believe the reasons for this is

  1. the Cython geo_to_h3 signature is fully typed, while the parent signature is not. Additionally, the parent function accepts Python, not C types.

    cpdef H3int geo_to_h3(double lat, double lng, int res) except 1:

    cpdef H3int parent(H3int h, res=None) except 0:

  2. The parent and resolution functions do extra checks with check_cell and check_res.

I think it will be valuable in terms of code reuse for the vectorized API to call the internal Cython functions. So to keep performance high, I'd advocate for the Cython API to be fully typed. I.e.:

  • Instead of res = None, use res = -1
  • Create exception messages in Python instead of Cython?
  • Analyze the Cython report to see where it's calling into Python

If we also remove any Python types from Cython, our Cython API can be marked as nogil, which would allow our vectorized API to release the GIL.

Bench script

Note this needs to be saved as an .ipy file and run with ipython bench.ipy

import h3.unstable.vect as vect
import numpy as np

N = 100_000
K = 10

lats, longs = np.random.uniform(0, 90, N), np.random.uniform(0, 90, N)
h3_index = vect.geo_to_h3(lats, longs, 9)

print('geo_to_h3 bench')
%timeit vect.geo_to_h3(lats, longs, 9)

print('h3_to_parent bench')
%timeit vect.h3_to_parent(h3_index, 8)
# 411 µs ± 16.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

print('h3_get_resolution bench')
%timeit vect.h3_get_resolution(h3_index)

Originally posted by @kylebarron in #224 (comment)

@ajfriend
Copy link
Contributor

ajfriend commented May 1, 2022

Totally on board with this as we start to expose the Cython API! The appearance of Python types here was a result of the fact that the original API wasn't designed with vectorization in mind.

@ajfriend
Copy link
Contributor

In terms completely removing any Python types from the Cython level, I think there's some overlap with #267 and #265

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants