Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

faster getpoint #369

Merged
merged 7 commits into from
May 6, 2023
Merged

faster getpoint #369

merged 7 commits into from
May 6, 2023

Conversation

rafaqz
Copy link
Collaborator

@rafaqz rafaqz commented Feb 22, 2023

getpoint is pretty slow currently, to the point rasterizing a shapefile from ArchGDAL.jl is 20x slower than one from Shapefile.jl. Mostly because getting points allocates every time.

This PR greatly (but not completely) reduces the gap by:

  1. Not returning GDAL points. They are pretty inefficient when we have thousands of points. A Tuple is better.
  2. Preallocating Ref for getpoint! in the iterator for Geointerfce.getpoint(linestring), and reusing them for each point.

This is implemented for line string but that means polygon will also work. It's probably not worth the effort of passing the Ref through from polygons, but I haven't checked. There may be more objects that need this treatment.

Edit: I assumed this would have a test already, but seems not.

@yeesian yeesian self-requested a review February 23, 2023 23:37
@evetion
Copy link
Collaborator

evetion commented Feb 24, 2023

I was slightly hesitant about changing the return type here, but I think this still fits as it qualifies as point.

We could better document that we essentially are changing GeoInterface.coordinates to be a nested Vector of a Tuple instead of a Vector. That has impact on some construction methods as well.

@rafaqz
Copy link
Collaborator Author

rafaqz commented Feb 24, 2023

Yeah, I tried to optimise the gdal point first but the overhead of allocating them all individually is just really bad.

Once all the methods here accept any point it won't make much difference either.

And yes I've noticed there's still a lot of nested vector around. It would be a good performance gain to swap them all to tuples.

@rafaqz
Copy link
Collaborator Author

rafaqz commented Feb 24, 2023

Another thought I had writing this is it would be faster to use getx, gety etc because they don't need allocations at all.

But it assumes trad point order, so I went with getpoint!.

Copy link
Owner

@yeesian yeesian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd err on the side of assuming this is a breaking/major change.

There may be more objects that need this treatment. [...] Once all the methods here accept any point it won't make much difference either. [...] And yes I've noticed there's still a lot of nested vector around. It would be a good performance gain to swap them all to tuples.

Can we do it all within a single PR? I don't think it'd be a good idea to have inconsistencies in return types across geometry types.

src/geointerface.jl Show resolved Hide resolved
src/geointerface.jl Show resolved Hide resolved
@rafaqz
Copy link
Collaborator Author

rafaqz commented Feb 25, 2023

I guess its a breaking change? But calling GeoInerface methods returns a lot of types of points already, and it doesn't promise what kind of point comes from what parent object.

I also doubt it will actually break much.

We should actually put this on getgeom and I guess it's just multipoint missing? I'm not sure that will be faster because it's accessed differently.

yeesian
yeesian previously approved these changes Mar 30, 2023
Copy link
Owner

@yeesian yeesian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM because of #369 (comment) --

I was slightly hesitant about changing the return type here, but I think this still fits as it qualifies as point.

Will get a second opinion from @evetion nevertheless just in case there's anything other context from GeoInterface I might be missing

@yeesian yeesian requested a review from evetion March 30, 2023 23:44
@evetion
Copy link
Collaborator

evetion commented Mar 31, 2023

Do we have a roundtrip test? I.e. ArchGDAL.createpoint(GeoInterface.getpoint))? Similarly for GeoInterface.coordinates, if this PR changes that.

@rafaqz
Copy link
Collaborator Author

rafaqz commented Apr 23, 2023

It doesn't touch coordinates. But yes round trip on points would be good. I think that should just work already.

Edit: a tuple is just a normal input for createpoint, it's already tested..

Edit2: I added round trip tests for LineaRing and LineString

Copy link
Collaborator

@visr visr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice speedup, looks good to me!

@yeesian yeesian merged commit 3997df4 into yeesian:master May 6, 2023
8 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants