Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimise extract for Regular lookups #625

Closed
alex-s-gardner opened this issue Apr 5, 2024 · 12 comments · Fixed by #641
Closed

Optimise extract for Regular lookups #625

alex-s-gardner opened this issue Apr 5, 2024 · 12 comments · Fixed by #641

Comments

@alex-s-gardner
Copy link
Contributor

I feel like this is an incredibly naïve question but I don't think I've been able to find the proper way to index into a Raster with a list of points.

Here's what I do now:

using Rasters
# Rasters v0.10.1

x = X(0.5:0.1:90);
y = Y(0.5:0.1:90);
A = Raster(zeros(UInt8, y, x))

f = GeoInterface.PointTuple.([(X=1.4, Y=9.9), (X=1.3, Y=10), (X=70, Y=30)])

pts = extract(A, f, atol=0.2; index=true, geometry=false)

For very large datasets this is slower than I would expect and requires me to specify a tolerance which is a bit of a hassle, so I try to see if I can use Near() on points handed to extract like this:

pts = extract(A, Near(f); index=true, geometry=false)

But that doesn't work. Next I tried

v = similar(A, 0)
for (x, y) in f
    push!(v, A[X(Near(x)), Y(Near(y))])
end

But this was much slower than extract.

Is there a better way to do what I'm trying to do?

Thanks!

@rafaqz
Copy link
Owner

rafaqz commented Apr 5, 2024

Your raster data is Points not Intervals? Thats why atol is needed.

@alex-s-gardner
Copy link
Contributor Author

alex-s-gardner commented Apr 5, 2024

What if atol was automatically set to the smallest grid resolution / 2?

@alex-s-gardner
Copy link
Contributor Author

alex-s-gardner commented Apr 5, 2024

I'm not sure if this will work for Rasters.jl generally but I've found it faster (15x), for large numbers of points, to calculate the index directly and avoid using extract:

pt_x = getindex.(f,1)
pt_y = getindex.(f,2)

r = round.(Int64,(pt_x .- x[1])./step(x)).+1
c = round.(Int64, (pt_y .- y[1]) ./ step(y)).+1

idx = [CartesianIndex(a,b) for (a,b) in  zip(r,c)]

A[idx]

@rafaqz
Copy link
Owner

rafaqz commented Apr 6, 2024

We should implement something like that yes. But it only works when your lookup is a fixed step range (some netcdf dimensions are not), and what we have now is generic.

And we can't index to half steps for Points because its missing the whole "point" of having them! They don't span the whole interval. There is just no data between them. If there is data to extract between your points, what you have should be intervals.

I know GDAL only half heartedly implements points and intervals, but we do it properly here ;)

To extract from real point data you should do some kind of interpolation.

@rafaqz
Copy link
Owner

rafaqz commented Apr 9, 2024

@alex-s-gardner wondering if you agree that switching sampling to Intervals() is a tolerable solution here.

@alex-s-gardner
Copy link
Contributor Author

Intervals() vs Points() is undocumented. How would it change the behavior of extract?

@rafaqz
Copy link
Owner

rafaqz commented Apr 9, 2024

It may not be amazingly well documented but it is documented:
https://rafaqz.github.io/DimensionalData.jl/v0.26.7/api/lookuparrays#DimensionalData.Dimensions.Lookups.Intervals
https://rafaqz.github.io/DimensionalData.jl/v0.26.7/selectors#Lookups

And most of the junk printed in the repl along with a Raster is actually useful information!! It shows if you have points or intervals for each dimension. Points are the default, because they are the simplest.

Most of these things are in DimensionalData.jl.

To convert your raster to Intervas on X and Y, do:

using Rasters.LookupArrays
newraster = Rasters.set(raster, X=Intervals(), Y=Intervals())

But you probably know if the lookup values represent the start or center of the pixel. If they are the start like in GDAL you want to specify that, as center is the default :

newraster = Rasters.set(raster, X=Intervals(Start()), Y=Intervals(Start()))

@rafaqz
Copy link
Owner

rafaqz commented Apr 9, 2024

In this example the quick way is to pass the sampling keyword to the dimension constructor - which will forward it to the detected lookup:

x = X(0.5:0.1:90; sampling=Intervals(Start()));
y = Y(0.5:0.1:90; sampling=Intervals(Start()));

And you can see you now have intervals:

julia> A = Raster(zeros(UInt8, y, x))
╭─────────────────────────╮
│ 896×896 Raster{UInt8,2} │
├─────────────────────────┴──────────────────────────────────────────── dims ┐
   Y Sampled{Float64} 0.5:0.1:90.0 ForwardOrdered Regular Intervals{Start},
   X Sampled{Float64} 0.5:0.1:90.0 ForwardOrdered Regular Intervals{Start}
├──────────────────────────────────────────────────────────────────── raster ┤
  extent: Extent(Y = (0.5, 90.1), X = (0.5, 90.1))
  missingval: missing
└────────────────────────────────────────────────────────────────────────────┘
        0.5     0.6     0.7     0.8     0.9     1.0     1.1     1.2     1.3     1.4     1.5     1.6      88.9    89.0    89.1    89.2    89.3    89.4    89.5    89.6    89.7    89.8    89.9    90.0
  0.5  0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00       0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
  0.6  0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00       0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
  0.7  0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00       0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
  0.8  0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00       0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
  0.9  0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00      0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
  1.0  0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00       0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
                                                                                                                                                                                            
 89.5  0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00       0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
 89.6  0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00      0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
 89.7  0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00       0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
 89.8  0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00       0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
 89.9  0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00       0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
 90.0  0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00       0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00

@alex-s-gardner
Copy link
Contributor Author

Does extract behave differently for Intervals()? Just changing the Rater to intervals had no impact on performance.

@rafaqz
Copy link
Owner

rafaqz commented Apr 9, 2024

Im fixing your complaint about needing atol??

@rafaqz
Copy link
Owner

rafaqz commented Apr 10, 2024

I will PR at some stage with your performance suggestion too.

The issue is we need to separate out dispatch just for Regular lookups where that optimisation is possible.

@rafaqz rafaqz changed the title How should I extract a set of points from a Raster? Optimise extract for Regular lookups Apr 10, 2024
@rafaqz
Copy link
Owner

rafaqz commented Apr 11, 2024

@tiemvanderdeure also thinks the option of using Near would also be good for Points to avoid atol. So we can have that too - at least it makes it explicit what we are doing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants