rasterize by a specified function and column #333

Closed
harithmorgadinho opened this issue Nov 14, 2022 · 2 comments · Fixed by #336

Comments

@harithmorgadinho

Hi there,
I was wondering whether the Rasters.jl package is able to rasterize using a specified function and a specific column.
Thanks

@rafaqz
Owner

rafaqz commented Nov 14, 2022

This is a great idea

Maybe instead of using fill for the column name, as we currently do, we could have a source keyword that takes e.g. source=:area and could also accept a function of the whole row, like source=row -> 1/GeoDataFrames.area(row.geometry)

Then for the function we should be more specific than R's raster package and call it reduce, since it has to be a reducing function that accepts a list and returns a scalar. The default would be last, which is what it does now by just writing over whatever was there before.

So something like:

rasterize(df; to=myraster, source=:mycolumnname, reduce=mean)

What do you think?

Getting this to be fast and light on memory is the problem, as we need to store the values from each layer for every cell for the reducing function.
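
To make the memory cost concrete, here is a minimal sketch of the naive approach in plain Julia. It is not Rasters.jl code: `naive_rasterize`, `getvalue`, the `dims` and `missingval` keywords, and the rectangle stand-in for polygon geometry are all assumptions made for illustration. It only shows `source` accepting either a column name or a function of the row, and `reduce` being applied to the full list of values collected for each cell.

```julia
using Statistics: mean

# Hypothetical stand-in for table rows: geometry is an axis-aligned rectangle
# (xmin, xmax, ymin, ymax). Real polygons would need a point-in-polygon test.
rows = [
    (geometry = (0.0, 2.0, 0.0, 2.0), area = 4.0),
    (geometry = (1.0, 3.0, 1.0, 3.0), area = 10.0),
]

# `source` may be a column name (Symbol) or a function of the whole row.
getvalue(source::Symbol, row) = getproperty(row, source)
getvalue(source, row) = source(row)

# Naive version: keep every value that lands in a cell, then reduce at the end.
# Cells are 1x1, so cell (i, j) has its center at (i - 0.5, j - 0.5).
function naive_rasterize(rows; dims, source, reduce = last, missingval = NaN)
    nx, ny = dims
    buckets = [Float64[] for i in 1:nx, j in 1:ny]   # one growing list per cell
    for row in rows
        xmin, xmax, ymin, ymax = row.geometry
        v = getvalue(source, row)
        for j in 1:ny, i in 1:nx
            cx, cy = i - 0.5, j - 0.5
            if xmin <= cx <= xmax && ymin <= cy <= ymax
                push!(buckets[i, j], v)
            end
        end
    end
    # `reduce` here is the keyword argument: any function from a list to a scalar.
    return map(b -> isempty(b) ? missingval : reduce(b), buckets)
end

r_mean = naive_rasterize(rows; dims = (3, 3), source = :area, reduce = mean)
r_last = naive_rasterize(rows; dims = (3, 3), source = :area)
r_inv  = naive_rasterize(rows; dims = (3, 3), source = row -> 1 / row.area, reduce = sum)
```

The `buckets` array is the problem: its size grows with the number of overlapping geometries per cell, not just with the raster size.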

One option for doing these reductions quickly while limiting memory use is to break the raster into chunks, find which polygons overlap each chunk, and calculate the per-cell value vectors one chunk at a time. Then the memory needed is much smaller and can also be reused.
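
A rough sketch of the chunking idea, again in plain Julia rather than anything that exists in Rasters.jl. The helper names (`overlaps`, `chunk_ranges`, `candidates_per_chunk`) and the use of bounding-box extents `(xmin, xmax, ymin, ymax)` on a unit-cell grid are assumptions. It only covers the first step, deciding which geometries each chunk has to look at; the per-cell buffers would then be sized for a single chunk and reused.

```julia
# Do two (xmin, xmax, ymin, ymax) extents overlap? Shared edges count as overlap.
overlaps(a, b) = a[1] <= b[2] && b[1] <= a[2] && a[3] <= b[4] && b[3] <= a[4]

# Split an nx-by-ny grid of unit cells into chunks of at most `chunk` cells per
# side, returning the (i-range, j-range) of each chunk.
function chunk_ranges(nx, ny, chunk)
    return [(i:min(i + chunk - 1, nx), j:min(j + chunk - 1, ny))
            for i in 1:chunk:nx, j in 1:chunk:ny]
end

# For each chunk, list the indices of the geometries whose extent overlaps it.
function candidates_per_chunk(extents, nx, ny, chunk)
    return map(chunk_ranges(nx, ny, chunk)) do r
        is, js = r
        # Cell i spans [i - 1, i], so the chunk spans first - 1 to last.
        chunk_extent = (first(is) - 1.0, Float64(last(is)),
                        first(js) - 1.0, Float64(last(js)))
        findall(e -> overlaps(e, chunk_extent), extents)
    end
end

extents = [(0.5, 1.5, 0.5, 1.5), (2.5, 3.5, 2.5, 3.5), (1.5, 2.5, 0.2, 0.8)]
cands = candidates_per_chunk(extents, 4, 4, 2)
```

Each chunk then only needs value lists for its own cells and its own candidate geometries.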

For sum, mean, first and last we can write optimizations where we don't actually store all the values, so memory/chunking isn't such a problem. But I think running arbitrary user-defined functions is more powerful than just using these basic functions.
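
As an illustration of why those four don't need the stored lists, here is a sketch that keeps one running value and one count per cell. The function name `streaming_reduce`, the `kind` symbols, and the `(cell, value)` stream format are made up for this example and are not part of any package.

```julia
# Running accumulators: each cell keeps O(1) state instead of a list of values.
# `stream` yields (cell_index, value) pairs in the order geometries are burned.
function streaming_reduce(kind::Symbol, stream, ncells; missingval = NaN)
    kind in (:sum, :mean, :first, :last) ||
        throw(ArgumentError("unsupported reduction: $kind"))
    acc = fill(missingval, ncells)   # running value per cell
    n = zeros(Int, ncells)           # how many values each cell has seen
    for (cell, v) in stream
        n[cell] += 1
        if n[cell] == 1
            acc[cell] = v            # first value replaces missingval
        elseif kind === :sum || kind === :mean
            acc[cell] += v
        elseif kind === :last
            acc[cell] = v
        end                          # :first keeps what is already stored
    end
    if kind === :mean
        for i in eachindex(acc)
            n[i] > 0 && (acc[i] /= n[i])   # turn the running sum into a mean
        end
    end
    return acc
end

stream = [(1, 4.0), (2, 4.0), (2, 10.0), (3, 10.0)]
s = streaming_reduce(:sum, stream, 4)
m = streaming_reduce(:mean, stream, 4)
f = streaming_reduce(:first, stream, 4)
l = streaming_reduce(:last, stream, 4)
```

An arbitrary user function (a median, say) has no such constant-size state, which is where the chunking would come in.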

Edit: you might also want a groupby keyword that groups e.g. polygons from the same :species so they aren't double counted.
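
One possible reading of that, sketched in plain Julia: to count species per cell, remember which group keys a cell has already seen and count each one once. `count_groups` and the `(cell, group)` stream are hypothetical names for illustration only.

```julia
# Count distinct groups per cell, so several polygons from the same group
# (e.g. the same :species) only contribute once.
function count_groups(stream, ncells)
    seen = [Set{Symbol}() for i in 1:ncells]   # group keys seen by each cell
    for (cell, group) in stream
        push!(seen[cell], group)               # pushing a duplicate is a no-op
    end
    return length.(seen)
end

# Two :wolf polygons overlap cell 1, but :wolf is only counted once there.
richness = count_groups([(1, :wolf), (1, :wolf), (1, :fox), (2, :fox)], 3)
```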

@harithmorgadinho
Author

Thanks,

This is very close to what I believe would be the most useful feature of the rasterize function in the Rasters.jl package. The one thing I would also propose is an option to calculate the value of each raster cell from every polygon that touches it, not only from polygons that cover the cell center. This is because:

(1) in rasterize from the raster package, a polygon is only rasterized into a grid cell if it covers the center of that cell. The exception is the getCover option, which can then only be applied to one layer.

(2) fasterize only does sums and it doesn't have the function of touching polygons - it is, however, faster than rasterize but it is a one-purpose tool.

(3) rasterize from the terra package can rasterize based on polygon touches and can use different functions to calculate the values for the grid cells. What it doesn't do is both at the same time.
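
To illustrate the difference between the two rules mentioned in (1) to (3), here is a small sketch in plain Julia. It uses an axis-aligned rectangle `(xmin, xmax, ymin, ymax)` as a stand-in for a polygon on a grid of unit cells; `covers_center`, `touches` and `burn` are made-up names, not functions from raster, fasterize, terra or Rasters.jl.

```julia
# Cell (i, j) spans [i - 1, i] x [j - 1, j]; its center is (i - 0.5, j - 0.5).

# Center rule: the cell is burned only if the shape covers the cell center.
covers_center(rect, i, j) =
    rect[1] <= i - 0.5 <= rect[2] && rect[3] <= j - 0.5 <= rect[4]

# Touches rule: the cell is burned if the shape shares any area with it.
touches(rect, i, j) =
    rect[1] < i && rect[2] > i - 1 && rect[3] < j && rect[4] > j - 1

# Boolean mask of burned cells under a given rule.
burn(rule, rect, nx, ny) = [rule(rect, i, j) for i in 1:nx, j in 1:ny]

# A small shape sitting on the corner shared by four cells.
rect = (0.6, 1.4, 0.6, 1.4)
by_center  = burn(covers_center, rect, 2, 2)
by_touches = burn(touches, rect, 2, 2)
```

With the center rule this shape is lost entirely, while the touches rule burns all four cells, so a reducing function combined with touches would see it in each of them.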

If this could be implemented in the Rasters.jl package it would be great.

Thank you,
