Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unexpected polygon rasterization results #376

Open
Mikumikunisiteageru opened this issue Jan 18, 2023 · 14 comments
Open

Unexpected polygon rasterization results #376

Mikumikunisiteageru opened this issue Jan 18, 2023 · 14 comments
Labels
enhancement New feature or request

Comments

@Mikumikunisiteageru
Copy link

Dear Mr. Schouten and other developers,

I am new to this package. Please forgive me for my recklessness if the following issues are due to my misuse or misunderstanding.

I guess that there may be a bug lying in the rasterization routine for (multi)polygons when boundary is set to :touches, or more specifically, in the function _fill_line!. Here is an minimal working example (running in Julia v1.8.2):

using Rasters # v0.5.1 (latest version)
using Plots   # v1.38.2

x1, y1 = 1.00, 4.50
x2, y2 = 4.75, 0.75

raster = Raster(zeros(Bool, X(0.5:1.0:6.5), Y(0.5:1.0:5.5)))
Rasters._fill_line!(raster, (start=(x=x1, y=y1), stop=(x=x2, y=y2)), true)

plot(raster, xlims=(0, 7), ylims=(0, 6), cmap=:summer)
for x = 0:7
	plot!([x, x], [0, 6], lc=:black, lw=0.4, label=nothing)
end
for y = 0:6
	plot!([0, 7], [y, y], lc=:black, lw=0.4, label=nothing)
end
plot!([x1, x2], [y1, y2], lc=:coral, linewidth=3, label=nothing)
xticks!(0:7)
yticks!(0:6)

savefig("rasterize.pdf")

In the given raster grid, the pixels are bounded by adjacent integer coordinates. Thus we may denote a pixel by its right-above coordinates, e.g., (3,5) for the pixel ranging from 2 to 3 on x-axis and from 4 to 5 on y-axis. In this example, the straight line connecting (1.00, 4.50) and (4.75, 0.75) should touches exactly eight pixels including

  • (2,5), (2,4), (3,4), (3,3), (4,3), (4,2), (5,2), (5,1).

But the current routine returns a different set of pixels (Figure), i.e.,

  • (2,5), (3,5), (3,4), (4,4), (4,3), (5,3), (5,2), (6,2).

Figure: Rasterization of the line from (1.00, 4.50) to (4.75, 0.75)

The unexpected behavior is observed when x1 is an integer. When x1 is something like 0.99 or 1.01, the package returns the correct rasterization results. Therefore I suspect that the problem is caused by rounding of floating numbers, although I am not yet able to locate it.

Another issue, which I don't think is a bug like the one above, is about symmetry. Please consider an edge totally overlapping with some line of the grid, e.g., from (1.2, 4.0) to (3.4, 4.0). Now the rasterization functions return the pixels on the positive side of the line, i.e., (2,5), (3,5), and (4,5). But positivity may sometimes be arbitrary --- thus the preference may not be fair. From my point of view, it may be better to have these edges contribute no pixels in the rasterization process.

@rafaqz
Copy link
Owner

rafaqz commented Jan 18, 2023

Thanks for the issue and full MWE!

Making issues is never reckless, please report anything you think is even slightly wrong, slow, or confusing! But also it's best to separate your two concerns into two separate issues, or one of them may be forgotten.

There are a lot of changes to the rasterization algorithms on the way, I'm just struggling to find time to write tests for them: #336

But I will be sure to add tests for the integer case.

As for if the line should contribute, GDAL has the same problem, and a more confusing solution which is to go positive on x and negative on y (or the opposite, I can't remember).

We can do better than GDAL as we have all the flexibility of writing it in Julia. We just don't have the manpower to workshop these things much.

So if you have time, please make another issue detailing the exact logic (even with plot mockups like above) of how you think rasterization should work in regards to lines and their interactions with the intervals of a pixel.

If you write a good, logical spec I would implement it exactly. Working out what should happen is the hardest part and you seem to have more ideas than I do.

Edit: We have the option of adding IntervalSets.jl intervals as the values in the lookup index, which should let us specify closed/open intervals for the pixels to make it work however we want. Even per pixel!

It's also just something I haven't had time to implement

@Mikumikunisiteageru
Copy link
Author

Thanks for pointing out that the two issues should be separated. I just thought that they were of the same topic and highly related.

Well, to make the discussions consecutive, I choose to give out my answer here instead of opening another issue.

Actually I am the author of Mikrubi.jl, a package you probably still remember. (Thank you so much for the suggestions! I haven't make the repo public here just because I am not sure whether I would break any rules if I do so.) In that package, I wrote the rasterization routine carefully. The logic there is:

  1. Rescale / affine the line segments to an integer grid;
  2. Fill the polygon inside by line scanning algorithm; and
  3. Collect the pixels touched by the polygon edges.

The problem we discuss here lies in the last step. To illustrate how I handle that, please allow me to paste my core implementation verbatim:

function strokeedge(x1, y1, x2, y2, xb, yb)
	pixels = Tuple{Int,Int}[]
	x1, y1, x2, y2 = trimedge(x1, y1, x2, y2, xb, yb)
	isnan(x1) && return pixels
	xr = floor(Int, min(x1, x2)) + 1 : ceil(Int, max(x1, x2))
	yr = floor(Int, y1) + 1 : ceil(Int, y2)
	(length(xr) == 0 || length(yr) == 0) && return pixels
	(length(xr) == 1 || length(yr) == 1) && return I.(xr, yr)
	y2x = interpolate(y1, y2, x1, x2)
	if x1 < x2
		xleft = min(xb, max(0, x1))
		for y = yr[begin:end-1]
			xright = min(xb, max(0, y2x(y)))
			append!(pixels, I.(floor(Int, xleft) + 1 : ceil(Int, xright), y))
			xleft = xright
		end
		xright = min(xb, max(0, x2))
		return append!(pixels, 
			I.(floor(Int, xleft) + 1 : ceil(Int, xright), ceil(Int, y2)))
	else
		xright = min(xb, max(0, x1))
		for y = yr[begin:end-1]
			xleft = min(xb, max(0, y2x(y)))
			append!(pixels, I.(floor(Int, xleft) + 1 : ceil(Int, xright), y))
			xright = xleft
		end
		xleft = min(xb, max(0, x2))
		return append!(pixels, 
			I.(floor(Int, xleft) + 1 : ceil(Int, xright), ceil(Int, y2)))
	end
end

In this function, x1 and y1 locate the starting point of the straight line, and x2 and y2 mark the stopping point. The grid ranges from 0 to xb on x-axis and from 0 to yb on y-axis, where xb and yb are integers. The line is first trimmed by trimedge, which keeps its sensible part (visible in the grid), and then the pixels the trimmed edge touches are calculated directly. As you can see, floors and ceils are used alternately, according to the relative positions of the two endpoints.

Not sure if there are better or smarter ways to do that, but at least I think the codes work well. I hope this can help a little bit.

@rafaqz
Copy link
Owner

rafaqz commented Jan 18, 2023

Oh right, I have actually read your method!

In practice we need a much faster method, thus using the Bresenham style algorithm, for use cases of the scale we use in #336

But we do need to get it right here.

If you can, please check out the branch in the pull request (instructions in the description at the top) and see if it fixes your problem.

Run this in the REPL to use:
] add Rasters#reduce_rasterize

(But I will also look at incorporating your algorithm instead, it does seem simpler than mine)

@rafaqz
Copy link
Owner

rafaqz commented Jan 18, 2023

The internal method is now called:

Rasters._burn_line!(raster, (start=(x=x1, y=y1), stop=(x=x2, y=y2)), true)

And no it still has this bug. Ok, I will try to fix before next week. Its strange how it only occurs with exact integer values and is otherwise fine.

@rafaqz
Copy link
Owner

rafaqz commented Jan 18, 2023

Ok good news, the first issue is now fixed in the PR.

It was just wrong use of ceil and floor that only shows up in these exact integer examples.

Your second problem is harder because we need to agree on the exact specification - I don't understand your code enough to know what you mean from just that.
rasterize

Do you mean that the 0-1, 4-5 cell in the "1.0" plot should not be true because the line is only just touched?

@Mikumikunisiteageru
Copy link
Author

Good to know the progress in the morning!

It is always reasonable to care about time. I admit that my overall rasterization routine is not as fast as yours, but in this very step (your _fill_line!, my strokeedge), our algorithms would not differ too much. After all, they are both of $\mathcal O(d)$ time complexity, where $d$ is the number of pixels that is filled, or in your term, (approximately) the Manhattan distance between the two endpoints. I don't think it is apparent that _fill_line! (in v0.5.1) runs faster than strokeedge.

And yes, I think the $[0,1] \times [4,5]$ cell should not be included in the "1.0" case, because its "inside", the open set $(0,1) \times (4,5)$, is untouched. My exact specification is: a pixel is included if and only if its inside is touched. (By the way, it is strange that (1,5) grid cell is selected in the "1.00001" case but not so in the "0.99999" case. Did you misplace the titles?)

Let me explain my idea. trimedge does an extra job that is not summarized in its name, which is to ensure y1 <= y2 (otherwise (x1, y1) and (x2, y2) are swapped). Thus, yr is the range on y-axis containing all pixels (in your term, cells) that may be touched. Since y1 is always smaller, the direction (floor or ceil) has been determined. Similarly, xr is the range on x-axis.

If xr or yr has zero length, the edge (in your term, the line) lies in some "tile joint" and therefore does not contribute. If xr or yr has length of one, all pixels involved are aligned, so they are directly output. Most practical cases has now been included. In other cases, we need to do that row by row, i.e., by iterating on y-axis.

Without loss of generality, let us assume x1 < x2. Since y1 <= y2, the edge is ascending (axis style "xy" instead of "ij" in MATLAB). For each row of pixels (row number in yr), we calculate the leftmost and the rightmost pixels whose inside is touched by the edge, and then push all pixels between the two ends (inclusive) into pixels, a container for pixels that are involved in the rasterization process (in your term, those filled). Here I iteratively assign xright to xleft so that I can call the interpolation function y2x less frequently (although that is already fast). We can see that this is an $\mathcal O(d)$ implementation, because every possible pixel is processed only once. Perhaps it is even faster than the Bresenham algorithm. By the way, I don't think adding a constant shift every row is a good idea, because the error accumulates and therefore it is not very numerically robust.

I would be happy if my codes and explanation can be of a little use to you and to the package. But I am still not very familiar with the function interfaces in Rasters.jl, so I am afraid that I would break your design somewhere if I randomly modify your codes by myself.

@Mikumikunisiteageru
Copy link
Author

The amount of computation can be further reduced. The trick is to check whether the slope of the edge has an absolute value greater than one. When length(xr) >= length(yr), we apply the algorithm given above, i.e., iterate on y; otherwise, we iterate on x. In other words, we can always iterate along the axis with narrower range.

@rafaqz
Copy link
Owner

rafaqz commented Jan 19, 2023

Ok, I think I see what you want.

If I understand correctly, a vertical line on the line between two pixels does not actually touch any pixel?

:touches is used (e.g. by @harithmorgadinho) to get the largest possible pixel area the geometries represent so that even a single point would still always take one pixel. So we need to keep that behaviour somehow.

I don't think any of these methods is right or wrong, they just serve different requirements. So I think we should just handle all of them? We need to correctly define the intervals we are working with and what "touches" means in relation to the intervals of the pixels.

I think you want pixels to cover the area (x1, x2), (y1, y2). We currently choose [x1, x2), [y1, y2), and gdal chooses something like [x1, x2), (y1, y2].

[x1, x2], [y1, y2] could also be useful for some things (edit: and maybe the ArcGIS default).

Probably using https://github.com/JuliaMath/IntervalSets.jl is the right way, and it's already a dependency here.

We can allow the lookup array to contain intervals, and also add an intervals keyword to all rasterization which accepts the type pixel=Interval{:open,:open} (what you want) and pixel=Interval{:closed,:open} ( the current behaviour). It would also allow customising per dimension, likepixel=(X => Interval{:closed,:open}, Y => Interval{:open,:closed}) to get a GDAL like behaviour.

@rafaqz
Copy link
Owner

rafaqz commented Jan 19, 2023

Oh and with the plot, there is a pixel missing for the 0.99999 case because the line is shorter with a step of 0.99999 and does not actually touch that pixel at all. The end is just rounded so it looks like it crosses.

@Mikumikunisiteageru
Copy link
Author

Yes, I fully agree with you. They are just different needs for different situations. "Touching inside" is only my personal preference for a special application. I didn't mean the current criterion you apply here is incorrect.

Anyway, it will be the best if users can freely determine how the package behaves for cases involving pixel boundaries. And I guess this can probably be made by simply changing the directions of rounding (floor or ceil) accordingly.

Sorry I misunderstood the plot titles --- I thought they were values of x1 in the MWE.

@rafaqz
Copy link
Owner

rafaqz commented Jan 19, 2023

Ok, I'm very happy that we agree on this and have a way forward. I will look into the implementation over the weekend.

Making this change also solves why I have taken so long to merge the reduce_rasterize PR - the results are slightly different to GDAL and I don't know how to justify it or test it rigorously. So thank you for the input!

Now we should be able to get exactly the same results as GDAL and test against them by specifying the same interval rules.

In terms of your use case, it will likely take me a month or more to have this properly finished, tested and merged - I hope that is not too long.

The fix to the first problem (the actual bug) will be merged soon.

@Mikumikunisiteageru
Copy link
Author

I am really happy that I learned a lot of new things from your codes and our discussion here.

Thank you for considering my request! I am looking forward to the new version. Please take your time --- the current behavior is already sufficient for most practical cases.

@rafaqz
Copy link
Owner

rafaqz commented Jan 19, 2023

Great. Good to hear there is no rush 😅

If you would like to collaborate on anything here in future (even just by making more issues with your needs) we would benefit from the amount of thought and care you put questions like this.

@Mikumikunisiteageru
Copy link
Author

No problem! I will come here again if I have any other ideas worth discussing.

@rafaqz rafaqz added the enhancement New feature or request label May 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants