TieredSFCIndexStrategy configurable overInclusiveOnEdge #1816

ptbannister · 2021-06-03T14:27:34Z

TieredSFCIndexStrategy calculates query ranges that are always overinclusive on the edges. This behavior is good for some use cases, such as asking for data that intersects the queried data. But for other use cases, such as asking for data that contains the queried data, it searches an excessive amount of data.

For a good example of when we would want to parameterize whether we should be overinclusive on edges, consider the square between coordinates (0,0) and (1,1), with a TieredSFCIndexStrategy using the Hilbert curve. On the tier with two bits of precision, we would expect the query range from 0202 to 0202, but because the strategy is hardwired to be overinclusive on edges, we get the range from 0200 to 0203, searching the whole planet for data. At three bits of precision, we would expect the query range from 0308 to 0308, but instead we get a range that includes 0302, 0307, and 030d as well as the expected 0308.

There are at least two possible ways this could be improved. One would be to make the index strategy configurable, so its behavior is either always overinclusive, or always not overinclusive; this would involve adding a parameter to the constructor. Another would be to add overInclusiveOnEdges as a parameter of getQueryRanges, so it could be determined at query time what kind of search would be most efficient.

The text was updated successfully, but these errors were encountered:

rfecher · 2021-06-14T12:57:59Z

if I'm following correctly, the issue is the edge is always rounded and the nature of querying is that you don't a piori know whats stored (ie. you have to be fully inclusive of the query geometry). So an inserted point exactly at (0,0) would be 0202 and an inserted point exactly at (1,1) would be 0203 and if you search a bbox of 0,0 though 1,1 you would have to return both points so your ranges would have to include 0202 and 0203. We do a range decomposition to not strictly use a single range from min Hilbert Value to max Hilbert value but a more fine-grained set to represent the query geometry (ref: https://locationtech.github.io/geowave/latest/devguide.html#theorydecomposition).

Basically on ingest you always want to give exact hilbert values for points and not be over-inclusive for ranges (bboxes) but then on query you need to be over-inclusive on the edge to include points on each of the edges of a bbox or you can miss results (unless you're doing a contains relationship instead of an intersection, but most users expect an intersection relationship on bbox queries).

ptbannister · 2021-08-19T18:01:27Z

Yes, exactly.

If the most simple approach is acceptable (making the index strategy configurable for overInclusiveOnEdges) then we already have a solution that we could contribute.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TieredSFCIndexStrategy configurable overInclusiveOnEdge #1816

TieredSFCIndexStrategy configurable overInclusiveOnEdge #1816

ptbannister commented Jun 3, 2021

rfecher commented Jun 14, 2021

ptbannister commented Aug 19, 2021

TieredSFCIndexStrategy configurable overInclusiveOnEdge #1816

TieredSFCIndexStrategy configurable overInclusiveOnEdge #1816

Comments

ptbannister commented Jun 3, 2021

rfecher commented Jun 14, 2021

ptbannister commented Aug 19, 2021