ES|QL ST_DISTANCE Function #108764

Merged
merged 33 commits into from
Jun 21, 2024


craigtaverner
Contributor

@craigtaverner craigtaverner commented May 17, 2024

Initial work in support of #108212

This PR covers the addition of one new spatial function:

  • ST_DISTANCE(geomA, geomB) which returns a double value

This function can be used for calculating distances, filtering results by distance, and sorting by distance:

Distance calculations

FROM airports
| EVAL distance = ST_DISTANCE(location, city_location)

Filtering results

FROM airports
| WHERE ST_DISTANCE(location, TO_GEOPOINT("POINT(12 55)")) < 500000

Sorting results

FROM airports
| EVAL distance = ST_DISTANCE(location, TO_GEOPOINT("POINT(12 55)"))
| SORT distance ASC

Example

We can use it for all three capabilities in the same query:

FROM airports
| WHERE ST_DISTANCE(location, TO_GEOPOINT("POINT(12.565 55.673)")) < 600000
| EVAL distance = ROUND(ST_DISTANCE(location, TO_GEOPOINT("POINT(12.565 55.673)"))/1000,2)
| EVAL city_distance = ROUND(ST_DISTANCE(city_location, TO_GEOPOINT("POINT(12.565 55.673)"))/1000,2)
| KEEP abbrev, name, location, country, city, city_location, distance, city_distance
| SORT distance ASC
| LIMIT 10

Which results in something like:

abbrev:k | name:text               | location:geo_point                       | country:k | city:k          | city_location:geo_point | distance:d | city_distance:d    
CPH      | Copenhagen              | POINT(12.6493508684508 55.6285017221528) | Denmark   | Copenhagen      | POINT(12.5683 55.6761)  | 7.24       | 0.4
GOT      | Gothenburg              | POINT(12.2938269092573 57.6857493534879) | Sweden    | Gothenburg      | POINT(11.9675 57.7075)  | 224.42     | 229.15
HAM      | Hamburg                 | POINT(10.005647830925 53.6320011640866)  | Germany   | Norderstedt     | POINT(10.0103 53.7064)  | 280.34     | 273.42
TXL      | Berlin-Tegel Int'l      | POINT(13.2903090925074 52.5544287044101) | Germany   | Hohen Neuendorf | POINT(13.2833 52.6667)  | 349.97     | 337.53
BRE      | Bremen                  | POINT(8.7858617703132 53.052287104156)   | Germany   | Bremen          | POINT(8.8 53.0833)      | 380.5      | 377.22
NRK      | Norrköping Airport      | POINT(16.2339407695814 58.5833805017541) | Sweden    | Norrköping      | POINT(16.2 58.6)        | 392.0      | 392.35
GDN      | Gdansk Lech Walesa      | POINT(18.4684422165911 54.3807025352925) | Poland    | Gdańsk          | POINT(18.6453 54.3475)  | 402.61     | 414.59
NYO      | Stockholm-Skavsta       | POINT(16.9216055584254 58.7851041303448) | Sweden    | Nyköping        | POINT(17.0086 58.7531)  | 433.99     | 434.43
OSL      | Oslo Gardermoen         | POINT(11.0991032762581 60.1935783171386) | Norway    | Oslo            | POINT(10.7389 59.9133)  | 510.03     | 483.71
DRS      | Dresden                 | POINT(13.7649671440047 51.1250912428871) | Germany   | Dresden         | POINT(13.74 51.05)      | 511.9      | 519.91
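
To sanity-check the numbers in the table, here is a minimal Python sketch of the same filter, evaluate, and sort steps applied to four of the rows above. It is not the ES|QL implementation: the `haversine_m` helper, the `EARTH_RADIUS_M` constant, and the 500 km cutoff (borrowed from the earlier filtering example) are assumptions for illustration, so the last digits may differ from what Elasticsearch returns.

```python
import math

# Assumed mean Earth radius in meters; the exact constant used by
# Elasticsearch/Lucene may differ slightly.
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Query point from the example: POINT(12.565 55.673), i.e. (lon, lat).
origin = (12.565, 55.673)

# (abbrev, lon, lat) copied from the result table, deliberately unsorted.
airports = [
    ("OSL", 11.0991032762581, 60.1935783171386),
    ("HAM", 10.005647830925, 53.6320011640866),
    ("CPH", 12.6493508684508, 55.6285017221528),
    ("GOT", 12.2938269092573, 57.6857493534879),
]

# EVAL: distance in meters for every row.
with_distance = [(abbrev, haversine_m(*origin, lon, lat)) for abbrev, lon, lat in airports]

# WHERE: keep rows closer than 500 km (assumed cutoff for this sketch).
filtered = [(abbrev, d) for abbrev, d in with_distance if d < 500_000]

# SORT distance ASC, reporting kilometers rounded to two decimals.
nearby = sorted(((abbrev, round(d / 1000, 2)) for abbrev, d in filtered), key=lambda row: row[1])

for abbrev, km in nearby:
    print(abbrev, km)
```

Under these assumptions OSL falls outside the cutoff, and the remaining rows come back in the same order as in the table.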

This function was conceptually based on the similarly named PostGIS function, ST_DISTANCE, but also satisfies the needs of a related PostGIS function, ST_DWITHIN.

There are a few differences between ES|QL and PostGIS:

  • In PostGIS the geometries can be any supported spatial geometry, and the closest distance between the geometries will be calculated. In ES|QL we only support distances between points. Supporting closest distance between shapes could be considered in the future.
  • In PostGIS the ST_DISTANCE function cannot benefit directly from the spatial index, and is usually used in combination with a bounding box query to achieve better performance. PostGIS recommends the usage of ST_DWITHIN for index-based distance filtering in WHERE clauses. In ES|QL we can optimise ST_DISTANCE for filtering, and so do not need to implement ST_DWITHIN at all.
  • In PostGIS the distance is always the cartesian distance for the CRS, while in ES|QL with geo_point data types we perform a haversine distance calculation. This is because ES|QL prioritises compatibility with Elasticsearch's _search API (and the underlying Lucene implementation) over perfect compatibility with OGC or PostGIS.
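
The last difference in the list above can be illustrated with a short Python sketch that compares a planar (cartesian) distance in coordinate units with a haversine distance in meters. The helper names `cartesian_deg` and `haversine_m` and the Earth radius constant are assumptions made for this illustration, not code from the PR.

```python
import math

# Assumed mean Earth radius in meters (illustrative constant).
EARTH_RADIUS_M = 6_371_008.8


def cartesian_deg(p, q):
    """Planar distance between two (lon, lat) points, in degrees."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def haversine_m(p, q):
    """Great-circle distance between two (lon, lat) points, in meters."""
    phi1, phi2 = math.radians(p[1]), math.radians(q[1])
    dphi = phi2 - phi1
    dlam = math.radians(q[0] - p[0])
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of longitude at the equator and at 60 degrees north.
equator_pair = ((0.0, 0.0), (1.0, 0.0))
north_pair = ((0.0, 60.0), (1.0, 60.0))

planar_equator = cartesian_deg(*equator_pair)
planar_north = cartesian_deg(*north_pair)
at_equator = haversine_m(*equator_pair)
at_60n = haversine_m(*north_pair)

print("planar (degrees):", planar_equator, planar_north)
print("haversine (meters):", round(at_equator), round(at_60n))
```

The planar result is the same for both pairs, while the haversine result at 60 degrees north is roughly half of the value at the equator. This is why a distance threshold expressed in meters only makes sense with the haversine calculation for geo_point data.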

Things not done in the PR and planned for followup work:

@craigtaverner craigtaverner added >enhancement :Analytics/Geo Indexing, search aggregations of geo points and shapes Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL v8.15.0 labels May 17, 2024
@elasticsearchmachine
Collaborator

Hi @craigtaverner, I've created a changelog YAML for you.

@elasticsearchmachine
Collaborator

Hi @craigtaverner, I've updated the changelog YAML for you.

Note that we might instead want to include some of the related intelligence from the Circle2D::HaversineDistance class.
So we moved the common code to a separate SpatialTypeResolver, and made a simpler TernarySpatialFunction based on a simple TernaryScalarFunction. This had additional consequences, simplifying the points-only cases.

The main reason for this change was to support StDWithinTests which need to test a lot of things that involve varying all three input types, generating expected error strings, etc. The original hack of just adding to BinarySpatialFunction worked for the actual integration tests, but clearly did not satisfy all the use cases tested by the unit tests.

We also restricted ST_DWITHIN to take only a double as the third argument, because otherwise the number of evaluators would explode, since we need a separate evaluator for each Block type, and Integer and Double use different block types.
@craigtaverner craigtaverner marked this pull request as ready for review June 20, 2024 10:26
Contributor

@iverase iverase left a comment


The PR is supposed to add a distance function, but it seems you are adding two functions here. I disagree with introducing two functions for the same thing. Let's keep it simple and only add ST_DISTANCE?

@elasticsearchmachine
Collaborator

Hi @craigtaverner, I've updated the changelog YAML for you.

…expression/function/scalar/spatial/StDistance.java

Co-authored-by: Ignacio Vera <[email protected]>
Contributor

@iverase iverase left a comment


LGTM

@craigtaverner
Contributor Author

@elasticsearchmachine update branch

@craigtaverner
Contributor Author

The work done on ST_DWITHIN has been moved into a separate PR for later consideration at #109985

Contributor

@astefan astefan left a comment


LGTM

Contributor

@luigidellaquila luigidellaquila left a comment


Thanks Craig, LGTM!

@craigtaverner
Contributor Author

@elasticsearchmachine update branch

@craigtaverner craigtaverner merged commit 536d614 into elastic:main Jun 21, 2024
15 checks passed
Labels
:Analytics/ES|QL AKA ESQL :Analytics/Geo Indexing, search aggregations of geo points and shapes >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.15.0
5 participants