add support for structured queries (opensearch only) #815

tobiass-sdl · 2024-06-06T16:49:41Z

this adds a new function

http:https://localhost:2322/structured?city=berlin

Supported parameters are

"lang", "limit",  "lon", "lat", "osm_tag", "location_bias_scale", "bbox", "debug", "zoom", "layer", "countrycode", "state", "county", "city", "postcode", "district", "housenumber", "street"

Result format is equal to /api.

For those parameters that are also available for /api the meaning is the same.
CountryCode has to be the ISO2 code (e.g. DE instead of "Germany"). The country code must match exactly. Fuzzy matching does not make sense for 2 letter codes. In my experience the case that address is fine except for the country code is rare compared to nonsense hits in other countries.

In order to use structured queries, the new option "-structured" has to be used when importing from Nominatim. Expect 10-20% higher file size. If you run photon on an existing index folder, photon detects automatically if the data was imported with structured query support.
For data imported without "-structured" /structured?city=berlin returns a technical error as the route is not mapped.
Structured queries are only supported for OpenSearch based photon.

Known issues

state information is used with low priority. This can cause issues with cities that exist in several states (e.g. "Springfield" in the US). Reason is that states are not normalized - some documents have abbreviations like "NY", other spell "New York" out. To fix this either an alias list is needed or this is normalized on import (maybe even for nominatim).
no fine tuning of the scores yet - would require a large test set with reliable expected results and some automated way for guessing "good" scores.
/structured?city=some_hamlet_or_isolated_dwelling does not always work

tobiass-sdl · 2024-06-06T16:50:13Z

adapt readme / documentation

lonvia

The overall structure is fine. I've added some smaller comments regarding style as well as comments regarding index structure.

My main concern is the handling of mismatches or missing address parts in the structured query. The lenient 'should' matching produces a lot of false positive results. This would be rather unexpected.

lonvia · 2024-06-10T12:46:22Z

app/opensearch/src/main/java/de/komoot/photon/opensearch/AddressQueryBuilder.java

+ private static final float StreetBoost = 5.0f; // we filter streets in the wrong city / district / ... so we can use a high boost value
+ private static final float HouseNumberBoost = 10.0f;
+ private static final float HouseNumberUnmatchedBoost = 5f;
+ private static final float FactorForWrongLanguage = 0.1f;


Style nitpicking: can you make these constant upper-case.

lonvia · 2024-06-10T12:50:40Z

app/opensearch/src/main/java/de/komoot/photon/opensearch/IndexMapping.java

 .copyTo("collector.base", "collector." + lang)));
 }

 mappings.properties("name." + lang,
- b -> b.text(p -> p.index(false)
+ b -> b.text(p -> p.index(supportStructuredQueries)


Do you really need this index? Names really should be queried only through the .ngrams or .raw indexes.

lonvia · 2024-06-10T13:33:14Z

src/main/java/de/komoot/photon/query/StructuredPhotonRequest.java

+
+ public boolean hasStreet() { return this.street != null || this.houseNumber != null; }
+
+ public boolean hasHouseNumber() { return this.houseNumber != null; }


You might want to extend all these to check for the empty string. (Or do the check in the set functions and set to null when the input string is empty.)

lonvia · 2024-06-10T13:34:58Z

app/es_embedded/src/main/java/de/komoot/photon/Server.java

@@ -183,6 +184,16 @@ public DatabaseProperties recreateIndex(String[] languages, Date importDate) thr
 return dbProperties;
 }

+ private void createAndPutIndexMapping(String[] languages, boolean supportStructuredQueries)
+ {
+ if(supportStructuredQueries) {


style nitpicking: please leave spaces between if and the bracket. (Also in other parts of the code.)

lonvia · 2024-06-10T13:37:42Z

app/opensearch/src/main/java/de/komoot/photon/opensearch/AddressQueryBuilder.java

+ }
+ }
+
+ private void AddQuery(Query clause) {


style-nitpicking: functions should start with a lower-case letter.

lonvia · 2024-06-10T14:00:05Z

app/opensearch/src/main/java/de/komoot/photon/opensearch/AddressQueryBuilder.java

+ .boost(boost)
+ .build()
+ .toQuery());
+ }


Instead of creating a complex query over all languages here, you could also work with a collector field for each address part. Define a field 'state_collector' which is indexed but not stored. Then instead of setting the index for 'state.', you copy to 'state_collector'.

You'd loose the boost factor for wrong language. In my experiments it made little enough difference in accuracy for quite a bit of performance boost.

lonvia · 2024-06-10T14:06:49Z

app/opensearch/src/main/java/de/komoot/photon/opensearch/SearchQueryBuilder.java

+ var query4QueryBuilder = new AddressQueryBuilder(lenient, language, languages)
+ .addCountryCode(request.getCountryCode())
+ .addState(request.getState(), request.hasCounty() || request.hasCityOrPostcode())
+ .addCounty(request.getCounty(), request.hasCityOrPostcode())


Those two are missing hasStreet() in the 'hasMoreDetails' field. It might seem that it doesn't make much sense to have 'street' and 'county' in the query but no 'city' parameter, but that has never stopped users from trying anyway.

lonvia · 2024-06-11T08:06:20Z

app/opensearch/src/main/java/de/komoot/photon/opensearch/AddressQueryBuilder.java

+ .build()
+ .toQuery())
+ .boost(boost);
+ }


I'm not sure I understand the logic here. I would expect that you'd always search in the address field when hasMoreDetails is true and in the name otherwise. Right now it looks like you can also return results where all searched fields are in the address only. That seems rather confusing. For example a query with /structured?city=vaduz returns the city of Vaduz as expected but then also a lot of address results.

the idea was that you still find the city if the street/housenumber is wrong (or not yet in openstreetmap) but I clearly messed something up. Returning houses or streets when no street is given is obviously a bad idea and I thought I handled that case (and failed) -> have to investigate.

lonvia · 2024-06-11T08:16:13Z

app/opensearch/src/main/java/de/komoot/photon/opensearch/AddressQueryBuilder.java

+ .boost(HouseNumberBoost)
+ .build()
+ .toQuery());
+ }


style: bad indent

lonvia · 2024-06-11T08:33:13Z

app/opensearch/src/main/java/de/komoot/photon/opensearch/IndexMapping.java

@@ -32,12 +36,12 @@ public IndexMapping addLanguages(String[] languages) {
 for (var field: ADDRESS_FIELDS) {
 mappings.properties(String.format("%s.%s", field, lang),
 b -> b.text(p -> p
- .index(false)
+ .index(shouldIndexAddressField(field))


All the new indexes simply use the default analyser. I wonder if 'analyser("index_raw")' and searchAnalyser("search") wouldn't be better here. At a minimum, it gives you German Umlaut-Folding (ö vs. oe).

You can set analysers unconditionally. They simply have no effect when the index is disabled.

lonvia · 2024-06-11T13:13:11Z

Sorry for creating the conflict. I found #816 while reviewing this PR. Please just rebase on master at some point. I don't mind force pushes to PR branches.

tobiass-sdl · 2024-06-13T15:19:45Z

app/opensearch/src/main/java/de/komoot/photon/opensearch/OpenSearchStructuredSearchHandler.java

+ results = sendQuery(buildQuery(photonRequest, true).buildQuery(), extLimit);
+
+ if (results.hits().total().value() == 0 && photonRequest.hasStreet()) {
+ var street = photonRequest.getStreet();


this turned out to be easier than having an unmaintainable complex query.
Of course this could also be done on the client side...

tobiass-sdl · 2024-06-13T15:20:15Z

app/opensearch/src/main/java/de/komoot/photon/opensearch/AddressQueryBuilder.java

+ // match only name
+ if (hasPostCode) {
+ // post code can implicitly specify a district that has the city in the address field (instead of the name)
+ combinedQuery = QueryBuilders.bool()


e.g. /structured?state=Baden Württemberg&district=Plotzsägmühle&housenumber=2

lonvia

The query works much better now.

You don't allow typos in the second lenient try. Is this intentional? I would have expected auto fuzziness to be enabled for it.

/structured/?city=vaduz&street=lettstrasse still returns me the bus stop "Lettstrasse". We might need a layer filter here.

The other thing that stands out is that the usually abbreviations don't work ('strasse' -> 'str'). The freetext search mitigates that somewhat by always doing a prefix search so that some abbreviations just accidentally work (doesn't help with 'road' -> 'rd', though). We don't have this here. I would defer this topic to later. Just wanted to mention it, in case somebody tries this and doesn't get results.

On the nitpicking side: there are still some function names that start with an upper-case letter.

lonvia · 2024-06-17T09:16:04Z

app/opensearch/src/main/java/de/komoot/photon/opensearch/IndexMapping.java

+ }
+
+ mappings.properties(propertyName,
+ b -> b.text(p -> p.copyTo(collectors)));


Now the 'index(false)' is gone. I'm not entirely sure what the default is, so prefer explicit statements.

lonvia · 2024-06-17T09:18:50Z

app/opensearch/src/main/java/de/komoot/photon/opensearch/IndexMapping.java

 mappings.properties(field + ".default", b -> b.text(p -> p
- .index(false)
- .copyTo("collector.default", "collector.base")));
+ .index(shouldIndexAddressField(field))


Do you still need an index here?

tobiass-sdl · 2024-06-24T15:31:42Z

Luckily I got rid of the bus stops without an overly complex layer filter. There was already a filter to filter out results with house numbers if the request does not contain one. Bus stops and a few other locations passed that filter because they have no house number. Therefore fixing the filter from "no house number" to "no house number and not a house" should be sufficient.
I totally agree with ignoring abbrevations for know - that's similar to the issue with state="NY" vs state="New York".
No fuzziness for lenient=false was a bug.

lonvia · 2024-06-26T09:44:47Z

Therefore fixing the filter from "no house number" to "no house number and not a house" should be sufficient.

You may even want to put a global filter on the query "when type is house then it must have a house number". And also add a global filter for "type != other". This should consistently filter out all object that are not "address-like".

…not address-like.

lonvia · 2024-07-09T13:13:13Z

Looks good as a first version. We can fine-tune in follow-up PRs, if necessary.

lonvia · 2024-07-09T13:14:37Z

Oh dear, now I forgot about documentation. Do you mind adding a little bit of documentation for the new call in a separate PR?

I would suggest to do this in an extra file docs/structured.md and in the README.md just link to it (with the warning that it is optional). If the call is described directly in the main README, we will end up with repeated questions why it doesn't work on photon.komoot.io.

tobiass-sdl · 2024-07-09T14:26:40Z

Yes, I can add some documentation. I'm off next week, not sure whether I get to it before the 20th.

lonvia reviewed Jun 11, 2024

View reviewed changes

add support for structured queries (opensearch only)

d9bc19f

tobiass-sdl force-pushed the sdl/structured_queries branch from ae6a9d7 to d9bc19f Compare June 13, 2024 15:06

tobiass-sdl commented Jun 13, 2024

View reviewed changes

lonvia reviewed Jun 17, 2024

View reviewed changes

review comments / bug fixes

6e48f88

stricter filter for houses without house number and objects that are …

bc230d2

…not address-like.

lonvia merged commit be99d7d into komoot:master Jul 9, 2024
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add support for structured queries (opensearch only) #815

add support for structured queries (opensearch only) #815

tobiass-sdl commented Jun 6, 2024 •

edited

Loading

tobiass-sdl commented Jun 6, 2024

lonvia left a comment

lonvia Jun 10, 2024

lonvia Jun 10, 2024

lonvia Jun 10, 2024

lonvia Jun 10, 2024

lonvia Jun 10, 2024

lonvia Jun 10, 2024

lonvia Jun 10, 2024

lonvia Jun 11, 2024

tobiass-sdl Jun 11, 2024

lonvia Jun 11, 2024

lonvia Jun 11, 2024

lonvia commented Jun 11, 2024

tobiass-sdl Jun 13, 2024

tobiass-sdl Jun 13, 2024

lonvia left a comment

lonvia Jun 17, 2024

lonvia Jun 17, 2024

tobiass-sdl commented Jun 24, 2024

lonvia commented Jun 26, 2024

lonvia commented Jul 9, 2024

lonvia commented Jul 9, 2024

tobiass-sdl commented Jul 9, 2024


		public boolean hasStreet() { return this.street != null \|\| this.houseNumber != null; }

		public boolean hasHouseNumber() { return this.houseNumber != null; }

add support for structured queries (opensearch only) #815

add support for structured queries (opensearch only) #815

Conversation

tobiass-sdl commented Jun 6, 2024 • edited Loading

Known issues

tobiass-sdl commented Jun 6, 2024

lonvia left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lonvia commented Jun 11, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lonvia left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tobiass-sdl commented Jun 24, 2024

lonvia commented Jun 26, 2024

lonvia commented Jul 9, 2024

lonvia commented Jul 9, 2024

tobiass-sdl commented Jul 9, 2024

tobiass-sdl commented Jun 6, 2024 •

edited

Loading