Make all SOLR taxa fields case insensitive #76
Comments
Indexing issue - goes in biocache-store
This might break stuff so will need testing
I'm not sure the solution here is to make all SOLR fields case insensitive. I think this will cause the index to increase dramatically in size, which will have big implications for not a lot of benefit. Here's a quicker/easier alternative:
When the above query is run, we match the string "Animalia" to the GUID for the taxon kingdom:ANIMALIA. We then search with the left/right values associated with that GUID. This works for any level in the hierarchy (not just major Linnaean ranks, e.g. subfamily). The match is case insensitive, and it also has the smarts to parse taxonomic names with authorship strings etc. So if we just extend this support to take a query "kingdom:animalia", parse it and do the same matching as above, then we get case-insensitive searches and more accurate searches for all taxonomic ranks (which you don't get if you just make all fields case insensitive). Make sense? FQs shouldn't be case insensitive, as they are intended to be exact. They differ from Qs in this respect.
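A rough sketch of the query rewriting described above, in Python for brevity. Everything named here is an assumption made for illustration: the `TAXON_INDEX` lookup table stands in for the real name-matching step, the GUIDs and left/right numbers are made up, and the `lft` range field name is assumed rather than taken from the actual schema. It would only be applied to the `q` parameter, leaving `fq` values exact.

```python
import re

# Hypothetical stand-in for the name-matching step.
# Keys are (rank, lower-cased name); values are (guid, left, right).
# The GUIDs and left/right numbers are invented for this example.
TAXON_INDEX = {
    ("kingdom", "animalia"): ("urn:example:taxon:1", 2, 500000),
    ("subfamily", "felinae"): ("urn:example:taxon:2", 1200, 1290),
}

# Matches a simple "field:value" term such as kingdom:animalia.
TERM = re.compile(r"^(?P<rank>[A-Za-z_]+):(?P<name>.+)$")


def rewrite_taxon_query(q):
    """Rewrite 'rank:name' into a left/right range query when the name matches.

    Anything that does not match a known taxon is returned unchanged.
    """
    m = TERM.match(q.strip())
    if m is None:
        return q
    rank = m.group("rank").lower()
    name = m.group("name").strip().strip('"').lower()
    hit = TAXON_INDEX.get((rank, name))
    if hit is None:
        return q
    _guid, left, right = hit
    # "lft" is an assumed field name for the left value of the nested set.
    return f"lft:[{left} TO {right}]"
```

With this table, `kingdom:animalia`, `kingdom:Animalia` and `kingdom:ANIMALIA` all rewrite to the same range query, while a term that is not a known taxon (say `state:Victoria`) passes through untouched.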
Also - just noticed the original query raised by Paul was for the BIE not the Biocache...
Changing to a case insensitive SOLR field might work in BIE because it does not have facet listing services. Is that right? A test on 50999 records indicated that less storage space is required for a case insensitive index with solr.TextField than a case sensitive index with solr.StrField. That is the overall result, at least; it may vary from field to field.
Wildcard searching and exact searching appear to operate the same. Stored values do retain case. Unfortunately, facet listings return only lower case values, so I do not think we can use it in biocache-service. It looks like the biocache-hubs param taxa operates with a GUID search. https://biocache.ala.org.au/occurrences/search?taxa=animalia&facet=off and biocache-service with
Yeah, facet listings return the indexed values, not the stored ones. I think we came to the conclusion last time we looked at this that we'd need to store/index the fields we want to be case insensitive twice.
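To make the "index twice" idea concrete, here is a toy model in Python. It is not SOLR code: the `STORED` values and the helper names are invented for illustration, and it only mimics the behaviour described in this thread, where facets are counted over indexed terms and a lower-cased index therefore loses the original case.

```python
from collections import Counter

# Hypothetical stored values for one taxon field across three records.
STORED = ["Animalia", "ANIMALIA", "Plantae"]


def build_index(values):
    """Return (exact_terms, folded_terms), one entry per record."""
    exact = list(values)                  # indexed as-is, like a string field
    folded = [v.lower() for v in values]  # lower-cased twin of the same value
    return exact, folded


def search(folded, term):
    """Case-insensitive match: return positions of matching records."""
    needle = term.lower()
    return [i for i, t in enumerate(folded) if t == needle]


def facet(terms):
    """Facet counts are computed over indexed terms, not stored values."""
    return Counter(terms)


exact, folded = build_index(STORED)
```

Searching goes against `folded`, so `search(folded, "aNiMaLiA")` finds both animal records. Faceting goes against `exact`, so the listing keeps the original case; faceting on `folded` instead would only ever return lower case values, which is the problem noted above for biocache-service.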
Yes, probably. The aim for the services was to make clients as dumb as possible. So if there's search term mangling in biocache-hubs, it would be better to push this back to the service if possible. That way we don't have multiple clients (SP, biocache, outside world) all replicating the logic.
From @nickdos on September 10, 2014 5:19
Or add a case-insensitive copyField.
As reported by a user:
On 9 Sep 2014, at 6:23 pm, Gioia, Paul [email protected] wrote:
Copied from original issue: AtlasOfLivingAustralia/biocache-service#5