Inefficient $regex Mango view queries on indexed fields #4775
It would be nice (and I think fairly easy) to at least narrow the query with startkey/endkey if the regex does not start with a wildcard. Parsing the regex to determine that sounds like the tricky part.
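A minimal sketch of the "parsing the regex" step, to make the idea concrete. It is not CouchDB's implementation: the names `literal_prefix`, `key_range` and `HIGH_SENTINEL` are hypothetical, the function only handles `^`-anchored patterns and gives up on anything it does not understand, and it assumes plain code point ordering of keys rather than CouchDB's ICU-based collation.

```python
# Characters with special meaning in a regular expression.
META = set(".^$*+?{}[]()|\\")
# Quantifiers that allow the preceding character to be absent.
OPTIONAL_QUANTIFIERS = set("*?{")
# Assumed upper-bound sentinel; the right value depends on the collation in use.
HIGH_SENTINEL = "\uffff"


def literal_prefix(pattern: str) -> str:
    """Return a literal prefix that every match must start with, or ""."""
    if not pattern.startswith("^"):
        return ""  # unanchored: a match may begin anywhere in the string
    body = pattern[1:]
    if "|" in body:
        return ""  # alternation: conservatively give up
    prefix = []
    for i, ch in enumerate(body):
        if ch in META:
            break
        nxt = body[i + 1] if i + 1 < len(body) else ""
        if nxt in OPTIONAL_QUANTIFIERS:
            break  # this character may be absent, so it is not part of the prefix
        prefix.append(ch)
    return "".join(prefix)


def key_range(pattern: str):
    """Return (startkey, endkey) for a narrowable pattern, else None."""
    prefix = literal_prefix(pattern)
    if not prefix:
        return None  # fall back to a full scan
    return prefix, prefix + HIGH_SENTINEL
```

Returning `None` for anything ambiguous keeps the fallback safe: the worst case is the full scan that happens today.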
rnewson added a commit that referenced this issue Sep 26, 2023

for selector;
{"selector":{"_id":{"$regex":"doc.+"}}}

before;
{
  "include_docs": true,
  "view_type": "map",
  "reduce": false,
  "partition": null,
  "start_key": [],
  "end_key": [ "<MAX>" ],
  "direction": "fwd",
  "stable": false,
  "update": true,
  "conflicts": "undefined"
}

after;
{
  "include_docs": true,
  "view_type": "map",
  "reduce": false,
  "partition": null,
  "start_key": [ "doc" ],
  "end_key": [ "doc�", "<MAX>" ],
  "direction": "fwd",
  "stable": false,
  "update": true,
  "conflicts": "undefined"
}

closes: #4775
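The effect of the narrowed start_key/end_key can be illustrated with a sorted list standing in for the view's keys. This is only a sketch: `ranged_regex_scan` and `HIGH` are hypothetical names, the keys are made up, and ordering is by plain code point rather than CouchDB's collation. Narrowing gives the same result as the full scan only when every possible match is known to begin with the prefix.

```python
import bisect
import re

# Assumed sentinel that sorts after any other character in code point order.
HIGH = "\U0010ffff"


def ranged_regex_scan(sorted_keys, start_key, end_key, pattern):
    """Apply the regex only to keys within [start_key, end_key].

    Returns the matching keys and the number of keys that were examined.
    """
    regex = re.compile(pattern)
    lo = bisect.bisect_left(sorted_keys, start_key)
    hi = bisect.bisect_right(sorted_keys, end_key)
    scanned = sorted_keys[lo:hi]
    matches = [key for key in scanned if regex.search(key)]
    return matches, len(scanned)


keys = sorted(["abc", "doc", "doc1", "doc2", "dod", "zzz"])

# "before": the range covers every key in the index.
full = ranged_regex_scan(keys, "", HIGH, "doc.+")
# "after": the range covers only keys starting with "doc".
narrow = ranged_regex_scan(keys, "doc", "doc" + HIGH, "doc.+")
```

Both calls return the same matches, but the narrowed one examines three keys instead of six.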
In the Mango query interface, when the $regex operator is used in a selector on a field that could be backed by a view (json) index, the index is not leveraged. In its current form, $regex instead implies a brute-force scan through all the documents of the underlying view, which makes the whole approach inefficient and resource-intensive.

The source of the problem is that no index ranges can be computed for $regex (to narrow down the scan), because regular expressions do not naturally induce an ordering on the set of strings the way that, for example, comparisons do on integers. However, there is prior art, for example in Lucene, where this is possible for specialized cases such as "starts with"-type expressions, i.e. ^foo.*.

Some more links about how Lucene does it, just for inspiration:
Another factor that can slow down the evaluation is that documents are included unconditionally. $regex is not matched against the key itself first, which would allow only the documents for the matched keys to be fetched and returned, something along the lines of covering indexes.
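The difference between the two strategies can be sketched as follows. This is an illustration under stated assumptions, not how Mango is implemented: `index_rows` is a hypothetical list of (key, doc_id) pairs in which the key is the indexed field's value, `docs` is a dict standing in for document storage, and the field name `name` is made up.

```python
import re


def query_fetch_all(index_rows, docs, pattern):
    """Load every document behind the index, then match on the field."""
    regex = re.compile(pattern)
    fetched = [docs[doc_id] for _key, doc_id in index_rows]
    matches = [doc for doc in fetched if regex.search(doc["name"])]
    return matches, len(fetched)


def query_match_keys_first(index_rows, docs, pattern):
    """Match on the index key first, then load only the matching documents."""
    regex = re.compile(pattern)
    matching_ids = [doc_id for key, doc_id in index_rows if regex.search(key)]
    matches = [docs[doc_id] for doc_id in matching_ids]
    return matches, len(matches)


docs = {
    "1": {"_id": "1", "name": "foo1"},
    "2": {"_id": "2", "name": "bar"},
    "3": {"_id": "3", "name": "foo2"},
    "4": {"_id": "4", "name": "baz"},
}
# The index key is assumed to be the value of the "name" field.
index_rows = sorted((doc["name"], doc_id) for doc_id, doc in docs.items())

all_result, all_fetched = query_fetch_all(index_rows, docs, "^foo")
key_result, key_fetched = query_match_keys_first(index_rows, docs, "^foo")
```

Both functions return the same documents, but the second one loads two documents instead of four, because the key alone is enough to decide whether a row matches.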