Inefficient $regex Mango view queries on indexed fields #4775

Open
pgj opened this issue Sep 25, 2023 · 1 comment

pgj commented Sep 25, 2023

On the Mango query interface, when the $regex operator is applied in a selector to a field that could be backed by a view (json) index, the index is not leveraged. In its current form, $regex instead implies a brute-force scan through all the documents of the underlying view, which makes the whole approach inefficient and resource-intensive.

The source of the problem is that no index ranges can be computed for $regex (to narrow down the scan), because regular expressions do not naturally induce an ordering on the set of strings the way that, for example, comparisons do on integers. However, there is prior art, for example in Lucene, where this is achieved for specialized cases such as "starts with"-type expressions, i.e. ^foo.*.
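
To illustrate the "starts with" case, here is a minimal Python sketch of how a known literal prefix can bracket a range in a sorted key space. It is not CouchDB code. `HIGH_SENTINEL`, `prefix_range` and `scan_prefix` are hypothetical names, the choice of `\ufff0` as the upper-bound character is an assumption, and Python compares strings by code point whereas CouchDB views collate with ICU, so the exact bounds would differ in practice.

```python
import bisect
from typing import List, Tuple

# Assumption: a high code point appended to the prefix serves as an upper bound.
HIGH_SENTINEL = "\ufff0"


def prefix_range(prefix: str) -> Tuple[str, str]:
    """Return (start_key, end_key) bracketing strings that start with `prefix`."""
    return prefix, prefix + HIGH_SENTINEL


def scan_prefix(sorted_keys: List[str], prefix: str) -> List[str]:
    """Slice a sorted key list down to the range computed from `prefix`."""
    start_key, end_key = prefix_range(prefix)
    lo = bisect.bisect_left(sorted_keys, start_key)
    hi = bisect.bisect_right(sorted_keys, end_key)
    # Only this slice would still need to be checked against the full regex.
    return sorted_keys[lo:hi]


keys = sorted(["abc", "doc", "doc1", "doc2", "document", "dod", "zzz"])
print(scan_prefix(keys, "doc"))
```

The range only narrows the candidates; the regular expression itself would still have to be evaluated on every key inside it.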

Some more links about how Lucene does it, just for inspiration:

Another factor that can slow down the evaluation is that documents are included unconditionally: $regex is not first matched against the key itself, with documents then fetched and returned only for the matching keys. This would be something along the lines of covering indexes.
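
The key-first idea could be sketched roughly as follows. This is a Python illustration only, not how Mango is implemented: `regex_over_index` and `fetch_doc` are hypothetical names, and the index is modelled as plain `(key, doc_id)` pairs.

```python
import re
from typing import Callable, Dict, Iterable, Iterator, Tuple


def regex_over_index(
    rows: Iterable[Tuple[str, str]],
    pattern: str,
    fetch_doc: Callable[[str], Dict],
) -> Iterator[Dict]:
    """Match the regex against index keys and fetch documents only on a match."""
    compiled = re.compile(pattern)
    for key, doc_id in rows:
        if compiled.search(key):
            # The document body is loaded only for keys that already matched.
            yield fetch_doc(doc_id)


fetched = []


def fetch_doc(doc_id: str) -> Dict:
    """Stand-in for a document read; records which ids were loaded."""
    fetched.append(doc_id)
    return {"_id": doc_id}


rows = [("doc1", "id1"), ("other", "id2"), ("doc2", "id3")]
results = list(regex_over_index(rows, "^doc.+", fetch_doc))
print(fetched)
```

In this sketch the row with key `other` never triggers a document read, which is the saving the paragraph above is after.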


rnewson commented Sep 25, 2023

It would be nice (and I think fairly easy) to at least narrow the query with startkey/endkey if the regex does not start with a wildcard. Parsing the regex to determine that sounds like the tricky part.
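
A deliberately conservative way to approach the parsing part might look like the Python sketch below. It is an illustration under stated assumptions, not a proposal for the actual Erlang code: `literal_prefix` is a hypothetical helper, it only accepts patterns anchored with `^` (an unanchored pattern can match anywhere in the string, so no prefix can be derived from it), it gives up on any alternation, and it stops at the first metacharacter or escape rather than trying to understand it.

```python
from typing import Optional

# Characters that end the literal run; escapes are not interpreted at all.
META = set(".[]()*+?{}|\\$^")
# Quantifiers that allow zero repetitions of the preceding character.
OPTIONAL_QUANTIFIERS = set("*?{")


def literal_prefix(pattern: str) -> Optional[str]:
    """Return the literal prefix every match must start with, or None."""
    if not pattern.startswith("^") or "|" in pattern:
        return None
    prefix = []
    for ch in pattern[1:]:
        if ch in META:
            # "ab*" or "ab?" makes the "b" optional, so drop it from the prefix.
            if ch in OPTIONAL_QUANTIFIERS and prefix:
                prefix.pop()
            break
        prefix.append(ch)
    return "".join(prefix) or None


for pattern in ["^foo.*", "^doc.+", "^colou?r", "doc.+", "^.*foo", "^foo|bar"]:
    print(pattern, literal_prefix(pattern))
```

Returning `None` means "fall back to the full scan", so being too cautious here only costs performance, never correctness.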

rnewson added a commit that referenced this issue Sep 26, 2023
for selector;

{"selector":{"_id":{"$regex":"doc.+"}}}

before;

{
  "include_docs": true,
  "view_type": "map",
  "reduce": false,
  "partition": null,
  "start_key": [],
  "end_key": [
    "<MAX>"
  ],
  "direction": "fwd",
  "stable": false,
  "update": true,
  "conflicts": "undefined"
}

after;

{
  "include_docs": true,
  "view_type": "map",
  "reduce": false,
  "partition": null,
  "start_key": [
    "doc"
  ],
  "end_key": [
    "doc�",
    "<MAX>"
  ],
  "direction": "fwd",
  "stable": false,
  "update": true,
  "conflicts": "undefined"
}

closes: #4775