Feature: build Mango indexes from dynamic expressions #3912
Open: jcoglan wants to merge 6 commits into `apache:3.x` from `neighbourhoodie:mango-jq-indexes`
Commits on Jan 25, 2022
Index on a "virtual" field computed from normal fields
This puts in just enough machinery to support indexing on a function of a document's content inside Mango, without going through the JS engine. Say we have a document like `{ "bar": "a b c" }`.

If we want to index on the individual "words" of `bar`, i.e. find this doc via the keys "a", "b" or "c", we'd normally need a JS map function: `function (doc) { for (let word of doc.bar.split(" ")) { emit(word) } }`

This patch lets us do this inside Mango by defining an index on a "virtual" field whose value is a function of the doc's other fields. We put a view containing this in a design doc: `"map": { "fields": { "bar_words": { "$explode": { "$field": "bar", "$separator": " " } } } }`

This lets us perform `_find` queries for e.g. `{ "bar_words": "b" }` to get our original document. As this is a proof of concept designed to get the index machinery working, `$explode` is the only function defined.
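The keys that the `$explode` virtual field contributes can be approximated in plain JavaScript. This is a minimal sketch and not the Erlang implementation in the patch. The function name `explodeField` is hypothetical, and the sketch assumes that a missing or non-string field produces no keys.

```javascript
// Hypothetical sketch of the prototype `$explode` operator: given a doc and a
// field spec such as { "$explode": { "$field": "bar", "$separator": " " } },
// return every key the virtual field contributes to the index.
function explodeField(doc, spec) {
  const { $field: field, $separator: separator } = spec.$explode;
  const value = doc[field];
  // Assumption: a missing or non-string field yields no index keys.
  if (typeof value !== "string") return [];
  return value.split(separator);
}

const spec = { $explode: { $field: "bar", $separator: " " } };
console.log(JSON.stringify(explodeField({ bar: "a b c" }, spec))); // ["a","b","c"]
```

Each returned string becomes an index key pointing at the document, which is why a selector of `{ "bar_words": "b" }` can find it.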
Commit `661cda3`
Here we add a module named `couch_jq` which exports an interface to the jq [1] library, letting us run jq programs against CouchDB docs held in memory as Erlang terms. The exported functions are:

- `couch_jq:compile/1`: takes a binary containing a jq expression and compiles it. Returns a resource object holding the resulting struct that stores the parsed expression.
- `couch_jq:eval/2`: takes a compiled jq program and a JSON document, and returns the results of evaluating the program against the doc, as a list.

Both functions return either `{ok, Result}` or `{error, Message}`.

This assumes that jq programs will in general return multiple results. Because the Erlang list is built from the tail, the results are returned in the reverse of the order in which jq emits them. We're going to put these values into an index where they'll get re-sorted, so the order is not important.

Most of the native code here is concerned with translating between Erlang and jq representations of JSON values. Erlang terms are checked to make sure they conform to the Couch document representation, and anything else is rejected.

[1]: https://stedolan.github.io/jq/
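A small JavaScript model of a cons list can illustrate the note about result order. This is a sketch under stated assumptions, not the native code. `collectResults` is a hypothetical name, a linked list of `{ head, tail }` objects stands in for an Erlang list, and JavaScript's default string sort stands in for CouchDB's view collation.

```javascript
// Hypothetical model (not the NIF code): each value jq emits is consed onto
// the front of a linked list, the way an Erlang list is built from its tail
// (the empty list) outward.
function collectResults(emitted) {
  let list = null; // stands in for the empty Erlang list []
  for (const value of emitted) {
    list = { head: value, tail: list }; // like [Value | List] in Erlang
  }
  // Walk the cons cells from head to tail to get an ordinary array.
  const out = [];
  for (let cell = list; cell !== null; cell = cell.tail) out.push(cell.head);
  return out;
}

const results = collectResults(["a", "b", "c"]); // emitted by jq in this order
console.log(JSON.stringify(results)); // ["c","b","a"]

// Sorting, as the index does (JavaScript's string sort stands in for CouchDB
// collation here), hides the reversal.
console.log(JSON.stringify([...results].sort())); // ["a","b","c"]
```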
Commit `783dcc7`
Replace the `$explode` operator with `$jq`
Here we replace our prototype `$explode` operator with the more general `$jq` operator. The index is built by executing the jq expression against each doc and storing each result into the index.
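Index construction with `$jq` can be sketched as a loop over documents that emits one row per result. The sketch rests on several assumptions:

- `buildIndex` and `compareRows` are made-up names.
- A plain function stands in for a compiled jq program such as `.bar | split(" ") | .[]`.
- Documents without a string `bar` contribute no rows.
- Rows are ordered by simple `<` comparison rather than CouchDB collation.

```javascript
// Simplified row ordering by key, then doc id; real views use CouchDB collation.
function compareRows(a, b) {
  if (a.key !== b.key) return a.key < b.key ? -1 : 1;
  if (a.id !== b.id) return a.id < b.id ? -1 : 1;
  return 0;
}

// Hypothetical sketch of building a `$jq` index: `program` stands in for a
// compiled jq expression and maps one doc to an array of results (jq
// programs can emit zero, one, or many values).
function buildIndex(docs, program) {
  const rows = [];
  for (const doc of docs) {
    for (const key of program(doc)) {
      rows.push({ key, id: doc._id }); // one index row per result
    }
  }
  return rows.sort(compareRows);
}

// Stand-in for the jq program `.bar | split(" ") | .[]`.
const barWords = (doc) => (typeof doc.bar === "string" ? doc.bar.split(" ") : []);

const docs = [
  { _id: "doc1", bar: "a b c" },
  { _id: "doc2", bar: "b d" },
];
const index = buildIndex(docs, barWords);
// A lookup for "b" finds both documents.
console.log(JSON.stringify(index.filter((row) => row.key === "b").map((row) => row.id))); // ["doc1","doc2"]
```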
Commit `4c27463`
Allow multiple jq expressions in index definitions
If we have a document like `{ "foo": "a b", "bar": "x y" }` and an index definition like:

`[ { "foo_words": { "$jq": ".foo | split(\" \") | .[]" } }, { "bar_words": { "$jq": ".bar | split(\" \") | .[]" } } ]`

then this should produce four index keys for the document: `("a", "x")`, `("a", "y")`, `("b", "x")`, `("b", "y")`.

This lets us query on multiple virtual fields in a single query. The implementation here allows jq expressions, which may return multiple values, to be mixed with normal field accesses, which return a single value. `flatten_keys/1` returns the Cartesian product of the index fields' values. In the example above, `foo_words` produces the values `["a", "b"]` and `bar_words` produces `["x", "y"]`, and multiplying these out gives the four keys listed.
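The multiplication step can be sketched as a Cartesian product over per-field value lists. This hypothetical JavaScript version is not the Erlang `flatten_keys/1`. It assumes each field has already been evaluated to an array of values, with a normal field lookup wrapped as a one-element array.

```javascript
// Hypothetical sketch of the multiplication `flatten_keys/1` performs: each
// index field contributes an array of candidate values (a normal field lookup
// contributes a one-element array), and every combination becomes one key.
function flattenKeys(fieldValues) {
  let keys = [[]]; // start with a single empty key
  for (const values of fieldValues) {
    const next = [];
    for (const prefix of keys) {
      for (const value of values) next.push([...prefix, value]);
    }
    keys = next;
  }
  return keys;
}

const fooResults = ["a", "b"]; // results of .foo | split(" ") | .[]
const barResults = ["x", "y"]; // results of .bar | split(" ") | .[]
console.log(JSON.stringify(flattenKeys([fooResults, barResults])));
// [["a","x"],["a","y"],["b","x"],["b","y"]]
```

In this formulation, a jq field that produces no results yields no keys at all for the document, because the product with an empty list is empty.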
Commit `1852c31`
Compile jq expressions once per index, not once per document
The first implementation of building Mango indexes from jq expressions re-compiled the jq program for every document it processed. This is wasteful; we can instead do this once, when the index definitions are updated. When the index definition is loaded, field descriptions of the form `{[{<<"$jq">>, Expr}]}` are converted to `{jq, CompiledExpr}`. This non-JSON term is used because the compiled jq program can be held in memory but cannot be serialised back out into the design doc.
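Compile-once loading of an index definition can be sketched as follows. The JavaScript model rests on several assumptions:

- `compileJq` is a stand-in that only understands simple `.field` paths; the patch calls `couch_jq:compile/1` instead.
- `loadIndexFields` is a made-up name.
- The `{ name, jq }` object plays the role of the `{jq, CompiledExpr}` term.
- A counter shows that compilation happens once per index rather than once per document.

```javascript
let compileCount = 0;

// Stand-in compiler that only understands simple paths like ".foo"; it returns
// a function from a doc to an array of results, like a jq program would.
function compileJq(expr) {
  compileCount += 1;
  const path = expr.split(".").filter((part) => part.length > 0);
  return (doc) => {
    let value = doc;
    for (const part of path) value = value == null ? undefined : value[part];
    return value === undefined ? [] : [value];
  };
}

// Hypothetical loader: replace every { "$jq": expr } field with a compiled
// program, leaving ordinary fields untouched. The compiled form lives only in
// memory and is never written back to the design doc.
function loadIndexFields(fields) {
  return Object.entries(fields).map(([name, spec]) => {
    if (spec !== null && typeof spec === "object" && typeof spec.$jq === "string") {
      return { name, jq: compileJq(spec.$jq) };
    }
    return { name, spec };
  });
}

const loaded = loadIndexFields({ foo_value: { $jq: ".foo" }, bar: "asc" });
const docs = [{ foo: 1 }, { foo: 2 }, { foo: 3 }];
const results = docs.map((doc) => loaded[0].jq(doc)); // reuses one compiled program
console.log(JSON.stringify(results), compileCount); // [[1],[2],[3]] 1
```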
Commit `c311f83`
Allow the /{db}/_index endpoint to accept jq definitions
We currently have to create jq-based indexes by writing out the whole design document containing the Mango index definition, because the `/{db}/_index` endpoint does not accept anything except "asc" or "desc" next to field names in the sort syntax. Here we expand this to allow `{ "$jq": "..." }` as well, letting us create jq-based indexes without knowing the internal document format.

Even though internally we support mixing jq expressions with normal field lookups in multi-field indexes, for now I'm restricting this so that the fields of a multi-field index must either all use the same sort direction or all be jq expressions.
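The restriction on multi-field indexes can be sketched as a validation pass over the `fields` array that a client sends in the `index` object of a `POST /{db}/_index` request. This is a hypothetical JavaScript sketch, not the Erlang validation in the patch. `classifyField` and `validateIndexFields` are made-up names, and the sketch assumes a bare string field name means ascending order, as elsewhere in Mango.

```javascript
// Hypothetical classification of one entry in an index's `fields` array:
// a bare field name (ascending), { name: "asc" | "desc" }, or
// { name: { "$jq": "..." } }.
function classifyField(field) {
  if (typeof field === "string") return "asc";
  const entries = Object.entries(field);
  if (entries.length !== 1) throw new Error("each field must have exactly one key");
  const [, spec] = entries[0];
  if (spec === "asc" || spec === "desc") return spec;
  if (spec !== null && typeof spec === "object" && typeof spec.$jq === "string") return "jq";
  throw new Error("unsupported field specification");
}

// Valid when every field has the same kind: all "asc", all "desc", or all jq.
// This rejects mixed directions and jq mixed with plain fields.
function validateIndexFields(fields) {
  const kinds = new Set(fields.map(classifyField));
  return kinds.size === 1;
}

console.log(validateIndexFields([{ foo: "asc" }, { bar: "asc" }])); // true
console.log(validateIndexFields([{ foo: "asc" }, { bar: "desc" }])); // false
console.log(validateIndexFields([
  { foo_words: { $jq: '.foo | split(" ") | .[]' } },
  { bar_words: { $jq: '.bar | split(" ") | .[]' } },
])); // true
console.log(validateIndexFields([{ foo_words: { $jq: ".foo" } }, { bar: "asc" }])); // false
```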
Commit `f5c661e`