Feature: build Mango indexes from dynamic expressions #3912

Open · wants to merge 6 commits into base: 3.x

Commits on Jan 25, 2022

  1. Index on a "virtual" field computed from normal fields

    This puts in just enough machinery to support indexing on a function of
    a document's content inside Mango, without going through the JS engine.
    Say we have a document like:
    
        {
          "bar": "a b c"
        }
    
    If we want to index on the individual "words" of `bar`, i.e. find this
    doc via the keys "a", "b" or "c", we'd normally need a JS map function:
    
        function (doc) {
          for (let word of doc.bar.split(" ")) {
            emit(word)
          }
        }
    
    This patch lets us do this inside Mango by defining an index on a
    "virtual" field whose value is a function of the doc's other fields. We
    put a view containing this in a design doc:
    
        "map": {
          "fields": {
            "bar_words": {
              "$explode": { "$field": "bar", "$separator": " " }
            }
          }
        }
    
    This lets us run `_find` queries such as `{ "bar_words": "b" }` to get
    our original document back.
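
    To make the evaluation concrete, here is a minimal JavaScript sketch of
    what the `$explode` field definition computes for each document. It is
    not the Erlang code in this patch; `evalField` and `indexRows` are
    hypothetical names, and the handling of a missing field is an assumption.

    ```javascript
    // Hypothetical evaluator for the prototype `$explode` virtual field.
    // A field spec is either a plain field name or an object such as
    // { "$explode": { "$field": "bar", "$separator": " " } }.
    function evalField(doc, spec) {
      if (typeof spec === "string") {
        // A normal field yields exactly one value per document.
        return [doc[spec]];
      }
      if (spec !== null && typeof spec === "object" && "$explode" in spec) {
        const { $field, $separator } = spec.$explode;
        const value = doc[$field];
        // Assumption: a missing or non-string field yields no index entries.
        return typeof value === "string" ? value.split($separator) : [];
      }
      throw new Error("unsupported field spec: " + JSON.stringify(spec));
    }

    // One index row per value of the virtual field, tagged with the doc id.
    function indexRows(docId, doc, spec) {
      return evalField(doc, spec).map((key) => ({ key, id: docId }));
    }

    const spec = { $explode: { $field: "bar", $separator: " " } };
    const rows = indexRows("doc1", { bar: "a b c" }, spec);
    console.log(rows.map((row) => row.key)); // [ 'a', 'b', 'c' ]
    ```

    In this model, a `_find` for `{ "bar_words": "b" }` amounts to looking
    up the rows whose `key` is `"b"`.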
    
    As this is a proof of concept designed to get the index machinery
    working, `$explode` is the only function defined.
    janl authored and jcoglan committed Jan 25, 2022
    661cda3
  2. NIF bindings for jq

    Here we add a module named `couch_jq` which exports an interface to the
    jq [1] library, letting us run jq programs against CouchDB docs held in
    memory as Erlang terms. The exported functions are:
    
    - `couch_jq:compile/1`: takes a binary containing a jq expression and
      compiles it, returning a resource object that holds the struct storing
      the parsed expression.
    
    - `couch_jq:eval/2`: takes a compiled jq program and a JSON document and
      returns the results of evaluating the program against the doc, as a
      list.
    
    Both functions return either `{ok, Result}` or `{error, Message}`.
    
    This assumes that a jq program will in general return multiple results.
    Because the Erlang list is built by prepending each result to its head,
    the results come back in the reverse of the order in which jq emits
    them. We're going to put these values into an index where they'll be
    re-sorted, so the order is not important.
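
    The reversal is a property of how singly linked lists are built, not of
    jq itself. The sketch below models an Erlang list as cons cells in
    JavaScript, assuming the NIF prepends each result to the head as jq
    emits it (for example with `enif_make_list_cell`); `cons`, `buildList`
    and `toArray` are hypothetical helpers.

    ```javascript
    // An Erlang list modelled as cons cells: null is [], and
    // { head, tail } is [Head | Tail].
    const cons = (head, tail) => ({ head, tail });

    // Prepend each result in the order jq emits it, so the last result
    // emitted ends up at the head of the list.
    function buildList(results) {
      let list = null;
      for (const result of results) list = cons(result, list);
      return list;
    }

    // Walk the list from the head to read it back as an array.
    function toArray(list) {
      const out = [];
      for (let cell = list; cell !== null; cell = cell.tail) out.push(cell.head);
      return out;
    }

    const emitted = ["x", "y", "z"];   // order produced by jq
    const returned = toArray(buildList(emitted));
    console.log(returned);             // [ 'z', 'y', 'x' ]

    // Once the index sorts the keys, the original order no longer matters.
    console.log([...returned].sort()); // [ 'x', 'y', 'z' ]
    ```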
    
    Most of the native code here is concerned with translating between
    Erlang and jq representations of JSON values. Erlang terms are checked
    to make sure they conform to the Couch document representation, and
    anything else is rejected.
    
    [1]: https://stedolan.github.io/jq/
    jcoglan committed Jan 25, 2022
    783dcc7
  3. Replace the $explode operator with $jq

    Here we replace our prototype `$explode` operator with the more general
    `$jq` operator. The index is built by executing the jq expression
    against each doc and storing each result into the index.
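
    As a rough model of this build step, the JavaScript sketch below treats
    a compiled jq program as a function from a document to the list of
    values it emits. `barWords` imitates `.bar | split(" ") | .[]` by hand
    rather than calling jq, and `buildIndex` and `lookup` are hypothetical
    names. Keys are sorted with plain string comparison, not CouchDB's
    collation.

    ```javascript
    // Stand-in for a compiled jq program: a function from a document to the
    // list of values jq would emit. This imitates `.bar | split(" ") | .[]`.
    // Assumption: a doc without a string `bar` emits nothing; real jq would
    // raise an error, and this commit does not say how errors are handled.
    const barWords = (doc) =>
      typeof doc.bar === "string" ? doc.bar.split(" ") : [];

    // Plain string comparison, not CouchDB's collation rules.
    const cmp = (a, b) => (a < b ? -1 : a > b ? 1 : 0);

    // Run the program against every doc, store one [key, docId] row per
    // result, and keep the rows sorted by key, then by doc id.
    function buildIndex(docs, program) {
      const rows = [];
      for (const [id, doc] of Object.entries(docs)) {
        for (const key of program(doc)) rows.push([key, id]);
      }
      return rows.sort((a, b) => cmp(a[0], b[0]) || cmp(a[1], b[1]));
    }

    // Return the ids of all docs that emitted `key`.
    function lookup(index, key) {
      return index.filter(([k]) => k === key).map(([, id]) => id);
    }

    const docs = {
      doc1: { bar: "a b c" },
      doc2: { bar: "b d" },
      doc3: { foo: 1 },
    };
    const index = buildIndex(docs, barWords);
    console.log(lookup(index, "b")); // [ 'doc1', 'doc2' ]
    ```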
    jcoglan committed Jan 25, 2022
    4c27463
  4. Allow multiple jq expressions in index definitions

    If we have a document like:
    
        {
          "foo": "a b",
          "bar": "x y"
        }
    
    And an index definition like:
    
        [
          { "foo_words": { "$jq": ".foo | split(\" \") | .[]" } },
          { "bar_words": { "$jq": ".bar | split(\" \") | .[]" } }
        ]
    
    Then this should produce four index keys for the document:
    
        ("a", "x")
        ("a", "y")
        ("b", "x")
        ("b", "y")
    
    This lets us query on multiple virtual fields in a single query. The
    implementation here allows jq expressions (which may return multiple
    values) to be mixed with normal field accesses that return a single
    value; `flatten_keys/1` returns the Cartesian product of any
    multi-valued index fields. In the example above, `foo_words` produces
    the values `["a", "b"]` and `bar_words` produces `["x", "y"]`, and
    multiplying these out gives the four keys listed.
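
    The multiplication described above is a Cartesian product over the
    per-field value lists. The JavaScript sketch below shows one way to
    compute it; `flattenKeys` is a hypothetical stand-in for the Erlang
    `flatten_keys/1`, not a port of it. A normal field contributes a
    one-element list, so its value simply repeats across every combination.

    ```javascript
    // Each entry of `fieldValues` is the list of values one index field
    // produced for a document: one value for a normal field, possibly
    // several for a jq expression. Returns every combination, in order.
    function flattenKeys(fieldValues) {
      return fieldValues.reduce(
        (keys, values) =>
          keys.flatMap((prefix) => values.map((value) => [...prefix, value])),
        [[]] // start from a single empty key
      );
    }

    // foo_words -> ["a", "b"], bar_words -> ["x", "y"]
    console.log(JSON.stringify(flattenKeys([["a", "b"], ["x", "y"]])));
    // [["a","x"],["a","y"],["b","x"],["b","y"]]

    // Mixing a jq field with a normal, single-valued field
    console.log(JSON.stringify(flattenKeys([["a", "b"], [42]])));
    // [["a",42],["b",42]]
    ```

    One consequence of taking a product, at least in this sketch: if any
    field yields no values, the document produces no keys at all.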
    jcoglan committed Jan 25, 2022
    1852c31
  5. Compile jq expressions once per index, not once per document

    The first implementation of building Mango indexes from jq expressions
    re-compiled the jq program for every document it processed. This is
    wasteful; we can instead do this once, when the index definitions are
    updated.
    
    When the index definition is loaded in, field descriptions of the form
    `{[{<<"$jq">>, Expr}]}` are converted to `{jq, CompiledExpr}`. This
    non-JSON term is used because the compiled jq program can be held in
    memory but not serialised back out into the design doc.
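
    The conversion can be pictured with the JavaScript sketch below.
    `compileJq` is a hypothetical stand-in for `couch_jq:compile/1` that
    only counts how often it is called, and the `["jq", Compiled]` pair
    plays the role of the Erlang `{jq, CompiledExpr}` term. The point is
    that compilation happens once per field when the definition is loaded,
    however many documents are indexed afterwards.

    ```javascript
    // Hypothetical stand-in for couch_jq:compile/1. The real function returns
    // an opaque NIF resource; here we return a tagged object and count calls.
    let compileCalls = 0;
    function compileJq(expr) {
      compileCalls += 1;
      return { compiledFrom: expr };
    }

    // Replace each { "$jq": Expr } field description with an in-memory
    // ["jq", Compiled] pair; other field descriptions pass through unchanged.
    function loadIndexFields(fields) {
      return fields.map((field) => {
        if (field === null || typeof field !== "object") return field;
        const [name, def] = Object.entries(field)[0];
        if (def !== null && typeof def === "object" && "$jq" in def) {
          return { [name]: ["jq", compileJq(def.$jq)] };
        }
        return field;
      });
    }

    const definition = [
      { foo_words: { $jq: '.foo | split(" ") | .[]' } },
      { bar_words: { $jq: '.bar | split(" ") | .[]' } },
    ];
    const loaded = loadIndexFields(definition);

    // Indexing any number of documents reuses `loaded` without recompiling.
    console.log(compileCalls); // 2
    ```

    The sketch returns new objects rather than mutating `definition`, which
    mirrors the reason given above: the source expression stays available
    for the design doc while the compiled form lives only in memory.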
    jcoglan committed Jan 25, 2022
    c311f83
  6. Allow the /{db}/_index endpoint to accept jq definitions

    We currently have to create jq-based indexes by writing out the whole
    design document containing the Mango index definition, because the
    `/{db}/_index` endpoint does not accept anything except "asc" or "desc"
    next to field names in the sort syntax.
    
    Here we expand this to allow { "$jq": "..." } as well, letting us create
    jq-based indexes without knowing the internal document format.
    
    Even though internally we support mixing jq expressions with normal
    field lookups in multi-field indexes, for now I'm restricting this so
    that multi-field indexes have to either all use the same sort direction,
    or all be jq expressions.
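
    The restriction on multi-field indexes can be expressed as a small check
    over the `fields` array. The JavaScript sketch below is hypothetical
    (`classifyField` and `validateFields` are not names from this patch); it
    assumes each entry is either a bare field name, which Mango sorts
    ascending, an object mapping one name to "asc" or "desc", or an object
    mapping one name to `{ "$jq": "..." }`.

    ```javascript
    // Classify one entry of the `fields` array as "asc", "desc" or "jq".
    function classifyField(field) {
      if (typeof field === "string") return "asc"; // bare names sort ascending
      const entries = Object.entries(field ?? {});
      if (entries.length !== 1) {
        throw new Error("each field must name exactly one key");
      }
      const value = entries[0][1];
      if (value === "asc" || value === "desc") return value;
      if (value !== null && typeof value === "object" && typeof value.$jq === "string") {
        return "jq";
      }
      throw new Error("unsupported field value: " + JSON.stringify(value));
    }

    // Accept all "asc", all "desc", or all jq expressions; reject any mix.
    function validateFields(fields) {
      if (fields.length === 0) throw new Error("fields must not be empty");
      const kinds = new Set(fields.map(classifyField));
      if (kinds.size > 1) {
        throw new Error("fields must share one sort direction or all be $jq");
      }
      return [...kinds][0];
    }

    console.log(validateFields(["name", { age: "asc" }])); // asc
    console.log(validateFields([{ w: { $jq: '.bar | split(" ") | .[]' } }])); // jq
    try {
      validateFields([{ age: "asc" }, { w: { $jq: ".bar" } }]);
    } catch (err) {
      console.log(err.message); // fields must share one sort direction or all be $jq
    }
    ```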
    jcoglan committed Jan 25, 2022
    f5c661e