feat(`mango`): extend `_explain` with candidate indexes and selector hints #4662

pgj · 2023-07-03T14:40:02Z

Execution of _find queries is often considered opaque by the users. They do not have enough insights about which index was selected and why, which can make hard to compose better queries. To address these problems, the change proposes to extend the /{db}/_explain endpoint with the following:

index_candidates: This attribute contains the list of all the other indexes that were not chosen for serving the query. The items in the list tell whether the given index is usable, what the reason was for its exclusion, and how "far" it is from being selected.
selector_hints: This attribute shows which fields are indexable and unindexable in the selector per each (available) index kind.

Warning:

The covered attribute in the _explain output has been changed to covering so that it now matches the attribute name used in the analysis.
The _explain will return HTTP 200 even if there is no usable index. This makes it different from how _find would respond to the same set of parameters.

Samples, index_candidates:

  "index_candidates": [
    {
      "index": {
        "ddoc": null,
        "name": "_all_docs",
        "type": "special",
        "def": {
          "fields": [
            {
              "_id": "asc"
            }
          ]
        }
      },
     "analysis": {
        "usable": false,
        "reasons": [
          {
            "name": "unfavored_type"
          }
        ],
        "ranking": 1,
        "covering": null
     }
    },
    {
      "index": {
        "ddoc": "_design/age",
        "name": "age",
        "type": "json",
        "partitioned": false,
        "def": {
          "fields": [
            {
              "age": "asc"
            }
          ]
        }
      },
      "analysis": {
        "usable": false,
        "reasons": [
          {
            "name": "field_mismatch"
          }
        ],
        "ranking": 2,
        "covering": false
      }
    }
  ]

and selector_hints:

  "selector_hints": [
    {
      "type": "json",
      "indexable_fields": [],
      "unindexable_fields": [
        "name.first"
      ]
    },
    {
      "type": "text",
      "indexable_fields": [
        "name.first"
      ],
      "unindexable_fields": []
    },
    {
      "type": "nouveau",
      "indexable_fields": [
        "name.first"
      ],
      "unindexable_fields": []
    }
  ]

Based on this information, Fauxton could display more useful hints and tips for the users on building the queries.

As an extra, the unit test coverage is increased throughout the Mango modules. Brief stats (excerpt from make eunit apps=mango):

number of tests: 173 ⟶ 195
mango_cursor: 19% ⟶ 86%
mango_cursor_view: 99% ⟶ 100%
mango_cursor_special: 0% ⟶ 100%
mango_idx: 20% ⟶ 54%
mango_idx_special: 0% ⟶ 48%
mango_idx_text: 63% ⟶ 65%
mango_idx_view: 40% ⟶ 52%
total: 47% ⟶ 55%

Checklist for the review

Code is written and works correctly
Changes are covered by tests
Documentation changes were made in the src/docs folder

willholley

I added a few notes on the documentation side, though suggest somebody more experienced comments on the Erlang. Broadly, this looks like a really useful enhancement!

src/docs/src/api/database/find.rst

willholley · 2023-07-13T10:42:54Z

src/docs/src/api/database/find.rst

+- ``"text"``: The selector must contain references to fields that are
+ indexable. Sometimes it cannot be pre-determined if the given field
+ would exist in the text index, hence it will not be supported.


I'm not entirely clear what "fields that are indexable" means here.

Can you provide an example of a situation where we cannot determine whether a field will exist in a text index?

Suggested change

- ``"text"``: The selector must contain references to fields that are

indexable. Sometimes it cannot be pre-determined if the given field

would exist in the text index, hence it will not be supported.

- ``"text"``: The index must contain fields that are referenced by the query ``"selector"`` or ``"sort"``. Sometimes it cannot be predetermined that a field will exist in a text index, in which case the text index will not be supported.

I agree that using the concept "indexable" does not help here. The corresponding code has a relatively complicated logic to determine whether a field in the selector is indexable, I might be failing to translate it properly.

Mango queries first translated to some abstract intermediate data structure which is then translated further to its counterpart in Lucene. These are the specific Erlang function alternatives where a field is excluded, with the respective comments to explain (from mango_idx_text):

% forces "$exists" : false to use _all_docs indexable_fields(_, {op_not, {_, false}}) -> []; % fieldname.[]:length is not a user[-]defined field. indexable_fields(Fields, {op_field, {[_, <<":length">>], _}}) -> Fields; % ... % In this particular case, the [L]ucene index is doing a field_exists query % so it is looking at all sorts of combinations of field:* and field.* % We don't add the field because we cannot predetermine what field will exist. % Hence we just return Fields and make it less restrictive. indexable_fields(Fields, {op_fieldname, {_, _}}) -> Fields; % Similar idea to op_fieldname but with fieldname:null indexable_fields(Fields, {op_null, {_, _}}) -> Fields;

For the record, here is the conclusion of the related off-line discussion with @willholley: remove the sentence "Sometimes it cannot be predetermined if the given field will exist in the text index, in which case the text index will not be supported" because this is not proven that I can happen in practice.

src/docs/src/api/database/find.rst

janl · 2023-08-01T11:14:50Z

The covered attribute in the _explain output has been changed to covering so that it now matches the attribute name used in the analysis.

Just checking: have we made a release since the covered property was introduced? If not, then renaming is not controversial, if yes, then this is strictly an API change and would at least require a lazy consensus vote on dev@.

pgj · 2023-08-01T11:27:36Z

I do not think that there was a CouchDB release with this property. It was introduced to main on April 19 and since then only patch-level releases were made for the 3.2 and 3.3 branches for which the corresponding feature was not backported.

nickva

I just have a few cursory comments from first glance over the code. I am not as familiar with mango so most of it is Erlang style nits and such.

Good point @janl about cover/covering API compatibility. I double-checked version 3.3.2 as well to confirm we didn't release the previous version yet so we should be clear there.

src/mango/src/mango_idx.erl

src/mango/src/mango_cursor.erl

src/mango/src/mango_cursor.hrl

src/mango/src/mango.hrl

src/mango/src/mango_cursor.erl

src/mango/src/mango_idx.hrl

src/mango/src/mango_idx_special.erl

src/mango/src/mango_idx.erl

nickva

+1 outstanding work, well done, @pgj!

Record the state of index candidates in the different stages of the selection procedure in the cursor so it could be pulled out to provide a detailed analysis. The associated tracing logic tries to avoid mimicking the related parts of the algorithm but bluntly perform naive set operations to infer the changes. This rule is only broken at the last stage for `view` indexes, where that is the viable way. In order to provide useful answer even when there is no usable index at all, `_explain` is modified to return HTTP 200, in contrast to `_find` which would give a HTTP 400. Also, the name of the `covered` field is adjusted to match with that of the attributes in the `analysis` field of the candidates. Selector hints utilize the `indexable_fields/1` functions from each index module, which requires them to be exported and taught about edge cases such as empty selectors. Document the index selection in details to make more sense of the enhanced output of `_explain`.

willholley

+1. Great job @pgj

nickva · 2023-09-01T18:34:34Z

I noticed two failures in the full CI failure, both on ppc64le (Power) platform:

[2023-09-01T17:10:20.217Z] module 'mango_cursor'
[2023-09-01T17:10:24.962Z]   mango_cursor:561: create_test_ (t_create_regular)...*timed out*
[2023-09-01T17:10:24.962Z] in function proc_lib:sync_start/2 (proc_lib.erl, line 293)
[2023-09-01T17:10:24.962Z] in call from meck_proc:set_expect/2 (src/meck_proc.erl, line 110)
[2023-09-01T17:10:24.962Z] in call from meck:expect/4 (src/meck.erl, line 272)
[2023-09-01T17:10:24.962Z] in call from mango_cursor:t_create_regular/1 (src/mango_cursor.erl, line 587)
[2023-09-01T17:10:24.962Z] in call from eunit_test:run_testfun/1 (eunit_test.erl, line 71)
[2023-09-01T17:10:24.962Z] in call from eunit_proc:run_test/1 (eunit_proc.erl, line 531)
[2023-09-01T17:10:24.963Z] in call from eunit_proc:with_timeout/3 (eunit_proc.erl, line 356)
[2023-09-01T17:10:24.963Z] in call from eunit_proc:handle_test/2 (eunit_proc.erl, line 514)
[2023-09-01T17:10:24.963Z]   undefined
[2023-09-01T17:10:34.455Z] mango_cursor:562: create_test_ (t_create_user_specified_index)...[4.819 s] ok
[2023-09-01T17:10:43.255Z] mango_cursor:563: create_test_ (t_create_invalid_user_specified_index)...[4.833 s] ok

It looks like the surrounding tests are close to 5 seconds and the Power runner is usually the slowest runner. So it maybe that we need a longer timeout for those tests.

Tried to increase the timeouts a bit: #4745

pgj force-pushed the feat/explain-candidate-indexes branch 7 times, most recently from 32f8c5f to bc88d24 Compare July 6, 2023 15:02

pgj force-pushed the feat/explain-candidate-indexes branch from bc88d24 to ca32f47 Compare July 12, 2023 21:54

pgj marked this pull request as ready for review July 12, 2023 22:36

willholley reviewed Jul 13, 2023

View reviewed changes

pgj force-pushed the feat/explain-candidate-indexes branch from ca32f47 to 73e9301 Compare July 13, 2023 19:55

pgj force-pushed the feat/explain-candidate-indexes branch 3 times, most recently from 610a6c5 to 9e11baf Compare July 27, 2023 19:48

pgj force-pushed the feat/explain-candidate-indexes branch 2 times, most recently from a02d371 to 1eb04f7 Compare August 1, 2023 10:28

pgj mentioned this pull request Aug 1, 2023

feat(mango): strict index selection #4710

Draft

4 tasks

pgj requested a review from nickva August 24, 2023 20:57

nickva reviewed Aug 25, 2023

View reviewed changes

src/mango/src/mango_idx.erl Outdated Show resolved Hide resolved

pgj force-pushed the feat/explain-candidate-indexes branch 2 times, most recently from 81d7745 to bd29290 Compare August 27, 2023 00:00

nickva approved these changes Sep 1, 2023

View reviewed changes

pgj force-pushed the feat/explain-candidate-indexes branch 2 times, most recently from d9996b4 to 5b7f0d5 Compare September 1, 2023 15:26

pgj requested a review from willholley September 1, 2023 16:25

willholley approved these changes Sep 1, 2023

View reviewed changes

pgj merged commit 16bf9df into apache:main Sep 1, 2023
15 checks passed

pgj deleted the feat/explain-candidate-indexes branch September 1, 2023 16:40

nickva mentioned this pull request Sep 1, 2023

Increase timeout in mango_cursor tests #4745

Merged

mharrigan26 mentioned this pull request Nov 14, 2023

Next Fauxton Release v1.3.0 apache/couchdb-fauxton#1420

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(`mango`): extend `_explain` with candidate indexes and selector hints #4662

feat(`mango`): extend `_explain` with candidate indexes and selector hints #4662

pgj commented Jul 3, 2023 •

edited

Loading

willholley left a comment

willholley Jul 13, 2023

pgj Jul 13, 2023

pgj Sep 1, 2023

janl commented Aug 1, 2023

pgj commented Aug 1, 2023

nickva left a comment

nickva left a comment

willholley left a comment

nickva commented Sep 1, 2023 •

edited

Loading

feat(mango): extend _explain with candidate indexes and selector hints #4662

feat(mango): extend _explain with candidate indexes and selector hints #4662

Conversation

pgj commented Jul 3, 2023 • edited Loading

Checklist for the review

willholley left a comment

Choose a reason for hiding this comment

willholley Jul 13, 2023

Choose a reason for hiding this comment

pgj Jul 13, 2023

Choose a reason for hiding this comment

pgj Sep 1, 2023

Choose a reason for hiding this comment

janl commented Aug 1, 2023

pgj commented Aug 1, 2023

nickva left a comment

Choose a reason for hiding this comment

nickva left a comment

Choose a reason for hiding this comment

willholley left a comment

Choose a reason for hiding this comment

nickva commented Sep 1, 2023 • edited Loading

feat(`mango`): extend `_explain` with candidate indexes and selector hints #4662

feat(`mango`): extend `_explain` with candidate indexes and selector hints #4662

pgj commented Jul 3, 2023 •

edited

Loading

nickva commented Sep 1, 2023 •

edited

Loading