Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(mango): extend _explain with candidate indexes and selector hints #4662

Merged
merged 1 commit into from
Sep 1, 2023

Conversation

pgj
Copy link
Contributor

@pgj pgj commented Jul 3, 2023

Execution of _find queries is often considered opaque by the users. They do not have enough insights about which index was selected and why, which can make hard to compose better queries. To address these problems, the change proposes to extend the /{db}/_explain endpoint with the following:

  • index_candidates: This attribute contains the list of all the other indexes that were not chosen for serving the query. The items in the list tell whether the given index is usable, what the reason was for its exclusion, and how "far" it is from being selected.
  • selector_hints: This attribute shows which fields are indexable and unindexable in the selector per each (available) index kind.

Warning:

  • The covered attribute in the _explain output has been changed to covering so that it now matches the attribute name used in the analysis.
  • The _explain will return HTTP 200 even if there is no usable index. This makes it different from how _find would respond to the same set of parameters.

Samples, index_candidates:

  "index_candidates": [
    {
      "index": {
        "ddoc": null,
        "name": "_all_docs",
        "type": "special",
        "def": {
          "fields": [
            {
              "_id": "asc"
            }
          ]
        }
      },
     "analysis": {
        "usable": false,
        "reasons": [
          {
            "name": "unfavored_type"
          }
        ],
        "ranking": 1,
        "covering": null
     }
    },
    {
      "index": {
        "ddoc": "_design/age",
        "name": "age",
        "type": "json",
        "partitioned": false,
        "def": {
          "fields": [
            {
              "age": "asc"
            }
          ]
        }
      },
      "analysis": {
        "usable": false,
        "reasons": [
          {
            "name": "field_mismatch"
          }
        ],
        "ranking": 2,
        "covering": false
      }
    }
  ]

and selector_hints:

  "selector_hints": [
    {
      "type": "json",
      "indexable_fields": [],
      "unindexable_fields": [
        "name.first"
      ]
    },
    {
      "type": "text",
      "indexable_fields": [
        "name.first"
      ],
      "unindexable_fields": []
    },
    {
      "type": "nouveau",
      "indexable_fields": [
        "name.first"
      ],
      "unindexable_fields": []
    }
  ]

Based on this information, Fauxton could display more useful hints and tips for the users on building the queries.

As an extra, the unit test coverage is increased throughout the Mango modules. Brief stats (excerpt from make eunit apps=mango):

  • number of tests: 173 ⟶ 195
  • mango_cursor: 19% ⟶ 86%
  • mango_cursor_view: 99% ⟶ 100%
  • mango_cursor_special: 0% ⟶ 100%
  • mango_idx: 20% ⟶ 54%
  • mango_idx_special: 0% ⟶ 48%
  • mango_idx_text: 63% ⟶ 65%
  • mango_idx_view: 40% ⟶ 52%
  • total: 47% ⟶ 55%

Checklist for the review

  • Code is written and works correctly
  • Changes are covered by tests
  • Documentation changes were made in the src/docs folder

@pgj pgj force-pushed the feat/explain-candidate-indexes branch 7 times, most recently from 32f8c5f to bc88d24 Compare July 6, 2023 15:02
@pgj pgj force-pushed the feat/explain-candidate-indexes branch from bc88d24 to ca32f47 Compare July 12, 2023 21:54
@pgj pgj marked this pull request as ready for review July 12, 2023 22:36
Copy link
Member

@willholley willholley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a few notes on the documentation side, though suggest somebody more experienced comments on the Erlang. Broadly, this looks like a really useful enhancement!

src/docs/src/api/database/find.rst Outdated Show resolved Hide resolved
src/docs/src/api/database/find.rst Outdated Show resolved Hide resolved
src/docs/src/api/database/find.rst Outdated Show resolved Hide resolved
src/docs/src/api/database/find.rst Outdated Show resolved Hide resolved
src/docs/src/api/database/find.rst Outdated Show resolved Hide resolved
Comment on lines 1606 to 1608
- ``"text"``: The selector must contain references to fields that are
indexable. Sometimes it cannot be pre-determined if the given field
would exist in the text index, hence it will not be supported.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely clear what "fields that are indexable" means here.

Can you provide an example of a situation where we cannot determine whether a field will exist in a text index?

Suggested change
- ``"text"``: The selector must contain references to fields that are
indexable. Sometimes it cannot be pre-determined if the given field
would exist in the text index, hence it will not be supported.
- ``"text"``: The index must contain fields that are referenced by the query ``"selector"`` or ``"sort"``. Sometimes it cannot be predetermined that a field will exist in a text index, in which case the text index will not be supported.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that using the concept "indexable" does not help here. The corresponding code has a relatively complicated logic to determine whether a field in the selector is indexable, I might be failing to translate it properly.

Mango queries first translated to some abstract intermediate data structure which is then translated further to its counterpart in Lucene. These are the specific Erlang function alternatives where a field is excluded, with the respective comments to explain (from mango_idx_text):

% forces "$exists" : false to use _all_docs
indexable_fields(_, {op_not, {_, false}}) ->
    [];
% fieldname.[]:length is not a user[-]defined field.
indexable_fields(Fields, {op_field, {[_, <<":length">>], _}}) ->
    Fields;
% ...
% In this particular case, the [L]ucene index is doing a field_exists query
% so it is looking at all sorts of combinations of field:* and field.*
% We don't add the field because we cannot predetermine what field will exist.
% Hence we just return Fields and make it less restrictive.
indexable_fields(Fields, {op_fieldname, {_, _}}) ->
    Fields;
% Similar idea to op_fieldname but with fieldname:null
indexable_fields(Fields, {op_null, {_, _}}) ->
    Fields;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, here is the conclusion of the related off-line discussion with @willholley: remove the sentence "Sometimes it cannot be predetermined if the given field will exist in the text index, in which case the text index will not be supported" because this is not proven that I can happen in practice.

src/docs/src/api/database/find.rst Outdated Show resolved Hide resolved
src/docs/src/api/database/find.rst Show resolved Hide resolved
@pgj pgj force-pushed the feat/explain-candidate-indexes branch from ca32f47 to 73e9301 Compare July 13, 2023 19:55
@pgj pgj force-pushed the feat/explain-candidate-indexes branch 3 times, most recently from 610a6c5 to 9e11baf Compare July 27, 2023 19:48
@pgj pgj force-pushed the feat/explain-candidate-indexes branch 2 times, most recently from a02d371 to 1eb04f7 Compare August 1, 2023 10:28
@janl
Copy link
Member

janl commented Aug 1, 2023

The covered attribute in the _explain output has been changed to covering so that it now matches the attribute name used in the analysis.

Just checking: have we made a release since the covered property was introduced? If not, then renaming is not controversial, if yes, then this is strictly an API change and would at least require a lazy consensus vote on dev@.

@pgj
Copy link
Contributor Author

pgj commented Aug 1, 2023

I do not think that there was a CouchDB release with this property. It was introduced to main on April 19 and since then only patch-level releases were made for the 3.2 and 3.3 branches for which the corresponding feature was not backported.

@pgj pgj mentioned this pull request Aug 1, 2023
4 tasks
@pgj pgj requested a review from nickva August 24, 2023 20:57
Copy link
Contributor

@nickva nickva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just have a few cursory comments from first glance over the code. I am not as familiar with mango so most of it is Erlang style nits and such.

Good point @janl about cover/covering API compatibility. I double-checked version 3.3.2 as well to confirm we didn't release the previous version yet so we should be clear there.

src/mango/src/mango_idx.erl Outdated Show resolved Hide resolved
src/mango/src/mango_cursor.erl Outdated Show resolved Hide resolved
src/mango/src/mango_cursor.erl Outdated Show resolved Hide resolved
src/mango/src/mango_cursor.hrl Outdated Show resolved Hide resolved
src/mango/src/mango.hrl Outdated Show resolved Hide resolved
src/mango/src/mango_cursor.erl Outdated Show resolved Hide resolved
src/mango/src/mango_cursor.erl Outdated Show resolved Hide resolved
src/mango/src/mango_cursor.erl Outdated Show resolved Hide resolved
src/mango/src/mango_idx.hrl Outdated Show resolved Hide resolved
src/mango/src/mango_idx_special.erl Show resolved Hide resolved
src/mango/src/mango_idx.erl Outdated Show resolved Hide resolved
@pgj pgj force-pushed the feat/explain-candidate-indexes branch 2 times, most recently from 81d7745 to bd29290 Compare August 27, 2023 00:00
Copy link
Contributor

@nickva nickva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 outstanding work, well done, @pgj!

Record the state of index candidates in the different stages of
the selection procedure in the cursor so it could be pulled out
to provide a detailed analysis.  The associated tracing logic
tries to avoid mimicking the related parts of the algorithm but
bluntly perform naive set operations to infer the changes.  This
rule is only broken at the last stage for `view` indexes, where
that is the viable way.

In order to provide useful answer even when there is no usable
index at all, `_explain` is modified to return HTTP 200, in
contrast to `_find` which would give a HTTP 400.  Also, the
name of the `covered` field is adjusted to match with that of
the attributes in the `analysis` field of the candidates.

Selector hints utilize the `indexable_fields/1` functions from
each index module, which requires them to be exported and taught
about edge cases such as empty selectors.

Document the index selection in details to make more sense of the
enhanced output of `_explain`.
@pgj pgj force-pushed the feat/explain-candidate-indexes branch 2 times, most recently from d9996b4 to 5b7f0d5 Compare September 1, 2023 15:26
@pgj pgj requested a review from willholley September 1, 2023 16:25
Copy link
Member

@willholley willholley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Great job @pgj

@pgj pgj merged commit 16bf9df into apache:main Sep 1, 2023
15 checks passed
@pgj pgj deleted the feat/explain-candidate-indexes branch September 1, 2023 16:40
@nickva
Copy link
Contributor

nickva commented Sep 1, 2023

I noticed two failures in the full CI failure, both on ppc64le (Power) platform:

[2023-09-01T17:10:20.217Z] module 'mango_cursor'
[2023-09-01T17:10:24.962Z]   mango_cursor:561: create_test_ (t_create_regular)...*timed out*
[2023-09-01T17:10:24.962Z] in function proc_lib:sync_start/2 (proc_lib.erl, line 293)
[2023-09-01T17:10:24.962Z] in call from meck_proc:set_expect/2 (src/meck_proc.erl, line 110)
[2023-09-01T17:10:24.962Z] in call from meck:expect/4 (src/meck.erl, line 272)
[2023-09-01T17:10:24.962Z] in call from mango_cursor:t_create_regular/1 (src/mango_cursor.erl, line 587)
[2023-09-01T17:10:24.962Z] in call from eunit_test:run_testfun/1 (eunit_test.erl, line 71)
[2023-09-01T17:10:24.962Z] in call from eunit_proc:run_test/1 (eunit_proc.erl, line 531)
[2023-09-01T17:10:24.963Z] in call from eunit_proc:with_timeout/3 (eunit_proc.erl, line 356)
[2023-09-01T17:10:24.963Z] in call from eunit_proc:handle_test/2 (eunit_proc.erl, line 514)
[2023-09-01T17:10:24.963Z]   undefined
[2023-09-01T17:10:34.455Z] mango_cursor:562: create_test_ (t_create_user_specified_index)...[4.819 s] ok
[2023-09-01T17:10:43.255Z] mango_cursor:563: create_test_ (t_create_invalid_user_specified_index)...[4.833 s] ok

It looks like the surrounding tests are close to 5 seconds and the Power runner is usually the slowest runner. So it maybe that we need a longer timeout for those tests.

Tried to increase the timeouts a bit: #4745

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants