-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(mango
): extend _explain
with candidate indexes and selector hints
#4662
Conversation
32f8c5f
to
bc88d24
Compare
bc88d24
to
ca32f47
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a few notes on the documentation side, though suggest somebody more experienced comments on the Erlang. Broadly, this looks like a really useful enhancement!
src/docs/src/api/database/find.rst
Outdated
- ``"text"``: The selector must contain references to fields that are | ||
indexable. Sometimes it cannot be pre-determined if the given field | ||
would exist in the text index, hence it will not be supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely clear what "fields that are indexable" means here.
Can you provide an example of a situation where we cannot determine whether a field will exist in a text index?
- ``"text"``: The selector must contain references to fields that are | |
indexable. Sometimes it cannot be pre-determined if the given field | |
would exist in the text index, hence it will not be supported. | |
- ``"text"``: The index must contain fields that are referenced by the query ``"selector"`` or ``"sort"``. Sometimes it cannot be predetermined that a field will exist in a text index, in which case the text index will not be supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that using the concept "indexable" does not help here. The corresponding code has a relatively complicated logic to determine whether a field in the selector is indexable, I might be failing to translate it properly.
Mango queries first translated to some abstract intermediate data structure which is then translated further to its counterpart in Lucene. These are the specific Erlang function alternatives where a field is excluded, with the respective comments to explain (from mango_idx_text
):
% forces "$exists" : false to use _all_docs
indexable_fields(_, {op_not, {_, false}}) ->
[];
% fieldname.[]:length is not a user[-]defined field.
indexable_fields(Fields, {op_field, {[_, <<":length">>], _}}) ->
Fields;
% ...
% In this particular case, the [L]ucene index is doing a field_exists query
% so it is looking at all sorts of combinations of field:* and field.*
% We don't add the field because we cannot predetermine what field will exist.
% Hence we just return Fields and make it less restrictive.
indexable_fields(Fields, {op_fieldname, {_, _}}) ->
Fields;
% Similar idea to op_fieldname but with fieldname:null
indexable_fields(Fields, {op_null, {_, _}}) ->
Fields;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record, here is the conclusion of the related off-line discussion with @willholley: remove the sentence "Sometimes it cannot be predetermined if the given field will exist in the text index, in which case the text index will not be supported" because this is not proven that I can happen in practice.
ca32f47
to
73e9301
Compare
610a6c5
to
9e11baf
Compare
a02d371
to
1eb04f7
Compare
Just checking: have we made a release since the |
I do not think that there was a CouchDB release with this property. It was introduced to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just have a few cursory comments from first glance over the code. I am not as familiar with mango so most of it is Erlang style nits and such.
Good point @janl about cover/covering API compatibility. I double-checked version 3.3.2 as well to confirm we didn't release the previous version yet so we should be clear there.
81d7745
to
bd29290
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 outstanding work, well done, @pgj!
Record the state of index candidates in the different stages of the selection procedure in the cursor so it could be pulled out to provide a detailed analysis. The associated tracing logic tries to avoid mimicking the related parts of the algorithm but bluntly perform naive set operations to infer the changes. This rule is only broken at the last stage for `view` indexes, where that is the viable way. In order to provide useful answer even when there is no usable index at all, `_explain` is modified to return HTTP 200, in contrast to `_find` which would give a HTTP 400. Also, the name of the `covered` field is adjusted to match with that of the attributes in the `analysis` field of the candidates. Selector hints utilize the `indexable_fields/1` functions from each index module, which requires them to be exported and taught about edge cases such as empty selectors. Document the index selection in details to make more sense of the enhanced output of `_explain`.
d9996b4
to
5b7f0d5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. Great job @pgj
I noticed two failures in the full CI failure, both on ppc64le (Power) platform:
It looks like the surrounding tests are close to 5 seconds and the Power runner is usually the slowest runner. So it maybe that we need a longer timeout for those tests. Tried to increase the timeouts a bit: #4745 |
Execution of
_find
queries is often considered opaque by the users. They do not have enough insights about which index was selected and why, which can make hard to compose better queries. To address these problems, the change proposes to extend the/{db}/_explain
endpoint with the following:index_candidates
: This attribute contains the list of all the other indexes that were not chosen for serving the query. The items in the list tell whether the given index is usable, what the reason was for its exclusion, and how "far" it is from being selected.selector_hints
: This attribute shows which fields are indexable and unindexable in the selector per each (available) index kind.Warning:
covered
attribute in the_explain
output has been changed tocovering
so that it now matches the attribute name used in the analysis._explain
will return HTTP 200 even if there is no usable index. This makes it different from how_find
would respond to the same set of parameters.Samples,
index_candidates
:and
selector_hints
:Based on this information, Fauxton could display more useful hints and tips for the users on building the queries.
As an extra, the unit test coverage is increased throughout the Mango modules. Brief stats (excerpt from
make eunit apps=mango
):mango_cursor
: 19% ⟶ 86%mango_cursor_view
: 99% ⟶ 100%mango_cursor_special
: 0% ⟶ 100%mango_idx
: 20% ⟶ 54%mango_idx_special
: 0% ⟶ 48%mango_idx_text
: 63% ⟶ 65%mango_idx_view
: 40% ⟶ 52%Checklist for the review
src/docs
folder