Inconsistent execution statistics for Mango queries #4560

Closed
pgj opened this issue May 1, 2023 · 0 comments · Fixed by #4958
pgj commented May 1, 2023

When requested, the Mango execution statistics do not always reflect the actual values; most of the time they are lower or simply zero. This happens on the latest version of CouchDB (82aa1625 as of the time of writing), and the issue can be reproduced as follows. The commands below create a simple database called test with 25 documents that have a single field, a, for which an index is defined.

curl -sS -X PUT "$COUCHDB_URL"/test
for i in $(jot - 1 25); do \
  curl -sS -X POST "$COUCHDB_URL"/test -H "Content-Type: application/json" -d '{"a": '"$i"'}'; done
curl -sS -X POST -H "Content-Type: application/json" "$COUCHDB_URL"/test/_index \
  -d '{"index": {"fields": ["a"]}, "name": "a", "type": "json"}'
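
The loop above relies on jot, which is usually present on BSD and macOS but often missing on Linux. A minimal sketch of a portable variant, assuming seq is available as the fallback; the numbers and populate_test_db helpers are hypothetical names introduced here, and the curl call is the one from the commands above.

```shell
# Hypothetical helper: print the integers from $1 to $2, one per line,
# using jot when it is installed and seq otherwise.
numbers() {
  if command -v jot >/dev/null 2>&1; then
    jot - "$1" "$2"
  else
    seq "$1" "$2"
  fi
}

# Hypothetical wrapper around the original loop; it is only defined here,
# not called, because it needs a running CouchDB at $COUCHDB_URL.
populate_test_db() {
  for i in $(numbers 1 25); do
    curl -sS -X POST "$COUCHDB_URL"/test \
      -H "Content-Type: application/json" -d '{"a": '"$i"'}'
  done
}
```

Calling populate_test_db after the database has been created should insert the same 25 documents as the original loop.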

Then, with a selector like {"a": {"$lt": 20}}, a limit of 1, and execution statistics enabled, the total_docs_examined field becomes zero. The other _examined fields are either not implemented or not affected by queries like this. Note that results_returned is accounted for properly at the same time.

$ curl -sS -X POST -H "Content-Type: application/json" "$COUCHDB_URL"/test/_find \
  -d '{"execution_stats": true, "limit": 1, "selector": {"a": {"$lt": 20}}}' \
  | jq '.execution_stats'
{
  "total_keys_examined": 0,
  "total_docs_examined": 0,
  "total_quorum_docs_examined": 0,
  "results_returned": 1,
  "execution_time_ms": 1.067
}

Note that if either the limit is increased or the search criterion on the a field is changed to match fewer documents, more realistic data is returned.

$ curl -sS -X POST -H "Content-Type: application/json" "$COUCHDB_URL"/test/_find \
  -d '{"execution_stats": true, "limit": 20, "selector": {"a": {"$lt": 20}}}' \
  | jq '.execution_stats'
{
  "total_keys_examined": 0,
  "total_docs_examined": 20,
  "total_quorum_docs_examined": 0,
  "results_returned": 19,
  "execution_time_ms": 2.378
}

or

$ curl -sS -X POST -H "Content-Type: application/json" "$COUCHDB_URL"/test/_find \
  -d '{"execution_stats": true, "limit": 1, "selector": {"a": {"$lt": 2}}}' \
  | jq '.execution_stats'
{
  "total_keys_examined": 0,
  "total_docs_examined": 2,
  "total_quorum_docs_examined": 0,
  "results_returned": 1,
  "execution_time_ms": 0.845
}

After some debugging, the source of the issue has been identified as the emission of stop in mango_cursor_view:handle_doc/2 when the limit reaches zero (last clause).

-spec handle_doc(#cursor{}, doc()) -> Response when
    Response :: {ok, #cursor{}} | {stop, #cursor{}}.
handle_doc(#cursor{skip = S} = C, _) when S > 0 ->
    {ok, C#cursor{skip = S - 1}};
handle_doc(#cursor{limit = L, execution_stats = Stats} = C, Doc) when L > 0 ->
    UserFun = C#cursor.user_fun,
    UserAcc = C#cursor.user_acc,
    {Go, NewAcc} = UserFun({row, Doc}, UserAcc),
    {Go, C#cursor{
        user_acc = NewAcc,
        limit = L - 1,
        execution_stats = mango_execution_stats:incr_results_returned(Stats)
    }};
handle_doc(C, _Doc) ->
    {stop, C}.

The stop action immediately stops the processing of messages from the shards, including the shard-level statistics that are submitted in response to the complete message.

view_cb(complete, Acc) ->
    % Send shard-level execution stats
    ok = rexi:stream2({execution_stats, {docs_examined, get(mango_docs_examined)}}),
    % Finish view output
    ok = rexi:stream_last(complete),
    {ok, Acc};

That is why, when the limit is low, the coordinator stops before it has received and processed the shard-level statistics message. As a result, the execution statistics are either not counted at all or counted only for a subset of the shards.

I have not yet come up with a satisfying solution, but I am filing this ticket to raise awareness of the bug. Changing stop to ok in mango_cursor_view:handle_doc/2 makes the statistics consistent, but then the shards keep sending results that are already known to be discarded. This potentially affects performance, although the extent has not been measured.
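
The effect and the trade-off of the stop-to-ok change can be illustrated with a toy model. This is only a sketch under simplifying assumptions: there is a single shard, messages arrive strictly in order, and nothing talks to CouchDB. The shard_batch, coordinator and run_query functions are hypothetical; only the row and docs_examined message names are borrowed from the code above.

```shell
# Shard, modelled on view_cb/2: send every row first, then a single
# statistics message when the view is complete.
shard_batch() {
  i=1
  while [ "$i" -le "$1" ]; do
    echo "row $i"
    i=$((i + 1))
  done
  echo "docs_examined $1"
}

# Coordinator: $1 is the limit, $2 is the action taken once the limit is
# spent ("stop" quits reading, "ok" keeps reading and discards the rows).
coordinator() {
  limit=$1 action=$2 examined=0 returned=0 discarded=0
  while read -r kind value; do
    case $kind in
      docs_examined) examined=$((examined + value)) ;;
      row)
        if [ "$limit" -gt 0 ]; then
          limit=$((limit - 1)); returned=$((returned + 1))
        elif [ "$action" = stop ]; then
          break   # anything still in flight, stats included, is dropped
        else
          discarded=$((discarded + 1))
        fi ;;
    esac
  done
  echo "docs_examined=$examined results_returned=$returned discarded=$discarded"
}

# Run one query: $1 documents on the shard, limit $2, action $3.
run_query() {
  msgs=$(shard_batch "$1")
  coordinator "$2" "$3" <<EOF
$msgs
EOF
}

run_query 3 1 stop   # docs_examined=0 results_returned=1 discarded=0
run_query 3 1 ok     # docs_examined=3 results_returned=1 discarded=2
```

With stop, the model reproduces the zero docs_examined seen in the first query; with ok, the count is complete, but two rows cross the wire only to be thrown away.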

pgj added a commit to pgj/couchdb that referenced this issue Aug 22, 2023
In case of map-reduce views, the arrival of the `complete` message
is not guaranteed for the view callback (at the shard) when a
`stop` is issued during the aggregation (at the coordinator).  Due
to that, internally collected shard-level statistics may not be
fed back to the coordinator which can cause data loss hence
inaccuracy in the overall execution statistics.

Address this issue by switching to a "rolling" model where
row-level statistics are immediately streamed back to the
coordinator.  Support mixed-version cluster upgrades by activating
this model only if requested through the map-reduce arguments and
the given shard supports that.

Fixes apache#4560
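
The "rolling" model described in this commit message can be sketched with a toy single-shard simulation. This is not the code from the commit: the shard, coordinator and run_query functions and their yes/no flags are hypothetical, and the model only assumes that a shard streams one docs_examined message per row when the coordinator requested it and the shard supports it, and otherwise reports once at the end.

```shell
# Shard: $1 documents, $2 = "yes" if rolling statistics were requested,
# $3 = "yes" if this shard's version supports them.
shard() {
  i=1
  while [ "$i" -le "$1" ]; do
    if [ "$2" = yes ] && [ "$3" = yes ]; then
      echo "docs_examined 1"          # rolling: report before each row
    fi
    echo "row $i"
    i=$((i + 1))
  done
  if [ "$2" != yes ] || [ "$3" != yes ]; then
    echo "docs_examined $1"           # old behaviour: report once at the end
  fi
}

# Coordinator: $1 is the limit; it stops reading at the first row that
# arrives after the limit is spent.
coordinator() {
  limit=$1 examined=0 returned=0
  while read -r kind value; do
    case $kind in
      docs_examined) examined=$((examined + value)) ;;
      row)
        [ "$limit" -gt 0 ] || break
        limit=$((limit - 1)); returned=$((returned + 1)) ;;
    esac
  done
  echo "docs_examined=$examined results_returned=$returned"
}

# Run one query: $1 documents, limit $2, requested $3, supported $4.
run_query() {
  msgs=$(shard "$1" "$3" "$4")
  coordinator "$2" <<EOF
$msgs
EOF
}

run_query 3 1 yes yes   # docs_examined=2 results_returned=1
run_query 3 1 yes no    # docs_examined=0 results_returned=1
```

In this model, the rolling shard's statistics survive an early stop because they arrive alongside the rows, while a shard that does not support the option behaves as before, which is the mixed-version case the message mentions.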
@pgj pgj self-assigned this Aug 24, 2023
pgj added a commit that referenced this issue Mar 27, 2024
In case of map-reduce views, the arrival of the `complete` message
is not guaranteed for the view callback (at the shard) when a
`stop` is issued during the aggregation (at the coordinator).  Due
to that, internally collected shard-level statistics may not be
fed back to the coordinator which can cause data loss hence
inaccuracy in the overall execution statistics.

Address this issue by switching to a "rolling" model where
row-level statistics are immediately streamed back to the
coordinator.  Support mixed-version cluster upgrades by activating
this model only if requested through the map-reduce arguments and
the given shard supports that.

Fixes #4560
@pgj pgj closed this as completed in #4958 Mar 27, 2024