adds operators $sizeLte and $sizeGte #3582

Closed
wants to merge 941 commits into from

Conversation

hklarner

Overview

  • supports inequality matches for $size
    • example: match documents with between 10 and 20 customers in an array
      • "doc.customers": {"$sizeGte": 10}
      • "doc.customers": {"$sizeLte": 20}

Testing recommendations

Tests are added to

  • src/mango/test/03-operator-test.py

Related Issues or Pull Requests

None

Checklist

jjrodrig and others added 30 commits September 11, 2020 09:07
We need to call StartFun as it might add headers, etc.
This fixes a94e693 because a race
condition existed where the 'DOWN' message could be received
before the compactor pid is spawned. Adding a synchronous call to
get the compactor pid guarantees that the couch_db_updater process
has finished handling finish_compaction.
This implementation was difficult to understand and had behavior that
was too difficult to predict. It would break if view behavior changed in
significant ways from what was originally expected.
The couch_views_batch module is responsible for sensing the largest
batch sizes that can be successfully processed for a given indexer
process. It works by initially searching for the maximum number of
documents that can be included in a batch. Once this threshold is found,
it slowly increases the batch size again and backs off whenever the
limit is hit.

This approach maximises batch sizes while staying reactive to the case where
a larger batch would cross the FoundationDB transaction limits, which would
cause the entire batch to be aborted and retried, wasting time during view
builds.
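
For illustration, the search/back-off loop described above boils down to something like the following sketch (module name, state shape, and increments are made up, not the actual couch_views_batch code):

```
%% Illustrative sketch: search quickly for the maximum batch size, then creep
%% upward slowly and halve whenever a batch hits the transaction limit again.
-module(batch_size_sketch).
-export([update/2]).

-define(SEARCH_INCREMENT, 100). % fast growth while searching for the limit
-define(CREEP_INCREMENT, 10).   % slow growth once the limit has been found

%% update({Mode, Size}, Outcome) -> {NextMode, NextSize}
update({search, Size}, success) -> {search, Size + ?SEARCH_INCREMENT};
update({search, Size}, failure) -> {creep, max(1, Size div 2)};
update({creep, Size}, success)  -> {creep, Size + ?CREEP_INCREMENT};
update({creep, Size}, failure)  -> {creep, max(1, Size div 2)}.
```
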
This test doesn't fail correctly any longer. Rather than attempting to
create a new pathological view case I've just moved it to eunit where we
can use meck to throw errors directly.
`after_db_create/2` and `after_db_delete/2` are called when databases are created and
deleted respectively. The callbacks are called with both the database name and
the database instance UUID values.
Previously the data was read from the parser inside the transaction. If the
transaction had to retry, for example because of a conflict, the parser would
have been drained and exited, resulting in the request failing with a 500
"mp_parser noproc" error.

Since FDB cannot handle transactions larger than 10MB, opt to read the
attachment data into memory first, before the transaction starts.
Also, ensure to use the same options as other couch apps: force_utf8 and
dedupe_keys.
fold_jobs/4 is a generalized folding API which can be used to process all the
jobs of a particular type.

get_pending_count/2,3 can be used to get the count of pending jobs. It takes
the same options as accept, including a limit and `max_sched_time`. Just like
accept it reads the range of pending jobs as a snapshot to avoid generating
read conflicts.
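
As a hypothetical usage sketch (the callback shape below is an assumption -- a fun receiving the transaction, job id, job state, job data, and accumulator -- so check couch_jobs itself for the real spec):

```
%% Assumed callback shape; illustrative only.
all_job_ids(Tx, Type) ->
    couch_jobs:fold_jobs(Tx, Type, fun(_JTx, JobId, _JobState, _Data, Acc) ->
        [JobId | Acc]
    end, []).
```
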
These modules are not used by the new replicator.
 * Remove unused functions and some functions used only in one place, like `sum_stats/2`.

 * Update time functions to use the more modern `erlang:system_time/1` API.

 * `parse_int_param/5` and `parse_replication_states/1` were moved from the old
    _httpd_util module as they were the only ones needed from there.

 * `default_headers_map/0` Used to get the default httpd record headers as a map,
   since part of the replication data will be kept as a map object.

 * `proplist_options/1` Some parts of the replicator, like _httpc and _api_wrap,
   still use proplist options, so this function can be used to translate
   option maps to a proplist version.
This module is responsible for parsing either an HTTP `_replicate` request
body or a _replicator doc into an internal `Rep` object (an Erlang map).

`parse_transient_rep/2` parses _replicate requests. It also handles
cancellations, where request bodies look like ```{"id": ..., "cancel": true}```
instead of having all the expected parameters.

`parse_rep_doc/1` parses _replicator docs.

Parsing consists of 3 main parts:

 - Parsing the endpoint definitions: source and target url, headers, TLS bits
   and proxies

 - Parsing options into an options map, possibly using defaults from config
   parameters

 - Parsing socket parameters. These now have a hard-coded allow-list, as opposed
   to accepting all possible Erlang socket options.
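
For illustration, an allow-list filter of that kind can be as simple as the sketch below (the option names here are examples, not the replicator's actual allow-list):

```
%% Illustrative only: drop any socket option whose key is not allow-listed.
-define(ALLOWED_SOCK_OPTS, [keepalive, nodelay, recbuf, sndbuf, priority]).

filter_sock_opts(Opts) ->
    [KV || {Key, _} = KV <- Opts, lists:member(Key, ?ALLOWED_SOCK_OPTS)].
```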

The parsing functions also double as validation functions, which get called from
the _replicator's before_doc_update callback when users update replication
documents. Users get immediate feedback if their replication document
is malformed.

Everything is turned into a map object. This object should be able to be
serialized to and de-serialized from JSON.

Since maps are used, add the definitions of some common fields to
couch_replicator.hrl. Mistyping them should raise a compiler error.
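
For illustration, such shared definitions typically look like the following (the field names are examples rather than the literal contents of couch_replicator.hrl):

```
%% Example field-name macros; a typo such as ?SOURCEE fails at compile time,
%% whereas a mistyped binary literal like <<"sourcee">> would only fail at
%% runtime.
-define(SOURCE, <<"source">>).
-define(TARGET, <<"target">>).
-define(OPTIONS, <<"options">>).
```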

couch_replicator_docs lost all of its parsing functions and also the functions which
update intermediate replication doc states (triggered and error). It still
contains the functions which relate to interacting with _replicator docs.
The goal is to keep everything below the _api_wrap module level relatively
intact. To achieve that, option maps are handled in some places, or translated
back to proplists or `#httpd{}` records in others.

The `couch_replicator_api:db_from_json/1` function is where endpoint map objects
from a `Rep` object are translated into `#httpdb{}` records. Headers are
translated back to lists and ibrowse options into proplists with atom keys.
This module is responsible for calculating replication IDs. It inspects all the
replication options which may affect the replication results and hashes them
into a single ID. The CouchDB replicator tries to maintain compatibility with
older versions of itself, so it keeps track of how to calculate the replication
IDs used by previous versions of CouchDB. Replication ID calculation algorithms
have their own version; the latest one is version 4.

One of the goals of this update is to not alter the replication ID algorithm
and keep it at version 4, such that for all the same parameters the replication
IDs should stay the same as they would be on CouchDB <= 3.x. That is why in
some internal functions, options maps and binaries are turned back into proplists
and tuples before hashing is performed. There is a unit test which asserts
that the replication ID calculated with this update matches what was calculated
in CouchDB 3.x.
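
The conversion matters because `term_to_binary/1` of a map is not byte-identical to that of the equivalent proplist, so hashing the map form would silently change every replication ID. A quick shell check (illustrative only; the replicator's actual hashing call may differ):

```
%% The two hashes differ, which is why options are converted back to the
%% 3.x-era proplist/tuple shapes before hashing.
Opts = [{<<"continuous">>, true}],
erlang:md5(term_to_binary(Opts)) =:= erlang:md5(term_to_binary(maps:from_list(Opts))).
%% => false
```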

The internal representation of the replication ID has changed slightly. Previously
it was represented by a tuple of `{BaseId, ExtId}`, where `BaseId` was the ID
without any options such as `continuous` or `create_target`, and `ExtId` was
the concatenated list of those options. In most cases it was useful to operate
on the full ID and only in a few places was the `BaseId` needed. So the
calculation function was updated to return `{RepId, BaseId}` instead. `RepId`
is a binary that is the full replication ID (base + extensions) and `BaseId` is
just the base.

The function which calculates the base ID was renamed from `replication_id/2`
to `base_id/2`.

Another update to the module is a function which calculates replication job
IDs. A `JobId` is used to identify replication jobs in the `couch_jobs` API. A
`JobId`, unlike a `RepId`, never requires making a network round-trip to
calculate. For replications created from `_replicator` docs, `JobId` is defined
as the concatenation of the database instance UUID and the document ID. For
transient jobs it is calculated by hashing the source and target endpoint
parameters and the replication options. In fact, it is almost the same as a
replication ID, with one important difference: the filter design doc name
and function name are used instead of the contents of the filter from the
source, so no network round-trip is necessary to calculate it.
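
A simplified sketch of the two `JobId` flavours (a hypothetical helper; the real logic is part of the module described above):

```
%% Illustrative only: _replicator doc jobs get "UUID ++ DocId", transient jobs
%% get a hash of endpoints and options (filter names only, no filter fetch).
job_id_sketch({db_doc, DbUUID, DocId}) ->
    <<DbUUID/binary, DocId/binary>>;
job_id_sketch({transient, Source, Target, Options}) ->
    couch_util:to_hex(erlang:md5(term_to_binary({Source, Target, Options}))).
```
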
This is the `couch_jobs` abstraction module. All replicator calls to
`couch_jobs` should go through it. This module takes care of adding types to
some of the API calls, handles maintenance of the RepId -> JobId mappings when
jobs are added and removed, and implements some subscription logic.

The `fabric2.hrl` include file is updated with the definition of the
`?REPLICATION_IDS` prefix, where the RepId -> JobId keyspace lives.
The frontend is the part responsible for parsing replication parameters and
creating replication jobs. Most of that happens in the `couch_replicator`
module.

 - `couch_replicator:replicate/2` is the main API for creating transient
   replication jobs.

 - Replication jobs from `_replicator` documents are updated from the
   `couch_replicator:after_*` callbacks. `after_db_create/2`, besides being
   called on db creation, also gets called when a database is undeleted, and
   the `add_jobs_from_db/1` function will attempt to parse all of its replicator docs.

`couch_replicator` exports the monitoring functions `docs/2,3` and `jobs/0,1`. Those
get called from the HTTP handlers for `_scheduler/docs` and `_scheduler/jobs`
respectively.

For hands-on remsh access there are some debugging functions such as:

 - rescan_jobs/0,1 : Simulates a db being re-created so all the jobs are added
 - reenqueue_jobs/0,1 : Deletes all the jobs from a db then re-adds them
 - remove_jobs/0 : Removes all the replication jobs
 - get_job_ids/0 : Read the RepId -> JobId mapping area
Backend replicator modules execute replication jobs. The two main modules
responsible for job management and execution are `couch_replicator_job` and
`couch_replicator_job_server`.

`couch_replicator_job`

 - Is the main process of each replication job. When this process starts, it
   waits in the `couch_jobs:accept/2` call. This may take an indefinite amount
   of time. A spawned `couch_replicator_job` waiting in accept like that is
   internally called an "acceptor". The main pattern of execution is that
   multiple acceptors are started, and after some of them accept jobs, they
   become "workers".

 - After it accepts a job, it parses the `couch_jobs` job data, which contains
   the `Rep` object, and calculates the replication ID from it. Replication ID
   calculation may involve making a call to the source endpoint in order to
   fetch the contents of the javascript filter. Then, the `Rep` object and the
   replication ID are used to construct the internal `#rep_state{}` state record
   of the `gen_server`.

 - Multiple replication jobs may end up trying to run the same replication
   (with the same replication ID) concurrently. To manage these types of
   collisions, the `check_ownership/3` function is called to determine if the
   current job is the correct `owner` of that replication. If it is not, the
   job may fail and exit.

 - There is a periodic checkpoint timer which sends a `checkpoint` message. The
   checkpoint frequency is calculated as the minimum of the `couch_jobs`
   activity timeout and the configured checkpoint interval. During each
   checkpoint attempt, there is a call to `couch_jobs:update/3` which updates
   the job's data with the latest state and ensures the job doesn't get
   re-enqueued due to inactivity.

 - If the job completes, then `couch_jobs:finish/3` is called and the
   replication process exits `normal`. If the job crashes, there is a
   consecutive error count field (`?ERROR_COUNT`) which is used to calculate
   the backoff penalty. There is an exponential backoff schedule that starts
   with the base value, then doubles, but only up to a maximum value. Both the
   base and the maximum values are configurable with the
   `min_backoff_penalty_sec` and `max_backoff_penalty_sec` settings
   respectively. This is an improvement over before, where users could only
   influence the maximum backoff penalty by reducing the number of failed
   events kept by each job.
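
The penalty schedule described above amounts to roughly the following (a sketch only; the real code also reads the settings from config):

```
%% Exponential backoff sketch: MinSec * 2^(ErrorCount - 1), capped at MaxSec.
backoff_penalty_sec(ErrorCount, MinSec, MaxSec) when ErrorCount > 0 ->
    min(MaxSec, MinSec * (1 bsl (ErrorCount - 1))).
```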

`couch_replicator_job_server`

 - This is a module which spawns and keeps track of `couch_replicator_job`
   processes.

 - Periodically, every `interval_sec` seconds, it runs the `reschedule` function
   which checks for pending jobs. If there are some, it will start acceptors to
   run them. If those acceptors become workers, and if the total number of
   workers goes above the `max_jobs` setting, the oldest `continuous` workers
   will be stopped until the total number of jobs falls below the `max_jobs` value.

 - In addition to the `max_jobs` limit, there is a `max_churn` limit which
   determines how many job starts to allow during each scheduling interval. As
   jobs are started, they reduce the available churn "budget" for that cycle,
   and once it is used up no more jobs can start until the next cycle (a rough
   sketch follows after this list).

 - This module also performs transient job cleanup. Previously, transient jobs
   simply vanished after they stopped running; with this update they may
   linger for at least `transient_job_max_age_sec` seconds.
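
A rough sketch of the per-cycle limits mentioned above (hypothetical names; the real `reschedule` logic also stops the oldest `continuous` workers when `max_jobs` is exceeded):

```
%% Illustrative only: how many acceptors one scheduling cycle may start,
%% bounded by both the remaining churn budget and the max_jobs cap.
acceptors_to_start(Pending, Running, MaxJobs, ChurnLeft) ->
    min(Pending, min(ChurnLeft, max(0, MaxJobs - Running))).
```
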
Stitch everything together: the backend, frontend and HTTP handlers.

The supervisor `couch_replicator_sup` handles starting a set of frontend or
backend children. It may also start both or neither.

The HTTP layer for monitoring and creating jobs is simpler than before since
there is no rpc or clustering involved.
Update settings with defaults. Also comment out values which are already set to
default in the code.
Remove sections which don't apply anymore and describe briefly how frontend and
backend interact.
Update tests to use the new replicator. Also clean up redundancy and re-use
some of the newer macros from fabric2 (?TDEF_FE).

Make sure replicator tests are included in `make check`
Previously, in 3.x we re-parsed the endpoint URLs with
`ibrowse_lib:parse_url/1` when stripping credentials, which threw an error if
the URL was invalid. So we try to preserve that same logic.

Backport some tests from 3.x to make sure URL stripping works when the URL is
valid, and also use the nicer ?TDEF and ?TDEF_FE test helpers.
Job exits are asynchronous so we ensure we wait for exit signals to be handled
before checking the state.
Caught during Elixir tests. I've added a unit test to `ebtree.erl` to
ensure we don't regress in the future.
This fixes compilation if CouchDB is used as a dependency.
Terreii and others added 17 commits May 5, 2021 10:31
Previously the default value for the md5 field was `<<>>`. This value changed to
undefined when we switched to using maps instead of Erlang records.
The change breaks the `couch_att:to_json/4` function because `base64:encode/1`
cannot handle the atom `undefined`.

The issue happens for inline attachments only, which is why it wasn't discovered
earlier.
Handle the case when md5 field is undefined
Set the `worker_trap_exits = false` setting to ensure our replication worker
pool properly cleans up worker processes.

Ref: #3208
Add `buggify-elixir-suite` target to run Elixir integration tests
under FoundationDB's client buggify mode [1]. In this mode, the FDB C
client in the `erlfdb` application will periodically throw mostly
retryable errors (`1009`, `1007`, etc). Transaction closures should
properly handle retryable errors without side-effects such as
re-sending response data to the user more than once, or attempting to
re-read data from the socket after it was already read once.

In order to avoid false positives, provide a custom .ini settings file
which disables transaction timeouts (`1031` errors). Those are not
retryable by default, as far as the `on_error` callback is
concerned. If we do have timeouts set ( = 60000), it signals to the
FoundationDB client that we expect to handle timeouts in buggify mode,
so it starts throwing them [2]. Since we don't handle those everywhere,
we get quite a few false positive errors.

The buggify settings are, I believe, the defaults -- a 25% chance to activate
an error, and a 25% chance of firing the error when the code passes over
that section. In most test runs this should result in a pass, but
sometimes, due to lingering bugs, there will be timeouts, 409
conflicts and other failures, so we cannot yet turn this into a
reliable integration test step.

[1] https://apple.github.io/foundationdb/client-testing.html

[2] https://github.com/apple/foundationdb/blob/master/fdbclient/ReadYourWrites.actor.cpp#L1191-L1194
During buggify runs we disable max tx retries by setting it to -1. That's FDB's
documented way of doing it. However, when we re-use that setting to handle the
restart_tx logic we don't account for -1, and that's what this PR fixes.
Switching crypto functions to use the new ones such as:

```
crypto:hmac(Alg, Key, Message) -> crypto:mac(hmac, Alg, Key, Message)
```

To simplify Erlang 24 support, in which some crypto functions have
been removed, bump the minimum version to 22.

Other fixes were in dependencies:

  * Bumped meck to 0.9.2. New meck from upstream supports Erlang
    24. Also required bumping folsom since it depends on meck

  * The example in the passage module would not compile, so the parse
    transform was commented out. This required bumping jaeger passage since it
    depends on passage
When ApiMod:acquire_map_context returns with {error, any()}, the caller(s) fail with badmatch. This commit changes the caller hierarchy so that those {error, _} are propagated back and a failing update gets retried (up to retry_limit times).
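
A sketch of the resulting control flow (hypothetical shape; the real change propagates the error through the existing callers rather than a generic helper like this):

```
%% Illustrative only: instead of a badmatch on {error, _}, return the error to
%% the caller and retry the whole update, up to RetryLimit times.
update_with_retry(UpdateFun, State, Retry, RetryLimit) when Retry =< RetryLimit ->
    case UpdateFun(State) of
        {ok, Result} -> {ok, Result};
        {error, _Reason} -> update_with_retry(UpdateFun, State, Retry + 1, RetryLimit)
    end;
update_with_retry(_UpdateFun, _State, _Retry, _RetryLimit) ->
    {error, retry_limit_exceeded}.
```
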
Previously, if a transaction got a `commit_unknown_result`, during the next
successful attempt that result would be returned to the user. However, if the
next attempt was another retryable error, then the committed result was ignored
and the whole transaction would be applied again. This could result in document
update transactions conflicting with themselves as described in issue
#3560

To prevent that from happening we remember that there was a
`commit_unknown_result` error during the course of any previous retries and try
to return that result.
Solved conflicts from "cherry-pick" 3.x commit
Eliminate compiler warnings
@@ -75,6 +75,21 @@ def test_keymap_match(self):
docs = self.db.find({"foo": {"$keyMapMatch": {"$eq": "aa"}}})
self.assertEqual(len(docs), 1)

def test_size(self):
Member

Could you add another test that shows it handles the case where the sizeGte value is not an integer, and another where it is acting on a field that is not an array.

Author

I have added tests for failures but I don't know how to "assert failure" in this test script (need help).

@glynnbird
Contributor

glynnbird commented May 26, 2021

Thanks for the contribution @hklarner. On a broader question: the new operators you are creating:

  • $sizeGte
  • $sizeLte

have no precedent in MongoDB, which was the original inspiration for the Mango language. If I'm reading the MongoDB docs right, the MongoDB way of performing this query would be to allow the following syntax:

{
  "selector": {
     "customers": {
        "$size:": { "$gte": 20}
      }
   }
}

The advantage of this approach is that

a) we already have a $size operator.
b) we have plenty of other comparison operators ($eq, $gte, etc.)
c) it wouldn't be creating new operators that differ from the "MongoDB way"

Note that currently the Mango query parser doesn't support this syntax, but I'm suggesting it could be modified to, rather than adding lots of new $size... operators.

What do you think @hklarner & @garrensmith?

@hklarner
Author

@glynnbird I see. Agreed, the MongoDB consistency is preferable. But I don't know how to implement it 😅

@garrensmith
Member

@hklarner take a look at some of the other operators that handle nested values. Maybe look at $and and try implementing something similar.

@jjrodrig
Contributor

Hi @hklarner,

I was checking this. It seems quite simple to introduce nesting behaviour for the $size operator. I've been testing with these changes and it seems to work.

--- a/src/mango/src/mango_selector.erl
+++ b/src/mango/src/mango_selector.erl
@@ -145,6 +145,8 @@ norm_ops({[{<<"$keyMapMatch">>, Arg}]}) ->
 
 norm_ops({[{<<"$size">>, Arg}]}) when is_integer(Arg), Arg >= 0 ->
     {[{<<"$size">>, Arg}]};
+norm_ops({[{<<"$size">>, {_}=Arg}]}) ->
+    {[{<<"$size">>, norm_ops(Arg)}]};
 norm_ops({[{<<"$size">>, Arg}]}) ->
     ?MANGO_ERROR({bad_arg, '$size', Arg});
 
@@ -581,8 +583,10 @@ match({[{<<"$regex">>, Regex}]}, Value, _Cmp) when is_binary(Value) ->
 match({[{<<"$regex">>, _}]}, _Value, _Cmp) ->
     false;
 
-match({[{<<"$size">>, Arg}]}, Values, _Cmp) when is_list(Values) ->
+match({[{<<"$size">>, Arg}]}, Values, _Cmp) when is_list(Values), is_integer(Arg) ->
     length(Values) == Arg;
+match({[{<<"$size">>, Arg}]}, Values, _Cmp) when is_list(Values) ->
+    match(Arg, length(Values), _Cmp);
 match({[{<<"$size">>, _}]}, _Value, _Cmp) ->
     false;

I hope this could help.

@hklarner
Author

hklarner commented May 28, 2021

@jjrodrig that's great! Hey, because of your answer on stackoverflow I opened this PR! Thanks for helping.

@bessbd
Member

bessbd commented Jun 2, 2021

@hklarner : #3568 is very close to a merge. Do you want me to hold off with it so that this can be merged before?

@hklarner
Author

hklarner commented Jun 2, 2021

@bessbd no, because I won't be able to continue with this for at least another week. All I have to do to be up to date is pull before I continue, right?

@bessbd
Member

bessbd commented Jun 2, 2021

> @bessbd no, because I won't be able to continue with this for at least another week. All I have to do to be up to date is pull before I continue, right?

That's pretty much it. In case you are changing files that are modified by the reformat, the conflict resolution may become a little difficult, but that shouldn't be a big deal.
In case you need help with anything, please let me know.

@hklarner closed this by deleting the head repository Jan 18, 2023