API versioning #2832

Open · wants to merge 327 commits into main
Conversation

@iilyak (Contributor) commented Apr 27, 2020

Overview

This PR implements a draft of the proposal.

Testing recommendations

Running `make check` should be sufficient. You might need to comment out the following line, because the Elixir tests are failing for `prototype/fdb-layer`.

You can also run just the API versioning test suite: `exunit tests=src/chttpd/test/exunit/api_version_test.exs`

Manual testing can be done using the following:

curl -u adm:pass "http://127.0.0.1:15984/_all_dbs"
["_users","test"]
curl -u adm:pass "http://127.0.0.1:15984/_all_dbs" -H "X-Couch-API: 1"
["_users","test"]
curl -u adm:pass "http://127.0.0.1:15984/_v1/_all_dbs"
["_users","test"]
curl -u adm:pass "http://127.0.0.1:15984/_all_dbs" -H "Accept: application/couchdb; _v=1,application/json"
["_users","test"]
curl -u adm:pass "http://127.0.0.1:15984/_all_dbs" -H "X-Couch-API: 2"
{"bookmark":"123456","items":["foo","bar","baz"]}
curl -u adm:pass "http://127.0.0.1:15984/_v2/_all_dbs"
{"bookmark":"123456","items":["foo","bar","baz"]}
curl -u adm:pass "http://127.0.0.1:15984/_all_dbs" -H "Accept: application/couchdb; _v=2,application/json"
{"bookmark":"123456","items":["foo","bar","baz"]}
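The examples above show three ways to select a version: a `_v1`/`_v2` path prefix, an `X-Couch-API` header, or a `_v` parameter on the `Accept` media type. As a minimal sketch (not the PR's actual code), the `_v` parameter of a single media type could be extracted with `mochiweb_util:parse_header/1`, which chttpd already has available:

    %% Illustrative only: pull the `_v` parameter out of a media type.
    {"application/couchdb", Params} =
        mochiweb_util:parse_header("application/couchdb; _v=2"),
    Version = proplists:get_value("_v", Params)  %% "2"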

Related Issues or Pull Requests

Checklist

garrensmith and others added 30 commits March 2, 2020 12:26
`TimeoutFun` was already returning `{ok|stop, UserAcc}`, so there was no need to
wrap it in another `{ok, ...}` tuple.

Also, `TimeoutFun` was calling the user function with `{timeout, _ResponseType}`, not
just `timeout`, so a clause was added to handle that as well.
There are two fixes:

1) In `fabric2_fdb:get_config/1`, `Db` was matched before and after
`ensure_current/1`. Only the db prefix path was used, which doesn't normally
change, but it's worth fixing anyway.

2) We used a cached version of the security document outside the transaction.
Now we force it to go through a transaction to call `fabric2_fdb:get_config/1`,
which calls `ensure_current/1`. When done, we also update the cached Db handle.
Do the same thing for revs_limit, even though it is not as critical as
security.
This brings it to parity with the `couch_db:open/2` logic on master:

https://github.com/apache/couchdb/blob/master/src/couch/src/couch_db.erl#L166

There are two separate cases that have to be handled:

 1) Db was already opened and cached. In that case, call `check_is_member(Db)`
 which reads the security from the Db to ensure we don't authorize against a
 stale security doc. Otherwise, the staleness could last as long as the time
 since the last write that went through that node. A potential future
 optimization could be to keep a timestamp and only fetch a new security
 context if the last refresh hasn't happened recently.

 2) Db was not cached, and was just opened. To avoid running another two read
 transactions to get the security doc after the main transaction finished, call
 a version of check_is_member which gets the security doc passed in as an
 argument.

As a bonus, the `check_is_member(Db, SecDoc)` version ends up saving one extra
security read, since we don't read it twice in the `is_member` and `is_admin` calls.

`delete/2` was updated to pass `?ADMIN_CTX` to `open/2`, since it only cares about
getting a `database_does_not_exist` error thrown. There is a check for server
admin at the HTTP API level that takes care of authentication / authorization.
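A minimal sketch of the two paths described above, with assumed names (`fabric2_server:fetch/1`, `open_db/2`, and the `check_is_member` variants are shorthand here, not necessarily the exact functions):

    open(DbName, Options) ->
        case fabric2_server:fetch(DbName) of
            #{} = Db ->
                %% Cached handle: re-read the security doc so we never
                %% authorize against a stale copy.
                ok = check_is_member(Db),
                Db;
            undefined ->
                %% Freshly opened: the security doc was already read in
                %% the main transaction, so pass it in directly and save
                %% the extra read transactions.
                {Db, SecDoc} = open_db(DbName, Options),
                ok = check_is_member(Db, SecDoc),
                Db
        end.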
Do not decrement the `doc_count` stat if the document was previously missing, or if it
was already deleted.

Deleted documents could be brought in by replication. In that case, if there
were more replicated documents than the current `doc_count`, the `doc_count`
could even underflow the 64-bit unsigned integer range and end up somewhere in
the vicinity of 2^64. The counter, of course, would still be incorrect even if
it didn't underflow; the huge value would just make the issue more visible.
We test that a newly deleted document is replicated to the target and that it bumps
the deleted doc count but doesn't change the doc count.

Another thing to test is an already deleted document whose revision path was
extended on the source and then replicated to the target. In that case, neither
the deleted doc count nor the doc count is bumped.
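A sketch of the stat update guard, with assumed names (the actual `fabric2_fdb` stats code may differ):

    %% Never decrement doc_count for a previously-missing or an
    %% already-deleted document; only a live -> deleted transition
    %% decrements it.
    case TargetExisted andalso not TargetWasDeleted of
        true -> incr_stat(Db, <<"doc_count">>, -1);
        false -> ok
    end.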
This commit is mostly a copy paste of the existing modules in the
`couch` application. For now I've left the build of the `couchjs`
executable in `couch/priv` to avoid having to do the work of moving the
build config over. I had contemplated just referencing the modules as
they currently exist, but decided this would prepare us a bit better for
when we eventually remove the old modules.
These are ported over from the existing couch Eunit suite and updated to
hopefully be less racy.
This check fails if Clouseau isn't present. Since we don't need Clouseau to
perform the check, just avoid it.
`set_type_timeout` takes seconds as its argument, but we gave it milliseconds.
It was suggested in another PR's discussion:

apache#2107 (review)
To trace FDB transactions:

 1. Enable tracing in the erlfdb application environment:
     `network_options = [{trace_enable, ...}]`
   OR with an environment variable:
     `FDB_NETWORK_OPTION_TRACE_ENABLE = ""`

 2. Set `[fabric] fdb_trace=true` configuration value

 3. Add the `x-couchdb-fdb-trace:true` header to each request that should be
   traced.

The transaction name is set to the nonce value, which is already used by
CouchDB to track API requests.

Only transactions started from the main request process will be traced. So if a
process is spawned without inheriting the `erlfdb_trace` process dict key, that
transaction will not be traced.
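For example, a traced request might look like this (`mydb` and the credentials are placeholders):

    curl -u adm:pass "http://127.0.0.1:15984/mydb" -H "x-couchdb-fdb-trace: true"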
FDB's metadata version key allows more efficient metadata invalidation
(see apple/foundationdb#1213). To take advantage of
that feature, update the caching logic to check the metadata version first, and
if it is current, skip checking the db version altogether.

When the db version is bumped, we now update the metadata version as well.

There is a bit of a subtlety when the metadata version is stale. In that case
we check the db version, and if that is current, we still don't reopen the
database; instead we continue with the transaction. Then, after the transaction
succeeds, we update the cached metadata version for that db handle. The next
client will get the updated metadata version, see that it is current, and won't
need to check the db version. If the db version is stale as well, we throw a
`reopen` exception, and the handle gets removed from the cache and reopened.

Note: this commit doesn't actually use the new metadata version key; it still
uses the old plain key. That update will be a separate commit, where we also start
setting a new API version (610), and it will only work on FDB version 6.1.x.
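A sketch of the check order described above, with illustrative names (not the exact `fabric2_fdb` internals):

    ensure_current(#{tx := Tx} = Db) ->
        case check_metadata_version(Tx, Db) of
            current ->
                %% Fast path: skip the db version check entirely.
                Db;
            stale ->
                case check_db_version(Tx, Db) of
                    current ->
                        %% Keep going; after the transaction succeeds the
                        %% cached metadata version is refreshed.
                        Db;
                    stale ->
                        %% Handle is removed from the cache and reopened.
                        throw({?MODULE, reopen})
                end
        end.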
Previously local docs were not chunkified and it was possible for replications
which checkpointed a few dozen times to create local documents above the 100KB
limit.

Documents are chunkified according to the same scheme as regular docs:
rev values live in the main `?DB_LOCAL_DOCS` subspace, and doc body chunks in a
separate `?DB_LOCAL_DOC_BODIES` subspace that looks like:

   {?DB_LOCAL_DOC_BODIES, DocId, ChunkId} = BinaryChunk

where `ChunkId` is an incrementing integer and `BinaryChunk` is a 100KB chunk of
the `term_to_binary` of the body.
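A sketch of a chunked body write under that scheme (`chunkify/2` is a hypothetical helper; the real code may differ):

    write_local_doc_body(Tx, DbPrefix, DocId, Body) ->
        Bin = term_to_binary(Body),
        Chunks = chunkify(Bin, 100000),  % split into <= 100KB pieces
        lists:foldl(fun(Chunk, ChunkId) ->
            Key = erlfdb_tuple:pack({?DB_LOCAL_DOC_BODIES, DocId, ChunkId}, DbPrefix),
            ok = erlfdb:set(Tx, Key, Chunk),
            ChunkId + 1
        end, 0, Chunks).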

We also go to some lengths to read and silently upgrade docs written with the
old encoding. Upgrades happen on doc writes as a first step, to ensure stats
update logic is not affected.
Since we are dealing with upgrades and both versions start out as binaries,
make sure we add extra belts and suspenders to surface any issues with encoding
errors.
Need to use FDB version 6.1+ and an erlfdb version that has this commit:
cloudant-labs/couchdb-erlfdb@7718a3d

Since fabric2.hrl is a public include now, use that in couch_jobs to avoid
redefining a bunch of things.
If we don't explicitly bail out of running the job it will loop
indefinitely in the couch_jobs retry logic.
Since the db structure returned from fabric2_db:open and fabric2_db:create
includes `user_ctx`, there is no need to pass it explicitly
in every `fabric2_db` call. This means we can simplify a few things:

- Don't pass user_ctx in `chttpd_db:db_req/2` since we pass db already
- Don't have to use `db_open_options` in `chttpd_changes`
- Don't have to pass `user_ctx` to `fabric2_db:open_doc` and `fabric2_db:update_doc`
Users should be able to replicate their partitioned dbs to the new environment.
Previously we checked security properties in a separate transaction, after
opening the db or fetching it from the cache. To avoid running an extra
transaction move the check inside the main transaction right after the metadata
check runs. That ensures it will be consistent and won't be accidentally
missed, as all operations run the `ensure_current` metadata check.

Also remove the special `get_config/2` function in `fabric2_fdb` for getting
revs limit and security properties and just read them directly from the db map.
Previously, a stale db handle could be re-used across a few separate
transactions. That would result in the database getting re-opened before every
one of those operations.

To prevent that from happening, check the cache before the transaction starts,
and if there is a newer version of the db handle, use that.
Previously `set_config/3` needed keys and values to be translated to binaries;
now that is done inside the function. It's a bit more consistent, as binary
config values and encodings are better encapsulated in the `fabric2_fdb`
module.

Since `set_config` changed, it made sense to update `get_config` as well. It
turns out it was used only to load configuration settings after a db open, so
the function was renamed to `load_config` and made private.
nickva and others added 18 commits May 29, 2020 14:22
And also against too many conflicts during overload
Check metadata versions to ensure newer handles are not clobbered. The same
thing is done for removal: `maybe_remove/1` removes a handle only if there isn't
a newer one already there.
Update db handles right away, as soon as the db version is checked. This ensures
concurrent requests will get access to the current handle as soon as possible
and may avoid doing extra version checks and re-opens.
Under load, the accept loop can blow up with a timeout error from
`erlfdb:wait(...)` (https://github.com/apache/couchdb-erlfdb/blob/master/src/erlfdb.erl#L255),
so guard against it just like we do for FDB transaction timeout (1031) errors.
Use the `no_schedule` option to speed up job dequeuing. This optimization
allows dequeuing jobs more efficiently if these conditions are met:

 1) Job IDs start with a random prefix
 2) No time-based scheduling is used

Both of those can be true for views: job IDs can be generated such that the
signature comes before the db name part, which is what this commit does.

The way the optimization works is that random IDs are generated in the pending
jobs range, and then a key selector is used to pick a job either before or after
it. That reduces each dequeue attempt to just one read, instead of reading up to
1000 jobs.
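A sketch of the single-read dequeue (names are illustrative; the real `couch_jobs` code differs in detail):

    %% Probe at a random point in the pending range, then snap to the
    %% nearest pending job key with a key selector: one read in total.
    Probe = erlfdb_tuple:pack({RandomJobId}, PendingPrefix),
    KeySel = erlfdb_key:first_greater_or_equal(Probe),
    JobKey = erlfdb:wait(erlfdb:get_key(Tx, KeySel)),
    %% If JobKey falls past the end of the pending range, retry with a
    %% last_less_or_equal/1 selector to pick the job before the probe.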
When waiting to accept jobs and scheduling is used, the timeout is limited based
on the time scheduling parameter. When the `no_schedule` option is used, the time
scheduling parameter is always set to 0, so in that case we have to
special-case the limit to return `infinity`.

Later on, when we wait for the watch to fire, the actual timeout can still be
limited by a separate user-specified timeout option; but if the user specifies
`infinity` there and sets `#{no_schedule => true}`, then we should respect that and
never return `{error, not_found}` in response.
As per ML
[discussion](https://lists.apache.org/thread.html/rb328513fb932e231cf8793f92dd1cc2269044cb73cb43a6662c464a1%40%3Cdev.couchdb.apache.org%3E)
add a `uuid` field to db info results in order to be able to uniquely identify
a particular instance of a database. When a database is deleted and re-created
with the same name, it will return a new `uuid` value.
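For illustration (output abbreviated; `mydb` is a placeholder), the new field shows up in the db info response:

    curl -u adm:pass "http://127.0.0.1:15984/mydb"
    {"db_name":"mydb","uuid":"...", ...}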
Optimize couch_views by using a separate set of acceptors and workers.
Previously, all `max_workers` workers were spawned on startup and sat
waiting to accept jobs in parallel. In a setup with a large number of
pods, and 100 workers per pod, that could lead to a lot of conflicts
being generated when all those workers race to accept the same job at
the same time.

The improvement is to spawn only a limited number of acceptors (5, by
default), then, spawn more after some of them become workers. Also,
when some workers finish or die with an error, check if more acceptors
could be spawned.

As an example, here is what might happen with `max_acceptors = 5` and
`max_workers = 100` (`A` and `W` are the current counts of acceptors
and workers, respectively):

1. Starting out:
  `A = 5, W = 0`

2. After 2 acceptors start running:
  `A = 3, W = 2`
Then immediately 2 more acceptors are spawned:
  `A = 5, W = 2`

3. After 95 workers are started:
  `A = 5, W = 95`

4. Now if 3 acceptors accept, it would look like:
  `A = 2, W = 98`
But no more acceptors would be started.

5. If the last 2 acceptors also accept jobs:
  `A = 0, W = 100`
At this point no more indexing jobs can be accepted and started until at
least one of the workers finishes and exits.

6. If 1 worker exits:
  `A = 0, W = 99`
An acceptor will be immediately spawned:
  `A = 1, W = 99`

7. If all 99 workers exit, it will go back to:
 `A = 5, W = 0`
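A sketch of the replenishment check implied by the walkthrough above (record and function names assumed):

    %% Spawn another acceptor only while we are under max_acceptors and
    %% the acceptor + worker total stays under max_workers.
    maybe_spawn_acceptor(#st{acceptors = A, workers = W} = St) ->
        case A < max_acceptors() andalso A + W < max_workers() of
            true  -> start_acceptor(St);
            false -> St
        end.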
In an overload scenario, do not let notifiers crash and lose their subscribers;
instead, make them more robust and let them retry on future or transaction
timeouts.
Add max_bulk_get_count configuration option
@iilyak (Contributor, Author) commented Jun 26, 2020

I refactored the PR to parse `Version` earlier and added tests.

@iilyak iilyak marked this pull request as ready for review June 26, 2020 17:34
@iilyak iilyak changed the title WIP PoC for API versioning PoC for API versioning Jun 26, 2020
@iilyak iilyak changed the title PoC for API versioning API versioning Jun 26, 2020
@wohali wohali changed the base branch from prototype/fdb-layer to main October 21, 2020 18:12
@bessbd (Member) commented Mar 29, 2021

Do we want this in 3.2 or do we intend to add the version option in >3.x onwards?

@iilyak (Contributor, Author) commented Mar 29, 2021

There was no consensus: https://lists.apache.org/thread.html/rcc742c0fdca0363bb338b54526045720868597ea35ee6842aef174e0%40%3Cdev.couchdb.apache.org%3E

However, Adam started a new discussion about it recently: https://github.com/apache/couchdb/blob/main/src/chttpd/src/chttpd_httpd_handlers.erl#L62:L495
