API versioning #2832

Open · wants to merge 327 commits into main
Conversation

@iilyak (Contributor) commented Apr 27, 2020

Overview

This PR implements a draft of the proposal.

Testing recommendations

Running `make check` should be sufficient. You might need to comment out the following line, because the Elixir tests are failing for `prototype/fdb-layer`.

You can also run just the API versioning test suite: `exunit tests=src/chttpd/test/exunit/api_version_test.exs`

Manual testing can be done using the following:

curl -u adm:pass "http://127.0.0.1:15984/_all_dbs"
["_users","test"]
curl -u adm:pass "http://127.0.0.1:15984/_all_dbs" -H "X-Couch-API: 1"
["_users","test"]
curl -u adm:pass "http://127.0.0.1:15984/_v1/_all_dbs"
["_users","test"]
curl -u adm:pass "http://127.0.0.1:15984/_all_dbs" -H "Accept: application/couchdb; _v=1,application/json"
["_users","test"]
curl -u adm:pass "http://127.0.0.1:15984/_all_dbs" -H "X-Couch-API: 2"
{"bookmark":"123456","items":["foo","bar","baz"]}
curl -u adm:pass "http://127.0.0.1:15984/_v2/_all_dbs"
{"bookmark":"123456","items":["foo","bar","baz"]}
curl -u adm:pass "http://127.0.0.1:15984/_all_dbs" -H "Accept: application/couchdb; _v=2,application/json"
{"bookmark":"123456","items":["foo","bar","baz"]}
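The examples above show three ways to select a version: a `_v1`/`_v2` path prefix, an `X-Couch-API` header, or a `_v` parameter on the `Accept` media type. As a minimal sketch (not the PR's actual code), the `_v` parameter of a single media type could be extracted with `mochiweb_util:parse_header/1`, which chttpd already has available:

    %% Illustrative only: pull the `_v` parameter out of a media type.
    {"application/couchdb", Params} =
        mochiweb_util:parse_header("application/couchdb; _v=2"),
    Version = proplists:get_value("_v", Params)  %% "2"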

Related Issues or Pull Requests

Checklist

garrensmith and others added 30 commits March 2, 2020 12:26
`TimeoutFun` was already returning `{ok|stop, UserAcc}`, so there was no need to
wrap it in another `{ok, ...}` tuple.

Also, `TimeoutFun` was calling the user function with `{timeout, _ResponseType}`, not
just `timeout`, so a clause was added to handle that as well.
There are two fixes:

1) In `fabric2_fdb:get_config/1`, `Db` was matched before and after
`ensure_current/1`. Only the db prefix path was used, which doesn't normally
change, but it's worth fixing anyway.

2) We used a cached version of the security document outside the transaction.
Now we force it to go through a transaction to call `fabric2_fdb:get_config/1`,
which calls `ensure_current/1`. When done, we also update the cached Db handle.
Do the same thing for revs_limit, even though it is not as critical as
security.
This brings it to parity with the `couch_db:open/2` logic on master:

https://github.com/apache/couchdb/blob/master/src/couch/src/couch_db.erl#L166

There are two separate cases that have to be handled:

 1) Db was already opened and cached. In that case, call `check_is_member(Db)`
 which reads the security from the Db to ensure we don't authorize against a
 stale security doc. Otherwise, the staleness could last as long as the time
 since the last write that went through that node. A potential future
 optimization could be to keep a timestamp and only fetch a new security
 context if the last refresh hasn't happened recently.

 2) Db was not cached, and was just opened. To avoid running another two read
 transactions to get the security doc after the main transaction finished, call
 a version of check_is_member which gets the security doc passed in as an
 argument.

As a bonus, the `check_is_member(Db, SecDoc)` version ends up saving one extra
security read, since we don't read it twice in the `is_member` and `is_admin` calls.

`delete/2` was updated to pass `?ADMIN_CTX` to `open/2`, since it only cares about
getting a `database_does_not_exist` error thrown. There is a check for server
admin at the HTTP API level that takes care of authentication / authorization.
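A minimal sketch of the two paths described above, with assumed names (`fabric2_server:fetch/1`, `open_db/2`, and the `check_is_member` variants are shorthand here, not necessarily the exact functions):

    open(DbName, Options) ->
        case fabric2_server:fetch(DbName) of
            #{} = Db ->
                %% Cached handle: re-read the security doc so we never
                %% authorize against a stale copy.
                ok = check_is_member(Db),
                Db;
            undefined ->
                %% Freshly opened: the security doc was already read in
                %% the main transaction, so pass it in directly and save
                %% the extra read transactions.
                {Db, SecDoc} = open_db(DbName, Options),
                ok = check_is_member(Db, SecDoc),
                Db
        end.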
Do not decrement the `doc_count` stat if the document was previously missing, or if it
was already deleted.

Deleted documents could be brought in by replication. In that case, if there
were more replicated documents than the current `doc_count`, the `doc_count`
could even underflow the 64-bit unsigned integer range and end up somewhere in
the vicinity of 2^64. The counter, of course, would still be incorrect even if
it didn't underflow; the huge value would just make the issue more visible.
We test that a newly deleted document is replicated to the target and that it bumps
the deleted doc count but doesn't change the doc count.

Another thing to test is an already deleted document whose revision path was
extended on the source and then replicated to the target. In that case, neither
the deleted doc count nor the doc count is bumped.
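A sketch of the stat update guard, with assumed names (the actual `fabric2_fdb` stats code may differ):

    %% Never decrement doc_count for a previously-missing or an
    %% already-deleted document; only a live -> deleted transition
    %% decrements it.
    case TargetExisted andalso not TargetWasDeleted of
        true -> incr_stat(Db, <<"doc_count">>, -1);
        false -> ok
    end.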
This commit is mostly a copy paste of the existing modules in the
`couch` application. For now I've left the build of the `couchjs`
executable in `couch/priv` to avoid having to do the work of moving the
build config over. I had contemplated just referencing the modules as
they currently exist, but decided this would prepare us a bit better for
when we eventually remove the old modules.
These are ported over from the existing couch Eunit suite and updated to
hopefully be less racy.
This check fails if Clouseau isn't present. Since we don't need Clouseau to
perform the check, just avoid it.
`set_type_timeout` takes seconds as its argument, but we gave it milliseconds.
It was suggested in another PR's discussion:

apache#2107 (review)
To trace FDB transactions:

 1. Enable tracing in the erlfdb application environment:
     `network_options = [{trace_enable, ...}]`
   OR with an environment variable:
     `FDB_NETWORK_OPTION_TRACE_ENABLE = ""`

 2. Set `[fabric] fdb_trace=true` configuration value

 3. Add the `x-couchdb-fdb-trace:true` header to each request that should be
   traced.

The transaction name is set to the nonce value, which is already used by
CouchDB to track API requests.

Only transactions started from the main request process will be traced. So if a
process is spawned without inheriting the `erlfdb_trace` process dict key, that
transaction will not be traced.
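For example, a traced request might look like this (`mydb` and the credentials are placeholders):

    curl -u adm:pass "http://127.0.0.1:15984/mydb" -H "x-couchdb-fdb-trace: true"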
FDB's metadata version key allows more efficient metadata invalidation
(see apple/foundationdb#1213). To take advantage of
that feature, update the caching logic to check the metadata version first, and
if it is current, skip checking the db version altogether.

When the db version is bumped, we now update the metadata version as well.

There is a bit of a subtlety when the metadata version is stale. In that case
we check the db version, and if that is current, we still don't reopen the
database; instead we continue with the transaction. Then, after the transaction
succeeds, we update the cached metadata version for that db handle. The next
client will get the updated metadata version, see that it is current, and won't
need to check the db version. If the db version is stale as well, we throw a
`reopen` exception, and the handle gets removed from the cache and reopened.

Note: this commit doesn't actually use the new metadata version key; it still
uses the old plain key. That update will be a separate commit, where we also start
setting a new API version (610), and it will only work on FDB version 6.1.x.
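A sketch of the check order described above, with illustrative names (not the exact `fabric2_fdb` internals):

    ensure_current(#{tx := Tx} = Db) ->
        case check_metadata_version(Tx, Db) of
            current ->
                %% Fast path: skip the db version check entirely.
                Db;
            stale ->
                case check_db_version(Tx, Db) of
                    current ->
                        %% Keep going; after the transaction succeeds the
                        %% cached metadata version is refreshed.
                        Db;
                    stale ->
                        %% Handle is removed from the cache and reopened.
                        throw({?MODULE, reopen})
                end
        end.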
Previously local docs were not chunkified and it was possible for replications
which checkpointed a few dozen times to create local documents above the 100KB
limit.

Documents are chunkified according to the same scheme as regular docs:
rev values live in the main `?DB_LOCAL_DOCS` subspace, and doc body chunks in a
separate `?DB_LOCAL_DOC_BODIES` subspace that looks like:

   {?DB_LOCAL_DOC_BODIES, DocId, ChunkId} = BinaryChunk

where `ChunkId` is an incrementing integer and `BinaryChunk` is a 100KB chunk of
the `term_to_binary` of the body.
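A sketch of a chunked body write under that scheme (`chunkify/2` is a hypothetical helper; the real code may differ):

    write_local_doc_body(Tx, DbPrefix, DocId, Body) ->
        Bin = term_to_binary(Body),
        Chunks = chunkify(Bin, 100000),  % split into <= 100KB pieces
        lists:foldl(fun(Chunk, ChunkId) ->
            Key = erlfdb_tuple:pack({?DB_LOCAL_DOC_BODIES, DocId, ChunkId}, DbPrefix),
            ok = erlfdb:set(Tx, Key, Chunk),
            ChunkId + 1
        end, 0, Chunks).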

We also go to some lengths to read and silently upgrade docs written with the
old encoding. Upgrades happen on doc writes as a first step, to ensure stats
update logic is not affected.
Since we are dealing with upgrades and both versions start out as binaries,
make sure we add extra belts and suspenders to surface any issues with encoding
errors.
Need to use FDB version 6.1+ and an erlfdb version that has this commit:
cloudant-labs/couchdb-erlfdb@7718a3d

Since fabric2.hrl is a public include now, use that in couch_jobs to avoid
redefining a bunch of things.
If we don't explicitly bail out of running the job it will loop
indefinitely in the couch_jobs retry logic.
Since the db structure returned from fabric2_db:open and fabric2_db:create
includes `user_ctx`, there is no need to pass it explicitly
in every `fabric2_db` call. This means we can simplify a few things:

- Don't pass user_ctx in `chttpd_db:db_req/2` since we pass db already
- Don't have to use `db_open_options` in `chttpd_changes`
- Don't have to pass `user_ctx` to `fabric2_db:open_doc` and `fabric2_db:update_doc`
Users should be able to replicate their partitioned dbs to the new environment.
Previously we checked security properties in a separate transaction, after
opening the db or fetching it from the cache. To avoid running an extra
transaction move the check inside the main transaction right after the metadata
check runs. That ensures it will be consistent and won't be accidentally
missed, as all operations run the `ensure_current` metadata check.

Also remove the special `get_config/2` function in `fabric2_fdb` for getting
revs limit and security properties and just read them directly from the db map.
Previously, a stale db handle could be re-used across a few separate
transactions. That would result in the database getting re-opened before every
one of those operations.

To prevent that from happening, check the cache before the transaction starts,
and if there is a newer version of the db handle, use that.
Previously `set_config/3` needed keys and values to be translated to binaries;
now that is done inside the function. It's a bit more consistent, as binary
config values and encodings are better encapsulated in the `fabric2_fdb`
module.

Since `set_config` changed, it made sense to update `get_config` as well. It
turns out it was used only to load configuration settings after a db open, so
the function was renamed to `load_config` and made private.
nickva and others added 18 commits May 29, 2020 14:22
And also against too many conflicts during overload
Check metadata versions to ensure newer handles are not clobbered. The same
thing is done for removal: `maybe_remove/1` removes a handle only if there isn't
a newer one already there.
Update db handles right away, as soon as the db version is checked. This ensures
concurrent requests will get access to the current handle as soon as possible
and may avoid doing extra version checks and re-opens.
Under load, the accept loop can blow up with a timeout error from
`erlfdb:wait(...)` (https://github.com/apache/couchdb-erlfdb/blob/master/src/erlfdb.erl#L255),
so guard against it just like we do for FDB transaction timeout (1031) errors.
Use the `no_schedule` option to speed up job dequeuing. This optimization
allows dequeuing jobs more efficiently if these conditions are met:

 1) Job IDs start with a random prefix
 2) No time-based scheduling is used

Both of those can be true for views: job IDs can be generated such that the
signature comes before the db name part, which is what this commit does.

The way the optimization works is that random IDs are generated in the pending
jobs range, and then a key selector is used to pick a job either before or after
it. That reduces each dequeue attempt to just one read, instead of reading up to
1000 jobs.
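A sketch of the single-read dequeue (names are illustrative; the real `couch_jobs` code differs in detail):

    %% Probe at a random point in the pending range, then snap to the
    %% nearest pending job key with a key selector: one read in total.
    Probe = erlfdb_tuple:pack({RandomJobId}, PendingPrefix),
    KeySel = erlfdb_key:first_greater_or_equal(Probe),
    JobKey = erlfdb:wait(erlfdb:get_key(Tx, KeySel)),
    %% If JobKey falls past the end of the pending range, retry with a
    %% last_less_or_equal/1 selector to pick the job before the probe.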
When waiting to accept jobs and scheduling is used, the timeout is limited based
on the time scheduling parameter. When the `no_schedule` option is used, the time
scheduling parameter is always set to 0, so in that case we have to
special-case the limit to return `infinity`.

Later on, when we wait for the watch to fire, the actual timeout can still be
limited by a separate user-specified timeout option; but if the user specifies
`infinity` there and sets `#{no_schedule => true}`, then we should respect that and
never return `{error, not_found}` in response.
As per ML
[discussion](https://lists.apache.org/thread.html/rb328513fb932e231cf8793f92dd1cc2269044cb73cb43a6662c464a1%40%3Cdev.couchdb.apache.org%3E)
add a `uuid` field to db info results in order to be able to uniquely identify
a particular instance of a database. When a database is deleted and re-created
with the same name, it will return a new `uuid` value.
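For illustration (output abbreviated; `mydb` is a placeholder), the new field shows up in the db info response:

    curl -u adm:pass "http://127.0.0.1:15984/mydb"
    {"db_name":"mydb","uuid":"...", ...}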
Optimize couch_views by using a separate set of acceptors and workers.
Previously, all `max_workers` workers were spawned on startup and sat
waiting to accept jobs in parallel. In a setup with a large number of
pods, and 100 workers per pod, that could lead to a lot of conflicts
being generated when all those workers race to accept the same job at
the same time.

The improvement is to spawn only a limited number of acceptors (5, by
default), then, spawn more after some of them become workers. Also,
when some workers finish or die with an error, check if more acceptors
could be spawned.

As an example, here is what might happen with `max_acceptors = 5` and
`max_workers = 100` (`A` and `W` are the current counts of acceptors
and workers, respectively):

1. Starting out:
  `A = 5, W = 0`

2. After 2 acceptors start running:
  `A = 3, W = 2`
Then immediately 2 more acceptors are spawned:
  `A = 5, W = 2`

3. After 95 workers are started:
  `A = 5, W = 95`

4. Now if 3 acceptors accept, it would look like:
  `A = 2, W = 98`
But no more acceptors would be started.

5. If the last 2 acceptors also accept jobs:
  `A = 0, W = 100`
At this point no more indexing jobs can be accepted and started until at
least one of the workers finishes and exits.

6. If 1 worker exits:
  `A = 0, W = 99`
An acceptor will be immediately spawned:
  `A = 1, W = 99`

7. If all 99 workers exit, it will go back to:
 `A = 5, W = 0`
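A sketch of the replenishment check implied by the walkthrough above (record and function names assumed):

    %% Spawn another acceptor only while we are under max_acceptors and
    %% the acceptor + worker total stays under max_workers.
    maybe_spawn_acceptor(#st{acceptors = A, workers = W} = St) ->
        case A < max_acceptors() andalso A + W < max_workers() of
            true  -> start_acceptor(St);
            false -> St
        end.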
In an overload scenario, do not let notifiers crash and lose their subscribers;
instead, make them more robust and let them retry on future or transaction
timeouts.
Add max_bulk_get_count configuration option
@iilyak (Contributor, Author) commented Jun 26, 2020

I refactored the PR to parse `Version` earlier and added tests.

@iilyak iilyak marked this pull request as ready for review June 26, 2020 17:34
@iilyak iilyak changed the title WIP PoC for API versioning PoC for API versioning Jun 26, 2020
@iilyak iilyak changed the title PoC for API versioning API versioning Jun 26, 2020
@wohali wohali changed the base branch from prototype/fdb-layer to main October 21, 2020 18:12
@bessbd (Member) commented Mar 29, 2021

Do we want this in 3.2 or do we intend to add the version option in >3.x onwards?

@iilyak (Contributor, Author) commented Mar 29, 2021

There was no consensus: https://lists.apache.org/thread.html/rcc742c0fdca0363bb338b54526045720868597ea35ee6842aef174e0%40%3Cdev.couchdb.apache.org%3E

However, Adam started a new discussion about it recently: https://github.com/apache/couchdb/blob/main/src/chttpd/src/chttpd_httpd_handlers.erl#L62:L495
