API versioning #2832
base: main
Conversation
`TimeoutFun` was already returning `{ok|stop, UserAcc}`, so there was no need to wrap it in another `{ok, ...}` tuple. Also, `TimeoutFun` was calling the user function with `{timeout, _ResponseType}`, not just `timeout`, so a clause was added to handle that as well.
There are two fixes: 1) In `fabric2_fdb:get_config/1`, `Db` was matched both before and after `ensure_current/1`. Only the db prefix path was used, which doesn't normally change, but it's worth fixing anyway. 2) We used a cached version of the security document outside the transaction. Now we force it to go through a transaction by calling `fabric2_fdb:get_config/1`, which calls `ensure_current/1`. When done, we also update the cached `Db` handle. The same is done for `revs_limit`, even though it is not as critical as security.
This brings us to parity with the master `couch_db:open/2` logic: https://github.com/apache/couchdb/blob/master/src/couch/src/couch_db.erl#L166 There are two separate cases to handle: 1) The db was already opened and cached. In that case, call `check_is_member(Db)`, which reads the security doc from the db, to ensure we don't authorize against a stale security doc. Otherwise, the delay could be as long as the time since the last write that went through that node. A potential future optimization would be to keep a timestamp and only fetch a new security context if the last refresh wasn't recent. 2) The db was not cached and was just opened. To avoid running another two read transactions to get the security doc after the main transaction finishes, call a version of `check_is_member` which takes the security doc as an argument. As a bonus, the `check_is_member(Db, SecDoc)` version saves one extra security read, since we no longer read it twice in the `is_member` and `is_admin` calls. `delete/2` was updated to pass `?ADMIN_CTX` to `open/2`, since it only cares about getting a `database_does_not_exist` error thrown. There is a check for server admin at the HTTP API level that takes care of authentication / authorization.
Do not decrement the `doc_count` stat if the document was previously missing, or if it was already deleted. Deleted documents can be brought in by replication. In that case, if there were more replicated deleted documents than the current `doc_count`, the `doc_count` could even underflow the 64-bit unsigned integer range and end up somewhere in the vicinity of 2^64. The counter, of course, would still be incorrect even if it didn't underflow; the huge value would just make the issue more visible.
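The guard described above can be sketched in Python (an illustrative model, not the actual Erlang code; function names here are made up for the example):

```python
def doc_count_delta(prev_exists, prev_deleted, new_deleted):
    # Decrement only when a live document becomes deleted. Tombstones
    # replicated in for docs that were missing or already deleted on
    # this side must not touch the counter.
    if new_deleted and prev_exists and not prev_deleted:
        return -1
    if not new_deleted and (not prev_exists or prev_deleted):
        return 1
    return 0

def apply_unsigned_64(counter, delta):
    # FDB atomic counters are raw 64-bit values, so without the guard a
    # decrement below zero wraps around to the vicinity of 2**64.
    return (counter + delta) % (1 << 64)
```

The second helper shows why the bug is so visible: one bad decrement on an empty db yields a count of 18446744073709551615.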
We test that a newly deleted document is replicated to the target, and that it bumps the deleted doc count but doesn't change the doc count. Another thing to test is an already deleted document whose revision path was extended on the source and then replicated to the target. In that case, neither the deleted doc count nor the doc count is bumped.
This commit is mostly a copy-paste of the existing modules in the `couch` application. For now I've left the build of the `couchjs` executable in `couch/priv` to avoid having to move the build config over. I had contemplated just referencing the modules as they currently exist, but decided this would prepare us a bit better for when we eventually remove the old modules.
These are ported over from the existing couch Eunit suite and updated to hopefully be less racy.
This check fails if Clouseau isn't present. We don't need Clouseau to perform the check, so just avoid it.
`set_type_timeout` takes seconds as the argument, but we gave it milliseconds.
It was suggested in another PR's discussion: apache#2107 (review)
To trace FDB transactions: 1. Enable tracing in the erlfdb application environment: `network_options = [{trace_enable, ...}]`, or with an environment variable: `FDB_NETWORK_OPTION_TRACE_ENABLE = ""` 2. Set the `[fabric] fdb_trace=true` configuration value 3. Add the `x-couchdb-fdb-trace:true` header to each request that should be traced. The transaction name is set to the nonce value, which is already used by CouchDB to track API requests. Only transactions started from the main request process will be traced, so if a process is spawned without inheriting the `erlfdb_trace` process dict key, its transactions will not be traced.
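Taken together, the three steps might look like this (illustrative fragments; the sys.config shape, the elided `trace_enable` value, and the URL are assumptions, not taken from this PR):

```
%% 1. erlfdb application environment
%%    (or: export FDB_NETWORK_OPTION_TRACE_ENABLE="")
{erlfdb, [{network_options, [{trace_enable, ...}]}]}

; 2. CouchDB ini configuration
[fabric]
fdb_trace = true

# 3. Per-request header on the request to be traced
curl -H 'x-couchdb-fdb-trace: true' https://127.0.0.1:5984/dbname
```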
FDB's metadata version key allows more efficient metadata invalidation (see apple/foundationdb#1213). To take advantage of that feature, update the caching logic to check the metadata version first and, if it is current, skip checking the db version altogether. When the db version is bumped, we now update the metadata version as well. There is a bit of a subtlety when the metadata version is stale. In that case we check the db version, and if that is current, we still don't reopen the database; instead, we continue with the transaction. Then, after the transaction succeeds, we update the cached metadata version for that db handle. The next client will get the updated db metadata, it will be current, and they won't need to check the db version. If the db version is stale as well, then we throw a `reopen` exception, and the handle gets removed from the cache and reopened. Note: this commit doesn't actually use the new metadata version key; it still uses the old plain key. That update will be a separate commit, where we also start setting a new API version (610), and it will only work on FDB version 6.1.x.
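The two-level check can be sketched in Python (a model of the logic described above, not the actual `fabric2_fdb` code; `FakeTx` and the key names are stand-ins for illustration):

```python
class Reopen(Exception):
    """Raised when both the metadata and db versions are stale."""

class FakeTx:
    # Stand-in for an FDB transaction; real code would read these keys
    # through erlfdb inside the transaction.
    def __init__(self, kvs):
        self.kvs = kvs
    def get(self, key):
        return self.kvs.get(key)

def ensure_current(tx, handle):
    # Fast path: the cached metadata version matches the global one, so
    # no db metadata anywhere has changed -- skip the db version check.
    meta = tx.get("metadata_version")
    if meta == handle["meta_version"]:
        return handle
    # Metadata version is stale; fall back to checking this db's version.
    if tx.get(handle["db_version_key"]) == handle["db_version"]:
        # The db itself is still current: continue with the transaction,
        # and after it commits refresh the cached metadata version so the
        # next client takes the fast path.
        handle["refresh_meta_to"] = meta
        return handle
    # Both are stale: evict the handle from the cache and reopen.
    raise Reopen()
```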
Previously, local docs were not chunkified, and it was possible for replications which checkpointed a few dozen times to create local documents above the 100KB limit. Documents are chunkified according to the same scheme as regular docs: rev values live in the main `?DB_LOCAL_DOCS` subspace, and doc body chunks live in a separate `?DB_LOCAL_DOC_BODIES` subspace that looks like: {?DB_LOCAL_DOC_BODIES, DocId, ChunkId} = BinaryChunk where `ChunkId` is an incrementing integer and `BinaryChunk` is a 100KB chunk of the `term_to_binary` of the body. We also go to some lengths to read and silently upgrade docs written with the old encoding. Upgrades happen as a first step on doc writes, to ensure the stats update logic is not affected.
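The chunking scheme can be sketched in Python (an illustrative model of the key layout described above, not the actual Erlang implementation; the exact chunk size used by the real code may differ):

```python
CHUNK_SIZE = 100_000  # stay under FDB's 100KB value size limit

def chunkify(doc_id, body_bin):
    # Split the encoded body into numbered chunks keyed as
    # (?DB_LOCAL_DOC_BODIES, DocId, ChunkId) = BinaryChunk,
    # with ChunkId an incrementing integer.
    return [
        ((doc_id, chunk_id), body_bin[off:off + CHUNK_SIZE])
        for chunk_id, off in enumerate(range(0, len(body_bin), CHUNK_SIZE))
    ]

def assemble(chunks):
    # A range read returns chunks ordered by ChunkId, so simple
    # concatenation restores the original encoded body.
    return b"".join(chunk for _key, chunk in chunks)
```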
Since we are dealing with upgrades, and both versions start out as binaries, make sure we add extra belts and suspenders to surface any encoding errors.
This needs FDB version 6.1+ and an erlfdb version that has this commit: cloudant-labs/couchdb-erlfdb@7718a3d. Since `fabric2.hrl` is a public include now, use it in `couch_jobs` to avoid redefining a bunch of things.
If we don't explicitly bail out of running the job, it will loop indefinitely in the couch_jobs retry logic.
Since the db structure returned from `fabric2_db:open` and `fabric2_db:create` includes `user_ctx`, there is no need to pass it explicitly in every `fabric2_db` call. This means we can simplify a few things: - Don't pass `user_ctx` in `chttpd_db:db_req/2`, since we pass the db already - Don't have to use `db_open_options` in `chttpd_changes` - Don't have to pass `user_ctx` to `fabric2_db:open_doc` and `fabric2_db:update_doc`
Users should be able to replicate their partitioned dbs to the new environment.
Previously we checked security properties in a separate transaction, after opening the db or fetching it from the cache. To avoid running an extra transaction, move the check inside the main transaction, right after the metadata check runs. That ensures it will be consistent and won't be accidentally missed, since all operations run the `ensure_current` metadata check. Also remove the special `get_config/2` function in `fabric2_fdb` for getting revs limit and security properties, and just read them directly from the db map.
Previously, a stale db handle could be re-used across a few separate transactions. That would result in the database getting re-opened before every one of those operations. To prevent that from happening, check the cache before the transaction starts, and if there is a newer version of the db handle, use that.
Previously, `set_config/3` needed keys and values to be translated to binaries; now that is done inside the function. It's a bit more consistent, as binary config values and encodings are better encapsulated in the `fabric2_fdb` module. Since `set_config` does this, it made sense to update `get_config` as well. There, it turned out it was used only to load configuration settings after a db open, so the function was renamed to `load_config` and made private.
Also guard against too many conflicts during overload.
Check metadata versions to ensure newer handles are not clobbered. The same thing is done for removal: `maybe_remove/1` removes the handle only if there isn't already a newer handle there.
Update db handles right away, as soon as the db version is checked. This ensures concurrent requests get access to the current handle as soon as possible, and may avoid extra version checks and re-opens.
Under load, the accept loop can blow up with a timeout error from `erlfdb:wait(...)` (https://github.com/apache/couchdb-erlfdb/blob/master/src/erlfdb.erl#L255), so guard against it just like we do for fdb transaction timeout (1031) errors.
Use the `no_schedule` option to speed up job dequeuing. This optimization allows dequeuing jobs more efficiently if these conditions are met: 1) Job IDs start with a random prefix 2) No time-based scheduling is used. Both of those can be true for views: job IDs can be generated such that the signature comes before the db name part, which is what this commit does. The way the optimization works is that a random ID is generated in the pending jobs range, then a key selector is used to pick a job either before or after it. That reduces each dequeue attempt to just 1 read, instead of reading up to 1000 jobs.
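The selector trick can be sketched in Python over a sorted in-memory list standing in for the pending jobs subspace (an illustrative model, not the couch_jobs code; `bisect_left` plays the role of an FDB key selector):

```python
import bisect
import random

def accept_one(pending_ids, rng=random):
    # Probe a random point in the (randomly prefixed) pending job id
    # space, then act like a key selector: take the first job at or
    # after the probe, wrapping to the last one if the probe lands past
    # the end. One "read" instead of scanning up to 1000 pending jobs.
    if not pending_ids:
        return None
    probe = "%032x" % rng.getrandbits(128)
    i = bisect.bisect_left(pending_ids, probe)
    return pending_ids[i] if i < len(pending_ids) else pending_ids[-1]
```

Because job IDs start with a random prefix, the probe is roughly uniform over the pending range, so contention between concurrent acceptors is spread out.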
When waiting to accept jobs with scheduling in use, the timeout is limited based on the scheduled-time parameter. When the `no_schedule` option is used, the scheduled-time parameter is always set to 0, so in that case we have to special-case the limit to return `infinity`. Later on, when we wait for the watch to fire, the actual timeout can still be limited by a separate user-specified timeout option, but if the user specifies `infinity` there and sets `#{no_schedule => true}`, then we should respect that and never return `{error, not_found}` in response.
As per ML [discussion](https://lists.apache.org/thread.html/rb328513fb932e231cf8793f92dd1cc2269044cb73cb43a6662c464a1%40%3Cdev.couchdb.apache.org%3E) add a `uuid` field to db info results in order to be able to uniquely identify a particular instance of a database. When a database is deleted and re-created with the same name, it will return a new `uuid` value.
Optimize couch_views by using a separate set of acceptors and workers. Previously, all `max_workers` were spawned on startup and waited to accept jobs in parallel. In a setup with a large number of pods, and 100 workers per pod, that could generate a lot of conflicts when all those workers race to accept the same job at the same time. The improvement is to spawn only a limited number of acceptors (5, by default), then spawn more after some of them become workers. Also, when some workers finish or die with an error, check whether more acceptors could be spawned. As an example, here is what might happen with `max_acceptors = 5` and `max_workers = 100` (`A` and `W` are the current counts of acceptors and workers, respectively): 1. Starting out: `A = 5, W = 0` 2. After 2 acceptors start running: `A = 3, W = 2`. Immediately, 2 more acceptors are spawned: `A = 5, W = 2` 3. After 95 workers are started: `A = 5, W = 95` 4. Now if 3 acceptors accept, it would look like: `A = 2, W = 98`, but no more acceptors would be started. 5. If the last 2 acceptors also accept jobs: `A = 0, W = 100`. At this point no more indexing jobs can be accepted and started until at least one of the workers finishes and exits. 6. If 1 worker exits: `A = 0, W = 99`. An acceptor is immediately spawned: `A = 1, W = 99` 7. If all 99 workers exit, it goes back to: `A = 5, W = 0`
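The spawning rule behind the walkthrough above can be captured in one small function (a Python sketch of the invariant, not the actual couch_views supervisor code): keep up to `max_acceptors` acceptors alive, but never let acceptors plus workers exceed `max_workers`.

```python
def acceptors_to_spawn(acceptors, workers, max_acceptors=5, max_workers=100):
    # Spawn enough acceptors to reach max_acceptors, but never let the
    # total of acceptors + workers exceed max_workers.
    return max(0, min(max_acceptors - acceptors,
                      max_workers - workers - acceptors))
```

Plugging in the states from the example: at `A = 3, W = 2` it spawns 2 more acceptors; at `A = 2, W = 98` and `A = 0, W = 100` it spawns none; at `A = 0, W = 99` it spawns exactly 1.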
In an overload scenario do not let notifiers crash and lose their subscribers, instead make them more robust and let them retry on future or transaction timeouts.
Add a `max_bulk_get_count` configuration option.
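For example, the option might be set in the ini config like this (the section name and value here are assumptions for illustration, not taken from this PR):

```ini
[chttpd]
; maximum number of documents a single _bulk_get request may fetch
max_bulk_get_count = 10000
```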
I refactored the PR to parse `Version` earlier and added tests.
Do we want this in
There was no consensus: https://lists.apache.org/thread.html/rcc742c0fdca0363bb338b54526045720868597ea35ee6842aef174e0%40%3Cdev.couchdb.apache.org%3E. However, Adam recently started a new discussion about it: https://github.com/apache/couchdb/blob/main/src/chttpd/src/chttpd_httpd_handlers.erl#L62:L495
Overview
This PR implements a draft of the proposal
Testing recommendations
Running `make check` should be sufficient. You might need to comment out the following line, because elixir tests are failing for `prototype/fdb-layer`.
You can also run the API version related test suite alone, as follows: `exunit tests=src/chttpd/test/exunit/api_version_test.exs`
The manual testing can be done using the following:
Related Issues or Pull Requests
Checklist
`rel/overlay/etc/default.ini`