New Feature: Database Partitions #1789
I didn't finish my review yet. But this is looking great so far.
Then we can use these functions:
The third place I mentioned is in couch_mrview_updater:partition/1. I already have a comment about it.
Some tests need styling updates:
After updating Makefile to skip elixir-check-formatted I am getting multiple test failures in Elixir test suite. Some of them might not be related:
@iilyak I believe those inline calls only apply within the module, so I don't think they'll affect anywhere else the functions are used as external functions. However, we could at least hoist the macros into a couch_partitions.hrl for reuse. The mix format issues should be fixed after one of my recent-ish force pushes. I rebased for something else and forgot to check that after we added Credo. I'm looking into the other test failures to see what's going on with those.
After
compared to a successful result when run against the master commit this is based against:
The two failures are
There's also a failure in mem3:
since
@jaydoane chttpd tests should be fixed now. I am running through checking that everything else works as well. Fix for the ones you noted was to use
This allows us to implement features outside of the PSE API without requiring changes to the API for each bit of data we may want to end up storing. This opaque object should only be used for features that don't require a behavior change from the storage engine API. Co-authored-by: Garren Smith <[email protected]> Co-authored-by: Robert Newson <[email protected]>
This allows for setting any combination of supported settings using a proplist approach.
This allows for more fine-grained use of couch_db:clustered_db and changes the name to something more appropriate than `fake_db`.
Allow index validation to be parameterized by the database without having to reopen its own copy.
This adds specific datatype requirements to the list of allowable design document options. Co-authored-by: Garren Smith <[email protected]> Co-authored-by: Robert Newson <[email protected]>
This provides the capability for features to specify alternative hash functions for placing documents in a given shard range. While the functionality exists with this implementation it is not yet actually used.
This change introduces the ability for users to place a group of documents in a single shard range by specifying a "partition key" in the document id. A partition key is denoted by everything preceding a colon ':' in the document id. Every document id (except for design documents) in a partitioned database is required to have a partition key. Co-authored-by: Garren Smith <[email protected]> Co-authored-by: Robert Newson <[email protected]>
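The placement rule described above can be illustrated with a small Python sketch. It is not the CouchDB implementation: the modulo scheme, `NUM_SHARDS`, the choice of CRC32, splitting on the first colon, and the helper names `partition_key` and `shard_for` are all assumptions made for illustration. The point it shows is that hashing only the partition key sends every document in a partition to the same place.

```python
import zlib

NUM_SHARDS = 8  # hypothetical shard count, for illustration only


def partition_key(doc_id: str) -> str:
    """Return everything preceding the first ':' in doc_id (assumed rule)."""
    key, sep, _rest = doc_id.partition(":")
    if not sep or not key:
        raise ValueError(f"document id has no partition key: {doc_id!r}")
    return key


def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Toy placement: hash only the partition key, not the whole id."""
    if doc_id.startswith("_design/"):
        # Design documents are exempt from the partition key requirement;
        # hashing their full id here is an assumption of this sketch.
        hashed = doc_id
    else:
        hashed = partition_key(doc_id)
    return zlib.crc32(hashed.encode("utf-8")) % num_shards
```

With this scheme, `shard_for("sensor-a:reading-1")` and `shard_for("sensor-a:reading-2")` always agree, because both ids share the partition key `sensor-a`.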
This feature allows us to fetch statistics for a given partition key, which lets users find bloated partitions and similar problems. Co-authored-by: Garren Smith <[email protected]> Co-authored-by: Robert Newson <[email protected]>
The benefit of using partitioned databases is that views can then be scoped to a single shard range. This allows for views to scale nearly as linearly as document lookups. Co-authored-by: Garren Smith <[email protected]> Co-authored-by: Robert Newson <[email protected]>
If a user specifies document ids that scope the query to a single partition key, we can automatically determine that we only need to consult a single shard range. Co-authored-by: Robert Newson <[email protected]>
Now that a single shard handles the entire response we can optimize work normally done in the coordinator by moving it to the RPC worker which then removes the need to send an extra `skip` number of rows to the coordinator. Co-authored-by: Robert Newson <[email protected]>
Using the internal hash values for indexes was a brittle approach to ensuring that a specific index was or was not picked. By naming the index and design docs we can more concretely ensure that the chosen indexes match the intent of the test while also not breaking each time mango internals change.
Co-authored-by: Garren Smith <[email protected]> Co-authored-by: Robert Newson <[email protected]>
Co-authored-by: Garren Smith <[email protected]> Co-authored-by: Robert Newson <[email protected]>
Hello guys!
@janl Thank you! Looks exactly like what I was looking for.
Overview
This PR introduces a new feature, user-defined partitioned databases.
A new kind of database can be created with the ?partitioned=true option. All documents within the database must have document ids of the following format:
partition_name:doc_id
Both partition_name and doc_id must follow the CouchDB id format (can't begin with _, etc.).
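As a rough client-side illustration of the id format, the sketch below splits an id into its two parts and rejects the cases named above. The function name `split_partitioned_id` is hypothetical, splitting on the first colon is an assumption, and only the non-empty and leading-underscore rules are checked; the remaining CouchDB id rules covered by "etc." are not modeled here.

```python
from typing import Tuple


def split_partitioned_id(doc_id: str) -> Tuple[str, str]:
    """Split 'partition_name:doc_id' and apply the checks named in the text."""
    partition_name, sep, rest = doc_id.partition(":")
    if not sep:
        raise ValueError("document id must contain a ':' separator")
    for label, part in (("partition_name", partition_name), ("doc_id", rest)):
        if not part:
            raise ValueError(f"{label} must not be empty")
        if part.startswith("_"):
            raise ValueError(f"{label} must not begin with '_'")
    return partition_name, rest
```

For example, `split_partitioned_id("sensor-a:reading-1")` yields the pair `("sensor-a", "reading-1")`, while an id such as `"_hidden:reading-1"` is rejected.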
All documents with the same partition_name are guaranteed to be mapped to the same shard range. When querying an index, the new /db/_partition/$partition/_view endpoint consults only the single shard range holding the partition. This is much more efficient than a database-wide view query and scales approximately linearly, the same way that primary key lookup (GET /dbname/docid) does.
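For reviewers trying the feature interactively, the request paths can be built along these lines. This is a sketch only: the helper names are hypothetical, and the `_design/<ddoc>/_view/<view>` tail is an assumption about how the abbreviated `/db/_partition/$partition/_view` path above expands, so check it against the PR before relying on it.

```python
from urllib.parse import quote


def create_partitioned_db_path(db: str) -> str:
    """Path for a PUT request that creates a partitioned database."""
    return "/{}?partitioned=true".format(quote(db, safe=""))


def partition_view_path(db: str, partition: str, ddoc: str, view: str) -> str:
    """Path for querying a view scoped to a single partition (assumed layout)."""
    return "/{}/_partition/{}/_design/{}/_view/{}".format(
        quote(db, safe=""),
        quote(partition, safe=""),
        quote(ddoc, safe=""),
        quote(view, safe=""),
    )
```

A request to `partition_view_path("mydb", "sensor-a", "stats", "by_time")` would then only need to consult the shard range that holds the `sensor-a` partition.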
Testing recommendations
The PR contains multiple tests for basic functionality and all existing tests still pass. When testing the PR, it is important to try the feature yourself, interactively, with docs and views of your choosing, to give us confidence in this new feature.
Related Issues or Pull Requests
This supersedes PR #1605.
Checklist