
New Feature: Database Partitions #1789

Merged
merged 14 commits into from
Jan 18, 2019

Conversation

davisp
Member

@davisp davisp commented Dec 3, 2018

Overview

This PR introduces a new feature, user-defined partitioned databases.

A new kind of database can be created with the ?partitioned=true option. All documents within the database must have document ids of the following format:

partition_name:doc_id

Both partition_name and doc_id must follow the CouchDB document id format (neither may begin with _, etc.).
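The id rule above can be sketched in a few lines. This is an illustration of the format described in this PR, not CouchDB source; the function name and error messages are invented for the example.

```python
# Illustrative sketch: split a partitioned document id of the form
# "partition_name:doc_id" and apply the basic rules this PR describes
# (a partition key must be present, and neither part may begin with "_").

def split_partitioned_id(docid: str) -> tuple[str, str]:
    """Return (partition, doc_id), or raise ValueError if the id is invalid."""
    partition, sep, doc_id = docid.partition(":")
    if not sep or not partition or not doc_id:
        raise ValueError(f"missing partition key in id: {docid!r}")
    for part in (partition, doc_id):
        if part.startswith("_"):
            raise ValueError(f"id component may not begin with '_': {part!r}")
    return partition, doc_id

print(split_partitioned_id("sensor-260:reading-0001"))
```

Note that only the first colon separates the partition, so doc_id itself may contain further colons.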

All documents with the same partition_name are guaranteed to be mapped to the same shard range. When querying an index, the new /db/_partition/$partition/_view endpoint can answer the query by consulting only the single shard range that holds the partition. This is much more efficient and scales approximately linearly, the same way that primary key lookup (GET /dbname/docid) does.
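The guarantee above follows from hashing only the partition portion of the id when placing a document. The following is a simplified Python illustration of that idea — the ring size, range math, and use of CRC32 here are assumptions for the sketch, not the exact mem3 implementation:

```python
import zlib

Q = 8            # number of shard ranges (the database's q value); illustrative
RING = 2 ** 31   # simplified key space for the illustration

def shard_range(docid: str, partitioned: bool) -> int:
    """Return the index of the shard range that would hold this document."""
    # For a partitioned database, hash only the partition key, so every
    # document with id "part:..." maps to the same shard range.
    key = docid.split(":", 1)[0] if partitioned else docid
    h = zlib.crc32(key.encode("utf-8")) % RING
    return h * Q // RING

# All documents in partition "sensor-260" land in a single range:
ranges = {shard_range(f"sensor-260:reading-{i}", True) for i in range(100)}
assert len(ranges) == 1
```

A partition-scoped view query then only needs to contact the shards covering that one range, which is why it scales like a primary-key lookup.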

Testing recommendations

The PR contains multiple tests for basic functionality and all existing tests still pass. When testing the PR, it is important to try the feature yourself, interactively, with docs and views of your choosing, to give us confidence in this new feature.

Related Issues or Pull Requests

This supersedes PR #1605.

Checklist

  • Code is written and works correctly;
  • Changes are covered by tests;
  • Documentation reflects the changes;

@davisp davisp changed the title Feature/database partitions New Feature: Database Partitions Dec 3, 2018
@davisp davisp force-pushed the feature/database-partitions branch from 78886ef to e0c98c6 Compare December 3, 2018 18:40
@davisp davisp mentioned this pull request Dec 3, 2018
3 tasks
@davisp davisp force-pushed the feature/database-partitions branch 2 times, most recently from 88f498f to 8ba77c3 Compare December 4, 2018 20:42
@davisp davisp force-pushed the feature/database-partitions branch 3 times, most recently from 16249e4 to fcd62ed Compare December 5, 2018 22:16
@iilyak
Contributor

iilyak commented Dec 6, 2018

I haven't finished my review yet, but this is looking great so far.
There is one aspect of the codebase I don't like; however, it is very easy to fix.
There are three places where we leak the implementation detail that the partition separator is ":".
Would you mind adding the following two functions to couch_partition.erl:

-compile({inline, [
    start_key/1,
    end_key/1
]}).
start_key(Partition) -> 
    <<Partition/binary, ":">>.

end_key(Partition) -> 
    <<Partition/binary, ";">>.

Then we can use these functions:

The third place I mentioned is in couch_mrview_updater:partition/1. I already have a comment about it.
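The helpers suggested above work because ";" (0x3B) is the byte immediately after ":" (0x3A), so the half-open range [Partition ++ ":", Partition ++ ";") contains exactly the keys with the "partition:" prefix under bytewise ordering. A small Python check of that property (illustrative only, mirroring the Erlang helpers):

```python
# Why end_key uses ";": it is the byte right after ":" in ASCII, so the
# half-open range [P + b":", P + b";") covers exactly the keys that begin
# with the prefix P + b":" under bytewise (collation-free) ordering.

def start_key(partition: bytes) -> bytes:
    return partition + b":"

def end_key(partition: bytes) -> bytes:
    return partition + b";"

keys = sorted([b"ab:doc9", b"abc:doc1", b"abc:zzz", b"abd:doc1"])
in_range = [k for k in keys if start_key(b"abc") <= k < end_key(b"abc")]
assert in_range == [b"abc:doc1", b"abc:zzz"]
```

Centralizing this in couch_partition.erl keeps the separator choice in one place, so changing it later touches a single module.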

@iilyak
Contributor

iilyak commented Dec 6, 2018

Some tests need styling updates:

** (Mix) mix format failed due to --check-formatted.
The following files were not formatted:

  * test/partition_view_test.exs
  * test/partition_mango_test.exs
  * test/partition_crud_test.exs
  * test/all_docs_test.exs

@iilyak
Contributor

iilyak commented Dec 7, 2018

After updating Makefile to skip elixir-check-formatted I am getting multiple test failures in Elixir test suite. Some of them might not be related:

ReplicationTest
  * test unauthorized replication cancellation (1038.8ms)

  1) test unauthorized replication cancellation (ReplicationTest)
     test/replication_test.exs:156
     Expected truthy, got false
     code: assert HTTPotion.Response.success?(resp)
     stacktrace:
       test/replication_test.exs:2: anonymous fn/1 in ReplicationTest.set_user_context/1
       (ex_unit) lib/ex_unit/on_exit_handler.ex:140: ExUnit.OnExitHandler.exec_callback/1
       (ex_unit) lib/ex_unit/on_exit_handler.ex:126: ExUnit.OnExitHandler.on_exit_runner_loop/0
  * test non-admin or reader user on source - remote-to-remote (90.7ms)

  2) test non-admin or reader user on source - remote-to-remote (ReplicationTest)
     test/replication_test.exs:248
     Expected truthy, got false
     code: assert HTTPotion.Response.success?(resp)
     stacktrace:
       test/replication_test.exs:2: ReplicationTest.create_user/1
       test/replication_test.exs:2: ReplicationTest.set_user_context/1
       (elixir) lib/enum.ex:1899: Enum."-reduce/3-lists^foldl/2-0-"/3
       test/replication_test.exs:2: ReplicationTest.__ex_unit_setup_0/1
       test/replication_test.exs:1: ReplicationTest.__ex_unit__/2
  * test non-admin user on target - remote-to-local (51.4ms)

  3) test non-admin user on target - remote-to-local (ReplicationTest)
     test/replication_test.exs:243
     Expected truthy, got false
     code: assert HTTPotion.Response.success?(resp)
     stacktrace:
       test/replication_test.exs:2: ReplicationTest.create_user/1
       test/replication_test.exs:2: ReplicationTest.set_user_context/1
       (elixir) lib/enum.ex:1899: Enum."-reduce/3-lists^foldl/2-0-"/3
       test/replication_test.exs:2: ReplicationTest.__ex_unit_setup_0/1
       test/replication_test.exs:1: ReplicationTest.__ex_unit__/2
  * test non-admin user on target - local-to-local (64.0ms)

  4) test non-admin user on target - local-to-local (ReplicationTest)
     test/replication_test.exs:243
     Expected truthy, got false
     code: assert HTTPotion.Response.success?(resp)
     stacktrace:
       test/replication_test.exs:2: ReplicationTest.create_user/1
       test/replication_test.exs:2: ReplicationTest.set_user_context/1
       (elixir) lib/enum.ex:1899: Enum."-reduce/3-lists^foldl/2-0-"/3
       test/replication_test.exs:2: ReplicationTest.__ex_unit_setup_0/1
       test/replication_test.exs:1: ReplicationTest.__ex_unit__/2

  * test non-admin user on target - local-to-remote (36.3ms)

  5) test non-admin user on target - local-to-remote (ReplicationTest)
     test/replication_test.exs:243
     Expected truthy, got false
     code: assert HTTPotion.Response.success?(resp)
     stacktrace:
       test/replication_test.exs:2: ReplicationTest.create_user/1
       test/replication_test.exs:2: ReplicationTest.set_user_context/1
       (elixir) lib/enum.ex:1899: Enum."-reduce/3-lists^foldl/2-0-"/3
       test/replication_test.exs:2: ReplicationTest.__ex_unit_setup_0/1
       test/replication_test.exs:1: ReplicationTest.__ex_unit__/2
  * test non-admin or reader user on source - local-to-remote (71.0ms)

  6) test non-admin or reader user on source - local-to-remote (ReplicationTest)
     test/replication_test.exs:248
     Expected truthy, got false
     code: assert HTTPotion.Response.success?(resp)
     stacktrace:
       test/replication_test.exs:2: ReplicationTest.create_user/1
       test/replication_test.exs:2: ReplicationTest.set_user_context/1
       (elixir) lib/enum.ex:1899: Enum."-reduce/3-lists^foldl/2-0-"/3
       test/replication_test.exs:2: ReplicationTest.__ex_unit_setup_0/1
       test/replication_test.exs:1: ReplicationTest.__ex_unit__/2

  * test non-admin or reader user on source - local-to-local (49.4ms)

  7) test non-admin or reader user on source - local-to-local (ReplicationTest)
     test/replication_test.exs:248
     Expected truthy, got false
     code: assert HTTPotion.Response.success?(resp)
     stacktrace:
       test/replication_test.exs:2: ReplicationTest.create_user/1
       test/replication_test.exs:2: ReplicationTest.set_user_context/1
       (elixir) lib/enum.ex:1899: Enum."-reduce/3-lists^foldl/2-0-"/3
       test/replication_test.exs:2: ReplicationTest.__ex_unit_setup_0/1
       test/replication_test.exs:1: ReplicationTest.__ex_unit__/2
  * test non-admin or reader user on source - remote-to-local (192.6ms)

  8) test non-admin or reader user on source - remote-to-local (ReplicationTest)
     test/replication_test.exs:248
     Expected truthy, got false
     code: assert HTTPotion.Response.success?(resp)
     stacktrace:
       test/replication_test.exs:2: ReplicationTest.create_user/1
       test/replication_test.exs:2: ReplicationTest.set_user_context/1
       (elixir) lib/enum.ex:1899: Enum."-reduce/3-lists^foldl/2-0-"/3
       test/replication_test.exs:2: ReplicationTest.__ex_unit_setup_0/1
       test/replication_test.exs:1: ReplicationTest.__ex_unit__/2
  * test non-admin user on target - remote-to-remote (54.9ms)

  9) test non-admin user on target - remote-to-remote (ReplicationTest)
     test/replication_test.exs:243
     Expected truthy, got false
     code: assert HTTPotion.Response.success?(resp)
     stacktrace:
       test/replication_test.exs:2: ReplicationTest.create_user/1
       test/replication_test.exs:2: ReplicationTest.set_user_context/1
       (elixir) lib/enum.ex:1899: Enum."-reduce/3-lists^foldl/2-0-"/3
       test/replication_test.exs:2: ReplicationTest.__ex_unit_setup_0/1
       test/replication_test.exs:1: ReplicationTest.__ex_unit__/2
PartitionCrudTest
  ...
  * test saves attachment with partitioned doc (171.8ms)

 10) test saves attachment with partitioned doc (PartitionCrudTest)
     test/partition_crud_test.exs:246
     Assertion with == failed
     code:  assert body["_attachments"] == %{"foo.txt" => %{"content_type" => "text/plain", "digest" => "md5-OW2BoZAtMqs1E+fAnLpNBw==", "length" => 31, "revpos" => 1, "stub" => true}}
     left:  %{"foo.txt" => %{"content_type" => "text/plain", "length" => 31, "revpos" => 1, "stub" => true, "digest" => "md5-wJORG+sY7HTKqCWOnk5tIQ=="}}
     right: %{"foo.txt" => %{"content_type" => "text/plain", "length" => 31, "revpos" => 1, "stub" => true, "digest" => "md5-OW2BoZAtMqs1E+fAnLpNBw=="}}
     stacktrace:
       test/partition_crud_test.exs:269: (test)
BasicsTest
  ...
  * test PUT error when body not an object (147.4ms)

 11) test PUT error when body not an object (BasicsTest)
     test/basics_test.exs:243
     Assertion with == failed
     code:  assert resp.status_code() == 400
     left:  500
     right: 400
     stacktrace:
       test/basics_test.exs:246: (test)

@davisp davisp force-pushed the feature/database-partitions branch from fcd62ed to 4da3bab Compare December 7, 2018 18:02
@davisp
Member Author

davisp commented Dec 7, 2018

@iilyak I believe those inline directives only apply within the module; I don't think they'll affect call sites elsewhere, where the functions are invoked as external calls. However, we could at least hoist the macros into a couch_partitions.hrl for re-use.

For the mix format, those should be fixed after one of my recent-ish force pushes. I rebased for something else and forgot to check that after we added Credo.

I'm looking into the other test failures to see what's going on with those.

@davisp davisp force-pushed the feature/database-partitions branch from 4da3bab to fc696e8 Compare December 7, 2018 21:26
@jaydoane
Contributor

jaydoane commented Dec 11, 2018

After make clean && git clean -xffd && ./configure --dev && make, I'm seeing numerous setup failures and a couple test failures running make eunit apps=chttpd:

=======================================================
  Failed: 2.  Skipped: 0.  Passed: 184.

compared to a successful result when run against the master commit this is based against:

=======================================================
  All 233 tests passed.

The two failures are

chttpd_view:123: t_check_include_docs_throw_validation_error...*failed*
in function chttpd_view:'-t_check_include_docs_throw_validation_error/0-fun-0-'/0 (src/chttpd_view.erl, line 127)
**error:{assertException,
    [{module,chttpd_view},
     {line,127},
     {expression,
         "multi_query_view ( Req , db , ddoc , << \"v\" >> , [ Query ] )"},
     {pattern,"{ throw , Throw , [...] }"},
     {unexpected_exception,
         {error,function_clause,
             [{couch_db,is_clustered,[db],[{file,...},{...}]},
              {fabric_util,validate_args,3,[{...}|...]},
              {lists,map,2,[...]},
              {chttpd_view,multi_query_view,5,...},
              {chttpd_view,
                  '-t_check_include_docs_throw_validation_error/0-fun-0-',...},
              {eunit_test,...},
              {...}|...]}}]}
  output:<<"">>

chttpd_view:132: t_check_user_can_override_individual_query_type...*failed*
in function couch_db:is_clustered/1 (src/couch_db.erl, line 217)
  called as is_clustered(db)

There's also a failure in mem3:

module 'mem3_util_test'
  mem3_util_test: hash_test...*failed*
in function mem3_util:hash/1
  called as hash(0)
in call from mem3_util_test:'-hash_test/0-fun-0-'/0 (test/mem3_util_test.erl, line 19)
in call from mem3_util_test:hash_test/0 (test/mem3_util_test.erl, line 19)
in call from mem3_util_test:hash_test/0
**error:undef
  output:<<"">>

since mem3_util:hash has been removed.

@davisp
Member Author

davisp commented Dec 12, 2018

@jaydoane chttpd tests should be fixed now. Am running through checking that everything else works as well.

Fix for the ones you noted was to use test_util:fake_db/1 instead of the atom db.

@davisp davisp force-pushed the feature/database-partitions branch 3 times, most recently from 52b0b02 to 5b295dc Compare December 12, 2018 23:04
@davisp
Member Author

davisp commented Dec 12, 2018

@jaydoane @iilyak All tests both eunit and elixir should be passing now. Apologies for the delay!

@davisp davisp force-pushed the feature/database-partitions branch 7 times, most recently from 51a482e to 8cd68be Compare January 18, 2019 17:56
garrensmith and others added 14 commits January 18, 2019 12:35
This allows us to implement features outside of the PSE API without
requiring changes to the API for each bit of data we may want to end up
storing. This opaque object should only be used for features that
don't require a behavior change from the storage engine API.

Co-authored-by: Garren Smith <[email protected]>
Co-authored-by: Robert Newson <[email protected]>
This allows for setting any combination of supported settings using a
proplist approach.
This allows for more fine-grained use of couch_db:clustered_db and
changes the name to something more appropriate than `fake_db`.
Allow index validation to be parameterized by the database without
having to reopen its own copy.
This adds specific datatype requirements to the list of allowable design
document options.

Co-authored-by: Garren Smith <[email protected]>
Co-authored-by: Robert Newson <[email protected]>
This provides the capability for features to specify alternative hash
functions for placing documents in a given shard range. While the
functionality exists with this implementation, it is not yet actually
used.
This change introduces the ability for users to place a group of
documents in a single shard range by specifying a "partition key" in the
document id. A partition key is denoted by everything preceding a colon
':' in the document id.

Every document id (except for design documents) in a partitioned
database is required to have a partition key.

Co-authored-by: Garren Smith <[email protected]>
Co-authored-by: Robert Newson <[email protected]>
This feature allows us to fetch statistics for a given partition key,
which will allow users to find bloated partitions and the like.

Co-authored-by: Garren Smith <[email protected]>
Co-authored-by: Robert Newson <[email protected]>
The benefit of using partitioned databases is that views can then be
scoped to a single shard range. This allows for views to scale nearly as
linearly as document lookups.

Co-authored-by: Garren Smith <[email protected]>
Co-authored-by: Robert Newson <[email protected]>
If a user specifies document ids that scope the query to a single
partition key, we can automatically determine that we only need to
consult a single shard range.

Co-authored-by: Robert Newson <[email protected]>
Now that a single shard handles the entire response we can optimize work
normally done in the coordinator by moving it to the RPC worker which
then removes the need to send an extra `skip` number of rows to the
coordinator.

Co-authored-by: Robert Newson <[email protected]>
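The optimization described in the commit above can be sketched as follows. This is an illustration of the idea, not CouchDB's fabric/RPC internals; the function names are invented for the example:

```python
# Illustrative sketch: when a single shard range serves the whole query,
# `skip` can be applied at the RPC worker, so the skipped rows are never
# sent over the network to the coordinator at all.

def worker_rows(rows, skip, limit):
    """Worker-side: drop `skip` rows and cap at `limit` before sending."""
    return rows[skip:skip + limit]

def coordinator_single_shard(shard_rows):
    """Coordinator-side: with one shard there is nothing to merge or re-skip."""
    return shard_rows

rows = [f"row-{i}" for i in range(10)]
result = coordinator_single_shard(worker_rows(rows, skip=3, limit=2))
assert result == ["row-3", "row-4"]
```

In the multi-shard case the coordinator must still merge sorted streams and apply skip/limit itself, which is why this shortcut is specific to partition-scoped queries.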
Using the internal hash values for indexes was a brittle approach to
ensuring that a specific index was or was not picked. By naming the
index and design docs we can more concretely ensure that the chosen
indexes match the intent of the test while also not breaking each time
mango internals change.
Co-authored-by: Garren Smith <[email protected]>
Co-authored-by: Robert Newson <[email protected]>
Co-authored-by: Garren Smith <[email protected]>
Co-authored-by: Robert Newson <[email protected]>
@davisp davisp force-pushed the feature/database-partitions branch from 8cd68be to 91af772 Compare January 18, 2019 18:35
@davisp davisp merged commit 16e6af4 into master Jan 18, 2019
@davisp davisp deleted the feature/database-partitions branch January 18, 2019 19:04
@AlexanderKaraberov
Contributor

AlexanderKaraberov commented Feb 19, 2019

Hello!
I'm trying to catch up with this cool new feature, as it completely passed me by. Please correct me if I'm wrong: as I understand it, this adds something akin to list partitioning in relational databases to CouchDB (on top of the current composite hash-range partitioning), where each partition is defined and selected based on a component of the document id?
Could you please point me to other discussions or sources (the mailing list, perhaps?) where I can read more about this new feature and think ahead about the changes our backend would need in order to leverage it. Thanks!

@janl
Member

janl commented Feb 19, 2019

@AlexanderKaraberov https://lists.apache.org/thread.html/4d09ee0fcc4afec8f25dbd540f9c4f0af67d3fdcc41f26f36ff7a32d@%3Cdev.couchdb.apache.org%3E

@AlexanderKaraberov
Contributor

@janl Thank you! Looks exactly like what I was looking for.
