
New Feature: Database Partitions #1789

Merged
merged 14 commits into from
Jan 18, 2019

Conversation

davisp
Member

@davisp davisp commented Dec 3, 2018

Overview

This PR introduces a new feature, user-defined partitioned databases.

A new kind of database can be created with the ?partitioned=true option. All documents within the database must have document ids of the following format:

partition_name:doc_id

Both partition_name and doc_id must follow the CouchDB document id format (neither may begin with _, etc.).
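The id rule above can be sketched in a few lines. This is an illustration of the format described in this PR, not CouchDB source; the function name and error messages are invented for the example.

```python
# Illustrative sketch: split a partitioned document id of the form
# "partition_name:doc_id" and apply the basic rules this PR describes
# (a partition key must be present, and neither part may begin with "_").

def split_partitioned_id(docid: str) -> tuple[str, str]:
    """Return (partition, doc_id), or raise ValueError if the id is invalid."""
    partition, sep, doc_id = docid.partition(":")
    if not sep or not partition or not doc_id:
        raise ValueError(f"missing partition key in id: {docid!r}")
    for part in (partition, doc_id):
        if part.startswith("_"):
            raise ValueError(f"id component may not begin with '_': {part!r}")
    return partition, doc_id

print(split_partitioned_id("sensor-260:reading-0001"))
```

Note that only the first colon separates the partition, so doc_id itself may contain further colons.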

All documents with the same partition_name are guaranteed to be mapped to the same shard range. When querying an index, the new /db/_partition/$partition/_view endpoint can answer the query by consulting only the single shard range that holds the partition. This is much more efficient and scales approximately linearly, the same way that primary key lookup (GET /dbname/docid) does.
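The guarantee above follows from hashing only the partition portion of the id when placing a document. The following is a simplified Python illustration of that idea — the ring size, range math, and use of CRC32 here are assumptions for the sketch, not the exact mem3 implementation:

```python
import zlib

Q = 8            # number of shard ranges (the database's q value); illustrative
RING = 2 ** 31   # simplified key space for the illustration

def shard_range(docid: str, partitioned: bool) -> int:
    """Return the index of the shard range that would hold this document."""
    # For a partitioned database, hash only the partition key, so every
    # document with id "part:..." maps to the same shard range.
    key = docid.split(":", 1)[0] if partitioned else docid
    h = zlib.crc32(key.encode("utf-8")) % RING
    return h * Q // RING

# All documents in partition "sensor-260" land in a single range:
ranges = {shard_range(f"sensor-260:reading-{i}", True) for i in range(100)}
assert len(ranges) == 1
```

A partition-scoped view query then only needs to contact the shards covering that one range, which is why it scales like a primary-key lookup.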

Testing recommendations

The PR contains multiple tests for basic functionality and all existing tests still pass. When testing the PR, it is important to try the feature yourself, interactively, with docs and views of your choosing, to give us confidence in this new feature.

Related Issues or Pull Requests

This supersedes PR #1605.

Checklist

  • Code is written and works correctly;
  • Changes are covered by tests;
  • Documentation reflects the changes;

@davisp davisp changed the title Feature/database partitions New Feature: Database Partitions Dec 3, 2018
@davisp davisp force-pushed the feature/database-partitions branch from 78886ef to e0c98c6 Compare December 3, 2018 18:40
@davisp davisp mentioned this pull request Dec 3, 2018
3 tasks
@davisp davisp force-pushed the feature/database-partitions branch 2 times, most recently from 88f498f to 8ba77c3 Compare December 4, 2018 20:42
@davisp davisp force-pushed the feature/database-partitions branch 3 times, most recently from 16249e4 to fcd62ed Compare December 5, 2018 22:16
@iilyak
Contributor

iilyak commented Dec 6, 2018

I haven't finished my review yet, but this is looking great so far.
There is one aspect of the codebase I don't like; however, it is very easy to fix.
There are three places where we leak the implementation detail that the partition separator is ":".
Would you mind adding the following two functions to couch_partition.erl:

-compile({inline, [
    start_key/1,
    end_key/1
]}).
start_key(Partition) -> 
    <<Partition/binary, ":">>.

end_key(Partition) -> 
    <<Partition/binary, ";">>.

Then we can use these functions:

The third place I mentioned is in couch_mrview_updater:partition/1. I already have a comment about it.
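The helpers suggested above work because ";" (0x3B) is the byte immediately after ":" (0x3A), so the half-open range [Partition ++ ":", Partition ++ ";") contains exactly the keys with the "partition:" prefix under bytewise ordering. A small Python check of that property (illustrative only, mirroring the Erlang helpers):

```python
# Why end_key uses ";": it is the byte right after ":" in ASCII, so the
# half-open range [P + b":", P + b";") covers exactly the keys that begin
# with the prefix P + b":" under bytewise (collation-free) ordering.

def start_key(partition: bytes) -> bytes:
    return partition + b":"

def end_key(partition: bytes) -> bytes:
    return partition + b";"

keys = sorted([b"ab:doc9", b"abc:doc1", b"abc:zzz", b"abd:doc1"])
in_range = [k for k in keys if start_key(b"abc") <= k < end_key(b"abc")]
assert in_range == [b"abc:doc1", b"abc:zzz"]
```

Centralizing this in couch_partition.erl keeps the separator choice in one place, so changing it later touches a single module.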

@iilyak
Contributor

iilyak commented Dec 6, 2018

Some tests need styling updates:

** (Mix) mix format failed due to --check-formatted.
The following files were not formatted:

  * test/partition_view_test.exs
  * test/partition_mango_test.exs
  * test/partition_crud_test.exs
  * test/all_docs_test.exs

@iilyak
Contributor

iilyak commented Dec 7, 2018

After updating Makefile to skip elixir-check-formatted I am getting multiple test failures in Elixir test suite. Some of them might not be related:

ReplicationTest
  * test unauthorized replication cancellation (1038.8ms)

  1) test unauthorized replication cancellation (ReplicationTest)
     test/replication_test.exs:156
     Expected truthy, got false
     code: assert HTTPotion.Response.success?(resp)
     stacktrace:
       test/replication_test.exs:2: anonymous fn/1 in ReplicationTest.set_user_context/1
       (ex_unit) lib/ex_unit/on_exit_handler.ex:140: ExUnit.OnExitHandler.exec_callback/1
       (ex_unit) lib/ex_unit/on_exit_handler.ex:126: ExUnit.OnExitHandler.on_exit_runner_loop/0
  * test non-admin or reader user on source - remote-to-remote (90.7ms)

  2) test non-admin or reader user on source - remote-to-remote (ReplicationTest)
     test/replication_test.exs:248
     Expected truthy, got false
     code: assert HTTPotion.Response.success?(resp)
     stacktrace:
       test/replication_test.exs:2: ReplicationTest.create_user/1
       test/replication_test.exs:2: ReplicationTest.set_user_context/1
       (elixir) lib/enum.ex:1899: Enum."-reduce/3-lists^foldl/2-0-"/3
       test/replication_test.exs:2: ReplicationTest.__ex_unit_setup_0/1
       test/replication_test.exs:1: ReplicationTest.__ex_unit__/2
  * test non-admin user on target - remote-to-local (51.4ms)

  3) test non-admin user on target - remote-to-local (ReplicationTest)
     test/replication_test.exs:243
     Expected truthy, got false
     code: assert HTTPotion.Response.success?(resp)
     stacktrace:
       test/replication_test.exs:2: ReplicationTest.create_user/1
       test/replication_test.exs:2: ReplicationTest.set_user_context/1
       (elixir) lib/enum.ex:1899: Enum."-reduce/3-lists^foldl/2-0-"/3
       test/replication_test.exs:2: ReplicationTest.__ex_unit_setup_0/1
       test/replication_test.exs:1: ReplicationTest.__ex_unit__/2
  * test non-admin user on target - local-to-local (64.0ms)

  4) test non-admin user on target - local-to-local (ReplicationTest)
     test/replication_test.exs:243
     Expected truthy, got false
     code: assert HTTPotion.Response.success?(resp)
     stacktrace:
       test/replication_test.exs:2: ReplicationTest.create_user/1
       test/replication_test.exs:2: ReplicationTest.set_user_context/1
       (elixir) lib/enum.ex:1899: Enum."-reduce/3-lists^foldl/2-0-"/3
       test/replication_test.exs:2: ReplicationTest.__ex_unit_setup_0/1
       test/replication_test.exs:1: ReplicationTest.__ex_unit__/2

  * test non-admin user on target - local-to-remote (36.3ms)

  5) test non-admin user on target - local-to-remote (ReplicationTest)
     test/replication_test.exs:243
     Expected truthy, got false
     code: assert HTTPotion.Response.success?(resp)
     stacktrace:
       test/replication_test.exs:2: ReplicationTest.create_user/1
       test/replication_test.exs:2: ReplicationTest.set_user_context/1
       (elixir) lib/enum.ex:1899: Enum."-reduce/3-lists^foldl/2-0-"/3
       test/replication_test.exs:2: ReplicationTest.__ex_unit_setup_0/1
       test/replication_test.exs:1: ReplicationTest.__ex_unit__/2
  * test non-admin or reader user on source - local-to-remote (71.0ms)

  6) test non-admin or reader user on source - local-to-remote (ReplicationTest)
     test/replication_test.exs:248
     Expected truthy, got false
     code: assert HTTPotion.Response.success?(resp)
     stacktrace:
       test/replication_test.exs:2: ReplicationTest.create_user/1
       test/replication_test.exs:2: ReplicationTest.set_user_context/1
       (elixir) lib/enum.ex:1899: Enum."-reduce/3-lists^foldl/2-0-"/3
       test/replication_test.exs:2: ReplicationTest.__ex_unit_setup_0/1
       test/replication_test.exs:1: ReplicationTest.__ex_unit__/2

  * test non-admin or reader user on source - local-to-local (49.4ms)

  7) test non-admin or reader user on source - local-to-local (ReplicationTest)
     test/replication_test.exs:248
     Expected truthy, got false
     code: assert HTTPotion.Response.success?(resp)
     stacktrace:
       test/replication_test.exs:2: ReplicationTest.create_user/1
       test/replication_test.exs:2: ReplicationTest.set_user_context/1
       (elixir) lib/enum.ex:1899: Enum."-reduce/3-lists^foldl/2-0-"/3
       test/replication_test.exs:2: ReplicationTest.__ex_unit_setup_0/1
       test/replication_test.exs:1: ReplicationTest.__ex_unit__/2
  * test non-admin or reader user on source - remote-to-local (192.6ms)

  8) test non-admin or reader user on source - remote-to-local (ReplicationTest)
     test/replication_test.exs:248
     Expected truthy, got false
     code: assert HTTPotion.Response.success?(resp)
     stacktrace:
       test/replication_test.exs:2: ReplicationTest.create_user/1
       test/replication_test.exs:2: ReplicationTest.set_user_context/1
       (elixir) lib/enum.ex:1899: Enum."-reduce/3-lists^foldl/2-0-"/3
       test/replication_test.exs:2: ReplicationTest.__ex_unit_setup_0/1
       test/replication_test.exs:1: ReplicationTest.__ex_unit__/2
  * test non-admin user on target - remote-to-remote (54.9ms)

  9) test non-admin user on target - remote-to-remote (ReplicationTest)
     test/replication_test.exs:243
     Expected truthy, got false
     code: assert HTTPotion.Response.success?(resp)
     stacktrace:
       test/replication_test.exs:2: ReplicationTest.create_user/1
       test/replication_test.exs:2: ReplicationTest.set_user_context/1
       (elixir) lib/enum.ex:1899: Enum."-reduce/3-lists^foldl/2-0-"/3
       test/replication_test.exs:2: ReplicationTest.__ex_unit_setup_0/1
       test/replication_test.exs:1: ReplicationTest.__ex_unit__/2
PartitionCrudTest
  ...
  * test saves attachment with partitioned doc (171.8ms)

 10) test saves attachment with partitioned doc (PartitionCrudTest)
     test/partition_crud_test.exs:246
     Assertion with == failed
     code:  assert body["_attachments"] == %{"foo.txt" => %{"content_type" => "text/plain", "digest" => "md5-OW2BoZAtMqs1E+fAnLpNBw==", "length" => 31, "revpos" => 1, "stub" => true}}
     left:  %{"foo.txt" => %{"content_type" => "text/plain", "length" => 31, "revpos" => 1, "stub" => true, "digest" => "md5-wJORG+sY7HTKqCWOnk5tIQ=="}}
     right: %{"foo.txt" => %{"content_type" => "text/plain", "length" => 31, "revpos" => 1, "stub" => true, "digest" => "md5-OW2BoZAtMqs1E+fAnLpNBw=="}}
     stacktrace:
       test/partition_crud_test.exs:269: (test)
BasicsTest
  ...
  * test PUT error when body not an object (147.4ms)

 11) test PUT error when body not an object (BasicsTest)
     test/basics_test.exs:243
     Assertion with == failed
     code:  assert resp.status_code() == 400
     left:  500
     right: 400
     stacktrace:
       test/basics_test.exs:246: (test)

@davisp davisp force-pushed the feature/database-partitions branch from fcd62ed to 4da3bab Compare December 7, 2018 18:02
@davisp
Member Author

davisp commented Dec 7, 2018

@iilyak I believe those inline directives only apply within the module; I don't think they'll affect call sites elsewhere, where the functions are invoked as external calls. However, we could at least hoist the macros into a couch_partitions.hrl for re-use.

For the mix format, those should be fixed after one of my recent-ish force pushes. I rebased for something else and forgot to check that after we added Credo.

I'm looking into the other test failures to see what's going on with those.

@davisp davisp force-pushed the feature/database-partitions branch from 4da3bab to fc696e8 Compare December 7, 2018 21:26
@jaydoane
Contributor

jaydoane commented Dec 11, 2018

After make clean && git clean -xffd && ./configure --dev && make, I'm seeing numerous setup failures and a couple test failures running make eunit apps=chttpd:

=======================================================
  Failed: 2.  Skipped: 0.  Passed: 184.

compared to a successful result when run against the master commit this is based against:

=======================================================
  All 233 tests passed.

The two failures are

chttpd_view:123: t_check_include_docs_throw_validation_error...*failed*
in function chttpd_view:'-t_check_include_docs_throw_validation_error/0-fun-0-'/0 (src/chttpd_view.erl, line 127)
**error:{assertException,
    [{module,chttpd_view},
     {line,127},
     {expression,
         "multi_query_view ( Req , db , ddoc , << \"v\" >> , [ Query ] )"},
     {pattern,"{ throw , Throw , [...] }"},
     {unexpected_exception,
         {error,function_clause,
             [{couch_db,is_clustered,[db],[{file,...},{...}]},
              {fabric_util,validate_args,3,[{...}|...]},
              {lists,map,2,[...]},
              {chttpd_view,multi_query_view,5,...},
              {chttpd_view,
                  '-t_check_include_docs_throw_validation_error/0-fun-0-',...},
              {eunit_test,...},
              {...}|...]}}]}
  output:<<"">>

chttpd_view:132: t_check_user_can_override_individual_query_type...*failed*
in function couch_db:is_clustered/1 (src/couch_db.erl, line 217)
  called as is_clustered(db)

There's also a failure in mem3:

module 'mem3_util_test'
  mem3_util_test: hash_test...*failed*
in function mem3_util:hash/1
  called as hash(0)
in call from mem3_util_test:'-hash_test/0-fun-0-'/0 (test/mem3_util_test.erl, line 19)
in call from mem3_util_test:hash_test/0 (test/mem3_util_test.erl, line 19)
in call from mem3_util_test:hash_test/0
**error:undef
  output:<<"">>

since mem3_util:hash has been removed.

@davisp
Member Author

davisp commented Dec 12, 2018

@jaydoane chttpd tests should be fixed now. Am running through checking that everything else works as well.

Fix for the ones you noted was to use test_util:fake_db/1 instead of the atom db.

@davisp davisp force-pushed the feature/database-partitions branch 3 times, most recently from 52b0b02 to 5b295dc Compare December 12, 2018 23:04
@davisp
Member Author

davisp commented Dec 12, 2018

@jaydoane @iilyak All tests both eunit and elixir should be passing now. Apologies for the delay!

@davisp davisp force-pushed the feature/database-partitions branch 7 times, most recently from 51a482e to 8cd68be Compare January 18, 2019 17:56
garrensmith and others added 14 commits January 18, 2019 12:35
This allows us to implement features outside of the PSE API without
requiring changes to the API for each bit of data we may want to end up
storing. This opaque object should only be used for features that
don't require a behavior change from the storage engine API.

Co-authored-by: Garren Smith <[email protected]>
Co-authored-by: Robert Newson <[email protected]>
This allows for setting any combination of supported settings using a
proplist approach.
This allows for more fine-grained use of couch_db:clustered_db and
changes the name to something more appropriate than `fake_db`.
Allow index validation to be parameterized by the database without
having to reopen its own copy.
This adds specific datatype requirements to the list of allowable design
document options.

Co-authored-by: Garren Smith <[email protected]>
Co-authored-by: Robert Newson <[email protected]>
This provides the capability for features to specify alternative hash
functions for placing documents in a given shard range. While the
functionality exists with this implementation, it is not yet actually
used.
This change introduces the ability for users to place a group of
documents in a single shard range by specifying a "partition key" in the
document id. A partition key is denoted by everything preceding a colon
':' in the document id.

Every document id (except for design documents) in a partitioned
database is required to have a partition key.

Co-authored-by: Garren Smith <[email protected]>
Co-authored-by: Robert Newson <[email protected]>
This feature allows us to fetch statistics for a given partition key,
which will allow users to find bloated partitions and the like.

Co-authored-by: Garren Smith <[email protected]>
Co-authored-by: Robert Newson <[email protected]>
The benefit of using partitioned databases is that views can then be
scoped to a single shard range. This allows for views to scale nearly as
linearly as document lookups.

Co-authored-by: Garren Smith <[email protected]>
Co-authored-by: Robert Newson <[email protected]>
If a user specifies document ids that scope the query to a single
partition key, we can automatically determine that we only need to
consult a single shard range.

Co-authored-by: Robert Newson <[email protected]>
Now that a single shard handles the entire response we can optimize work
normally done in the coordinator by moving it to the RPC worker which
then removes the need to send an extra `skip` number of rows to the
coordinator.

Co-authored-by: Robert Newson <[email protected]>
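The optimization described in the commit above can be sketched as follows. This is an illustration of the idea, not CouchDB's fabric/RPC internals; the function names are invented for the example:

```python
# Illustrative sketch: when a single shard range serves the whole query,
# `skip` can be applied at the RPC worker, so the skipped rows are never
# sent over the network to the coordinator at all.

def worker_rows(rows, skip, limit):
    """Worker-side: drop `skip` rows and cap at `limit` before sending."""
    return rows[skip:skip + limit]

def coordinator_single_shard(shard_rows):
    """Coordinator-side: with one shard there is nothing to merge or re-skip."""
    return shard_rows

rows = [f"row-{i}" for i in range(10)]
result = coordinator_single_shard(worker_rows(rows, skip=3, limit=2))
assert result == ["row-3", "row-4"]
```

In the multi-shard case the coordinator must still merge sorted streams and apply skip/limit itself, which is why this shortcut is specific to partition-scoped queries.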
Using the internal hash values for indexes was a brittle approach to
ensuring that a specific index was or was not picked. By naming the
index and design docs we can more concretely ensure that the chosen
indexes match the intent of the test while also not breaking each time
mango internals change.
Co-authored-by: Garren Smith <[email protected]>
Co-authored-by: Robert Newson <[email protected]>
Co-authored-by: Garren Smith <[email protected]>
Co-authored-by: Robert Newson <[email protected]>
@davisp davisp force-pushed the feature/database-partitions branch from 8cd68be to 91af772 Compare January 18, 2019 18:35
@davisp davisp merged commit 16e6af4 into master Jan 18, 2019
@davisp davisp deleted the feature/database-partitions branch January 18, 2019 19:04
@AlexanderKaraberov
Contributor

AlexanderKaraberov commented Feb 19, 2019

Hello!
I'm trying to catch up with this cool new feature, as it completely passed me by. Please correct me if I'm wrong: as I understand it, this adds something akin to list partitioning in relational databases to CouchDB (on top of the current composite hash-range partitioning), where each partition is defined and selected based on a component of the document id?
Could you please point me to other discussions or sources (the mailing list, perhaps?) where I can read more about this new feature and think ahead about the changes our backend would need in order to leverage it. Thanks!

@janl
Member

janl commented Feb 19, 2019

@AlexanderKaraberov https://lists.apache.org/thread.html/4d09ee0fcc4afec8f25dbd540f9c4f0af67d3fdcc41f26f36ff7a32d@%3Cdev.couchdb.apache.org%3E

@AlexanderKaraberov
Contributor

@janl Thank you! Looks exactly like what I was looking for.
