
Add couchdb clustering #2810

Merged: 6 commits, Nov 29, 2017

Conversation

@style95 (Member) commented Sep 28, 2017

This PR deploys CouchDB as a cluster. As of version 2.0, CouchDB officially supports clustering.

Notable changes are as follows:

The CouchDB image is changed to the Apache one.

klaemo/couchdb is superseded by apache/couchdb, and the version is updated to 2.1 accordingly.

Coordinator node.

Since we need one coordinator node to set up a CouchDB cluster, the first host in the db group becomes the coordinator node.

New environment variables for the cluster.

The important factors for a CouchDB cluster are the number of shards and the quorum sizes, so corresponding environment variables are introduced.

Note: the current apache/couchdb:2.1 image does not support these values yet, so I created apache/couchdb-docker#32 for it; once that PR is merged, those environment variables will be available.

This PR can also address the problem reported in issue #2808.
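
For context, here is a rough sketch (not the exact tasks in this PR) of the CouchDB 2.x cluster setup flow that the coordinator node drives through the _cluster_setup endpoint, written as Ansible uri tasks; the variable names (db_protocol, coordinator, db_port, db_username, db_password) are placeholders:

  - name: enable cluster mode via the coordinator
    uri:
      url: "{{ db_protocol }}://{{ coordinator }}:{{ db_port }}/_cluster_setup"
      method: POST
      body_format: json
      body: >
        {"action": "enable_cluster", "bind_address": "0.0.0.0", "username": "{{ db_username }}", "password": "{{ db_password }}", "port": {{ db_port }}, "node_count": "{{ groups['db'] | length }}"}
      user: "{{ db_username }}"
      password: "{{ db_password }}"
      force_basic_auth: true
      status_code: 201

  - name: finish the cluster once all nodes have been added and enabled
    uri:
      url: "{{ db_protocol }}://{{ coordinator }}:{{ db_port }}/_cluster_setup"
      method: POST
      body_format: json
      body: >
        {"action": "finish_cluster"}
      user: "{{ db_username }}"
      password: "{{ db_password }}"
      force_basic_auth: true
      status_code: 201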

@style95 (Member Author) commented Oct 4, 2017

Is there anything more to do for this PR?

@rabbah (Member) commented Oct 18, 2017

@style95 As you shared on Slack, I think the use of db_num_read_quorum and db_num_write_quorum is discouraged. Can you link the related Ansible issue here, and is there an update?

@style95 (Member Author) commented Oct 20, 2017

@rabbah Let me explain the issues here.

This PR uses the apache/couchdb image. That image does not support environment variables for n, q, r, and w, which is why I initially created apache/couchdb-docker#32.

The feedback I got from the CouchDB community is:

  1. n is configured automatically while setting up the cluster.
  2. q only matters when a database is created and can be specified via the ?q= query parameter (e.g. ?q=1).
  3. Explicit r and w configurations are deprecated; instead, CouchDB uses half of the number of replicas for those two values. (They can also be specified as request parameters.)
  4. After the cluster is set up, the cluster configuration is stored in /opt/couchdb/etc/local.d, so the CouchDB community highly recommends volume-mapping that directory.

In short, we don't need those configurations any more; instead, we need to pass the q parameter when creating a database (see the sketch below). I will update the code accordingly.
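
For illustration, a minimal sketch (using Ansible's uri module; db_host and the database name whisk_test_db are placeholders) of creating a database with an explicit shard count via the ?q= parameter:

  - name: create a database with an explicit shard count
    uri:
      url: "{{ db_protocol }}://{{ db_host }}:{{ db_port }}/whisk_test_db?q=8"
      method: PUT
      user: "{{ db_username }}"
      password: "{{ db_password }}"
      force_basic_auth: true
      status_code: [201, 202]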

One remaining question: would it be required to add a volume mapping for the configuration directory as well? For that, we would need to keep the configuration directory and files (e.g. etc/default.ini, etc/default.d/*.ini, etc/local.d) in the OW repo, because there will be no default configuration once the volume is mapped.

@style95 (Member Author) commented Oct 20, 2017

@rabbah
The code is updated accordingly.
Regarding the q value, the default is 8. I think that is a good starting point; what do you think?

@style95 style95 force-pushed the feature/add-couchdb-cluster branch 2 times, most recently from 1666f2e to 08f9594 Compare October 26, 2017 09:23
@style95 (Member Author) commented Oct 27, 2017

It seems the Travis error is not related to this PR.
Could anyone trigger the Travis build again?

set_fact:
nodeName: "couchdb{{ groups['db'].index(inventory_hostname) }}"
coordinator: "{{ groups['db'][0] }}"
Contributor:

You should use the ansible_host of groups['db'][0] here. Otherwise we are not able to deploy two dbs on one machine (which is very useful for testing)
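
A hypothetical sketch of what that suggestion could look like, resolving the coordinator through ansible_host:

  set_fact:
    nodeName: "couchdb{{ groups['db'].index(inventory_hostname) }}"
    coordinator: "{{ hostvars[groups['db'][0]].ansible_host }}"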

Member Author (style95):

As you know, we only need one coordinator node to set up the cluster; this is how we pick the first host among many. Is it possible to do the same thing with ansible_host?

For example, if there are two nodes running on the same ansible_host, we still need only one of them as the coordinator.

state: started
recreate: true
restart_policy: "{{ docker.restart.policy }}"
volumes: "{{volume_dir | default([])}}"
ports:
- "{{ db_port }}:5984"
- "4369:4369"
- "9100:9100"
Contributor:

Can we make these ports (outside of the container) configurable and increment them for each container? Again, to be able to deploy two dbs on the same host.
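
A hypothetical sketch of that suggestion for the HTTP port, offsetting the published host port by the node's index in the db group (the Erlang ports 4369 and 9100 turn out to be harder, as discussed below):

  ports:
    - "{{ (db_port | int) + groups['db'].index(inventory_hostname) }}:5984"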

@style95 (Member Author) commented Nov 2, 2017

I would like to discuss this further.
Please refer to this: https://docs.couchdb.org/en/latest/cluster/setup.html#firewall

Those two ports are used when the Erlang VM discovers other nodes. AFAIK, 4369 is reserved for finding other nodes, and 9100 is reserved for communication among nodes.

Normally the Erlang VM uses random ports between 9100 and 9200, but apache/couchdb forces it to use 9100 only.

Regarding 9100, I think we could change it, but we would need to update the apache/couchdb image, and the value would have to differ on every node; I am not sure that works in that situation.
You can refer to this: https://github.com/apache/couchdb-docker/blob/master/2.1.1/vm.args#L14

And with regard to 4369, since it is reserved at the Erlang VM layer, I am not sure we can change it.

@cbickel Do you have any idea?

@style95 (Member Author) commented Nov 6, 2017

@cbickel any comments on this?

url: "{{ db_protocol }}:https://{{ ansible_host }}:{{ db_port }}/_cluster_setup"
method: POST
body: >
{"action": "enable_cluster", "bind_address":"0.0.0.0", "username": "{{ db_username }}", "password":"{{ db_password }}", "port": {{ db_port }}, "node_count": "{{ num_instances }}", "remote_node": "{{ ansible_host }}", "remote_current_user": "{{ db_username }}", "remote_current_password": "{{ db_password }}"}
Contributor:

What about using {{ groups['db'] | length }} instead of defining num_instances at the top?
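
In other words, a sketch of the same request body with the node count derived from the inventory:

  body: >
    {"action": "enable_cluster", "bind_address":"0.0.0.0", "username": "{{ db_username }}", "password":"{{ db_password }}", "port": {{ db_port }}, "node_count": "{{ groups['db'] | length }}", "remote_node": "{{ ansible_host }}", "remote_current_user": "{{ db_username }}", "remote_current_password": "{{ db_password }}"}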

@cbickel (Contributor) commented Nov 8, 2017

This is a great pull request. The only thing that should still be there (if it is technically possible) is the ability to deploy two instances on the same machine; with that, testing would be much easier. We have it for the controllers and invokers as well.

And I have an additional question: is the intention of this PR only to share data across multiple instances? Wouldn't it be great if requests against the database were also distributed across the instances? Or do you have that in mind for later?

@style95 (Member Author) commented Nov 10, 2017

Agree. It would be great if we could deploy more than one instance on a single machine for testability.

Regarding data sharing across multiple instances: do we need any further configuration to distribute the data?

AFAIK, since CouchDB 2.0 the nodes responsible for a given shard are chosen based on n (the commonly recommended value is 3). For example, with 5 nodes and n=3, say nodes 1, 2, and 3 hold some piece of data; if a client sends a request to node 4, the request is automatically forwarded to one of nodes 1-3, which handle it properly. Based on the values of w and r, the cluster then responds to the client, so no matter which node receives the request, the client gets a proper response. Did I understand your question correctly? If I have misunderstood CouchDB, please correct me.

If you mean load balancing (a single endpoint address) across the nodes for HA, I currently have no concrete plan. We could put a load balancer in front of them, or have the other components use a list of endpoints.

@cbickel (Contributor) commented Nov 10, 2017

OK, I didn't know that a request sent to instance 1 could be handled by instance 2.

I think for HA we need one of your proposed ideas. I just wanted to ask whether you want to address that problem already in this PR, or later. For this PR I'm totally fine without load balancing across the different nodes.

@style95 (Member Author) commented Nov 12, 2017

Yes, I think it would be better to address the load-balancing problem in a separate PR.

Regarding the epmd port, 4369: initially I thought we could change it using ERL_EPMD_PORT (https://erlang.org/doc/man/epmd.html). But if, for example, we configure the port as 4369 on the first node and 4370 on the second, the first node will look for the other node on 4369 while only 4370 is open on the second node, so the lookup will fail.

I am looking for a configuration that separates the binding port (4369) from the communication port (4370), but I am not sure it is feasible.

@style95 (Member Author) commented Nov 13, 2017

It seems all Erlang nodes must use the same epmd port:
https://stackoverflow.com/questions/40189097/erlang-epmd-connects-to-other-host-with-non-default-epmd-port

That means we cannot run a CouchDB cluster on a single machine as-is.

For now I am trying the following scenario:

  1. Use the same epmd port (4369) for all CouchDB nodes.
  2. Do not map that port to the host (remove -p 4369:4369).
  3. Try to form a cluster.

Since both nodes run on a single machine, they are in the same subnet, so they might be able to communicate using their container IPs. In other words, even though the port is not opened at the host layer, a node may still reach the other node via 172.17.0.2:4369, for example. For this, we would need to register the CouchDB nodes by their container IPs (currently we use the host IPs).

Since this assumes the nodes are on the same subnet, it won't work in distributed environments, so we would need to differentiate local deployment from distributed deployment.

Is it worth taking this approach?
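
A rough sketch of this single-host scenario, assuming both containers sit on the default Docker bridge network and can reach each other on their container IPs (names and values are placeholders):

  - name: start a couchdb node
    docker_container:
      name: "couchdb{{ groups['db'].index(inventory_hostname) }}"
      image: apache/couchdb:2.1
      state: started
      ports:
        # only the HTTP port is published to the host; 4369 and 9100 stay unpublished,
        # so the nodes would talk to each other over container IPs such as 172.17.0.2
        - "{{ (db_port | int) + groups['db'].index(inventory_hostname) }}:5984"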

@style95 (Member Author) commented Nov 15, 2017

Let me share the discussion with @cbickel. It would be good to have test cases for HA/failover of CouchDB, but that is not simple, and running multiple CouchDB nodes on a single machine seems to require many dirty changes.

So one option is to run such a test case only when more than one CouchDB node is deployed (i.e. in a distributed environment). The same approach is already taken for the controllers.

I will look into ShootComponentsTests.scala and apply a similar approach for CouchDB.

@cbickel (Contributor) commented Nov 15, 2017

One idea for one of the tests could be:

  • Write documents to the first DB
  • Shoot the first DB
  • Check that the documents (and maybe also some view results) are available on the second DB
  • Write some documents to the second DB
  • Start the first DB again and wait until it is up
  • Check that the new documents are replicated back to the first DB

Do you have some more ideas?

@chetanmeh (Member) commented:

@style95 What would consistency support look like in the clustered setup? Would it be possible to read your own writes?

Reading here indicates that reads and writes use a quorum, which should allow one to read one's own write:

  The number of copies of a document with the same revision that have to be read before CouchDB returns with a 200 is equal to a half of total copies of the document plus one. It is the same for the number of nodes that need to save a document before a write is returned with 201.

However, it's not clear what the semantics are for queries or view access. Per here, it looks like queries would not use a quorum:

  The coordinator waits for a given number of responses to arrive from the nodes it contacted. The number of responses it waits for differ based on the type of request. Reads and writes wait for two responses. A view read only waits for one.

So in a clustered setup it can happen that an action added in one request is not visible to a query made in a subsequent request?

@rabbah (Member) commented Nov 20, 2017

Correct: views are subject to eventual consistency and may exclude a just added/modified action (for example) until all nodes have replicated and indexed.

@chetanmeh (Member) commented:

Thanks @rabbah for confirming the eventually consistent nature of queries.

Next question: does CouchDB support specifying the sharding key, or does it always shard on the _id key only? In the absence of a sharding key, all queries would need to hit all shards, which may have some impact on performance.

@style95 (Member Author) commented Nov 20, 2017

@chetanmeh
AFAIK, there is no support for specifying a custom sharding key. What kind of performance improvement do you expect from a custom sharding key?

For a view query, we have to access all relevant shards to build a combined result. That is why it is generally recommended to use a small q value for a read-intensive DB and a large value for a write-intensive DB.

If I misunderstood, kindly correct me.

@chetanmeh (Member) commented:

  What kind of performance improvement do you expect from a custom sharding key?

I am exploring support for running OW on MongoDB and CosmosDB, which support specifying a custom partition key. Looking at the views used by CouchDB, I see the following types of queries:

  • Activations
    • byDateWithLogs - [doc.start]
    • cleanUpActivations - [doc.start]
    • activations - [doc.namespace, doc.start]
  • Whisks
    • [root namespace, date] - for each of rules, packages, actions, triggers

From my understanding of the flow, the "whisks" are mostly read. In a sharded setup this may benefit from using the root namespace as the sharding (partition) key: a query would then only need to be evaluated on a single shard instead of hitting all shards for whisks access.

This would be especially useful for databases where sharded queries are issued from the client, which may add higher latency. By comparison, CouchDB proxies the queries and talks to all the shards on behalf of the client; since such calls happen within the same data center, they may incur less latency.

So I am interested in this PR to see how to model OW entities on other sharded storage systems.

@style95 (Member Author) commented Nov 20, 2017

So you are thinking of assigning a shard per namespace, right? Great idea; it would be good for small scale and latency.

But if there is a huge amount of data, one shard may not be enough. Since a shard is the unit of distribution and replication, an overly large shard is not good for performance. As data grows, at some point we need to rebalance the shards; but if we use the namespace as the partition key, how can we split them into smaller shards?

Also, shard accesses proceed in parallel. The generally recommended value for n is 3, so w and r will be 2; that means the number of shards to access is 2. Even if we reduce that to 1, there might not be a huge latency improvement.

And in my experience, storing activations has a bigger impact on performance than reading whisk entities, because there is a cache layer in the controller.

@chetanmeh (Member) commented Nov 21, 2017

  But if there is a huge amount of data, one shard may not be enough. Since a shard is the unit of distribution and replication, an overly large shard is not good for performance. As data grows, at some point we need to rebalance the shards; but if we use the namespace as the partition key, how can we split them into smaller shards?

Ack. But it depends on the kind of data being stored. This scheme may be OK for whisks, where one namespace may contain, say, 1M entities (which can be stored in a single shard). It may not work as well for activations, where the count may be on the order of 10-100M. So the use of a sharding key would depend on the number of objects being stored.

  Also, shard accesses proceed in parallel. The generally recommended value for n is 3, so w and r will be 2; that means the number of shards to access is 2. Even if we reduce that to 1, there might not be a huge latency improvement.

This is fine for reads and writes of a single document. However, for queries all shards would need to be queried:

  The Query request handler must ask every node that hosts a replica of a shard in the database for their answer to the query. Depending on how the shard replicas are distributed across the cluster, this might be all nodes in the cluster, or only a subset.

  And in my experience, storing activations has a bigger impact on performance than reading whisk entities, because there is a cache layer in the controller.

Is the cache layer also used for queries, or only for individual entity access, i.e. lookups by docId?

@style95 (Member Author) commented Nov 21, 2017

@chetanmeh
Yes, there might not be such a huge amount of data in one namespace. But what if the data does grow for some reason; how would we handle that? And if there is not much data, do you expect such a small data set to cause any performance trouble?

Second, queries are only used to list entities. All other APIs, such as get/update/delete, and even invoking an action, use the normal document get/put API (so they can take advantage of the cache).

And I don't expect such a huge number of list API calls. As you mentioned, your approach can improve the latency of the list API. But normally the list API is called by a user rather than by any systematic call, because list is for seeing what kinds of entities exist, so there may not be 10K or 100K TPS of such calls. That's why I think action invocation (which includes storing activations) affects performance more, and I lean toward the scalable approach.

And I could not find any support for a custom partition key.

@chetanmeh (Member) commented:

  But normally the list API is called by a user rather than by any systematic call, because list is for seeing what kinds of entities exist, so there may not be 10K or 100K TPS of such calls. That's why I think action invocation (which includes storing activations) affects performance more, and I lean toward the scalable approach.

Makes sense. I will keep that in mind when modeling for other DBs.

  And I could not find any support for a custom partition key.

Thanks for confirming. So for CouchDB the partition-key decision is moot!

@rabbah (Member) commented Nov 22, 2017

The queries (list operations) are not cached today. They could be, since we allow stale results and queue indexing operations in Couch as configured. In general, as already noted, the bulk of the load isn't on CRUD or query operations.

@style95 style95 changed the title Add couchdb clustering [WIP] Add couchdb clustering Nov 22, 2017
@style95 style95 force-pushed the feature/add-couchdb-cluster branch 2 times, most recently from c31ecad to 6e30f92 Compare November 26, 2017 12:39
@style95 style95 changed the title [WIP] Add couchdb clustering Add couchdb clustering Nov 26, 2017
@style95 (Member Author) commented Nov 26, 2017

I have tested that this PR works both with a CouchDB cluster and with a single-CouchDB setup.

@style95 (Member Author) commented Nov 26, 2017

Travis complains that there is no db_local.ini; is that because of this PR?

@@ -94,6 +94,8 @@ db.whisk.activations={{ db.whisk.activations }}
db.whisk.actions.ddoc={{ db.whisk.actions_ddoc }}
db.whisk.activations.ddoc={{ db.whisk.activations_ddoc }}
db.whisk.activations.filter.ddoc={{ db.whisk.activations_filter_ddoc }}
db.hosts={{ groups["db"] | map('extract', hostvars, 'ansible_host') | list | join(",") }}
Contributor:

What about renaming this value to db.hostsList as well? In the following code you always use dbHostsList.

}
}

def pingDB(host: String, port: Int) = {
Contributor:

What do you think about using one ping method and passing the path as an argument?

@cbickel (Contributor) left a review:

LGTM. I'll take care of merging.
Thanks a lot for your contribution to OpenWhisk :)

@cbickel cbickel merged commit e587bf8 into apache:master Nov 29, 2017
NaohiroTamura added a commit to NaohiroTamura/incubator-openwhisk that referenced this pull request Dec 22, 2017
This is a follow-up patch to the below:
- Add couchdb clustering. (apache#2810)
  commit e587bf8
NaohiroTamura added a commit to NaohiroTamura/incubator-openwhisk that referenced this pull request Dec 25, 2017
This is a follow-up patch to the below:
- Add couchdb clustering. (apache#2810)
  commit e587bf8
NaohiroTamura added a commit to NaohiroTamura/incubator-openwhisk that referenced this pull request Jan 9, 2018
This is a follow-up patch to the below:
- Add couchdb clustering. (apache#2810)
  commit e587bf8
BillZong pushed a commit to BillZong/openwhisk that referenced this pull request Nov 18, 2019