Enable cluster auto-assembly through a seedlist #1658
Conversation
This introduces a new config setting which allows an administrator to configure an initial list of nodes that should be contacted when a node boots up: a seedlist entry in the [cluster] config section, shown below. If configured, CouchDB will add every node in the seedlist to the _nodes DB automatically, which will trigger a distributed Erlang connection and a replication of the internal system databases to the local node. This eliminates the need to explicitly add each node using the HTTP API.
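In ini form, the setting from the description reads:

```ini
[cluster]
seedlist = [email protected],[email protected],[email protected]
```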
This patch adds a new gen_server whose only job is to download the system DBs (_nodes, _dbs, _users) from the nodes in the seedlist, and then set a flag once it has downloaded a complete copy. Once the flag is set we can confidently allow the node to handle HTTP requests.
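A minimal sketch of that design (not the actual mem3_seeds module; get_seeds/0, replicate_system_dbs/1 and the retry interval are simplified placeholders):

```erlang
-module(seeds_sketch).
-behaviour(gen_server).

-export([start_link/0, get_status/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(RETRY_INTERVAL, 60000).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% /_up can consult this flag to decide between a 200 and a 404.
get_status() ->
    gen_server:call(?MODULE, get_status).

init([]) ->
    Seeds = get_seeds(),
    %% No seedlist configured means the node is ready immediately;
    %% otherwise it starts out in the seeding state.
    Status = case Seeds of [] -> ok; _ -> seeding end,
    self() ! start_replication,
    {ok, #{seeds => Seeds, status => Status}}.

handle_call(get_status, _From, #{status := Status} = St) ->
    {reply, {ok, Status}, St}.

handle_cast(_Msg, St) ->
    {noreply, St}.

handle_info(start_replication, #{status := ok} = St) ->
    %% Already have a complete copy of the system dbs; nothing to do.
    {noreply, St};
handle_info(start_replication, #{seeds := [Seed | _]} = St) ->
    %% Try to copy the system dbs from a seed; set the flag on success,
    %% retry later on failure.
    case replicate_system_dbs(Seed) of
        ok ->
            {noreply, St#{status := ok}};
        {error, _Reason} ->
            erlang:send_after(?RETRY_INTERVAL, self(), start_replication),
            {noreply, St}
    end.

%% Placeholder: the real code would read the [cluster] seedlist setting.
get_seeds() ->
    ['[email protected]'].

%% Placeholder: the real code replicates _nodes, _dbs and _users from
%% the seed using mem3's internal replication machinery.
replicate_system_dbs(_Seed) ->
    ok.
```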
Missing from this test suite is anything that actually triggers an internal replication between nodes in a cluster, because I don't know how to do that (or if it is even possible).
src/mem3/src/mem3_rpc.erl
Outdated
% "Pull" is a bit of a misnomer here, as what we're actually doing is | ||
% issuing an RPC request and telling the remote node to push updates to | ||
% us. This lets us reuse all of the battle-tested machinery of mem3_rpc. | ||
pull_from_seed(Seed) -> |
This seems useful in general. Maybe rename this to pull_replication so it matches the pull_replication_rpc local callback?
Good idea, will do.
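For context, the quoted diff above cuts off before the function body. Hypothetically it could amount to a blocking RPC that asks the seed node to run the pull_replication_rpc callback from its side (a sketch using plain rpc:call/4; the real mem3_rpc machinery and argument shapes may differ):

```erlang
-module(pull_sketch).
-export([pull_from_seed/1]).

%% Sketch only: ask the remote Seed node to execute pull_replication_rpc,
%% which pushes updates from the seed back to this node.
pull_from_seed(Seed) ->
    rpc:call(Seed, mem3_rpc, pull_replication_rpc, [node()]).
```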
gen_server:call(?MODULE, get_status).

init([]) ->
    Seeds = get_seeds(),
Is there any use case where a seed would be added to the config after the node is started, so that get_seeds() would need to be called every time before start_replication(Seeds) is called?
I don’t expect that use case, as the whole seedlist feature is really built to make the node initialization process more robust. Would adding a seed cause _up to flip back to 404 if no seed had previously been contacted? Lots of weird stuff there.
src/mem3/src/mem3_seeds.erl
Outdated
init([]) ->
    Seeds = get_seeds(),
    timer:send_interval(?REPLICATION_INTERVAL, start_replication),
This will send start_replication forever, every minute. What is the idea behind it? Something like: once we have a seed list, we'll try to continuously replicate dbs from the seed list to this node? Once we do it one time, wouldn't mem3_sync take care of this afterwards? Or is this just to handle retries if there are failures?
Hmmm ... I honestly don’t remember. I probably added it as a safeguard against failures, with the intention to cancel the timer once we hit a “ready” status, but never did. I can add that.
@kocolosk thanks. Do you think you could submit a documentation PR for this new feature? Also, you must update the default config.
I already took care of the documentation PR -- submitted and approved: apache/couchdb-documentation#339. I will update the default config and address Nick's other comment, hopefully tomorrow.
This is a holdover from an initial prototype; the current version is already equipped to run start_replication only as often as necessary to get the node into a ready state.
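The "cancel once ready" idea mentioned above could look something like the following standalone sketch (not CouchDB code; attempt_replication/0 is a stand-in):

```erlang
-module(ready_timer_sketch).
-export([start/0]).

%% Fire start_replication every minute, but cancel the interval timer as
%% soon as a replication attempt succeeds and the node is ready.
start() ->
    {ok, TRef} = timer:send_interval(60000, start_replication),
    loop(TRef).

loop(TRef) ->
    receive
        start_replication ->
            case attempt_replication() of
                ok ->
                    {ok, cancel} = timer:cancel(TRef),
                    ready;
                {error, _} ->
                    loop(TRef)
            end
    end.

%% Stand-in for the real replication attempt.
attempt_replication() ->
    ok.
```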
Looks good. Nice work!
I tested by starting a disconnected cluster:
./dev/run --admin=adm:pass --no-join
Created some dbs on node1. Then stopped it, edited the seedlist with node1 as the only seed, and restarted the disconnected cluster.
Cluster connected as expected and _dbs was synchronized.
http http://adm:pass@localhost:15984/_dbs
http http://adm:pass@localhost:25984/_dbs
http http://adm:pass@localhost:35984/_dbs
All show:
"update_seq": "5196-g2wAAAABaANkAA9ub2RlMUAxMjcuMC4wLjFsAAAAAmEAbgQA_____2piAAAUTGo"
While still seeding, the _up response on node2 and node3 was 404:
HTTP/1.1 404 Object Not Found
Cache-Control: must-revalidate
Content-Length: 127
Content-Type: application/json
Date: Fri, 09 Nov 2018 22:56:26 GMT
Server: CouchDB/2.2.0-6f3507303 (Erlang OTP/20)
X-Couch-Request-ID: 74f82375be
X-CouchDB-Body-Time: 0
{
    "seeds": {
        "[email protected]": {
            "last_replication_status": "error",
            "timestamp": "2018-11-09T22:56:07.608284Z"
        }
    },
    "status": "seeding"
}
It did actually show an error; I think I might have seen a rexi_DOWN in the log, possibly from it.
But it did finish correctly and _up started showing:
{
    "seeds": {
        "[email protected]": {
            "last_replication_status": "ok",
            "pending_updates": {
                "_dbs": 0,
                "_nodes": 0,
                "_users": 0
            },
            "timestamp": "2018-11-09T22:56:28.126795Z"
        }
    },
    "status": "ok"
}
Overview
We also modify the /_up endpoint to reflect the progress of the initial seeding of the node. If a seedlist is configured, the endpoint will return 404 until the local node has updated its local replica of each of the system databases from one of the members of the seedlist. The body of the HTTP response now includes a "seeds" object and a "status" field, as in the examples above. Once the status flips to "ok" the endpoint will return 200 and it is safe to direct requests to the new node.
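As an illustration of how a deployment might consume this (hypothetical code, not part of the PR; the module name is made up), a script could poll /_up and route traffic to the node only once it reports ready:

```erlang
-module(up_probe).
-export([wait_until_ready/1]).

%% Poll the node's /_up endpoint once a second until it answers 200.
wait_until_ready(Url) ->
    {ok, _} = application:ensure_all_started(inets),
    case httpc:request(get, {Url, []}, [], []) of
        {ok, {{_Version, 200, _Reason}, _Headers, _Body}} ->
            ok;
        _NotReadyYet ->
            timer:sleep(1000),
            wait_until_ready(Url)
    end.
```

For example: up_probe:wait_until_ready("http://localhost:15984/_up").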
Testing recommendations
- Check /_membership to confirm that the nodes connect to each other automatically
- Confirm that /_up returns 404 while the initial internal replication takes place.

You'll notice that the PR currently has no tests. I wanted to put it up for review while I familiarize myself with the latest bits of the test suite and see what I can contribute.
Checklist