Enable cluster auto-assembly through a seedlist #1658
Conversation
This introduces a new config setting which allows an administrator to configure an initial list of nodes that should be contacted when a node boots up: a seedlist entry in the [cluster] config section, shown below. If configured, CouchDB will add every node in the seedlist to the _nodes DB automatically, which will trigger a distributed Erlang connection and a replication of the internal system databases to the local node. This eliminates the need to explicitly add each node using the HTTP API.
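In ini form, the setting from the description reads:

```ini
[cluster]
seedlist = [email protected],[email protected],[email protected]
```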
This patch adds a new gen_server whose only job is to download the system DBs (_nodes, _dbs, _users) from the nodes in the seedlist, and then set a flag once it has downloaded a complete copy. Once the flag is set we can confidently allow the node to handle HTTP requests.
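A minimal sketch of that design (not the actual mem3_seeds module; get_seeds/0, replicate_system_dbs/1 and the retry interval are simplified placeholders):

```erlang
-module(seeds_sketch).
-behaviour(gen_server).

-export([start_link/0, get_status/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

-define(RETRY_INTERVAL, 60000).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% /_up can consult this flag to decide between a 200 and a 404.
get_status() ->
    gen_server:call(?MODULE, get_status).

init([]) ->
    Seeds = get_seeds(),
    %% No seedlist configured means the node is ready immediately;
    %% otherwise it starts out in the seeding state.
    Status = case Seeds of [] -> ok; _ -> seeding end,
    self() ! start_replication,
    {ok, #{seeds => Seeds, status => Status}}.

handle_call(get_status, _From, #{status := Status} = St) ->
    {reply, {ok, Status}, St}.

handle_cast(_Msg, St) ->
    {noreply, St}.

handle_info(start_replication, #{status := ok} = St) ->
    %% Already have a complete copy of the system dbs; nothing to do.
    {noreply, St};
handle_info(start_replication, #{seeds := [Seed | _]} = St) ->
    %% Try to copy the system dbs from a seed; set the flag on success,
    %% retry later on failure.
    case replicate_system_dbs(Seed) of
        ok ->
            {noreply, St#{status := ok}};
        {error, _Reason} ->
            erlang:send_after(?RETRY_INTERVAL, self(), start_replication),
            {noreply, St}
    end.

%% Placeholder: the real code would read the [cluster] seedlist setting.
get_seeds() ->
    ['[email protected]'].

%% Placeholder: the real code replicates _nodes, _dbs and _users from
%% the seed using mem3's internal replication machinery.
replicate_system_dbs(_Seed) ->
    ok.
```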
Missing from this test suite is anything that actually triggers an internal replication between nodes in a cluster, because I don't know how to do that (or if it is even possible).
src/mem3/src/mem3_rpc.erl
Outdated
% "Pull" is a bit of a misnomer here, as what we're actually doing is | ||
% issuing an RPC request and telling the remote node to push updates to | ||
% us. This lets us reuse all of the battle-tested machinery of mem3_rpc. | ||
pull_from_seed(Seed) -> |
This seems useful in general. Maybe rename this to pull_replication so it matches the pull_replication_rpc local callback?
Good idea, will do.
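For context, the quoted diff above cuts off before the function body. Hypothetically it could amount to a blocking RPC that asks the seed node to run the pull_replication_rpc callback from its side (a sketch using plain rpc:call/4; the real mem3_rpc machinery and argument shapes may differ):

```erlang
-module(pull_sketch).
-export([pull_from_seed/1]).

%% Sketch only: ask the remote Seed node to execute pull_replication_rpc,
%% which pushes updates from the seed back to this node.
pull_from_seed(Seed) ->
    rpc:call(Seed, mem3_rpc, pull_replication_rpc, [node()]).
```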
gen_server:call(?MODULE, get_status).

init([]) ->
    Seeds = get_seeds(),
Is there any use case where a seed would be added to the config after the node is started, so that get_seeds() would need to be called every time before start_replication(Seeds) is called?
I don’t expect that use case, as the whole seedlist feature is really built to make the node initialization process more robust. Would adding a seed cause _up to flip back to 404 if no seed had previously been contacted? Lots of weird stuff there.
src/mem3/src/mem3_seeds.erl
Outdated
init([]) ->
    Seeds = get_seeds(),
    timer:send_interval(?REPLICATION_INTERVAL, start_replication),
This will send start_replication forever, every minute. What is the idea behind it? Something like: once we have a seed list, we'll try to continuously replicate dbs from the seed list to this node? Once we do it one time, wouldn't mem3_sync take care of this afterwards? Or is this just to handle retries if there are failures?
Hmmm ... I honestly don’t remember. I probably added it as a safeguard against failures, with the intention to cancel the timer once we hit a “ready” status, but never did. I can add that.
@kocolosk thanks. Do you think you could submit a documentation PR for this new feature? Also, you must update the default config.
I already took care of the documentation PR -- submitted and approved: apache/couchdb-documentation#339. I will update the default config and address Nick's other comment, hopefully tomorrow.
This is a holdover from an initial prototype; the current version is already equipped to run start_replication only as often as necessary to get the node into a ready state.
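The "cancel once ready" idea mentioned above could look something like the following standalone sketch (not CouchDB code; attempt_replication/0 is a stand-in):

```erlang
-module(ready_timer_sketch).
-export([start/0]).

%% Fire start_replication every minute, but cancel the interval timer as
%% soon as a replication attempt succeeds and the node is ready.
start() ->
    {ok, TRef} = timer:send_interval(60000, start_replication),
    loop(TRef).

loop(TRef) ->
    receive
        start_replication ->
            case attempt_replication() of
                ok ->
                    {ok, cancel} = timer:cancel(TRef),
                    ready;
                {error, _} ->
                    loop(TRef)
            end
    end.

%% Stand-in for the real replication attempt.
attempt_replication() ->
    ok.
```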
Looks good. Nice work!
I tested by starting a disconnected cluster:
./dev/run --admin=adm:pass --no-join
Created some dbs on node1. Then stopped it, edited the seedlist with node1 as the only seed, and restarted the disconnected cluster.
Cluster connected as expected and _dbs was synchronized.
http http://adm:pass@localhost:15984/_dbs
http http://adm:pass@localhost:25984/_dbs
http http://adm:pass@localhost:35984/_dbs
All show:
"update_seq": "5196-g2wAAAABaANkAA9ub2RlMUAxMjcuMC4wLjFsAAAAAmEAbgQA_____2piAAAUTGo"
While still seeding, the _up response on node2 and node3 was 404:
HTTP/1.1 404 Object Not Found
Cache-Control: must-revalidate
Content-Length: 127
Content-Type: application/json
Date: Fri, 09 Nov 2018 22:56:26 GMT
Server: CouchDB/2.2.0-6f3507303 (Erlang OTP/20)
X-Couch-Request-ID: 74f82375be
X-CouchDB-Body-Time: 0
{
    "seeds": {
        "[email protected]": {
            "last_replication_status": "error",
            "timestamp": "2018-11-09T22:56:07.608284Z"
        }
    },
    "status": "seeding"
}
It did actually show an error; I think I might have seen a rexi_DOWN in the log, possibly from it.
But it did finish correctly and _up started showing:
{
    "seeds": {
        "[email protected]": {
            "last_replication_status": "ok",
            "pending_updates": {
                "_dbs": 0,
                "_nodes": 0,
                "_users": 0
            },
            "timestamp": "2018-11-09T22:56:28.126795Z"
        }
    },
    "status": "ok"
}
Overview
We also modify the /_up endpoint to reflect the progress of the initial seeding of the node. If a seedlist is configured, the endpoint will return 404 until the local node has updated its local replica of each of the system databases from one of the members of the seedlist. The body of the HTTP response now includes a "seeds" object and a "status" field, as in the examples above. Once the status flips to "ok" the endpoint will return 200 and it is safe to direct requests to the new node.
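As an illustration of how a deployment might consume this (hypothetical code, not part of the PR; the module name is made up), a script could poll /_up and route traffic to the node only once it reports ready:

```erlang
-module(up_probe).
-export([wait_until_ready/1]).

%% Poll the node's /_up endpoint once a second until it answers 200.
wait_until_ready(Url) ->
    {ok, _} = application:ensure_all_started(inets),
    case httpc:request(get, {Url, []}, [], []) of
        {ok, {{_Version, 200, _Reason}, _Headers, _Body}} ->
            ok;
        _NotReadyYet ->
            timer:sleep(1000),
            wait_until_ready(Url)
    end.
```

For example: up_probe:wait_until_ready("http://localhost:15984/_up").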
Testing recommendations
- Check /_membership to confirm that the nodes connect to each other automatically
- Confirm that /_up returns 404 while the initial internal replication takes place.

You'll notice that the PR currently has no tests. I wanted to put it up for review while I familiarize myself with the latest bits of the test suite and see what I can contribute.
Checklist