
Enable cluster auto-assembly through a seedlist #1658

Merged

merged 9 commits, Nov 11, 2018
Conversation

kocolosk
Member

@kocolosk kocolosk commented Oct 16, 2018

Overview

This introduces a new config setting which allows an administrator to configure an initial list of nodes that should be contacted when a node boots up:
[cluster]
seedlist = [email protected],[email protected],[email protected]

If configured, CouchDB will add every node in the seedlist to the _nodes DB automatically, which will trigger a distributed Erlang connection and a replication of the internal system databases to the local node. This eliminates the need to explicitly add each node using the HTTP API.

We also modify the /_up endpoint to reflect the progress of the initial seeding of the node. If a seedlist is configured the endpoint will return 404 until the local node has updated its local replica of each of the system databases from one of the members of the seedlist. The body of the HTTP response now looks like

{
  "status": "seeding",
  "seeds": {
    "[email protected]": {
      "timestamp": "2018-10-16T19:58:03+00:00",
      "last_replication_status": "ok",
      "pending_updates": {"_nodes": 0, "_dbs": 101, "_users": 42}
    },
    "[email protected]": { ...
}

Once the status flips to "ok" the endpoint will return 200 and it's safe to direct requests to the new node.
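A health check that gates traffic on this behavior could look like the following. This is a minimal sketch, not part of the PR; the function name is hypothetical, and it assumes the two /_up response shapes described above:

```python
import json

def node_is_ready(status_code, body):
    """Decide whether a node can receive traffic based on its /_up response.

    A 200 with "status": "ok" means seeding has finished; a 404 with
    "status": "seeding" means internal replication of the system
    databases is still in progress.
    """
    if status_code != 200:
        return False
    return json.loads(body).get("status") == "ok"

# Payloads mirroring the two response shapes described above:
seeding = json.dumps({"status": "seeding", "seeds": {}})
ready = json.dumps({"status": "ok", "seeds": {}})

print(node_is_ready(404, seeding))  # False
print(node_is_ready(200, ready))    # True
```

A load balancer or orchestrator readiness probe would apply the same rule: route to the node only once /_up returns 200.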

Testing recommendations

  • Configure the seedlist for a new 3-node cluster with the names of the 3 nodes and check /_membership to confirm that the nodes connect to each other automatically.
  • On a cluster with lots of databases or users, add a node to the cluster and check that /_up returns 404 while the initial internal replication takes place.

You'll notice that the PR currently has no tests. I wanted to put it up for review while I familiarize myself with the latest bits of the test suite and see what I can contribute.

Checklist

  • Code is written and works correctly;
  • Changes are covered by tests;
  • Documentation reflects the changes;

This introduces a new config setting which allows an administrator to
configure an initial list of nodes that should be contacted when a node
boots up:

[cluster]
seedlist = [email protected],[email protected],[email protected]

If configured, CouchDB will add every node in the seedlist to the _nodes
DB automatically, which will trigger a distributed Erlang connection and
a replication of the internal system databases to the local node. This
eliminates the need to explicitly add each node using the HTTP API.
This patch adds a new gen_server whose only job is to download the
system DBs (_nodes, _dbs, _users) from the nodes in the seedlist, and
then set a flag once it has downloaded a complete copy. Once the flag
is set we can confidently allow the node to handle HTTP requests.
Missing from this test suite is anything that actually triggers an
internal replication between nodes in a cluster, because I don't know
how to do that (or if it is even possible).
% "Pull" is a bit of a misnomer here, as what we're actually doing is
% issuing an RPC request and telling the remote node to push updates to
% us. This lets us reuse all of the battle-tested machinery of mem3_rpc.
pull_from_seed(Seed) ->
Contributor
This seems useful in general; maybe rename this to pull_replication so it matches the pull_replication_rpc local callback?

Member Author
Good idea, will do.

gen_server:call(?MODULE, get_status).

init([]) ->
Seeds = get_seeds(),
Contributor

Is there any use case where a seed would be added to the config after the node is started? If so, get_seeds() would need to be called every time before start_replication(Seeds) is called.

Member Author

I don’t expect that use case as the whole seed list feature is really built to make the node initialization process more robust. Would adding a seed cause _up to flip back to 404 if no seed had previously been contacted? Lots of weird stuff there.


init([]) ->
Seeds = get_seeds(),
timer:send_interval(?REPLICATION_INTERVAL, start_replication),
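The retry pattern under discussion, i.e. try the seed replication on an interval until the node reaches a ready state and then stop, can be sketched outside Erlang as follows. This is an illustration only, not the actual mem3 implementation; the function names are hypothetical:

```python
import itertools

def replicate_until_ready(try_once, max_attempts=10):
    """Retry seed replication until it reports ready, then stop.

    Retrying covers transient failures; returning as soon as try_once()
    succeeds plays the role of cancelling the interval timer.
    """
    for attempt in itertools.count(1):
        if try_once():
            return attempt  # ready: no further timer ticks needed
        if attempt >= max_attempts:
            raise RuntimeError("seeding did not complete")

# A simulated seed that fails twice before succeeding:
results = iter([False, False, True])
print(replicate_until_ready(lambda: next(results)))  # 3
```

The key point raised below is the second half of the pattern: the retries should be cancelled once a ready state is reached, rather than firing forever.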
Contributor

This will send start_replication every minute, forever. What is the idea behind it? Something like: "once we have a seed list, we'll continuously replicate dbs from the seed list to this node"? Once we do it one time, wouldn't mem3_sync take care of this afterwards? Or is this just to handle retries if there are failures?

Member Author

Hmmm ... I honestly don’t remember. I probably added it as a safeguard against failures with the intention to cancel the timer once we hit a “ready” status — but never did. I can add that

@wohali
Member

wohali commented Nov 7, 2018

@kocolosk any progress on the test suite? Also, is there any need for #1337 to land for this to function?

@kocolosk
Member Author

kocolosk commented Nov 7, 2018

Yes, 6f35073 added some tests. Could do more but I think I'd need to figure out how to mock internal replication.

No, #1337 is an independent thought. Both can coexist. Of the two I would personally consider this one to be a higher priority for production automation and operations.

@wohali
Member

wohali commented Nov 7, 2018

@kocolosk thanks. Do you think you could submit a documentation PR for this new feature?

Also, you must update default.ini and/or local.ini under etc/rel/overlay before this PR lands. ;)

@kocolosk
Member Author

kocolosk commented Nov 7, 2018

I already took care of the documentation PR -- submitted and approved: apache/couchdb-documentation#339

I will update the default config and address Nick's other comment, hopefully tomorrow.

This is a holdover from an initial prototype; the current version is
already equipped to run start_replication only as often as necessary to
get the node into a ready state.
Contributor

@nickva nickva left a comment
Looks good. Nice work!

I tested by starting a disconnected cluster:

./dev/run --admin=adm:pass --no-join

Created some dbs in node1. Then stopped it, edited the seedlist with node1 as the only seed, and restarted the disconnected cluster.

Cluster connected as expected and _dbs was synchronized.

http http://adm:pass@localhost:15984/_dbs
http http://adm:pass@localhost:25984/_dbs
http http://adm:pass@localhost:35984/_dbs

All show:

"update_seq": "5196-g2wAAAABaANkAA9ub2RlMUAxMjcuMC4wLjFsAAAAAmEAbgQA_____2piAAAUTGo"

While seeding response on node2 and node3 was 404:

HTTP/1.1 404 Object Not Found
Cache-Control: must-revalidate
Content-Length: 127
Content-Type: application/json
Date: Fri, 09 Nov 2018 22:56:26 GMT
Server: CouchDB/2.2.0-6f3507303 (Erlang OTP/20)
X-Couch-Request-ID: 74f82375be
X-CouchDB-Body-Time: 0

{
    "seeds": {
        "[email protected]": {
            "last_replication_status": "error",
            "timestamp": "2018-11-09T22:56:07.608284Z"
        }
    },
    "status": "seeding"
}

It did actually show an error at first. I think I might have seen a rexi_DOWN in the log, possibly related to it.

But it did finish correctly and _up started showing:

{
    "seeds": {
        "[email protected]": {
            "last_replication_status": "ok",
            "pending_updates": {
                "_dbs": 0,
                "_nodes": 0,
                "_users": 0
            },
            "timestamp": "2018-11-09T22:56:28.126795Z"
        }
    },
    "status": "ok"
}
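For automation, that final state can be verified programmatically. A minimal sketch, assuming the /_up body shown above (not a tool shipped with this PR):

```python
import json

# The /_up body once seeding has completed, as shown above:
up_body = json.dumps({
    "seeds": {
        "[email protected]": {
            "last_replication_status": "ok",
            "pending_updates": {"_dbs": 0, "_nodes": 0, "_users": 0},
            "timestamp": "2018-11-09T22:56:28.126795Z",
        }
    },
    "status": "ok",
})

doc = json.loads(up_body)
fully_seeded = doc["status"] == "ok" and all(
    seed["last_replication_status"] == "ok"
    and all(n == 0 for n in seed["pending_updates"].values())
    for seed in doc["seeds"].values()
)
print(fully_seeded)  # True
```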
