Ensure replication jobs migrate when the shard map changes #5093
Conversation
Force-pushed 3f4124a to db76759, then db76759 to 1df2636.
```diff
@@ -108,11 +112,6 @@ get_worker_ref({DbName, DocId}) when is_binary(DbName), is_binary(DocId) ->
         nil
     end.

-% Cluster membership change notification callback
```
We don't need the cluster membership change notification any longer as the check_membership periodic check would take care of it. couch_replicator_db_changes will still initiate a rescan when cluster nodes connect/disconnect.
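A tiny Python sketch (the project itself is Erlang, so these names are illustrative, not CouchDB's actual API) of the idea in this comment: instead of reacting to membership-change notifications, the doc processor remembers the node set it last saw, and a periodic `check_membership` pass detects any difference and triggers a rescan.

```python
class MembershipChecker:
    """Illustrative sketch of a periodic membership check replacing an
    event-driven cluster-membership callback (hypothetical names)."""

    CHECK_INTERVAL_SEC = 60  # assumed interval; configurable in practice

    def __init__(self, get_live_nodes):
        # get_live_nodes: callable returning the currently connected nodes
        self.get_live_nodes = get_live_nodes
        self.known_nodes = frozenset(get_live_nodes())

    def check_membership(self):
        """Called periodically; returns True if the node set changed since
        the last check, meaning ownership should be re-evaluated."""
        current = frozenset(self.get_live_nodes())
        changed = current != self.known_nodes
        self.known_nodes = current
        return changed
```

Because the check is periodic, a missed or dropped notification can no longer leave a node with a stale view of the cluster; at worst, reacting to a change is delayed by one interval.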
That's a nice cleanup that removes most `handle_cast` clauses.
Playing with the new PR locally, I noticed that the separate couch_replicator clustering and db monitoring modules were no longer needed, and removed them. Added a separate
To test the job migration even when the local shard doesn't get updated, I used an N=5 db and then stopped one of the nodes where the job wasn't running. Then I noticed that the job correctly migrated to one of the remaining N=3 nodes. I used this silly script to create the replication job:

```bash
#!/bin/bash
# Cluster started with: make && ./dev/run --admin=adm:pass -n 5 --with-haproxy

S=localhost:5984
SA=adm:pass@${S}
JSON="Content-Type: application/json"

curl -s -XDELETE $SA/t > /dev/null
curl -s -XDELETE $SA/s > /dev/null
curl -s -XDELETE $SA/_replicator > /dev/null
sleep 1

echo "Creating _replicator db"
curl -f -s -XPUT $SA/_replicator'?n=5' > /dev/null
sleep 1

echo "Creating src and tgt dbs"
curl -f -s -XPUT $SA/t > /dev/null
curl -f -s -XPUT $SA/s > /dev/null

echo "Posting replication job"
curl -f -XPUT $SA/_replicator/r -H "$JSON" \
    -d '{"source":"http://adm:pass@localhost:5984/s", "target":"http://adm:pass@localhost:5984/t", "worker_processes":1, "continuous":true}'
```

As a result, my replication job ended up running on NODE4. I then stopped NODE2; that altered the hashing used in the ownership algorithm, and the new owner became NODE1.
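The ownership change observed here can be illustrated with a hypothetical rendezvous-style owner election in Python (the real owner function lives in the Erlang utils module and may hash differently; names below are made up): each live node gets a per-document hash score and the best score wins, so removing a node deterministically moves only that node's jobs to specific surviving nodes.

```python
import hashlib

def owner(db_name: str, doc_id: str, live_nodes: list[str]) -> str:
    """Hypothetical rendezvous-hash owner election for a replication doc.
    Not CouchDB's actual algorithm; illustrates why stopping one node can
    change the elected owner for a job."""
    def score(node: str) -> str:
        # Hash the (db, doc, node) triple; highest digest wins.
        return hashlib.md5(f"{db_name}/{doc_id}/{node}".encode()).hexdigest()
    return max(live_nodes, key=score)
```

With five nodes, a job elects one owner; after removing that owner from the live set, re-running the election picks a different surviving node, which is exactly the migration the periodic ownership check has to act on.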
Way to fix a nasty corner case and also reduce the amount of code and complexity at the same time. Brilliant work!
Previously, we didn't account for all the possible migration patterns; specifically, a replication job could migrate from node A to node C when we move a shard from A to B. The job would stop on node A; on node B the replicator would notice it's not the owner and ignore the change. Node C, however, would never experience any shard file changes, so it would never know it should start the replication job.

To fix the migration logic, pick a naive approach and make the replication doc processor retain the replication records not just for the jobs it owns, but also for the shard copies for which it could be a potential owner. Then, periodically, check ownership and start/stop jobs accordingly.

The difference from before is that there is now a new replicator doc processor state: `not_owner`. Docs in this state still track all the doc updates but otherwise stay inactive: they don't run jobs and they don't update replication documents on errors. Only if the node becomes the owner does it switch the record to the `initializing` state, and the job starts to run.

To avoid unnecessary duplicate job dueling while other nodes still haven't run their membership checks, the usurped jobs are stopped immediately during a membership check, but those that should run are started with a delay. That means that jobs which migrate may stop running for a minute or so, and then should start running on another node.

Additionally, we can remove the couch_replicator_db_changes module, which used to monitor cluster stability and restart the scanner on node connects and disconnects (membership checks happen in the doc processor). The only thing left from couch_replicator_db_changes was the couch_multidb subscription, which was moved to the scanner function in the doc processor. Since we removed couch_replicator_db_changes, we also don't need the cluster stability monitoring, so that module is removed as well. The owner function is moved to the utils module and its `unstable` return value eliminated.

The membership check in the doc processor will stop jobs immediately but delay their startup, so we get the same effect as before with the stability check. Previously, the couch_replicator_clustering module also applied an initial startup delay to allow the application to start properly before starting to scan for jobs, so to retain compatibility that setting was moved to the doc processor.

The new behavior should be covered by tests in the doc processor. Since the tests were updated, took the opportunity to switch the doc processor test suites to use the `TDEF_FE` macro, which removes a few redundant setup lines (begin/end ...).
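The stop-immediately/start-delayed behavior described above can be sketched in Python (hypothetical names; the real implementation is Erlang): on each membership check, records whose jobs the node no longer owns drop back to `not_owner` and stop at once, while newly owned records move to `initializing` and are only scheduled to start after a delay, shrinking the window in which two nodes might briefly run the same job.

```python
START_DELAY_SEC = 60  # assumed delay for illustration; configurable in practice

class DocRecord:
    """Sketch of a replication doc record kept even for docs we don't own."""
    def __init__(self, doc_id: str):
        self.doc_id = doc_id
        self.state = "not_owner"  # the new inactive state from this PR

def check_ownership(records, this_node, owner_fn):
    """Periodic ownership pass: returns (stopped_now, start_after_delay).
    owner_fn(doc_id) -> node that should run the job for doc_id."""
    to_stop, to_start_delayed = [], []
    for rec in records:
        owns = owner_fn(rec.doc_id) == this_node
        if rec.state != "not_owner" and not owns:
            rec.state = "not_owner"       # usurped: stop immediately
            to_stop.append(rec.doc_id)
        elif rec.state == "not_owner" and owns:
            rec.state = "initializing"    # start only after START_DELAY_SEC
            to_start_delayed.append(rec.doc_id)
    return to_stop, to_start_delayed
```

Because stops are immediate and starts are delayed, a job that migrates is briefly not running anywhere (a minute or so) rather than briefly running in two places.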
Force-pushed 2f68d18 to 9c701a6.