Fast-forward replication through transitive checkpoint analysis #3675
I think it might be doable if we record a few more bits in the checkpoint documents and change their shape a bit.
[1] Example checkpoint:
{
"_id": "_local/d99e532c1129e9cacbf7ed085deca509",
"_rev": "0-17",
"history": [
{
"doc_write_failures": 0,
"docs_read": 249,
"docs_written": 249,
"end_last_seq": "249-g1AAAACTeJzLYWBgYMpgTmHgz8tPSTV0MDQy1zMAQsMckEQiQ1L9____szKYE5tygQLsZiaGqWlpxpjKcRqRxwIkGRqA1H-oSeVgk5JMkkxNkg0xdWUBAJ5nJWc",
"end_time": "Wed, 21 Jul 2021 17:10:06 GMT",
"missing_checked": 253,
"missing_found": 249,
"recorded_seq": "249-g1AAAACTeJzLYWBgYMpgTmHgz8tPSTV0MDQy1zMAQsMckEQiQ1L9____szKYE5tygQLsZiaGqWlpxpjKcRqRxwIkGRqA1H-oSeVgk5JMkkxNkg0xdWUBAJ5nJWc",
"session_id": "dc645ae85a7c3fe6c3ac5da8e73077ce",
"start_last_seq": "228-g1AAAACTeJzLYWBgYMpgTmHgz8tPSTV0MDQy1zMAQsMckEQiQ1L9____szKYE0tzgQLsZiaGqWlpxpjKcRqRxwIkGRqA1H-oSflgk5JMkkxNkg0xdWUBAJgFJVI",
"start_time": "Wed, 21 Jul 2021 17:01:21 GMT"
},
...
],
"replication_id_version": 4,
"session_id": "dc645ae85a7c3fe6c3ac5da8e73077ce",
"source_last_seq": "249-g1AAAACTeJzLYWBgYMpgTmHgz8tPSTV0MDQy1zMAQsMckEQiQ1L9____szKYE5tygQLsZiaGqWlpxpjKcRqRxwIkGRqA1H-oSeVgk5JMkkxNkg0xdWUBAJ5nJWc"
}
[2] Unique, per-db-instance UUID on http $DB/mydb1:
{
...
"instance_start_time": "0",
"sizes": {
"external": 34,
"views": 0
},
"update_seq": "00000008d5c93d5a00000000",
"uuid": "ce0279e40045b4f7cd6cd4f60ffd3b3c"
}
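The history entries in the example checkpoint record only source-side positions (start_last_seq, end_last_seq, recorded_seq); none of them is a sequence on the target. One possible shape for the "few more bits", sketched in Python, assumes three hypothetical per-entry fields: source_uuid and target_uuid (the per-db-instance uuid shown in [2]) and target_seq (the target's update_seq once the batch was committed). None of these fields exists in CouchDB's checkpoint documents today, and the names are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical fields this sketch assumes would be added to each history entry.
# They are NOT part of CouchDB's replication checkpoint schema today.
TRANSITIVE_FIELDS = ("source_uuid", "target_uuid", "target_seq")


@dataclass(frozen=True)
class HistoryEntry:
    """One entry of a checkpoint's "history" list."""

    session_id: str
    recorded_seq: str                  # source sequence, kept as an opaque string
    source_uuid: Optional[str] = None  # hypothetical: uuid of the source db instance
    target_uuid: Optional[str] = None  # hypothetical: uuid of the target db instance
    target_seq: Optional[str] = None   # hypothetical: target update_seq after commit


def parse_history(checkpoint: dict) -> List[HistoryEntry]:
    """Read the "history" list of a _local checkpoint document."""
    return [
        HistoryEntry(
            session_id=h["session_id"],
            recorded_seq=h["recorded_seq"],
            source_uuid=h.get("source_uuid"),
            target_uuid=h.get("target_uuid"),
            target_seq=h.get("target_seq"),
        )
        for h in checkpoint.get("history", [])
    ]


def supports_transitive_analysis(checkpoint: dict) -> bool:
    """True only if there is history and every entry has all hypothetical fields."""
    entries = parse_history(checkpoint)
    return bool(entries) and all(
        getattr(entry, field) is not None
        for entry in entries
        for field in TRANSITIVE_FIELDS
    )
```

Run against a checkpoint shaped like the example, supports_transitive_analysis returns False, since no entry carries target_seq.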
Summary
I'd like to be able to choose the starting sequence for a replication between a given source and target using more information than just the replication history between those two databases. Specifically, I'd like to be able to use other replication checkpoint histories to discover transitive relationships that could be used to accelerate the first replication between CouchDB databases that share a common peer.
Desired Behaviour
It might be simplest to provide an example. Consider a system where you have a pair of cloud sites (call them us-east and us-west) and a series of edge locations (e.g. store1):
1. us-east and us-west are replicating with each other
2. store1 is pulling data from us-east
3. us-east experiences an outage, so we respond by initiating us-west -> store1
In the current version of CouchDB, the us-west -> store1 replication will start from 0 because those peers have no replication history between them. Going forward, it would be useful for us to recognize that us-west -> us-east has a history, and us-east -> store1 has a history, so we can fast-forward us-west -> store1 by analyzing the pair of those checkpoint histories to discover the maximum sequence on us-west guaranteed to have been observed on store1 (by way of us-east).
Possible Solution
I believe we actually already employ this transitive analysis for fast-forwarding internal replications between shard copies in a cluster, so we may be able to refactor some of that code to apply it more generally.
I'm not sure if we track the target sequence in the current external replication checkpoint schema. That's essential for this analysis to work.
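To see why the target sequence matters, here is a minimal Python sketch of the first-order analysis for peers X -> Y -> Z (in the example, us-west -> us-east -> store1). It is not the internal-replication code mentioned above. It assumes sequences are comparable integers (real CouchDB sequences are opaque strings) and that each checkpoint entry is a hypothetical (source_seq, target_seq) pair: an X -> Y entry (s, t) means Y, at its update sequence t, already held every change X made up to s, and a Y -> Z entry with source sequence u means Z holds every change Y made up to u. It also ignores documents that are updated again on Y and therefore move to a later sequence there, so it illustrates the idea rather than a safe rule.

```python
from typing import Iterable, Tuple

# Hypothetical checkpoint entry: (sequence on the hop's source, sequence on its target).
SeqPair = Tuple[int, int]


def fast_forward_seq(x_to_y: Iterable[SeqPair], y_to_z: Iterable[SeqPair]) -> int:
    """Pick a start sequence on X for a first-time X -> Z replication.

    x_to_y: entries (x_seq, y_seq): Y at y_seq held all of X's changes up to x_seq.
    y_to_z: entries (y_seq, z_seq): Z held all of Y's changes up to y_seq.
    Both histories must describe the same instance of Y (same uuid).
    Returns 0, i.e. a full replication, when nothing can be inferred.
    """
    # Highest Y sequence that Z is known to have caught up to.
    covered_on_y = max((y_seq for y_seq, _ in y_to_z), default=None)
    if covered_on_y is None:
        return 0
    # The latest X checkpoint whose Y position Z has already seen is safe to skip to.
    return max(
        (x_seq for x_seq, y_seq in x_to_y if y_seq <= covered_on_y),
        default=0,
    )
```

With illustrative histories, fast_forward_seq([(100, 340), (180, 410), (250, 520)], [(300, 90), (450, 150)]) returns 180: store1 has caught up to us-east sequence 450, and the latest us-west checkpoint whose us-east position (410) is at or below 450 is us-west sequence 180. The second element of each X -> Y entry is the target sequence; without it there is nothing to compare against the Y -> Z history.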
There's nothing fundamental that limits the analysis to first-order transitive relationships. One could build out an entire graph. I'm not sure the extra complexity that would bring is worth it in a first pass.
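Going beyond first-order relationships could start with walking a chain of hops backwards, as in this Python sketch under the same assumptions (integer sequences, hypothetical (source_seq, target_seq) checkpoint entries, and a single consistent instance of every intermediate peer). For a path P0 -> P1 -> ... -> Pn it carries the highest sequence on each peer known to have reached Pn back towards P0; with two hops it reduces to the first-order analysis described in the example.

```python
from typing import Sequence, Tuple

# Hypothetical checkpoint entry: (sequence on the hop's source, sequence on its target).
SeqPair = Tuple[int, int]


def fast_forward_along_path(hops: Sequence[Sequence[SeqPair]]) -> int:
    """Start sequence on P0 for a first-time P0 -> Pn replication.

    hops[k] is the checkpoint history of P(k) -> P(k+1); entries are
    (seq on P(k), seq on P(k+1)). Returns 0 when nothing can be inferred.
    """
    if not hops:
        return 0
    # Pn has every change of P(n-1) up to the highest checkpointed source sequence.
    covered = max((src for src, _ in hops[-1]), default=0)
    # Walk back towards P0: on each peer, keep the latest checkpoint whose
    # downstream position is already covered.
    for history in reversed(hops[:-1]):
        covered = max((src for src, dst in history if dst <= covered), default=0)
    return covered
```

For a full graph, one would run this over each candidate path from the new source to the new target and keep the largest result; enumerating and pruning those paths is likely where most of the extra complexity would come from.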
Additional context
Proposing this enhancement after chatting with a user who is planning this kind of deployment and would benefit from it.