Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First class conflict handling #1507

Open
wohali opened this issue Aug 7, 2018 · 0 comments
Open

First class conflict handling #1507

wohali opened this issue Aug 7, 2018 · 0 comments

Comments

@wohali
Copy link
Member

wohali commented Aug 7, 2018

@janl:

As a bit of background, our current revision model is a standard tree. The biggest issue we've seen customers have with using CouchDB "normally" is when they have a work load that generates conflicts with regularity. This ends up creating revision trees with many thousands of revisions with no general bound on growth of the tree. Eventually a single document can take many seconds to minutes to update as the tree has to be read from disk, updated, and then written back to disk. The only effective solution to this currently is to have an operator manually purge revisions from the document.

The best solution I've come across to this issue would be to change our revision model to be a graph. Then instead of resolving conflicts by deleting a revision, the conflict is resolved by making an update that references two or more revisions. In this way a customer that generates a large number of conflicts can easily resolve the situation during their normal conflict resolution process. (Our stemming would change from keeping $revs_limit revisions for each leaf to doing a breadth first search and keeping all revisions for the depth that contains $revs_limit revisions or something.)

While this approach is fairly straight forward in theory, the difficult part is how we'd want to handle backwards compatibility with replication. So far I could see having a replicator that could translate new revision graphs to old revision trees by "undoing" merge changes and created the equivalent of the old "deleted revision" logic. However, I don't see a way that we could go from the old format back to the graph (ie, think about replicating a new style graph doc through an old CouchDB version back to a new graph doc and end up with the same doc).

Obviously, this is a long term goal but I do think we should start thinking about this and possibly making transition plans long term (assuming everyone thinks this is a good idea).

I had a couple questions for good reading material on this sort of stuff so I figured I'd post some links here:

http:https://docs.couchdb.org/en/latest/replication/conflicts.html

This is an intro to our current revision handling.

http:https://stackoverflow.com/questions/2283757/can-someone-explain-in-simple-terms-to-me-what-a-directed-acyclic-graph-is
http:https://ericsink.com/vcbe/html/directed_acyclic_graphs.html
https://en.wikipedia.org/wiki/Directed_acyclic_graph

These are a few links on graphs (specifically directed acyclic graphs which we'd be using). The DAG that everyone is most familiar with is most likely Git's commit history. In general terms our revisions-as-graph would mimic that sort of behavior although be much simpler since there would likely only be two operations for mutation: extend exisitng or merge 2 or more existing. There wouldn't be all of the rebasing type operations or anything like that because we're obviously using the DAG model for two very different applications.

https://en.wikipedia.org/wiki/Breadth-first_search
https://en.wikipedia.org/wiki/Depth-first_search

Heres the two different searches I mentioned. The breadth-first search has a nifty animated gif to show what its doing. The depth first doesn't have one but there are a couple different pictures.

I'd note that the wiki pages tend to be a little dense and can be a bit of a difficult resource for someone trying to learn these concepts from scratch. For anyone completely new to these topics I'd google around for things like the Khan Academy. Pages like this one are a lot easier to digest:

https://www.khanacademy.org/computing/computer-science/algorithms/graph-representation/a/describing-graphs

In general the algorithms we'd need to implement are all fairly basic. We'll have to think about all of the operations we need, but a lot of this stuff can be implemented fairly easily with just a couple hash maps. The only hard part to all of this will be coordinating the global upgrade and maintaining backwards compatibility.

I finally had time to read the JSON CRDT paper. It is certainly very promising, but not production ready yet. The algorithm doesn’t account for a few unbounded lists happening. And it currently skips of required tombstone removal.

The biggest caveat so far is that it requires all replicas to eventually converge towards the same state. While that might be a good enough limitation for many cases, it is less general than what CouchDB supports today. The authors promise continued work and we should keep watching this closely.

oh, and some unaddressed inefficiencies converging long arrays/lists, as the only traversal operation is next() and not indexed access.

What I thought while reading this though is whether it might be feasible to have a rev-tree/graph per key inside a doc, rather than per doc.

@wohali wohali added this to In Discussion in Roadmap Aug 7, 2018
@wohali wohali moved this from Proposed for 3.x to Proposed (release independent) in Roadmap Jul 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Roadmap
  
Proposed (backlog)
Development

No branches or pull requests

1 participant