You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
As a bit of background, our current revision model is a standard tree. The biggest issue we've seen customers have with using CouchDB "normally" is when they have a work load that generates conflicts with regularity. This ends up creating revision trees with many thousands of revisions with no general bound on growth of the tree. Eventually a single document can take many seconds to minutes to update as the tree has to be read from disk, updated, and then written back to disk. The only effective solution to this currently is to have an operator manually purge revisions from the document.
The best solution I've come across to this issue would be to change our revision model to be a graph. Then instead of resolving conflicts by deleting a revision, the conflict is resolved by making an update that references two or more revisions. In this way a customer that generates a large number of conflicts can easily resolve the situation during their normal conflict resolution process. (Our stemming would change from keeping $revs_limit revisions for each leaf to doing a breadth first search and keeping all revisions for the depth that contains $revs_limit revisions or something.)
While this approach is fairly straight forward in theory, the difficult part is how we'd want to handle backwards compatibility with replication. So far I could see having a replicator that could translate new revision graphs to old revision trees by "undoing" merge changes and created the equivalent of the old "deleted revision" logic. However, I don't see a way that we could go from the old format back to the graph (ie, think about replicating a new style graph doc through an old CouchDB version back to a new graph doc and end up with the same doc).
Obviously, this is a long term goal but I do think we should start thinking about this and possibly making transition plans long term (assuming everyone thinks this is a good idea).
I had a couple questions for good reading material on this sort of stuff so I figured I'd post some links here:
These are a few links on graphs (specifically directed acyclic graphs which we'd be using). The DAG that everyone is most familiar with is most likely Git's commit history. In general terms our revisions-as-graph would mimic that sort of behavior although be much simpler since there would likely only be two operations for mutation: extend exisitng or merge 2 or more existing. There wouldn't be all of the rebasing type operations or anything like that because we're obviously using the DAG model for two very different applications.
Heres the two different searches I mentioned. The breadth-first search has a nifty animated gif to show what its doing. The depth first doesn't have one but there are a couple different pictures.
I'd note that the wiki pages tend to be a little dense and can be a bit of a difficult resource for someone trying to learn these concepts from scratch. For anyone completely new to these topics I'd google around for things like the Khan Academy. Pages like this one are a lot easier to digest:
In general the algorithms we'd need to implement are all fairly basic. We'll have to think about all of the operations we need, but a lot of this stuff can be implemented fairly easily with just a couple hash maps. The only hard part to all of this will be coordinating the global upgrade and maintaining backwards compatibility.
I finally had time to read the JSON CRDT paper. It is certainly very promising, but not production ready yet. The algorithm doesn’t account for a few unbounded lists happening. And it currently skips of required tombstone removal.
The biggest caveat so far is that it requires all replicas to eventually converge towards the same state. While that might be a good enough limitation for many cases, it is less general than what CouchDB supports today. The authors promise continued work and we should keep watching this closely.
oh, and some unaddressed inefficiencies converging long arrays/lists, as the only traversal operation is next() and not indexed access.
What I thought while reading this though is whether it might be feasible to have a rev-tree/graph per key inside a doc, rather than per doc.
The text was updated successfully, but these errors were encountered:
@janl:
The text was updated successfully, but these errors were encountered: