Enable replicating purge requests between nodes #4064

Merged
nickva merged 2 commits into main from replicate-purges on Jun 17, 2022

Conversation

nickva (Contributor) commented on Jun 13, 2022

It seems we had forgotten to enable it, and in the case where a node is offline when a clustered purge request is issued and then re-joins, it won't know to process any purge requests issued to other nodes.
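For reference, the enablement itself is just a default configuration change (see the `rel/overlay/etc/default.ini` review thread below). A minimal sketch of the kind of setting involved, assuming the option is a `replicate_purges` flag in the `[mem3]` section; the exact section and key are not shown in this thread, so treat them as assumptions:

```ini
; Sketch only: assumed section/key names, not the exact diff from this PR.
[mem3]
; Let the internal replicator also replicate purge requests between nodes, so a
; node that was offline during a clustered purge catches up when it re-joins.
replicate_purges = true
```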

After enabling it, @jjrodrig noticed that sometimes a duplicate UUID purge error was thrown. That's because there is now a chance of a race between the internal replicator and the interactive request when they both handle the exact same purge request. So, instead of throwing a duplicate UUID error, consider a duplicate purge request a no-op; that's the second commit in the PR.

nickva requested a review from jjrodrig on June 13, 2022 22:09
Review thread on rel/overlay/etc/default.ini (outdated, resolved)
nickva (Contributor, Author) commented on Jun 14, 2022

@jjrodrig, also thanks for pointing out an issue related to a race between the internal replicator and the interactive requests.

In a test cluster, with this option enabled, there is a good chance of getting a duplicate UUID badreq error from:

```erlang
IsRepl = lists:member(replicated_changes, Options),
if
    IsRepl ->
        ok;
    true ->
        UUIDs = [UUID || {UUID, _, _} <- UUIDsIdsRevs2],
        lists:foreach(
            fun(Resp) ->
                if
                    Resp == not_found ->
                        ok;
                    true ->
                        Fmt = "Duplicate purge info UIUD: ~s",
                        Reason = io_lib:format(Fmt, [element(2, Resp)]),
                        throw({badreq, Reason})
                end
            end,
            get_purge_infos(Db, UUIDs)
        )
end,
```

This looks to be a race between the internal replicator and the interactive request workers: if the internal replicator wins, the interactive request throws the error.

The fix is probably to ignore the request if it is already present on the node. These UUIDs are generated on the cluster, not by the user, so if we have already seen the UUID it means we have already processed it.
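A minimal sketch of that idea, reusing the `get_purge_infos/2` call from the snippet above (the wrapper name `filter_already_applied/2` is made up for illustration, not the merged code):

```erlang
% Sketch only: drop purge requests whose UUID this node has already seen,
% instead of throwing {badreq, ...}. Assumes get_purge_infos/2 returns results
% in request order, with not_found for UUIDs the node has never processed.
filter_already_applied(Db, UUIDsIdsRevs) ->
    UUIDs = [UUID || {UUID, _Id, _Revs} <- UUIDsIdsRevs],
    Known = get_purge_infos(Db, UUIDs),
    [Req || {Req, not_found} <- lists:zip(UUIDsIdsRevs, Known)].
```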

It seems we had forgotten to enable it, and in the case where a node is offline
when a clustered purge request is issued and then re-joins, it won't know to
process any purge requests issued to other nodes.
davisp (Member) left a comment


If memory serves, this was a belt-and-suspenders assertion to prevent us from overwriting a UUID in the purge tree, which would have correctness issues. Obviously, given that UUIDs are generated fresh internally in Fabric, the likelihood of having the same UUID used multiple times for different {Id, Revs} purge requests would be nil; however, not checking means we leave open the possibility.

The general proposed filtering approach of just ignoring any previously seen UUID is fine, but I would add a check that the {Id, Revs} portion of the request matches, to ensure we don't allow that particular bug (which would likely only ever be triggerable by operator intervention).
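Something along those lines might look like the sketch below (illustrative names and stored-info shape, not the merged diff): treat an exact match as a no-op, but still reject a known UUID that arrives with a different `{Id, Revs}` pair.

```erlang
% Sketch only: a request is a no-op only when the stored purge info carries the
% same UUID, Id, and (sorted) Revs; a known UUID with different Id/Revs is
% still rejected. The {UUID, Id, StoredRevs} shape is an assumption.
check_duplicate({_UUID, _Id, _Revs}, not_found) ->
    new;
check_duplicate({UUID, Id, Revs}, {UUID, Id, StoredRevs}) ->
    case lists:usort(Revs) == lists:usort(StoredRevs) of
        true -> duplicate;  % exact same purge request: safe to ignore
        false -> throw({badreq, "UUID reused with different revs"})
    end;
check_duplicate({UUID, _Id, _Revs}, _Other) ->
    Reason = io_lib:format("UUID ~s reused for a different purge request", [UUID]),
    throw({badreq, Reason}).
```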

nickva (Contributor, Author) commented on Jun 16, 2022

@davisp good idea to check Id and Revs!

nickva force-pushed the replicate-purges branch 3 times, most recently from 2f6975d to e32ac2a, on June 16, 2022 21:36
Review thread on src/couch/src/couch_db.erl (outdated, resolved)
nickva force-pushed the replicate-purges branch 2 times, most recently from c7f07a9 to 2e20fe6, on June 16, 2022 22:40
After enabling default purge request replication, it was possible to get a
`badreq` error because the internal replicator would race with the interactive
worker updates. Since we're dealing with cluster-generated UUIDs, if we see the
same UUID, DocId, and Revs again, assume the purge was already applied and ignore it.

For belt-and-suspenders protection, ensure that revs lists don't have duplicates
and are sorted. Do it both in fabric and couch_db in case an operator wants to
do local purges. For extra safety, also keep sorting the revs in the duplicate
check for the rare case that the existing purge btree already contains duplicate
revs. Eventually that check could be removed once all the new purge revs lists
are de-duplicated and sorted.

As a bonus, if it turns out all the purges were already applied, simply return and
avoid calling the updater process, which can be an expensive bottleneck.
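A rough sketch of those two pieces (assumed helper names; `filter_already_applied/2` is the hypothetical filter sketched earlier, and `apply_purges/2` stands in for the real updater call): normalize each revs list with `lists:usort/1` up front, then return early when every request turns out to be a duplicate so the updater process is never contacted.

```erlang
% Sketch only: de-duplicate and sort each revs list, then skip the (potentially
% expensive) updater call entirely when nothing is left to apply.
purge_docs_sketch(Db, UUIDsIdsRevs) ->
    Normalized = [{UUID, Id, lists:usort(Revs)} || {UUID, Id, Revs} <- UUIDsIdsRevs],
    case filter_already_applied(Db, Normalized) of
        [] -> {ok, []};                       % everything already purged: no-op
        ToApply -> apply_purges(Db, ToApply)  % placeholder for the updater call
    end.
```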
davisp (Member) left a comment


+1

nickva merged commit 63e0983 into main on Jun 17, 2022
nickva deleted the replicate-purges branch on June 17, 2022 18:48
jjrodrig (Contributor) commented
Thanks @nickva and @davisp
