Make multidb changes shard map aware #4962
Merged
The `couch_multidb_changes` module monitors shards whose names match a particular suffix, and notifies users with found, updated, and deleted events. This is the module which drives replicator jobs when `*/_replicator` databases are updated.

Previously, `couch_multidb_changes` reacted only to node-local shard file events and was not aware of the shard map membership of those files. This was most evident during shard moves: the target shard could be created long before the shard file became part of the shard map. The replicator could notice the new target shard file and spawn a replication job on the new node, while keeping the same replication job running on the source node. The two replication jobs would eventually conflict in the pg system (https://www.erlang.org/doc/man/pg.html) and one of them would start crashing with a "duplicate job" error. This could last for days, depending on how long it took to populate the data on the target. Even after recovering, the target shard could be backed off for up to another 8 hours before it was allowed to run again.
To avoid issues like that, make `couch_multidb_changes` aware of shard map membership updates. When a shard file is discovered and it is not in the shard map, mark it with a `wait_shard_map = true` flag. Then, re-use the existing db event monitoring mechanism to notice when the shards db is updated, and schedule a delayed membership check for the shards tracked in our ETS table.

Other changes to the module are mostly cosmetic:
- Remove the unused `created` callback. `db_found` is used instead, both when dbs are created and during startup when they are discovered.
- In the ETS table, use a proper `#row{}` record since we now have 5 items in the tuple. This simplifies some of the existing code as well.
- During deletion and creation, actually delete the entries from the ETS table. Previously we didn't, so they would hang around forever until the node was restarted.
- Add comments to a few tricky sections explaining what should be happening there.
- Add more tests, for both the old and new functionality. Increase coverage from 96% to 98%.
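The wait-and-recheck flow described above can be sketched roughly as follows. This is a language-agnostic illustration in Python, not the actual implementation: the real module is Erlang, uses an ETS table rather than a dict, and wires the delayed re-check through timers and db event notifications, all elided here. `Tracker`, `Row`, and the field names are invented for illustration and do not match the real `#row{}` record fields.

```python
from dataclasses import dataclass

@dataclass
class Row:
    # Stand-in for the #row{} record; field names here are hypothetical.
    db_name: str
    wait_shard_map: bool = False

class Tracker:
    def __init__(self, shard_map_members):
        self.shard_map = set(shard_map_members)  # current shard map membership
        self.rows = {}                           # stand-in for the ETS table

    def db_found(self, db_name):
        # On discovery, a shard not yet in the shard map is marked with
        # wait_shard_map = True instead of being reported immediately.
        in_map = db_name in self.shard_map
        self.rows[db_name] = Row(db_name, wait_shard_map=not in_map)
        return in_map  # True => notify users right away

    def db_deleted(self, db_name):
        # Actually remove the entry so it does not linger until restart.
        self.rows.pop(db_name, None)

    def shards_db_updated(self, new_members):
        # Invoked (after a delay, in the real module) when the shards db
        # changes: re-check membership for the rows that are waiting.
        self.shard_map = set(new_members)
        ready = []
        for row in self.rows.values():
            if row.wait_shard_map and row.db_name in self.shard_map:
                row.wait_shard_map = False
                ready.append(row.db_name)
        return ready  # shards that can now be reported as found
```

For example, a freshly copied target shard is held back until a shards db update shows it has joined the shard map, at which point the delayed membership check releases it:

```python
t = Tracker({"shards/00000000-7fffffff/db1.1700000000"})
t.db_found("shards/80000000-ffffffff/db1.1700000000")  # not in map yet: waits
t.shards_db_updated({"shards/00000000-7fffffff/db1.1700000000",
                     "shards/80000000-ffffffff/db1.1700000000"})
# -> ["shards/80000000-ffffffff/db1.1700000000"]
```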