
Recovering databases with purges not replicated after nodes in maintenance_mode #4844

Bolebo opened this issue Nov 13, 2023 · 6 comments

Bolebo commented Nov 13, 2023

Description

I have a cluster with 6 nodes and large databases (> 220M documents).
Each database sees many creations, updates and purges (no deletions). Exceptionally, I left my nodes in maintenance_mode for a long time (16h) and discovered that purges were not replicated to the nodes that had maintenance_mode set to true.
After a quick search, I found issue #2139, which matches my problem.

I've upgraded a test environment to v3.3.2 and the issue is indeed fixed for new purges. But is there a way to recover the old purges, and how should I do it?

Thanks for your support.

Steps to Reproduce

  • Put a node in maintenance mode
  • Purge a document
  • Take the node out of maintenance mode
  • Migrate to 3.3.2
  • The purge is still not replicated

Expected Behaviour

Purged documents are removed from every node.

Your Environment

  • CouchDB version used: 3.2.3
  • Browser name and version: not relevant
  • Operating system and version: Ubuntu 18.04

Additional Context

nickva commented Nov 13, 2023

Thanks for reaching out @Bolebo.

I think the issue is that the internal replicator had already checkpointed that it synchronized the shards, so it won't re-apply those changes.

One way to force it to go through all changes is to delete the internal replicator checkpoints and then force a resync. Your database is quite large, so it may take a while to resync. Before attempting it, try it on a small db with the same issue if you have one, and perhaps make a backup. At the very least, make sure the db is not being accessed or serving traffic while the resync takes place (most changes except purges will already be there, so it's mostly just the time it takes to run through the changes feed and make calls to the target to compute revision differences).

Some details on how to do it:

  • Internal replication checkpoints are saved on each shard as it replicates that shard to its other (usually 2) copies.

  • Access individual db shards via the node-local API: /_node/$node (or /_node/_local, which accesses them on whichever node serves the request):

Example:

% http $DB/_node/_local/_all_dbs
HTTP/1.1 200 OK

[
    "_dbs",
    "_nodes",
    "_users",
    "shards/00000000-7fffffff/_replicator.1699890258",
    "shards/00000000-7fffffff/_users.1699890258",
    "shards/00000000-7fffffff/mydb.1699890412",
    "shards/80000000-ffffffff/_replicator.1699890258",
    "shards/80000000-ffffffff/_users.1699890258",
    "shards/80000000-ffffffff/mydb.1699890412"
]

This shows mydb is a Q=2 db with 2 shard ranges on this node. There are 3 nodes in total, each with 2 shards, for a total of 6 shard copies. Your database presumably has a larger Q value and may have a different N value.
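
As a side note, the clustered _shards endpoint shows which nodes hold which range of a db; a sketch of what that looks like (the node names below are just placeholders):

% http $DB/mydb/_shards
HTTP/1.1 200 OK

{
    "shards": {
        "00000000-7fffffff": [
            "couchdb@node1",
            "couchdb@node2",
            "couchdb@node3"
        ],
        "80000000-ffffffff": [
            "couchdb@node1",
            "couchdb@node2",
            "couchdb@node3"
        ]
    }
}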

  • Internal replicator checkpoints are _local documents in each individual shard:

For instance:

 % http $DB/_node/_local/shards%2f00000000-7fffffff%2fmydb.1699890412/_local_docs
HTTP/1.1 200 OK

{
    "offset": null,
    "rows": [
        {
            "id": "_local/shard-sync-NLHBA2jfFMselm5Rg2Pw-g-eyIod-CGU1zju2fkN_sj9g",
            "key": "_local/shard-sync-NLHBA2jfFMselm5Rg2Pw-g-eyIod-CGU1zju2fkN_sj9g",
            "value": {
                "rev": "0-1"
            }
        },
        {
            "id": "_local/shard-sync-NLHBA2jfFMselm5Rg2Pw-g-scr4eZmJPaM_c5S5WoCs0w",
            "key": "_local/shard-sync-NLHBA2jfFMselm5Rg2Pw-g-scr4eZmJPaM_c5S5WoCs0w",
            "value": {
                "rev": "0-1"
            }
        },
        {
            "id": "_local/shard-sync-eyIod-CGU1zju2fkN_sj9g-NLHBA2jfFMselm5Rg2Pw-g",
            "key": "_local/shard-sync-eyIod-CGU1zju2fkN_sj9g-NLHBA2jfFMselm5Rg2Pw-g",
            "value": {
                "rev": "0-1"
            }
        },
        {
            "id": "_local/shard-sync-scr4eZmJPaM_c5S5WoCs0w-NLHBA2jfFMselm5Rg2Pw-g",
            "key": "_local/shard-sync-scr4eZmJPaM_c5S5WoCs0w-NLHBA2jfFMselm5Rg2Pw-g",
            "value": {
                "rev": "0-1"
            }
        }
    ],
    "total_rows": null
}

It's a Q=2, N=3 db, so this shard has 2 copies on other nodes. There is a checkpoint for replication to and from each copy, for a total of 4 checkpoints in this shard. Each shard copy should have 4 checkpoint local docs. Note: these local checkpoint docs can only be reliably accessed via the node-local API, not via the regular clustered API (mydb/_local_docs).

  • To delete a checkpoint local document, issue a doc delete request:
% http delete $DB/_node/_local/shards%2f00000000-7fffffff%2fmydb.1699890412/_local/shard-sync-scr4eZmJPaM_c5S5WoCs0w-NLHBA2jfFMselm5Rg2Pw-g
HTTP/1.1 200 OK

{
    "id": "_local/shard-sync-scr4eZmJPaM_c5S5WoCs0w-NLHBA2jfFMselm5Rg2Pw-g",
    "ok": true,
    "rev": "0-0"
}

Note: I had to URL-escape / as %2f in the db name, and since these are _local docs there is no need to specify a revision.

You'd do this on every node, for every shard copy. Also, ensure there aren't any requests or writes to the database during that time.
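
To avoid doing it by hand, here is a rough loop (just a sketch: it assumes httpie and jq are available and uses the shard name from the example above; run the equivalent on each node):

# list the _local docs in the shard and keep only the shard-sync checkpoints
SHARD='shards%2f00000000-7fffffff%2fmydb.1699890412'
for doc in $(http $DB/_node/_local/$SHARD/_local_docs | \
        jq -r '.rows[].id | select(startswith("_local/shard-sync-"))'); do
    # delete each checkpoint (no revision needed for _local docs)
    http delete $DB/_node/_local/$SHARD/$doc
done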

  • When done, force the db shards to resync with a POST request to mydb/_sync_shards:

Example:

http post $DB/mydb/_sync_shards
HTTP/1.1 202 Accepted

{
    "ok": true
}

This should hopefully force all your purges to replicate between your nodes.
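
To check whether it worked, one option (a sketch; the node names are placeholders and the shard name is from the example above) is to compare the doc_count of the same shard on each node via the node-local API. Once the purges have replicated, the numbers should match:

for node in couchdb@node1 couchdb@node2 couchdb@node3; do
    # db info for this shard copy, as seen by that node
    http $DB/_node/$node/shards%2f00000000-7fffffff%2fmydb.1699890412 | jq .doc_count
done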

Bolebo commented Nov 14, 2023

First of all, thank you for your quick answer. As this is a production issue, it is much appreciated!

I tried your solution in a test environment, but it didn't work as expected.
I managed to identify the corrupted shards and worked on only one of them to save time (my database is q=128 with 220M docs).
Here was the state (number of documents in each shard copy) before taking action:

Shard shards%2Fdc000000-ddffffff%2Factivity-logs-v2.1696865462 is corrupted:

{
  "couchdb@Vm-TIS-Pxp-couchdb0": 1804869,
  "couchdb@Vm-TIS-Pxp-couchdb1": 1804869,
  "couchdb@Vm-TIS-Pxp-couchdb5": 1807865
}

Clearly, the node couchdb5 was the one that "missed" 2996 purges during maintenance mode.

After applying your method I ended up with this:

"shards%2Fdc000000-ddffffff%2Factivity-logs-v2.1696865462": {
    "couchdb@Vm-TIS-Pxp-couchdb0": 1807865,
    "couchdb@Vm-TIS-Pxp-couchdb1": 1807865,
    "couchdb@Vm-TIS-Pxp-couchdb5": 1807865
  }

The nodes are now consistent, but the purged documents have been reintroduced into the database (I expected 1804869 docs on each node).

Perhaps I did something wrong.
For example, I've noticed that I have other local docs on the shards, such as:

  • _local/purge-mem3-* documents
  • _local/purge-mrview-* documents

Should I remove them too?

If you have another idea, it is very welcome.

nickva commented Nov 15, 2023

Thanks for trying @Bolebo. Sorry it didn't work.

Looking at the difference in the number of docs, and that it is more than 1000, I wonder if the purge requests have already been removed by compaction. We only keep up to purged_infos_limit = 1000 purge requests on a node (https://docs.couchdb.org/en/stable/api/database/misc.html#put--db-_purged_infos_limit). If at some point they weren't replicated, while the nodes were in maintenance, compaction would have run on the nodes and removed most of those purge requests, leaving only the last 1000 of them.
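
The current limit can be checked (and, before a large purge campaign, raised) on the clustered db via the endpoint linked above; for example:

% http $DB/mydb/_purged_infos_limit
HTTP/1.1 200 OK

1000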

On main (pending the 3.4 release) there will be a new API endpoint, _purged_infos, to fetch and inspect the current purge infos on a cluster, but it's not in any release yet. #4703

For example, I've noticed that I have other local docs on shards such as: _local/purge-mem3-* documents

That could be worth trying! Perhaps try it on a test instance first. So: delete the _local/purge-mem3-* documents on all copies of a particular shard range, and then force another shard sync with a POST to _sync_shards. But if the theory about purged_infos_limit is correct, that may purge only the last 1000 docs.
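
The earlier checkpoint-deletion loop could be adapted for that (same sketch, same assumptions about httpie, jq and the shard name):

SHARD='shards%2f00000000-7fffffff%2fmydb.1699890412'
for doc in $(http $DB/_node/_local/$SHARD/_local_docs | \
        jq -r '.rows[].id | select(startswith("_local/purge-mem3-"))'); do
    http delete $DB/_node/_local/$SHARD/$doc
done
# then force another sync
http post $DB/mydb/_sync_shards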

If this doesn't work, the simplest approach may be to just re-purge all the docs which should have been purged. It's safe to issue purges for doc revisions which are already purged; they'll just be processed by all the views, and during compaction only the last 1000 will remain.

You probably already know this, but if you re-purge all the missed docs, pay attention to the max_document_id_number = 100 and max_revision_number = 1000 config values. Those restrict the max number of documents per purge request and the max number of revisions per document: https://docs.couchdb.org/en/stable/config/misc.html#purge. So you can either raise those values or write a script that posts small enough batches.
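
As an illustration of that last option, a minimal sketch (it assumes a hypothetical file purges.json shaped like {"doc-id": ["rev", ...], ...} listing everything to re-purge, and posts one document per request to stay well under the default limits):

# iterate over the doc ids in purges.json and re-purge them one at a time
jq -r 'keys[]' purges.json | while read -r id; do
    jq -c --arg id "$id" '{($id): .[$id]}' purges.json | \
        http post $DB/mydb/_purge Content-Type:application/json
done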

Bolebo commented Nov 15, 2023

Thank you for your answer.
Despite all my attempts, all the purge instructions seem to be lost.

I have one final thing to try:

  • I know that the problem comes from the purge instructions
  • I know which shard my problem is in (in my example, the shard on node 5)

Is there any risk or side effect you know about if I simply delete the shard file physically on node 5? I expect it to resynchronize from the 2 remaining nodes, without the documents that were already purged on those 2 nodes.

Thanks in advance for your (final) answer.

nickva commented Nov 15, 2023

Despite all my attempts, all the purge instructions seem to be lost.

Just curious, did you try re-purging them? It seems like that should have worked. But I can understand that it would be tricky to figure out which ones to re-purge. Internally, purge requests get a uuid assigned to them, so new ones, even for the same doc_id and rev, will be processed as new requests. But I can see that if you know the exact shard with the issue, your resyncing idea is much easier.

Is there any risk or side effect you know about if I simply delete the shard file physically on node 5? I expect it to resynchronize from the 2 remaining nodes, without the documents that were already purged on those 2 nodes.

That should work. To practice, try it on a test instance first, just in case. But that's the standard recovery path if a shard copy is lost or corrupted - it gets resynced from the remaining copies.
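
Roughly, on the affected node (just a sketch: the data directory, service name and exact file path depend on your installation; stopping the node before touching files is the safe way to do it):

# on couchdb5: stop the node, remove the bad shard copy, restart
systemctl stop couchdb
rm /opt/couchdb/data/shards/dc000000-ddffffff/activity-logs-v2.1696865462.couch
systemctl start couchdb

# then, from any node, trigger the resync
http post $DB/activity-logs-v2/_sync_shards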

Bolebo commented Nov 16, 2023

Unfortunately, I can't re-purge them because I simply don't know which documents were purged (purges are triggered by end users).
I will therefore remove the "corrupted" shard files, even if identifying them is not obvious for every database concerned.

Thank you for your support. It was very useful to have a way to inspect individual shard properties/documents on each node! To my knowledge, it is not documented, but it is essential for this kind of analysis.

Best regards,
Boris
