Conflicted shards #4849

Open
nono opened this issue Nov 15, 2023 · 8 comments

Comments

nono commented Nov 15, 2023

Description

We see messages like this in the CouchDB logs:

4 conflicted shards in cluster

We are using many small databases created with a single shard (q=1). We don't know which shards are in this state, nor what we can do about that.

Steps to Reproduce

We don't know how to reproduce.

Expected Behaviour

Ideally, CouchDB would avoid creating conflicted shards. At the least, we would expect some documentation on what to do when it happens.

Your Environment

{
  "couchdb": "Welcome",
  "version": "3.2.2",
  "git_sha": "d5b746b7c",
  "uuid": "b3ac19e755fb8720ec0be49b61088909",
  "features": [
    "access-ready",
    "partitioned",
    "pluggable-storage-engines",
    "reshard",
    "scheduler"
  ],
  "vendor": {
    "name": "The Apache Software Foundation"
  }
}
# cat /etc/debian_version
11.7
# cat /etc/apt/sources.list.d/couchdb.list
deb https://apache.jfrog.io/artifactory/couchdb-deb/ bullseye main

Additional Context

Contributor

nickva commented Nov 21, 2023

These conflicted shards are typically benign. They result from concurrent updates to _dbs, which happens when the same database gets created concurrently and there is maybe some network or temporary partition delay. After the _dbs db gets replicated (it replicates in a ring 0 -> 1 ... -> N-1 -> N -> 0), the cluster will use the winning revision of the database.

To mitigate or stop the issue, you can find the conflicted dbs and delete the conflicted revision. So if 2-abc... and 2-def are conflicted and 2-def is the actively used shard map, you can delete 2-abc by getting its rev and then issuing a delete with rev = 2-abc.
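As a rough illustration of the step described above, the following Python sketch turns a document's losing revisions into deletion stubs. It assumes the document was fetched with its `_conflicts` array included (the standard CouchDB field listing the non-winning revisions); the function name and the example document are hypothetical, and the revision strings are just the placeholders used in the comment.

```python
def conflict_delete_stubs(doc):
    """Build one deletion stub per losing revision listed in doc["_conflicts"].

    The winning revision (doc["_rev"]) is left untouched.
    """
    return [
        {"_id": doc["_id"], "_rev": rev, "_deleted": True}
        for rev in doc.get("_conflicts", [])
    ]


# Hypothetical shard map document: 2-def is the winner, 2-abc the conflict.
example = {"_id": "mydb", "_rev": "2-def", "_conflicts": ["2-abc"]}
stubs = conflict_delete_stubs(example)
```

Each stub deletes only the named conflicting revision, so it is worth checking that the winning revision really is the shard map in use before sending anything.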

@nickva nickva closed this as completed Nov 21, 2023
@nickva nickva reopened this Nov 21, 2023
Contributor

nickva commented Nov 21, 2023

(sorry closed by accident)

Contributor

sblaisot commented Nov 22, 2023

Thanks for your response @nickva

Finding the conflicted DB seems like a hard task in a million-db cluster with a medium db creation/deletion rate.

Maybe you have some hints on how to find them?

Member

rnewson commented Nov 22, 2023

In a remote shell to any node in the cluster, run custodian:report(). and you'll get a list of database names with their conflicted shard count.

Author

nono commented Nov 27, 2023

We used https://docs.couchdb.org/en/stable/replication/conflicts.html#finding-conflicted-documents-with-mango to find the conflicted databases. One interesting thing is that we have conflicts for _replicator and _users.

Author

nono commented Nov 27, 2023

> To mitigate or stop the issue you can find the conflicted dbs and delete the conflicted revision. So if 2-abc... and 2-def are conflicted and 2-def is the actively used shard map, you can delete 2-abc by getting its rev and then issue a delete with rev = 2-abc

How can we do that?

_dbs is not a normal database, and most requests fail with {"error":"not_found","reason":"Database does not exist."}. For example, curl -s "$COUCH_URL/_dbs/_replicator?meta=true&open_revs=all" fails this way. The same goes for _bulk_docs.

Member

rnewson commented Nov 27, 2023

The "4 conflicted shards in cluster" message refers to conflicts within the meta _dbs documents that define where the shards of databases should be; it is not reporting on conflicted documents within your regular databases.

The custodian:report(). output will tell you which databases are affected, and you can then use the /_node/_local/_dbs/dbname endpoint to examine the conflicts and decide which branches to delete and which to keep.
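A minimal Python sketch of examining one shard map document through the node-local endpoint might look like this. The `conflicts=true` query parameter is the standard CouchDB document option; that it behaves the same way on the node-local `_dbs` endpoint is an assumption here, and the function names and base URL are hypothetical. Authentication is left out.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def shard_map_url(base_url, dbname):
    """URL of the node-local shard map document for dbname, with conflicts listed."""
    # Database names may contain "/", so encode the whole name as one path segment.
    doc_id = quote(dbname, safe="")
    return f"{base_url.rstrip('/')}/_node/_local/_dbs/{doc_id}?conflicts=true"


def fetch_shard_map(base_url, dbname):
    """Fetch and decode the shard map document (performs a real HTTP request)."""
    with urlopen(shard_map_url(base_url, dbname)) as resp:
        return json.load(resp)
```

Calling `fetch_shard_map("http://127.0.0.1:5984", "_replicator")` against a reachable node would return the document, with a `_conflicts` array when conflicting revisions exist.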

These are most likely caused by concurrent requests to create the same database, which is quite unusual.

Author

nono commented Nov 27, 2023

> The "4 conflicted shards in cluster" is referring to conflicts within the meta _dbs documents that define where the shards of databases should be, it is not reporting on the conflicted documents within your regular databases.

Yes, but curl -v -X POST $COUCH_URL/_dbs/_find -d '{"selector": {"_conflicts": { "$exists": true}}, "conflicts": true}' -H "Content-Type: application/json" has found the 4 conflicted meta documents.
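For reference, the same Mango query can be sketched in Python. The selector and the `conflicts` flag are copied from the curl command above; the helper names are hypothetical, and the response shape (a top-level `docs` array) is the usual `_find` result format.

```python
import json


def conflicted_docs_query():
    """Mango query body matching documents that have a _conflicts field."""
    return {"selector": {"_conflicts": {"$exists": True}}, "conflicts": True}


def conflicted_ids(find_response):
    """Extract document IDs from a decoded _find response."""
    return [doc["_id"] for doc in find_response.get("docs", [])]


# JSON body to POST to $COUCH_URL/_dbs/_find with Content-Type: application/json.
body = json.dumps(conflicted_docs_query())
```

Since each document ID in _dbs is a database name, `conflicted_ids` yields the list of databases whose shard maps are conflicted.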

> you can then use the /_node/_local/_dbs/dbname endpoint to examine the conflicts and decide which branches to delete and which to keep.

Thanks, it works when using /_node/_local/_dbs/:dbname instead of just /_dbs/:dbname! The same goes for _bulk_docs.
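A hedged sketch of the corresponding `_bulk_docs` call through the node-local path, in Python: it only builds the request, so nothing is sent until it is passed to `urllib.request.urlopen`. The function name, base URL, and example revision are placeholders, and authentication is left out.

```python
import json
from urllib.request import Request


def bulk_delete_request(base_url, stubs):
    """Build a POST to the node-local _dbs/_bulk_docs endpoint.

    stubs is a list of {"_id", "_rev", "_deleted": True} entries, one per
    conflicting revision to remove.
    """
    payload = json.dumps({"docs": stubs}).encode("utf-8")
    return Request(
        f"{base_url.rstrip('/')}/_node/_local/_dbs/_bulk_docs",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values: delete conflicting revision 2-abc of the "mydb" shard map.
req = bulk_delete_request(
    "http://127.0.0.1:5984",
    [{"_id": "mydb", "_rev": "2-abc", "_deleted": True}],
)
```

Building the request separately makes it easy to inspect the URL and body before anything is deleted.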

4 participants