Error 500 when creating a db below quorum #603
Hi there, can you please include an excerpt from |
Sure!!
and the log, in debug mode
|
This one is a bit tricky. We are throwing an error, yet the database is available afterwards: we can add docs to it, and it will be replicated to the rest of the nodes when they come up, so strictly speaking this is not a complete failure. At the same time, we want some kind of warning that the cluster is unstable; after all, if the available nodes go down and the previously unavailable nodes come up at the same time, the database will seem to suddenly disappear. I guess we can add a check for node availability before starting db creation/deletion and throw a 412 there if we know we won't have quorum. It wouldn't solve the issue completely, since nodes could still go down during db creation, but at least it would narrow the confusion gap. @davisp what do you think? Is it worth trying to implement? |
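A minimal sketch of the pre-check proposed in this comment, not the actual fabric code. It assumes quorum is a simple majority of the cluster's nodes; `quorum` and `precheck_status` are hypothetical names used only for illustration.

```python
from typing import Optional


def quorum(nodes_total: int) -> int:
    """Simple-majority quorum (an assumption made for this sketch)."""
    return nodes_total // 2 + 1


def precheck_status(nodes_up: int, nodes_total: int) -> Optional[int]:
    """Return 412 when quorum is known to be unreachable, else None.

    None means "go ahead with the create/delete operation". Nodes can
    still fail after this check, so it only narrows the window.
    """
    if nodes_up < quorum(nodes_total):
        return 412  # Precondition Failed, as suggested in the comment
    return None


if __name__ == "__main__":
    # The reporter's scenario: 4 nodes, 2 of them down.
    print(precheck_status(nodes_up=2, nodes_total=4))
```

With 4 nodes the majority is 3, so the reporter's scenario (2 nodes up) would be rejected before any shard is created.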
Assuming the db record gets written on one node then we should return a 202 Accepted as the dbs db will be replicated and at that point we'll have created the database. |
I have also observed the same behavior on my production cluster. Any idea when it is going to be fixed? Is it targeted for a specific release? |
In addition to this, I tested the behavior with a 3-node cluster. When 1 node is down, I can create a database and receive a 200 status code. But a 500 status code is returned when 2 nodes (out of 3) are down and only 1 is up. |
I've also noticed this issue during my cluster testing. couchdb/src/fabric/src/fabric_db_create.erl Line 143 in d16f2db
Accepted is returned if the number of responses is at least (W div 2 + 1). With fewer responses than that, an internal server error is returned. In a three-node cluster where the quorum is 2, you need at least 2 responses to get Accepted, which is the same number you need for a 200 status code. This does not seem consistent with the description in the documentation section "11.2 Theory"; that section applies to document reads/writes, not to db creation, but I expected similar behaviour.
W div 2 + 1 seems to be the default quorum required for a 200 status code, not for a 202 status code response. |
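The arithmetic in this comment can be sketched as follows. This is an illustration of the thread's description, not a port of `fabric_db_create.erl`; the function names are hypothetical, and 201 is used as the success code because that is what `PUT /{db}` documents (the thread refers to it loosely as 200).

```python
def majority(w: int) -> int:
    """W div 2 + 1, the threshold quoted in the comment."""
    return w // 2 + 1


def described_status(responses: int, w: int) -> int:
    """Behaviour as described in the thread: below the threshold -> 500."""
    return 202 if responses >= majority(w) else 500


def proposed_status(responses: int, w: int) -> int:
    """Behaviour suggested in the thread: any successful write -> 202."""
    if responses >= majority(w):
        return 201  # quorum met
    if responses >= 1:
        return 202  # written somewhere; replication is expected to catch up
    return 500


if __name__ == "__main__":
    # Three-node cluster with only one node answering.
    print(described_status(1, 3), proposed_status(1, 3))
```

For W = 3 the threshold is 2, so a single response yields 500 under the described behaviour and 202 under the suggested one.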
Add degrade-cluster option for cluster testing Add tests for different cluster conditions with/without quorum Add test-cluster-with-quorum and test-cluster-without-quorum tasks
Same question as @ChiragMoradiya : "When will this fix make it to a Debian package?" Asking here as I was not able to find info about CouchDB release cycle. |
@jjrodrig thanks for the fix, closing this issue. As to when it will be available in a package, the answer is "with the next release of CouchDB." Our intention is to release CouchDB approximately twice a year. The last release of CouchDB was November 7, 2017. |
I experience the same issue as the OP on a single node, when Typical log output look like this:
Where posts to It happens somewhat inconsistently too, which is surprising, because if |
Expected Behavior
On a 4-node cluster with two nodes down, if I try to create a new database, the server should either reject the request or accept it and return a 202 status code. Once the down nodes come back, the new db should be replicated to them.
Current Behavior
Currently an error 500 is returned but the db is indeed created (not sure if it replicated when the down nodes came back).
Possible Solution
Return a friendlier status code such as 202 Accepted if the db was actually created, or reject the request outright with a 412 Precondition Failed instead.
Steps to Reproduce (for bugs)
curl -X PUT "https://xxx.xxx.xxxx.xxx:5984/testdb"
curl -X GET "https://xxx.xxx.xxx.xxx:5984/_all_dbs"
will return testdb
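For tooling that has to cope with this behaviour, the two steps above can be combined into one check. The sketch below is only an assumption about how a client might interpret the results; `classify_create` and its labels are hypothetical, and the inputs are the status code of the PUT and the list returned by `_all_dbs`.

```python
from typing import List


def classify_create(put_status: int, all_dbs: List[str], name: str) -> str:
    """Interpret a db-creation attempt using the PUT status and _all_dbs."""
    exists = name in all_dbs
    if put_status in (201, 202):
        # Success codes; the db may not be listed yet on every node.
        return "created" if exists else "pending"
    if put_status == 500 and exists:
        # The case reported in this issue: error returned, db present.
        return "created-despite-error"
    return "failed"


if __name__ == "__main__":
    # Values taken from the reproduction steps above.
    print(classify_create(500, ["testdb"], "testdb"))
```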
Context
I'm building an automatic administration toolset for CouchDB and was trying out different scenarios to see it working. In this case I was simulating an operation on a cluster without quorum (4 nodes, 2 of them down) in which one tries to create a new database.
Your Environment
It is a Kubernetes managed cluster where the Docker image in use is https://hub.docker.com/r/klaemo/couchdb/