Error 500 when creating a db below quorum #603

Closed
calonso opened this issue Jun 20, 2017 · 10 comments

Comments

@calonso

calonso commented Jun 20, 2017

Expected Behavior

On a 4-node cluster with two nodes down, if I try to create a new database the server should either reject the request or accept it and return a 202 status code. Once the down nodes come back, the new db should be replicated to them.

Current Behavior

Currently an error 500 is returned, but the db is indeed created (I'm not sure whether it replicated once the down nodes came back).

Possible Solution

Return a friendlier status code such as 202 Accepted if the db was actually created, or reject the request outright with a 412 Precondition Failed.

Steps to Reproduce (for bugs)

  1. Set up a 4-node cluster
  2. Bring 2 of them down
  3. curl -X PUT "https://xxx.xxx.xxxx.xxx:5984/testdb"
  4. An error 500 is returned.
  5. curl -X GET "https://xxx.xxx.xxx.xxx:5984/_all_dbs" will return testdb

Context

I'm building an automatic CouchDB administration toolset and was trying out different scenarios to see it working. In this case I was simulating an operation where, on a cluster without quorum (4 nodes, 2 of them down), one tries to create a new database.

Your Environment

It is a Kubernetes-managed cluster where the Docker image in use is https://hub.docker.com/r/klaemo/couchdb/

@wohali
Member

wohali commented Jun 20, 2017

Hi there, can you please include an excerpt from couch.log showing what happens when the 500 is returned? Also, if possible, can you include the exact message returned to your client along with the 500, if any? Thanks.

@calonso
Author

calonso commented Jun 20, 2017

Sure!!

$ curl -X PUT "https://admin:[email protected]:5984/newtestdb2"
{"error":"error","reason":"internal_server_error"}

and the log, in debug mode

[debug] 2017-06-20T13:42:58.420676Z [email protected] <0.18754.0> ac122995ef cache miss for admin
[debug] 2017-06-20T13:42:58.421663Z [email protected] <0.18754.0> ac122995ef no record of user admin
[notice] 2017-06-20T13:42:58.564874Z [email protected] <0.18754.0> ac122995ef 127.0.0.1:5984 127.0.0.1 undefined PUT /newtestdb2 500 ok 144

@eiri
Member

eiri commented Jun 21, 2017

This one is a bit tricky. We throw internal_server_error here when the write quorum is not met, but we can only know that after the fact, i.e. after we've created the db on all available nodes and didn't get enough replies from the writers.

The database is available after that: we can add docs to it, and it will be replicated to the rest of the nodes when they come up, so strictly speaking this is not a complete failure. But at the same time we want some kind of warning that the cluster is unstable; after all, if the currently available nodes go down and the previously unavailable nodes come up at the same time, the database will seem to suddenly disappear.

I guess we can add a check for node availability on db creation/deletion before starting the operation and throw a 412 there if we already know we won't have the quorum. It wouldn't solve the issue completely, since nodes could still go down during db creation, but at least it would narrow the confusion gap. @davisp what do you think? Is it worth trying to implement?
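
A minimal sketch of that pre-flight check, assuming a hypothetical helper (the module, function, and error names are illustrative, not the actual fabric code):

-module(quorum_precheck).
-export([check_write_quorum/1]).

%% Hypothetical pre-flight check: W is the write quorum for the db
%% create/delete operation. If fewer nodes than W are currently
%% connected, refuse the operation before any shard is touched;
%% the HTTP layer would map this to a 412 Precondition Failed.
check_write_quorum(W) when is_integer(W), W > 0 ->
    LiveNodes = [node() | nodes()],
    case length(LiveNodes) >= W of
        true  -> ok;
        false -> throw({precondition_failed,
                        <<"not enough live nodes to meet the write quorum">>})
    end.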

@davisp
Member

davisp commented Jun 21, 2017

Assuming the db record gets written on at least one node, we should return a 202 Accepted, as the dbs db will be replicated and at that point we'll have created the database.
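
A sketch of the status selection this implies, with illustrative names (not the actual fabric code), where W is the write quorum and Acks is the number of nodes that acknowledged the db record:

-module(db_create_reply).
-export([status/2]).

%% Hypothetical mapping from acknowledgement count to HTTP status:
%% quorum met -> 201 Created, at least one successful write -> 202 Accepted,
%% no acknowledgements at all -> still an internal server error.
status(W, Acks) when Acks >= W -> 201;
status(_W, Acks) when Acks >= 1 -> 202;
status(_W, _Acks) -> internal_server_error.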

@ChiragMoradiya

I have also observed the same behavior on my production cluster. Any idea when it's going to be fixed? Is it targeted for a specific release?

@ChiragMoradiya

In addition to this, when I was testing the behavior with a 3-node cluster: whenever 1 of the nodes was down, I was still able to create a database and received a 200 status code.

But it sends a 500 status code when 2 nodes (out of 3) are down and only 1 is up.

@jjrodrig
Contributor

I've also noticed this issue during my cluster testing.
I think the problem is here:

maybe_stop(W, Counters) ->

Accepted is returned if the number of responses is at least (W div 2 + 1). If you get fewer than this, an internal server error is returned.

In a three-node cluster where the quorum is 2, you need at least 2 responses in order to get Accepted as a response, which is the same number you need for a 200 status code.

This does not seem to be consistent with the following description in the documentation ("11.2 Theory"), which applies to document reads/writes rather than db creation, but I expected a similar behaviour:

The number of copies of a document with the same revision that have to be read before CouchDB returns with a 200 is equal to a half of total copies of the document plus one. It is the same for the number of nodes that need to save a document before a write is returned with 201. If there are less nodes than that number, then 202 is returned. Both read and write numbers can be specified with a request as r and w parameters accordingly.

W div 2 + 1 seems to be the default quorum required for a 200 status code, not for a 202 response.
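
As a worked example of that arithmetic for a three-node cluster, assuming the default write quorum of n div 2 + 1 (so 2 for n = 3):

1> W = 3 div 2 + 1.   % default write quorum for n = 3
2
2> W div 2 + 1.       % responses required before maybe_stop answers "accepted"
2

With one node down, two responses can still arrive and the threshold is met; with two nodes down at most one response arrives, the threshold is missed, and the request falls through to the 500 reported above.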

jjrodrig pushed a commit to jjrodrig/couchdb that referenced this issue Jan 24, 2018
jjrodrig pushed a commit to jjrodrig/couchdb that referenced this issue Jan 29, 2018
Add degrade-cluster option for cluster testing

Add tests for different cluster conditions with/without quorum

Add test-cluster-with-quorum and test-cluster-without-quorum tasks
janl pushed a commit that referenced this issue Jan 31, 2018
Add degrade-cluster option for cluster testing

Add tests for different cluster conditions with/without quorum

Add test-cluster-with-quorum and test-cluster-without-quorum tasks
@marcinrozanski

Same question as @ChiragMoradiya: "When will this fix make it to a Debian package?" Asking here as I was not able to find info about the CouchDB release cycle.

@wohali
Member

wohali commented Feb 16, 2018

@jjrodrig thanks for the fix, closing this issue.

As to when it will be available in a package, the answer is "with the next release of CouchDB." Our intention is to release CouchDB approximately twice a year.

The last release of CouchDB was November 7, 2017.

@dynamite-ready

dynamite-ready commented Dec 30, 2018

I experience the same issue as the OP on a single node, when cluster.n=1 has been set in the config.
I've been trying to post a document to a newly created user db (with the Couch Per User feature).

Typical log output looks like this:

[notice] 2018-12-30T00:18:28.028842Z [email protected] <0.8860.0> 4cf3f1a0e6 localhost:5984 127.0.0.1 admin POST /userdb-*** 404 ok 17
[debug] 2018-12-30T00:18:29.031705Z [email protected] <0.8872.0> 8f64a178af no record of user admin
[notice] 2018-12-30T00:18:29.032887Z [email protected] <0.8872.0> 8f64a178af localhost:5984 127.0.0.1 admin POST /userdb-**** 404 ok 2
[debug] 2018-12-30T00:18:31.036855Z [email protected] <0.9002.0> 6745f08438 no record of user admin
[notice] 2018-12-30T00:18:31.038036Z [email protected] <0.9002.0> 6745f08438 localhost:5984 127.0.0.1 admin POST /userdb-**** 404 ok 2
[debug] 2018-12-30T00:18:35.041849Z [email protected] <0.9086.0> c3579d0751 no record of user admin
[notice] 2018-12-30T00:18:35.047631Z [email protected] <0.9086.0> c3579d0751 localhost:5984 127.0.0.1 admin POST /userdb-**** 404 ok 6
[debug] 2018-12-30T00:18:43.054896Z [email protected] <0.9199.0> 6a15101a52 no record of user admin
[notice] 2018-12-30T00:18:43.056778Z [email protected] <0.9199.0> 6a15101a52 localhost:5984 127.0.0.1 admin POST /userdb-**** 404 ok 3

The POSTs to /userdb-**** all target the same recently created DB. The 5 calls are made roughly 4 seconds apart, within a 20-second interval, and by that time /userdb-**** had definitely been created.

It happens somewhat inconsistently too, which is surprising, because with cluster.n=1 I shouldn't have to worry about the POST going to a non-existent node, should I?
