In 3 node cluster (with q=1), when any one node fails and is re-added as blank, _security object is reset to default #3696

ChetanGoti opened this issue on Aug 6, 2021

Your Environment

{
  "couchdb": "Welcome",
  "version": "3.1.1",
  "git_sha": "ce596c65d",
  "uuid": "8d406054df5edac06ee4906f3259e62f",
  "features": [
    "access-ready",
    "partitioned",
    "pluggable-storage-engines",
    "reshard",
    "scheduler"
  ],
  "vendor": {
    "name": "The Apache Software Foundation"
  }
}

Description

I have a 3-node CouchDB 3.1.1 cluster with the following configuration:

[cluster]
q=1
n=2

There is a non-partitioned database named test2, whose shards reside on node1 and node2. The test2 database has the following cluster settings:

"cluster":{"q":1,"n":2,"w":2,"r":2}

The test2 database has a few documents, and its _security is:

{"admins":{"names":["superuser"],"roles":["admins"]},"members":{"names":["user1","user2"],"roles":["developers"]}}

I'm now testing a scenario where one node's disk crashes. Let's say node2's disk crashes.

I performed the following steps:

  • Remove node2 from the cluster
  • Replace node2's disk with a new blank disk
  • Start node2
  • Add node2 back into the cluster (see the sketch after this list for one way to do this via the _nodes endpoints)
  • node2 resyncs the shards, since it is blank
  • After a while, the resync completes
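
One possible way to perform the remove/re-add steps from the clustered interface is by deleting and recreating the node's document in the _nodes database. A sketch, assuming placeholder admin credentials and the Erlang node name couchdb@node2 (adapt to your setup):

import requests

auth = ("admin", "password")  # placeholder admin credentials
node_doc = "http://node1:5984/_node/_local/_nodes/couchdb@node2"

# Remove node2: fetch its _nodes document to get the current _rev, then delete it.
rev = requests.get(node_doc, auth=auth).json()["_rev"]
requests.delete(node_doc, params={"rev": rev}, auth=auth).raise_for_status()

# ... replace the disk and start node2 with a blank data directory ...

# Re-add node2 by recreating its _nodes document.
requests.put(node_doc, json={}, auth=auth).raise_for_status()

# /_membership should list node2 under cluster_nodes again.
print(requests.get("http://node1:5984/_membership", auth=auth).json())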

At this stage, the test2 database's shard file (test2.1628258896.couch) can be seen on node2.

1: Now, when I retrieve _security for the test2 database from node2, it has been reset to the default:

{"members":{"roles":["_admin"]},"admins":{"roles":["_admin"]}}

If I retrieve _security from node1 or node3, it responds with the correct object (which I set before the node2 crash):

{"admins":{"names":["superuser"],"roles":["admins"]},"members":{"names":["user1","user2"],"roles":["developers"]}}

2: When I restart all nodes, the following error logs are shown:

node2 | [error] 2021-08-06T12:59:21.842335Z couchdb@node2 <0.4465.0> -------- Bad security object in <<"test2">>: [{{[{<<"members">>,{[{<<"roles">>,[<<"_admin">>]}]}},{<<"admins">>,{[{<<"roles">>,[<<"_admin">>]}]}}]},1},{{[{<<"admins">>,{[{<<"names">>,[<<"superuser">>]},{<<"roles">>,[<<"admins">>]}]}},{<<"members">>,{[{<<"names">>,[<<"user1">>,<<"user2">>]},{<<"roles">>,[<<"developers">>]}]}}]},1}]

3: Not related to the crash, but related to the same _security behaviour.

When one node is down, PUT _security fails with {"error":"error","reason":"no_majority"}:

  • node2 is down
  • Create a new DB test3, whose shards reside on node1 and node2
  • PUT _security for test3 then fails with (see the sketch after this list):
{"error":"error","reason":"no_majority"}

Other observations

  • _sync_shards throws the same Bad security object error
  • The issue does not occur with q=2
  • Updating _security of the test2 DB again fixes the issue (see the sketch below the _sync_shards logs)

_sync_shards logs

node1 | [notice] 2021-08-06T13:38:55.064067Z couchdb@node1 <0.10379.1> c89bd8ac41 localhost:5984 172.20.0.1 admin POST /test2/_sync_shards 202 ok 2
node2 | [error] 2021-08-06T13:38:55.080064Z couchdb@node2 <0.7232.0> -------- Bad security object in <<"test2">>: [{{[{<<"members">>,{[{<<"roles">>,[<<"_admin">>]}]}},{<<"admins">>,{[{<<"roles">>,[<<"_admin">>]}]}}]},1},{{[{<<"admins">>,{[{<<"names">>,[<<"superuser">>]},{<<"roles">>,[<<"admins">>]}]}},{<<"members">>,{[{<<"names">>,[<<"user1">>,<<"user2">>]},{<<"roles">>,[<<"developers">>]}]}}]},1}]
node1 | [error] 2021-08-06T13:38:55.080503Z couchdb@node1 <0.10417.1> -------- Bad security object in <<"test2">>: [{{[{<<"members">>,{[{<<"roles">>,[<<"_admin">>]}]}},{<<"admins">>,{[{<<"roles">>,[<<"_admin">>]}]}}]},1},{{[{<<"admins">>,{[{<<"names">>,[<<"superuser">>]},{<<"roles">>,[<<"admins">>]}]}},{<<"members">>,{[{<<"names">>,[<<"user1">>,<<"user2">>]},{<<"roles">>,[<<"developers">>]}]}}]},1}]
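
The workaround mentioned in the last observation is simply re-writing the correct _security object, optionally followed by a shard sync. A sketch (hosts and credentials are placeholders):

import requests

auth = ("admin", "password")  # placeholder admin credentials
base = "http://node1:5984"

security = {
    "admins": {"names": ["superuser"], "roles": ["admins"]},
    "members": {"names": ["user1", "user2"], "roles": ["developers"]},
}

# Re-writing _security pushes a fresh copy to all shard replicas,
# which cleared the "Bad security object" state in this report.
requests.put(f"{base}/test2/_security", json=security, auth=auth).raise_for_status()

# POST /{db}/_sync_shards asks the cluster to re-sync the database's shards.
requests.post(f"{base}/test2/_sync_shards", auth=auth).raise_for_status()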

This was discussed with @janl and @rnewson on Slack at https://couchdb.slack.com/archives/C49LEE7NW/p1628257123045300, which may be helpful.
