-
Notifications
You must be signed in to change notification settings - Fork 235
rfc(per-doc-access): first draft #424
Changes from 1 commit
e5a54a2
0e12884
1077dbc
a08438f
b1d6a59
2b8f941
748828d
e52b3da
a0cfaea
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
- Loading branch information
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,397 @@ | ||
--- | ||
name: Per-Document Access Control | ||
about: Make the db-per-user pattern obsolete. | ||
title: 'Per-Document Access Control' | ||
labels: rfc, discussion, access control, security | ||
assignees: '@janl' | ||
|
||
--- | ||
|
||
# Introduction | ||
|
||
Up until now (version 2.3.1), CouchDB could not serve mutually | ||
untrusting users accessing the same database. If a user has access to | ||
one document in a database, they have access to all other documents in | ||
the database. Some restrictions can be added about writing documents | ||
(designs docs are db-admin only, validate doc update (VDU) functions | ||
could restrict write access based on the writing user and/or the target | ||
document). For the remainder of this document, “db-admin” SHALL include | ||
server admins as well. | ||
|
||
## Abstract | ||
|
||
This lead to CouchDB developers making use of a pattern called | ||
janl marked this conversation as resolved.
Show resolved
Hide resolved
|
||
db-per-user, where all documents belonging to one user are kept in a | ||
separate database. This is a decent enough workaround, but has the | ||
following downsides: | ||
|
||
- queries across all databases are not possible. An additional | ||
workaround exists where all per-user databases are replicated | ||
continuously into a central, admin-only database that can be used for | ||
querying the entire data set, but that adds latency and uses | ||
significant CPU resources. Successful systems have been built where | ||
increased latency could be traded for fewer CPU resources, but | ||
overall, this is not an optimal design. | ||
|
||
- handling many small databases, say >10000 (depending on hardware) can | ||
become a challenge, if most of them are active concurrently. It | ||
forces dbs to be set to `q=1`, migrating off `q!=1` requires | ||
downtime, 10k bidirectional replications are going to need A LOT of | ||
CPU and RAM. sharing documents among two or more users requires the | ||
creation of yet more databases. | ||
|
||
Per-user document access aims to solve many of the above problems. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this paragraph and the goals are probably all thats needed for an abstract. |
||
Predominantly, that multiple users can use a single database without | ||
being able to see each other’s documents. A first iteration is not | ||
going to solve sharing of documents across multiple users and/or groups. | ||
|
||
Goals for this iteration of this feature: | ||
|
||
* allow developers to build apps wihtout having to resort to using the | ||
db-per-user pattern. Specifically PouchDB applications and CouchDB | ||
setups with a central server/cluster and many independent satellite | ||
installations with replication should be supported. | ||
|
||
Non-goals for now: | ||
|
||
* per-access views | ||
* differentiation between read and write access for documents | ||
* sharing infividual documents between multitple users or groups. | ||
|
||
However, the design of this iteration aims to allow turning these | ||
non-goals into actual goals later. | ||
|
||
## Dramatis Personae | ||
|
||
*user*: a CouchDB-user, a record defined in the _users db identified by | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you escape the |
||
a username and password, has associated roles. | ||
|
||
*developer*: creator of an application built on top of CouchDB | ||
|
||
## Requirements Language | ||
|
||
[NOTE]: # ( Do not alter the section below. Follow its instructions. ) | ||
|
||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | ||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | ||
document are to be interpreted as described in | ||
[RFC 2119](https://www.rfc-editor.org/rfc/rfc2119.txt). | ||
|
||
--- | ||
|
||
# Detailed Description | ||
|
||
You will be able to create databases with the “access” feature enabled | ||
via an option passed at database creation time. If you create a | ||
database without that option, it works like any database in CouchDB | ||
today. | ||
|
||
This is how you create an access-enabled database: | ||
|
||
``` | ||
PUT /database?access=true | ||
``` | ||
|
||
This option can be set only at database creation time, it can’t be | ||
turned off and on while the database exists. | ||
|
||
An access-enabled database behaves like this: | ||
|
||
* only admin users can read or write to the database (as per 3.x | ||
defaults) | ||
|
||
* admins can grant individual users and groups access to a database | ||
using the database’s `_security` object. A special new role `_users` | ||
can be used to say “all users defined in the `_users` database”. | ||
|
||
* documents created without an `_access` field are accessible to | ||
db-admins only | ||
|
||
* this allows existing databases to be replicated into an | ||
access-enabled database, but granting access of individual docs to | ||
specific users needs to be an explicit step handled by developers. | ||
|
||
* documents created with an `_access` field are only accessible by | ||
admins and the user named inside `_access`. | ||
|
||
* `_access: ["shirley"]` | ||
|
||
* later iterations of this could allow for `["shirley"]` being | ||
shorthand for `[{"read": "shirley", "write": "shirley"}]` for | ||
more fine-grained access control, but that is out of scope for | ||
this RFC. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A change of this nature leads to pretty gnarly code in consumers of this information, as the type of the data in As a suggestion: "_access": [{"shirly": "readwrite"}] Version two might extend this to allow: "_access": [{"shirly": "write"}] And version 3 (later you note that v1 doesn't support multiple users in "_access": [{"shirly": "write"}, {"mike": "read"}] While more verbose, it feels like the most future proof way of expressing this is as follows, which allows us to fill in extra fields later if needed: "_access": [{"user": "shirly", "access": "readwrite"}, {"user": "mike", "access": "read"}] For example, adding an ability via an extra field to allow a given user to add and remove users to |
||
|
||
* users can only create documents with their own username in | ||
`_access`. | ||
|
||
* admins can add any users to `_access` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "user" perhaps for this RFC iteration, as later it's state multiple users are a future thing. |
||
|
||
* documents can only be owned by one user at any one point. | ||
|
||
* in a 2.0 > X < 4.0 cluster, two different users could create | ||
the same document with a different _access definition | ||
concurrently and both get successful write responses back. As | ||
with _users documents in conflict, if a document has a conflict | ||
with separate _access entries, it becomes admin-only by | ||
default. This case needs to be handled by an applications | ||
_conflict handler. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We've gone from the concept of "able to read and write the document" to a concept of "owning" a document, which is a very different thing (one might imagine that an "owner" can grant "read/write" to a "non-owner", where "owner" is really conveying something like "admin of this document"). Here, rather than owner, I think right now we mean "documents can only have one or zero entries in It's hard for me to see here what the behaviour of |
||
|
||
* document _ids are shared across all users. So only the first user who | ||
creates the doc `_id: config` gets it. Applications need to ensure to | ||
work around this and potentially prefix docs with the username before | ||
writing/replicating them in. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This feels like a more complex security issue - the suggestion that applications work around this feels a little weak given that the user themselves are able to pollute the global document space via direct API access? I guess here our suggestion is either "use a separate database for system docs" or "use a VDU to prevent system docs with |
||
|
||
* _security members are allowed to write design docs, but the have to | ||
janl marked this conversation as resolved.
Show resolved
Hide resolved
|
||
have an `_access` field and those design docs with an `_access` field | ||
are ignored on the server side. Db-admin ddocs get indexes built as | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it might be clearer if you write this as: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeah I must have missed this one, I have it that way further down |
||
normal. | ||
|
||
* you can’t access their views, no view indexes are built, their | ||
validate_doc_update functions do not run on db inserts. | ||
|
||
* this allows full pouchdb / satellite db replication, but avoids | ||
problems with having 10000s of VDUs or 10000s of view indexes. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I suggest that using the Instead, this "don't build this on the server" feels like a new API concept to support the mobile user case of "spoke ddocs" which I'd feel more confident drawing out as just that via either specific flags on the ddoc or even new API paths for the ddocs ( |
||
|
||
* users can not remove themselves from `_access`, nor can they remove | ||
the `_access` property. They can only `DELETE` a doc. | ||
|
||
* If an existing doc changes the user mentioned in `_access` or an admin | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What happens for a user that previously had access to the document? Is there a way to notify them that they have lost access? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. not as per this RFC, but we could consider a mode to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think that needs to be solved on CouchDB level. In other systems, if my access gets revoked, I don't necessarily get notified either. It would be easy enough to solve on an application level There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The main thing here is that notifying the users device (say) allows the application to remove the document from the users device as a UX convenience (clearly there is no security benefit to this as users could have other copies). |
||
user adds a non-admin user after updating the document a couple of | ||
times, that new user will gain access to the full history of the | ||
document. | ||
|
||
* if compaction hasn’t run yet, they get access to all previous | ||
revision bodies that still exist. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does the Perhaps we are imaging that applications may typically deal with this by creating a document copy with the altered |
||
|
||
* all conflicted versions will also be visible to the new user | ||
janl marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
* regardless of compaction, they get access to the full list of | ||
revision ids for the document. Extremely crafty people could try | ||
to create a matching body for a revision they didn’t have access | ||
to by trying to recreate an old hash. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This may be easier than it seems for documents that, say, typically only differ in a a few low-cardinality fields over their lifetime. |
||
|
||
* accessing `_changes` gives users the subset of docs they own in last | ||
updated order | ||
|
||
* gaps in the sequence id would allow folks to deduce how many other | ||
docs have been created/updated/deleted in between two of their | ||
docs. | ||
|
||
* this includes all the user’s docs PLUS all non-`_access` design | ||
janl marked this conversation as resolved.
Show resolved
Hide resolved
|
||
docs, so apps can centrally control design docs going down to | ||
satellites. | ||
|
||
* accessing `_all_docs` gives users the subset of docs they own in `_id` | ||
order. | ||
|
||
* this includes all the user’s docs PLUS all non-`_access` design | ||
docs, so apps can centrally control design docs going down to | ||
satellites. | ||
|
||
* Replication check-points / local docs | ||
|
||
* local docs behave exactly like regular docs in that they have to | ||
include an _access property when being written by a non-admin user. | ||
|
||
* this means that replicator implementations will have to be | ||
amended to include that property in the checkpoint local docs | ||
they write. | ||
|
||
* that `_access` property then will also have to be included in | ||
the replication session id calculation to make sure each user | ||
gets their own replication id | ||
|
||
## Implementation Details | ||
|
||
The main addition is a new native query server called | ||
`couch_access_native_proc`, which implements two new indexes | ||
`by-access-id` and `by-access-seq` which do what you’d expect, pass in | ||
a userCtx and retrieve the equivalent of `_all_docs` or `_changes`, but | ||
only including those docs that match the username and roles in their | ||
`_access` property. The existing handlers for `_all_docs` and | ||
`_changes` have been augmented to use the new indexes instead of the | ||
default ones, unless the user is an admin. | ||
|
||
https://github.com/apache/couchdb/compare/access?expand=1&ws=0#diff-fbb5 | ||
3323f07579be5e46ba63cb6701c4 | ||
|
||
|
||
# Advantages and Disadvantages | ||
|
||
The downsides of this are the additional bookkeeping required in the | ||
newly created `by-access-seq` and `by-access-id` indexes. Given the | ||
resource requirements of the alternative db-per-user, this is a more | ||
than welcome trade-off. | ||
|
||
As a first iteration, this aims to tackle enough probelms to be useful | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. s/probelms/problems |
||
for solving real-world problems people run into. | ||
|
||
I’m envisioning future iterations that add the following features: | ||
|
||
* per-access-seq powered views | ||
* differentiation between read and write access for documents | ||
* support for multiple users in `_access: []` | ||
* support for groups in `_access: []` | ||
|
||
The latter two might be better suited to be implemented on a future | ||
FoundationDB backend. | ||
|
||
All changes proposed here should translate seamlessly to a FoundationDB | ||
future. | ||
|
||
|
||
# Key Changes | ||
|
||
There are no default changes, but folks can op into the new behaviour. | ||
|
||
## Applications and Modules affected | ||
|
||
`couch`, `couch_mrview`, `couch_index`, `couch_replicator`, `chttpd` | ||
|
||
## HTTP API additions | ||
|
||
Note: this list is acopypasta from the 2.3.1 API documentation. | ||
|
||
`/db` | ||
|
||
* no changes | ||
|
||
`/db/_all_docs` | ||
`/db/{doc}` | ||
|
||
* admin: no changes | ||
* user: only the docs where `req.userCtx.name == _access: [$name]` | ||
|
||
`/db/_design_docs` | ||
|
||
* TBD: problem: maybe map admin-only ddocs as `_admin` in `_access` | ||
index, and then use that for this endpoint. * that would probably | ||
also help with loading ddocs for VDU evaluation | ||
|
||
`/db/_bulk_get` | ||
|
||
* admin: no changes | ||
* user: only the docs where `req.userCtx.name == _access: [$name]` | ||
* ids requested that belong to other users return an `{error: {reason: | ||
unauthorized}}` row | ||
|
||
`/db/_bulk_docs` | ||
|
||
* admin: no changes | ||
* user: only the docs where` req.userCtx.name == _access: [$name]` | ||
* ids requested that belong to other users return an `{error: {reason: | ||
unauthorized}}` row | ||
|
||
`/db/_find` | ||
`/db/_index` | ||
`/db/_explain` | ||
|
||
* admin only | ||
|
||
`/db/_shards` TBD probably no changes | ||
|
||
`/db/_shards/doc` | ||
|
||
* admin: no changes | ||
* user: only the docs where `req.userCtx.name == _access: [$name]` plus | ||
non-_access ddocs | ||
|
||
`/db/_sync_shards` TBD probably no changes | ||
|
||
`/db/_changes` | ||
|
||
* admin: no changes | ||
* user: only the docs where `req.userCtx.name == _access: [$name]` plus non-_access ddocs | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there a way to get the documents for which the access has been revoked from the changes feed? Let's say that a user replicates the documents they have access in a pouchdb. Later, one of those documents is updated and the access for the user is revoked. If the user triggers a replication from CouchDB to PouchDB, they won't get this document from the changes feed (as I understand), and the document will still be in PouchDB with the old values. Is it correct? |
||
|
||
`/db/_compact` | ||
`/db/_compact/design-doc` | ||
`/db/_ensure_full_commit` | ||
`/db/_view_cleanup` | ||
`/db/_security` | ||
`/db/_purged_infos_limit` | ||
`/db/_revs_limit` | ||
|
||
* all no changes | ||
|
||
`/db/_purge` | ||
|
||
* admin: no changes | ||
|
||
* user: only the docs where `req.userCtx.name == _access: [$name]` | ||
|
||
`/db/_missing_revs` | ||
|
||
* admin: no changes | ||
* user: only the docs where `req.userCtx.name == _access: [$name]` | ||
* users of _missing_revs (i.e. replicators) need to understand a new | ||
response format which includes an {error: unauthorized} message. | ||
|
||
`/db/_revs_diff` | ||
|
||
* admin: no changes | ||
* user: only the docs where req.userCtx.name == _access: [$name] | ||
* users of _missing_revs (i.e. replicators) need to understand a new | ||
response format which includes an {error: unauthorized} message. | ||
|
||
`/db/doc` | ||
|
||
* admin: no changes | ||
* user: only the docs where `req.userCtx.name == _access: [$name]` | ||
|
||
`/db/doc/attachment` | ||
|
||
* admin: no changes | ||
* user: only the docs where `req.userCtx.name == _access: [$name]` | ||
|
||
`/db/_design/design-doc` | ||
`/db/_design/design-doc/attachment` | ||
`/db/_design/design-doc/_info` | ||
|
||
* admin: no changes unless doc includes _access value | ||
* user: no access, see above | ||
|
||
`/db/_design/design-doc/_view/view-name` | ||
|
||
* admin: no changes | ||
* user: no access, see above | ||
|
||
`/db/_design/design-doc/_show/show-name` | ||
`/db/_design/design-doc/_show/show-name/doc-id` | ||
`/db/_design/design-doc/_list/list-name/view-name` | ||
`/db/_design/design-doc/_list/list-name/other-ddoc/view-name` | ||
`/db/_design/design-doc/_update/update-name` | ||
`/db/_design/design-doc/_update/update-name/doc-id` | ||
`/db/_design/design-doc/_rewrite/path` | ||
|
||
* these are available on non-_access ddocs only (or not supported, as | ||
per other changes) | ||
|
||
`/db/_local_docs /db/_local/id` | ||
|
||
* admin: no changes | ||
* user: only the docs where `req.userCtx.name == _access: [$name]` | ||
* replication engines MUST be changed to include an _access member in | ||
the replication definition that can be included in _local checkpoints | ||
AND _access MUST be included in the session id calculation. | ||
|
||
## HTTP API deprecations | ||
|
||
None | ||
|
||
# Security Considerations | ||
|
||
This is a significant change to the CouchDB security model. All of the | ||
above are security considerations. | ||
|
||
# References | ||
|
||
https://lists.apache.org/thread.html/6aa77dd8e5974a3a540758c6902ccb509ab5a2e4802ecf4fd724a5e4@%3Cdev.couchdb.apache.org%3E | ||
|
||
https://lists.apache.org/thread.html/1aae26aa329817d8c54bab615a0df1c3a7b0fd34f17a2321ecf047f3@%3Cdev.couchdb.apache.org%3E | ||
|
||
|
||
# Acknowledgements | ||
|
||
Thanks to @wohali who helped me talk some of these things through and | ||
of course all of dev@, specifically the Boston Summit attendees for | ||
kickstarting this effort. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The introduction and Abstract don't feel right to me. What you have in the introduction seems to continue in the Abstract. I actually think all of that should go in the detailed description of what is the problem and how we trying to solve it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for your comments and I agree, this is mostly a paste from earlier, less structured, write-ups