Skip to content
This repository has been archived by the owner on Oct 17, 2022. It is now read-only.

rfc(per-doc-access): first draft #424

Closed
wants to merge 9 commits into from
Closed

Conversation

janl
Copy link
Member

@janl janl commented Jul 7, 2019

@janl janl self-assigned this Jul 7, 2019
@ermouth
Copy link
Contributor

ermouth commented Jul 7, 2019

Are skip and limit query params expected to work as in existing versions?

@janl
Copy link
Member Author

janl commented Jul 7, 2019

@ermouth could you give an example?

@ermouth
Copy link
Contributor

ermouth commented Jul 7, 2019

@janl I mean a protected bucket may have interleaving docs of users A and B, like:

{_id: '01', _access:['A']}, {_id:'02', _access:['B']}, {_id: '03', _access:['A']}

What exactly _all_docs?limit=1&skip=1 will return for user A?

@janl
Copy link
Member Author

janl commented Jul 7, 2019

@ermouth empty result like if _id 02 didn’t exist. by-access-id is all docs in order of _access + _id and then any one authenticated users only gets their _access range, but can use that like regular by-id (_all_docs)

@janl janl force-pushed the rfc/010-per-document-access branch from c633008 to a08438f Compare July 15, 2019 08:18
Copy link
Member

@garrensmith garrensmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice start to this.

# Introduction

Up until now (version 2.3.1), CouchDB could not serve mutually
untrusting users accessing the same database. If a user has access to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The introduction and Abstract don't feel right to me. What you have in the introduction seems to continue in the Abstract. I actually think all of that should go in the detailed description of what is the problem and how we trying to solve it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for your comments and I agree, this is mostly a paste from earlier, less structured, write-ups

CPU and RAM. sharing documents among two or more users requires the
creation of yet more databases.

Per-user document access aims to solve many of the above problems.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this paragraph and the goals are probably all thats needed for an abstract.


* _security members are allowed to write design docs, but they have to
have an `_access` field and those design docs with an `_access` field
are ignored on the server side. Db-admin ddocs get indexes built as
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be clearer if you write this as:
design doc's with an _access field will be ignored in an access database

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I must have missed this one, I have it that way further down

* users can not remove themselves from `_access`, nor can they remove
the `_access` property. They can only `DELETE` a doc.

* If an existing doc changes the user mentioned in `_access` or an admin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens for a user that previously had access to the document? Is there a way to notify them that they have lost access?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not as per this RFC, but we could consider a mode to _changes, e.g. /_changes?access=true that would include new rows with id/rev/revoked:true that clients could opt into.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that needs to be solved on CouchDB level. In other systems, if my access gets revoked, I don't necessarily get notified either. It would be easy enough to solve on an application level

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main thing here is that notifying the users device (say) allows the application to remove the document from the users device as a UX convenience (clearly there is no security benefit to this as users could have other copies).


## Dramatis Personae

*user*: a CouchDB-user, a record defined in the _users db identified by
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you escape the _users. Everything below is ital

resource requirements of the alternative db-per-user, this is a more
than welcome trade-off.

As a first iteration, this aims to tackle enough probelms to be useful
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/probelms/problems

@garrensmith
Copy link
Member

Sorry @janl I meant to add another note here the other day. Could you add a section on the data model for the new changes and _all_doc indexes.

Copy link

@mikerhodes mikerhodes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the end of the abstract, you get to the goal of this RFC:

  • allow developers to build apps wihtout having to resort to using the
    db-per-user pattern. Specifically PouchDB applications and CouchDB setups
    with a central server/cluster and many independent satellite installations
    with replication should be supported.

I think it's worth calling out at the very start instead that this is the goal
of this RFC because it's only possible to evaluate the design in light
of this goal.

I would then go on to specifically draw out that this document doesn't want to
cover traditional three-tier style architectures, which really this
RFC leaves to be handled in the traditional way of, for example, using
views with keys prefixed by user name and having a trusted server.

I say this because when reading the RFC, for the first read through I was pretty
much like "this is crazy" before re-reading the abstract and starting to view
the RFC through the prism of supporting the hub/spoke type model, when suddenly
it made sense. Particularly the "don't build user views" thing was a total
confusion until I flipped my viewpoint. With the right context, it was easy to
see how these were a nice solution to a vexing problem.

The main worry that I have about this proposal is that it introduces a feature
that sounds like it's applicable to more scopes than it really is. We have
introduced a per-doc security model that only really works for one use-case.
While I can see this proposal doesn't block figuring out how to make server
built views (reduces specifically) work with per-doc access, I fear that we end
up supporting a (valuable) mobile-first specific feature (safe replication from
a single database) using the terms of a much more generic feature (per document
access control) that we may never build. CouchApps are another feature that
suffered from being incomplete, in part because of our lack of server-side
per-doc controls :)

As noted in my comments, one thing I particularly wonder is whether these "spoke
only" design documents should be drawn out as a new API context rather than
inferring from _access whether they should be build on the server (this
behaviour seems to only make sense in the hub/spoke use-case). The behaviour
overloads "authn/z" concerns with server behaviours and I have found inferred
orthogonal behaviours usually cause a lot of pain later.

* later iterations of this could allow for `["shirley"]` being
shorthand for `[{"read": "shirley", "write": "shirley"}]` for
more fine-grained access control, but that is out of scope for
this RFC.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A change of this nature leads to pretty gnarly code in consumers of this information, as the type of the data in _access changes. I think we can have a better route here which removes the need for type changes and instead uses extra values on existing fields.

As a suggestion:

"_access": [{"shirly": "readwrite"}]

Version two might extend this to allow:

"_access": [{"shirly": "write"}]

And version 3 (later you note that v1 doesn't support multiple users in _access):

"_access": [{"shirly": "write"}, {"mike": "read"}]

While more verbose, it feels like the most future proof way of expressing this is as follows, which allows us to fill in extra fields later if needed:

"_access": [{"user": "shirly", "access": "readwrite"}, {"user": "mike", "access": "read"}]

For example, adding an ability via an extra field to allow a given user to add and remove users to _access for documents they "owned". Right now we are only dealing in read/write permissions, whereas in later parts of the document we alter this concept to documents "owned" by users; I don't think the access model here captures ownership explicitly, which is likely important as we expand to "sharing" type scenarious.

* users can only create documents with their own username in
`_access`.

* admins can add any users to `_access`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"user" perhaps for this RFC iteration, as later it's state multiple users are a future thing.

* users can not remove themselves from `_access`, nor can they remove
the `_access` property. They can only `DELETE` a doc.

* If an existing doc changes the user mentioned in `_access` or an admin

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main thing here is that notifying the users device (say) allows the application to remove the document from the users device as a UX convenience (clearly there is no security benefit to this as users could have other copies).

with _users documents in conflict, if a document has a conflict
with separate _access entries, it becomes admin-only by
default. This case needs to be handled by an applications
_conflict handler.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've gone from the concept of "able to read and write the document" to a concept of "owning" a document, which is a very different thing (one might imagine that an "owner" can grant "read/write" to a "non-owner", where "owner" is really conveying something like "admin of this document").

Here, rather than owner, I think right now we mean "documents can only have one or zero entries in _access at any one point".

It's hard for me to see here what the behaviour of _access: [] is.

* document _ids are shared across all users. So only the first user who
creates the doc `_id: config` gets it. Applications need to ensure to
work around this and potentially prefix docs with the username before
writing/replicating them in.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like a more complex security issue - the suggestion that applications work around this feels a little weak given that the user themselves are able to pollute the global document space via direct API access?

I guess here our suggestion is either "use a separate database for system docs" or "use a VDU to prevent system docs with _access set"?

validate_doc_update functions do not run on db inserts.

* this allows full pouchdb / satellite db replication, but avoids
problems with having 10000s of VDUs or 10000s of view indexes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest that using the _access field as a flag for "don't build this on the server" will cause use problems later because it's implicit behaviour. It also feels like it might get in the way later of using this field to control access to views on the server.

Instead, this "don't build this on the server" feels like a new API concept to support the mobile user case of "spoke ddocs" which I'd feel more confident drawing out as just that via either specific flags on the ddoc or even new API paths for the ddocs (/_spoke/design/...).

document.

* if compaction hasn’t run yet, they get access to all previous
revision bodies that still exist.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the _access field exist as revision-specific metadata? If it does, then with this implementation we're saying that the _access field on a given revision may not be accurate with respect to who can view this revision. That feels hard to reason about.

Perhaps we are imaging that applications may typically deal with this by creating a document copy with the altered _access?

* regardless of compaction, they get access to the full list of
revision ids for the document. Extremely crafty people could try
to create a matching body for a revision they didn’t have access
to by trying to recreate an old hash.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be easier than it seems for documents that, say, typically only differ in a a few low-cardinality fields over their lifetime.

`/db/_changes`

* admin: no changes
* user: only the docs where `req.userCtx.name == _access: [$name]` plus non-_access ddocs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to get the documents for which the access has been revoked from the changes feed?

Let's say that a user replicates the documents they have access in a pouchdb. Later, one of those documents is updated and the access for the user is revoked. If the user triggers a replication from CouchDB to PouchDB, they won't get this document from the changes feed (as I understand), and the document will still be in PouchDB with the old values. Is it correct?

@wohali
Copy link
Member

wohali commented Aug 12, 2020

@janl Can we get this merged please? Thanks. I'm merging all of the RFCs that don't have any other sort of blockers; @garrensmith 's requests haven't been dismissed/changed so I'm not going to robo-merge this one.

@janl
Copy link
Member Author

janl commented Aug 22, 2020

@wohali this RFC isn’t fit to be merged. If you want the PR queue clean, we can close this and I’ll reopen once I addressed all the comments.

@janl
Copy link
Member Author

janl commented Sep 9, 2022

closing this for now, I’ll open a new variant on the couchdb repo now that the docs repo is being retired.

@janl janl closed this Sep 9, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Per-document access control
8 participants