Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Sub-Document Operations #1559

Open
janl opened this issue Aug 15, 2018 · 21 comments
Open

RFC: Sub-Document Operations #1559

janl opened this issue Aug 15, 2018 · 21 comments

Comments

@janl
Copy link
Member

janl commented Aug 15, 2018

(supersedes #1498)

In regular CouchDB operation, clients have to provide a full document revision to be sent in order to make a document update. If only one or a few field values are to be updated, espcially on larger documents, the overhead of sending a full JSON body is significant.

This proposal introduces CouchDB sub-document operations with the help of JSON Pointer (RFC 6901)and JSON Patch (RFC 6902).

JSON Pointer is designed to work inside a URI fragment (think index.html#foo), so it’d be great to get single-field oeprations handled that way.

Single Field Updates

Sample doc:

{
  "_id": "doc",
  "field": "hey there",
  "tags": ["zero", "one", "two"],
  "coordinates": {
    "lat": "123.456",
    "lon": "321.654"
  },
  "age": 22
}

Example GET requests

GET /db/doc#field

Result:

"hey there"

GET /db/doc#coordinates

Result:

{
    "lat": "123.456",
    "lon": "321.654"
}

GET /db/doc#coordinates/lat

Result:

"123.456"

GET /db/doc#age

Result:

22

GET /db/doc#tags/1

Result:

"one"

Example PUT requests

PUT /db/doc?rev=$rev#field
"hello there"

Result:

{"ok": true, "rev": "$rev2"}

New doc revision:

{
  "_id": "doc",
  "field": "hello there",
  "tags": ["zero", "one", "two"],
  "coordinates": {
    "lat": "123.456",
    "lon": "321.654"
  },
  "age": 22
}

PUT /db/doc?rev=$rev#coordinates/updatedAt
1534321264

Result:

{"ok": true, "rev": "$rev2"}

New doc revision:

{
  "_id": "doc",
  "field": "hello there",
  "tags": ["zero", "one", "two"],
  "coordinates": {
    "lat": "123.456",
    "lon": "321.654",
    "updatedAt": 1534321264
  },
  "age": 22
}

Example DELETE requests

DELETE /db/doc?rev=$rev#field

New doc revision:

{
  "_id": "doc",
  "tags": ["zero", "one", "two"],
  "coordinates": {
    "lat": "123.456",
    "lon": "321.654"
  },
  "age": 22
}
DELETE /db/doc?rev=$rev#tags/0

New doc revision:

{
  "_id": "doc",
  "field": "hey there",
  "tags": ["one", "two"],
  "coordinates": {
    "lat": "123.456",
    "lon": "321.654"
  },
  "age": 22
}

Multi Field Updates

Of course, making multiple changes will be the next logical step. Thankfull JSON Patch provides a way to do this without having to go too much into canonical JSON serialisation required for a fully fledged diffing algorithm.

{POST,PATCH} /db/doc?rev=$rev
Content-Type: application/json-patch+json
[
  { "op": "replace", "path": "/field", "value": "hello there"},
  { "op": "replace", "path": "/age", "value": 23},
  { "op": "add", "path": "/horse", "value": false}
]

New doc revision:

{
  "_id": "doc",
  "field": "hello there",
  "tags": ["zero", "one", "two"],
  "coordinates": {
    "lat": "123.456",
    "lon": "321.654"
  },
  "age": 23,
  "horse": false
}

Implementation

Internally, CouchDB will still do the regular load, compare-revision, store procedure, it’s just that the new revision is now produced by applying a JSON Pointer or JSON Patch to the old revision of the doc, instead of being provided by the client.

I’m happy to donate my JSON Pointer implementation, but any other will do, too, this is nice and minimal. For JSON Patch, there is an MPL licensed library which might be problematic for LEGAL/LICENSING reasons, but the implementation doesn’t look too complicated to do from scratch.

Notes

I puntend on Content-Type defintions for single field operation because I couldn’t find proper precedent here. Strictly, JSON is always wrapped by an {}-object value, but we’re allowing to send the native JSON values like strings, numbers, booleans, arrays and objects, and only objects are strictly JSON themselves. I’d love some input here on my sparkling white bikeshed that desparately needs colour suggestions.

I opted for keeping only explicit operations on collections, e.g. add value x to array in position 7, and not “append x to {beginning,end} of array”. While the revision system should avoid any unexpected results here, it felt not quite right to allow relative operations. I might be able to be convinced otherwise :)

There is also JSON Merge Patch which is essentially sending fragments of changes that the recipient then has to apply as needed, which would work as well, but might be atiny bit more complicated to implement. If there is significant community interest, we can look into it.

@st-fankl-in
Copy link

Just a heads-up

JSON Pointer is designed to work inside a URI fragment (think index.html#foo), so it’d be great to get single-field oeprations handled that way.

URI fragments are not send for GET Requests (at least)
I ran netcat and checked with curl, chrome and firefox, neither of them included the URI fragment:

$ echo -e "HTTP/1.1 200 OK\n\n Hello World\n" | nc -l -p 1500
GET /db/doc HTTP/1.1
Host: localhost:1500
User-Agent: curl/7.58.0
Accept: */*
$ curl -v 'http:https://localhost:1500/db/doc#field'
*   Trying ::1...
* TCP_NODELAY set
* connect to ::1 port 1500 failed: Connection refused
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 1500 (#0)
> GET /db/doc HTTP/1.1
> Host: localhost:1500
> User-Agent: curl/7.58.0
> Accept: */*
> 
< HTTP/1.1 200 OK
* no chunk, no close, no size. Assume close to signal end
< 
 Hello World

netcat and curl to make sure nothing "smart" is interfering, firefox and chrome did send the same GET though more headers.

@janl
Copy link
Member Author

janl commented Aug 15, 2018

@st-fankl-in a drat, but thanks!


Modified proposal: /db/doc/_/ instead of /db/doc#

@wohali
Copy link
Member

wohali commented Aug 15, 2018

Duplicate of #1280 as well, I'll close that one.

@wohali
Copy link
Member

wohali commented Aug 15, 2018

From the license FAQ on MPL:

Software under the following licenses may be included in binary form within an Apache product if the inclusion is appropriately labeled (see below):

  • Mozilla Public Licenses: MPL 1.0, MPL 1.1, and MPL 2.0

For small amounts of source that is directly consumed by the ASF product at runtime in source form, and for which that source is unmodified and unlikely to be changed anyway (say, by virtue of being specified by a standard), inclusion of appropriately labeled source is also permitted. An example of this is the web-facesconfig_1_0.dtd, whose inclusion is mandated by the JSR 127: JavaServer Faces specification.

Based on this we couldn't pull the JSON Patch functionality in via the source repo, but if it's in hex, for instance, we could handle it that way.

Alternately we wake up one morning and @davisp has written his own implementation 😉

@lazedo
Copy link
Contributor

lazedo commented Aug 15, 2018

@janl Hi, not sure if i understood correctly but it seems that you're proposing that /db/doc?rev=$rev will allow apps/users to partially update documents without knowing the current rev, is that correct ?
i love this proposal but have some concerns if the above is correct. two users updating the same document at same time in different fields is great, but if they try to update the same fields at same they both succeed but only the last will preserve, this can cause some unwanted disputes.

@wohali
Copy link
Member

wohali commented Aug 15, 2018

@lazedo no, I think that's parametrized, you'd still need to know the current rev of the document to make this work.

@lazedo
Copy link
Contributor

lazedo commented Aug 15, 2018

@wohali thanks. an alternative for concurrent updates in different parts of document would be to send a computed hash of previous values being updated and have couchdb validate against current values before accepting. that way, two users can have the same revof a document and update different parts of the document at same time with no need to send the rev for partial updates. thoughts ?

@wohali
Copy link
Member

wohali commented Aug 15, 2018

@lazedo Sorry, that's not under consideration here. This patch proposal will still create a new _rev after it completes, so one of those users will "lose" and either introduce a conflict (if they try and update at the same time) or will receive a 409 and have to retry the operation at the client. Having the server retry multiple times is just asking for runaway race conditions.

Remember that previous revisions of the document may no longer exist due to compaction, so it'd be entirely possible for user 1 to transmit their patch (upping the rev from 1 -> 2) and by the time user 2 tries to transmit their patch, referencing rev 1, that revision no longer exists - so the data you want to compare against isn't available.

@linacs
Copy link

linacs commented Aug 22, 2018

Hello!
I really love this proposal, which would have a very strong impact on my use cases. I use CouchApp, so my comments will be about two points I see to consider:

_ Dynamic _rewrites: On the "to" field, the REST API from the browser could be rewritten to target directly the partial document.

_ Error handling: What happens if the targeted id/field is missing? There should be a way to handle that (right now, when a document is missing after a rewrite, it's already a mess...).

I like your syntax, very comprehensive (but indeed, as said earlier, the '#' could be a technical problem. "/_/" is very interesting, it looks like the old "/#/" but with an underscore, beloved by couchDB).
Moreover, Couchbase has already this feature (but without a REST access?).
https://developer.couchbase.com/documentation/server/current/sdk/subdocument-operations.html

Gilles

@janl
Copy link
Member Author

janl commented Aug 23, 2018

Thanks for your additional input.

_ Dynamic _rewrites:

unless this works already, there isn’t anything we’re going to change here

_ Error handling: What happens if the targeted id/field is missing?

missing doc will cause a normal 404, missing fields is an interesting one that is unspecified as of now, thanks for pointing that out.

GET /doc/_/foo/nonexistent -> 404
GET /doc/_/foo/nonexistent/bar/baz -> 404
DELETE /doc/_/nonexistent -> 404 | alternatively: 200 and return the old rev to signify no change, but the result is achieved. // maybe one of this is the default and one of this is available via a `?option=true|false`
DELETE /doc/_/foo/nonexistent/bar/baz -> same as before
PUT /doc/_/nonexistent -> 200 -> creates new field
PUT /doc/_/nonexistent/bar -> 200 -> creates new field and new sub field // this is analogous to how we create attachments and docs in one go // we could add an option `?fail_if_no_exist=true` 

@janl
Copy link
Member Author

janl commented Aug 23, 2018

And one more general point. For me, this proposal predates Mango by a couple of years, I only now got around to writing it up.

When I talked about this potential feature internally at work, we came to the conclusion that it might be less confusing to use Mango index definitions here, instead of JSON Pointer. The benefit of JSON Pointer fitting natively into URLs are nice, but not necessarily important enough to introduce two different ways for specifying how to specify fields inside a document.

As a refresher, here’s how Mango does it:

{
    "index": {
        "fields": ["foo.bar.baz.3"]
    },
    "name" : "foo-index",
    "type" : "json"
}

this would index d this doc:

{
  "foo": { "bar": { "baz": ["a", "b", "c","d","e","f"] } }
}

So effectively, Mango syntax is just like JSON Pointer, expect every / is a ..

I think this would make for a more coherent proposal.

@linacs
Copy link

linacs commented Aug 23, 2018

I'm not familiar with Mango, can't tell...
About 404: What I think is missing, for CouchApps, is some kind of hook to adapt the answer. But I think I could manage with the frontend. I give you an example: I use vhosts to redirect to a given document representing a school. If was really complicated to make a proper webpage displaying that a school is not existing (to avoid a CouchDB error in case of missing document!). Customising 404 is something quite common on webservers I think. However, for this particular scenario with subdocuments, if 404 return a Json with the missing id, it would be fine to deal with I think.

Here is how I use rewrites (one line extracted from my API, I have a hundred like this):

{ "from": "webapp/:school/api/timetables/:timetableid/teachers/:teacherid", "to": `"_update/api/:school", "method": "POST", "query": { "module": "timetables", "callback": "os_timetables_teachers_post", "access": "admin", "timetableid": ":timetableid", "teacherid": ":teacherid" } },`

So, if it can be changed to:
"to": "_update/api/:school/_/timetables/:timetableid/teachers/:teacherid"
it would be fantastatic! :)

@ermouth
Copy link
Contributor

ermouth commented Aug 24, 2018

@linacs > So, if it can be changed to...

Have you tried JS rewrites (rewrites as a function) http:https://docs.couchdb.org/en/2.1.2/api/ddoc/rewrites.html?highlight=rewrites#rewrite-section-a-is-stringified-function ?

@linacs
Copy link

linacs commented Aug 24, 2018

OMG, I didn't know about it, my whole architecture was based on CouchDB1.6! (And as I went directly to 2.0, I may have skipped the intermediate changelog)
What would be perfect would be to merge those two behaviors, but I guess if I put my rewrite array in a regular file, I'll be able to retrieve it from the _design with "this."? It opens a lot of possibilities!!
Thanks! :)

@ermouth
Copy link
Contributor

ermouth commented Aug 24, 2018

@linacs

I'll be able to retrieve it from the _design with "this."?

Yes.

It opens a lot of possibilities!!

Indeed. To pattern match req obj parts (path, userCtx and so on) in Erlang-like style you might also find this gist useful https://gist.github.com/ermouth/f954e71411590b66eb58.

@zekenaulty
Copy link

Is there a proposed time frame for something like this to be implemented?

I am currently looking at a project where a user may import 1,000,000+ "rows" of data at a time. Not needing to do a full fetch to update would be very, very nice with that amount of data. CouchDB is currently on the same server, no wire, but that may not always be the case.

This feature should be a priority, I know my use case would benefit from it, and I assume many others would as well.

@wohali
Copy link
Member

wohali commented Jan 14, 2019

@zekenaulty Sorry, no time frame as of yet. We are a project of volunteers -- pull requests are always welcome! :)

@janl
Copy link
Member Author

janl commented Jan 18, 2019

@zekenaulty in your scenario, you probably don’t want a single doc with 1M+ keys or array elements.

@SinanGabel
Copy link

SinanGabel commented Jan 18, 2019 via email

@janl
Copy link
Member Author

janl commented Jan 24, 2019

Heads-up: proof of concept work on this is progressing on this with a team at work, progress is somewhat slow as we do this as a side and training-project. I just wanted to let the project know in case someone else wants to work on this as well.

@andyward
Copy link

andyward commented Mar 6, 2019

This looks really useful if it could be implemented. Has any thought been given to how you might add/update an item to an array in the documents. Given the original example with a doc with tags, am I right in thinking it might look like this:

Add new tag "three"

PUT /db/doc?rev=$rev/_/tags
"three"

Update tag "two"

PUT /db/doc?rev=$rev/_/tags/2
"two-updated"

Delete tag "two"

DELETE/db/doc?rev=$rev/_/tags/2

@wohali wohali moved this from Proposed for 3.x to Proposed (backlog) in Roadmap Jul 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Roadmap
  
Proposed (backlog)
Development

No branches or pull requests

9 participants