-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
max document size check is inaccurate #3354
Comments
+1 to have a ceiling to max_document_size And then I believe if we see a 2103 it should be a bug and we shouldn't see the error if we do the chunking properly. For chunking we may need a ceiling to the Is 57 bytes per-row overhead you mentioned fixed for all encryption types, or it depends on the plugin which does the actual encryption? If it depends on the plugin, maybe we should have a new aegis plugin API which exposes the per-row-encryption-overhead. Or even better, switch to encrypting, then chunking. Would that be possible? That would eliminate the involvement of chunking parameter from the discussion altogether.
Where was the 2103 error raised for bulk_docs, would you have a stack trace? With _bulk_docs I can see getting a 2101 (transaction size exceeded) easily. There is a heuristic there how we split the request into smaller transactions currently and it probably needs adjustment. |
the 57 bytes is from aegis core code, the only pluggable aspect is the key manager (i.e, how the 256 bits of key material are retrieved). it's 1 byte as a marker, 40 bytes for the wrapped row key and 16 bytes for the gcm integrity check. |
[{erlfdb_nif,erlfdb_future_get,[#Ref<0.1405062055.739115012.214974>],[]},{erlfdb,do_transaction,2,[{file,"src/erlfdb.erl"},{line,686}]},{fabric2_fdb,do_transaction,2,[{file,"src/fabric2_fdb.erl"},{line,171}]},{fabric2_fdb,transactional,2,[{file,"src/fabric2_fdb.erl"},{line,142}]},{fabric2_db,batch_update_docs,1,[{file,"src/fabric2_db.erl"},{line,1707}]},{fabric2_db,batch_update_docs,3,[{file,"src/fabric2_db.erl"},{line,1693}]},{fabric2_db,update_docs,3,[{file,"src/fabric2_db.erl"},{line,888}]},{chttpd_db,db_req,2,[{file,"src/chttpd_db.erl"},{line,520}]}] |
encrypting before chunking is a good thought but it'll be tricky. aegis encrypts at the fdb key/value level. the key is included in the gcm integrity check, as a protection against programming error and tampering. |
Makes sense about the 57 byte aegis overhead and about encrypting per row. It might be possible to account for the overhead probably when chunking and just reduce chunk size read from the config - 57 bytes. |
or we could move the chunking into aegis. Pass it the entire binary and it returns 1 or more rows each below the chunk limit. |
Letting aegis do the chunking would make make sense, yeah. |
Thanks for the _bulk_docs stacktrace. I tried to reproduce the 2103 error but had no luck so far. I used this script to insert docs of various sizes and batches https://gist.github.com/nickva/e7c7aa2a3d1fec3586fc4fee2f3a4cf8 At first I set Then in the test script if I set
With the default 8MB doc size same thing:
|
try #!/usr/bin/env python
import requests, json, random, string, os
docs = []
docs.append({"foo": "".join(random.choice(string.ascii_lowercase) for x in range(1000000))})
r = requests.post(
'http:https://localhost:15984/db1/_bulk_docs',
json={"docs":docs}, auth=('foo', 'bar')
)
print (r.status_code, r.text) |
Thanks, with max_doc_size = 1000000 it returns a 413:
It does that up until 1000009 after 1000010 it returns a 201
|
@rnewson @nickva @noahshaw11 I believe this is resolved by Noah's work in #3632, please reopen if not |
Description
CouchDB applies some validation to a request before attempting to process it, in order to reject work that has no chance of success.
The check for whether a document exceeds the max_document_size config value is flawed, however. It compares the (approximated) ejson encoded size to the max_document_size config. This check can pass and yet the document still cannot be persisted to FDB, yielding a 2103 code ("value_too_large", https://apple.github.io/foundationdb/api-error-codes.html). There are several reasons for this;
Important note that the couch_ejson_size:encoded_size function isn't going to calculate the right value even for the first step.
During the course of this investigation I have written a commit that ensures that every possible FDB error code, should it be thrown as an error and not caught until we get to chttpd.erl, will yield a readable error response with a suitable status code. In the motivating case it is a 413 with response body {"error":"value_too_large","reason":"Value length exceeds limit"}.
So, what to do?
In my view, the max_document_size config setting must have a hardcoded ceiling, so that administrators are informed if they have set it above a value that can be honoured given the FDB limits.
As a principle, if max_document_size is set to an acceptable value, then any document that is not larger than that value must be capable of being stored by couchdb in principle. Thus, it is important that the comparison is corrected or, alternatively, removed (wherein the attempt would proceed to FDB which, given the newly enforced upper bound, will succeed).
Finally, I noticed that we introduce new failure modes for _bulk_docs, or perhaps just a new case of an existing one. If I attempt a _bulk_docs with two documents, one of which would cause a 2103 error on the txn and one that doesn't, what should happen? Currently the entire attempt fails, which exposes an unintended "group transaction" semantic. However, this is the case if the max_document_size check fails on one doc (in 4.x and 3.x).
Steps to Reproduce
Insert a large document.
Expected Behaviour
Unsure!
Your Environment
Additional Context
The text was updated successfully, but these errors were encountered: