-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle all erlfdb error codes #3355
Conversation
There's an open question on whether we should expose these internal error states at all and I'm undecided myself. My current view is I would rather have this PR in place so that we have "nice" error responses but do work elsewhere to eliminate the chance that we actually return one. I'm also completely open to the idea of sending a generic error in the response but recording the FDB details (the output of fdb_error_name/1, say) to couch_log. |
src/chttpd/src/chttpd.erl
Outdated
fdb_error_to_http_status_code(2102) -> | ||
413; | ||
fdb_error_to_http_status_code(2103) -> | ||
413; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In many cases I've seen so far 2103 and 2102 where generated due to bugs or misconfiguration. Emitting 413 http error for those cases might affect replication, as the replicator will assume a 413 error is because the document has exceeded a configured size and skip over it and continue on.
I wonder if 2102 (key_too_large
) may be a case of an 414 (uri too long) HTTP code too. It depends where it would come from in the code. Maybe if there is a too large a max_document_id_length
? I think it may even be set to infinity
currently which is silly. We should have a ceiling there as well just as discussed in #3354
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, I can't see how key_too_large would be related to the uri. the uri, at most, contains the doc id. in the case of _bulk_docs
not even that. sending a 414 when the uri is not too long would be a clear mistake on our part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Large document IDs was just a guess when a 2102 would be emitted and if it makes sense to bubble it up as a 413. Currently we emit a 2102 erlfdb error with a 500 http code:
$ ID=$(cat /dev/urandom | tr -dc 'a-z' | fold -w 11000 | head -n 1)
$ http put http:https://adm:pass@localhost:15984/db/$ID a=b
{
"error": "erlfdb_error",
"reason": "2102",
"ref": 2388180944
}
Note: To reproduce it we'd need to extend the user and socket buffers, otherwise another bug in erlang http header protocol parser leads to a different 400 error (boring and gory details in #1810)
[chttpd]
+socket_options = [{sndbuf, 262144}, {buffer, 262144}, {nodelay, true}]
+server_options = [{recbuf, 262144}, {buffer, 262144}]
If we set a <10kb ceiling (say 7kb which we use at Cloudant), for max document id length the http error returned should be a 400 with illegal_docid
reason and not a 413.
The max_uri_length
would also protect against long document ID for individual doc PUTs but then the response would be a 414 instead of 413.
In other words, I couldn't think of cases when 2102 should result in a 413 http error. I am not sure about 2103 emitting a 413 http error either as discussed in #3354 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nickva yes, I think I agree here. We should only be sending 4xx errors when we're 100% sure that the client is at fault (and that retrying will not help). The existing 413 for transaction_too_large is a mistake under this view as the request was within the max_{document,request,attachment}_size params but, because aegis has extended the fdb values, still fails.
So, I think any FDB error that bubbles up to chttpd is a 500. Separately we should strive to make the explicit max_* checks actually fail if the request would fail when padded by aegis (where encryption is enabled).
I have updated the PR to just send readable 500's for FDB errors and removed the previous transaction_too_large work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the PR to just send readable 500's for FDB errors and removed the previous transaction_too_large work.
Agree. This change would save time having to look up error code numbers in the logs. And we'd want to have better doc/attachment/etc limit "ceilings" dictated by FDB limits like discussed in #3354
I just had a minor question about handling new error codes, then +1 to merge.
79a6959
to
44fbf1b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall but I don't think chttpd is the place for the fdb error code translation.
44fbf1b
to
4a7348b
Compare
4a7348b
to
3a26da1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well done
I tested by increasing the max document size limit over 10mb then posted doc > 10mb in size
-max_document_size = 8000000 ; bytes
+max_document_size = 80000000 ; bytes
$ openssl rand -hex 11000000 > ~/tmp/11m
$ http post http:https://adm:pass@localhost:15984/db a=@~/tmp/11m
HTTP/1.1 500 Internal Server Error
{
"error": "erlfdb_error",
"reason": "code: 2101, desc: Transaction exceeds byte limit"
}
Overview
Ensure a reasonable error message should an FDB error reach the top of the request handling stack.
This change also removes the inappropriate usage of the 408 Request Timeout status code. a 1031 transaction_timeout from FDB is sent if the transaction takes too long, which is typically not even started until after we've received the full request.
Testing recommendations
Just break something. :)
Related Issues or Pull Requests
#3354
Checklist
rel/overlay/etc/default.ini