Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle all erlfdb error codes #3355

Merged
merged 1 commit into from
Feb 8, 2021
Merged

Handle all erlfdb error codes #3355

merged 1 commit into from
Feb 8, 2021

Conversation

rnewson
Copy link
Member

@rnewson rnewson commented Jan 29, 2021

Overview

Ensure a reasonable error message should an FDB error reach the top of the request handling stack.

This change also removes the inappropriate usage of the 408 Request Timeout status code. a 1031 transaction_timeout from FDB is sent if the transaction takes too long, which is typically not even started until after we've received the full request.

Testing recommendations

Just break something. :)

Related Issues or Pull Requests

#3354

Checklist

@rnewson
Copy link
Member Author

rnewson commented Jan 29, 2021

There's an open question on whether we should expose these internal error states at all and I'm undecided myself. My current view is I would rather have this PR in place so that we have "nice" error responses but do work elsewhere to eliminate the chance that we actually return one. I'm also completely open to the idea of sending a generic error in the response but recording the FDB details (the output of fdb_error_name/1, say) to couch_log.

Comment on lines 1058 to 1061
fdb_error_to_http_status_code(2102) ->
413;
fdb_error_to_http_status_code(2103) ->
413;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In many cases I've seen so far 2103 and 2102 where generated due to bugs or misconfiguration. Emitting 413 http error for those cases might affect replication, as the replicator will assume a 413 error is because the document has exceeded a configured size and skip over it and continue on.

I wonder if 2102 (key_too_large) may be a case of an 414 (uri too long) HTTP code too. It depends where it would come from in the code. Maybe if there is a too large a max_document_id_length? I think it may even be set to infinity currently which is silly. We should have a ceiling there as well just as discussed in #3354

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, I can't see how key_too_large would be related to the uri. the uri, at most, contains the doc id. in the case of _bulk_docs not even that. sending a 414 when the uri is not too long would be a clear mistake on our part.

Copy link
Contributor

@nickva nickva Jan 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large document IDs was just a guess when a 2102 would be emitted and if it makes sense to bubble it up as a 413. Currently we emit a 2102 erlfdb error with a 500 http code:

$ ID=$(cat /dev/urandom | tr -dc 'a-z' | fold -w 11000 | head -n 1)
$ http put http:https://adm:pass@localhost:15984/db/$ID a=b

{
    "error": "erlfdb_error",
    "reason": "2102",
    "ref": 2388180944
}

Note: To reproduce it we'd need to extend the user and socket buffers, otherwise another bug in erlang http header protocol parser leads to a different 400 error (boring and gory details in #1810)

[chttpd]
+socket_options = [{sndbuf, 262144}, {buffer, 262144}, {nodelay, true}]
+server_options = [{recbuf, 262144}, {buffer, 262144}]

If we set a <10kb ceiling (say 7kb which we use at Cloudant), for max document id length the http error returned should be a 400 with illegal_docid reason and not a 413.

The max_uri_length would also protect against long document ID for individual doc PUTs but then the response would be a 414 instead of 413.

In other words, I couldn't think of cases when 2102 should result in a 413 http error. I am not sure about 2103 emitting a 413 http error either as discussed in #3354 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nickva yes, I think I agree here. We should only be sending 4xx errors when we're 100% sure that the client is at fault (and that retrying will not help). The existing 413 for transaction_too_large is a mistake under this view as the request was within the max_{document,request,attachment}_size params but, because aegis has extended the fdb values, still fails.

So, I think any FDB error that bubbles up to chttpd is a 500. Separately we should strive to make the explicit max_* checks actually fail if the request would fail when padded by aegis (where encryption is enabled).

I have updated the PR to just send readable 500's for FDB errors and removed the previous transaction_too_large work.

Copy link
Contributor

@nickva nickva Feb 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the PR to just send readable 500's for FDB errors and removed the previous transaction_too_large work.

Agree. This change would save time having to look up error code numbers in the logs. And we'd want to have better doc/attachment/etc limit "ceilings" dictated by FDB limits like discussed in #3354

I just had a minor question about handling new error codes, then +1 to merge.

@rnewson rnewson force-pushed the fdb-error-handling branch 2 times, most recently from 79a6959 to 44fbf1b Compare February 5, 2021 13:26
@rnewson rnewson requested a review from nickva February 5, 2021 13:26
src/chttpd/src/chttpd.erl Outdated Show resolved Hide resolved
Copy link
Member

@davisp davisp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall but I don't think chttpd is the place for the fdb error code translation.

src/chttpd/src/chttpd.erl Outdated Show resolved Hide resolved
Copy link
Contributor

@nickva nickva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done

I tested by increasing the max document size limit over 10mb then posted doc > 10mb in size

-max_document_size = 8000000 ; bytes
+max_document_size = 80000000 ; bytes
$ openssl rand -hex 11000000 > ~/tmp/11m
$ http post http:https://adm:pass@localhost:15984/db a=@~/tmp/11m

HTTP/1.1 500 Internal Server Error

{
    "error": "erlfdb_error",
    "reason": "code: 2101, desc: Transaction exceeds byte limit"
}

@rnewson rnewson merged commit 0b488de into main Feb 8, 2021
@rnewson rnewson deleted the fdb-error-handling branch February 8, 2021 09:55
@bessbd bessbd mentioned this pull request Mar 2, 2021
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants