Handle all erlfdb error codes #3355

rnewson · 2021-01-29T14:27:23Z

Overview

Ensure a reasonable error message should an FDB error reach the top of the request handling stack.

This change also removes the inappropriate usage of the 408 Request Timeout status code. a 1031 transaction_timeout from FDB is sent if the transaction takes too long, which is typically not even started until after we've received the full request.

Testing recommendations

Just break something. :)

Related Issues or Pull Requests

#3354

Checklist

Code is written and works correctly
Changes are covered by tests
Any new configurable parameters are documented in rel/overlay/etc/default.ini
A PR for documentation changes has been made in https://github.com/apache/couchdb-documentation

rnewson · 2021-01-29T14:29:32Z

There's an open question on whether we should expose these internal error states at all and I'm undecided myself. My current view is I would rather have this PR in place so that we have "nice" error responses but do work elsewhere to eliminate the chance that we actually return one. I'm also completely open to the idea of sending a generic error in the response but recording the FDB details (the output of fdb_error_name/1, say) to couch_log.

nickva · 2021-01-29T16:35:31Z

src/chttpd/src/chttpd.erl

+fdb_error_to_http_status_code(2102) ->
+ 413;
+fdb_error_to_http_status_code(2103) ->
+ 413;


In many cases I've seen so far 2103 and 2102 where generated due to bugs or misconfiguration. Emitting 413 http error for those cases might affect replication, as the replicator will assume a 413 error is because the document has exceeded a configured size and skip over it and continue on.

I wonder if 2102 (key_too_large) may be a case of an 414 (uri too long) HTTP code too. It depends where it would come from in the code. Maybe if there is a too large a max_document_id_length? I think it may even be set to infinity currently which is silly. We should have a ceiling there as well just as discussed in #3354

hm, I can't see how key_too_large would be related to the uri. the uri, at most, contains the doc id. in the case of _bulk_docs not even that. sending a 414 when the uri is not too long would be a clear mistake on our part.

Large document IDs was just a guess when a 2102 would be emitted and if it makes sense to bubble it up as a 413. Currently we emit a 2102 erlfdb error with a 500 http code:

$ ID=$(cat /dev/urandom | tr -dc 'a-z' | fold -w 11000 | head -n 1) $ http put http:https://adm:pass@localhost:15984/db/$ID a=b { "error": "erlfdb_error", "reason": "2102", "ref": 2388180944 }

Note: To reproduce it we'd need to extend the user and socket buffers, otherwise another bug in erlang http header protocol parser leads to a different 400 error (boring and gory details in #1810)

[chttpd] +socket_options = [{sndbuf, 262144}, {buffer, 262144}, {nodelay, true}] +server_options = [{recbuf, 262144}, {buffer, 262144}]

If we set a <10kb ceiling (say 7kb which we use at Cloudant), for max document id length the http error returned should be a 400 with illegal_docid reason and not a 413.

The max_uri_length would also protect against long document ID for individual doc PUTs but then the response would be a 414 instead of 413.

In other words, I couldn't think of cases when 2102 should result in a 413 http error. I am not sure about 2103 emitting a 413 http error either as discussed in #3354 (comment)

@nickva yes, I think I agree here. We should only be sending 4xx errors when we're 100% sure that the client is at fault (and that retrying will not help). The existing 413 for transaction_too_large is a mistake under this view as the request was within the max_{document,request,attachment}_size params but, because aegis has extended the fdb values, still fails.

So, I think any FDB error that bubbles up to chttpd is a 500. Separately we should strive to make the explicit max_* checks actually fail if the request would fail when padded by aegis (where encryption is enabled).

I have updated the PR to just send readable 500's for FDB errors and removed the previous transaction_too_large work.

I have updated the PR to just send readable 500's for FDB errors and removed the previous transaction_too_large work.

Agree. This change would save time having to look up error code numbers in the logs. And we'd want to have better doc/attachment/etc limit "ceilings" dictated by FDB limits like discussed in #3354

I just had a minor question about handling new error codes, then +1 to merge.

src/chttpd/src/chttpd.erl

davisp

Looks good overall but I don't think chttpd is the place for the fdb error code translation.

src/chttpd/src/chttpd.erl

nickva

Well done

I tested by increasing the max document size limit over 10mb then posted doc > 10mb in size

-max_document_size = 8000000 ; bytes
+max_document_size = 80000000 ; bytes

$ openssl rand -hex 11000000 > ~/tmp/11m
$ http post http:https://adm:pass@localhost:15984/db a=@~/tmp/11m

HTTP/1.1 500 Internal Server Error

{
    "error": "erlfdb_error",
    "reason": "code: 2101, desc: Transaction exceeds byte limit"
}

rnewson requested review from davisp and garrensmith January 29, 2021 14:45

nickva reviewed Jan 29, 2021

View reviewed changes

rnewson force-pushed the fdb-error-handling branch 2 times, most recently from 79a6959 to 44fbf1b Compare February 5, 2021 13:26

rnewson requested a review from nickva February 5, 2021 13:26

nickva reviewed Feb 5, 2021

View reviewed changes

src/chttpd/src/chttpd.erl Outdated Show resolved Hide resolved

davisp requested changes Feb 5, 2021

View reviewed changes

src/chttpd/src/chttpd.erl Outdated Show resolved Hide resolved

rnewson force-pushed the fdb-error-handling branch from 44fbf1b to 4a7348b Compare February 7, 2021 12:09

Handle all erlfdb error codes

3a26da1

rnewson force-pushed the fdb-error-handling branch from 4a7348b to 3a26da1 Compare February 7, 2021 12:09

nickva approved these changes Feb 7, 2021

View reviewed changes

rnewson merged commit 0b488de into main Feb 8, 2021

rnewson deleted the fdb-error-handling branch February 8, 2021 09:55

bessbd mentioned this pull request Mar 2, 2021

Re-add transaction size exceeded test #3395

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handle all erlfdb error codes #3355

Handle all erlfdb error codes #3355

rnewson commented Jan 29, 2021

rnewson commented Jan 29, 2021

nickva Jan 29, 2021

rnewson Jan 29, 2021

nickva Jan 30, 2021 •

edited

Loading

rnewson Feb 5, 2021

nickva Feb 5, 2021 •

edited

Loading

davisp left a comment

nickva left a comment

Handle all erlfdb error codes #3355

Handle all erlfdb error codes #3355

Conversation

rnewson commented Jan 29, 2021

Overview

Testing recommendations

Related Issues or Pull Requests

Checklist

rnewson commented Jan 29, 2021

nickva Jan 29, 2021

Choose a reason for hiding this comment

rnewson Jan 29, 2021

Choose a reason for hiding this comment

nickva Jan 30, 2021 • edited Loading

Choose a reason for hiding this comment

rnewson Feb 5, 2021

Choose a reason for hiding this comment

nickva Feb 5, 2021 • edited Loading

Choose a reason for hiding this comment

davisp left a comment

Choose a reason for hiding this comment

nickva left a comment

Choose a reason for hiding this comment

nickva Jan 30, 2021 •

edited

Loading

nickva Feb 5, 2021 •

edited

Loading