/_scheduler/docs/_replicator/{doc_id} sometimes returns 500 #1000

Open
flimzy opened this issue Nov 16, 2017 · 1 comment

@flimzy
Member

flimzy commented Nov 16, 2017

Occasionally while querying the /_scheduler/docs/_replicator/{doc_id} endpoint for a recently created replication, I get a 500 error:

HTTP/1.1 500 Internal Server Error
Content-Length: 70
Cache-Control: must-revalidate
Content-Type: application/json
Date: Thu, 16 Nov 2017 20:14:25 GMT
Server: CouchDB/2.1.0 (Erlang OTP/17)
X-Couch-Request-Id: 65913f4727
X-Couch-Stack-Hash: 3194022798
X-Couchdb-Body-Time: 0

{"error":"unknown_error","reason":"function_clause","ref":3194022798}

In the logs, I see something like this:

[notice] 2017-11-16T20:03:25.316906Z nonode@nohost <0.19112.66> -------- Replication `b9d7fa6c1c76ccbfa6549b93bf7c1e78` completed (triggered by `001415590d64afd6ff7c5c7e7c00fd3f`)
[notice] 2017-11-16T20:03:25.317197Z nonode@nohost <0.353.0> -------- couch_replicator_scheduler: Job {"b9d7fa6c1c76ccbfa6549b93bf7c1e78",[]} completed normally
[error] 2017-11-16T20:03:25.319390Z nonode@nohost <0.18804.66> 332478bac4 req_err(3194022798) unknown_error : function_clause
    [<<"couch_replicator_httpd_util:update_db_name/1 L182">>,<<"couch_replicator_httpd:handle_scheduler_doc/3 L138">>,<<"chttpd:process_request/1 L295">>,<<"chttpd:handle_request_int/1 L231">>,<<"mochiweb_http:headers/6 L91">>,<<"proc_lib:init_p_do_apply/3 L237">>]
[notice] 2017-11-16T20:03:25.319694Z nonode@nohost <0.18804.66> 332478bac4 localhost:6002 172.17.0.1 undefined GET /_scheduler/docs/_replicator/001415590d64afd6ff7c5c7e7c00fd3f 500 ok 1

Based on both my observations and the log output, I'm guessing it's a race condition of some sort: when the query happens at precisely the right moment with respect to the replication status update (or perhaps specifically a completion), the error occurs.

I have a pretty consistent reproduction case (an estimated 90% of the time) in Go, as part of a much larger project. I tried reproducing it in Bash and wasn't able to. If it would help with debugging, I can attempt to produce a minimal reproduction case in Go, along with compilation and execution instructions.
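
For illustration only, here is a rough sketch of the kind of polling loop that might hit the window described above. It is not the reproduction from the larger project. It assumes a CouchDB node reachable at a base URL such as `http://localhost:5984`, that the request needs no extra authentication, and that a replication document was created shortly before the program starts. The helper names `schedulerDocURL` and `pollStatuses`, the request count, and the delay are all hypothetical choices.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"os"
	"time"
)

// schedulerDocURL builds the endpoint URL from the issue title for one
// replication document. The helper name is hypothetical.
func schedulerDocURL(base, docID string) string {
	return base + "/_scheduler/docs/_replicator/" + url.PathEscape(docID)
}

// pollStatuses sends n GET requests to target, pausing delay between them,
// and tallies how often each HTTP status code was returned.
func pollStatuses(client *http.Client, target string, n int, delay time.Duration) (map[int]int, error) {
	counts := make(map[int]int)
	for i := 0; i < n; i++ {
		resp, err := client.Get(target)
		if err != nil {
			return counts, err
		}
		resp.Body.Close()
		counts[resp.StatusCode]++
		time.Sleep(delay)
	}
	return counts, nil
}

func main() {
	// Placeholder defaults; pass <base-url> <replication-doc-id> to override.
	base, docID := "http://localhost:5984", "REPLICATION_DOC_ID"
	if len(os.Args) == 3 {
		base, docID = os.Args[1], os.Args[2]
	}
	// 200 requests at 5 ms intervals is an arbitrary guess at what might
	// overlap with the replication's status update.
	counts, err := pollStatuses(http.DefaultClient, schedulerDocURL(base, docID), 200, 5*time.Millisecond)
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		return
	}
	fmt.Println("status code counts:", counts)
}
```

If the race is real, starting this right after creating a short-lived replication should occasionally show a nonzero count for status 500 alongside the 200s. Whether this simple loop is fast enough to land in the window is untested.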

@nickva
Contributor

nickva commented Nov 16, 2017

Nice find, @flimzy !

Yup, looks like a bug. I see the job completed, so the request probably landed right in the transition between the running and completed states.
