Custom replication DBs never get /_scheduler/docs entries #506

Closed
wohali opened this issue May 3, 2017 · 6 comments

@wohali
Member

wohali commented May 3, 2017

Steps to recreate

  1. Create database a and put a few documents in it. Ensure a database b does not exist.
  2. Create a _replicator document of the form:
{ "_id": "foo_error_rep", "source": "https://127.0.0.1:15984/a", "target": "https://127.0.0.1:15984/b" }
  3. Wait a bit and check _replicator/foo_error_rep. No state has been added. I would have expected one of crashing, running, or pending.
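
The steps above can be sketched in Python. This is only a minimal sketch: it assumes the node address from the report and the default `_replicator` database, the helper name `build_replication_request` is hypothetical, and the request is built but deliberately not sent.

```python
import json
import urllib.request

BASE_URL = "https://127.0.0.1:15984"  # address taken from the report; adjust for your node


def build_replication_request(doc_id, source_db, target_db, replicator_db="_replicator"):
    """Build (but do not send) a PUT request that creates a replication document.

    Hypothetical helper for illustration; not part of CouchDB or any client library.
    """
    body = {
        "_id": doc_id,
        "source": f"{BASE_URL}/{source_db}",
        "target": f"{BASE_URL}/{target_db}",
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/{replicator_db}/{doc_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Database "a" exists, database "b" does not, as in the steps above.
req = build_replication_request("foo_error_rep", "a", "b")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would actually create the document; credentials and TLS settings are left out here because the report does not mention them.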

Logfile excerpt

Note the throw of db_not_found.

[notice] 2017-05-03T21:49:42.495501Z [email protected] <0.309.0> 86f37313e7 127.0.0.1:15984 127.0.0.1 undefined GET /test_suite_db_ikklhwd%2F_replicator/foo_error_rep 200 ok 50
[notice] 2017-05-03T21:49:42.497481Z [email protected] <0.354.0> -------- starting new replication `b1137c827da5adb4376166374dcf79eb` at <0.1144.0> (`https://127.0.0.1:15984/test_suite_db_mxbhpygg/` -> `https://127.0.0.1:15984/nonexistent_test_db/`)
[notice] 2017-05-03T21:49:42.497981Z [email protected] <0.355.0> -------- couch_replicator_scheduler: Job {"b1137c827da5adb4376166374dcf79eb",[]} started as <0.1144.0>
[notice] 2017-05-03T21:49:42.547996Z [email protected] <0.309.0> 5ae0234b3c 127.0.0.1:15984 127.0.0.1 undefined GET /test_suite_db_ikklhwd%2F_replicator/foo_error_rep 200 ok 2
[notice] 2017-05-03T21:49:42.656934Z [email protected] <0.961.0> 7dd6682461 127.0.0.1:15984 127.0.0.1 undefined GET /test_suite_db_mxbhpygg/ 200 ok 147
[notice] 2017-05-03T21:49:42.657324Z [email protected] <0.309.0> 94c75a3e9a 127.0.0.1:15984 127.0.0.1 undefined GET /test_suite_db_ikklhwd%2F_replicator/foo_error_rep 200 ok 58
[notice] 2017-05-03T21:49:42.659778Z [email protected] <0.961.0> 286570172d 127.0.0.1:15984 127.0.0.1 undefined GET /nonexistent_test_db/ 404 ok 1
[error] 2017-05-03T21:49:42.660836Z [email protected] <0.1144.0> -------- throw:{db_not_found,<<"could not open https://127.0.0.1:15984/nonexistent_test_db/">>}: Replication failed to start for args {rep,{"b1137c827da5adb4376166374dcf79eb",[]},{httpdb,"https://127.0.0.1:15984/test_suite_db_mxbhpygg/",nil,[{"Accept","application/json"},{"User-Agent","CouchDB-Replicator/2.1.0-5cad2a4"}],30000,[{socket_options,[{keepalive,true},{nodelay,false}]}],10,250,nil,20,nil,undefined},{httpdb,"https://127.0.0.1:15984/nonexistent_test_db/",nil,[{"Accept","application/json"},{"User-Agent","CouchDB-Replicator/2.1.0-5cad2a4"}],30000,[{socket_options,[{keepalive,true},{nodelay,false}]}],10,250,nil,20,nil,undefined},[{checkpoint_interval,30000},{connection_timeout,30000},{http_connections,20},{retries,10},{socket_options,[{keepalive,true},{nodelay,false}]},{use_checkpoints,true},{worker_batch_size,500},{worker_processes,4}],{user_ctx,null,[],undefined},db,nil,<<"foo_error_rep">>,<<"shards/00000000-1fffffff/test_suite_db_ikklhwd/_replicator.1493848173">>,{1493,848182,496186}}: [{couch_replicator_api_wrap,db_open,3,[{file,"src/couch_replicator_api_wrap.erl"},{line,109}]},{couch_replicator_scheduler_job,init_state,1,[{file,"src/couch_replicator_scheduler_job.erl"},{line,568}]},{couch_replicator_scheduler_job,do_init,1,[{file,"src/couch_replicator_scheduler_job.erl"},{line,127}]},{couch_replicator_scheduler_job,handle_info,2,[{file,"src/couch_replicator_scheduler_job.erl"},{line,357}]},{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,599}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]
[notice] 2017-05-03T21:49:42.709448Z [email protected] <0.309.0> 5a9498eacf 127.0.0.1:15984 127.0.0.1 undefined GET /test_suite_db_ikklhwd%2F_replicator/foo_error_rep 200 ok 2

/cc @nickva I think this is a new bug related to the scheduling replicator work.

@wohali wohali added this to the 2.1.0 milestone May 3, 2017
@nickva
Contributor

nickva commented May 3, 2017

@wohali

Thanks for taking a look!

By default, the scheduling replicator does not update replication documents with transient states such as "triggered", "crashing", or "error". This is one of the improvements, and it is what allows it to run more replication jobs (since they don't write to disk as much).

To observe the state of the replications created by that document, try one of these things (or both):

  1. Query the https://127.0.0.1:15984/_scheduler/docs endpoint. You'd get all the replication documents there. Or query the specific document: https://127.0.0.1:15984/_scheduler/docs/foo_error_rep

  2. Enable compatibility mode with [replicator] update_docs = true. The replicator will then behave very similarly to the old one, and you should see error or triggered there. You can still query the document as before.
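
As a rough illustration of the first option, the sketch below looks up one document's state in a parsed `_scheduler/docs` response. The helper name `find_state` is hypothetical, and the `doc_id` and `state` field names are an assumption about the response rows; check them against the response your node actually returns.

```python
import json


def find_state(scheduler_docs_response, doc_id):
    """Return the state reported for doc_id, or None if the doc is not listed.

    Assumes each row carries "doc_id" and "state" keys (an assumption, see above).
    """
    for row in scheduler_docs_response.get("docs", []):
        if row.get("doc_id") == doc_id:
            return row.get("state")
    return None


# An empty listing: no replication documents are known to the scheduler.
empty = json.loads('{"total_rows":0,"offset":0,"docs":[]}')
print(find_state(empty, "foo_error_rep"))

# A made-up listing, for illustration only.
sample = {"total_rows": 1, "offset": 0,
          "docs": [{"doc_id": "foo_error_rep", "state": "crashing"}]}
print(find_state(sample, "foo_error_rep"))
```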

@wohali
Member Author

wohali commented May 3, 2017

OK, I've rebuilt from master, and I've shifted over to using /_scheduler/docs/foo_error_rep which always shows up as a 404. I'm still getting a failure:

[notice] 2017-05-03T22:57:34.454264Z [email protected] <0.309.0> cc86958549 127.0.0.1:15984 127.0.0.1 undefined PUT /test_suite_db_hhyyrpqf%2F_replicator/foo_error_rep 201 ok 99
[notice] 2017-05-03T22:57:34.486167Z [email protected] <0.309.0> 79abcb1691 127.0.0.1:15984 127.0.0.1 undefined GET /_scheduler/docs/foo_error_rep 404 ok 30
...
[notice] 2017-05-03T22:57:38.881115Z [email protected] <0.343.0> -------- couch_replicator_clustering : publish cluster `stable` event
[notice] 2017-05-03T22:57:38.881746Z [email protected] <0.398.0> -------- Started replicator db changes listener <0.769.0>
[notice] 2017-05-03T22:57:38.916464Z [email protected] <0.348.0> -------- starting new replication `6e12fe920e76120ef871df8e18d208b0` at <0.811.0> (`https://127.0.0.1:15984/test_suite_db_dhrvtir/` -> `https://127.0.0.1:15984/nonexistent_test_db/`)
[notice] 2017-05-03T22:57:38.916701Z [email protected] <0.349.0> -------- couch_replicator_scheduler: Job {"6e12fe920e76120ef871df8e18d208b0",[]} started as <0.811.0>
[notice] 2017-05-03T22:57:38.918651Z [email protected] <0.309.0> 9e5593616e 127.0.0.1:15984 127.0.0.1 undefined GET /_scheduler/docs/foo_error_rep 404 ok 1
[notice] 2017-05-03T22:57:38.970437Z [email protected] <0.309.0> bdd084312b 127.0.0.1:15984 127.0.0.1 undefined GET /_scheduler/docs/foo_error_rep 404 ok 1
[notice] 2017-05-03T22:57:39.013987Z [email protected] <0.712.0> 0bab9a6ac6 127.0.0.1:15984 127.0.0.1 undefined GET /test_suite_db_dhrvtir/ 200 ok 81
[notice] 2017-05-03T22:57:39.016448Z [email protected] <0.712.0> 116c67f1f5 127.0.0.1:15984 127.0.0.1 undefined GET /nonexistent_test_db/ 404 ok 1
[error] 2017-05-03T22:57:39.018030Z [email protected] <0.811.0> -------- throw:{db_not_found,<<"could not open https://127.0.0.1:15984/nonexistent_test_db/">>}: Replication failed to start for args {rep,{"6e12fe920e76120ef871df8e18d208b0",[]},{httpdb,"https://127.0.0.1:15984/test_suite_db_dhrvtir/",nil,[{"Accept","application/json"},{"User-Agent","CouchDB-Replicator/2.1.0-5cad2a4"}],30000,[{socket_options,[{keepalive,true},{nodelay,false}]}],10,250,nil,20,nil,undefined},{httpdb,"https://127.0.0.1:15984/nonexistent_test_db/",nil,[{"Accept","application/json"},{"User-Agent","CouchDB-Replicator/2.1.0-5cad2a4"}],30000,[{socket_options,[{keepalive,true},{nodelay,false}]}],10,250,nil,20,nil,undefined},[{checkpoint_interval,30000},{connection_timeout,30000},{http_connections,20},{retries,10},{socket_options,[{keepalive,true},{nodelay,false}]},{use_checkpoints,true},{worker_batch_size,500},{worker_processes,4}],{user_ctx,null,[],undefined},db,nil,<<"foo_error_rep">>,<<"shards/00000000-1fffffff/test_suite_db_hhyyrpqf/_replicator.1493852250">>,{1493,852258,915293}}: [{couch_replicator_api_wrap,db_open,3,[{file,"src/couch_replicator_api_wrap.erl"},{line,109}]},{couch_replicator_scheduler_job,init_state,1,[{file,"src/couch_replicator_scheduler_job.erl"},{line,568}]},{couch_replicator_scheduler_job,do_init,1,[{file,"src/couch_replicator_scheduler_job.erl"},{line,127}]},{couch_replicator_scheduler_job,handle_info,2,[{file,"src/couch_replicator_scheduler_job.erl"},{line,357}]},{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,599}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]
[notice] 2017-05-03T22:57:39.022584Z [email protected] <0.309.0> 46149cd223 127.0.0.1:15984 127.0.0.1 undefined GET /_scheduler/docs/foo_error_rep 404 ok 1

@wohali
Member Author

wohali commented May 4, 2017

After investigation and chatting with you on IRC, I think this is because the standard JS harness is creating a custom replicator DB at test_suite_db_<letters>%2F_replicator and the scheduling replicator doesn't know what to do about that when a job fails.

After PUTting a doc into that database, waiting the requisite time, and getting a failure, I did a GET /_scheduler/docs and got back {"total_rows":0,"offset":0,"docs":[]}, so I think my hunch is correct.

It also occurs to me that you're going to have to de-duplicate doc IDs from different _replicator databases so perhaps the right thing to do is to prepend the replicator DB's name to the doc ID if it's not the default _replicator. In this case that means I'd have to GET /_scheduler/docs/test_suite_db_quzroaxh%2F_replicator/foo_error_rep which is acceptable, if ugly.
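
The path proposed above could be built like this. It is a small sketch of the suggestion only, not an existing API at this point in the thread; the helper name `scheduler_doc_path` is hypothetical.

```python
from urllib.parse import quote


def scheduler_doc_path(replicator_db, doc_id):
    """Build a /_scheduler/docs path with the replicator db name prepended.

    safe="" makes quote() escape "/" as %2F, so the db name stays one path segment.
    """
    return "/_scheduler/docs/{}/{}".format(
        quote(replicator_db, safe=""), quote(doc_id, safe="")
    )


print(scheduler_doc_path("test_suite_db_quzroaxh/_replicator", "foo_error_rep"))
# prints /_scheduler/docs/test_suite_db_quzroaxh%2F_replicator/foo_error_rep
```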

Edited log:

[notice] 2017-05-03T23:58:22.747195Z [email protected] <0.309.0> 9bde503c0a 127.0.0.1:15984 127.0.0.1 undefined PUT /test_suite_db_quzroaxh%2F_replicator/foo_error_rep 201 ok 136
[notice] 2017-05-03T23:58:27.179010Z [email protected] <0.348.0> -------- starting new replication `8742b3ea46dfe82f96a36485f46c6738` at <0.639.0> (`https://127.0.0.1:15984/test_suite_db_euuinovc/` -> `https://127.0.0.1:15984/nonexistent_test_db/`)
[notice] 2017-05-03T23:58:27.179239Z [email protected] <0.349.0> -------- couch_replicator_scheduler: Job {"8742b3ea46dfe82f96a36485f46c6738",[]} started as <0.639.0>
[error] 2017-05-03T23:58:27.303437Z [email protected] <0.639.0> -------- throw:{db_not_found,<<"could not open https://127.0.0.1:15984/nonexistent_test_db/">>}: Replication failed to start for args {rep,{"8742b3ea46dfe82f96a36485f46c6738",[]},{httpdb,"https://127.0.0.1:15984/test_suite_db_euuinovc/",nil,[{"Accept","application/json"},{"User-Agent","CouchDB-Replicator/2.1.0-5cad2a4"}],30000,[{socket_options,[{keepalive,true},{nodelay,false}]}],10,250,nil,20,nil,undefined},{httpdb,"https://127.0.0.1:15984/nonexistent_test_db/",nil,[{"Accept","application/json"},{"User-Agent","CouchDB-Replicator/2.1.0-5cad2a4"}],30000,[{socket_options,[{keepalive,true},{nodelay,false}]}],10,250,nil,20,nil,undefined},[{checkpoint_interval,30000},{connection_timeout,30000},{http_connections,20},{retries,10},{socket_options,[{keepalive,true},{nodelay,false}]},{use_checkpoints,true},{worker_batch_size,500},{worker_processes,4}],{user_ctx,null,[],undefined},db,nil,<<"foo_error_rep">>,<<"shards/00000000-1fffffff/test_suite_db_quzroaxh/_replicator.1493855893">>,{1493,855907,177746}}: [{couch_replicator_api_wrap,db_open,3,[{file,"src/couch_replicator_api_wrap.erl"},{line,109}]},{couch_replicator_scheduler_job,init_state,1,[{file,"src/couch_replicator_scheduler_job.erl"},{line,568}]},{couch_replicator_scheduler_job,do_init,1,[{file,"src/couch_replicator_scheduler_job.erl"},{line,127}]},{couch_replicator_scheduler_job,handle_info,2,[{file,"src/couch_replicator_scheduler_job.erl"},{line,357}]},{gen_server,handle_msg,5,[{file,"gen_server.erl"},{line,599}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,237}]}]
[notice] 2017-05-03T23:58:27.752731Z [email protected] <0.309.0> 6e697a45b0 127.0.0.1:15984 127.0.0.1 undefined GET /_scheduler/docs 200 ok 4

@wohali wohali added the api label May 4, 2017
@wohali wohali changed the title from "replication to non-existent target db never updates _replicator doc with state" to "Custom replication DBs never get /_scheduler/docs entries" May 4, 2017
@sagelywizard
Member

The most sensible solution to me is to make two changes:

  1. add GET /_scheduler/docs/:db/:doc endpoint. :db would need to be URL-encoded for slashes. Should be a pretty straightforward change.
  2. make the GET /_scheduler/docs endpoint use a couch_replicator_fabric:docs-type function to make requests to shards of all known replicator DBs rather than just _replicator.

@nickva
Contributor

nickva commented May 8, 2017

  1. Yep. Can be done relatively simply. Also, it looks like we could even allow an un-escaped dbname and then rebuild the db name from the path. This is possible because _replicator is not a valid document ID.

  2. This will be harder to implement. Currently _scheduler/docs mimics (and maybe even shares some code with) _all_docs. Doing a multi-db _all_docs kind of thing might be tricky, especially with respect to handling limit, offset, total_rows.
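
To make the second point concrete, here is a toy sketch of paging over several replicator DBs at once. It is not how CouchDB implements `_all_docs` or `_scheduler/docs`; the function name `merged_page` and the in-memory row lists are hypothetical. It shows that skip and limit have to be applied after merging rows from every database, so no single database can apply them locally.

```python
import heapq
from itertools import islice


def merged_page(rows_by_db, skip=0, limit=None):
    """Merge per-db doc ids (each list already sorted) and apply one global skip/limit."""
    # Tag every doc id with its db so that identical ids from different dbs stay distinct.
    tagged = ([(doc_id, db) for doc_id in doc_ids] for db, doc_ids in rows_by_db.items())
    merged = heapq.merge(*tagged)
    stop = None if limit is None else skip + limit
    page = [{"database": db, "doc_id": doc_id} for doc_id, db in islice(merged, skip, stop)]
    # total_rows must count rows across all dbs, not just the page returned.
    total_rows = sum(len(doc_ids) for doc_ids in rows_by_db.values())
    return {"total_rows": total_rows, "offset": skip, "docs": page}


rows = {"_replicator": ["a", "c"], "other/_replicator": ["b", "d"]}
print(merged_page(rows, skip=1, limit=2))
```

With the sample data, the page holds `b` from `other/_replicator` followed by `c` from `_replicator`, which neither database could have produced by skipping one row on its own.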

nickva added a commit to cloudant/couchdb that referenced this issue May 8, 2017
Previously _scheduler/docs assumed only the default _replicator db.

Now these kinds of paths are accepted after `_scheduler/docs`:

 * `/` : all docs from default _replicator db
 * `/_replicator` : all docs from default replicator db
 * `/docid` : a specific doc from the default replicator db
 * `/other%2f_replicator` : non-default replicator db, urlencoded
 * `/other/_replicator` : non-default replicator db, unencoded
 * `/other%2f_replicator/docid` : doc from a non-default db, urlencoded
 * `/other/_replicator/docid` : doc from a non-default db, db is unencoded

Because `_replicator` is not a valid document ID, it's possible to unambiguously
parse unescaped db paths.

Issue: apache#506
nickva added a commit to cloudant/couchdb that referenced this issue May 9, 2017
Previously _scheduler/docs assumed only the default _replicator db.

To provide consistency and to allow disambiguation between a db named
'db/_replicator' and the document named 'db/_replicator' in the
default replicator db, access to the single document API is changed to
always require the replicator db. That is, `/docid` should now be
`/_replicator/docid`.

Now these kinds of paths are accepted after `_scheduler/docs`:

 * `/` : all docs from default _replicator db
 * `/_replicator` : all docs from default replicator db
 * `/other%2f_replicator` : non-default replicator db, urlencoded
 * `/other/_replicator` : non-default replicator db, unencoded
 * `/other%2f_replicator/docid` : doc from a non-default db, urlencoded
 * `/other/_replicator/docid` : doc from a non-default db, db is unencoded

Because `_replicator` is not a valid document ID, it's possible to unambiguously
parse unescaped db paths.

Issue: apache#506
nickva added a commit that referenced this issue May 9, 2017
Previously _scheduler/docs assumed only the default _replicator db.

To provide consistency and to allow disambiguation between a db named
'db/_replicator' and the document named 'db/_replicator' in the
default replicator db, access to the single document API is changed to
always require the replicator db. That is, `/docid` should now be
`/_replicator/docid`.

Now these kinds of paths are accepted after `_scheduler/docs`:

 * `/` : all docs from default _replicator db
 * `/_replicator` : all docs from default replicator db
 * `/other%2f_replicator` : non-default replicator db, urlencoded
 * `/other/_replicator` : non-default replicator db, unencoded
 * `/other%2f_replicator/docid` : doc from a non-default db, urlencoded
 * `/other/_replicator/docid` : doc from a non-default db, db is unencoded

Because `_replicator` is not a valid document ID, it's possible to unambiguously
parse unescaped db paths.

Issue: #506
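
The path rules in the commit message above can be sketched as a small parser. This is an illustrative Python approximation written from the listed rules, not the Erlang code from the commit; the function name `parse_scheduler_docs_path` is hypothetical, and document IDs containing slashes are not handled specially.

```python
from urllib.parse import unquote


def parse_scheduler_docs_path(path):
    """Split the part of the URL after `_scheduler/docs` into (replicator_db, doc_id).

    doc_id is None when the path addresses a whole replicator db.
    """
    segments = [unquote(s) for s in path.split("/") if s]
    if not segments:
        # `/` : all docs from the default replicator db
        return ("_replicator", None)
    for i, seg in enumerate(segments):
        # The db name ends at the first segment that is, or ends with, `_replicator`.
        # That segment cannot be a doc id, which is what makes unescaped paths parseable.
        if seg == "_replicator" or seg.endswith("/_replicator"):
            db = "/".join(segments[: i + 1])
            doc_id = "/".join(segments[i + 1:]) or None
            return (db, doc_id)
    raise ValueError("path must name a replicator db: %r" % path)


for p in ["/", "/_replicator", "/other%2f_replicator",
          "/other/_replicator", "/other%2f_replicator/docid",
          "/other/_replicator/docid"]:
    print(p, "->", parse_scheduler_docs_path(p))
```

A bare `/docid` raises `ValueError` in this sketch, mirroring the change that single-document access always requires the replicator db.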
@wohali
Member Author

wohali commented May 9, 2017

Fixed by #509.

@wohali wohali closed this as completed May 9, 2017
nickva added a commit to apache/couchdb-documentation that referenced this issue May 10, 2017
Document these endpoints:
 * _scheduler/docs/{replicator_db}
 * _scheduler/docs/{replicator_db}/{docid}

Update replicator example due to the API change in how single replicator doc
info is retrieved.

Issue apache/couchdb#506
jiangphcn pushed a commit to cloudant/couchdb-documentation that referenced this issue Mar 16, 2018
Document these endpoints:
 * _scheduler/docs/{replicator_db}
 * _scheduler/docs/{replicator_db}/{docid}

Update replicator example due to the API change in how single replicator doc
info is retrieved.

Issue apache/couchdb#506
nickva added a commit to nickva/couchdb that referenced this issue Sep 7, 2022
Document these endpoints:
 * _scheduler/docs/{replicator_db}
 * _scheduler/docs/{replicator_db}/{docid}

Update replicator example due to the API change in how single replicator doc
info is retrieved.

Issue apache#506