Replication crashes just on single database from many #4204
There is one retry which the replicator will do in that case. You can adjust the sleep period with this setting.
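The specific setting name was lost from the quoted comment above. As a hedged sketch, replicator retry behavior is generally tuned in the `[replicator]` section of `local.ini`; the keys below are real CouchDB replicator settings, but whether either is the one the commenter meant should be verified against the configuration docs for your version:

```ini
[replicator]
; How many times a failed request is retried, with exponential backoff
; between attempts (default is 5).
retries_per_request = 10
; Per-request HTTP timeout in milliseconds; raising it can help with
; slow or overloaded sources (default is 30000).
connection_timeout = 60000
```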
I added this parameter, but it didn't help.
One case where that could also apply is if there are a lot of conflicted revisions; those would all end up passed to the fetch-document request. Here is where I found a reference to it: couchdb/src/couch_replicator/src/couch_replicator_api_wrap.erl, lines 346 to 363 at commit 21eebad.
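To check whether conflicted revisions are in play, a document can be fetched with the real `conflicts=true` query parameter, which makes CouchDB include a `_conflicts` array in the response. A minimal sketch; the helper name and the sample database/doc id are mine, not from the thread:

```python
from urllib.parse import quote, urlencode

def conflicts_url(base, db, doc_id):
    """Build the URL that asks CouchDB to include the _conflicts array
    for a document (GET /{db}/{docid}?conflicts=true)."""
    return "%s/%s/%s?%s" % (
        base.rstrip("/"),
        quote(db, safe=""),
        quote(doc_id, safe=""),
        urlencode({"conflicts": "true"}),
    )

# Fetch this URL with any HTTP client and inspect doc.get("_conflicts", []).
print(conflicts_url("http://127.0.0.1:5984", "test", "ngraph"))
# → http://127.0.0.1:5984/test/ngraph?conflicts=true
```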
After some drilling down and increasing the logging level, I managed to find some errors related to missing revision numbers:

```
61374 [error] 2022-10-17T11:54:12.491238Z [email protected] <0.31178.16> -------- Retrying fetch and update of document ABC as it is unexpectedly missing. Missing revisions are: 9-6ab086bc66baa1fffe312b90654d90e5
61419 [debug] 2022-10-17T11:54:32.173657Z [email protected] <0.217.0> -------- New task status for <0.20065.16>: [{changes_pending,null},{checkpoint_interval,30000},{checkpointed_source_seq,0},{continuous,true},{database,<<"shards/40000000-5fffffff/_replicator.1665052660">>},{doc_id,<<"ngraph">>},{doc_write_failures,0},{docs_read,92279},{docs_written,92279},{missing_revisions_found,92279},{replication_id,<<"58c7cfcc0f22e9d73693b78ec745e04d+continuous">>},{revisions_checked,406477},{source,<<"http:https://admin:[email protected]/cdb2/test/">>},{source_seq,<<"79173-g1AAAAJ7eJzLYWBg4MhgTmEQTM4vTc5ISXIwNDLXMwBCwxygFFMiQ5L8____szKYkxgY1HbkAsXYk41NzUwtE7HpwWNSkgKQTLJHGDYJbJhZkmWqqWUyqYY5gAyLRxi2HGyYcbKpUYoByS5LABlWjzCsGmxYooV5oomJKYmG5bEASYYGIAU0bz7UwHawgYaGxoYGluZkGbgAYuB-qIH7wQYaGJlYpqUkkWXgAYiB96EGHgIbmGpqaJqYaEmWgQ8gBsLC8CLEQAMzCwszC2xaswBFLKSv">>},{started_on,1666007643},{target,<<"http:https://admin:[email protected]/cdb2/test/">>},{through_seq,<<"78031-g1AAAAJ7eJzLYWBg4MhgTmEQTM4vTc5ISXIwNDLXMwBCwxygFFMiQ5L8____szKYkxgY1HRygWLsycamZqaWidj04DEpSQFIJtkjDGMDG2aWZJlqaplMqmEOIMPiEYYJgQ0zTjY1SjEg2WUJIMPq4YapfgcblmhhnmhiYkqiYXksQJKhAUgBzZsPNfA32EBDQ2NDA0tzsgxcADFwP9S7-mADDYxMLNNSksgy8ADEwPsoBqaaGpomJlqSZeADiIGwCLGGGGhgZmFhZoFNaxYAUe6iNw">>},{type,replication},{updated_on,1666007672},{user,null}]
```

I checked the document this error refers to, and revision 9-6ab086bc66baa1fffe312b90654d90e5 does exist on the document at the replication source. Any ideas are welcome.
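The manual check described above (does the reported revision still exist on the source?) can be scripted with the standard `rev` and `revs` query parameters of the document GET API. A small sketch; the helper name is mine, and the doc id `ABC` is the placeholder from the log:

```python
from urllib.parse import quote, urlencode

def rev_probe_url(base, db, doc_id, rev):
    """URL for GET /{db}/{docid}?rev=...&revs=true, which returns the
    document at that exact revision (plus its revision ancestry) if the
    source still has it, or a 404 "missing" error if it does not."""
    return "%s/%s/%s?%s" % (
        base.rstrip("/"),
        quote(db, safe=""),
        quote(doc_id, safe=""),
        urlencode({"rev": rev, "revs": "true"}),
    )

print(rev_probe_url("http://127.0.0.1:5984", "test", "ABC",
                    "9-6ab086bc66baa1fffe312b90654d90e5"))
# → http://127.0.0.1:5984/test/ABC?rev=9-6ab086bc66baa1fffe312b90654d90e5&revs=true
```

Probing both the source and the target this way shows whether the revision truly exists on only one side.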
Try setting the option mentioned above. Another thing to try is to check whether this happens on the latest release, 3.2.2. Specifically: check whether it still happens when both the source and the instance running the replication jobs are upgraded.
I haven't tried the upgrade yet. What does the 404 error mean in this context? I guess it's a leftover old revision. Do I have to manually purge old revisions on the source (is that viable)? Or maybe manual compaction is enough?
@arahjan During replication, deleted document tombstones (markers) are also replicated. That's needed because if we have the same document on the target, we'd want it to also be deleted when the source deletes it. There are a few ways to avoid replicating tombstones or to remove them later:
Purging could work as well.
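Purging uses the real `POST /{db}/_purge` endpoint (available since CouchDB 2.3.0), whose JSON body maps each document id to the list of revisions to remove. A minimal sketch, assuming rows shaped like those returned by `GET /{db}/_changes`; the sample ids and revisions are invented:

```python
import json

def purge_body(changes_rows):
    """Given rows from GET /{db}/_changes (each with "id", an optional
    "deleted" flag, and a "changes" list of {"rev": ...} entries), build
    the body for POST /{db}/_purge: {doc_id: [rev, ...]} for tombstones
    only, so live documents are never purged."""
    body = {}
    for row in changes_rows:
        if row.get("deleted"):
            body[row["id"]] = [c["rev"] for c in row["changes"]]
    return body

rows = [
    {"id": "a", "deleted": True, "changes": [{"rev": "2-abc"}]},
    {"id": "b", "changes": [{"rev": "1-def"}]},  # live doc, skipped
]
print(json.dumps(purge_body(rows)))  # only the tombstone "a" is included
```

Note that purge permanently removes the revisions, so replications that have not yet seen those tombstones will not propagate the deletions.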
CouchDB 2.3.1 on CentOS 7.
I'm replicating more than 20 databases from one server to another. The process works flawlessly apart from a single database.
The database in question consists of more than 100k small documents; its metadata is below:

```json
{
  "db_name": "test",
  "purge_seq": "0-g1AAAAFTeJzLYWBg4MhgTmEQTM4vTc5ISXIwNDLXMwBCwxygFFMeC5BkeACk_gNBViIDHrVJCUAyqZ6gOoiZCyBm7idG7QGI2vsE7FcA2W9P0P5EhiR5wp5xABkWT6RnGiAOnA9UmwUAtixejg",
  "update_seq": "100785-g1AAAAFreJzLYWBg4MhgTmEQTM4vTc5ISXIwNDLXMwBCwxygFFMiQ5L8____s5IYGAyr8KhLUgCSSfYwpdX4lDqAlMZDlRrswqc0AaS0HmaqPx6leSxAkqEBSAFVzwcr_0FQ-QKI8v1gh_wlqPwARPl9sOnsBJU_gCiHeHN7FgAkvmTM",
  "sizes": {
    "file": 82658990,
    "external": 77019483,
    "active": 82112519
  },
  "other": { "data_size": 77019483 },
  "doc_del_count": 4,
  "doc_count": 100766,
  "disk_size": 82658990,
  "disk_format_version": 7,
  "data_size": 82112519,
  "compact_running": false,
  "cluster": { "q": 8, "n": 1, "w": 1, "r": 1 },
  "instance_start_time": "0"
}
```
After replicating about 79k docs, the replication crashes with output like below:
```
[error] 2022-10-12T10:43:26.563124Z [email protected] <0.30702.7> -------- CRASH REPORT Process (<0.30702.7>) with 5 neighbors exited with reason: {worker_died,<0.30700.7>,{process_died,<0.3280.8>,{{nocatch,missing_doc},[{couch_replicator_api_wrap,open_doc_revs,6,[{file,"src/couch_replicator_api_wrap.erl"},{line,302}]},{couch_replicator_worker,'-spawn_doc_reader/3-fun-1-',4,[{file,"src/couch_replicator_worker.erl"},{line,323}]}]}}} at gen_server:terminate/7(line:812) <= proc_lib:init_p_do_apply/3(line:247); initial_call: {couch_replicator_worker,init,['Argument__1']}, ancestors: [<0.30607.7>,couch_replicator_scheduler_sup,couch_replicator_sup,...], messages: [], links: [<0.3401.8>,<0.3503.8>,<0.3700.8>,<0.3413.8>,<0.30703.7>], dictionary: [{last_stats_report,{1665,571404,580133}}], trap_exit: true, status: running, heap_size: 6772, stack_size: 27, reductions: 77732
```
When I copied this db manually to the second server, there was no problem anymore: I can add documents on the main server and they are copied to the second one.
What is a possible culprit of this issue?
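One way to narrow this down is to replay by hand the kind of fetch the crashing worker performs. The trace points at `open_doc_revs`, which (as an assumption based on the crash report, not stated in the thread) corresponds to a document GET with the real `open_revs`, `latest`, and `revs` query parameters. A sketch that only builds the URL; the helper name, doc id, and revision are placeholders:

```python
import json
from urllib.parse import quote, urlencode

def open_revs_url(base, db, doc_id, revs):
    """Build a replicator-style fetch URL:
    GET /{db}/{docid}?open_revs=["rev",...]&latest=true&revs=true
    A 404 or a {"missing": rev} entry in the response would be
    consistent with the {nocatch,missing_doc} crash above."""
    params = {
        "open_revs": json.dumps(revs),  # open_revs takes a JSON array
        "latest": "true",
        "revs": "true",
    }
    return "%s/%s/%s?%s" % (
        base.rstrip("/"),
        quote(db, safe=""),
        quote(doc_id, safe=""),
        urlencode(params),
    )

print(open_revs_url("http://127.0.0.1:5984", "test", "ABC",
                    ["9-6ab086bc66baa1fffe312b90654d90e5"]))
```

Running the same request against source and target for the document named in the error log shows which side is actually missing the revision.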