Replication for large file failure #1563

Open
paulbert opened this issue Jul 3, 2018 · 5 comments

@paulbert
Member

paulbert commented Jul 3, 2018

@dogi @lmmrssa Found the error message in the CouchDB log. This is from an attempt to replicate a 602 MB file from dev to a Raspberry Pi:

[info] 2018-07-03T22:29:09.916116Z nonode@nohost <0.47.0> -------- alarm_handler: {clear,system_memory_high_watermark}
[notice] 2018-07-03T22:29:21.240591Z nonode@nohost <0.30196.12> 0e69c39755 192.168.0.101:2200 192.168.0.56 test GET /_active_tasks 200 ok 3
[error] 2018-07-03T22:29:23.913049Z nonode@nohost <0.6292.13> -------- Replicator, request GET to "https://dev.media.mit.edu:2200/resources/5620f429d9e6f8c5df84062efb01ef39?atts_since=%5B%221-67864fe65beb9f0443773055962d24f9%22%5D&revs=true&open_revs=%5B%222-8424b6e5d487f38eda80075fe0e7e97c%22%5D&latest=true" failed due to error {connection_closed,mid_stream}
[notice] 2018-07-03T22:29:23.913958Z nonode@nohost <0.10111.12> -------- Retrying GET to https://dev.media.mit.edu:2200/resources/5620f429d9e6f8c5df84062efb01ef39?atts_since=%5B%221-67864fe65beb9f0443773055962d24f9%22%5D&revs=true&open_revs=%5B%222-8424b6e5d487f38eda80075fe0e7e97c%22%5D&latest=true in 256.0 seconds due to error {error,{connection_closed,mid_stream}}
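
The query string of the failing GET is percent-encoded, which makes it hard to see which revisions the replicator was asking for. A minimal sketch that decodes it with the Python standard library follows. The URL is copied from the log above; the variable names are only illustrative. In the CouchDB document API, `open_revs` lists the revisions being requested and `atts_since` limits attachments to those added since the given revisions.

```python
import json
from urllib.parse import urlsplit, parse_qs

# URL copied from the failed GET in the log above.
url = (
    "https://dev.media.mit.edu:2200/resources/5620f429d9e6f8c5df84062efb01ef39"
    "?atts_since=%5B%221-67864fe65beb9f0443773055962d24f9%22%5D"
    "&revs=true"
    "&open_revs=%5B%222-8424b6e5d487f38eda80075fe0e7e97c%22%5D"
    "&latest=true"
)

parts = urlsplit(url)
doc_id = parts.path.rsplit("/", 1)[-1]

# parse_qs undoes the percent-encoding; each key appears once here.
query = {key: values[0] for key, values in parse_qs(parts.query).items()}

# open_revs and atts_since are JSON arrays of revision ids.
open_revs = json.loads(query["open_revs"])
atts_since = json.loads(query["atts_since"])

print("doc:       ", doc_id)
print("open_revs: ", open_revs)
print("atts_since:", atts_since)
```

Read this way, the request asks for revision `2-…` of the document together with any attachments added after revision `1-…`.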
@paulbert
Member Author

paulbert commented Jul 5, 2018

Was able to replicate a 483 MB file from dev to a Raspberry Pi running treehouses 57 and planet 0.2.7.

@paulbert
Member Author

paulbert commented Jul 6, 2018

Another failure, this time with a 236 MB file from dev to a Raspberry Pi running treehouses stretchjune and planet 0.2.8:

[error] 2018-07-06T17:55:44.276983Z nonode@nohost <0.5739.54> -------- Replication crashing because GET https://dev.media.mit.edu:2200/resources/dfbc328c45c93f22ffd2088971108188?revs=true&open_revs=%5B%224-1d83b37ca3053d9574a83a6ab8ca318f%22%5D&latest=true failed
[error] 2018-07-06T17:55:44.277936Z nonode@nohost <0.4673.54> -------- Worker <0.4807.54> died with reason: {process_died,<0.5739.54>,kaboom}
[error] 2018-07-06T17:55:44.278852Z nonode@nohost <0.4673.54> -------- Replication `c47db1a0af04585f73551cd8af445e91` (`https://dev.media.mit.edu:2200/resources/` -> `http://192.168.0.101:2200/resources/`) failed: {worker_died,<0.4807.54>,{process_died,<0.5739.54>,kaboom}}
[error] 2018-07-06T17:55:44.279777Z nonode@nohost <0.4807.54> -------- gen_server <0.4807.54> terminated with reason: {process_died,<0.5739.54>,kaboom}
  last msg: {'EXIT',<0.5739.54>,kaboom}
     state: {state,<0.4673.54>,<0.4818.54>,20,{httpdb,"https://dev.media.mit.edu:2200/resources/",nil,[{"Accept","application/json"},{"Authorization","Basic dGVzdEBwYXVsMDUwNzIwMTg6dGVzdA=="},{"User-Agent","CouchDB-Replicator/2.1.1"}],300000,[{is_ssl,true},{socket_options,[{keepalive,true},{nodelay,false}]},{ssl_options,[{depth,3},{verify,verify_none}]}],5,250,<0.4697.54>,20,nil,undefined},{httpdb,"http://192.168.0.101:2200/resources/",nil,[{"Accept","application/json"},{"Authorization","Basic dGVzdDp0ZXN0"},{"User-Agent","CouchDB-Replicator/2.1.1"}],300000,[{socket_options,[{keepalive,true},{nodelay,false}]}],5,250,<0.5511.54>,20,nil,undefined},[<0.5739.54>],nil,nil,{<0.4818.54>,#Ref<0.0.24.88108>},[{missing_checked,1},{missing_found,1}],nil,nil,{batch,[],0}}
[error] 2018-07-06T17:55:44.281542Z nonode@nohost <0.4807.54> -------- CRASH REPORT Process  (<0.4807.54>) with 1 neighbors exited with reason: {process_died,<0.5739.54>,kaboom} at gen_server:terminate/6(line:737) <= proc_lib:init_p_do_apply/3(line:237); initial_call: {couch_replicator_worker,init,['Argument__1']}, ancestors: [<0.4673.54>,couch_replicator_scheduler_sup,couch_replicator_sup,...], messages: [], links: [<0.4818.54>,<0.4673.54>], dictionary: [{last_stats_report,{1530,898530,840350}}], trap_exit: true, status: running, heap_size: 376, stack_size: 27, reductions: 178
[notice] 2018-07-06T17:55:44.283415Z nonode@nohost <0.515.0> -------- couch_replicator_scheduler: Job {"c47db1a0af04585f73551cd8af445e91",[]} started as <0.28421.54>
[error] 2018-07-06T17:55:44.285461Z nonode@nohost <0.4673.54> -------- gen_server {couch_replicator_scheduler_job,{[99,52,55,100,98,49,97,48,97,102,48,52,53,56,53,102,55,51,53,53,49,99,100,56,97,102,52,52,53,101,57,49],[]}} terminated with reason: {worker_died,<0.4807.54>,{process_died,<0.5739.54>,kaboom}}
  last msg: {'EXIT',<0.4807.54>,{process_died,<0.5739.54>,kaboom}}
     state: [{rep_id,{"c47db1a0af04585f73551cd8af445e91",[]}},{source,"https://dev.media.mit.edu:2200/resources/"},{target,"http://192.168.0.101:2200/resources/"},{db_name,<<"shards/60000000-7fffffff/_replicator.1530801482">>},{doc_id,<<"resources_pull_1530898528680">>},{options,[{checkpoint_interval,10000},{connection_timeout,300000},{create_target,false},{http_connections,20},{retries,5},{selector,{[{<<"$or">>,[{[{<<"_id">>,<<"dfbc328c45c93f22ffd2088971108188">>},{<<"_rev">>,<<"4-1d83b37ca3053d9574a83a6ab8ca318f">>}]}]}]}},{socket_options,[{keepalive,true},{nodelay,false}]},{use_checkpoints,true},{worker_batch_size,500},{worker_processes,4}]},{session_id,<<"1b0ee92a8ef661c4fa84382299c237e7">>},{start_seq,{0,0}},{source_seq,<<"240-g1AAAAEzeJzLYWBg4MhgTmHgzcvPy09JdcjLz8gvLskBCjMlMiTJ____PytREIeCJAUgmWQPVqOKS40DSE08WI0iLjUJIDX1YDX8ONTksQBJhgYgBVQ2H7ebIOoWQNTtz0o0xavuAETd_axEMbzqHkDUAd2nlwUA_GJj5g">>},{committed_seq,{0,0}},{current_through_seq,{0,0}},{highest_seq_done,{0,0}}]
[error] 2018-07-06T17:55:44.287446Z nonode@nohost <0.4673.54> -------- CRASH REPORT Process  (<0.4673.54>) with 1 neighbors exited with reason: {worker_died,<0.4807.54>,{process_died,<0.5739.54>,kaboom}} at gen_server:terminate/6(line:737) <= proc_lib:init_p_do_apply/3(line:237); initial_call: {couch_replicator_scheduler_job,init,['Argument__1']}, ancestors: [couch_replicator_scheduler_sup,couch_replicator_sup,...], messages: [], links: [<0.4798.54>,<0.499.0>], dictionary: [{task_status_props,[{changes_pending,null},{checkpoint_interval,...},...]},...], trap_exit: true, status: running, heap_size: 2586, stack_size: 27, reductions: 33060
[error] 2018-07-06T17:55:44.288827Z nonode@nohost <0.499.0> -------- Supervisor couch_replicator_scheduler_sup had child undefined started with {couch_replicator_scheduler_job,start_link,undefined} at <0.4673.54> exit with reason {worker_died,<0.4807.54>,{process_died,<0.5739.54>,kaboom}} in context child_terminated
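
The crash dump above carries enough timestamps to estimate how long the job ran before it died. The sketch below is only a rough reading of the log and rests on two assumptions: that the numeric suffix of the `doc_id` (`resources_pull_1530898528680`) is a Unix timestamp in milliseconds, and that `last_stats_report` is an Erlang `{MegaSecs, Secs, MicroSecs}` timestamp. Neither is confirmed anywhere in this thread.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Assumption: the suffix of doc_id "resources_pull_1530898528680"
# is the creation time as a Unix timestamp in milliseconds.
created = EPOCH + timedelta(milliseconds=1530898528680)

# Assumption: last_stats_report {1530,898530,840350} is an Erlang
# {MegaSecs, Secs, MicroSecs} tuple.
mega, secs, micro = 1530, 898530, 840350
last_stats = EPOCH + timedelta(seconds=mega * 1_000_000 + secs,
                               microseconds=micro)

# Timestamp of the "Replication crashing" log line ("Z" written as +00:00
# so that older Python versions can parse it).
crashed = datetime.fromisoformat("2018-07-06T17:55:44.276983+00:00")

print("created:   ", created.isoformat())
print("last stats:", last_stats.isoformat())
print("ran for:   ", crashed - created)
```

Under those assumptions the job ran for about 20 minutes before the worker died, and the last stats report was recorded about two seconds after the job was created.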

@paulbert
Member Author

Error in the middle of logs during replication:

[error] 2018-07-10T16:04:53.849724Z nonode@nohost <0.14059.25> -------- Replicator, request GET to "https://dev.media.mit.edu:2200/resources/" failed due to error {error,{conn_failed,{error,ehostunreach}}}

@paulbert
Member Author

Another error, from the parent/source CouchDB:

[error] 2018-07-17T17:24:38.901699Z nonode@nohost <0.8396.0> -------- Replicator, request PUT to "https://dev.media.mit.edu:2200/resources/47b2bf59c96d58704147a870f110f879?new_edits=false" failed due to error {error,
    {'EXIT',
        {{{nocatch,{mp_parser_died,noproc}},
          [{couch_att,'-foldl/4-fun-0-',3,
               [{file,"src/couch_att.erl"},{line,613}]},
           {couch_att,fold_streamed_data,4,
               [{file,"src/couch_att.erl"},{line,664}]},
           {couch_att,foldl,4,[{file,"src/couch_att.erl"},{line,617}]},
           {couch_httpd_multipart,atts_to_mp,4,
               [{file,"src/couch_httpd_multipart.erl"},{line,208}]}]},
         {gen_server,call,
             [<0.14678.0>,
              {send_req,
                  {{url,
                       "https://dev.media.mit.edu:2200/resources/47b2bf59c96d58704147a870f110f879?new_edits=false",
                       "dev.media.mit.edu",2200,undefined,undefined,
                       "/resources/47b2bf59c96d58704147a870f110f879?new_edits=false",
                       https,hostname},
                   [{"Accept","application/json"},
                    {"Authorization","Basic ZGV2OnZlZA=="},
                    {"Content-Length",247482906},
                    {"Content-Type",
                     "multipart/related; boundary=\"b556865da86f4acaf91e457b681c5048\""},
                    {"User-Agent","CouchDB-Replicator/2.1.1"}],
                   put,
                   {#Fun<couch_replicator_api_wrap.11.3480007>,
                    {<<"{\"_id\":\"47b2bf59c96d58704147a870f110f879\",\"_rev\":\"4-fcfed687ef902caf9acd4bbb816d375a\",\"title\":\"A Collection of Episodes: Star Trek (The Next Generation)\",\"author\":\"\",\"year\":\"\",\"description\":\"TV Show episode\",\"language\":\"\",\"publisher\":\"\",\"linkToLicense\":\"\",\"subject\":[\"Agriculture\"],\"level\":[\"Early Education\"],\"openWith\":\"\",\"resourceFor\":null,\"medium\":\"\",\"articleDate\":1529700743514,\"resourceType\":\"\",\"addedBy\":\"earth\",\"openUrl\":null,\"openWhichFile\":\"\",\"isDownloadable\":\"\",\"filename\":\"Star Trek TNG - 5x02 - Darmok.avi.mp4\",\"mediaType\":\"video\",\"sourcePlanet\":\"earth\",\"resideOn\":\"earth\",\"createdDate\":1530911380349,\"updatedDate\":1530912269118,\"_revisions\":{\"start\":4,\"ids\":[\"fcfed687ef902caf9acd4bbb816d375a\",\"00c6f8f6c3e473eaa12a5d31ce0fd288\",\"3e6c17a22e3d233a5cd50da3f4a8e299\",\"3f91526c3fd3e3ac99c68c91faddc0a6\"]},\"_attachments\":{\"Star Trek TNG - 5x02 - Darmok.avi.mp4\":{\"content_type\":\"video/mp4\",\"revpos\":2,\"digest\":\"md5-DQjX7ueKUeEMEhETdg5RyA==\",\"length\":247481637,\"follows\":true}}}">>,
                     [{att,<<"Star Trek TNG - 5x02 - Darmok.avi.mp4">>,
                          <<"video/mp4">>,247481637,247481637,
                          <<13,8,215,238,231,138,81,225,12,18,17,19,118,14,81,
                            200>>,
                          2,
                          {follows,<0.8395.0>,#Ref<0.0.0.258077>},
                          identity}],
                     <<"b556865da86f4acaf91e457b681c5048">>,247482906}},
                   [{response_format,binary},
                    {inactivity_timeout,30000},
                    {is_ssl,true},
                    {socket_options,[{keepalive,true},{nodelay,false}]},
                    {ssl_options,[{depth,3},{verify,verify_none}]}],
                   infinity}},
              infinity]}}}}
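
For scale, the two sizes in this request can be compared directly. The sketch below uses only numbers from the dump above: the `Content-Length` header of the multipart PUT and the `length` of the `.mp4` attachment. The variable names are illustrative.

```python
content_length = 247_482_906     # "Content-Length" header of the PUT
attachment_length = 247_481_637  # "length" of the .mp4 attachment

# Everything in the request body that is not attachment data:
# the JSON document, the multipart boundaries and the part headers.
multipart_overhead = content_length - attachment_length

size_mib = attachment_length / 2**20   # binary megabytes
size_mb = attachment_length / 10**6    # decimal megabytes

print(f"overhead:   {multipart_overhead} bytes")
print(f"attachment: {size_mib:.2f} MiB ({size_mb:.2f} MB)")
```

The attachment comes to roughly 236 MiB, so almost the whole request body is the video itself and the multipart wrapper adds just over 1 KB.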

@paulbert
Member Author

Error on my Raspberry Pi:

[error] 2018-07-17T17:35:01.286158Z nonode@nohost <0.16582.1> -------- Replicator, request GET to "https://dev.media.mit.edu:2200/resources/dfbc328c45c93f22ffd2088971108188?revs=true&open_revs=%5B%226-a47984c7d1f0e6cfa5e0127c74dfe0b3%22%5D&latest=true" failed due to error req_timedout
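
Since the failures in this thread show several different reasons, it may help to pull the reason out of each replicator error line when scanning a long log. A minimal sketch follows, assuming the single-line layout seen in the logs above (`[level] timestamp node pid request-id message`). The regular expression and the `parse` and `failure_reason` helpers are hypothetical and not part of CouchDB.

```python
import re

# Layout assumed from the log lines quoted in this thread.
LOG_LINE = re.compile(
    r"^\[(?P<level>\w+)\] (?P<time>\S+) (?P<node>\S+) (?P<pid><[\d.]+>) "
    r"(?P<req_id>\S+) (?P<message>.*)$"
)
MARKER = "failed due to error "


def parse(line):
    """Split one log line into fields; None for continuation lines."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def failure_reason(entry):
    """Return the text after 'failed due to error', or None if absent."""
    message = entry["message"]
    if MARKER not in message:
        return None
    return message.rsplit(MARKER, 1)[-1]


# Two lines copied from the comments above.
lines = [
    '[error] 2018-07-10T16:04:53.849724Z nonode@nohost <0.14059.25> -------- '
    'Replicator, request GET to "https://dev.media.mit.edu:2200/resources/" '
    'failed due to error {error,{conn_failed,{error,ehostunreach}}}',
    '[error] 2018-07-17T17:35:01.286158Z nonode@nohost <0.16582.1> -------- '
    'Replicator, request GET to "https://dev.media.mit.edu:2200/resources/'
    'dfbc328c45c93f22ffd2088971108188?revs=true&open_revs=%5B%226-a47984c7d1f'
    '0e6cfa5e0127c74dfe0b3%22%5D&latest=true" failed due to error req_timedout',
]

parsed = [parse(line) for line in lines]
reasons = [failure_reason(entry) for entry in parsed]
for entry, reason in zip(parsed, reasons):
    print(entry["time"], entry["pid"], reason)
```

Indented continuation lines such as `last msg:` and `state:` do not match the pattern, so `parse` returns `None` for them and they need to be filtered out before calling `failure_reason`.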
