Large number of DBs freeze the cluster when a node dies #3630

Open
vladimirralev opened this issue Jun 15, 2021 · 5 comments

vladimirralev commented Jun 15, 2021

Description

I create 100,000 identical test databases with 100 documents each (or more in other tests) on a 3-node cluster. Then I bring one node down and continue creating databases on the remaining nodes. At this point creating new DBs no longer works and the requests time out. Further tests show a gradual slowdown that becomes noticeable from about 10K DBs onwards and progresses to a completely unusable state at about 60K DBs (while a node is down). Once all nodes are back up, they sync and the cluster is very fast again.

Steps to Reproduce

Build a cluster with 3 machines, r4-couch01 to r4-couch03. Create 100K DBs on r4-couch01. Then bring down the r4-couch03 machine and watch the script freeze. Subsequent runs of the script fail the same way. The issue is also reproducible with a 4-node cluster.

I use this script to replicate the template DB testdb100docs into many new DBs on r4-couch01:

for i in {1..200000}
do
 echo "Doing $i"
 curl -m 9600 -X POST -H "Content-Type: application/json" -d "{\"source\":\"http://user:pass@r4-couch01:5984/testdb100docs\",\"target\":\"http://user:pass@r4-couch01:5984/smalltestdb$i\",\"create_target\":true}" http://user:pass@r4-couch01:5984/_replicate
done
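A variant of the script above that also records how long each request takes can make the gradual slowdown (noticeable from about 10K DBs) visible per database. This is a sketch only: the function names `make_payload` and `run_replications` are hypothetical, and the host and credentials are the placeholders from this report. It builds the JSON body with `printf` instead of nested escaped quotes, and uses curl's `-w` write-out variables `%{http_code}` and `%{time_total}` to print one timing line per DB.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; host and credentials are the
# placeholders used in the report.
COUCH="http://user:pass@r4-couch01:5984"

# Print the _replicate request body that copies testdb100docs into smalltestdb$1.
make_payload() {
  printf '{"source":"%s/testdb100docs","target":"%s/smalltestdb%d","create_target":true}' \
    "$COUCH" "$COUCH" "$1"
}

# Replicate into smalltestdb$1 .. smalltestdb$2, printing the HTTP status
# and the total request time in seconds for each database.
run_replications() {
  local i
  for ((i = $1; i <= $2; i++)); do
    curl -s -o /dev/null -m 9600 -X POST \
      -H "Content-Type: application/json" \
      -d "$(make_payload "$i")" \
      -w "smalltestdb$i %{http_code} %{time_total}\n" \
      "$COUCH/_replicate"
  done
}
```

For example, `run_replications 1 100000 > timings.txt` while a node is down would give one line per DB, so the point where request times start climbing can be read off directly.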

Expected Behaviour

I expect the cluster to keep working with one node down and, given my configuration, even with two nodes down.

Your Environment

{"couchdb":"Welcome","version":"3.1.1","git_sha":"ce596c65d","uuid":"6d44338b0b68f9437184992aa3587239","features":["access-ready","partitioned","pluggable-storage-engines","reshard","scheduler"],"vendor":{"name":"The Apache Software Foundation"}}

Settings are the defaults from the distro RPM on CentOS 7:

[cluster]
q=2
n=3
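For scale, these [cluster] settings translate into a large number of shard files. Each database is split into q=2 shard ranges (the `00000000-7fffffff` and `80000000-ffffffff` directories in the logs) and each range is stored n=3 times, so every database is 6 shard files. A sketch of the arithmetic, with hypothetical helper names, assuming that n equals the node count as in this 3-node cluster, so every node holds 2 shard files per database:

```shell
#!/usr/bin/env bash
# Rough shard-file counts. ASSUMPTION: n equals the number of nodes
# (3 here), so every node stores one copy of each shard range.
shards_per_node() {       # $1 = number of databases, $2 = q
  echo $(( $1 * $2 ))
}
shards_cluster_wide() {   # $1 = number of databases, $2 = q, $3 = n
  echo $(( $1 * $2 * $3 ))
}

shards_per_node 100000 2        # prints 200000
shards_cluster_wide 100000 2 3  # prints 600000
shards_per_node 60000 2         # prints 120000
```

So at the ~60K DBs where the cluster became unusable, each node already holds about 120K shard files, and about 200K at 100K DBs. If one mem3_sync entry is queued per shard range for the down node, as the debug log lines in this report suggest, that is on the order of 200K queue entries at 100K DBs; this is an inference, not a measurement.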

Here are the stats for one of the DBs:

{"db_name":"smalltestdb1","purge_seq":"0-g1AAAAB_eJzLYWBgYMpgTmFQTc4vTc5ISXIoMtEFMw0M9TLzSvSKMvPSkyrzEnNT9ZLzc3NAyvNYgCRDA5D6DwRZiQwk6k9kSKqHaMwCANUcKg4","update_seq":"99-g1AAAACFeJzLYWBgYMpgTmFQTc4vTc5ISXIoMtEFMw0M9TLzSvSKMvPSkyrzEnNT9ZLzc3NAyvNYgCRDA5D6DwRZSQwMrKtJNCKRIakeqpeNJQsAvewqyg","sizes":{"file":8352220,"external":8386353,"active":6437405},"props":{},"doc_del_count":0,"doc_count":100,"disk_format_version":8,"compact_running":false,"cluster":{"q":2,"n":3,"w":2,"r":2},"instance_start_time":"0"}

This test was done on CouchDB 3.1.1, but the same issue is present on CouchDB 2.1 as well. The only version I know of that does not suffer from this is BigCouch 0.4.1.

The same issue occurs with every DB size tested, from 100 documents to 10K documents.

Additional Context

I tried it both with debug logging and with logging disabled, to rule out excessive logging as the cause.

Debug logs show this:

[notice] 2021-06-13T07:32:32.180795Z [email protected] <0.271.0> -------- rexi_server_mon : cluster unstable
[notice] 2021-06-13T07:32:32.180959Z [email protected] <0.270.0> -------- rexi_buffer : cluster unstable
[notice] 2021-06-13T07:32:32.180951Z [email protected] <0.1127.0> -------- couch_replicator_clustering : cluster unstable
[notice] 2021-06-13T07:32:32.181281Z [email protected] <0.1788.0> -------- Stopping replicator db changes listener <0.24052.80>
[notice] 2021-06-13T07:32:32.181103Z [email protected] <0.265.0> -------- rexi_server_mon : cluster unstable
[debug] 2021-06-13T07:32:32.181523Z [email protected] <0.269.0> -------- Supervisor rexi_buffer_sup started rexi_buffer:start_link('[email protected]') at pid <0.5747.81>
[notice] 2021-06-13T07:32:32.181613Z [email protected] <0.264.0> -------- rexi_server : cluster unstable
[debug] 2021-06-13T07:32:32.182003Z [email protected] <0.263.0> -------- Supervisor rexi_server_sup started rexi_server:start_link('[email protected]') at pid <0.9730.81>
.......
.....
....
[debug] 2021-06-13T07:32:37.676293Z [email protected] <0.289.0> -------- adding shards/00000000-7fffffff/smalltestdb1.1622759864 -> '[email protected]' to mem3_sync queue
[debug] 2021-06-13T07:32:37.677179Z [email protected] <0.289.0> -------- adding shards/80000000-ffffffff/smalltestdb10.1622759936 -> '[email protected]' to mem3_sync queue
[debug] 2021-06-13T07:32:37.677550Z [email protected] <0.289.0> -------- adding shards/00000000-7fffffff/smalltestdb10.1622759936 -> '[email protected]' to mem3_sync queue
[debug] 2021-06-13T07:32:37.680024Z [email protected] <0.289.0> -------- adding shards/00000000-7fffffff/smalltestdb100.1622760748 -> '[email protected]' to mem3_sync queue
.....

... This goes on for a long time, and at some point the logs stop printing.
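To quantify how much mem3_sync queueing happens, a small helper can count the queue lines in a saved debug log. This is a hypothetical sketch (`sync_queue_summary` is not a CouchDB tool) that relies only on the log line format shown above: `adding shards/<range>/<db>.<suffix> -> '<node>' to mem3_sync queue`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise mem3_sync queue lines in a CouchDB log file.
# Relies only on the line format seen in this report's debug output.
sync_queue_summary() {
  local log=$1
  # Each matching line is one shard copy added to the mem3_sync queue.
  echo "queued shard copies: $(grep -c 'to mem3_sync queue' "$log")"
  # Extract <db> from "adding shards/<range>/<db>.<suffix> -> ..." and count unique names.
  echo "distinct databases: $(sed -n 's|.*adding shards/[^/]*/\([^ .]*\)\.[0-9]* -> .*to mem3_sync queue.*|\1|p' "$log" | sort -u | grep -c .)"
}
```

For example, `sync_queue_summary couch.log` on a log captured after the node went down; with q=2, the first count should be roughly twice the second if both shard ranges of each DB are queued.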

I did some analysis via remsh and took snapshots of the process list. The only notable finding is that some processes had queued 'DOWN' messages related to the node being shut down.

  • Before I shut down a node:
([email protected])22> P1 = [process_info(P) || P<-processes()].
  • After I shut down a node:
([email protected])23> P2 = [process_info(P) || P<-processes()].
  • After the node is down and I start the replication script:
([email protected])25> P3 = [process_info(P) || P<-processes()].

Here are some results:

([email protected])30> length(P1).                                                                      
765
([email protected])31> length(P2).
768
([email protected])32> length(P3).
851
([email protected])36> rp(lists:filter(fun(A) -> proplists:get_value(message_queue_len,A)>0 end, P1)).
[]
ok
([email protected])37> rp(lists:filter(fun(A) -> proplists:get_value(message_queue_len,A)>0 end, P2)).
[]
ok
([email protected])38> rp(lists:filter(fun(A) -> proplists:get_value(message_queue_len,A)>0 end, P3)).
[[{current_function,{mochiweb_http,request,3}},
  {initial_call,{proc_lib,init_p,5}},
  {status,waiting},
  {message_queue_len,2},
  {messages,[{'DOWN',#Ref<0.3967453924.1266155522.11768>,
                     process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1266155522.11769>,process,
                     {stream,'[email protected]'},
                     noproc}]},
  {links,[<0.1860.0>,#Port<0.3326587>]},
  {dictionary,[{dont_log_request,true},
               {chttpd_stats,{st,1,0,0}},
               {nonce,"361658b552"},
               {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
               {body_time,0},
               {'$initial_call',{mochiweb_acceptor,init,4}},
               {couch_rewrite_count,0},
               {dont_log_response,true}]},
  {trap_exit,false},
  {error_handler,error_handler},
  {priority,normal},
  {group_leader,<0.899.0>},
  {total_heap_size,1598},
  {heap_size,1598},
  {stack_size,10},
  {reductions,144190},
  {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                       {min_bin_vheap_size,46422},
                       {min_heap_size,233},
                       {fullsweep_after,65535},
                       {minor_gcs,0}]},
  {suspending,[]}],
 [{current_function,{couch_doc,'-doc_from_multi_part_stream/4-fun-2-',
                               1}},
  {initial_call,{proc_lib,init_p,5}},
  {status,waiting},
  {message_queue_len,2},
  {messages,[{'DOWN',#Ref<0.3967453924.1266155522.11478>,
                     process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1266155522.11479>,process,
                     {stream,'[email protected]'},
                     noproc}]},
  {links,[<0.1860.0>,#Port<0.3325869>]},
  {dictionary,[{chttpd_stats,{st,0,0,0}},
               {dont_log_request,true},
               {nonce,"8d13a81621"},
               {mochiweb_request_recv,true},
               {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
               {'$initial_call',{mochiweb_acceptor,init,4}},
               {mochiweb_request_cookie,[{"AuthSession",
                                          "daRtad56NjBFRHUwNjQ61MgmkVAde5gn6pOmT3NhytBZ85f"}]},
               {mochiweb_request_qs,[{"new_edits","false"}]},
               {mp_att_writers,3},
               {couch_rewrite_count,0},
               {dont_log_response,true}]},
  {trap_exit,false},
  {error_handler,error_handler},
  {priority,normal},
  {group_leader,<0.899.0>},
  {total_heap_size,4185},
  {heap_size,4185},
  {stack_size,38},
  {reductions,36373},
  {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                       {min_bin_vheap_size,46422},
                       {min_heap_size,233},
                       {fullsweep_after,65535},
                       {minor_gcs,0}]},
  {suspending,[]}],
 [{current_function,{couch_doc,'-doc_from_multi_part_stream/4-fun-2-',
                               1}},
  {initial_call,{proc_lib,init_p,5}},
  {status,waiting},
  {message_queue_len,4},
  {messages,[{'DOWN',#Ref<0.3967453924.1263271939.104768>,
                     process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1263271939.104769>,process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1266155522.12280>,process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1266155522.12281>,process,
                     {stream,'[email protected]'},
                     noproc}]},
  {links,[<0.1860.0>,#Port<0.3325871>]},
  {dictionary,[{chttpd_stats,{st,0,0,0}},
               {dont_log_request,true},
               {nonce,"ed60378cdd"},
               {mochiweb_request_recv,true},
               {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
               {'$initial_call',{mochiweb_acceptor,init,4}},
               {mochiweb_request_cookie,[{"AuthSession",
                                          "daRtad56NjBFRHUwNjQ61MgmkVAde5gn6pOmT3NhytBZ85f"}]},
               {mochiweb_request_qs,[{"new_edits","false"}]},
               {mp_att_writers,3},
               {couch_rewrite_count,0},
               {dont_log_response,true}]},
  {trap_exit,false},
  {error_handler,error_handler},
  {priority,normal},
  {group_leader,<0.899.0>},
  {total_heap_size,4185},
  {heap_size,4185},
  {stack_size,38},
  {reductions,116983},
  {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                       {min_bin_vheap_size,46422},
                       {min_heap_size,233},
                       {fullsweep_after,65535},
                       {minor_gcs,0}]},
  {suspending,[]}],
 [{current_function,{couch_doc,'-doc_from_multi_part_stream/4-fun-2-',
                               1}},
  {initial_call,{proc_lib,init_p,5}},
  {status,waiting},
  {message_queue_len,4},
  {messages,[{'DOWN',#Ref<0.3967453924.1259077639.153906>,
                     process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1259077639.153907>,process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1259864070.217118>,process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1259864070.217119>,process,
                     {stream,'[email protected]'},
                     noproc}]},
  {links,[<0.1860.0>,#Port<0.3325887>]},
  {dictionary,[{chttpd_stats,{st,0,0,0}},
               {dont_log_request,true},
               {nonce,"81ea0debc3"},
               {mochiweb_request_recv,true},
               {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
               {'$initial_call',{mochiweb_acceptor,init,4}},
               {mochiweb_request_cookie,[{"AuthSession",
                                          "daRtad56NjBFRHUwNjQ61MgmkVAde5gn6pOmT3NhytBZ85f"}]},
               {mochiweb_request_qs,[{"new_edits","false"}]},
               {mp_att_writers,3},
               {couch_rewrite_count,0},
               {dont_log_response,true}]},
  {trap_exit,false},
  {error_handler,error_handler},
  {priority,normal},
  {group_leader,<0.899.0>},
  {total_heap_size,4185},
  {heap_size,4185},
  {stack_size,38},
  {reductions,95303},
  {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                       {min_bin_vheap_size,46422},
                       {min_heap_size,233},
                       {fullsweep_after,65535},
                       {minor_gcs,0}]},
  {suspending,[]}],
 [{current_function,{couch_doc,'-doc_from_multi_part_stream/4-fun-2-',
                               1}},
  {initial_call,{proc_lib,init_p,5}},
  {status,waiting},
  {message_queue_len,4},
  {messages,[{'DOWN',#Ref<0.3967453924.1260912645.28924>,
                     process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1260912645.28925>,process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1259864070.217319>,process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1259864070.217320>,process,
                     {stream,'[email protected]'},
                     noproc}]},
  {links,[<0.1860.0>,#Port<0.3328368>]},
  {dictionary,[{dont_log_request,true},
               {chttpd_stats,{st,0,0,0}},
               {nonce,"12aac7e7f3"},
               {body_time,0},
               {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
               {mochiweb_request_recv,true},
               {'$initial_call',{mochiweb_acceptor,init,4}},
               {mochiweb_request_cookie,[{"AuthSession",
                                          "daRtad56NjBFRHUwNjQ61MgmkVAde5gn6pOmT3NhytBZ85f"}]},
               {mochiweb_request_qs,[{"new_edits","false"}]},
               {mp_att_writers,3},
               {couch_rewrite_count,0},
               {dont_log_response,true}]},
  {trap_exit,false},
  {error_handler,error_handler},
  {priority,normal},
  {group_leader,<0.899.0>},
  {total_heap_size,8370},
  {heap_size,4185},
  {stack_size,38},
  {reductions,93093},
  {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                       {min_bin_vheap_size,46422},
                       {min_heap_size,233},
                       {fullsweep_after,65535},
                       {minor_gcs,6}]},
  {suspending,[]}],
 [{current_function,{couch_doc,'-doc_from_multi_part_stream/4-fun-2-',
                               1}},
  {initial_call,{proc_lib,init_p,5}},
  {status,waiting},
  {message_queue_len,2},
  {messages,[{'DOWN',#Ref<0.3967453924.1272709121.79576>,
                     process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1272709121.79577>,process,
                     {stream,'[email protected]'},
                     noproc}]},
  {links,[<0.1860.0>,#Port<0.3325877>]},
  {dictionary,[{chttpd_stats,{st,0,0,0}},
               {dont_log_request,true},
               {nonce,"fa5229fa02"},
               {mochiweb_request_recv,true},
               {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
               {'$initial_call',{mochiweb_acceptor,init,4}},
               {mochiweb_request_cookie,[{"AuthSession",
                                          "daRtad56NjBFRHUwNjQ61MgmkVAde5gn6pOmT3NhytBZ85f"}]}, 
               {mochiweb_request_qs,[{"new_edits","false"}]},
               {mp_att_writers,3},
               {couch_rewrite_count,0},
               {dont_log_response,true}]},
  {trap_exit,false},
  {error_handler,error_handler},
  {priority,normal},
  {group_leader,<0.899.0>},
  {total_heap_size,6772},
  {heap_size,6772},
  {stack_size,38},
  {reductions,24324},
  {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                       {min_bin_vheap_size,46422},
                       {min_heap_size,233},
                       {fullsweep_after,65535},
                       {minor_gcs,0}]},
  {suspending,[]}],
 [{current_function,{couch_doc,'-doc_from_multi_part_stream/4-fun-2-',
                               1}},
  {initial_call,{proc_lib,init_p,5}},
  {status,waiting},
  {message_queue_len,2},
  {messages,[{'DOWN',#Ref<0.3967453924.1258029065.53052>,
                     process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1258029065.53053>,process,
                     {stream,'[email protected]'},
                     noproc}]},
  {links,[<0.1860.0>,#Port<0.3328379>]},
  {dictionary,[{chttpd_stats,{st,0,0,0}},
               {dont_log_request,true},
               {nonce,"d159db4198"},
               {mochiweb_request_recv,true},
               {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
               {'$initial_call',{mochiweb_acceptor,init,4}},
               {mochiweb_request_cookie,[{"AuthSession",
                                          "daRtad56NjBFRHUwNjQ61MgmkVAde5gn6pOmT3NhytBZ85f"}]},
               {mochiweb_request_qs,[{"new_edits","false"}]},
               {mp_att_writers,3},
               {couch_rewrite_count,0},
               {dont_log_response,true}]},
  {trap_exit,false},
  {error_handler,error_handler},
  {priority,normal},
  {group_leader,<0.899.0>},
  {total_heap_size,6772},
  {heap_size,6772},
  {stack_size,38},
  {reductions,66541},
  {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                       {min_bin_vheap_size,46422},
                       {min_heap_size,233},
                       {fullsweep_after,65535},
                       {minor_gcs,0}]},
  {suspending,[]}],
 [{current_function,{couch_doc,'-doc_from_multi_part_stream/4-fun-2-',
                               1}},
  {initial_call,{proc_lib,init_p,5}},
  {status,waiting},
  {message_queue_len,2},
  {messages,[{'DOWN',#Ref<0.3967453924.1272709121.79756>,
                     process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1272709121.79757>,process,
                     {stream,'[email protected]'},
                     noproc}]},
  {links,[<0.1860.0>,#Port<0.3326440>]},
  {dictionary,[{chttpd_stats,{st,0,0,0}},
               {dont_log_request,true},
               {nonce,"abe0194147"},
               {mochiweb_request_recv,true},
               {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
               {'$initial_call',{mochiweb_acceptor,init,4}},
               {mochiweb_request_cookie,[{"AuthSession",
                                          "daRtad56NjBFRHUwNjQ61MgmkVAde5gn6pOmT3NhytBZ85f"}]},
               {mochiweb_request_qs,[{"new_edits","false"}]},
               {mp_att_writers,3},
               {couch_rewrite_count,0},
               {dont_log_response,true}]},
  {trap_exit,false},
  {error_handler,error_handler},
  {priority,normal},
  {group_leader,<0.899.0>},
  {total_heap_size,10958},
  {heap_size,10958},
  {stack_size,38},
  {reductions,36476},
  {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                       {min_bin_vheap_size,46422},
                       {min_heap_size,233},
                       {fullsweep_after,65535},
                       {minor_gcs,0}]},
  {suspending,[]}],
 [{current_function,{couch_doc,'-doc_from_multi_part_stream/4-fun-2-',
                               1}},
  {initial_call,{proc_lib,init_p,5}},
  {status,waiting},
  {message_queue_len,2},
  {messages,[{'DOWN',#Ref<0.3967453924.1261174788.117014>,
                     process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1261174788.117015>,process,
                     {stream,'[email protected]'},
                     noproc}]},
  {links,[<0.1860.0>,#Port<0.3328371>]},
  {dictionary,[{chttpd_stats,{st,0,0,0}},
               {dont_log_request,true},
               {nonce,"ae567f647b"},
               {mochiweb_request_recv,true},
               {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
               {'$initial_call',{mochiweb_acceptor,init,4}},
               {mochiweb_request_cookie,[{"AuthSession",
                                          "daRtad56NjBFRHUwNjQ61MgmkVAde5gn6pOmT3NhytBZ85f"}]},
               {mochiweb_request_qs,[{"new_edits","false"}]},
               {mp_att_writers,3},
               {couch_rewrite_count,0},
               {dont_log_response,true}]},
  {trap_exit,false},
  {error_handler,error_handler},
  {priority,normal},
  {group_leader,<0.899.0>},
  {total_heap_size,6772},
  {heap_size,6772},
  {stack_size,38},
  {reductions,25192},
  {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                       {min_bin_vheap_size,46422},
                       {min_heap_size,233},
                       {fullsweep_after,65535},
                       {minor_gcs,0}]},
  {suspending,[]}],
 [{current_function,{mochiweb_http,request,3}},
  {initial_call,{proc_lib,init_p,5}},
  {status,waiting},
  {message_queue_len,8},
  {messages,[{'DOWN',#Ref<0.3967453924.1259077639.154250>,
                     process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1259077639.154251>,process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1259864070.217188>,process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1259864070.217189>,process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1259077639.155027>,process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1259077639.155028>,process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1263271939.105867>,process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1263271939.105868>,process,
                     {stream,'[email protected]'},
                     noproc}]},
  {links,[<0.1860.0>,#Port<0.3328386>]},
  {dictionary,[{'$initial_call',{mochiweb_acceptor,init,4}},
               {dont_log_request,true},
               {chttpd_stats,{st,1,0,0}},
               {nonce,"89487db213"},
               {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
               {couch_rewrite_count,0},
               {dont_log_response,true}]},
  {trap_exit,false},
  {error_handler,error_handler},
  {priority,normal},
  {group_leader,<0.899.0>},
  {total_heap_size,1598},
  {heap_size,1598},
  {stack_size,10},
  {reductions,242662},
  {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                       {min_bin_vheap_size,46422},
                       {min_heap_size,233},
                       {fullsweep_after,65535},
                       {minor_gcs,0}]},
  {suspending,[]}],
 [{current_function,{mochiweb_http,request,3}},
  {initial_call,{proc_lib,init_p,5}},
  {status,waiting},
  {message_queue_len,4},
  {messages,[{'DOWN',#Ref<0.3967453924.1266155522.12100>,
                     process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1266155522.12101>,process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1259077639.154847>,process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1259077639.154848>,process,
                     {stream,'[email protected]'},
                     noproc}]},
  {links,[<0.1860.0>,#Port<0.3328364>]},
  {dictionary,[{dont_log_request,true},
               {chttpd_stats,{st,1,0,0}},
               {nonce,"6eeaf9de1b"},
               {body_time,0},
               {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
               {'$initial_call',{mochiweb_acceptor,init,4}},
               {couch_rewrite_count,0}, 
               {dont_log_response,true}]},
  {trap_exit,false},
  {error_handler,error_handler},
  {priority,normal},
  {group_leader,<0.899.0>},
  {total_heap_size,1598},
  {heap_size,1598},
  {stack_size,10},
  {reductions,147112},
  {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                       {min_bin_vheap_size,46422},
                       {min_heap_size,233},
                       {fullsweep_after,65535},
                       {minor_gcs,0}]},
  {suspending,[]}],
 [{current_function,{mochiweb_http,request,3}},
  {initial_call,{proc_lib,init_p,5}},
  {status,waiting},
  {message_queue_len,2},
  {messages,[{'DOWN',#Ref<0.3967453924.1259077639.154716>,
                     process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1259077639.154717>,process,
                     {stream,'[email protected]'},
                     noproc}]},
  {links,[<0.1860.0>,#Port<0.3328374>]},
  {dictionary,[{'$initial_call',{mochiweb_acceptor,init,4}},
               {dont_log_request,true},
               {chttpd_stats,{st,1,0,0}},
               {nonce,"a8c1643078"},
               {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
               {couch_rewrite_count,0},
               {dont_log_response,true}]},
  {trap_exit,false},
  {error_handler,error_handler},
  {priority,normal},
  {group_leader,<0.899.0>},
  {total_heap_size,2586},
  {heap_size,2586},
  {stack_size,10},
  {reductions,101521},
  {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                       {min_bin_vheap_size,46422},
                       {min_heap_size,233},
                       {fullsweep_after,65535},
                       {minor_gcs,0}]},
  {suspending,[]}],
 [{current_function,{mochiweb_http,request,3}},
  {initial_call,{proc_lib,init_p,5}},
  {status,waiting},
  {message_queue_len,2},
  {messages,[{'DOWN',#Ref<0.3967453924.1272709121.80670>,
                     process,
                     {stream,'[email protected]'}, 
                     noproc},
             {'DOWN',#Ref<0.3967453924.1272709121.80671>,process,
                     {stream,'[email protected]'},
                     noproc}]},
  {links,[<0.1860.0>,#Port<0.3328340>]},
  {dictionary,[{rand_seed,{#{bits => 58,jump => #Fun<rand.8.15449617>,
                             next => #Fun<rand.5.15449617>,type => exrop,
                             uniform => #Fun<rand.6.15449617>,
                             uniform_n => #Fun<rand.7.15449617>,weak_low_bits => 1}, 
                           [192787032645768316|82777130814936740]}},
               {dont_log_request,true},
               {chttpd_stats,{st,1,0,0}},
               {nonce,"5700ea3731"},
               {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
               {'$initial_call',{mochiweb_acceptor,init,4}},
               {couch_rewrite_count,0},
               {dont_log_response,true}]},
  {trap_exit,false},
  {error_handler,error_handler},
  {priority,normal},
  {group_leader,<0.899.0>},
  {total_heap_size,1598},
  {heap_size,1598},
  {stack_size,10},
  {reductions,290815},
  {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                       {min_bin_vheap_size,46422},
                       {min_heap_size,233},
                       {fullsweep_after,65535},
                       {minor_gcs,0}]},
  {suspending,[]}],
 [{current_function,{couch_doc,'-doc_from_multi_part_stream/4-fun-2-',
                               1}},
  {initial_call,{proc_lib,init_p,5}},
  {status,waiting},
  {message_queue_len,4},
  {messages,[{'DOWN',#Ref<0.3967453924.1260912645.28519>,
                     process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1260912645.28520>,process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1263271939.105060>,process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1263271939.105061>,process,
                     {stream,'[email protected]'},
                     noproc}]},
  {links,[<0.1860.0>,#Port<0.3328372>]},
  {dictionary,[{chttpd_stats,{st,0,0,0}},
               {dont_log_request,true},
               {nonce,"6997339d4c"},
               {mochiweb_request_recv,true},
               {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
               {'$initial_call',{mochiweb_acceptor,init,4}},
               {mochiweb_request_cookie,[{"AuthSession",
                                          "daRtad56NjBFRHUwNjQ61MgmkVAde5gn6pOmT3NhytBZ85f"}]},
               {mochiweb_request_qs,[{"new_edits","false"}]},
               {mp_att_writers,3},
               {couch_rewrite_count,0},
               {dont_log_response,true}]},
  {trap_exit,false},
  {error_handler,error_handler},
  {priority,normal},
  {group_leader,<0.899.0>},
  {total_heap_size,6772}, 
  {heap_size,6772},
  {stack_size,38},
  {reductions,38690},
  {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                       {min_bin_vheap_size,46422},
                       {min_heap_size,233},
                       {fullsweep_after,65535},
                       {minor_gcs,0}]},
  {suspending,[]}],
 [{current_function,{mochiweb_http,request,3}},
  {initial_call,{proc_lib,init_p,5}},
  {status,waiting},
  {message_queue_len,4},
  {messages,[{'DOWN',#Ref<0.3967453924.1261174788.117682>,
                     process,
                     {stream,'[email protected]'},
                     noproc}, 
             {'DOWN',#Ref<0.3967453924.1261174788.117683>,process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1259864070.217131>,process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1259864070.217132>,process,
                     {stream,'[email protected]'},
                     noproc}]},
  {links,[<0.1860.0>,#Port<0.3328373>]},
  {dictionary,[{'$initial_call',{mochiweb_acceptor,init,4}},
               {dont_log_request,true},
               {chttpd_stats,{st,1,0,0}},
               {nonce,"fd552b2275"},
               {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
               {couch_rewrite_count,0},
               {dont_log_response,true}]},
  {trap_exit,false},
  {error_handler,error_handler},
  {priority,normal},
  {group_leader,<0.899.0>},
  {total_heap_size,1598},
  {heap_size,1598},
  {stack_size,10},
  {reductions,163990},
  {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                       {min_bin_vheap_size,46422},
                       {min_heap_size,233},
                       {fullsweep_after,65535},
                       {minor_gcs,0}]},
  {suspending,[]}],
 [{current_function,{couch_doc,'-doc_from_multi_part_stream/4-fun-2-',
                               1}},
  {initial_call,{proc_lib,init_p,5}},
  {status,waiting},
  {message_queue_len,2},
  {messages,[{'DOWN',#Ref<0.3967453924.1263271939.104969>,
                     process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1263271939.104970>,process,
                     {stream,'[email protected]'},
                     noproc}]},
  {links,[<0.1860.0>,#Port<0.3326592>]},
  {dictionary,[{chttpd_stats,{st,0,0,0}},
               {dont_log_request,true},
               {nonce,"9af6f2a5ae"},
               {mochiweb_request_recv,true},
               {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
               {'$initial_call',{mochiweb_acceptor,init,4}},
               {mochiweb_request_cookie,[{"AuthSession",
                                          "daRtad56NjBFRHUwNjQ61MgmkVAde5gn6pOmT3NhytBZ85f"}]},
               {mochiweb_request_qs,[{"new_edits","false"}]},
               {mp_att_writers,3},
               {couch_rewrite_count,0},
               {dont_log_response,true}]},
  {trap_exit,false},
  {error_handler,error_handler},
  {priority,normal},
  {group_leader,<0.899.0>},
  {total_heap_size,10958},
  {heap_size,10958},
  {stack_size,38},
  {reductions,47625},
  {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                       {min_bin_vheap_size,46422},
                       {min_heap_size,233},
                       {fullsweep_after,65535},
                       {minor_gcs,0}]},
  {suspending,[]}],
 [{current_function,{mochiweb_http,request,3}},
  {initial_call,{proc_lib,init_p,5}},
  {status,waiting},
  {message_queue_len,2},
  {messages,[{'DOWN',#Ref<0.3967453924.1263271939.105412>,
                     process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1263271939.105413>,process,
                     {stream,'[email protected]'},
                     noproc}]},
  {links,[<0.1860.0>,#Port<0.3326418>]},
  {dictionary,[{dont_log_request,true},
               {chttpd_stats,{st,1,0,0}},
               {nonce,"06c04da577"},
               {body_time,0},
               {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
               {'$initial_call',{mochiweb_acceptor,init,4}},
               {couch_rewrite_count,0},
               {dont_log_response,true}]},
  {trap_exit,false},
  {error_handler,error_handler},
  {priority,normal},
  {group_leader,<0.899.0>},
  {total_heap_size,1598},
  {heap_size,1598},
  {stack_size,10},
  {reductions,112627},
  {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                       {min_bin_vheap_size,46422},
                       {min_heap_size,233},
                       {fullsweep_after,65535},
                       {minor_gcs,0}]},
  {suspending,[]}],
 [{current_function,{mochiweb_http,request,3}},
  {initial_call,{proc_lib,init_p,5}},
  {status,waiting},
  {message_queue_len,1},
  {messages,[{'DOWN',#Ref<0.3967453924.1258553352.77156>,
                     process,
                     {stream,'[email protected]'},
                     noproc}]},
  {links,[<0.1860.0>,#Port<0.3326607>]},
  {dictionary,[{'$initial_call',{mochiweb_acceptor,init,4}},
               {dont_log_request,true},
               {chttpd_stats,{st,1,0,0}},
               {nonce,"630bdcffe5"},
               {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
               {couch_rewrite_count,0},
               {dont_log_response,true}]}, 
  {trap_exit,false},
  {error_handler,error_handler},
  {priority,normal},
  {group_leader,<0.899.0>},
  {total_heap_size,2586},
  {heap_size,2586},
  {stack_size,10},
  {reductions,103263},
  {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                       {min_bin_vheap_size,46422},
                       {min_heap_size,233},
                       {fullsweep_after,65535},
                       {minor_gcs,0}]},
  {suspending,[]}],
 [{current_function,{mochiweb_http,request,3}},
  {initial_call,{proc_lib,init_p,5}},
  {status,waiting},
  {message_queue_len,2},
  {messages,[{'DOWN',#Ref<0.3967453924.1258029065.53021>,
                     process,
                     {stream,'[email protected]'},
                     noproc},
             {'DOWN',#Ref<0.3967453924.1258029065.53022>,process,
                     {stream,'[email protected]'},
                     noproc}]},
  {links,[<0.1860.0>,#Port<0.3328391>]},
  {dictionary,[{dont_log_request,true},
               {chttpd_stats,{st,1,0,0}},
               {nonce,"d09a203770"},
               {body_time,0},
               {'$ancestors',[chttpd,chttpd_sup,<0.901.0>]},
               {'$initial_call',{mochiweb_acceptor,init,4}},
               {couch_rewrite_count,0},
               {dont_log_response,true}]},
  {trap_exit,false},
  {error_handler,error_handler},
  {priority,normal},
  {group_leader,<0.899.0>},
  {total_heap_size,1598},
  {heap_size,1598},
  {stack_size,10},
  {reductions,24075},
  {garbage_collection,[{max_heap_size,#{error_logger => true,kill => true,size => 0}},
                       {min_bin_vheap_size,46422},
                       {min_heap_size,233},
                       {fullsweep_after,65535},
                       {minor_gcs,0}]},
  {suspending,[]}]]
ok
@janl
Member

janl commented Jun 18, 2021

hey hey, just a quick note without going into too much detail. The test you are running (creating a lot of databases in a tight loop) is not a use-case that CouchDB 3.x will be very happy with. I'm sure there are things we can improve, but this isn't a use-case I see us optimising for a lot, unless someone contributes compelling PRs.

@vladimirralev
Author

Thanks for the response. I think this loop reflects a common backup strategy: replicate all DBs to a backup server as fast as you can, overnight or otherwise, sometimes in parallel.

That being said, this issue has been observed with a slow buildup of databases and the rate of creating the DBs in the example is probably not related to the root cause. I'll be trying to find the root cause of this and any hints are appreciated.

@skeyby

skeyby commented Jul 5, 2021

I experienced this problem as well on our clusters. After a lot of trial and error, I think the problem is related to the synchronization of the _dbs internal database across nodes. Maybe you can look into that.

@iilyak
Contributor

iilyak commented Jul 5, 2021

> That being said, this issue has been observed with a slow buildup of databases and the rate of creating the DBs in the example is probably not related to the root cause.

It could be related to the rate of database creation. CouchDB uses an LRU cache and keeps only a limited number of databases open. When you create databases rapidly, you exceed the LRU cache size. Since most of the requests are new database creations, those databases are definitely not in the cache. When the LRU cache is over its limit, CouchDB starts closing databases.
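To make the eviction pattern concrete, here is a minimal, simplified sketch in bash (the language of the reproduction script) of a fixed-capacity LRU cache of "open databases". The names MAX_OPEN and open_db are hypothetical; MAX_OPEN only stands in for a limit such as CouchDB's [couchdb] max_dbs_open setting. This is not CouchDB's implementation, which is written in Erlang and has extra rules (for example, it skips databases that are still in use when choosing what to close).

```bash
#!/usr/bin/env bash
# Simplified LRU "open databases" cache (hypothetical sketch, not CouchDB code).
# MAX_OPEN stands in for a limit such as [couchdb] max_dbs_open.
MAX_OPEN=3
open_dbs=()   # least recently used first, most recently used last

open_db() {
  local name=$1 i found=-1
  for i in "${!open_dbs[@]}"; do
    if [[ ${open_dbs[i]} == "$name" ]]; then found=$i; break; fi
  done
  if (( found >= 0 )); then
    # Cache hit: move the database to the most-recently-used end.
    open_dbs=("${open_dbs[@]:0:found}" "${open_dbs[@]:found+1}" "$name")
    echo "hit   $name"
    return
  fi
  # Cache miss: if the cache is full, close the least recently used database.
  if (( ${#open_dbs[@]} >= MAX_OPEN )); then
    echo "close ${open_dbs[0]}"
    open_dbs=("${open_dbs[@]:1}")
  fi
  open_dbs+=("$name")
  echo "open  $name"
}

# A create-heavy loop like the reproduction script: every name is new,
# so every call is a miss and, once the cache is full, forces a close.
for i in 1 2 3 4 5; do
  open_db "smalltestdb$i"
done
```

Running it prints an "open" line for every name and, from the fourth name on, a "close" of the oldest entry first. With a workload made almost entirely of new databases, the cache never gets a hit, so every request pays for both an open and a close.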

@vladimirralev
Copy link
Author

I think CouchDB 3 does eager indexing, which causes a huge follow-up CPU/IO load asynchronously after replication to specific DBs completes. I have to pace the replications because of this, but that's fine and not related to the issue. I am setting up a new build here with more logging in between the lines, but so far it looks like each DB is indeed polling independently for health, or at least logging something, and that is causing the sudden spike of queued messages.

4 participants