Compaction Failures Similar to #2941 on Upgrade From 2.x to 3.1.1 #3292
Comments
Are you able to test the fix that was merged in #3001?
I was under the impression that the fix was included in 3.1.1, which we upgraded to directly from 2.x; the release notes list it under bug fixes.
This looks like a different failure from #3001. @bdoyle0182 what version of 2.x was running on the old nodes? What Erlang VM version, OS (which version of CentOS), and file system were used? Looking through the issues so far, it's the first instance of
Looking at the code, the crash is coming from couchdb/src/couch/src/couch_file.erl, lines 729 to 730 in 23b9834.
It looks like That gets called from couchdb/src/couch/src/couch_bt_engine_compactor.erl Lines 554 to 555 in 23b9834
or compactor's couchdb/src/couch/src/couch_bt_engine_compactor.erl Lines 662 to 669 in 23b9834
Current CouchDB: 3.1.1
Current Erlang: 20.3.8.24
CentOS: CentOS Linux release 7.9.2009
File system: overlay
Just to update: the compaction completed successfully after being re-triggered, so deleting the compaction files seems like a fine remediation.
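The comment above does not say how the compaction was re-triggered. One documented way to start a compaction is `POST /{db}/_compact`; the sketch below only builds that request and does not send it. The base URL, port 5984, and the helper name `build_compact_request` are assumptions for illustration, and admin credentials, which the endpoint requires, are left out.

```python
import urllib.parse
import urllib.request


def build_compact_request(base_url, db_name):
    """Build (but do not send) a POST /{db}/_compact request.

    base_url and db_name are supplied by the caller; authentication is
    intentionally omitted from this sketch.
    """
    url = "{}/{}/_compact".format(
        base_url.rstrip("/"), urllib.parse.quote(db_name, safe="")
    )
    return urllib.request.Request(
        url,
        data=b"",
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; whether this database-level call matches how the shard compaction was restarted in this thread is not confirmed.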
@bdoyle0182 makes sense, thanks for confirming. This is probably an upgrade issue: we have upgrade code to handle the main .couch files, but I don't think it covers the .compact.* files. @davisp I wonder if we can automatically detect the upgrade scenario and auto-delete, or at least ignore, the older compaction files when the format is upgraded? We should update the docs to advise users to complete compactions on the 2.x nodes before upgrading them to 3.x, or alternatively to delete the .compact.meta and .compact.data files after the upgrade.
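The "delete the .compact.meta and .compact.data files after the upgrade" workaround could be scripted roughly as follows. This is a minimal sketch, not an official tool: the function names are hypothetical, the data directory is whatever the caller passes in, and the script does not check whether a compaction is currently running. It defaults to a dry run that only reports what it would remove.

```python
from pathlib import Path

# Suffixes named in the thread; any other file is left untouched.
COMPACT_SUFFIXES = (".compact.data", ".compact.meta")


def find_compaction_files(data_dir):
    """Return leftover compaction files under data_dir, sorted by path."""
    root = Path(data_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.name.endswith(COMPACT_SUFFIXES)
    )


def remove_compaction_files(data_dir, dry_run=True):
    """Delete leftover compaction files; with dry_run=True only list them."""
    found = find_compaction_files(data_dir)
    for path in found:
        if not dry_run:
            path.unlink()
    return found
```

Running it once with `dry_run=True` and reviewing the returned paths before deleting anything seems the safer order of operations.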
Description
We're seeing an issue similar to #2941 with random compactions on shards when upgrading from 2.x to 3.1.1, though it might be completely unrelated. For one shard of about 30 GB that kept hitting this error, the compaction metadata file grew to about 500 GB over 24 hours. As in #2941, we're also seeing high disk I/O on the nodes where this is happening compared with nodes where it is not. I've deleted the compaction files as discussed in the previous issue and that seems to be working: the compaction is running, the errors have stopped, and disk I/O has gone back down.
<0.5181.0> -------- exit for compaction of ["shards/60000000-7fffffff/core_activations.1589336264"]: {badarith,[{couch_file,get_pread_locnum,3,[{file,"src/couch_file.erl"},{line,730}]},{lists,map,2,[{file,"lists.erl"},{line,1239}]},{lists,map,2,[{file,"lists.erl"},{line,1239}]},{couch_file,read_multi_raw_iolists_int,2,[{file,"src/couch_file.erl"},{line,719}]},{couch_file,handle_call,3,[{file,"src/couch_file.erl"},{line,507}]},{gen_server,try_handle_call,4,[{file,"gen_server.erl"},{line,636}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,665}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}
-------- CRASH REPORT Process (<0.32443.1691>) with 3 neighbors crashed with reason: bad arithmetic expression at couch_file:get_pread_locnum/3(line:730) <= lists:map/2(line:1239) <= couch_file:read_multi_raw_iolists_int/2(line:719) <= couch_file:handle_call/3(line:507) <= gen_server:try_handle_call/4(line:636) <= gen_server:handle_msg/6(line:665) <= proc_lib:init_p_do_apply/3(line:247); initial_call: {couch_file,init,['Argument__1']}, ancestors: [<0.3352.1692>], message_queue_len: 0, messages: [], links: [<0.3352.1692>], dictionary: [{couch_file_fd,{{file_descriptor,prim_file,{#Port<0.1924847>,92}},"..."}},...], trap_exit: false, status: running, heap_size: 28690, stack_size: 27, reductions: 13483
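To find every affected shard in a large log, the "exit for compaction of" lines can be parsed for the shard path, the exit reason, and the top stack frame. The sketch below assumes the log format shown in the two lines above; the regular expressions and the function name `parse_compaction_exit` are illustrative and may need adjusting for other log layouts.

```python
import re

# Matches the shard path and the exit reason atom (e.g. badarith).
EXIT_RE = re.compile(
    r'exit for compaction of \["(?P<shard>[^"]+)"\]: \{(?P<reason>\w+),'
)
# Matches one stack frame: {Module,Function,Arity,[{file,"..."},{line,N}]}.
FRAME_RE = re.compile(
    r'\{(?P<module>\w+),(?P<function>\w+),(?P<arity>\d+),'
    r'\[\{file,"[^"]*"\},\{line,(?P<line>\d+)\}\]\}'
)


def parse_compaction_exit(log_line):
    """Return shard, reason, and top frame for a compaction exit line, or None."""
    match = EXIT_RE.search(log_line)
    if match is None:
        return None
    frame = FRAME_RE.search(log_line, match.end())
    top_frame = None
    if frame is not None:
        top_frame = "{}:{}/{} (line {})".format(
            frame["module"], frame["function"], frame["arity"], frame["line"]
        )
    return {
        "shard": match["shard"],
        "reason": match["reason"],
        "top_frame": top_frame,
    }
```

Collecting the `shard` values across a log file gives the list of shards whose compaction files are candidates for cleanup.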
An example of a shard I haven't cleaned up yet:
-rw-r--r-- 1 1501 1501 16G Dec 9 17:13 core_activations.1589323247.couch
-rw-r--r-- 1 1501 1501 4.0G Dec 9 17:11 core_activations.1589323247.couch.compact.data
-rw-r--r-- 1 1501 1501 159G Dec 9 17:13 core_activations.1589323247.couch.compact.meta
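In the listing above the .compact.meta file is roughly ten times the size of the .couch file it belongs to. A scan for that symptom could look like the sketch below. The threshold `ratio` and the function name are assumptions chosen for illustration, not values taken from CouchDB, and a large meta file on its own does not prove that a compaction is stuck.

```python
from pathlib import Path

META_SUFFIX = ".compact.meta"


def oversized_compaction_meta(data_dir, ratio=1.0):
    """List (shard_path, meta_size, shard_size) where meta_size > ratio * shard_size.

    Sizes are in bytes. Meta files without a matching shard file are skipped.
    """
    results = []
    for meta in Path(data_dir).rglob("*" + META_SUFFIX):
        # Strip the suffix to get the shard file, e.g. foo.couch.compact.meta -> foo.couch
        shard = meta.with_name(meta.name[: -len(META_SUFFIX)])
        if not shard.is_file():
            continue
        meta_size = meta.stat().st_size
        shard_size = shard.stat().st_size
        if meta_size > ratio * shard_size:
            results.append((shard, meta_size, shard_size))
    return sorted(results)
```

With the default `ratio=1.0` the shard shown above would be reported, since its meta file is larger than the shard itself.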
Steps to Reproduce
Upgrade from 2.x to 3.1.1 while a compaction is in progress.
Expected Behaviour
Compaction to complete as expected
Your Environment
If any specific environment details would be helpful, just let me know.