wanted support of {error,enospc} #3576

Open
sergey-safarov opened this issue May 24, 2021 · 6 comments

@sergey-safarov

Summary

I use a dedicated volume for CouchDB. From time to time our application starts making a lot of changes, and the database size grows very fast.
I can see these errors in the CouchDB log:

[error] 2021-05-24T12:52:20.836423Z [email protected] <0.14081.36> -------- CRASH REPORT Process  (<0.14081.36>) with 0 neighbors crashed with reason: no match of right hand value {error,enospc} at couch_btree:'-write_node/3-lc$^0/1-0-'/5(line:443) <= couch_btree:write_node/3(line:441) <= couch_btree:modify_node/4(line:409) <= couch_btree:modify_kpnode/6(line:521) <= couch_btree:modify_node/4(line:393) <= couch_btree:query_modify/4(line:259) <= couch_btree:add_remove/3(line:237) <= couch_bt_engine:write_doc_infos/3(line:423); initial_call: {couch_db_updater,init,['Argument__1']}, ancestors: [<0.31318.34>], message_queue_len: 0, messages: [], links: [<0.229.0>], dictionary: [{idle_limit,61000},{io_priority,{db_update,<<"shards/00000000-1fffffff...">>}}], trap_exit: false, status: running, heap_size: 10958, stack_size: 27, reductions: 4113
[error] 2021-05-24T12:52:20.836555Z [email protected] emulator -------- Error in process <0.14137.36> on node '[email protected]' with exit value:
{{badmatch,{'EXIT',normal}},[{couch_file,pread_binary,2,[{file,"src/couch_file.erl"},{line,169}]},{couch_file,pread_term,2,[{file,"src/couch_file.erl"},{line,157}]},{couch_btree,get_node,2,[{file,"src/couch_btree.erl"},{line,434}]},{couch_btree,lookup,3,[{file,"src/couch_btree.erl"},{line,284}]},{couch_btree,lookup,2,[{file,"src/couch_btree.erl"},{line,274}]},{couch_bt_engine,open_docs,2,[{file,"src/couch_bt_engine.erl"},{line,327}]},{couch_db,before_docs_update,3,[{file,"src/couch_db.erl"},{line,1340}]},{couch_db,update_docs,4,[{file,"src/couch_db.erl"},{line,1169}]}]}

Desired Behaviour

Add support for this error and generate a relevant log message.

Additional context

I use CouchDB 2.3.1

@janl
Member

janl commented Aug 30, 2021

heya, I just want to clarify what you’re looking for here? error,enospc already says “there was an error, there is no more space on the filesystem”.

Do you want this to say literally “there was an error, there is no more space on the filesystem”, or what are you asking for here?

CouchDB can’t “handle” a full filesystem any better, because it can’t write any data to disk.

@sergey-safarov
Author

I mean:

  1. not everyone understands an Erlang error stack trace, so some users may get stuck on this error;
  2. understanding what error,enospc means requires some googling and reading of manuals;
  3. when CouchDB has a lot of write operations, this error floods the Linux host error log.

What I would prefer:

  1. on a no-free-space error, write "no free space on database volume" to the logs;
  2. write this error message once per hour (or every 30 min, 5 min) and write "no free space error muted for 60 min" to the log.

This would make the error messages more user-friendly and keep them from flooding the error log in the future.
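
To illustrate the first request, here is a minimal, language-neutral sketch in Python of mapping an out-of-space error to a friendlier log line. It is not CouchDB code: the function name `describe_os_error`, the `FRIENDLY_MESSAGES` table, and the exact wording are assumptions made for illustration only.

```python
import errno

# Hypothetical wording; the request above only asks for something like
# "no free space on database volume".
FRIENDLY_MESSAGES = {
    errno.ENOSPC: "no free space on database volume",
}


def describe_os_error(exc: OSError, path: str) -> str:
    """Return a human-readable log line for an OSError raised while writing."""
    friendly = FRIENDLY_MESSAGES.get(exc.errno)
    if friendly is None:
        # No special wording known: fall back to the OS-provided text.
        return f"write to {path} failed: {exc.strerror or exc}"
    # Keep the symbolic name (e.g. ENOSPC) so the line stays searchable.
    name = errno.errorcode.get(exc.errno, "unknown")
    return f"{friendly} ({name}) while writing {path}"
```

Keeping the symbolic error name next to the friendly text means operators who already search for `enospc` can still find the line.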

@iilyak
Contributor

iilyak commented Aug 30, 2021

write this error message once per hour (or every 30 min, 5 min) and write "no free space error muted for 60 min" to the log.

To implement this we would need to keep a timestamp of when the message was last logged.
We don't have a facility in the logger to log certain messages only once in a given interval. The problem is that couch_file processes are independent for each database, so we cannot remove duplicates of the message across those processes. We also terminate couch_file processes when needed, which makes tracking last_time_logged impossible (without using shared state).
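
The shared-state idea can be sketched as follows. This is a language-neutral illustration in Python, not CouchDB code: the `LogOnce` class and its `should_log` method are hypothetical names, and a lock-protected dictionary stands in for whatever shared storage a real implementation would choose. The point is only that the last-logged timestamp lives outside the per-database workers, so it survives when they are terminated.

```python
import threading
import time


class LogOnce:
    """Allow a given message key at most once per interval, across threads."""

    def __init__(self, interval_s, clock=time.monotonic):
        self._interval_s = interval_s
        self._clock = clock  # injectable so the behaviour can be tested
        self._lock = threading.Lock()
        self._last_logged = {}  # message key -> time it was last allowed

    def should_log(self, key):
        """Return True if `key` has not been allowed within the interval."""
        now = self._clock()
        with self._lock:
            last = self._last_logged.get(key)
            if last is not None and now - last < self._interval_s:
                return False  # still muted
            self._last_logged[key] = now
            return True
```

A caller would check `should_log("enospc")` before writing the message, so every worker that hits the same error consults the same timestamp.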

The CouchDB project would benefit in a few other places if we implemented a log-once facility in the logger.
However, implementing this is not easy and would require:

  1. porting "Add structured logging reports via new Erlang 21 logger" (#3526) to 3.x
  2. upgrading Erlang to 21

@nickva
Contributor

nickva commented Nov 13, 2023

We have implemented optional countermeasures for when the disks are getting full in #4681.

It may still be worth adding periodic clear warnings to the logs for {error,enospc}. We now support Erlang 24+ so we have access to persistent terms and atomics to possibly implement per-module log rate limiting.

@big-r81
Contributor

big-r81 commented Jan 27, 2024

Can this help us rate-limit the output?

@nickva
Copy link
Contributor

nickva commented Jan 27, 2024

That seems interesting, yeah it could work. We'd specifically want to rate limit some messages, info or out of disk space ones, but we may want to make sure we don't limit critical or other such errors.
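
A selective limiter along those lines might look like the sketch below. It uses Python's standard `logging.Filter` purely as an illustration and is not CouchDB's logger. The class name `SelectiveRateLimit`, the `rate_key` record attribute, and the default cut-off at INFO are all assumptions: records at INFO or below, plus records explicitly tagged with a `rate_key`, are limited, while untagged warnings, errors and critical records always pass.

```python
import logging
import threading
import time


class SelectiveRateLimit(logging.Filter):
    """Rate limit low-severity or explicitly tagged records only."""

    def __init__(self, interval_s, max_level=logging.INFO, clock=time.monotonic):
        super().__init__()
        self._interval_s = interval_s
        self._max_level = max_level
        self._clock = clock
        self._lock = threading.Lock()
        self._last_seen = {}  # limiter key -> time the record last passed

    def filter(self, record):
        # Records tagged via extra={"rate_key": ...} are limited at any level.
        key = getattr(record, "rate_key", None)
        if key is None:
            if record.levelno > self._max_level:
                return True  # untagged warnings and errors are never dropped
            key = (record.name, record.levelno, record.msg)
        now = self._clock()
        with self._lock:
            last = self._last_seen.get(key)
            if last is not None and now - last < self._interval_s:
                return False
            self._last_seen[key] = now
            return True
```

With this shape, an out-of-disk-space error would be logged with `extra={"rate_key": "enospc"}` to opt in to limiting, and everything else at warning level or above is left alone.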

Our current log system was designed before the new logging system was written in Erlang/OTP so we could probably update it to simplify it and take better advantage of the new features.
