
Long running erlang map/reduce can block view compaction from completion, leaking erlang procs #4725

Open
KangTheTerrible opened this issue Aug 10, 2023 · 7 comments

Comments

@KangTheTerrible
Contributor

KangTheTerrible commented Aug 10, 2023

Description

A long-running/slow Erlang map/reduce, triggered by a new shard deployment, appears to block that shard's view compaction from completing. It also appears to leak Erlang procs at a steady rate of 5k-10k per hour.

Steps to Reproduce

Start view compaction
Start a long-running Erlang map/reduce
View compaction tries to complete but cannot until the indexer completes (suspected; waiting to observe this outcome)
Observe a steady increase in Erlang procs (may require continued insertion into/interaction with the shard)
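The first repro step goes through CouchDB's documented POST /{db}/_compact/{design-doc} endpoint. A minimal Python sketch (the server URL, database, and helper name are illustrative, not from the thread; the HTTP call is injectable so it can be exercised offline):

```python
import json
from urllib.request import Request, urlopen

def start_view_compaction(db_url, ddoc, post=None):
    """Kick off view compaction for one design doc via
    POST {db_url}/_compact/{ddoc}.

    `post` is injectable for offline testing; by default it performs a
    real HTTP POST and decodes the JSON response.
    """
    if post is None:
        def post(url):
            # CouchDB requires a Content-Type header on this POST.
            req = Request(url, data=b"", method="POST",
                          headers={"Content-Type": "application/json"})
            return json.load(urlopen(req))
    return post(f"{db_url}/_compact/{ddoc}")
```

On success CouchDB answers `{"ok": true}` and compaction proceeds in the background.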

Expected Behaviour

View compaction should not be blocked
Erlang procs should not keep increasing until the proc limit is hit and the node crashes

Your Environment

AWS C6i.x32large 5 nodes q=3 n=5

  • CouchDB version used: 3.2.2
  • Operating system and version: Debian Buster

Additional Context

We resharded, which resulted in the Erlang map/reduce taking a lot longer than it should have (it was not incremental).

@KangTheTerrible KangTheTerrible changed the title Long running erlang map/reduce can block compaction from completion, leaking erlang procs Long running erlang map/reduce can block view compaction from completion, leaking erlang procs Aug 11, 2023
@KangTheTerrible
Contributor Author

KangTheTerrible commented Aug 11, 2023

An additional piece of useful info: while the index was being built for the first time, I got the following from the Erlang view's metadata. The leaking Erlang procs appear to be the "clients waiting for the index".

_design/erlangstatsstats Metadata

Index Information
Language: Erlang
Currently being updated? Yes
Currently running compaction? Yes
Waiting for a commit? Yes
Clients waiting for the index: 719422
Update sequence on DB: 257926611
Processed purge sequence: 0
Actual data size (bytes): 602,563,809,246
Data size on disk (bytes): 1,187,591,035,418
MD5 Signature:

@KangTheTerrible
Contributor Author

This does eventually resolve gracefully, given enough Erlang procs and storage. An additional change needed to keep on top of storage was increasing the view-ratio smoosh concurrency values, since stuck compactions prevented other compactions from running.

@nickva
Contributor

nickva commented Aug 22, 2023

One strategy could be to periodically poll the https://docs.couchdb.org/en/stable/api/ddoc/common.html#db-design-design-doc-info endpoint and wait until the index has finished building before querying it, to avoid piling up too many client requests if the index is large.
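That polling strategy could look like the following minimal Python sketch against the design-doc info endpoint. The function and parameter names are illustrative, and the HTTP fetch is injectable so the loop can be tested without a live server:

```python
import json
import time
from urllib.request import urlopen

def wait_for_index(db_url, ddoc, fetch=None, interval=10, timeout=3600):
    """Poll GET {db_url}/_design/{ddoc}/_info until the view updater is idle.

    `fetch` is injectable for offline testing; by default it performs a
    real HTTP GET and decodes the JSON body.
    """
    if fetch is None:
        fetch = lambda url: json.load(urlopen(url))
    url = f"{db_url}/_design/{ddoc}/_info"
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # The response carries a "view_index" object with fields such as
        # updater_running, compact_running and waiting_clients.
        info = fetch(url)["view_index"]
        if not info.get("updater_running"):
            return info
        time.sleep(interval)
    raise TimeoutError(f"index {ddoc} still building after {timeout}s")
```

Only once this returns would clients start issuing real view queries, keeping waiting_clients from ballooning the way it did in this issue.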

Using a larger Q (resharding) could also help parallelize index building if you have the compute and disk-throughput resources.

@KangTheTerrible
Contributor Author

Yeah Nick, in our case unfortunately this was a live production server, so we had no trivial means to block users from attempting to access the view. Worth noting: none of these clients were actually waiting; all view requests to this view use stable=false&update=lazy.

@fr2lancer

Hi, actually I can't see any outstanding lines in debug mode in the log.
There are just no logs since yesterday, the process can't be identified, and the load in top is 5.0.
It is not consuming too much memory.

Do you know how to flush the debug log from Erlang?

@nickva
Contributor

nickva commented Dec 6, 2023

  • I'll second @rnewson's proposal to try an old-ddoc/new-ddoc strategy to deploy new views.

  • Clients could use stable=false&update=false and let ken (the index auto-builder) build the indices for you in the background. Monitor with _active_tasks.

  • There is an undocumented [smoosh.ignore] $shard = true setting that allows the auto-compactor to ignore specific shards. For example:

[smoosh.ignore]
shards/e0000000-ffffffff/dbname.1660859921 = true
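The _active_tasks monitoring mentioned above can be sketched as a small filter over the server-level endpoint. Names are illustrative and the fetch is injectable for offline testing:

```python
import json
from urllib.request import urlopen

def indexer_tasks(server_url, fetch=None):
    """Return the view-indexer entries from GET {server_url}/_active_tasks.

    `fetch` is injectable for offline testing; by default it performs a
    real HTTP GET and decodes the JSON body.
    """
    if fetch is None:
        fetch = lambda url: json.load(urlopen(url))
    tasks = fetch(f"{server_url}/_active_tasks")
    # Indexer entries carry fields such as database, design_document,
    # changes_done, total_changes and progress.
    return [t for t in tasks if t.get("type") == "indexer"]
```

Polling this until no indexer entry remains for the design doc is another way to tell the background build (driven by ken) has finished.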
