Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Couchdb error - Purge checkpoint '_local/purge-mrview-' not updated in 59 seconds #4181

Closed
aalegriadg opened this issue Sep 19, 2022 · 4 comments · Fixed by #4271
Closed
Milestone

Comments

@aalegriadg
Copy link

When I make a purge request of 1000 documents, I get the following error logs:



[error] 2022-09-14T10:17:18.414447Z [email protected] <0.6046.12> -------- Purge checkpoint '_local/purge-mrview-12e57817980f2e08f39604bed34e44ad' not updated in 59 seconds
                    in database <<"shards/00000000-7fffffff/db.1659444354">>
[error] 2022-09-14T10:17:18.419849Z [email protected] <0.6046.12> -------- Purge checkpoint '_local/purge-mrview-279fd3dc8974ceb702c8b67421a6c856' not updated in 59 seconds
                    in database <<"shards/00000000-7fffffff/db.1659444354">>
[error] 2022-09-14T10:17:18.439182Z [email protected] <0.6046.12> -------- Purge checkpoint '_local/purge-mrview-2e64b0bb954e88d783d3d85f54ab79c4' not updated in 59 seconds
                    in database <<"shards/00000000-7fffffff/db.1659444354">>
[error] 2022-09-14T10:17:18.442420Z [email protected] <0.6046.12> -------- Purge checkpoint '_local/purge-mrview-33758fa40d7894fc9b86215711c23ca6' not updated in 59 seconds
                    in database <<"shards/00000000-7fffffff/db.1659444354">>
[error] 2022-09-14T10:17:18.457073Z [email protected] <0.6046.12> -------- Purge checkpoint '_local/purge-mrview-33d3a809ad09581167acfdae005e8989' not updated in 59 seconds
                    in database <<"shards/00000000-7fffffff/db.1659444354">>
[error] 2022-09-14T10:17:18.459602Z [email protected] <0.6046.12> -------- Purge checkpoint '_local/purge-mrview-78976a811e4bfad9f81151dd21139775' not updated in 59 seconds
                    in database <<"shards/00000000-7fffffff/db.1659444354">>
[error] 2022-09-14T10:17:18.462753Z [email protected] <0.6046.12> -------- Purge checkpoint '_local/purge-mrview-7da2929059e4620d3b669fad56a31dbd' not updated in 59 seconds
                    in database <<"shards/00000000-7fffffff/db.1659444354">>
[error] 2022-09-14T10:17:18.465246Z [email protected] <0.6046.12> -------- Purge checkpoint '_local/purge-mrview-85532e923a6bdaa9e16645c6c7e34fd0' not updated in 59 seconds
                    in database <<"shards/00000000-7fffffff/db1659444354">>

Purging is successful but the logs generate mistrust.

These errors only appear when I make a purge request for a large number of documents. If, for example, the request only includes the ids and revs of 100 documents, the error does not appear.

Why do these errors occur? Is this a problem? How can I avoid it?

@nickva
Copy link
Contributor

nickva commented Sep 21, 2022

See some documentation about https://docs.couchdb.org/en/stable/cluster/purging.html#config-settings

Wonder if your index_lag_warn_seconds = 86400 setting was altered from the default?

@aalegriadg
Copy link
Author

aalegriadg commented Sep 21, 2022

Thank you for the reply,

No, the configuration is the default one: index_lag_warn_seconds = 86400

However, unlike described in the documentation, the seconds in my logs are not 86400. In my example logs appears "not update in 59 seconds" but testing I have seen other values like: 1335, 1237, 126, 9 seconds ...

@nickva
Copy link
Contributor

nickva commented Sep 21, 2022

@aalegriadg You're right, thanks for checking. It looks like it's a bug in the logging bit. It logs how much over the threshold it is, and not the threshold limit itself as the message implied:

Diff = NowSecs - Updated,
Fmt1 =
"Purge checkpoint '~s' not updated in ~p seconds\n"
" in database ~p",
couch_log:error(Fmt1, [DocId, Diff, DbName])

It does look like a warning only, and indicates that the views have not been updated (yet?) after the documents have been purged from the main db. Try to query the views (which forces an update) or perhaps adjust the auto-indexer, ken settings.

What I suspect might be happening is that we don't have have a decent auto-indexer trigger based on purged only. Only on document updates.

@aalegriadg
Copy link
Author

Thank you for your answer, it is very useful for my team.

You're right, thanks for checking. It looks like it's a bug in the logging bit. It logs how much over the threshold it is, and not the threshold limit itself as the message implied:

As it is a bug, this log should only appear if the purge takes more than 1 day (86400 seconds), which is not going to happen. So, do I stop worrying about these warnings?

It does look like a warning only, and indicates that the views have not been updated (yet?) after the documents have been purged from the main db. Try to query the views (which forces an update) or perhaps adjust the auto-indexer, ken settings.

As it is a partitioned database, the reindexing only affects the documents belonging to the queried partition right? i.e. if I purge partition1 the query time of partition2 should not be affected. I ask this because in my project we work with temporary replication windows: Documents are stored by daily partitions and when they are considered old enough they are not replicated and furthermore, they are purged. The query is for the partition with recent documents (those of the current day), not the purged ones. By the way I also opened this issue about purging.

@nickva nickva added this to the 3.3.0 milestone Oct 24, 2022
nickva added a commit that referenced this issue Nov 17, 2022
 * Refactor lag logging to a separate function.

 * When purge client validity throws an error log the error as well.

 * Use the newer `erlang:system_time(second)` call to get time.

 * When warning about update lag, log both the lag and the limit.

 * Add specific log message if update timestamp is invalid

 * Clarify that "Invalid purge doc" is not always because there is a malform
   checkpoint document, the most likely it's a stale view client checkpoint
   which hasn't been cleaned up properly.

Fix: #4181
nickva added a commit that referenced this issue Nov 18, 2022
 * Refactor lag logging to a separate function.

 * When purge client validity throws an error log the error as well.

 * Use the newer `erlang:system_time(second)` call to get time.

 * When warning about update lag, log both the lag and the limit.

 * Add a specific log message if update timestamp is invalid.

 * Clarify that an invalid is not always because of a malformed checkpoint
   document. The most likely case is it's a stale view client checkpoint which
   hasn't been cleaned up properly.

Fix: #4181
nickva added a commit that referenced this issue Nov 18, 2022
 * Refactor lag logging to a separate function.

 * When purge client validity throws an error log the error as well.

 * Use the newer `erlang:system_time(second)` call to get time.

 * When warning about update lag, log both the lag and the limit.

 * Add a specific log message if update timestamp is invalid.

 * Clarify that an invalid is not always because of a malformed checkpoint
   document. The most likely case is it's a stale view client checkpoint which
   hasn't been cleaned up properly.

Fix: #4181
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants