
Do not use vacuum when pruning #1103

Merged: 2 commits into master from do-not-use-vacuum-on-pruning on May 26, 2022

Conversation

KonradStaniec
Contributor

Removes the usage of vacuum from fluffy pruning. With the new approach, the fluffy node will reach the specified maxDb size and then stay at that size throughout its lifetime, as pages freed by deleting items are re-used for further additions.

Two follow up tasks are necessary:

  1. Devise a vacuum strategy - after a few pruning cycles the database can become fragmented, which may impact performance, so at some point VACUUM will need to be run to defragment the db.
  2. Deal with the edge case where a user configures a max db size lower than
    the current db.size(). With such a config, the database would try to prune itself on
    each addition.
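The behaviour the description relies on can be demonstrated with plain `sqlite3` in Python (a minimal sketch, not the fluffy code): without VACUUM, pages freed by DELETE go onto SQLite's freelist and are re-used by later INSERTs, so the file stops growing once it reaches its maximum size.

```python
import os
import sqlite3
import tempfile

# Fresh database file with some content.
path = os.path.join(tempfile.mkdtemp(), "content.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v BLOB)")
conn.executemany("INSERT INTO kv VALUES (?, ?)",
                 [(i, b"x" * 4096) for i in range(500)])
conn.commit()
size_full = os.path.getsize(path)

# "Prune" half the content: the pages go to the freelist, the file
# itself does not shrink because no VACUUM is run.
conn.execute("DELETE FROM kv WHERE k < 250")
conn.commit()
size_after_delete = os.path.getsize(path)
freed_pages = conn.execute("PRAGMA freelist_count").fetchone()[0]

# New additions re-use the freed pages, so the file stays near size_full.
conn.executemany("INSERT INTO kv VALUES (?, ?)",
                 [(i + 1000, b"x" * 4096) for i in range(250)])
conn.commit()
size_after_reuse = os.path.getsize(path)
```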

@KonradStaniec KonradStaniec requested a review from kdeme May 25, 2022 13:01
Comment on lines 187 to 188
## Return size of pages which are used by databse, but are currently empty i.e
## they can be re-used for new content

Suggested change
## Return size of pages which are used by databse, but are currently empty i.e
## they can be re-used for new content
## Returns the total size of the pages which are unused by the database,
## i.e. they can be re-used for new content.

@@ -172,6 +183,18 @@ proc size*(db: ContentDB): int64 =
size = res).expectDb()
return size

proc unusedSize*(db: ContentDB): int64 =

This can probably remain private?
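For reference, the quantity `unusedSize` computes corresponds to two SQLite pragmas: the freelist page count multiplied by the page size. A hedged Python/sqlite3 sketch of the same computation (the function name mirrors the PR but is otherwise my own, not the fluffy API):

```python
import sqlite3

def unused_size(conn: sqlite3.Connection) -> int:
    # Total size of the pages held by the database file that are
    # currently empty, i.e. free to be re-used for new content.
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    freelist_count = conn.execute("PRAGMA freelist_count").fetchone()[0]
    return page_size * freelist_count

# Demo: freed pages show up as unused size after a delete.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v BLOB)")
conn.executemany("INSERT INTO kv VALUES (?, ?)",
                 [(i, b"x" * 4096) for i in range(100)])
conn.commit()
conn.execute("DELETE FROM kv WHERE k < 50")
conn.commit()
freed = unused_size(conn)  # positive once pages have been freed
```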


let dbSize = db.size()

# We use real size for our pruning treshold, which means that database file

Suggested change
# We use real size for our pruning treshold, which means that database file
# We use real size for our pruning threshold, which means that database file

let dbSize = db.size()

# We use real size for our pruning treshold, which means that database file
# will reach size specified in db.maxSize, and will stay that size thourough

Suggested change
# will reach size specified in db.maxSize, and will stay that size thourough
# will reach the size specified in db.maxSize, and will stay that size throughout

Comment on lines 284 to 289
# 1. Devise vacuum strategy - after few pruning cycles database can be
# fragmented which may impact performance, so at some point in time `VACCUM`
# will need to be run to defragment db.
# 2. Deal with edge case when user would configre max db size lower than
# current db.size(). With such config data base would try to prune iteslf with
# each addition

Suggested change
# 1. Devise vacuum strategy - after few pruning cycles database can be
# fragmented which may impact performance, so at some point in time `VACCUM`
# will need to be run to defragment db.
# 2. Deal with edge case when user would configre max db size lower than
# current db.size(). With such config data base would try to prune iteslf with
# each addition
# 1. Devise vacuum strategy - after few pruning cycles database can become
# fragmented which may impact performance, so at some point in time `VACUUM`
# will need to be run to defragment the db.
# 2. Deal with the edge case where a user configures max db size lower than
# current db.size(). With such config the database would try to prune itself with
# each addition.

@kdeme
Contributor

kdeme commented May 25, 2022

Two follow up tasks are necessary:

  1. I think we don't need to solve this straight away, but let's create an issue for it. Not great UX, but perhaps for now a VACUUM could be called through a separate cli command or a json-rpc debug call.
  2. The simplest solution here is to just delete the existing database and create a new one when this scenario occurs. The other obvious solution is running VACUUM when this occurs, but then you hit the same issue as in 1. (brief size doubling).

@KonradStaniec
Contributor Author

KonradStaniec commented May 25, 2022

I think we don't need to solve this straight away, but let's create an issue for it. Not great UX, but perhaps for now a VACUUM could be called through a separate cli command or a json-rpc debug call.

Agreed that for now it is not an urgent thing. I will create a task for it.

The simplest solution here is to just delete the existing database and create a new one when this scenario occurs. The other obvious solution is running VACUUM when this occurs, but then you hit the same issue as in 1. (brief size doubling).

Deleting the database sounds harsh, and would probably leave some users unhappy who were not expecting such behaviour (although we do not have any users yet, so 😅). Probably the most comprehensive solution would be to have the database in incremental vacuum mode, delete enough data so that the db size drops to half of the new limit, then release the freed pages via the incremental_vacuum pragma. That way we stay within the limit the whole time, and end up with a correct database state. The drawback is that client start-up can take a while when this happens.
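The flow described above can be sketched with plain sqlite3 in Python (an illustration of the pragmas involved, assuming a fresh database file; not the fluffy implementation):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "content.db")
conn = sqlite3.connect(path)
# Incremental vacuum mode must be enabled before the first table is created.
conn.execute("PRAGMA auto_vacuum = INCREMENTAL")
conn.execute("CREATE TABLE kv (k INTEGER PRIMARY KEY, v BLOB)")
conn.executemany("INSERT INTO kv VALUES (?, ?)",
                 [(i, b"x" * 4096) for i in range(500)])
conn.commit()
size_full = os.path.getsize(path)

# Suppose the user lowered the max db size: first delete enough data...
conn.execute("DELETE FROM kv WHERE k < 400")
conn.commit()
# ...then return the freed pages to the OS via incremental vacuum,
# avoiding the temporary size-doubling of a full VACUUM.
conn.execute("PRAGMA incremental_vacuum").fetchall()
conn.commit()
size_shrunk = os.path.getsize(path)  # smaller than size_full
```

Note that incremental vacuum mode maintains extra pointer-map pages, which is the overhead mentioned in the reply below.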

@kdeme
Contributor

kdeme commented May 25, 2022

Deleting the database sounds harsh, and would probably leave some users unhappy who were not expecting such behaviour (although we do not have any users yet, so 😅). Probably the most comprehensive solution would be to have the database in incremental vacuum mode, delete enough data so that the db size drops to half of the new limit, then release the freed pages via the incremental_vacuum pragma. That way we stay within the limit the whole time, and end up with a correct database state. The drawback is that client start-up can take a while when this happens.

Yes, I did call it the simplest solution, not the one with the best UX. A more complex solution is, for example, what you mentioned. But, as you mention yourself, when a user requests to use only half of the storage that is being used now, they will practically keep only ~25% of their original database instead of ~50%. Better than a full reset, but still not great (and especially for small storage size settings, perhaps not worth it). The incremental vacuum setting will also increase the db overhead (although I don't know by how much, perhaps it is negligible).
We also need to ask whether being able to lower the originally set storage size is an important enough use case to be worth the added complexity / potential overhead. For now I wouldn't invest too much effort in it.

@KonradStaniec
Contributor Author

We also need to ask whether being able to lower the originally set storage size is an important enough use case to be worth the added complexity / potential overhead. For now I wouldn't invest too much effort in it.

Yep, agreed 👍. For now maybe it would be enough to return an error to the user in such a case.

@KonradStaniec KonradStaniec merged commit af10e8f into master May 26, 2022
@KonradStaniec KonradStaniec deleted the do-not-use-vacuum-on-pruning branch May 26, 2022 06:26