Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate blockdata from timestamp et al in database storage #10671

Open
lhofhansl opened this issue Nov 26, 2020 · 7 comments
Open

Separate blockdata from timestamp et al in database storage #10671

lhofhansl opened this issue Nov 26, 2020 · 7 comments
Labels
@ Builtin Performance Request / Suggestion The issue makes a suggestion for something that should be done but isn't a new feature.

Comments

@lhofhansl
Copy link
Contributor

As is, we have to save back the entire block (serialization, compression, and all) when just the block's timestamp and other ancillary information changes.

We should have a database schema where these fields can be updated independently, so that we avoid expensive serialization when it is not necessary.

@lhofhansl lhofhansl added the Feature request Issues that request the addition or enhancement of a feature label Nov 26, 2020
@SmallJoker
Copy link
Member

The big mapdata blob is currently deflate-compressed. Writing the entire thing is easier than searching the correct offset in the stream to overwrite the contents. To avoid that, there needs to be some sort of container (like tar archive) where each data offset is known without parsing everything.

@sfan5
Copy link
Member

sfan5 commented Nov 26, 2020

Alternatively the MapBlock class could keep node- and metadata in compressed form until they're actually needed. Then if the block is loaded but never "really" modified, writing it would be quick.

@paramat paramat added Request / Suggestion The issue makes a suggestion for something that should be done but isn't a new feature. and removed Feature request Issues that request the addition or enhancement of a feature labels Nov 26, 2020
@lhofhansl
Copy link
Contributor Author

lhofhansl commented Nov 26, 2020

@sfan5 I tried that, but it lead to weird artifacts, nodedefs are updated it seems. I didn't try very hard, maybe I will again. :)
(Edit: What I tried was pre-compressing it at generation time in the emerge threads - with proper locking.)

For this, why do we have store everything together in a single blob? We can store it in a row in the database. The blob is one column, and attributes like timestamp would be other columns. So now it's possible to update the timestamp column only.

Perhaps I'm missing something...?

@lhofhansl
Copy link
Contributor Author

The thing is... Changing the timestamp does change the compressed blob. So the blob will change frequently.

@sfan5
Copy link
Member

sfan5 commented Nov 26, 2020

There are two compressed parts in a MapBlock, one is the node data and the other is the metadata. The timestamp is part of neither.
If only the timestamp is changed, the compressed blobs do not need to be touched.

We can store it in a row in the database.

I know and that would be the better solution, but I'm not sure how much pain database migration (gradual or at once?) is going to be.

@lhofhansl
Copy link
Contributor Author

lhofhansl commented Nov 26, 2020

I know and that would be the better solution, but I'm not sure how much pain database migration (gradual or at once?) is going to be.

As long as we can read the old format it should be OK, no?

Right now it's

"CREATE TABLE IF NOT EXISTS `blocks` (`pos` INT PRIMARY KEY, `data` BLOB);"

Would change it to at least

"CREATE TABLE IF NOT EXISTS `blocks` (`pos` INT PRIMARY KEY, `data` BLOB, `metadata` BLOB);"

But ideally to:

"CREATE TABLE IF NOT EXISTS `blocks` (`pos` INT PRIMARY KEY, `data` BLOB, timestamp..., daynightdiff..., ...;"

We should be able to alter the table and add the new columns, then start writing the new version while still being able to read the old version.

@lhofhansl
Copy link
Contributor Author

I want to warm this up again.

The fact is still that when just the timestamp is changed, the entire block is written to the DB again, which is wasteful in terms of CPU and IO.

My suggestion above avoids that by storing the timestamp separately, so the timestamp can be updated quickly and cheaply. Migration would be tricky, and I'll try to come up with something and the backwards compatible.

@Zughy Zughy added the @ Builtin label Oct 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@ Builtin Performance Request / Suggestion The issue makes a suggestion for something that should be done but isn't a new feature.
Projects
None yet
Development

No branches or pull requests

5 participants