Separate blockdata from timestamp et al in database storage #10671

lhofhansl · 2020-11-26T05:08:46Z

As is, we have to save back the entire block (serialization, compression, and all) when just the block's timestamp and other ancillary information changes.

We should have a database schema where these fields can be updated independently, so that we avoid expensive serialization when it is not necessary.

SmallJoker · 2020-11-26T16:12:04Z

The big mapdata blob is currently deflate-compressed. Writing the entire thing is easier than searching the correct offset in the stream to overwrite the contents. To avoid that, there needs to be some sort of container (like tar archive) where each data offset is known without parsing everything.

sfan5 · 2020-11-26T16:27:38Z

Alternatively the MapBlock class could keep node- and metadata in compressed form until they're actually needed. Then if the block is loaded but never "really" modified, writing it would be quick.

lhofhansl · 2020-11-26T19:36:16Z

@sfan5 I tried that, but it lead to weird artifacts, nodedefs are updated it seems. I didn't try very hard, maybe I will again. :)
(Edit: What I tried was pre-compressing it at generation time in the emerge threads - with proper locking.)

For this, why do we have store everything together in a single blob? We can store it in a row in the database. The blob is one column, and attributes like timestamp would be other columns. So now it's possible to update the timestamp column only.

Perhaps I'm missing something...?

lhofhansl · 2020-11-26T19:41:42Z

The thing is... Changing the timestamp does change the compressed blob. So the blob will change frequently.

sfan5 · 2020-11-26T20:27:03Z

There are two compressed parts in a MapBlock, one is the node data and the other is the metadata. The timestamp is part of neither.
If only the timestamp is changed, the compressed blobs do not need to be touched.

We can store it in a row in the database.

I know and that would be the better solution, but I'm not sure how much pain database migration (gradual or at once?) is going to be.

lhofhansl · 2020-11-26T23:40:34Z

I know and that would be the better solution, but I'm not sure how much pain database migration (gradual or at once?) is going to be.

As long as we can read the old format it should be OK, no?

Right now it's

"CREATE TABLE IF NOT EXISTS `blocks` (`pos` INT PRIMARY KEY, `data` BLOB);"

Would change it to at least

"CREATE TABLE IF NOT EXISTS `blocks` (`pos` INT PRIMARY KEY, `data` BLOB, `metadata` BLOB);"

But ideally to:

"CREATE TABLE IF NOT EXISTS `blocks` (`pos` INT PRIMARY KEY, `data` BLOB, timestamp..., daynightdiff..., ...;"

We should be able to alter the table and add the new columns, then start writing the new version while still being able to read the old version.

lhofhansl · 2021-07-26T10:20:20Z

I want to warm this up again.

The fact is still that when just the timestamp is changed, the entire block is written to the DB again, which is wasteful in terms of CPU and IO.

My suggestion above avoids that by storing the timestamp separately, so the timestamp can be updated quickly and cheaply. Migration would be tricky, and I'll try to come up with something and the backwards compatible.

lhofhansl added the Feature request Issues that request the addition or enhancement of a feature label Nov 26, 2020

sfan5 added the Performance label Nov 26, 2020

paramat added Request / Suggestion The issue makes a suggestion for something that should be done but isn't a new feature. and removed Feature request Issues that request the addition or enhancement of a feature labels Nov 26, 2020

lhofhansl mentioned this issue Jan 6, 2021

Switch MapBlock compression to zstd #10788

Merged

Zughy added the @ Builtin label Oct 7, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Separate blockdata from timestamp et al in database storage #10671

Separate blockdata from timestamp et al in database storage #10671

lhofhansl commented Nov 26, 2020

SmallJoker commented Nov 26, 2020

sfan5 commented Nov 26, 2020

lhofhansl commented Nov 26, 2020 •

edited

Loading

lhofhansl commented Nov 26, 2020

sfan5 commented Nov 26, 2020

lhofhansl commented Nov 26, 2020 •

edited

Loading

lhofhansl commented Jul 26, 2021

Separate blockdata from timestamp et al in database storage #10671

Separate blockdata from timestamp et al in database storage #10671

Comments

lhofhansl commented Nov 26, 2020

SmallJoker commented Nov 26, 2020

sfan5 commented Nov 26, 2020

lhofhansl commented Nov 26, 2020 • edited Loading

lhofhansl commented Nov 26, 2020

sfan5 commented Nov 26, 2020

lhofhansl commented Nov 26, 2020 • edited Loading

lhofhansl commented Jul 26, 2021

lhofhansl commented Nov 26, 2020 •

edited

Loading

lhofhansl commented Nov 26, 2020 •

edited

Loading