Improved block serialization format #3946

Closed
ShadowNinja opened this issue Apr 1, 2016 · 9 comments

@ShadowNinja
Member

ShadowNinja commented Apr 1, 2016

Recently, while writing a utility that serializes MapBlocks and writes them to a database, I noticed that the format has some room for improvement:

  • Compress the whole block at once, instead of separately compressing all the large sections. This is simpler and may improve performance and increase compression ratios.
  • Reorder node storage to YZX or YXZ order to improve compression (blocks are likely to have similar nodes in the same Y layer); see the indexing sketch after this list.
  • Add a binary Inventory list serialization format. Currently node metadata inventories are serialized in the same multi-line text format used in player files. It's compressed, but even an uncompressed binary format is probably smaller than the compressed text. Proposed format (a serialization sketch follows this list):
    • u8 version = 0
    • u8 flags { FLAG_EMPTY } Perhaps combine with version? Empty inventories are quite common, so this flag is probably worth having. Alternatively, have separate size and "included size" fields -- or store items as a Map<SlotID, Item>.
    • u8 width
    • u16 size
    • for each item (if not empty):
      • u16 count (if 0 continue)
      • u8/u16 name_size
      • char name[name_size]
  • Split content array? Currently it's just serialized as an array of u16. However, that allows up to 2^16 node IDs per block, while only 2^12 different IDs are actually possible, wasting at least 4 bits. Also, most blocks won't have anywhere near 2^12 different nodes; many will have fewer than 8. To reduce this waste, the content could be serialized as an array of u8, with an optional array of u4 (conceptually) included if there are more than 256 distinct nodes in a block. This might be pointless if compression does a good job, though.
  • Omit IDs from name-id-map: The IDs are just serialized as 0, 1, 2, etc, so just serialize an array of strings and use the index as the ID.
  • Consider better compression levels? (or use a setting?)
  • Remove content_width/params_width: these are set to constants and ignored. Changing them would break the format anyway.
  • Don't include timer data length if version is 0 (in fact, perhaps it should be completely removed).
  • Use bit shifts instead of multiplication for meta/timer positions (simpler and more efficient; also shown in the indexing sketch below).
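
A minimal sketch of the proposed inventory list format, choosing a u8 name_size. The Item struct and the write helpers are hypothetical stand-ins, not Minetest's actual API; the field layout follows the proposal above:

```cpp
#include <cstdint>
#include <ostream>
#include <string>
#include <vector>

struct Item {
    std::string name;
    uint16_t count = 0; // 0 = empty slot
};

static void writeU8(std::ostream &os, uint8_t v)   { os.put(static_cast<char>(v)); }
static void writeU16(std::ostream &os, uint16_t v) { // big-endian
    os.put(static_cast<char>(v >> 8));
    os.put(static_cast<char>(v & 0xFF));
}

enum InvFlags : uint8_t { FLAG_EMPTY = 0x01 };

void serializeInvList(std::ostream &os, const std::vector<Item> &items, uint8_t width)
{
    bool empty = true;
    for (const Item &it : items)
        if (it.count != 0) { empty = false; break; }

    writeU8(os, 0);                                    // u8 version = 0
    writeU8(os, empty ? FLAG_EMPTY : 0);               // u8 flags
    writeU8(os, width);                                // u8 width
    writeU16(os, static_cast<uint16_t>(items.size())); // u16 size

    if (empty)
        return; // item payload omitted entirely for empty inventories

    for (const Item &it : items) {
        writeU16(os, it.count);                        // u16 count
        if (it.count == 0)
            continue;                                  // "if 0 continue": no name follows
        writeU8(os, static_cast<uint8_t>(it.name.size()));   // u8 name_size
        os.write(it.name.data(), it.name.size());            // char name[name_size]
    }
}
```
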
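And a small sketch of the YZX reordering and the shift-based indexing, assuming the current layout is ZYX with X varying fastest and the usual 16×16×16 block (4 bits per axis):

```cpp
#include <cstdint>

constexpr uint32_t BS = 16; // MAP_BLOCKSIZE

// Current order (assumed ZYX, x fastest), with multiplications:
inline uint32_t indexZYX(uint32_t x, uint32_t y, uint32_t z) {
    return (z * BS + y) * BS + x;
}

// Proposed YZX order: nodes in the same Y layer become adjacent,
// which should group similar nodes (ground, air) for the compressor.
// Shifts replace the multiplications since BS is a power of two:
inline uint32_t indexYZX(uint32_t x, uint32_t y, uint32_t z) {
    return (y << 8) | (z << 4) | x; // == (y*16 + z)*16 + x
}
```
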
@kwolekr
Contributor

kwolekr commented Apr 1, 2016

Sure, that all sounds good, but this is a longer-term enhancement that should probably come below getting other PRs merged in terms of priority.

@est31
Contributor

est31 commented Apr 1, 2016

  1. Compressing the whole block at once: I'm not sure this has the desired effect. It's better to have separate Huffman trees for separate kinds of data.
  2. Reorder coordinates to YZX/YXZ: makes sense, but first test whether it actually improves the ratio.
  3. Add a binary Inventory list serialization format: good idea, 👍 . We should, however, still use the textual format for the player files, to keep them human-editable.
  4. Split content array: we would be doing the work the Huffman tree already does (or is supposed to do); I doubt it adds any improvement.
  5. Omit IDs from name-id-map: I'm not sure, but doesn't the current code write the name-id-map directly from what it gets from the nodedef, not replacing the IDs with 0, 1, 2, 3, etc., but keeping the higher ones? If we omit the IDs, then we must use 0, 1, 2, 3, etc., which means an extra walk over the param0 array to translate the IDs to that format. What about throwing out the whole name-id-map-per-block concept and making it a name-id-map-per-world one? Or multiple name-id-maps, with each block referencing one by ID; that would help if you added a mod with lots of nodes, then switched to another mod that also adds lots of nodes, and the two node sets don't fit into one name-id map.
  6. Consider better compression levels: Minetest is supposed to be a real-time game, so compression/decompression needs to be fast. I don't think anybody wants this.

@ShadowNinja
Member Author

@est31:

  1. Will have to test, then.
  2. Well, MC changed to YZX for this same reason, so it seems like it does, but I could test it.
  3. Yes, only meant for MapBlocks (although other places where they're sent over the network could benefit).
  4. Actually, this is already implemented as content_width, it's just always set to 2 even when only 1 is needed. Changing this should be rather trivial.
  5. getBlockNodeIdMapping generates a list specific to the MapBlock (it uses a global, but resets it on each call).
  6. If you have plenty of storage but saving is taking too long, you could also reduce/remove compression with a setting, but I'm pretty sure serialization runs infrequently in a background thread, and performance isn't much of an issue.

EDIT: getBlockNodeIdMapping generates a separate mapping per block.

@o-jasper

o-jasper commented Apr 8, 2016

I think you should consider merkle-tree-like structures first. Basically, the hashes of subsets of the map/data are hashed together in a tree. If a branch has the wrong hash, you ask for the sub-branch hashes, and at least one of those must be wrong too. You go down, finding all the changes, up to some granularity that is convenient and efficient.
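
A toy sketch of that descent (all names hypothetical; the mixing function stands in for a real hash, and both trees are assumed to have the same shape):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Hash = uint64_t;

// Toy mixing function standing in for a real hash (FNV, SHA, ...):
static Hash combine(Hash a, Hash b) {
    return a * 1099511628211ULL ^ (b + 0x9e3779b97f4a7c15ULL);
}

struct HashNode {
    Hash hash = 0;
    std::vector<HashNode> children; // empty for leaves (sub-chunks)
};

// Build a parent node from already-hashed children.
HashNode makeParent(std::vector<HashNode> children) {
    HashNode parent{0, std::move(children)};
    for (const HashNode &c : parent.children)
        parent.hash = combine(parent.hash, c.hash);
    return parent;
}

// Collect mismatching leaves between a local and a remote tree.
// Only branches whose hashes differ are descended into, so a mostly
// unchanged map needs very few comparisons.
void findChanged(const HashNode &local, const HashNode &remote,
                 std::vector<const HashNode *> &changed)
{
    if (local.hash == remote.hash)
        return; // whole subtree identical; skip it
    if (local.children.empty()) {
        changed.push_back(&local); // changed leaf: resend this sub-chunk
        return;
    }
    for (size_t i = 0; i < local.children.size(); ++i)
        findChanged(local.children[i], remote.children[i], changed);
}
```
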

You could prioritize keeping/not keeping chunks on the hard disk by how long the player spends near those areas. Then, when they log in, very little has to be sent.

(Edit: as a next step.) In case of changes nearby, you could try a synchronized simulation, sending events from users/the server across and using the hashes to catch when the synchronization fails. Failures could be due to floating-point differences or "edge of what users simulate" differences. (If floating-point values diverge in small amounts too quickly, the hash could use floor(float/small_value), allowing the small differences. Of course, depending on the case, the difference would likely grow.)

Lua tables/object data can also be hashed that way. I have something sort of like that, but not quite.

Edit: the simplest approach is to calculate a hash with each chunk. The main disadvantage is that you have to iterate over and hash the whole thing for each change, so it is likely better to have a bunch of sub-chunks with their own hashes: if one block changes, that's 4³ blocks + 4³ sub-hashes = 128 items instead of 16³ = 4096 blocks. You can also hash only occasionally, for instance waiting 5 ms after changes start before hashing, so that not every change triggers a rehash.

Also note that users can place whatever nodes they want, so malicious ones could try to find collisions that cause desynchronizations. Probably not an issue, though. Also, springrts had desync problems until it had this; at least, that's how I understand it.

@ShadowNinja
Member Author

@o-jasper: What are you talking about? It doesn't sound like what you're saying has anything to do with MapBlock serialization.

@o-jasper

Not if you're serializing to disk, but it is relevant if you're serializing to send across the network to players. It might be premature optimization, though?

@HybridDog
Contributor

The mapblock data is similar to a 3D image with a colour palette, so I think compression algorithms for image formats could be used for mapblock compression, too.
The FLIF image format has been superseded by FUIF, which has in turn been superseded by JPEG XL.
The file-size-relevant part of the Minetest map corresponds to a 3D image with a colour palette (the different nodes) and three channels (nodes, param1, param2). I think Meta-Adaptive Near-zero Integer Arithmetic Coding (MANIAC) performs significantly better for indexed images than gzip compression. Someone could have a look at the relevant parts of the JPEG XL code (lossless compression, indexed mode) and then write an efficient compression algorithm for Minetest maps.

@rubenwardy
Member

This should either be accepted and implemented, or closed.

@sfan5
Member

sfan5 commented May 5, 2021

> Remove content_width/params_width: these are set to constants and ignored. Changing them would break the format anyway.

This isn't true, btw.
It is perfectly possible to write a valid block that can be successfully/correctly decoded with content_width = 1 and the latest serialization version. Because this trick could help with compression (not in MT itself, but in external tools), I don't think it hurts to keep.
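
A rough sketch of what such an external tool might do, deciding content_width per block. The write helpers are hypothetical, and the rest of the block format (version, flags, param1/param2, compression) is omitted:

```cpp
#include <cstdint>
#include <ostream>

static void writeU8(std::ostream &os, uint8_t v)   { os.put(static_cast<char>(v)); }
static void writeU16(std::ostream &os, uint16_t v) { // big-endian
    os.put(static_cast<char>(v >> 8));
    os.put(static_cast<char>(v & 0xFF));
}

// Emit param0 with content_width = 1 when every node ID in the
// block fits in one byte, otherwise fall back to the usual 2.
void writeParam0(std::ostream &os, const uint16_t (&param0)[4096])
{
    uint16_t maxId = 0;
    for (uint16_t id : param0)
        if (id > maxId)
            maxId = id;

    const uint8_t content_width = (maxId <= 0xFF) ? 1 : 2;
    writeU8(os, content_width);

    // param0 payload: content_width bytes per node.
    for (uint16_t id : param0) {
        if (content_width == 1)
            writeU8(os, static_cast<uint8_t>(id)); // halves the pre-compression size
        else
            writeU16(os, id);
    }
}
```
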
