
Using Zstandard for mapblock compression? #4495

Closed · est31 opened this issue Sep 2, 2016 · 54 comments

@est31
Contributor

est31 commented Sep 2, 2016

https://github.com/facebook/zstd

We could train zstd for minetest, or use multiple dictionaries per map.
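For illustration, here is a minimal sketch of what dictionary training and per-block dictionary compression could look like with the zstd C API (zdict.h / zstd.h). None of this is existing Minetest code; the function names, the 64 KiB dictionary size and the way samples are collected are assumptions:

```cpp
#include <zstd.h>
#include <zdict.h>
#include <string>
#include <vector>

// Train a dictionary (e.g. ~64 KiB) from representative serialized mapblocks.
// `samples` is all sample blocks concatenated, `sample_sizes` their lengths.
std::string train_dictionary(const std::string &samples,
		const std::vector<size_t> &sample_sizes)
{
	std::string dict(64 * 1024, '\0');
	size_t n = ZDICT_trainFromBuffer(&dict[0], dict.size(),
			samples.data(), sample_sizes.data(),
			(unsigned)sample_sizes.size());
	if (ZDICT_isError(n))
		return ""; // training failed, e.g. too few samples
	dict.resize(n);
	return dict;
}

// Compress one serialized mapblock with a previously trained dictionary.
std::string compress_block(const std::string &block, const ZSTD_CDict *cdict)
{
	ZSTD_CCtx *cctx = ZSTD_createCCtx();
	std::string out(ZSTD_compressBound(block.size()), '\0');
	size_t n = ZSTD_compress_usingCDict(cctx, &out[0], out.size(),
			block.data(), block.size(), cdict);
	ZSTD_freeCCtx(cctx);
	out.resize(ZSTD_isError(n) ? 0 : n);
	return out;
}
```

The `ZSTD_CDict` would be created once (e.g. at startup) with `ZSTD_createCDict(dict.data(), dict.size(), level)` and reused for every block.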

@Rogier-5
Contributor

Rogier-5 commented Sep 2, 2016

Thumbs up for replacing zlib!

Speed tests I did for minetestmapper a while ago showed that about 50% (IIRC) of the running time was actually spent in zlib code, decompressing map data...

The comparison table on the zstd page shows about 25% worse compression for lz4, but with the benefit of a 2-3 times speed increase. I'd therefore welcome an implementation that allows different compression algorithms to be used, so that depending on CPU speed, disk speed and disk size, a server operator can choose faster or more compact compression. The algorithm that was actually used should obviously be encoded in a block's data.
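A hedged sketch of the "encode the algorithm in the block data" idea: a single tag byte in front of the compressed payload, dispatched on when a block is loaded. The enum, its values and the helper functions are made up for illustration and are not the actual MapBlock serialization code:

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical per-block compression tag, stored as the first byte of the
// compressed payload so readers know how to decompress it.
enum CompressionAlgo : uint8_t {
	COMP_ZLIB = 0, // current default, always available
	COMP_ZSTD = 1,
	COMP_LZ4  = 2,
};

// Placeholder decompressors, one per supported algorithm.
std::string decompress_zlib(const std::string &data);
std::string decompress_zstd(const std::string &data);
std::string decompress_lz4(const std::string &data);

std::string decompress_block_payload(const std::string &payload)
{
	if (payload.empty())
		throw std::runtime_error("empty block payload");
	const uint8_t algo = static_cast<uint8_t>(payload[0]);
	switch (algo) {
	case COMP_ZLIB: return decompress_zlib(payload.substr(1));
	case COMP_ZSTD: return decompress_zstd(payload.substr(1));
	case COMP_LZ4:  return decompress_lz4(payload.substr(1));
	default:        throw std::runtime_error("unknown compression algorithm");
	}
}
```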

At the same time, instead of only compressing the node data, all of the block's data could be compressed. IIRC, one important reason LevelDB databases are often smaller is that LevelDB uses snappy to compress all data it stores.

@SmallJoker
Member

The compression ratio is not much better than zlib's, but for the same ratio it can process much more data, which is very nice. I am not sure about compatibility with our supported OSes, but the project looks well maintained.

@sfan5 added the "Non-trivial" label (a large amount of work is required to address this, sometimes to the point of being infeasible) on Sep 2, 2016
@sofar
Contributor

sofar commented Sep 2, 2016

I'm suspicious of zstandard's licensing at first glance: their example code isn't even licensed properly, so you can't even use their example code freely.

I would wait and see in a year or so if this is actually taking off anywhere....

@sfan5
Member

sfan5 commented Sep 19, 2016

If we're going the new shiny stuff™ route, we should also consider https://github.com/google/brotli.

@kahrl
Contributor

kahrl commented Oct 4, 2016

For the Germans amongst us, here's a relevant blog post: https://blog.fefe.de/?ts=a90c6783

@sfan5
Member

sfan5 commented Oct 4, 2016

Here's a testing branch with brotli support: https://github.com/sfan5/minetest/tree/brotli
All you need is libbrotli (AUR) and it will automatically compress new worlds with brotli.

Do not use this branch with your existing worlds, they will become unreadable to a "normal" Minetest.

@sfan5 removed the "Non-trivial" label on Oct 4, 2016
@est31
Contributor Author

est31 commented Oct 4, 2016

Nice work, @sfan5 !

I would prefer zstd over brotli, as it seems to be better, unless it's proven to perform worse for our use case.

One problem for both brotli and zstd, however, will be that we'd have to discontinue support for older distros (like Ubuntu 12.04, the one @PilzAdam uses) which don't have the library. Both brotli and zstd first appear in Ubuntu 16.04.

It's the same issue as with C++11, all over again :(

Although one could argue that it's much, much easier to install or download + compile a new library than to install a totally new C++ compiler.

@sfan5
Member

sfan5 commented Oct 4, 2016

@est31

One problem for both brotli and zstd will be however that we'd have to discontinue support for older distros

Actually _lib_brotli isn't in any package repositories right now AFAIK; brotli doesn't have an official API (yet). zstd does, so we wouldn't have that problem.

However, if we intend to support ubuntu 12.04 out of the box then we need to bundle brotli/zstd anyway and this won't matter.

@nerzhul
Member

nerzhul commented Oct 4, 2016

I prefer lz4 over zlib: it's very fast, and I already have a working version in my own branch :). lz4 is nearly free in CPU terms.
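For reference, lz4's one-shot C API really is tiny; a minimal sketch of round-tripping a serialized block (this is an illustration, not the code from nerzhul's branch):

```cpp
#include <lz4.h>
#include <stdexcept>
#include <string>

// Compress a serialized mapblock with the LZ4 one-shot API.
std::string lz4_compress(const std::string &src)
{
	const int bound = LZ4_compressBound((int)src.size());
	std::string dst(bound, '\0');
	int n = LZ4_compress_default(src.data(), &dst[0], (int)src.size(), bound);
	if (n <= 0)
		throw std::runtime_error("LZ4 compression failed");
	dst.resize(n);
	return dst;
}

// LZ4 does not store the uncompressed size, so it would have to be
// serialized alongside the compressed data.
std::string lz4_decompress(const std::string &src, int original_size)
{
	std::string dst(original_size, '\0');
	int n = LZ4_decompress_safe(src.data(), &dst[0], (int)src.size(), original_size);
	if (n < 0)
		throw std::runtime_error("LZ4 decompression failed");
	dst.resize(n);
	return dst;
}
```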

@est31
Contributor Author

est31 commented Oct 4, 2016

Bundling might be an idea, yeah. I hate the concept but if we want brotli/zstd any time soon we might be forced to.

@Megaf
Contributor

Megaf commented Oct 4, 2016

We already bundle jsoncpp, lua, gmp and maybe other things too.

@nerzhul
Member

nerzhul commented Oct 4, 2016

lz4 bundling is easy, 3 files.

@sfan5
Member

sfan5 commented Oct 4, 2016

@est31 @nerzhul @Megaf

Edit: see here

Here's the steps I took to compare:

1) Download and extract MG_World.tar.bz2 from https://daconcepts.com/vanessa/hobbies/minetest.html
2) ./minetestserver --world ./MG_World --migrate sqlite3

3) sqlite3 ./MG_World/map.sqlite 'VACUUM;'
4) du -h --si ./MG_World/map.sqlite # note down size (zlib)
5) ./minetestserver --world ./MG_World --decompresstimes # note down bench (zlib)

6) ./minetestserver --world ./MG_World --recompress # note down stats
7) sqlite3 ./MG_World/map.sqlite 'VACUUM;'
8) du -h --si ./MG_World/map.sqlite # note down size (brotli)
9) ./minetestserver --world ./MG_World --decompresstimes # note down bench (brotli)

@nerzhul
Member

nerzhul commented Oct 4, 2016

World size is not very important, but performance is.

@sfan5
Member

sfan5 commented Oct 4, 2016

@nerzhul
I chose compression quality 6 (same as zlib's default, but that's not important) because it made the compression unittest take exactly the same duration.

Which means that performance should be about the same while the world is smaller.
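For context, this is roughly how a quality level gets passed to brotli's one-shot encoder in the official C API (brotli/encode.h) that exists today; the testing branch above goes through libbrotli and predates the official API, so its call may look different:

```cpp
#include <brotli/encode.h>
#include <cstdint>
#include <string>

// Compress a block at quality 6 (roughly comparable to zlib's default level).
std::string brotli_compress_q6(const std::string &src)
{
	size_t out_size = BrotliEncoderMaxCompressedSize(src.size());
	std::string out(out_size, '\0');
	BROTLI_BOOL ok = BrotliEncoderCompress(
			/* quality */ 6, BROTLI_DEFAULT_WINDOW, BROTLI_MODE_GENERIC,
			src.size(), reinterpret_cast<const uint8_t *>(src.data()),
			&out_size, reinterpret_cast<uint8_t *>(&out[0]));
	out.resize(ok ? out_size : 0);
	return out;
}
```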

@Ferk
Contributor

Ferk commented Oct 4, 2016

@sfan5 is it possible for you to check the duration for the decompression too? I suspect there's a lot more decompression than compression going on.

The numbers in lz4 benchmarks look very attractive, though.

@Megaf
Contributor

Megaf commented Oct 4, 2016

We have to compare those against lz4 now.
And for me the most important thing should be lower CPU usage/time. Size is secondary.
I will be very happy if the world doesn't get any larger.

@nerzhul
Member

nerzhul commented Oct 4, 2016

@sfan5 noted. I think you should benchmark various mapblock serialization times

@sfan5
Member

sfan5 commented Oct 4, 2016

@Ferk @nerzhul

Edit: see here

@est31
Contributor Author

est31 commented Oct 7, 2016

I've run the test by @sfan5 for my zstd branch (not the most recent commit, but the commit called "unittest").

I got in fact an increase in space required (944148 K with the zlib compression, and 1062140 K with the zstd compression, an increase of 12%). Note that I didn't use a dictionary, while brotli has a hardcoded dictionary, and for the minetest use case a dictionary would be really useful I think, as it perfectly matches the dictionary use case (few KB of data). So there is certainly room for improvement.

The decompression time however is really really great.

For zstd:

############
Total decompression CPU time:     19.0198s
Decompression CPU time per block: 7.27373us
Decompression speed:              30393 KB/s
############

For zlib:

############
Total decompression CPU time:     72.0594s
Decompression CPU time per block: 27.5577us
Decompression speed:              7037.76 KB/s
############

A bit odd was that when I rewrote the code to use the streaming API, it suddenly got horribly slow (20x decrease). The benchmarks are from using the non streaming API.
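For reference, the non-streaming path that produced these numbers boils down to a single one-shot call per block; a minimal sketch with the current zstd API (not the actual branch code):

```cpp
#include <zstd.h>
#include <stdexcept>
#include <string>

// One-shot decompression: the whole compressed block is already in memory.
std::string zstd_decompress(const std::string &src)
{
	unsigned long long raw_size =
			ZSTD_getFrameContentSize(src.data(), src.size());
	if (raw_size == ZSTD_CONTENTSIZE_UNKNOWN || raw_size == ZSTD_CONTENTSIZE_ERROR)
		throw std::runtime_error("cannot determine decompressed size");
	std::string dst(raw_size, '\0');
	size_t n = ZSTD_decompress(&dst[0], dst.size(), src.data(), src.size());
	if (ZSTD_isError(n))
		throw std::runtime_error(ZSTD_getErrorName(n));
	dst.resize(n);
	return dst;
}
```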

@sfan5
Member

sfan5 commented Oct 7, 2016

I got in fact an increase in space required. Note that I didn't use a dictionary [...], and for the minetest use case a dictionary would be really useful I think

Brotli supports dictionaries too, I agree that we should try that approach. We might need/want different dicts for block data vs. metadata too.

The problem however is that every application that wants to read Minetest's data needs the same dictionary, and we can't ever change the dict. Unless we add dictionary versioning, which would also require that applications keep up with dict updates done in Minetest.
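One hedged note on the versioning concern: a zstd frame records the ID of the dictionary it was compressed with (by default), so a reader could in principle ship several dictionaries and select the right one per block. A sketch; the dictionary registry and function name are hypothetical:

```cpp
#include <zstd.h>
#include <map>
#include <string>

// Hypothetical registry of trained dictionaries, keyed by their zstd dict ID.
extern std::map<unsigned, ZSTD_DDict *> g_dicts_by_id;

// Returns the dictionary a block was compressed with, or nullptr for
// dictionary-less (plain) zstd frames and unknown IDs.
const ZSTD_DDict *select_dictionary(const std::string &compressed_block)
{
	unsigned id = ZSTD_getDictID_fromFrame(compressed_block.data(),
			compressed_block.size());
	if (id == 0)
		return nullptr; // no dictionary recorded in the frame
	auto it = g_dicts_by_id.find(id);
	return it != g_dicts_by_id.end() ? it->second : nullptr;
}
```

External tools like minetestmapper would still need to carry the same dictionaries, so this only eases, not removes, the compatibility burden.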

A bit odd was that when I rewrote the code to use the streaming API, it suddenly got horribly slow (20x decrease). The benchmarks are from using the non streaming API.

The numbers from using the non-streaming API don't matter, to compare we need the benchmark while using the final design (streaming API).

@sfan5
Member

sfan5 commented Oct 7, 2016

So I did all of the benchmarks again with some improvements I made to the statistics it outputs:

map.sqlite zlib: 820.3 MiB
map.sqlite brotli: 719.4 MiB (12.3% smaller)

--recompress:

2016-10-07 13:51:16: ACTION[Main]: ############
2016-10-07 13:51:16: ACTION[Main]: Recompressed blocks:            2614854
2016-10-07 13:51:16: ACTION[Main]: Total data (uncomp.):           4008 MB
2016-10-07 13:51:16: ACTION[Main]: Total compression CPU time:     388.206s
2016-10-07 13:51:16: ACTION[Main]: Compression CPU time per block: 148.462us
2016-10-07 13:51:16: ACTION[Main]: Compression speed:              10573.3 KB/s
2016-10-07 13:51:16: ACTION[Main]: ############

Decompression benchmark:

2016-10-07 14:05:19: ACTION[Main]: ### zlib ###
2016-10-07 14:05:19: ACTION[Main]: Total blocks:                     2614854
2016-10-07 14:05:19: ACTION[Main]: Total data (comp.):               495 MB
2016-10-07 14:05:19: ACTION[Main]: Total decompression CPU time:     83.7842s
2016-10-07 14:05:19: ACTION[Main]: Decompression CPU time per block: 32.0416us
2016-10-07 14:05:19: ACTION[Main]: Decompression speed:              6052.89 KB/s
2016-10-07 14:05:19: ACTION[Main]: ############
2016-10-07 14:01:56: ACTION[Main]: ## brotli ##
2016-10-07 14:01:56: ACTION[Main]: Total blocks:                     2614854
2016-10-07 14:01:56: ACTION[Main]: Total data (comp.):               406 MB
2016-10-07 14:01:56: ACTION[Main]: Total decompression CPU time:     79.2955s
2016-10-07 14:01:56: ACTION[Main]: Decompression CPU time per block: 30.325us
2016-10-07 14:01:56: ACTION[Main]: Decompression speed:              5246.44 KB/s
2016-10-07 14:01:56: ACTION[Main]: ############

@Fixer-007
Contributor

I wonder how faster unpacking will affect stuttering ingame.

@est31
Contributor Author

est31 commented Oct 9, 2016

The numbers from using the non-streaming API don't matter, to compare we need the benchmark while using the final design (streaming API).

Yes, only the final design should matter, which is the streaming API one way or another. However, the streaming API shouldn't be that much slower than the non-streaming one, meaning that there is either a bug in zstd or in the code that uses it. So the numbers only matter once (and if) that bug has been removed.

@sfan5
Member

sfan5 commented Oct 9, 2016

@est31

The slowness with using the streaming API might be caused by the need to call unget() on the input stream a lot of times to put unused data back. That wouldn't be easily fixed though.

So the only numbers matter when/if that bug has been removed.

No, currently the only way to compare zlib and whatever else in a fair way is not to fix the bug.

@Rogier-5
Contributor

@est31: I did some more investigation using perf and strace, and it turns out the malloc library is actually the culprit! It will repeatedly munmap() a free()d 5MB area of memory, only to mmap() it again on the subsequent malloc() in the zstd code. The original code even mmap()ed and munmap()ed 2 areas of 5MB and one of 2MB for every compression operation... Every time again, at least one of the 5MB areas is zeroed with memset(). How's that for a way to avoid having to turn on the heater when it's cold :-)

One would have thought malloc() was smarter than that (and apply some hysteresis). It may even be considered a bug.

Anyway, this can be fixed (workaround-style: the real problem is obviously with malloc and/or zstd) by using mallopt(). See my latest commit, and the graphs below. The total compression time reduction is now closer to 90% for levels 3 and up.
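A minimal sketch of that kind of mallopt() workaround (glibc-specific; the thresholds here are illustrative, not necessarily the values used in the commit mentioned above):

```cpp
#include <malloc.h> // mallopt(), glibc-specific

void tune_malloc_for_compression()
{
	// Keep zstd's multi-MB work buffers on the heap instead of letting glibc
	// mmap() them, so they are reused rather than unmapped on every free().
	mallopt(M_MMAP_THRESHOLD, 8 * 1024 * 1024);
	// Don't return freed heap memory to the kernel too eagerly either.
	mallopt(M_TRIM_THRESHOLD, 8 * 1024 * 1024);
}
```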

Comparison graphs for zstd (original code, initial patch, and final code):

[graph: plot-recompress-comp_blocktime]

New compression graphs (including a few more zstd levels):

[graph: plot-recompress-comp_blocktime]
[graph: plot-decompress-block_size]
[graph: plot-decompress-comp_blocktime]

It can be seen that in spite of the significant additional compression effort at higher levels, much further reduction is not achieved. The reason is most probably that even at lower levels, much of the data is already compressed from 16 KB down to anywhere from 20 to at most a few hundred bytes. There simply isn't much more redundancy to squeeze out.

@Megaf
Contributor

Megaf commented May 3, 2017

Why was this just closed?

@sfan5 reopened this on May 3, 2017
@ShadowNinja
Member

Note that you should also take a look at #3946 if this is implemented. You might be able to get better performance by compressing the whole mapblock instead of compressing several sections separately, and there are a number of other improvements to be made if you're going to break the serialization format.

@Fixer-007
Contributor

Fixer-007 commented Sep 18, 2017

Zstd support (comp and decomp) included in newest linux 4.14 for Btrfs https://lkml.org/lkml/2017/6/29/816 (and for squashfs according to phoronix).

I'm suspicious of zstandard's licensing at first glance

License was changed 29 days ago. Zstd readme now says "Zstandard is dual-licensed under BSD and GPLv2."
Wikipedia had this:

The reference implementation is licensed under the BSD License, published at GitHub. Since version 1.0, it had additional Grant of Patent Rights.[4]
From version 1.3.1[5] this patent grant was dropped and the license was changed to BSD + GPLv2 dual license[6]
facebook/zstd#801
https://github.com/facebook/zstd/releases/tag/v1.3.1

I think if you want to introduce a new compression algo, do it now for 0.5.0. IMO I would prefer speed over disk space; the speed advantage that lz4, zstd, etc. give is huge compared to a slightly bigger database size.

@nerzhul
Member

nerzhul commented Sep 18, 2017

zstd will be supported in ZFS, and LZ4 was chosen there too.

@Megaf
Contributor

Megaf commented Sep 19, 2017

Linux Kernel people say LZO is the fastest.

@ghost

ghost commented Nov 17, 2017

So, what do we select? I think it's good to start voting.

@sofar
Contributor

sofar commented Nov 18, 2017

On my server, the OS provides AVX512 accelerated zlib libraries, so I strongly prefer to stay on zlib.

There are also negative implications: Alternative compression algorithms are NOT backwards compatible with old releases and should be non-default to begin with. Maps will break and it may be impossible to make downloadable player maps that are widely supported by older clients.

All the talk about "oh the kernel has FOO support" is nonsense because that support IS NOT EXPORTED TO USER SPACE. It won't do minetest any good that your kernel has zstd. Minetest can't use it. You will need to rely on a completely different implementation for user space, and that will probably lack things like decent parallelism and AVX512 acceleration, and therefore lag in performance compared to ... zlib.

@sfan5
Member

sfan5 commented Nov 18, 2017

AVX512 accelerated

🤔 🤔 🤔 https://blog.cloudflare.com/on-the-dangers-of-intels-frequency-scaling/

Alternative compression algorithms are NOT backwards compatible with old releases

Old maps are going to be readable by new Minetest versions as usual.
We can't drop zlib anyway as it's also used for other things.

it may be impossible to make downloadable player maps that are widely supported by older clients

It's certainly possible to write a tool that deserializes maps in and serializes them again in a format understood e.g. by 0.4.16, this could also be integrated into Minetest itself (similar to --migrate).

like decent parallelism and [...]

Since we're mostly compressing 12k bytes at once I doubt threading has much effect (if it is used at all).
Benchmarks comparing how well Intel's zlib does compared to brotli/zstd would be a good idea though.

@nerzhul
Member

nerzhul commented Nov 18, 2017 via email

@sofar
Contributor

sofar commented Nov 24, 2017

Many Intel optimizations also benefit AMD. Check out the various performance tests that phoronix has run on AMD's threadripper platform where they compare ClearLinux against non-specifically Intel optimized distros - you can decide for yourself based on the data whether "pure intel" (a complete misnomer) is such a bad thing for the competition. Article link because you're lazy: https://www.phoronix.com/scan.php?page=article&item=tr1950x-7900x-3os&num=2

@ghost

ghost commented Nov 24, 2017

I'd stick with zlib. It is stable and many processors have optimizations for it.

@clavinet

clavinet commented Mar 24, 2019

I'd stick with zlib. It is stable and many processors have optimizations for it.

What optimizations exactly? And zstd is faster than zlib in any case, as seen from countless benchmarks.

@nerzhul
Member

nerzhul commented Mar 25, 2019

No processor has optimizations for such an algorithm; rather, zlib and zstd have optimizations for particular processors.

@paramat added the "Core dev consideration needed" and "Concept approved" (approved by a core dev: PRs welcomed!) labels on Mar 25, 2019
@sofar
Contributor

sofar commented Mar 25, 2019 via email

@clavinet

It also has multithreading support built in, which speeds up compression even more at absolutely no loss of compression ratio, from my tests at least.

@sofar
Contributor

sofar commented Mar 25, 2019 via email

@sfan5 added the "Discussion" label (issues meant for discussion of one or more proposals) on May 13, 2020
@Baigle

Baigle commented Mar 18, 2021

I took the liberty of documenting my #engine post from the Discord server so it isn't lost in an endless stream of comments.

Zstandard can perform significantly better than almost any other option with training. This YouTube video by Stanford Research Talks highlights a number of gains from using an immutable, locked and always memory-loaded custom dictionary for referencing small and large data. They claim that the compression ratio can be improved by 3.5x to 10x, compression speed by 3.6x over baseline, and decompression speed by 5.4x over baseline when using these features.

Alternative compression/decompression algorithms can be discovered by keeping up to date with encode.su, the Hutter Prize groups and Matthew Mahoney's ranking page, and the GDCC (ru). Many algorithms based on LSTMs, Transformer/few-shot learning, and attention matrices may work better in the future once they are more stable and developed.

See also the similar closed issue #3946, where JPEG XL is suggested for 3D mapblock data compression. JPEG XL is a standard that incorporated some of Alexander Rhatushnyak's algorithms; he still holds the best overall ratio-to-performance ranking of any contestant in the Hutter Prize competition. Generally, whether it is cryptanalysis, compression, or social prediction models, adding dimensions and ways for the solver to look at the problem can significantly improve performance. On heavy servers, such a system may begin to learn and differentiate between player-built architectural designs and how players interact and cause modifications. It may enhance cheat detection and prevention, and trivialize some lag sources which have previously appeared chaotic and random.

The custom dictionary may function better if it is per-biome, or we can fiddle around with it to see what works best. Can anyone explain to me how I would start figuring that out? I'm still a noob when it comes to coding.

@sfan5 removed the "Discussion" label on Sep 24, 2021
@sfan5
Member

sfan5 commented Sep 24, 2021

Done in #10788

@sfan5 closed this as completed on Sep 24, 2021
@Megaf
Contributor

Megaf commented Sep 25, 2021

Glad it was done! Thanks!
