
Using Zstandard for mapblock compression? #4495

Closed · est31 opened this issue Sep 2, 2016 · 54 comments

@est31
Contributor

est31 commented Sep 2, 2016

https://github.com/facebook/zstd

We could train zstd for minetest, or use multiple dictionaries per map.
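For illustration, here is a minimal sketch of what dictionary training and per-block dictionary compression could look like with the zstd C API (zdict.h / zstd.h). None of this is existing Minetest code; the function names, the 64 KiB dictionary size and the way samples are collected are assumptions:

```cpp
#include <zstd.h>
#include <zdict.h>
#include <string>
#include <vector>

// Train a dictionary (e.g. ~64 KiB) from representative serialized mapblocks.
// `samples` is all sample blocks concatenated, `sample_sizes` their lengths.
std::string train_dictionary(const std::string &samples,
		const std::vector<size_t> &sample_sizes)
{
	std::string dict(64 * 1024, '\0');
	size_t n = ZDICT_trainFromBuffer(&dict[0], dict.size(),
			samples.data(), sample_sizes.data(),
			(unsigned)sample_sizes.size());
	if (ZDICT_isError(n))
		return ""; // training failed, e.g. too few samples
	dict.resize(n);
	return dict;
}

// Compress one serialized mapblock with a previously trained dictionary.
std::string compress_block(const std::string &block, const ZSTD_CDict *cdict)
{
	ZSTD_CCtx *cctx = ZSTD_createCCtx();
	std::string out(ZSTD_compressBound(block.size()), '\0');
	size_t n = ZSTD_compress_usingCDict(cctx, &out[0], out.size(),
			block.data(), block.size(), cdict);
	ZSTD_freeCCtx(cctx);
	out.resize(ZSTD_isError(n) ? 0 : n);
	return out;
}
```

The `ZSTD_CDict` would be created once (e.g. at startup) with `ZSTD_createCDict(dict.data(), dict.size(), level)` and reused for every block.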

@Rogier-5
Contributor

Rogier-5 commented Sep 2, 2016

Thumbs up for replacing zlib!

Speed tests I did for minetestmapper a while ago showed that about 50% (IIRC) of the running time was actually spent in zlib code, decompressing map data...

The comparison table on the zstd page shows about 25% worse compression for lz4, but with the benefit of a 2-3 times speed increase. I'd therefore welcome an implementation that allows different compression algorithms to be used, so that depending on CPU speed, disk speed and disk size, a server operator can choose faster or more compact compression. The algorithm that was actually used should obviously be encoded in a block's data.
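A hedged sketch of the "encode the algorithm in the block data" idea: a single tag byte in front of the compressed payload, dispatched on when a block is loaded. The enum, its values and the helper functions are made up for illustration and are not the actual MapBlock serialization code:

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical per-block compression tag, stored as the first byte of the
// compressed payload so readers know how to decompress it.
enum CompressionAlgo : uint8_t {
	COMP_ZLIB = 0, // current default, always available
	COMP_ZSTD = 1,
	COMP_LZ4  = 2,
};

// Placeholder decompressors, one per supported algorithm.
std::string decompress_zlib(const std::string &data);
std::string decompress_zstd(const std::string &data);
std::string decompress_lz4(const std::string &data);

std::string decompress_block_payload(const std::string &payload)
{
	if (payload.empty())
		throw std::runtime_error("empty block payload");
	const uint8_t algo = static_cast<uint8_t>(payload[0]);
	switch (algo) {
	case COMP_ZLIB: return decompress_zlib(payload.substr(1));
	case COMP_ZSTD: return decompress_zstd(payload.substr(1));
	case COMP_LZ4:  return decompress_lz4(payload.substr(1));
	default:        throw std::runtime_error("unknown compression algorithm");
	}
}
```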

At the same time, instead of only compressing the node data, all of the block's data could be compressed. IIRC, one important reason LevelDB databases are often smaller is that LevelDB uses snappy to compress all data it stores.

@SmallJoker
Member

The compression ratio is not much better than zlib's, but for the same ratio it can process much more data, which is very nice. I am not sure about compatibility with our supported OSes, but the project looks well maintained.

@sfan5 added the "Non-trivial" label (a large amount of work is required to address this, sometimes to the point of being infeasible) on Sep 2, 2016
@sofar
Contributor

sofar commented Sep 2, 2016

I'm suspicious of zstandard's licensing at first glance: their example code isn't even licensed properly, so you can't even use their example code freely.

I would wait and see in a year or so if this is actually taking off anywhere....

@sfan5
Member

sfan5 commented Sep 19, 2016

If we're going the new shiny stuff™ route, we should also consider https://github.com/google/brotli.

@kahrl
Contributor

kahrl commented Oct 4, 2016

For the Germans amongst us, here's a relevant blog post: https://blog.fefe.de/?ts=a90c6783

@sfan5
Member

sfan5 commented Oct 4, 2016

Here's a testing branch with brotli support: https://github.com/sfan5/minetest/tree/brotli
All you need is libbrotli (AUR) and it will automatically compress new worlds with brotli.

Do not use this branch with your existing worlds, they will become unreadable to a "normal" Minetest.

@sfan5 removed the "Non-trivial" label on Oct 4, 2016
@est31
Contributor Author

est31 commented Oct 4, 2016

Nice work, @sfan5 !

I would prefer zstd over brotli, as it seems to be better, unless it's proven to perform worse for our use case.

One problem for both brotli and zstd, however, will be that we'd have to discontinue support for older distros (like Ubuntu 12.04, the one @PilzAdam uses) which don't have the library. Both brotli and zstd first appear in Ubuntu 16.04.

It's the same issue as with C++11, all over again :(

Although one could argue that it's much, much easier to install or download + compile a new library than to install a totally new C++ compiler.

@sfan5
Member

sfan5 commented Oct 4, 2016

@est31

One problem for both brotli and zstd will be however that we'd have to discontinue support for older distros

Actually _lib_brotli isn't in any package repositories right now AFAIK; brotli doesn't have an official API (yet). zstd does, so we wouldn't have that problem.

However, if we intend to support ubuntu 12.04 out of the box then we need to bundle brotli/zstd anyway and this won't matter.

@nerzhul
Member

nerzhul commented Oct 4, 2016

I prefer lz4 over zlib: it's very fast, and I already have a working version in my own branch :). lz4 is nearly free in CPU terms.
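For reference, lz4's one-shot C API really is tiny; a minimal sketch of round-tripping a serialized block (this is an illustration, not the code from nerzhul's branch):

```cpp
#include <lz4.h>
#include <stdexcept>
#include <string>

// Compress a serialized mapblock with the LZ4 one-shot API.
std::string lz4_compress(const std::string &src)
{
	const int bound = LZ4_compressBound((int)src.size());
	std::string dst(bound, '\0');
	int n = LZ4_compress_default(src.data(), &dst[0], (int)src.size(), bound);
	if (n <= 0)
		throw std::runtime_error("LZ4 compression failed");
	dst.resize(n);
	return dst;
}

// LZ4 does not store the uncompressed size, so it would have to be
// serialized alongside the compressed data.
std::string lz4_decompress(const std::string &src, int original_size)
{
	std::string dst(original_size, '\0');
	int n = LZ4_decompress_safe(src.data(), &dst[0], (int)src.size(), original_size);
	if (n < 0)
		throw std::runtime_error("LZ4 decompression failed");
	dst.resize(n);
	return dst;
}
```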

@est31
Contributor Author

est31 commented Oct 4, 2016

Bundling might be an idea, yeah. I hate the concept but if we want brotli/zstd any time soon we might be forced to.

@Megaf
Contributor

Megaf commented Oct 4, 2016

We already bundle jsoncpp, lua, gmp and maybe other things too.

@nerzhul
Member

nerzhul commented Oct 4, 2016

lz4 bundling is easy, 3 files.

@sfan5
Member

sfan5 commented Oct 4, 2016

@est31 @nerzhul @Megaf

Edit: see here

Here's the steps I took to compare:

1) Download and extract MG_World.tar.bz2 from https://daconcepts.com/vanessa/hobbies/minetest.html
2) ./minetestserver --world ./MG_World --migrate sqlite3

3) sqlite3 ./MG_World/map.sqlite 'VACUUM;'
4) du -h --si ./MG_World/map.sqlite # note down size (zlib)
5) ./minetestserver --world ./MG_World --decompresstimes # note down bench (zlib)

6) ./minetestserver --world ./MG_World --recompress # note down stats
7) sqlite3 ./MG_World/map.sqlite 'VACUUM;'
8) du -h --si ./MG_World/map.sqlite # note down size (brotli)
9) ./minetestserver --world ./MG_World --decompresstimes # note down bench (brotli)

@nerzhul
Member

nerzhul commented Oct 4, 2016

World size is not very important, but performance is.

@sfan5
Member

sfan5 commented Oct 4, 2016

@nerzhul
I chose compression quality 6 (same as zlib's default, but that's not important) because it made the compression unittest take exactly the same duration.

Which means that performance should be about the same while the world is smaller.
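For context, this is roughly how a quality level gets passed to brotli's one-shot encoder in the official C API (brotli/encode.h) that exists today; the testing branch above goes through libbrotli and predates the official API, so its call may look different:

```cpp
#include <brotli/encode.h>
#include <cstdint>
#include <string>

// Compress a block at quality 6 (roughly comparable to zlib's default level).
std::string brotli_compress_q6(const std::string &src)
{
	size_t out_size = BrotliEncoderMaxCompressedSize(src.size());
	std::string out(out_size, '\0');
	BROTLI_BOOL ok = BrotliEncoderCompress(
			/* quality */ 6, BROTLI_DEFAULT_WINDOW, BROTLI_MODE_GENERIC,
			src.size(), reinterpret_cast<const uint8_t *>(src.data()),
			&out_size, reinterpret_cast<uint8_t *>(&out[0]));
	out.resize(ok ? out_size : 0);
	return out;
}
```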

@Ferk
Contributor

Ferk commented Oct 4, 2016

@sfan5 is it possible for you to check the duration for the decompression too? I suspect there's a lot more decompression than compression going on.

The numbers in lz4 benchmarks look very attractive, though.

@Megaf
Contributor

Megaf commented Oct 4, 2016

We have to compare those against lz4 now.
And for me the most important thing should be lower CPU usage/time. Size is secondary.
I will be very happy if the world doesn't get any larger.

@nerzhul
Member

nerzhul commented Oct 4, 2016

@sfan5 noted. I think you should benchmark various mapblock serialization times

@sfan5
Member

sfan5 commented Oct 4, 2016

@Ferk @nerzhul

Edit: see here

@est31
Contributor Author

est31 commented Oct 7, 2016

I've run the test by @sfan5 for my zstd branch (not the most recent commit, but the commit called "unittest").

I got in fact an increase in space required (944148 K with the zlib compression, and 1062140 K with the zstd compression, an increase of 12%). Note that I didn't use a dictionary, while brotli has a hardcoded dictionary, and for the minetest use case a dictionary would be really useful I think, as it perfectly matches the dictionary use case (few KB of data). So there is certainly room for improvement.

The decompression time however is really really great.

For zstd:

############
Total decompression CPU time:     19.0198s
Decompression CPU time per block: 7.27373us
Decompression speed:              30393 KB/s
############

For zlib:

############
Total decompression CPU time:     72.0594s
Decompression CPU time per block: 27.5577us
Decompression speed:              7037.76 KB/s
############

A bit odd was that when I rewrote the code to use the streaming API, it suddenly got horribly slow (20x decrease). The benchmarks are from using the non streaming API.
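For reference, the non-streaming path that produced these numbers boils down to a single one-shot call per block; a minimal sketch with the current zstd API (not the actual branch code):

```cpp
#include <zstd.h>
#include <stdexcept>
#include <string>

// One-shot decompression: the whole compressed block is already in memory.
std::string zstd_decompress(const std::string &src)
{
	unsigned long long raw_size =
			ZSTD_getFrameContentSize(src.data(), src.size());
	if (raw_size == ZSTD_CONTENTSIZE_UNKNOWN || raw_size == ZSTD_CONTENTSIZE_ERROR)
		throw std::runtime_error("cannot determine decompressed size");
	std::string dst(raw_size, '\0');
	size_t n = ZSTD_decompress(&dst[0], dst.size(), src.data(), src.size());
	if (ZSTD_isError(n))
		throw std::runtime_error(ZSTD_getErrorName(n));
	dst.resize(n);
	return dst;
}
```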

@sfan5
Member

sfan5 commented Oct 7, 2016

I got in fact an increase in space required. Note that I didn't use a dictionary [...], and for the minetest use case a dictionary would be really useful I think

Brotli supports dictionaries too, I agree that we should try that approach. We might need/want different dicts for block data vs. metadata too.

The problem however is that every application that wants to read Minetest's data needs the same dictionary, and we can't ever change the dict. Unless we add dictionary versioning, which would also require that applications keep up with dict updates done in Minetest.
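One hedged note on the versioning concern: a zstd frame records the ID of the dictionary it was compressed with (by default), so a reader could in principle ship several dictionaries and select the right one per block. A sketch; the dictionary registry and function name are hypothetical:

```cpp
#include <zstd.h>
#include <map>
#include <string>

// Hypothetical registry of trained dictionaries, keyed by their zstd dict ID.
extern std::map<unsigned, ZSTD_DDict *> g_dicts_by_id;

// Returns the dictionary a block was compressed with, or nullptr for
// dictionary-less (plain) zstd frames and unknown IDs.
const ZSTD_DDict *select_dictionary(const std::string &compressed_block)
{
	unsigned id = ZSTD_getDictID_fromFrame(compressed_block.data(),
			compressed_block.size());
	if (id == 0)
		return nullptr; // no dictionary recorded in the frame
	auto it = g_dicts_by_id.find(id);
	return it != g_dicts_by_id.end() ? it->second : nullptr;
}
```

External tools like minetestmapper would still need to carry the same dictionaries, so this only eases, not removes, the compatibility burden.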

A bit odd was that when I rewrote the code to use the streaming API, it suddenly got horribly slow (20x decrease). The benchmarks are from using the non streaming API.

The numbers from using the non-streaming API don't matter, to compare we need the benchmark while using the final design (streaming API).

@sfan5
Member

sfan5 commented Oct 7, 2016

So I did all of the benchmarks again with some improvements I made to the statistics it outputs:

map.sqlite zlib: 820.3 MiB
map.sqlite brotli: 719.4 MiB (12.3% smaller)

--recompress:

2016-10-07 13:51:16: ACTION[Main]: ############
2016-10-07 13:51:16: ACTION[Main]: Recompressed blocks:            2614854
2016-10-07 13:51:16: ACTION[Main]: Total data (uncomp.):           4008 MB
2016-10-07 13:51:16: ACTION[Main]: Total compression CPU time:     388.206s
2016-10-07 13:51:16: ACTION[Main]: Compression CPU time per block: 148.462us
2016-10-07 13:51:16: ACTION[Main]: Compression speed:              10573.3 KB/s
2016-10-07 13:51:16: ACTION[Main]: ############

Decompression benchmark:

2016-10-07 14:05:19: ACTION[Main]: ### zlib ###
2016-10-07 14:05:19: ACTION[Main]: Total blocks:                     2614854
2016-10-07 14:05:19: ACTION[Main]: Total data (comp.):               495 MB
2016-10-07 14:05:19: ACTION[Main]: Total decompression CPU time:     83.7842s
2016-10-07 14:05:19: ACTION[Main]: Decompression CPU time per block: 32.0416us
2016-10-07 14:05:19: ACTION[Main]: Decompression speed:              6052.89 KB/s
2016-10-07 14:05:19: ACTION[Main]: ############
2016-10-07 14:01:56: ACTION[Main]: ## brotli ##
2016-10-07 14:01:56: ACTION[Main]: Total blocks:                     2614854
2016-10-07 14:01:56: ACTION[Main]: Total data (comp.):               406 MB
2016-10-07 14:01:56: ACTION[Main]: Total decompression CPU time:     79.2955s
2016-10-07 14:01:56: ACTION[Main]: Decompression CPU time per block: 30.325us
2016-10-07 14:01:56: ACTION[Main]: Decompression speed:              5246.44 KB/s
2016-10-07 14:01:56: ACTION[Main]: ############

@Fixer-007
Contributor

I wonder how faster unpacking will affect stuttering ingame.

@est31
Contributor Author

est31 commented Oct 9, 2016

The numbers from using the non-streaming API don't matter, to compare we need the benchmark while using the final design (streaming API).

Yes, only the final design should matter, which is the streaming API one way or another. However, the streaming API shouldn't be that much slower than the non-streaming one, meaning that there is either a bug in zstd or in the code that uses it. So the numbers only matter once (and if) that bug has been removed.

@sfan5
Member

sfan5 commented Oct 9, 2016

@est31

The slowness with using the streaming API might be caused by the need to call unget() on the input stream a lot of times to put unused data back. That wouldn't be easily fixed though.

So the only numbers matter when/if that bug has been removed.

No, currently the only way to compare zlib and whatever else in a fair way is not to fix the bug.

@Rogier-5
Contributor

@est31: I did some more investigation using perf and strace, and it turns out the malloc library is actually the culprit! It will repeatedly munmap() a free()d 5MB area of memory, only to mmap() it again on the subsequent malloc() in the zstd code. The original code even mmap()ed and munmap()ed 2 areas of 5MB and one of 2MB for every compression operation... Every time again, at least one of the 5MB areas is zeroed with memset(). How's that for a way to avoid having to turn on the heater when it's cold :-)

One would have thought malloc() was smarter than that (and apply some hysteresis). It may even be considered a bug.

Anyway, this can be fixed (workaround-style: the real problem is obviously with malloc and/or zstd) by using mallopt(). See my latest commit, and the graphs below. The total compression time reduction is now closer to 90% for levels 3 and up.
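A minimal sketch of that kind of mallopt() workaround (glibc-specific; the thresholds here are illustrative, not necessarily the values used in the commit mentioned above):

```cpp
#include <malloc.h> // mallopt(), glibc-specific

void tune_malloc_for_compression()
{
	// Keep zstd's multi-MB work buffers on the heap instead of letting glibc
	// mmap() them, so they are reused rather than unmapped on every free().
	mallopt(M_MMAP_THRESHOLD, 8 * 1024 * 1024);
	// Don't return freed heap memory to the kernel too eagerly either.
	mallopt(M_TRIM_THRESHOLD, 8 * 1024 * 1024);
}
```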

Comparison graphs for zstd (original code, initial patch, and final code):

[graph: plot-recompress-comp_blocktime]

New compression graphs (including a few more zstd levels):

[graph: plot-recompress-comp_blocktime]
[graph: plot-decompress-block_size]
[graph: plot-decompress-comp_blocktime]

It can be seen that in spite of the significant additional compression effort at higher levels, much further reduction is not achieved. The reason is most probably that even at lower levels, much of the data is already compressed from 16 KB down to anywhere from 20 to at most a few hundred bytes. There simply isn't much more redundancy to squeeze out.

@Megaf
Contributor

Megaf commented May 3, 2017

Why was this just closed?

@sfan5 reopened this on May 3, 2017
@ShadowNinja
Member

Note that you should also take a look at #3946 if this is implemented. You might be able to get better performance by compressing the whole mapblock instead of compressing several sections separately, and there are a number of other improvements to be made if you're going to break the serialization format.

@Fixer-007
Contributor

Fixer-007 commented Sep 18, 2017

Zstd support (comp and decomp) included in newest linux 4.14 for Btrfs https://lkml.org/lkml/2017/6/29/816 (and for squashfs according to phoronix).

I'm suspicious of zstandard's licensing at first glance

License was changed 29 days ago. Zstd readme now says "Zstandard is dual-licensed under BSD and GPLv2."
Wikipedia had this:

The reference implementation is licensed under the BSD License, published at GitHub. Since version 1.0, it had additional Grant of Patent Rights.[4]
From version 1.3.1[5] this patent grant was dropped and the license was changed to BSD + GPLv2 dual license[6]
facebook/zstd#801
https://github.com/facebook/zstd/releases/tag/v1.3.1

I think if you want to introduce a new compression algo, do it now for 0.5.0. IMO I would prefer speed over disk space; the speed advantage that lz4, zstd, etc. give is huge compared to a slightly bigger database size.

@nerzhul
Member

nerzhul commented Sep 18, 2017

zstd will be supported in ZFS, and LZ4 was chosen there too.

@Megaf
Contributor

Megaf commented Sep 19, 2017

Linux Kernel people say LZO is the fastest.

@ghost

ghost commented Nov 17, 2017

So, what do we select? I think it's good to start voting.

@sofar
Contributor

sofar commented Nov 18, 2017

On my server, the OS provides AVX512 accelerated zlib libraries, so I strongly prefer to stay on zlib.

There are also negative implications: Alternative compression algorithms are NOT backwards compatible with old releases and should be non-default to begin with. Maps will break and it may be impossible to make downloadable player maps that are widely supported by older clients.

All the talk about "oh the kernel has FOO support" is nonsense because that support IS NOT EXPORTED TO USER SPACE. It won't do minetest any good that your kernel has zstd. Minetest can't use it. You will need to rely on a completely different implementation for user space, and that will probably lack things like decent parallelism and AVX512 acceleration, and therefore lag in performance compared to ... zlib.

@sfan5
Member

sfan5 commented Nov 18, 2017

AVX512 accelerated

🤔 🤔 🤔 https://blog.cloudflare.com/on-the-dangers-of-intels-frequency-scaling/

Alternative compression algorithms are NOT backwards compatible with old releases

Old maps are going to be readable by new Minetest versions as usual.
We can't drop zlib anyway as it's also used for other things.

it may be impossible to make downloadable player maps that are widely supported by older clients

It's certainly possible to write a tool that deserializes maps in and serializes them again in a format understood e.g. by 0.4.16, this could also be integrated into Minetest itself (similar to --migrate).

like decent parallelism and [...]

Since we're mostly compressing 12k bytes at once I doubt threading has much effect (if it is used at all).
Benchmarks comparing how well Intel's zlib does compared to brotli/zstd would be a good idea though.

@nerzhul
Member

nerzhul commented Nov 18, 2017 via email

@sofar
Contributor

sofar commented Nov 24, 2017

Many Intel optimizations also benefit AMD. Check out the various performance tests that phoronix has run on AMD's threadripper platform where they compare ClearLinux against non-specifically Intel optimized distros - you can decide for yourself based on the data whether "pure intel" (a complete misnomer) is such a bad thing for the competition. Article link because you're lazy: https://www.phoronix.com/scan.php?page=article&item=tr1950x-7900x-3os&num=2

@ghost

ghost commented Nov 24, 2017

I'd stick with zlib. It is stable and many processors have optimizations for it.

@clavinet

clavinet commented Mar 24, 2019

I'd stick with zlib. It is stable and many processors have optimizations for it.

What optimizations exactly? And zstd is faster than zlib in any case, as seen from countless benchmarks.

@nerzhul
Member

nerzhul commented Mar 25, 2019

No processor has optimizations for such an algorithm; rather, zlib and zstd have optimizations for particular processors.

@paramat added the "Core dev consideration needed" and "Concept approved" (approved by a core dev: PRs welcomed!) labels on Mar 25, 2019
@sofar
Contributor

sofar commented Mar 25, 2019 via email

@clavinet

It also has multithreading support built in, which speeds up compression even more at absolutely no loss of compression ratio, from my tests at least.

@sofar
Contributor

sofar commented Mar 25, 2019 via email

@sfan5 added the "Discussion" label (issues meant for discussion of one or more proposals) on May 13, 2020
@Baigle

Baigle commented Mar 18, 2021

I took the liberty of documenting my #engine post from the Discord server so it isn't lost in an endless stream of comments.

Zstandard can perform significantly better than almost any other option with training. This YouTube video by Stanford Research Talks highlights a number of gains from using an immutable, locked and always memory-loaded custom dictionary for referencing small and large data. They claim that the compression ratio can be improved by 3.5x to 10x, compression speed by 3.6x over baseline, and decompression speed by 5.4x over baseline when using these features.

Alternative compression/decompression algorithms can be discovered by keeping up to date with encode.su, the Hutter Prize groups and Matthew Mahoney's ranking page, and the GDCC (ru). Many algorithms based on LSTMs, Transformer/few-shot learning, and attention matrices may work better in the future once they are more stable and developed.

See also the similar closed issue #3946, where JPEG XL is suggested for 3D mapblock data compression. JPEG XL is a standard that incorporated some of Alexander Rhatushnyak's algorithms; he still holds the best overall ratio-to-performance ranking of any contestant in the Hutter Prize competition. Generally, whether it is cryptanalysis, compression, or social prediction models, adding dimensions and ways for the solver to look at the problem can significantly improve performance. On heavy servers, such a system may begin to learn and differentiate between player-built architectural designs and how players interact and cause modifications. It may enhance cheat detection and prevention, and trivialize some lag sources which have previously appeared chaotic and random.

The custom dictionary may function better if it is per-biome, or we can fiddle around with it to see what works best. Can anyone explain to me how I would start figuring that out? I'm still a noob when it comes to coding.

@sfan5 removed the "Discussion" label on Sep 24, 2021
@sfan5
Member

sfan5 commented Sep 24, 2021

Done in #10788

@sfan5 closed this as completed on Sep 24, 2021
@Megaf
Contributor

Megaf commented Sep 25, 2021

Glad it was done! Thanks!
