-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add zstd minetest.compress support #12515
Conversation
Although this is what the old code did, I don't think it's desirable. Your tests are failing, btw. Of note: the |
That was most certainly an accurate note. Turns out the problem was Lua 5.1 doesn't support string hex escapes. |
local compressed = core.compress(text, method) | ||
assert(core.decompress(compressed, method) == text, "input/output mismatch for compression method " .. method) | ||
if method == "zstd" then | ||
assert(compressed:sub(1, 4) == "\040\181\047\253", "zstd magic number not in zstd output") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you wanted to use hex here, you could use string.char(0x..., 0x..., 0x..., 0x...)
. You could shorten the assertions to assert((method == "zstd") == (compressed:sub(1, 4) == "\040\181\047\253"))
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've partially introduced these changes in 1032f7b but I'm keeping differing error messages depending on the exact failure (i.e. specific method) unless explicitly told otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good (note that my approval is worthless since I'm not a core dev, so take my reviews with a grain of salt ;))
888358e should handle this batch of review comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relatively clean implementation, but is missing important information about zstd compression compared to zlib/deflate. If you do not want to get too technical, try “if your input is measured in kilobytes, use deflate and if it is measured in megabytes, uze zstd”. There exist other reasons for worse-than-zlib-with-default-settings scenarios, but those are mostly not relevant for Minetest.
It might help to illustrate these pitfalls if you included an example in which zstd decisively beats zlib on size of the compressed string … so far there is only an example where zlib produces the smaller output, if I am not mistaken.
Examples for when exactly zstd can be worse than deflate can be found on the zstd bug tracker:
facebook/zstd#256 (data aligned to 24 bits compresses confuses zstd)
facebook/zstd#1134 (binary blobs up to 4kb compress better using zlib)
facebook/zstd#1325 (large little-endian integers compress better with zlib)
facebook/zstd#2832 (example payloads that zlib compresses better: 138kb zlib vs 234kb zstd, 5.9mb zlib vs 6.4mb zstd, …)
I also think it is important to warn about the low compression speed, high memory requirements and sometimes even worse compression of higher zstd compression levels, so that APi users do not always crank it up to maximum. The difference between something taking 0,01s and 1s could be noticeable in gameplay terms.
Personally, I do not see how zstd API without dictionary support is particularly useful – but I guess if mod devs need faster decompression, want to compress very big files or are willing to wait a long time for 10% better compressed files,, this provides an option to them. I am curious though, so far I have not encountered any mods that had performance issues with decompression.
Are you opposed to adding the dictionary support on principle or did you just not bother?
are: | ||
* Deflate: `level` - Compression level, `0`-`9` or `nil`. | ||
* Zstandard: `level` - Compression level. Integer or `nil`. | ||
Note any supported Zstandard compression level could be used here, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation should probably mention that Zstd beats other algorithms not on how well data is compressed, but on the speed of decompression with a wide range of compression options. If this is not noted, mod and game developers may assume that using zstd compression is always better. With default compression, zstd yields ssize-savings similar to zlib/deflate (it is worse up until some boundary, then gets better).
For inputs up to at least 4 kilobytes and even (in my tests) up to several 100 kilobytes at default compression level, zstd is likely to compress textual and biinary data about 5% worse than zlib … unless a pre-trained dictionary is used, which this API does not seem to provide. This is a WONTFIX from the zstd devs, as they are not interested in small files, but mods developers often do want to compress small payloads (e.g. bitmaps, item meta, json)..
Modders should also be warned that while increasing the Zstd compression level can make the compressed output smaller, but can make the zstd encoder take over 20 times or – at the highest compression levels – over 100 times as long as the default level makes it take. You can try this out yourself: Compressing a 94 kilobyte HTML page took <0.01s at default compression level on my machine, but took >1s at one of the highest zstd compression levels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...Since when was lua_api.txt
meant to contain a detailed in-depth comparison between compression formats?
* `...` indicates method-specific arguments. Currently defined arguments | ||
are: | ||
* Deflate: `level` - Compression level, `0`-`9` or `nil`. | ||
* Zstandard: `level` - Compression level. Integer or `nil`. | ||
Note any supported Zstandard compression level could be used here, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation should probably mention that Zstd beats other algorithms not on how well data is compressed, but on the speed of decompression with a wide range of compression options. If this is not noted, mod and game developers may assume that using zstd compression is always better. With default compression, zstd yields ssize-savings similar to zlib/deflate (it is worse up until some boundary, then gets better).
For inputs up to at least 4 kilobytes and even (in my tests) up to several 100 kilobytes at default compression level, zstd is likely to compress textual and biinary data about 5% worse than zlib … unless a pre-trained dictionary is used, which this API does not seem to provide. This is a WONTFIX from the zstd devs, as they are not interested in small files, but mods developers often do want to compress small payloads (e.g. bitmaps, item meta, json)..
Modders should also be warned that while increasing the Zstd compression level can make the compressed output smaller, but can make the zstd encoder take over 20 times or – at the highest compression levels – over 100 times as long as the default level makes it take. You can try this out yourself: Compressing a 94 kilobyte HTML page took <0.01s at default compression level on my machine, but took >1s at one of the highest zstd compression levels.
doc/client_lua_api.txt
Outdated
* `...` indicates method-specific arguments. Currently defined arguments | ||
are: | ||
* Deflate: `level` - Compression level, `0`-`9` or `nil`. | ||
* Zstandard: `level` - Compression level. Integer or `nil`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does compression level nil
mean?
doc/lua_api.txt
Outdated
* `...` indicates method-specific arguments. Currently defined arguments | ||
are: | ||
* Deflate: `level` - Compression level, `0`-`9` or `nil`. | ||
* Zstandard: `level` - Compression level. Integer or `nil`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does compression level nil
mean?
Ok, so this has officially (by my personal definition of 'officially') become messy. I didn't implement dictionary support because the wrapping for it's use by the engine isn't in place yet as I understand it and minetest.compress's design seems to be more about "just dropping in a compression algorithm" than specific details (a DEFLATE compressor could theoretically be fed a dictionary, but this does not occur and is not supported) The text on the compression level has been through at least three iterations:
EDIT: I'm attaching further iterations here:
|
It looks to me that this seems to be the case indeed; you just seem to be the first person to attempt adding one. This would also be the case if zstd outperformed deflate on every metric all the time, which it obviously does not. I must admit that I assume that developers just want better compression at about the same speed. I assert that the inherent complexity of these trade-offs is what makes the whole documentation issue so messy. Unfortunately, I can only give examples where zstd is not useful, because my inputs are always below 64 kilobytes:
Maybe some other mod developers can speak up about for which purpose they intend to use zstd instead of deflate? |
Some days ago @rubenwardy did a search for usage of |
I use it for my schematics: https://github.com/appgurueu/modlib/blob/318e737a086c6e1bce8ba6783979e03afa9b3f96/minetest/schematic.lua#L170-L185. Using Zstd instead might be advantageous for the same reasons it was used to compress mapblocks for storage in the database (although TBF schematics aren't read or written as often as mapblocks). |
I consider this PR complete and ready for merge from my side and I doubt any other coredev is going to agree that lua_api.txt should contain detailed compression algorithm advice. |
@appgurueu are you just assuming or have you actually benchmarked the schematics case? Or is benchmarking not necessary because this is simply “I am willing to spend much more time on compression to get a 10% smaller file”, as schematics are much more often decompressed than compressed? @sfan5 I agree that detailed advice would be overkill. Nevertheless, people are going to wonder whether they should use the new option. I do not think that “to get the smallest results with minetest.compress(), use zlib to compress smaller inputs and use zstd to compress larger, multi-megabyte-inputs” is too detailed. Do you seriously think that such a hint constitutes too much detail? |
I haven't deeply considered Zstd use cases, but I think it should be exposed since the engine uses it internally already. |
That sounds like a really weak argument, given that: a) The already existing engine-internal zstd use is backed up by demonstrated space savings (or so I assume). @luatic so what advantages exactly did you have in mind, when you mentioned “modders compressing their data lose out on the advantages Zstd brings over Zlib (more efficient compression both in terms of size & speed).” in #12512? I seriously do not get it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea, and would like to see it in Minetest.
Since 5.6.0 has been released, I'm assuming this is on-track for 5.6.1 or 5.7.0? |
5.6.1 (if it happens at all) would be a patch release, 5.7.0 will be the next release adding features (such as this one) (assuming no breaking release is made anytime soon) |
Ok, fixed the problem as of eff6bc6 |
@20kdc rebase needed |
Thank you for the reminder, will prepare it now |
eff6bc6
to
6847428
Compare
A rebase was requested and a rebase has been performed |
@20kdc Please rebase again. |
TFM: The First Month Contains the following commits: + Add zstd compression support + copy over updates from server to client lua API + Just to prevent accidentally mixing up deflate and zstd, verify non-zstd compression does not produce a zstd magic number + Remove hex escapes and replace them with decimal escapes. Also show warnings on bad compression methods. + Apply requested changes (documentation fix, unit test cleanup, enum in l_util.cpp) + Remove warning for outdated server in compress unit test + Better-document zstd level, switch to string_to_enum for compression method detection + code style and such + Change documentation to not describe known-supported zstd compression levels + Documentation update for zstd stuff with tentative "5.6.1" name + Zstd compression feature is not 5.6.1, but 5.7.0
6847428
to
4fa0b2a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
Resolves #12512 - this provides the rationale and so forth.
Note that for possible compatibility reasons an unrecognized compression method is assumed to be
deflate
. I'm not sure if this element is necessary but it seemed a good safety measure.This PR is Ready for Review.
How to test
A unit test has been added to the
devtest
subgame.The unit test doesn't check the magic number of zlib as it could be inconsistent, but for ZSTD the magic number is checked to ensure the right method is used, and for zlib it's checked that zstd's magic number is not present, and both methods are checked to determine "what goes in comes out".