Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add zstd minetest.compress support #12515

Merged
merged 1 commit into from
Sep 28, 2022

Conversation

20kdc
Copy link
Contributor

@20kdc 20kdc commented Jul 5, 2022

Resolves #12512 - this provides the rationale and so forth.

Note that for possible compatibility reasons an unrecognized compression method is assumed to be deflate. I'm not sure if this element is necessary but it seemed a good safety measure.

This PR is Ready for Review.

How to test

A unit test has been added to the devtest subgame.

The unit test doesn't check the magic number of zlib as it could be inconsistent, but for ZSTD the magic number is checked to ensure the right method is used, and for zlib it's checked that zstd's magic number is not present, and both methods are checked to determine "what goes in comes out".

@20kdc 20kdc changed the title Add zstd compression support Add zstd minetest.compress support Jul 5, 2022
@sfan5 sfan5 added Feature ✨ PRs that add or enhance a feature @ Script API Concept approved Approved by a core dev: PRs welcomed! labels Jul 5, 2022
@sfan5
Copy link
Member

sfan5 commented Jul 5, 2022

Note that for possible compatibility reasons an unrecognized compression method is assumed to be deflate.

Although this is what the old code did, I don't think it's desirable.
At the very least it should throw a warning and still go through, but IMO it's better to have it error entirely. Mods written to match the documentation will have no issues with this.

Your tests are failing, btw. Of note: the clang_3_9 run does not use LuaJIT.

@20kdc
Copy link
Contributor Author

20kdc commented Jul 5, 2022

That was most certainly an accurate note. Turns out the problem was Lua 5.1 doesn't support string hex escapes.
I've also implemented a warning - I'm conflicted as to if it should error entirely, it seems compatibility-breaking to do that.
Also, while it's not documented as being such, I've kept the method parameter optional for similar compatibility reasons.

doc/lua_api.txt Outdated Show resolved Hide resolved
local compressed = core.compress(text, method)
assert(core.decompress(compressed, method) == text, "input/output mismatch for compression method " .. method)
if method == "zstd" then
assert(compressed:sub(1, 4) == "\040\181\047\253", "zstd magic number not in zstd output")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you wanted to use hex here, you could use string.char(0x..., 0x..., 0x..., 0x...). You could shorten the assertions to assert((method == "zstd") == (compressed:sub(1, 4) == "\040\181\047\253")).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've partially introduced these changes in 1032f7b but I'm keeping differing error messages depending on the exact failure (i.e. specific method) unless explicitly told otherwise.

Copy link
Contributor

@appgurueu appgurueu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good (note that my approval is worthless since I'm not a core dev, so take my reviews with a grain of salt ;))

doc/lua_api.txt Outdated Show resolved Hide resolved
doc/lua_api.txt Outdated Show resolved Hide resolved
src/script/lua_api/l_util.cpp Outdated Show resolved Hide resolved
src/script/lua_api/l_util.cpp Show resolved Hide resolved
@20kdc
Copy link
Contributor Author

20kdc commented Jul 9, 2022

888358e should handle this batch of review comments.

doc/lua_api.txt Outdated Show resolved Hide resolved
Copy link
Contributor

@erlehmann erlehmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relatively clean implementation, but is missing important information about zstd compression compared to zlib/deflate. If you do not want to get too technical, try “if your input is measured in kilobytes, use deflate and if it is measured in megabytes, uze zstd”. There exist other reasons for worse-than-zlib-with-default-settings scenarios, but those are mostly not relevant for Minetest.

It might help to illustrate these pitfalls if you included an example in which zstd decisively beats zlib on size of the compressed string … so far there is only an example where zlib produces the smaller output, if I am not mistaken.

Examples for when exactly zstd can be worse than deflate can be found on the zstd bug tracker:
facebook/zstd#256 (data aligned to 24 bits compresses confuses zstd)
facebook/zstd#1134 (binary blobs up to 4kb compress better using zlib)
facebook/zstd#1325 (large little-endian integers compress better with zlib)
facebook/zstd#2832 (example payloads that zlib compresses better: 138kb zlib vs 234kb zstd, 5.9mb zlib vs 6.4mb zstd, …)

I also think it is important to warn about the low compression speed, high memory requirements and sometimes even worse compression of higher zstd compression levels, so that APi users do not always crank it up to maximum. The difference between something taking 0,01s and 1s could be noticeable in gameplay terms.

Personally, I do not see how zstd API without dictionary support is particularly useful – but I guess if mod devs need faster decompression, want to compress very big files or are willing to wait a long time for 10% better compressed files,, this provides an option to them. I am curious though, so far I have not encountered any mods that had performance issues with decompression.

Are you opposed to adding the dictionary support on principle or did you just not bother?

are:
* Deflate: `level` - Compression level, `0`-`9` or `nil`.
* Zstandard: `level` - Compression level. Integer or `nil`.
Note any supported Zstandard compression level could be used here,
Copy link
Contributor

@erlehmann erlehmann Jul 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation should probably mention that Zstd beats other algorithms not on how well data is compressed, but on the speed of decompression with a wide range of compression options. If this is not noted, mod and game developers may assume that using zstd compression is always better. With default compression, zstd yields ssize-savings similar to zlib/deflate (it is worse up until some boundary, then gets better).

For inputs up to at least 4 kilobytes and even (in my tests) up to several 100 kilobytes at default compression level, zstd is likely to compress textual and biinary data about 5% worse than zlib … unless a pre-trained dictionary is used, which this API does not seem to provide. This is a WONTFIX from the zstd devs, as they are not interested in small files, but mods developers often do want to compress small payloads (e.g. bitmaps, item meta, json)..

Modders should also be warned that while increasing the Zstd compression level can make the compressed output smaller, but can make the zstd encoder take over 20 times or – at the highest compression levels – over 100 times as long as the default level makes it take. You can try this out yourself: Compressing a 94 kilobyte HTML page took <0.01s at default compression level on my machine, but took >1s at one of the highest zstd compression levels.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...Since when was lua_api.txt meant to contain a detailed in-depth comparison between compression formats?

* `...` indicates method-specific arguments. Currently defined arguments
are:
* Deflate: `level` - Compression level, `0`-`9` or `nil`.
* Zstandard: `level` - Compression level. Integer or `nil`.
Note any supported Zstandard compression level could be used here,
Copy link
Contributor

@erlehmann erlehmann Jul 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation should probably mention that Zstd beats other algorithms not on how well data is compressed, but on the speed of decompression with a wide range of compression options. If this is not noted, mod and game developers may assume that using zstd compression is always better. With default compression, zstd yields ssize-savings similar to zlib/deflate (it is worse up until some boundary, then gets better).

For inputs up to at least 4 kilobytes and even (in my tests) up to several 100 kilobytes at default compression level, zstd is likely to compress textual and biinary data about 5% worse than zlib … unless a pre-trained dictionary is used, which this API does not seem to provide. This is a WONTFIX from the zstd devs, as they are not interested in small files, but mods developers often do want to compress small payloads (e.g. bitmaps, item meta, json)..

Modders should also be warned that while increasing the Zstd compression level can make the compressed output smaller, but can make the zstd encoder take over 20 times or – at the highest compression levels – over 100 times as long as the default level makes it take. You can try this out yourself: Compressing a 94 kilobyte HTML page took <0.01s at default compression level on my machine, but took >1s at one of the highest zstd compression levels.

* `...` indicates method-specific arguments. Currently defined arguments
are:
* Deflate: `level` - Compression level, `0`-`9` or `nil`.
* Zstandard: `level` - Compression level. Integer or `nil`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does compression level nil mean?

doc/lua_api.txt Outdated
* `...` indicates method-specific arguments. Currently defined arguments
are:
* Deflate: `level` - Compression level, `0`-`9` or `nil`.
* Zstandard: `level` - Compression level. Integer or `nil`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does compression level nil mean?

src/script/lua_api/l_util.cpp Show resolved Hide resolved
games/devtest/mods/unittests/misc.lua Show resolved Hide resolved
@20kdc
Copy link
Contributor Author

20kdc commented Jul 21, 2022

Ok, so this has officially (by my personal definition of 'officially') become messy.

I didn't implement dictionary support because the wrapping for it's use by the engine isn't in place yet as I understand it and minetest.compress's design seems to be more about "just dropping in a compression algorithm" than specific details (a DEFLATE compressor could theoretically be fed a dictionary, but this does not occur and is not supported)

The text on the compression level has been through at least three iterations:

* ZStd: `level` - Compression level. `3` is the default.
^ describes default compression level, makes no guarantees on range

* Zstandard: `level` - Compression level. `0`-`3` or `nil`.
Note any supported Zstandard compression level could be used here,
but these are subject to change between Zstandard versions.
^ Makes ABI-safe guarantees on range

* Zstandard: `level` - Compression level. Integer or `nil`.
Note any supported Zstandard compression level could be used here,
but these are subject to change between Zstandard versions.
^ mostly aligns with how DEFLATE is documented, makes no guarantees on range

EDIT: I'm attaching further iterations here:

* Zstandard: `level` - Compression level. Integer or `nil`. Default `3`.
Note any supported Zstandard compression level could be used here,
but these are subject to change between Zstandard versions.

@erlehmann
Copy link
Contributor

erlehmann commented Jul 21, 2022

minetest.compress's design seems to be more about "just dropping in a compression algorithm"

It looks to me that this seems to be the case indeed; you just seem to be the first person to attempt adding one.
I see nothing wrong with that – but I think as soon as you do, you should document when which option is useful.

This would also be the case if zstd outperformed deflate on every metric all the time, which it obviously does not.

I must admit that I assume that developers just want better compression at about the same speed.
But zstd does not provide that – it provides a greater range of trade-offs re: speed vs compression.

I assert that the inherent complexity of these trade-offs is what makes the whole documentation issue so messy.
I fear that if you do not document the trade-offs, then people will think that zstd is newer and therefore better …
which may ultimately leads to a worse situation if using deflate would have yielded smaller compressed outputs.

Unfortunately, I can only give examples where zstd is not useful, because my inputs are always below 64 kilobytes:

  • bitmap compression (PNG uses deflate internally anyways)
  • entity meta compression
  • item meta compression

Maybe some other mod developers can speak up about for which purpose they intend to use zstd instead of deflate?
If no one actually wants or needs a big improvement over zlib at small input sizes, then a dictionary is not necessary.

@erlehmann
Copy link
Contributor

Some days ago @rubenwardy did a search for usage of minetest.compress() in ContentDB. Maybe it can help the discussion:
https://web.archive.org/web/20220716125216/https://content.minetest.net/zipgrep/3c39b4a6-acc6-4bcb-82af-5bb3bc64bef9/

@appgurueu
Copy link
Contributor

appgurueu commented Jul 21, 2022

I use it for my schematics: https://github.com/appgurueu/modlib/blob/318e737a086c6e1bce8ba6783979e03afa9b3f96/minetest/schematic.lua#L170-L185. Using Zstd instead might be advantageous for the same reasons it was used to compress mapblocks for storage in the database (although TBF schematics aren't read or written as often as mapblocks).

@sfan5
Copy link
Member

sfan5 commented Jul 21, 2022

I consider this PR complete and ready for merge from my side and I doubt any other coredev is going to agree that lua_api.txt should contain detailed compression algorithm advice.

@erlehmann
Copy link
Contributor

erlehmann commented Jul 21, 2022

@appgurueu are you just assuming or have you actually benchmarked the schematics case? Or is benchmarking not necessary because this is simply “I am willing to spend much more time on compression to get a 10% smaller file”, as schematics are much more often decompressed than compressed?

@sfan5 I agree that detailed advice would be overkill. Nevertheless, people are going to wonder whether they should use the new option. I do not think that “to get the smallest results with minetest.compress(), use zlib to compress smaller inputs and use zstd to compress larger, multi-megabyte-inputs” is too detailed. Do you seriously think that such a hint constitutes too much detail?

@appgurueu
Copy link
Contributor

I haven't deeply considered Zstd use cases, but I think it should be exposed since the engine uses it internally already.

@erlehmann
Copy link
Contributor

I haven't deeply considered Zstd use cases, but I think it should be exposed since the engine uses it internally already.

That sounds like a really weak argument, given that:

a) The already existing engine-internal zstd use is backed up by demonstrated space savings (or so I assume).
b) Every input for minetest.compress I have seen is of “compresses slighlty worse with zstd than with deflate” size.

@luatic so what advantages exactly did you have in mind, when you mentioned “modders compressing their data lose out on the advantages Zstd brings over Zlib (more efficient compression both in terms of size & speed).” in #12512? I seriously do not get it.

Copy link
Member

@SmallJoker SmallJoker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea, and would like to see it in Minetest.

doc/lua_api.txt Outdated Show resolved Hide resolved
doc/lua_api.txt Outdated Show resolved Hide resolved
src/script/lua_api/l_util.cpp Show resolved Hide resolved
src/script/lua_api/l_util.cpp Show resolved Hide resolved
@20kdc
Copy link
Contributor Author

20kdc commented Aug 5, 2022

Since 5.6.0 has been released, I'm assuming this is on-track for 5.6.1 or 5.7.0?

@SmallJoker SmallJoker self-requested a review August 5, 2022 19:34
@appgurueu
Copy link
Contributor

Since 5.6.0 has been released, I'm assuming this is on-track for 5.6.1 or 5.7.0?

5.6.1 (if it happens at all) would be a patch release, 5.7.0 will be the next release adding features (such as this one) (assuming no breaking release is made anytime soon)

@20kdc
Copy link
Contributor Author

20kdc commented Aug 6, 2022

Since 5.6.0 has been released, I'm assuming this is on-track for 5.6.1 or 5.7.0?

5.6.1 (if it happens at all) would be a patch release, 5.7.0 will be the next release adding features (such as this one) (assuming no breaking release is made anytime soon)

Ok, fixed the problem as of eff6bc6

@sfan5 sfan5 added the Rebase needed The PR needs to be rebased by its author. label Aug 14, 2022
@sfan5 sfan5 added this to the 5.7.0 milestone Aug 14, 2022
@Zughy
Copy link
Member

Zughy commented Aug 14, 2022

@20kdc rebase needed

@20kdc
Copy link
Contributor Author

20kdc commented Aug 15, 2022

Thank you for the reminder, will prepare it now

@20kdc
Copy link
Contributor Author

20kdc commented Aug 15, 2022

A rebase was requested and a rebase has been performed

@Zughy Zughy removed the Rebase needed The PR needs to be rebased by its author. label Aug 15, 2022
@TurkeyMcMac TurkeyMcMac added the Rebase needed The PR needs to be rebased by its author. label Sep 27, 2022
@TurkeyMcMac
Copy link
Contributor

@20kdc Please rebase again.

TFM: The First Month

Contains the following commits:

+ Add zstd compression support
+ copy over updates from server to client lua API
+ Just to prevent accidentally mixing up deflate and zstd, verify non-zstd compression does not produce a zstd magic number
+ Remove hex escapes and replace them with decimal escapes. Also show warnings on bad compression methods.
+ Apply requested changes (documentation fix, unit test cleanup, enum in l_util.cpp)
+ Remove warning for outdated server in compress unit test
+ Better-document zstd level, switch to string_to_enum for compression method detection
+ code style and such
+ Change documentation to not describe known-supported zstd compression levels
+ Documentation update for zstd stuff with tentative "5.6.1" name
+ Zstd compression feature is not 5.6.1, but 5.7.0
@sfan5 sfan5 removed the Rebase needed The PR needs to be rebased by its author. label Sep 28, 2022
Copy link
Contributor

@TurkeyMcMac TurkeyMcMac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Concept approved Approved by a core dev: PRs welcomed! Feature ✨ PRs that add or enhance a feature @ Script API >= Two approvals ✅ ✅
Projects
None yet
Development

Successfully merging this pull request may close these issues.

minetest.[de]compress: Expose zstd
8 participants