Skip to content

Latest commit

 

History

History
141 lines (100 loc) · 6.39 KB

readme.md

File metadata and controls

141 lines (100 loc) · 6.39 KB

Zig deflate compression/decompression implementation. It supports compression and decompression of gzip, zlib and raw deflate format. Andrew pushed for the implementation from the first principles so I give it a try.

Zig's implementation is ported from Go by huge effort of hdorio. Go implementation mainly follows original zlib implementation.

Here I used all those three as reference, but mostly started from scratch. Inflate (decompression) and deflate tokenization are implemented from the first principles. For deflate block writer I started from current std code.

Benchmark

Comparing this implementation with the one we currently have in Zig's standard library (std).
Std is roughly 1.2-1.4 times slower in decompression, and 1.1-1.2 times slower in compression. Compressed sizes are pretty much same in both cases.

Benchmark are done on aarch64/Linux (Apple M1 cpu):

Compression

Compression examples are using Zig repository tar ~177M file.

Compression time in comparison with std (current Zig standard library implementation):

level time [ms] std [ms] time/std
store 16.42 20.29 1.24
huffman only 442.32 587.74 1.33
4 912.58 1002.99 1.1
5 1258.05 1429.21 1.14
6 - default 1824.60 2065.15 1.13
7 2291.04 2798.50 1.22
8 4158.42 4706.27 1.13
9 6268.29 7738.94 1.23

Compressed size in comparison with std:

level size std size diff size/std
store 177257685 177257690 5 1.0000
huffman only 108397982 108397986 4 1.0000
4 26610356 26557083 -53273 0.9980
5 25230832 25212703 -18129 0.9993
6 - default 24716132 24716123 -9 1.0000
7 24571921 24562137 -9784 0.9996
8 24419337 24425085 5748 1.0002
9 24370739 24389533 18794 1.0008

Decompression

Decompression time for few different files in comparison with std:

file size time [ms] std [ms] time/std
ziglang.tar.gz 177244160 364.36 451.88 1.24
war_and_peace.txt.gz 3359630 13.65 18.51 1.36
large.tar.gz 11162624 38.31 49.71 1.3
cantrbry.tar.gz 2821120 9.33 12.19 1.31

URLs from which tests files are obtained can be found here.

Note

I was also comparing with gzip/gunzip system tools and the results are pretty much similar.

To compare with gzip/gunzip:

zig build -Doptimize=ReleaseSafe
export FILE=tmp/ziglang.tar.gz
hyperfine -r 5 'zig-out/bin/gunzip $FILE' 'gunzip -kf $FILE'

export FILE=tmp/ziglang.tar
hyperfine -r 5 'zig-out/bin/gzip $FILE' 'gzip -kf $FILE'

Running same benchmarks on x86_64 GNU/Linux (Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz):

level time [ms] std [ms] time/std
store 17.87 37.71 2.11
huffman only 536.96 712.69 1.33
4 1298.81 1611.90 1.24
5 1682.73 2181.59 1.3
6 - default 2328.75 3092.51 1.33
7 2888.99 4220.92 1.46
8 5100.41 7001.97 1.37
9 7265.84 10723.13 1.48
file size time [ms] std [ms] time/std
ziglang.tar.gz 177244160 538.45 680.66 1.26
war_and_peace.txt.gz 3359630 21.73 31.22 1.44
large.tar.gz 11162624 62.02 74.89 1.21
cantrbry.tar.gz 2821120 14.77 21.39 1.45

Memory usage

This library uses static allocations for all structures, doesn't require allocator. That makes sense especially for deflate where all structures, internal buffers are allocated to the full size. Little less for inflate where we std version uses less memory by not preallocating to theoretical max size array which are usually not fully used.

For deflate this library allocates 395K while std 779K.
For inflate this library allocates 74.5K while std usually around 36K.

Inflate difference is because we here use 64K history instead of 32K in std.

Interface

Currently in std lib we have different wording for the same things. Type is called Decompressor, Decompress, DecompressStream (deflate, gzip, zlib). I'll suggest that we stick with Decompressor/Compressor and decompressor/compressor for initializers of that type. That will free compress/decompress word for one-shot action. Fn compress receives reader and writer, reads all plain data from reader and writes compressed data to the writer. I expect that many places can use this simple implementation.

All gzip/zlib/flate will have same methods:

/// Compress from reader and write compressed data to the writer.
fn compress(reader: anytype, writer: anytype, options: Options) !void 

/// Create Compressor which outputs the writer.
fn compressor(writer: anytype, options: Options) !Compressor(@TypeOf(writer))

/// Compressor type
fn Compressor(comptime WriterType: type) type 


/// Decompress from reader and write plain data to the writer.
fn decompress(reader: anytype, writer: anytype) !void

/// Create Decompressor which reads from reader.
fn decompressor(reader: anytype) Decompressor(@TypeOf(reader)

/// Decompressor type
fn Decompressor(comptime ReaderType: type) type
 

References

Great materials for understanding deflate:

Bill Bird Video series
RFC 1951 - deflate
RFC 1950 - zlib
RFC 1952 - gzip
zlib algorithm explained
zlib manual
Mark Adler on stackoverflow
Faster zlib/DEFLATE
Reading bits with zero refill latency