lz4decode

This package contains multiple tuned implementations of the lz4 decode algorithm.

Compiler improvements in Go 1.20, combined with performance problems on sparsely compressed data, led to this package being written as a derivative of https://github.com/pierrec/lz4.

Issues with https://github.com/pierrec/lz4

The hand-coded assembly for amd64, arm64, and arm was written specifically for densely compressed data, where the literals and matches are quite small (usually under 16 bytes). These techniques are extremely slow on sparsely compressed data, where the literals and matches can be multiple kilobytes. The Go version included with https://github.com/pierrec/lz4 does not suffer from these issues with sparsely compressed data, but out of the box it is 2x slower than the assembly versions.
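To make "literals" and "matches" concrete, here is a minimal, unoptimized sketch of an LZ4 block decode loop. It is not this package's code: the names decodeBlock, readLen, and errCorrupt are made up for illustration, and it skips the end-of-block rules a strict decoder would enforce. Each sequence carries a run of literal bytes copied from the input, followed by a match copied from earlier output; the size of those two copies is what separates dense data from sparse data.

```go
package main

import (
	"errors"
	"fmt"
)

// errCorrupt is a hypothetical error value for this sketch.
var errCorrupt = errors.New("lz4 sketch: corrupt or oversized block")

// readLen extends a 4-bit length nibble. A nibble of 15 means more length
// bytes follow; each is added to the total until a byte below 255 ends the run.
func readLen(src []byte, si, n int) (length, next int, ok bool) {
	if n < 15 {
		return n, si, true
	}
	for si < len(src) {
		b := src[si]
		si++
		n += int(b)
		if b != 255 {
			return n, si, true
		}
	}
	return 0, si, false
}

// decodeBlock decodes one LZ4 block from src into dst and returns the
// number of bytes written. Illustrative only, not tuned for speed.
func decodeBlock(dst, src []byte) (int, error) {
	si, di := 0, 0
	for si < len(src) {
		token := src[si]
		si++

		// Literals: the high nibble of the token is the literal length.
		litLen, next, ok := readLen(src, si, int(token>>4))
		si = next
		if !ok || litLen > len(src)-si || litLen > len(dst)-di {
			return 0, errCorrupt
		}
		copy(dst[di:], src[si:si+litLen])
		si += litLen
		di += litLen
		if si == len(src) {
			break // the final sequence holds literals only
		}

		// Match: a 2-byte little-endian offset back into the output.
		if len(src)-si < 2 {
			return 0, errCorrupt
		}
		offset := int(src[si]) | int(src[si+1])<<8
		si += 2
		if offset == 0 || offset > di {
			return 0, errCorrupt
		}
		// The low nibble is the match length minus the minimum match of 4.
		matchLen, next, ok := readLen(src, si, int(token&0x0F))
		si = next
		matchLen += 4
		if !ok || matchLen > len(dst)-di {
			return 0, errCorrupt
		}
		// A match may overlap the bytes it is producing, so copy forward
		// one byte at a time.
		for i := 0; i < matchLen; i++ {
			dst[di+i] = dst[di-offset+i]
		}
		di += matchLen
	}
	return di, nil
}

func main() {
	// 4 literals "abcd", a match of length 8 at offset 4, then literals "xyz".
	block := []byte{0x44, 'a', 'b', 'c', 'd', 0x04, 0x00, 0x30, 'x', 'y', 'z'}
	out := make([]byte, 32)
	n, err := decodeBlock(out, block)
	fmt.Println(string(out[:n]), err)
}
```

In densely compressed data both copies are typically a handful of bytes, so per-copy overhead dominates; in sparsely compressed data they can run to kilobytes, so bulk copy throughput dominates.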

Profiling the benchmarks showed that the 2x slowdown was caused by the use of copy for all data copies, even single bytes. Thus UncompressBlockInlineCopy was born: it contains specialized versions of copy for 1 to 8 bytes, implemented using unsafe. This closed the gap with the assembly while retaining the huge advantage on sparsely compressed data.
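As a rough illustration of the idea (not the package's actual code), a specialized small copy might look like the sketch below. The name copySmall is hypothetical, and the overlapping-load trick is one possible way to cover every length from 1 to 8; the real implementation may be structured differently. The sketch assumes dst and src do not overlap and that the target CPU tolerates unaligned loads and stores, as amd64 and arm64 do.

```go
package main

import (
	"fmt"
	"unsafe"
)

// copySmall copies n bytes from src to dst. For 1 to 8 bytes it uses at most
// two fixed-width loads and stores instead of calling the general-purpose
// copy; anything else falls back to copy. Hypothetical helper for this sketch.
func copySmall(dst, src []byte, n int) {
	if n < 1 || n > 8 {
		copy(dst[:n], src[:n])
		return
	}
	// Bounds-check both slices before touching raw pointers.
	_, _ = dst[n-1], src[n-1]
	d, s := unsafe.Pointer(&dst[0]), unsafe.Pointer(&src[0])
	switch {
	case n == 1:
		dst[0] = src[0]
	case n <= 3:
		// Two 2-byte moves that overlap when n is odd.
		a := *(*uint16)(s)
		b := *(*uint16)(unsafe.Add(s, n-2))
		*(*uint16)(d) = a
		*(*uint16)(unsafe.Add(d, n-2)) = b
	default:
		// n is 4 to 8: two 4-byte moves that overlap when n is below 8.
		a := *(*uint32)(s)
		b := *(*uint32)(unsafe.Add(s, n-4))
		*(*uint32)(d) = a
		*(*uint32)(unsafe.Add(d, n-4)) = b
	}
}

func main() {
	src := []byte("abcdefgh")
	for n := 1; n <= 8; n++ {
		dst := make([]byte, 8)
		copySmall(dst, src, n)
		fmt.Printf("%d %q\n", n, dst[:n])
	}
}
```

Both loads happen before either store, and the bytes are moved as raw memory, so the result does not depend on byte order.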

Advice

The best advice is to use Go 1.20 or later. The compiler changes vastly improved the generated code, such that the Go version is faster than the hand-coded assembly in all cases.

Default

UncompressBlock is the default, chosen to be the likely best implementation for the current Go version. For Go 1.20 and later, it is UncompressBlockInlineCopy; before 1.20, it is UncompressBlockAsm.

Per Versions

Pre 1.20 - Benchmark

UncompressBlockAsm: fastest for densely compressed data (e.g., a word list)
UncompressBlockGo: fastest for sparsely compressed data

As a result:

UncompressBlock == UncompressBlockAsm

Post 1.20 - Benchmark

goos: linux
goarch: arm64
pkg: github.com/lab47/lz4decode
         │    asm        │               ic                    │
         │    sec/op     │   sec/op     vs base                │
Copy       38391.5n ± 0%   475.2n ± 1%  -98.76% (p=0.000 n=10)
Speckled     1.786µ ± 0%   1.018µ ± 1%  -43.03% (p=0.000 n=10)
Words        7.702µ ± 0%   6.734µ ± 0%  -12.57% (p=0.000 n=10)
geomean      8.083µ        1.482µ       -81.66%

UncompressBlockInlineCopy matches or beats UncompressBlockAsm on densely compressed data (about 13% faster on Words above) and is dramatically faster on sparsely compressed data (roughly 80x on Copy above). This is because the hand-coded assembly uses 8-byte copy loops for all data rather than copy()/runtime.memmove, which are vastly faster for large blocks of bytes.
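The difference between the two strategies can be sketched in plain Go. The function copyLoop8 below is a hypothetical stand-in for the assembly's approach (one 8-byte word per iteration), written with encoding/binary rather than assembly; it is not taken from either package. It produces the same bytes as the built-in copy but cannot use the wide moves that runtime.memmove applies to large blocks.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// copyLoop8 copies src into dst one 8-byte word per iteration, then finishes
// the tail byte by byte. It returns the number of bytes copied.
// Hypothetical helper for this sketch.
func copyLoop8(dst, src []byte) int {
	n := len(src)
	if len(dst) < n {
		n = len(dst)
	}
	i := 0
	for ; i+8 <= n; i += 8 {
		binary.LittleEndian.PutUint64(dst[i:], binary.LittleEndian.Uint64(src[i:]))
	}
	for ; i < n; i++ {
		dst[i] = src[i]
	}
	return n
}

func main() {
	// A multi-kilobyte literal, as found in sparsely compressed data.
	src := bytes.Repeat([]byte("lz4decode"), 1000)

	viaLoop := make([]byte, len(src))
	viaCopy := make([]byte, len(src))
	copyLoop8(viaLoop, src)
	copy(viaCopy, src)

	fmt.Println(bytes.Equal(viaLoop, viaCopy))
}
```

To measure the gap on a given machine, wrapping each variant in a standard Go benchmark and running go test -bench over several buffer sizes is one option.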
