Comparing changes

base repository: klauspost/compress
base: v1.15.7
head repository: klauspost/compress
compare: v1.15.8
  • 8 commits
  • 11 files changed
  • 2 contributors

Commits on Jun 29, 2022

  1. Update README.md

    klauspost committed Jun 29, 2022
    fa9e24a

Commits on Jul 4, 2022

  1. zstd: Optimize seqdec amd64 asm (#636)

    copyMemoryPrecise now generates a loop over 16-byte blocks with a single
    branchless 16-byte fixup after it.
    
    This is slightly faster overall (geo mean +0.57%) and noticeably faster
    for some inputs (up to +8.17%, on DecoderSmall/geo.protodata), with a few
    small regressions. Benchmark results on Intel Core i7-3770K:
    
    	name                                                         old speed      new speed      delta
    	Decoder_DecoderSmall/kppkn.gtb.zst-8                          369MB/s ± 0%   374MB/s ± 1%  +1.56%  (p=0.008 n=5+5)
    	Decoder_DecoderSmall/geo.protodata.zst-8                      977MB/s ± 0%  1056MB/s ± 1%  +8.17%  (p=0.008 n=5+5)
    	Decoder_DecoderSmall/plrabn12.txt.zst-8                       291MB/s ± 0%   289MB/s ± 0%  -0.74%  (p=0.008 n=5+5)
    	Decoder_DecoderSmall/lcet10.txt.zst-8                         329MB/s ± 1%   333MB/s ± 0%  +1.23%  (p=0.008 n=5+5)
    	Decoder_DecoderSmall/asyoulik.txt.zst-8                       310MB/s ± 0%   310MB/s ± 1%    ~     (p=1.000 n=5+5)
    	Decoder_DecoderSmall/alice29.txt.zst-8                        291MB/s ± 0%   291MB/s ± 1%    ~     (p=0.421 n=5+5)
    	Decoder_DecoderSmall/html_x_4.zst-8                          2.07GB/s ± 0%  2.15GB/s ± 2%  +4.05%  (p=0.008 n=5+5)
    	Decoder_DecoderSmall/paper-100k.pdf.zst-8                    3.58GB/s ± 3%  3.74GB/s ± 1%  +4.31%  (p=0.008 n=5+5)
    	Decoder_DecoderSmall/fireworks.jpeg.zst-8                    8.57GB/s ± 0%  8.60GB/s ± 0%    ~     (p=0.056 n=5+5)
    	Decoder_DecoderSmall/urls.10K.zst-8                           474MB/s ± 1%   507MB/s ± 1%  +6.80%  (p=0.008 n=5+5)
    	Decoder_DecoderSmall/html.zst-8                               745MB/s ± 0%   803MB/s ± 0%  +7.68%  (p=0.008 n=5+5)
    	Decoder_DecoderSmall/comp-data.bin.zst-8                      399MB/s ± 1%   400MB/s ± 0%    ~     (p=0.841 n=5+5)
    	Decoder_DecodeAll/kppkn.gtb.zst-8                             521MB/s ± 0%   521MB/s ± 0%    ~     (p=0.841 n=5+5)
    	Decoder_DecodeAll/geo.protodata.zst-8                        1.27GB/s ± 1%  1.29GB/s ± 0%  +1.19%  (p=0.008 n=5+5)
    	Decoder_DecodeAll/plrabn12.txt.zst-8                          429MB/s ± 0%   427MB/s ± 0%  -0.51%  (p=0.032 n=5+5)
    	Decoder_DecodeAll/lcet10.txt.zst-8                            435MB/s ± 0%   439MB/s ± 0%  +0.94%  (p=0.008 n=5+5)
    	Decoder_DecodeAll/asyoulik.txt.zst-8                          438MB/s ± 0%   436MB/s ± 0%  -0.39%  (p=0.008 n=5+5)
    	Decoder_DecodeAll/alice29.txt.zst-8                           423MB/s ± 0%   420MB/s ± 1%  -0.72%  (p=0.008 n=5+5)
    	Decoder_DecodeAll/html_x_4.zst-8                             1.59GB/s ± 0%  1.59GB/s ± 1%  +0.54%  (p=0.032 n=5+5)
    	Decoder_DecodeAll/paper-100k.pdf.zst-8                       4.53GB/s ± 1%  4.54GB/s ± 1%    ~     (p=0.310 n=5+5)
    	Decoder_DecodeAll/fireworks.jpeg.zst-8                       9.64GB/s ± 1%  9.57GB/s ± 0%    ~     (p=0.151 n=5+5)
    	Decoder_DecodeAll/urls.10K.zst-8                              683MB/s ± 0%   681MB/s ± 0%    ~     (p=0.056 n=5+5)
    	Decoder_DecodeAll/html.zst-8                                 1.04GB/s ± 1%  1.06GB/s ± 0%  +1.77%  (p=0.008 n=5+5)
    	Decoder_DecodeAll/comp-data.bin.zst-8                         398MB/s ± 1%   399MB/s ± 1%    ~     (p=1.000 n=5+5)
    	Decoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/fastest-8    439MB/s ± 0%   437MB/s ± 0%  -0.39%  (p=0.016 n=5+5)
    	Decoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/default-8    448MB/s ± 0%   448MB/s ± 0%    ~     (p=0.841 n=5+5)
    	Decoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/better-8     478MB/s ± 0%   477MB/s ± 0%    ~     (p=0.151 n=5+5)
    	Decoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/best-8       463MB/s ± 0%   460MB/s ± 0%  -0.57%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFiles/e.txt/fastest-8                       9.62GB/s ± 3%  9.66GB/s ± 1%    ~     (p=0.841 n=5+5)
    	Decoder_DecodeAllFiles/e.txt/default-8                        394MB/s ± 0%   395MB/s ± 0%    ~     (p=0.056 n=5+5)
    	Decoder_DecodeAllFiles/e.txt/better-8                         438MB/s ± 0%   442MB/s ± 0%  +0.82%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFiles/e.txt/best-8                           501MB/s ± 0%   506MB/s ± 0%  +1.07%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFiles/fse-artifact3.bin/fastest-8           1.04GB/s ± 0%  1.05GB/s ± 1%    ~     (p=0.056 n=5+5)
    	Decoder_DecodeAllFiles/fse-artifact3.bin/default-8           1.20GB/s ± 1%  1.20GB/s ± 1%    ~     (p=0.095 n=5+5)
    	Decoder_DecodeAllFiles/fse-artifact3.bin/better-8            1.01GB/s ± 0%  1.00GB/s ± 1%  -0.82%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFiles/fse-artifact3.bin/best-8               386MB/s ± 0%   383MB/s ± 0%  -0.57%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFiles/gettysburg.txt/fastest-8               271MB/s ± 1%   275MB/s ± 1%  +1.59%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFiles/gettysburg.txt/default-8               224MB/s ± 1%   223MB/s ± 1%    ~     (p=0.222 n=5+5)
    	Decoder_DecodeAllFiles/gettysburg.txt/better-8                228MB/s ± 0%   226MB/s ± 0%  -0.89%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFiles/gettysburg.txt/best-8                  223MB/s ± 1%   221MB/s ± 1%  -1.03%  (p=0.016 n=5+5)
    	Decoder_DecodeAllFiles/html.txt/fastest-8                     592MB/s ± 1%   611MB/s ± 0%  +3.20%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFiles/html.txt/default-8                     597MB/s ± 0%   607MB/s ± 0%  +1.71%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFiles/html.txt/better-8                      623MB/s ± 0%   633MB/s ± 0%  +1.57%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFiles/html.txt/best-8                        603MB/s ± 0%   610MB/s ± 0%  +1.25%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFiles/pi.txt/fastest-8                      9.59GB/s ± 1%  9.70GB/s ± 1%  +1.16%  (p=0.032 n=5+5)
    	Decoder_DecodeAllFiles/pi.txt/default-8                       391MB/s ± 0%   393MB/s ± 0%  +0.62%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFiles/pi.txt/better-8                        437MB/s ± 1%   441MB/s ± 2%    ~     (p=0.087 n=5+5)
    	Decoder_DecodeAllFiles/pi.txt/best-8                          501MB/s ± 0%   507MB/s ± 0%  +1.22%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFiles/pngdata.bin/fastest-8                 1.66GB/s ± 1%  1.70GB/s ± 0%  +2.49%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFiles/pngdata.bin/default-8                 1.49GB/s ± 0%  1.51GB/s ± 0%  +1.18%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFiles/pngdata.bin/better-8                  1.87GB/s ± 0%  1.90GB/s ± 1%    ~     (p=0.056 n=5+5)
    	Decoder_DecodeAllFiles/pngdata.bin/best-8                    1.44GB/s ± 1%  1.46GB/s ± 0%  +1.75%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFiles/sharnd.out/fastest-8                  9.64GB/s ± 1%  9.66GB/s ± 1%    ~     (p=0.841 n=5+5)
    	Decoder_DecodeAllFiles/sharnd.out/default-8                  9.70GB/s ± 1%  9.70GB/s ± 2%    ~     (p=1.000 n=5+5)
    	Decoder_DecodeAllFiles/sharnd.out/better-8                   9.71GB/s ± 1%  9.79GB/s ± 1%    ~     (p=0.151 n=5+5)
    	Decoder_DecodeAllFiles/sharnd.out/best-8                     9.76GB/s ± 0%  9.80GB/s ± 0%    ~     (p=0.056 n=5+5)
    	Decoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/fastest-8  1.85GB/s ± 0%  1.85GB/s ± 0%  -0.31%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/default-8  1.86GB/s ± 0%  1.85GB/s ± 0%  -0.47%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/better-8   2.00GB/s ± 0%  2.00GB/s ± 0%  -0.32%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/best-8     1.93GB/s ± 0%  1.93GB/s ± 0%  -0.22%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFilesP/e.txt/fastest-8                      37.7GB/s ± 0%  37.5GB/s ± 0%  -0.38%  (p=0.016 n=5+5)
    	Decoder_DecodeAllFilesP/e.txt/default-8                      1.68GB/s ± 0%  1.69GB/s ± 0%  +0.55%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFilesP/e.txt/better-8                       1.91GB/s ± 0%  1.92GB/s ± 0%  +0.96%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFilesP/e.txt/best-8                         2.22GB/s ± 0%  2.25GB/s ± 0%  +1.50%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFilesP/fse-artifact3.bin/fastest-8          5.18GB/s ± 0%  5.05GB/s ± 2%  -2.50%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFilesP/fse-artifact3.bin/default-8          5.50GB/s ± 1%  5.34GB/s ± 1%  -2.86%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFilesP/fse-artifact3.bin/better-8           5.11GB/s ± 0%  5.14GB/s ± 0%  +0.57%  (p=0.016 n=5+5)
    	Decoder_DecodeAllFilesP/fse-artifact3.bin/best-8             2.36GB/s ± 0%  2.37GB/s ± 0%  +0.20%  (p=0.032 n=5+5)
    	Decoder_DecodeAllFilesP/gettysburg.txt/fastest-8             1.16GB/s ± 0%  1.16GB/s ± 0%    ~     (p=0.056 n=5+5)
    	Decoder_DecodeAllFilesP/gettysburg.txt/default-8             1.09GB/s ± 0%  1.08GB/s ± 0%  -1.19%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFilesP/gettysburg.txt/better-8              1.09GB/s ± 0%  1.08GB/s ± 1%  -0.96%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFilesP/gettysburg.txt/best-8                1.03GB/s ± 3%  1.02GB/s ± 0%    ~     (p=0.151 n=5+5)
    	Decoder_DecodeAllFilesP/html.txt/fastest-8                   2.50GB/s ± 1%  2.56GB/s ± 0%  +2.39%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFilesP/html.txt/default-8                   2.51GB/s ± 0%  2.55GB/s ± 0%  +1.69%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFilesP/html.txt/better-8                    2.61GB/s ± 0%  2.66GB/s ± 0%  +1.93%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFilesP/html.txt/best-8                      2.53GB/s ± 0%  2.56GB/s ± 0%  +1.13%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFilesP/pi.txt/fastest-8                     37.8GB/s ± 0%  37.6GB/s ± 0%  -0.44%  (p=0.016 n=5+5)
    	Decoder_DecodeAllFilesP/pi.txt/default-8                     1.67GB/s ± 0%  1.68GB/s ± 0%  +0.61%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFilesP/pi.txt/better-8                      1.91GB/s ± 0%  1.93GB/s ± 0%  +0.82%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFilesP/pi.txt/best-8                        2.23GB/s ± 0%  2.26GB/s ± 0%  +1.35%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFilesP/pngdata.bin/fastest-8                6.99GB/s ± 0%  7.00GB/s ± 0%    ~     (p=0.690 n=5+5)
    	Decoder_DecodeAllFilesP/pngdata.bin/default-8                6.88GB/s ± 0%  6.87GB/s ± 0%    ~     (p=0.222 n=5+5)
    	Decoder_DecodeAllFilesP/pngdata.bin/better-8                 8.49GB/s ± 0%  8.44GB/s ± 1%    ~     (p=0.310 n=5+5)
    	Decoder_DecodeAllFilesP/pngdata.bin/best-8                   6.59GB/s ± 1%  6.53GB/s ± 1%  -0.96%  (p=0.032 n=5+5)
    	Decoder_DecodeAllFilesP/sharnd.out/fastest-8                 37.8GB/s ± 0%  37.5GB/s ± 0%  -0.86%  (p=0.008 n=5+5)
    	Decoder_DecodeAllFilesP/sharnd.out/default-8                 37.9GB/s ± 1%  38.0GB/s ± 1%    ~     (p=0.310 n=5+5)
    	Decoder_DecodeAllFilesP/sharnd.out/better-8                  37.9GB/s ± 0%  37.8GB/s ± 2%    ~     (p=0.841 n=5+5)
    	Decoder_DecodeAllFilesP/sharnd.out/best-8                    37.8GB/s ± 0%  38.0GB/s ± 1%    ~     (p=0.310 n=5+5)
    	Decoder_DecodeAllParallel/kppkn.gtb.zst-8                    2.20GB/s ± 0%  2.20GB/s ± 0%    ~     (p=1.000 n=5+5)
    	Decoder_DecodeAllParallel/geo.protodata.zst-8                5.37GB/s ± 0%  5.39GB/s ± 0%  +0.35%  (p=0.008 n=5+5)
    	Decoder_DecodeAllParallel/plrabn12.txt.zst-8                 1.77GB/s ± 0%  1.76GB/s ± 0%  -0.19%  (p=0.008 n=5+5)
    	Decoder_DecodeAllParallel/lcet10.txt.zst-8                   1.90GB/s ± 0%  1.92GB/s ± 0%  +0.80%  (p=0.008 n=5+5)
    	Decoder_DecodeAllParallel/asyoulik.txt.zst-8                 1.83GB/s ± 0%  1.83GB/s ± 0%    ~     (p=0.841 n=5+5)
    	Decoder_DecodeAllParallel/alice29.txt.zst-8                  1.74GB/s ± 0%  1.74GB/s ± 0%    ~     (p=0.548 n=5+5)
    	Decoder_DecodeAllParallel/html_x_4.zst-8                     6.55GB/s ± 0%  6.49GB/s ± 0%  -0.97%  (p=0.008 n=5+5)
    	Decoder_DecodeAllParallel/paper-100k.pdf.zst-8               18.3GB/s ± 0%  18.3GB/s ± 0%    ~     (p=0.056 n=5+5)
    	Decoder_DecodeAllParallel/fireworks.jpeg.zst-8               37.4GB/s ± 0%  37.2GB/s ± 1%  -0.57%  (p=0.016 n=4+5)
    	Decoder_DecodeAllParallel/urls.10K.zst-8                     2.97GB/s ± 0%  2.96GB/s ± 0%    ~     (p=0.310 n=5+5)
    	Decoder_DecodeAllParallel/html.zst-8                         4.42GB/s ± 1%  4.43GB/s ± 0%    ~     (p=0.556 n=5+4)
    	Decoder_DecodeAllParallel/comp-data.bin.zst-8                1.69GB/s ± 1%  1.70GB/s ± 0%  +0.84%  (p=0.008 n=5+5)
    	[Geo mean]                                                   1.77GB/s       1.78GB/s       +0.57%
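
The copy strategy named in the commit message above (a loop over 16-byte blocks followed by a single 16-byte fixup) can be sketched in plain Go. This is only an illustration of the data movement, not the generated amd64 assembly: `copyPrecise` is a hypothetical helper, the handling of lengths below 16 is an assumption made for the sketch, and Go's `copy` contains branches of its own, so the sketch does not reproduce the branchless property.

```go
package main

import "fmt"

// copyPrecise copies exactly n bytes from src to dst and never touches
// bytes at index n or beyond. Hypothetical helper for illustration only.
func copyPrecise(dst, src []byte, n int) {
	if n < 16 {
		// Assumption: short lengths take a separate path. Here we just use copy.
		copy(dst[:n], src[:n])
		return
	}
	// Copy whole 16-byte blocks while at least 16 bytes remain.
	for i := 0; i+16 <= n; i += 16 {
		copy(dst[i:i+16], src[i:i+16])
	}
	// Fixup: always copy the last 16 bytes, ending exactly at n.
	// This may overlap bytes the loop already wrote, which is harmless,
	// and it removes the need to test how large the remainder is.
	copy(dst[n-16:n], src[n-16:n])
}

func main() {
	src := []byte("0123456789abcdefghijklmnopqrstuvwxyz") // 36 bytes
	dst := make([]byte, len(src))
	copyPrecise(dst, src, len(src))
	fmt.Println(string(dst)) // prints the same 36-character string
}
```

For a 36-byte input the loop copies bytes 0..31 and the fixup copies bytes 20..35, so the overlap of 12 bytes replaces a remainder loop or a length switch.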
    greatroar committed Jul 4, 2022
    bf3f0fd
  2. zstd: Improve decoder memcopy (#637)

    Up to 25% faster decodes, depending on contents.
    
    Use s2 memcopier and eliminate a zero check.
    
    
    ```
    benchmark                                                                                       old MB/s     new MB/s     speedup
    Benchmark_seqdec_execute/n-12286-lits-13914-prev-9869-1990358-3296656-win-4194304.blk-32        1284.77      1493.64      1.16x
    Benchmark_seqdec_execute/n-12485-lits-6960-prev-976039-2250252-2463561-win-4194304.blk-32       1107.87      1580.86      1.43x
    Benchmark_seqdec_execute/n-14746-lits-14461-prev-209-8-1379909-win-4194304.blk-32               3947.25      4163.99      1.05x
    Benchmark_seqdec_execute/n-1525-lits-1498-prev-2009476-797934-2994405-win-4194304.blk-32        10281.12     10375.47     1.01x
    Benchmark_seqdec_execute/n-3478-lits-3628-prev-895243-2104056-2119329-win-4194304.blk-32        8115.99      8862.70      1.09x
    Benchmark_seqdec_execute/n-8422-lits-5840-prev-168095-2298675-433830-win-4194304.blk-32         1578.08      2306.80      1.46x
    Benchmark_seqdec_execute/n-1000-lits-1057-prev-21887-92-217-win-8388608.blk-32                  17079.65     15875.41     0.93x
    Benchmark_seqdec_execute/n-15134-lits-20798-prev-4882976-4884216-4474622-win-8388608.blk-32     2020.09      2077.16      1.03x
    Benchmark_seqdec_execute/n-2-lits-0-prev-620601-689171-848-win-8388608.blk-32                   35781.31     35736.03     1.00x
    Benchmark_seqdec_execute/n-90-lits-67-prev-19498-23-19710-win-8388608.blk-32                    33125.43     32874.37     0.99x
    Benchmark_seqdec_execute/n-931-lits-1179-prev-36502-1526-1518-win-8388608.blk-32                19394.38     19785.45     1.02x
    Benchmark_seqdec_execute/n-2898-lits-4062-prev-335-386-751-win-8388608.blk-32                   10494.30     10229.09     0.97x
    Benchmark_seqdec_execute/n-4056-lits-12419-prev-10792-66-309849-win-8388608.blk-32              7425.77      8034.31      1.08x
    Benchmark_seqdec_execute/n-8028-lits-4568-prev-917-65-920-win-8388608.blk-32                    2855.17      3336.71      1.17x
    BenchmarkDecoder_DecoderSmall/kppkn.gtb.zst-32                                                    537.74       653.10        1.21x
    BenchmarkDecoder_DecoderSmall/geo.protodata.zst-32                                                1500.59      1610.70       1.07x
    BenchmarkDecoder_DecoderSmall/plrabn12.txt.zst-32                                                 410.13       508.09        1.24x
    BenchmarkDecoder_DecoderSmall/lcet10.txt.zst-32                                                   467.83       602.22        1.29x
    BenchmarkDecoder_DecoderSmall/asyoulik.txt.zst-32                                                 434.53       528.57        1.22x
    BenchmarkDecoder_DecoderSmall/alice29.txt.zst-32                                                  433.95       544.60        1.25x
    BenchmarkDecoder_DecoderSmall/html_x_4.zst-32                                                     2860.31      3199.64       1.12x
    BenchmarkDecoder_DecoderSmall/paper-100k.pdf.zst-32                                               5336.43      5422.59       1.02x
    BenchmarkDecoder_DecoderSmall/fireworks.jpeg.zst-32                                               12327.10     12324.96      1.00x
    BenchmarkDecoder_DecoderSmall/urls.10K.zst-32                                                     660.52       769.09        1.16x
    BenchmarkDecoder_DecoderSmall/html.zst-32                                                         1076.67      1286.06       1.19x
    BenchmarkDecoder_DecoderSmall/comp-data.bin.zst-32                                                569.30       574.46        1.01x
    BenchmarkDecoder_DecodeAll/kppkn.gtb.zst-32                                                       812.16       822.43        1.01x
    BenchmarkDecoder_DecodeAll/geo.protodata.zst-32                                                   1943.14      1906.88       0.98x
    BenchmarkDecoder_DecodeAll/plrabn12.txt.zst-32                                                    712.27       723.91        1.02x
    BenchmarkDecoder_DecodeAll/lcet10.txt.zst-32                                                      688.23       781.85        1.14x
    BenchmarkDecoder_DecodeAll/asyoulik.txt.zst-32                                                    702.87       714.37        1.02x
    BenchmarkDecoder_DecodeAll/alice29.txt.zst-32                                                     717.44       738.78        1.03x
    BenchmarkDecoder_DecodeAll/html_x_4.zst-32                                                        1960.55      1975.63       1.01x
    BenchmarkDecoder_DecodeAll/paper-100k.pdf.zst-32                                                  5981.50      6118.97       1.02x
    BenchmarkDecoder_DecodeAll/fireworks.jpeg.zst-32                                                  13140.18     13126.95      1.00x
    BenchmarkDecoder_DecodeAll/urls.10K.zst-32                                                        983.71       979.34        1.00x
    BenchmarkDecoder_DecodeAll/html.zst-32                                                            1624.80      1585.31       0.98x
    BenchmarkDecoder_DecodeAll/comp-data.bin.zst-32                                                   569.84       572.56        1.00x
    BenchmarkDecoder_DecodeAllFiles/.tracker-unpacked.bin/fastest-32                                  504.31       623.48        1.24x
    BenchmarkDecoder_DecodeAllFiles/.tracker-unpacked.bin/default-32                                  564.68       723.22        1.28x
    BenchmarkDecoder_DecodeAllFiles/.tracker-unpacked.bin/better-32                                   615.18       781.33        1.27x
    BenchmarkDecoder_DecodeAllFiles/.tracker-unpacked.bin/best-32                                     786.17       862.88        1.10x
    BenchmarkDecoder_DecodeAllFiles/.tracker.bin/fastest-32                                           12860.99     12908.39      1.00x
    BenchmarkDecoder_DecodeAllFiles/.tracker.bin/default-32                                           619.06       626.95        1.01x
    BenchmarkDecoder_DecodeAllFiles/.tracker.bin/better-32                                            630.33       628.85        1.00x
    BenchmarkDecoder_DecodeAllFiles/.tracker.bin/best-32                                              609.12       616.50        1.01x
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/fastest-32                              658.22       669.16        1.02x
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/default-32                              723.60       741.86        1.03x
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/better-32                               735.73       750.40        1.02x
    BenchmarkDecoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/best-32                                 745.43       764.97        1.03x
    BenchmarkDecoder_DecodeAllFiles/e.txt/fastest-32                                                  12801.86     13043.13      1.02x
    BenchmarkDecoder_DecodeAllFiles/e.txt/default-32                                                  680.29       683.65        1.00x
    BenchmarkDecoder_DecodeAllFiles/e.txt/better-32                                                   739.23       748.08        1.01x
    BenchmarkDecoder_DecodeAllFiles/e.txt/best-32                                                     820.16       828.45        1.01x
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/fastest-32                                      1186.63      1177.03       0.99x
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/default-32                                      1384.74      1383.55       1.00x
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/better-32                                       1104.17      1114.92       1.01x
    BenchmarkDecoder_DecodeAllFiles/fse-artifact3.bin/best-32                                         409.59       409.66        1.00x
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/fastest-32                                         392.32       390.94        1.00x
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/default-32                                         296.47       295.87        1.00x
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/better-32                                          296.52       296.60        1.00x
    BenchmarkDecoder_DecodeAllFiles/gettysburg.txt/best-32                                            299.85       298.91        1.00x
    BenchmarkDecoder_DecodeAllFiles/html.txt/fastest-32                                               988.75       999.28        1.01x
    BenchmarkDecoder_DecodeAllFiles/html.txt/default-32                                               987.11       1018.97       1.03x
    BenchmarkDecoder_DecodeAllFiles/html.txt/better-32                                                1027.64      1030.76       1.00x
    BenchmarkDecoder_DecodeAllFiles/html.txt/best-32                                                  973.41       989.37        1.02x
    BenchmarkDecoder_DecodeAllFiles/pi.txt/fastest-32                                                 12976.96     12976.25      1.00x
    BenchmarkDecoder_DecodeAllFiles/pi.txt/default-32                                                 678.88       680.77        1.00x
    BenchmarkDecoder_DecodeAllFiles/pi.txt/better-32                                                  746.38       751.28        1.01x
    BenchmarkDecoder_DecodeAllFiles/pi.txt/best-32                                                    823.52       833.27        1.01x
    BenchmarkDecoder_DecodeAllFiles/pngdata.bin/fastest-32                                            2115.58      2106.14       1.00x
    BenchmarkDecoder_DecodeAllFiles/pngdata.bin/default-32                                            1767.98      1767.57       1.00x
    BenchmarkDecoder_DecodeAllFiles/pngdata.bin/better-32                                             2306.86      2288.16       0.99x
    BenchmarkDecoder_DecodeAllFiles/pngdata.bin/best-32                                               1660.52      1667.53       1.00x
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/fastest-32                                             13027.08     13044.50      1.00x
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/default-32                                             13054.18     13081.06      1.00x
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/better-32                                              13067.23     13066.65      1.00x
    BenchmarkDecoder_DecodeAllFiles/sharnd.out/best-32                                                13079.77     13061.36      1.00x
    BenchmarkDecoder_DecodeAllFilesP/.tracker-unpacked.bin/fastest-32                                 10354.84     11876.83      1.15x
    BenchmarkDecoder_DecodeAllFilesP/.tracker-unpacked.bin/default-32                                 11557.12     13415.35      1.16x
    BenchmarkDecoder_DecodeAllFilesP/.tracker-unpacked.bin/better-32                                  12644.67     14515.52      1.15x
    BenchmarkDecoder_DecodeAllFilesP/.tracker-unpacked.bin/best-32                                    15934.00     17307.06      1.09x
    BenchmarkDecoder_DecodeAllFilesP/.tracker.bin/fastest-32                                          35354.57     35307.64      1.00x
    BenchmarkDecoder_DecodeAllFilesP/.tracker.bin/default-32                                          11392.27     11353.17      1.00x
    BenchmarkDecoder_DecodeAllFilesP/.tracker.bin/better-32                                           11793.77     11733.41      0.99x
    BenchmarkDecoder_DecodeAllFilesP/.tracker.bin/best-32                                             11203.91     11174.37      1.00x
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/fastest-32                             12089.54     12097.65      1.00x
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/default-32                             12604.67     12647.83      1.00x
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/better-32                              13265.79     13275.92      1.00x
    BenchmarkDecoder_DecodeAllFilesP/Mark.Twain-Tom.Sawyer.txt/best-32                                13078.85     13130.26      1.00x
    BenchmarkDecoder_DecodeAllFilesP/e.txt/fastest-32                                                 52477.17     51848.17      0.99x
    BenchmarkDecoder_DecodeAllFilesP/e.txt/default-32                                                 11947.06     11922.24      1.00x
    BenchmarkDecoder_DecodeAllFilesP/e.txt/better-32                                                  13184.17     13223.10      1.00x
    BenchmarkDecoder_DecodeAllFilesP/e.txt/best-32                                                    14630.26     14702.42      1.00x
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/fastest-32                                     3013.25      3025.30       1.00x
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/default-32                                     3125.61      2976.92       0.95x
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/better-32                                      3181.68      3162.28       0.99x
    BenchmarkDecoder_DecodeAllFilesP/fse-artifact3.bin/best-32                                        3351.22      3372.69       1.01x
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/fastest-32                                        1188.15      1147.96       0.97x
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/default-32                                        1215.39      1156.01       0.95x
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/better-32                                         1219.20      1177.16       0.97x
    BenchmarkDecoder_DecodeAllFilesP/gettysburg.txt/best-32                                           1216.72      1170.21       0.96x
    BenchmarkDecoder_DecodeAllFilesP/html.txt/fastest-32                                              16901.32     17180.70      1.02x
    BenchmarkDecoder_DecodeAllFilesP/html.txt/default-32                                              16819.66     16997.40      1.01x
    BenchmarkDecoder_DecodeAllFilesP/html.txt/better-32                                               17805.12     17946.54      1.01x
    BenchmarkDecoder_DecodeAllFilesP/html.txt/best-32                                                 16916.87     17294.25      1.02x
    BenchmarkDecoder_DecodeAllFilesP/pi.txt/fastest-32                                                52314.15     52657.88      1.01x
    BenchmarkDecoder_DecodeAllFilesP/pi.txt/default-32                                                11878.94     11796.12      0.99x
    BenchmarkDecoder_DecodeAllFilesP/pi.txt/better-32                                                 13303.16     13216.13      0.99x
    BenchmarkDecoder_DecodeAllFilesP/pi.txt/best-32                                                   14622.76     14697.47      1.01x
    BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/fastest-32                                           34134.48     36542.10      1.07x
    BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/default-32                                           33589.32     34982.31      1.04x
    BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/better-32                                            43754.89     44323.18      1.01x
    BenchmarkDecoder_DecodeAllFilesP/pngdata.bin/best-32                                              32422.22     33882.10      1.05x
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/fastest-32                                            52706.00     52863.28      1.00x
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/default-32                                            52527.76     52319.50      1.00x
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/better-32                                             52177.25     52506.60      1.01x
    BenchmarkDecoder_DecodeAllFilesP/sharnd.out/best-32                                               52443.28     52402.30      1.00x
    BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-32                                               13992.47     14134.26      1.01x
    BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-32                                           34107.95     33812.99      0.99x
    BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-32                                            12012.34     12123.74      1.01x
    BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-32                                              12630.22     13586.02      1.08x
    BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-32                                            12327.02     12374.31      1.00x
    BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-32                                             11932.73     12059.89      1.01x
    BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-32                                                31233.38     36076.61      1.16x
    BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-32                                          97435.31     100702.06     1.03x
    BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-32                                          62247.22     61824.88      0.99x
    BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-32                                                18659.58     18502.10      0.99x
    BenchmarkDecoder_DecodeAllParallel/html.zst-32                                                    28464.78     28500.16      1.00x
    BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-32                                           3114.03      3132.86       1.01x
    BenchmarkDecoderSilesia/multithreaded-writer-32                                                   1099.69      1059.67       0.96x
    BenchmarkDecoderSilesia/multithreaded-writer-himem-32                                             1093.10      1054.67       0.96x
    BenchmarkDecoderSilesia/singlethreaded-writer-32                                                  803.85       819.16        1.02x
    BenchmarkDecoderSilesia/singlethreaded-writerto-32                                                812.83       828.44        1.02x
    BenchmarkDecoderSilesia/singlethreaded-himem-32                                                   813.14       824.41        1.01x
    BenchmarkDecoderEnwik9/multithreaded-writer-32                                                    877.55       981.68        1.12x
    BenchmarkDecoderEnwik9/multithreaded-writer-himem-32                                              961.20       1013.19       1.05x
    BenchmarkDecoderEnwik9/singlethreaded-writer-32                                                   632.07       629.32        1.00x
    BenchmarkDecoderEnwik9/singlethreaded-writerto-32                                                 634.62       635.76        1.00x
    BenchmarkDecoderEnwik9/singlethreaded-himem-32                                                    763.68       755.70        0.99x
    BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/multithreaded-writer-32           1626.86      1658.42       1.02x
    BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/multithreaded-writer-himem-32     2299.80      2305.08       1.00x
    BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/singlethreaded-writer-32          1221.34      1207.19       0.99x
    BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/singlethreaded-writerto-32        1236.18      1224.88       0.99x
    BenchmarkDecoderWithCustomFiles/github-june-2days-2019.json.zst/singlethreaded-himem-32           1749.21      1729.03       0.99x
    BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/multithreaded-writer-32               839.51       922.30        1.10x
    BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/multithreaded-writer-himem-32         1055.54      1093.19       1.04x
    BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/singlethreaded-writer-32              574.91       614.02        1.07x
    BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/singlethreaded-writerto-32            579.19       618.97        1.07x
    BenchmarkDecoderWithCustomFiles/github-ranks-backup.bin.zst/singlethreaded-himem-32               780.67       863.05        1.11x 
    ```
    klauspost committed Jul 4, 2022
    b16a9af

Commits on Jul 8, 2022

  1. huff0: Pass a single bitReader pointer to asm (#634)

    This makes the context object smaller and frees up three registers,
    which we can use to replace the limitPtr and bufferOrigin stack
    variables.
    
    Benchmark results show a tiny win (Go 1.19beta, Core i7-3770K):
    
    	name                                           old speed      new speed      delta
    	Decompress1XTable/digits-8                      347MB/s ± 0%   347MB/s ± 0%    ~     (p=0.650 n=8+10)
    	Decompress1XTable/gettysburg-8                  268MB/s ± 0%   268MB/s ± 0%    ~     (p=0.400 n=9+9)
    	Decompress1XTable/twain-8                       327MB/s ± 0%   327MB/s ± 1%    ~     (p=0.339 n=7+9)
    	Decompress1XTable/low-ent.10k-8                 385MB/s ± 0%   385MB/s ± 1%    ~     (p=0.510 n=9+10)
    	Decompress1XTable/superlow-ent-10k-8            376MB/s ± 0%   376MB/s ± 0%    ~     (p=0.712 n=8+10)
    	Decompress1XTable/crash2-8                     17.3MB/s ± 1%  17.3MB/s ± 1%    ~     (p=0.926 n=10+10)
    	Decompress1XTable/endzerobits-8                52.9MB/s ± 1%  52.4MB/s ± 0%  -0.94%  (p=0.000 n=10+10)
    	Decompress1XTable/endnonzero-8                 11.4MB/s ± 0%  11.4MB/s ± 1%    ~     (p=0.343 n=10+10)
    	Decompress1XTable/case1-8                      22.0MB/s ± 0%  22.0MB/s ± 0%    ~     (p=0.618 n=9+9)
    	Decompress1XTable/case2-8                      18.1MB/s ± 0%  18.1MB/s ± 0%    ~     (p=0.348 n=9+9)
    	Decompress1XTable/case3-8                      19.1MB/s ± 0%  19.1MB/s ± 0%  +0.21%  (p=0.048 n=10+10)
    	Decompress1XTable/pngdata.001-8                 374MB/s ± 0%   374MB/s ± 0%    ~     (p=0.861 n=9+10)
    	Decompress1XTable/normcount2-8                 54.3MB/s ± 1%  54.5MB/s ± 1%    ~     (p=0.093 n=10+10)
    	Decompress1XNoTable/digits/100-8                279MB/s ± 0%   280MB/s ± 0%  +0.30%  (p=0.003 n=10+9)
    	Decompress1XNoTable/digits/10000-8              366MB/s ± 0%   365MB/s ± 0%    ~     (p=0.113 n=10+9)
    	Decompress1XNoTable/digits/262143-8             347MB/s ± 0%   347MB/s ± 1%    ~     (p=0.739 n=10+10)
    	Decompress1XNoTable/gettysburg/100-8            278MB/s ± 1%   277MB/s ± 1%    ~     (p=0.676 n=10+9)
    	Decompress1XNoTable/gettysburg/10000-8          363MB/s ± 1%   362MB/s ± 0%  -0.50%  (p=0.001 n=10+9)
    	Decompress1XNoTable/gettysburg/262143-8         350MB/s ± 0%   347MB/s ± 0%  -0.90%  (p=0.000 n=10+8)
    	Decompress1XNoTable/twain/100-8                 268MB/s ± 0%   267MB/s ± 0%    ~     (p=0.384 n=9+8)
    	Decompress1XNoTable/twain/10000-8               363MB/s ± 0%   362MB/s ± 0%  -0.32%  (p=0.000 n=9+9)
    	Decompress1XNoTable/twain/262143-8              328MB/s ± 0%   329MB/s ± 0%    ~     (p=0.063 n=9+10)
    	Decompress1XNoTable/low-ent.10k/100-8           180MB/s ± 0%   181MB/s ± 0%    ~     (p=0.225 n=10+10)
    	Decompress1XNoTable/low-ent.10k/10000-8         385MB/s ± 0%   385MB/s ± 0%    ~     (p=0.289 n=10+10)
    	Decompress1XNoTable/low-ent.10k/262143-8        389MB/s ± 1%   389MB/s ± 1%    ~     (p=0.971 n=10+10)
    	Decompress1XNoTable/superlow-ent-10k/262143-8   389MB/s ± 0%   390MB/s ± 0%  +0.27%  (p=0.017 n=9+10)
    	Decompress1XNoTable/crash2/100-8                278MB/s ± 0%   279MB/s ± 1%    ~     (p=0.163 n=9+10)
    	Decompress1XNoTable/crash2/10000-8              373MB/s ± 1%   373MB/s ± 0%    ~     (p=0.370 n=10+8)
    	Decompress1XNoTable/crash2/262143-8             375MB/s ± 0%   375MB/s ± 0%    ~     (p=0.604 n=9+10)
    	Decompress1XNoTable/endzerobits/100-8           180MB/s ± 0%   181MB/s ± 0%  +0.26%  (p=0.005 n=10+9)
    	Decompress1XNoTable/endzerobits/10000-8         384MB/s ± 0%   385MB/s ± 0%    ~     (p=0.914 n=8+10)
    	Decompress1XNoTable/endzerobits/262143-8        389MB/s ± 0%   390MB/s ± 0%    ~     (p=0.739 n=10+10)
    	Decompress1XNoTable/endnonzero/100-8            180MB/s ± 1%   180MB/s ± 1%    ~     (p=0.926 n=10+10)
    	Decompress1XNoTable/endnonzero/10000-8          384MB/s ± 0%   384MB/s ± 0%    ~     (p=0.965 n=10+8)
    	Decompress1XNoTable/endnonzero/262143-8         390MB/s ± 0%   390MB/s ± 0%    ~     (p=0.633 n=8+10)
    	Decompress1XNoTable/case1/100-8                 282MB/s ± 0%   283MB/s ± 0%  +0.34%  (p=0.005 n=10+10)
    	Decompress1XNoTable/case1/10000-8               372MB/s ± 0%   373MB/s ± 0%    ~     (p=0.113 n=9+9)
    	Decompress1XNoTable/case1/262143-8              374MB/s ± 0%   374MB/s ± 0%    ~     (p=0.448 n=10+10)
    	Decompress1XNoTable/case2/100-8                 274MB/s ± 1%   274MB/s ± 0%    ~     (p=0.927 n=10+10)
    	Decompress1XNoTable/case2/10000-8               376MB/s ± 0%   376MB/s ± 0%    ~     (p=0.408 n=10+8)
    	Decompress1XNoTable/case2/262143-8              376MB/s ± 1%   377MB/s ± 0%    ~     (p=1.000 n=10+10)
    	Decompress1XNoTable/case3/100-8                 266MB/s ± 0%   265MB/s ± 0%    ~     (p=0.113 n=9+10)
    	Decompress1XNoTable/case3/10000-8               372MB/s ± 0%   372MB/s ± 0%    ~     (p=0.075 n=10+9)
    	Decompress1XNoTable/case3/262143-8              374MB/s ± 0%   374MB/s ± 0%    ~     (p=0.172 n=10+10)
    	Decompress1XNoTable/pngdata.001/100-8           238MB/s ± 0%   238MB/s ± 0%    ~     (p=0.438 n=9+8)
    	Decompress1XNoTable/pngdata.001/10000-8         384MB/s ± 0%   384MB/s ± 0%    ~     (p=0.448 n=10+10)
    	Decompress1XNoTable/pngdata.001/262143-8        378MB/s ± 0%   378MB/s ± 0%    ~     (p=0.836 n=10+10)
    	Decompress1XNoTable/normcount2/100-8            281MB/s ± 0%   282MB/s ± 1%    ~     (p=0.122 n=8+10)
    	Decompress1XNoTable/normcount2/10000-8          369MB/s ± 1%   369MB/s ± 0%    ~     (p=0.912 n=10+10)
    	Decompress1XNoTable/normcount2/262143-8         370MB/s ± 0%   370MB/s ± 1%    ~     (p=0.342 n=10+10)
    	Decompress4XNoTable/digits/100-8                197MB/s ± 0%   197MB/s ± 1%    ~     (p=0.764 n=10+9)
    	Decompress4XNoTable/digits/10000-8              594MB/s ± 0%   602MB/s ± 1%  +1.35%  (p=0.000 n=10+10)
    	Decompress4XNoTable/digits/262143-8             570MB/s ± 1%   578MB/s ± 0%  +1.30%  (p=0.000 n=10+8)
    	Decompress4XNoTable/gettysburg/100-8            258MB/s ± 1%   260MB/s ± 0%  +0.59%  (p=0.001 n=10+10)
    	Decompress4XNoTable/gettysburg/10000-8          638MB/s ± 0%   641MB/s ± 0%  +0.44%  (p=0.000 n=9+9)
    	Decompress4XNoTable/gettysburg/262143-8         573MB/s ± 1%   574MB/s ± 0%    ~     (p=0.353 n=10+10)
    	Decompress4XNoTable/twain/100-8                 214MB/s ± 2%   214MB/s ± 2%    ~     (p=0.853 n=10+10)
    	Decompress4XNoTable/twain/10000-8               634MB/s ± 1%   638MB/s ± 0%  +0.62%  (p=0.000 n=10+10)
    	Decompress4XNoTable/twain/262143-8              513MB/s ± 1%   517MB/s ± 0%  +0.85%  (p=0.000 n=10+10)
    	Decompress4XNoTable/low-ent.10k/100-8           195MB/s ± 0%   194MB/s ± 0%    ~     (p=0.130 n=9+9)
    	Decompress4XNoTable/low-ent.10k/10000-8         635MB/s ± 0%   642MB/s ± 0%  +1.19%  (p=0.000 n=10+10)
    	Decompress4XNoTable/low-ent.10k/262143-8        675MB/s ± 0%   685MB/s ± 0%  +1.51%  (p=0.000 n=10+10)
    	Decompress4XNoTable/superlow-ent-10k/262143-8   673MB/s ± 1%   684MB/s ± 0%  +1.70%  (p=0.000 n=10+10)
    	Decompress4XNoTable/case1/100-8                 206MB/s ± 1%   206MB/s ± 0%    ~     (p=0.189 n=10+9)
    	Decompress4XNoTable/case1/10000-8               593MB/s ± 0%   601MB/s ± 0%  +1.47%  (p=0.000 n=10+10)
    	Decompress4XNoTable/case1/262143-8              603MB/s ± 0%   613MB/s ± 0%  +1.64%  (p=0.000 n=10+10)
    	Decompress4XNoTable/case2/100-8                 201MB/s ± 0%   202MB/s ± 1%    ~     (p=0.053 n=9+10)
    	Decompress4XNoTable/case2/10000-8               610MB/s ± 0%   618MB/s ± 0%  +1.30%  (p=0.000 n=9+10)
    	Decompress4XNoTable/case2/262143-8              622MB/s ± 1%   634MB/s ± 0%  +1.90%  (p=0.000 n=9+8)
    	Decompress4XNoTable/case3/100-8                 197MB/s ± 1%   198MB/s ± 0%  +0.53%  (p=0.001 n=9+10)
    	Decompress4XNoTable/case3/10000-8               606MB/s ± 0%   615MB/s ± 0%  +1.49%  (p=0.000 n=8+10)
    	Decompress4XNoTable/case3/262143-8              613MB/s ± 1%   622MB/s ± 0%  +1.48%  (p=0.000 n=10+10)
    	Decompress4XNoTable/pngdata.001/100-8           212MB/s ± 1%   211MB/s ± 0%    ~     (p=0.136 n=9+9)
    	Decompress4XNoTable/pngdata.001/10000-8         645MB/s ± 1%   649MB/s ± 1%  +0.65%  (p=0.000 n=9+10)
    	Decompress4XNoTable/pngdata.001/262143-8        640MB/s ± 1%   649MB/s ± 0%  +1.44%  (p=0.000 n=10+10)
    	Decompress4XNoTable/normcount2/100-8            260MB/s ± 1%   261MB/s ± 1%    ~     (p=0.211 n=10+9)
    	Decompress4XNoTable/normcount2/10000-8          584MB/s ± 1%   591MB/s ± 0%  +1.33%  (p=0.000 n=9+9)
    	Decompress4XNoTable/normcount2/262143-8         588MB/s ± 1%   596MB/s ± 1%  +1.39%  (p=0.000 n=10+9)
    	Decompress4XNoTableTableLog8/digits-8           583MB/s ± 1%   592MB/s ± 0%  +1.48%  (p=0.000 n=10+10)
    	Decompress4XTable/digits-8                      580MB/s ± 0%   588MB/s ± 0%  +1.33%  (p=0.000 n=8+10)
    	Decompress4XTable/gettysburg-8                  368MB/s ± 1%   370MB/s ± 0%  +0.59%  (p=0.017 n=10+9)
    	Decompress4XTable/twain-8                       510MB/s ± 0%   515MB/s ± 0%  +0.99%  (p=0.000 n=9+10)
    	Decompress4XTable/low-ent.10k-8                 657MB/s ± 0%   665MB/s ± 0%  +1.24%  (p=0.000 n=10+10)
    	Decompress4XTable/superlow-ent-10k-8            608MB/s ± 0%   617MB/s ± 1%  +1.48%  (p=0.000 n=8+10)
    	Decompress4XTable/case1-8                      21.1MB/s ± 1%  21.0MB/s ± 2%    ~     (p=0.223 n=10+10)
    	Decompress4XTable/case2-8                      17.6MB/s ± 0%  17.6MB/s ± 0%    ~     (p=0.199 n=9+10)
    	Decompress4XTable/case3-8                      18.7MB/s ± 0%  18.7MB/s ± 0%    ~     (p=0.557 n=10+8)
    	Decompress4XTable/pngdata.001-8                 633MB/s ± 1%   645MB/s ± 0%  +1.90%  (p=0.000 n=9+10)
    	Decompress4XTable/normcount2-8                 49.9MB/s ± 1%  49.5MB/s ± 1%  -0.64%  (p=0.002 n=10+10)
    	[Geo mean]                                      270MB/s        271MB/s       +0.36%
    greatroar committed Jul 8, 2022
    4b3cc06
  2. s2: Add Index header trim/restore (#638)

    Add `RemoveIndexHeaders`, which strips the 20 bytes of index header and trailer for cases where the storage layer can be relied upon.
    
    `RestoreIndexHeaders` restores the index header and trailer so that the index can be loaded again.
    klauspost committed Jul 8, 2022
    9d7fe70
  3. 08efe28

Commits on Jul 12, 2022

  1. zstd: Branchless getBits for amd64 w/o BMI2 (#640)

    This produces the same number of instructions while requiring less
    generator code. Benchmarks on an Intel Core i7-3770K show a tiny
    speedup:
    
    ```
    name                                                        old speed      new speed      delta
    Decoder_DecoderSmall/kppkn.gtb.zst-8                         430MB/s ± 1%   437MB/s ± 1%  +1.60%  (p=0.000 n=10+9)
    Decoder_DecoderSmall/geo.protodata.zst-8                    1.11GB/s ± 1%  1.13GB/s ± 0%  +1.37%  (p=0.000 n=9+9)
    Decoder_DecoderSmall/plrabn12.txt.zst-8                      334MB/s ± 1%   339MB/s ± 1%  +1.41%  (p=0.000 n=9+10)
    Decoder_DecoderSmall/lcet10.txt.zst-8                        392MB/s ± 2%   404MB/s ± 1%  +3.05%  (p=0.000 n=10+10)
    Decoder_DecoderSmall/asyoulik.txt.zst-8                      355MB/s ± 2%   357MB/s ± 1%    ~     (p=0.315 n=10+9)
    Decoder_DecoderSmall/alice29.txt.zst-8                       344MB/s ± 1%   350MB/s ± 1%  +1.69%  (p=0.000 n=10+10)
    Decoder_DecoderSmall/html_x_4.zst-8                         2.34GB/s ± 1%  2.37GB/s ± 1%  +1.10%  (p=0.000 n=10+10)
    Decoder_DecoderSmall/paper-100k.pdf.zst-8                   3.75GB/s ± 0%  3.76GB/s ± 1%    ~     (p=0.182 n=9+10)
    Decoder_DecoderSmall/fireworks.jpeg.zst-8                   8.59GB/s ± 1%  8.58GB/s ± 1%    ~     (p=0.842 n=10+9)
    Decoder_DecoderSmall/urls.10K.zst-8                          561MB/s ± 1%   556MB/s ± 1%  -0.82%  (p=0.019 n=10+10)
    Decoder_DecoderSmall/html.zst-8                              900MB/s ± 1%   913MB/s ± 1%  +1.42%  (p=0.000 n=10+9)
    Decoder_DecoderSmall/comp-data.bin.zst-8                     399MB/s ± 1%   395MB/s ± 1%  -0.99%  (p=0.000 n=10+10)
    Decoder_DecodeAll/kppkn.gtb.zst-8                            518MB/s ± 0%   526MB/s ± 0%  +1.52%  (p=0.000 n=10+9)
    Decoder_DecodeAll/geo.protodata.zst-8                       1.28GB/s ± 0%  1.27GB/s ± 2%    ~     (p=0.739 n=10+10)
    Decoder_DecodeAll/plrabn12.txt.zst-8                         427MB/s ± 1%   433MB/s ± 1%  +1.24%  (p=0.000 n=10+10)
    Decoder_DecodeAll/lcet10.txt.zst-8                           480MB/s ± 1%   490MB/s ± 1%  +2.06%  (p=0.000 n=10+10)
    Decoder_DecodeAll/asyoulik.txt.zst-8                         435MB/s ± 0%   447MB/s ± 0%  +2.70%  (p=0.000 n=7+9)
    Decoder_DecodeAll/alice29.txt.zst-8                          422MB/s ± 0%   438MB/s ± 1%  +3.96%  (p=0.000 n=8+9)
    Decoder_DecodeAll/html_x_4.zst-8                            1.60GB/s ± 0%  1.61GB/s ± 0%  +0.99%  (p=0.000 n=9+10)
    Decoder_DecodeAll/paper-100k.pdf.zst-8                      4.55GB/s ± 1%  4.44GB/s ± 1%  -2.42%  (p=0.000 n=10+10)
    Decoder_DecodeAll/fireworks.jpeg.zst-8                      9.52GB/s ± 1%  9.47GB/s ± 2%    ~     (p=0.143 n=10+10)
    Decoder_DecodeAll/urls.10K.zst-8                             678MB/s ± 1%   684MB/s ± 0%  +0.83%  (p=0.000 n=10+10)
    Decoder_DecodeAll/html.zst-8                                1.05GB/s ± 0%  1.07GB/s ± 1%  +2.11%  (p=0.000 n=10+10)
    Decoder_DecodeAll/comp-data.bin.zst-8                        397MB/s ± 1%   391MB/s ± 1%  -1.37%  (p=0.000 n=10+10)
    Decoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/fastest-8   437MB/s ± 0%   436MB/s ± 1%  -0.21%  (p=0.025 n=9+9)
    Decoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/default-8   448MB/s ± 0%   451MB/s ± 0%  +0.70%  (p=0.000 n=9+9)
    Decoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/better-8    478MB/s ± 0%   475MB/s ± 0%  -0.53%  (p=0.000 n=10+10)
    Decoder_DecodeAllFiles/Mark.Twain-Tom.Sawyer.txt/best-8      461MB/s ± 0%   470MB/s ± 0%  +2.07%  (p=0.000 n=8+9)
    Decoder_DecodeAllFiles/e.txt/fastest-8                      9.62GB/s ± 3%  9.62GB/s ± 2%    ~     (p=1.000 n=10+10)
    Decoder_DecodeAllFiles/e.txt/default-8                       391MB/s ± 0%   406MB/s ± 0%  +3.81%  (p=0.000 n=10+8)
    Decoder_DecodeAllFiles/e.txt/better-8                        438MB/s ± 0%   448MB/s ± 0%  +2.39%  (p=0.000 n=8+10)
    Decoder_DecodeAllFiles/e.txt/best-8                          500MB/s ± 0%   500MB/s ± 0%    ~     (p=0.119 n=9+9)
    Decoder_DecodeAllFiles/fse-artifact3.bin/fastest-8          1.07GB/s ± 1%  1.04GB/s ± 1%  -2.61%  (p=0.000 n=10+10)
    Decoder_DecodeAllFiles/fse-artifact3.bin/default-8          1.21GB/s ± 1%  1.19GB/s ± 1%  -1.33%  (p=0.000 n=10+10)
    Decoder_DecodeAllFiles/fse-artifact3.bin/better-8            994MB/s ± 0%   990MB/s ± 0%  -0.42%  (p=0.002 n=10+9)
    Decoder_DecodeAllFiles/fse-artifact3.bin/best-8              389MB/s ± 0%   381MB/s ± 0%  -2.00%  (p=0.000 n=8+10)
    Decoder_DecodeAllFiles/gettysburg.txt/fastest-8              274MB/s ± 1%   274MB/s ± 1%    ~     (p=1.000 n=10+10)
    Decoder_DecodeAllFiles/gettysburg.txt/default-8              224MB/s ± 1%   223MB/s ± 1%  -0.64%  (p=0.015 n=10+10)
    Decoder_DecodeAllFiles/gettysburg.txt/better-8               228MB/s ± 1%   227MB/s ± 1%  -0.40%  (p=0.041 n=10+10)
    Decoder_DecodeAllFiles/gettysburg.txt/best-8                 225MB/s ± 1%   223MB/s ± 0%  -0.52%  (p=0.008 n=10+6)
    Decoder_DecodeAllFiles/html.txt/fastest-8                    599MB/s ± 1%   614MB/s ± 1%  +2.41%  (p=0.000 n=10+10)
    Decoder_DecodeAllFiles/html.txt/default-8                    601MB/s ± 0%   613MB/s ± 0%  +2.01%  (p=0.000 n=8+9)
    Decoder_DecodeAllFiles/html.txt/better-8                     626MB/s ± 1%   638MB/s ± 0%  +1.99%  (p=0.000 n=10+10)
    Decoder_DecodeAllFiles/html.txt/best-8                       601MB/s ± 0%   612MB/s ± 0%  +1.87%  (p=0.000 n=10+10)
    Decoder_DecodeAllFiles/pi.txt/fastest-8                     9.64GB/s ± 2%  9.66GB/s ± 1%    ~     (p=0.529 n=10+10)
    Decoder_DecodeAllFiles/pi.txt/default-8                      390MB/s ± 0%   403MB/s ± 0%  +3.48%  (p=0.000 n=10+10)
    Decoder_DecodeAllFiles/pi.txt/better-8                       439MB/s ± 0%   451MB/s ± 0%  +2.65%  (p=0.000 n=10+10)
    Decoder_DecodeAllFiles/pi.txt/best-8                         500MB/s ± 0%   499MB/s ± 0%  -0.27%  (p=0.009 n=7+10)
    Decoder_DecodeAllFiles/pngdata.bin/fastest-8                1.70GB/s ± 1%  1.69GB/s ± 1%  -0.63%  (p=0.013 n=10+9)
    Decoder_DecodeAllFiles/pngdata.bin/default-8                1.52GB/s ± 1%  1.51GB/s ± 0%  -0.75%  (p=0.000 n=10+9)
    Decoder_DecodeAllFiles/pngdata.bin/better-8                 1.92GB/s ± 0%  1.90GB/s ± 0%  -1.02%  (p=0.000 n=10+10)
    Decoder_DecodeAllFiles/pngdata.bin/best-8                   1.47GB/s ± 0%  1.46GB/s ± 0%  -0.88%  (p=0.000 n=10+9)
    Decoder_DecodeAllFiles/sharnd.out/fastest-8                 9.60GB/s ± 1%  9.67GB/s ± 1%  +0.67%  (p=0.029 n=10+10)
    Decoder_DecodeAllFiles/sharnd.out/default-8                 9.65GB/s ± 2%  9.71GB/s ± 1%    ~     (p=0.353 n=10+10)
    Decoder_DecodeAllFiles/sharnd.out/better-8                  9.67GB/s ± 1%  9.66GB/s ± 0%    ~     (p=0.549 n=10+9)
    Decoder_DecodeAllFiles/sharnd.out/best-8                    9.70GB/s ± 1%  9.61GB/s ± 0%  -0.91%  (p=0.010 n=10+9)
    [Geo mean]                                                   935MB/s        940MB/s       +0.57%
    ```
    greatroar committed Jul 12, 2022
    9a048c1
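
For readers unfamiliar with the term, the sketch below shows one common way to read `n` bits without a branch. It is written in Go for illustration and is not the assembly from the getBits commit above. The function names and the rotate-then-mask approach are assumptions, chosen only to show why the `n == 0` case usually needs special handling. On amd64, a 64-bit shift uses only the low 6 bits of its count, so a right shift by `64 - n` becomes a shift by 0 when `n` is 0, and the branchy version has to test for that.

```go
package main

import (
	"fmt"
	"math/bits"
)

// getBitsBranchy returns the next n bits (0 <= n < 64) of value after
// bitsRead bits have been consumed. The counts are masked with 63 to mimic
// hardware shift semantics, which is why n == 0 needs an explicit test.
func getBitsBranchy(value uint64, bitsRead, n uint8) uint64 {
	if n == 0 {
		return 0
	}
	return (value << (bitsRead & 63)) >> ((64 - n) & 63)
}

// getBitsBranchless computes the same result without a conditional:
// rotate the top n bits into the low n bits, then mask with (1<<n)-1.
// For n == 0 the mask is 0, so the result is 0 with no special case.
func getBitsBranchless(value uint64, bitsRead, n uint8) uint64 {
	v := bits.RotateLeft64(value<<(bitsRead&63), int(n))
	return v & (uint64(1)<<(n&63) - 1)
}

func main() {
	const value = 0xF0F0F0F0F0F0F0F0
	for _, n := range []uint8{0, 4, 8} {
		fmt.Printf("n=%d branchy=%#x branchless=%#x\n",
			n, getBitsBranchy(value, 4, n), getBitsBranchless(value, 4, n))
	}
}
```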

Commits on Jul 13, 2022

  1. gzip: fix stack exhaustion bug in Reader.Read (#641)

    Replace recursion with iteration in Reader.Read to avoid stack
    exhaustion when the input contains a large number of concatenated files.
    
    Fixes CVE-2022-30631
    
    Upstream: golang/go#53168
    klauspost committed Jul 13, 2022
    5a16edc