Skip to content

skandau/fp8sk

Repository files navigation

fp8sk

File Archiver

/* fp8sk file compressor/archiver. Released on August 18, 2017 by surya kandau

Copyright (C) 2008 Matt Mahoney, Serge Osnach, Alexander Ratushnyak,
Bill Pettis, Przemyslaw Skibinski, Matthew Fite, wowtiger, Andrew Paterson,
Jan Ondrus, Andreas Morphis, Pavel L. Holoborodko, Kaido Orav, Simon Berger,
Neill Corlett, Márcio Pais, surya kandau
LICENSE

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of
the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
General Public License for more details at
Visit <https://www.gnu.org/copyleft/gpl.html>.

To install and use in Windows:

  • To install, put fp8sk.exe or a shortcut to it on your desktop.
  • To compress a file or folder, drop it on the fp8sk icon.
  • To decompress, drop a .fp8sk file on the icon.

A .fp8sk extension is added for compression, removed for decompression. The output will go in the same folder as the input.

While fp8sk is working, a command window will appear and report progress. When it is done you can close the window by pressing ENTER or clicking [X].

COMMAND LINE INTERFACE

  • To install, put fp8sk.exe somewhere in your PATH.
  • To compress: fp8sk [-N] file1 [file2...]
  • To decompress: fp8sk [-d] file1.fp8sk [dir2]
  • To view contents: more < file1.fp8sk

The compressed output file is named by adding ".fp8sk" extension to the first named file (file1.fp8sk). Each file that exists will be added to the archive and its name will be stored without a path. The option -N specifies a compression level ranging from -0 (fastest) to -8 (smallest). The default is -5. If there is no option and only one file, then the program will pause when finished until you press the ENTER key (to support drag and drop). If file1.fp8sk exists then it is overwritten.

If the first named file ends in ".fp8sk" then it is assumed to be an archive and the files within are extracted to the same directory as the archive unless a different directory (dir2) is specified. The -d option forces extraction even if there is not a ".fp8sk" extension. If any output file already exists, then it is compared with the archive content and the first byte that differs is reported. No files are overwritten or deleted. If there is only one argument (no -d or dir2) then the program will pause when finished until you press ENTER.

For compression, if any named file is actually a directory, then all files and subdirectories are compressed, preserving the directory structure, except that empty directories are not stored, and file attributes (timestamps, permissions, etc.) are not preserved. During extraction, directories are created as needed. For example:

fp8sk -4 c:\tmp\foo bar

compresses foo and bar (if they exist) to c:\tmp\foo.fp8sk at level 4.

fp8sk -d c:\tmp\foo.fp8sk .

extracts foo and compares bar in the current directory. If foo and bar are directories then their contents are extracted/compared.

There are no commands to update an existing archive or to extract part of an archive. Files and archives larger than 2GB are not supported (but might work on 64-bit machines, not tested). File names with nonprintable characters are not supported (spaces are OK).

TO COMPILE

There are 2 files: fp8sk.cpp (C++) and paq7asm.asm (NASM/YASM). paq7asm.asm is the same as in paq7 and paq8x. fp8sk.cpp recognizes the following compiler options:

-DWINDOWS (to compile in Windows) -DUNIX (to compile in Unix, Linux, Solairs, MacOS/Darwin, etc) -DNOASM (to replace paq7asm.asm with equivalent C++) -DDEFAULT_OPTION=N (to change the default compression level from 5 to N).

If you compile without -DWINDOWS or -DUNIX, you can still compress files, but you cannot compress directories or create them during extraction. You can extract directories if you manually create the empty directories first.

Use -DEFAULT_OPTION=N to change the default compression level to support drag and drop on machines with less than 256 MB of memory. Use -DDEFAULT_OPTION=4 for 128 MB, 3 for 64 MB, 2 for 32 MB, etc.

Use -DNOASM for non x86-32 machines, or older than a Pentium-MMX (about 1997), or if you don't have NASM or YASM to assemble paq7asm.asm. The program will still work but it will be slower. For NASM in Windows, use the options "--prefix _" and either "-f win32" or "-f obj" depending on your C++ compiler. In Linux, use "-f elf".

Recommended compiler commands and optimizations:

MINGW g++: g++ fp8sk.cpp -DWINDOWS -lz -Wall -Wextra -O3 -static -static-libgcc -ofp8sk.exe

MinGW produces faster executables than Borland or Mars, but Intel 9 is about 4% faster than MinGW).

ARCHIVE FILE FORMAT

An archive has the following format. It is intended to be both human and machine readable. The header ends with CTRL-Z (Windows EOF) so that the binary compressed data is not displayed on the screen.

fp8sk -N CR LF size TAB filename CR LF size TAB filename CR LF ... CTRL-Z compressed binary data

-N is the option (-0 to -9), even if a default was used. Plain file names are stored without a path. Files in compressed directories are stored with path relative to the compressed directory (using UNIX style forward slashes "/"). For example, given these files:

123 C:\dir1\file1.txt 456 C:\dir2\file2.txt

Then

fp8sk archive \dir1\file1.txt \dir2

will create archive.fp8sk with the header:

fp8sk -5 123 file1.txt 456 dir2/file2.txt

The command:

fp8sk archive.fp8sk C:\dir3

will create the files:

C:\dir3\file1.txt C:\dir3\dir2\file2.txt

Decompression will fail if the first 7 bytes are not "fp8sk -". Sizes are stored as decimal numbers. CR, LF, TAB, CTRL-Z are ASCII codes 13, 10, 9, 26 respectively.

ARITHMETIC CODING

The binary data is arithmetic coded as the shortest base 256 fixed point number x = SUM_i x_i 256^-1-i such that p(<y) <= x < p(<=y), where y is the input string, x_i is the i'th coded byte, p(<y) (and p(<=y)) means the probability that a string is lexicographcally less than (less than or equal to) y according to the model, _ denotes subscript, and ^ denotes exponentiation.

The model p(y) for y is a conditional bit stream, p(y) = PROD_j p(y_j | y_0..j-1) where y_0..j-1 denotes the first j bits of y, and y_j is the next bit. Compression depends almost entirely on the ability to predict the next bit accurately.

MODEL MIXING

fp8sk uses a neural network to combine a large number of models. The i'th model independently predicts p1_i = p(y_j = 1 | y_0..j-1), p0_i = 1 - p1_i. The network computes the next bit probabilty

p1 = squash(SUM_i w_i t_i), p0 = 1 - p1 (1)

where t_i = stretch(p1_i) is the i'th input, p1_i is the prediction of the i'th model, p1 is the output prediction, stretch(p) = ln(p/(1-p)), and squash(s) = 1/(1+exp(-s)). Note that squash() and stretch() are inverses of each other.

After bit y_j (0 or 1) is received, the network is trained:

w_i := w_i + eta t_i (y_j - p1) (2)

where eta is an ad-hoc learning rate, t_i is the i'th input, (y_j - p1) is the prediction error for the j'th input but, and w_i is the i'th weight. Note that this differs from back propagation:

w_i := w_i + eta t_i (y_j - p1) p0 p1 (3)

which is a gradient descent in weight space to minimize root mean square error. Rather, the goal in compression is to minimize coding cost, which is -log(p0) if y = 1 or -log(p1) if y = 0. Taking the partial derivative of cost with respect to w_i yields (2).

MODELS

Most models are context models. A function of the context (last few bytes) is mapped by a lookup table or hash table to a state which depends on the bit history (prior sequence of 0 and 1 bits seen in this context). The bit history is then mapped to p1_i by a fixed or adaptive function. There are several types of bit history states:

  • Run Map. The state is (b,n) where b is the last bit seen (0 or 1) and n is the number of consecutive times this value was seen. The initial state is (0,0). The output is computed directly:

    t_i = (2b - 1)K log(n + 1).

    where K is ad-hoc, around 4 to 10. When bit y_j is seen, the state is updated:

    (b,n) := (b,n+1) if y_j = b, else (y_j,1).

  • Stationary Map. The state is p, initially 1/2. The output is t_i = stretch(p). The state is updated at ad-hoc rate K (around 0.01):

    p := p + K(y_j - p)

  • Nonstationary Map. This is a compromise between a stationary map, which assumes uniform statistics, and a run map, which adapts quickly by discarding old statistics. An 8 bit state represents (n0,n1,h), initially (0,0,0) where:

    n0 is the number of 0 bits seen "recently". n1 is the number of 1 bits seen "recently". n = n0 + n1. h is the full bit history for 0 <= n <= 4, the last bit seen (0 or 1) if 5 <= n <= 15, 0 for n >= 16.

    The primaty output is t_i := stretch(sm(n0,n1,h)), where sm(.) is a stationary map with K = 1/256, initialized to sm(n0,n1,h) = (n1+(1/64))/(n+2/64). Four additional inputs are also be computed to improve compression slightly:

    p1_i = sm(n0,n1,h) p0_i = 1 - p1_i t_i := stretch(p_1) t_i+1 := K1 (p1_i - p0_i) t_i+2 := K2 stretch(p1) if n0 = 0, -K2 stretch(p1) if n1 = 0, else 0 t_i+3 := K3 (-p0_i if n1 = 0, p1_i if n0 = 0, else 0) t_i+4 := K3 (-p0_i if n0 = 0, p1_i if n1 = 0, else 0)

    where K1..K4 are ad-hoc constants.

    h is updated as follows: If n < 4, append y_j to h. Else if n <= 16, set h := y_j. Else h = 0.

    The update rule is biased toward newer data in a way that allows n0 or n1, but not both, to grow large by discarding counts of the opposite bit. Large counts are incremented probabilistically. Specifically, when y_j = 0 then the update rule is:

    n0 := n0 + 1, n < 29 n0 + 1 with probability 2^(27-n0)/2 else n0, 29 <= n0 < 41 n0, n = 41. n1 := n1, n1 <= 5 round(8/3 lg n1), if n1 > 5

    swapping (n0,n1) when y_j = 1.

    Furthermore, to allow an 8 bit representation for (n0,n1,h), states exceeding the following values of n0 or n1 are replaced with the state with the closest ratio n0:n1 obtained by decrementing the smaller count: (41,0,h), (40,1,h), (12,2,h), (5,3,h), (4,4,h), (3,5,h), (2,12,h), (1,40,h), (0,41,h). For example: (12,2,1) 0-> (7,1,0) because there is no state (13,2,0).

  • Match Model. The state is (c,b), initially (0,0), where c is 1 if the context was previously seen, else 0, and b is the next bit in this context. The prediction is:

    t_i := (2b - 1)Kc log(m + 1)

    where m is the length of the context. The update rule is c := 1, b := y_j. A match model can be implemented efficiently by storing input in a buffer and storing pointers into the buffer into a hash table indexed by context. Then c is indicated by a hash table entry and b can be retrieved from the buffer.

CONTEXTS

High compression is achieved by combining a large number of contexts. Most (not all) contexts start on a byte boundary and end on the bit immediately preceding the predicted bit. The contexts below are modeled with both a run map and a nonstationary map unless indicated.

  • Order n. The last n bytes, up to about 16. For general purpose data. Most of the compression occurs here for orders up to about 6. An order 0 context includes only the 0-7 bits of the partially coded byte and the number of these bits (255 possible values).

  • Sparse. Usually 1 or 2 of the last 8 bytes preceding the byte containing the predicted bit, e.g (2), (3),..., (8), (1,3), (1,4), (1,5), (1,6), (2,3), (2,4), (3,6), (4,8). The ordinary order 1 and 2 context, (1) or (1,2) are included above. Useful for binary data.

  • Text. Contexts consists of whole words (a-z, converted to lower case and skipping other values). Contexts may be sparse, e.g (0,2) meaning the current (partially coded) word and the second word preceding the current one. Useful contexts are (0), (0,1), (0,1,2), (0,2), (0,3), (0,4). The preceding byte may or may not be included as context in the current word.

  • Formatted text. The column number (determined by the position of the last linefeed) is combined with other contexts: the charater to the left and the character above it.

  • Fixed record length. The record length is determined by searching for byte sequences with a uniform stride length. Once this is found, then the record length is combined with the context of the bytes immediately preceding it and the corresponding byte locations in the previous one or two records (as with formatted text).

  • Context gap. The distance to the previous occurrence of the order 1 or order 2 context is combined with other low order (1-2) contexts.

  • FAX. For 2-level bitmapped images. Contexts are the surrounding pixels already seen. Image width is assumed to be 1728 bits (as in calgary/pic).

  • Image. For uncompressed 24-bit color BMP, TIFF and TGA images. Contexts are the high order bits of the surrounding pixels and linear combinations of those pixels, including other color planes. The image width is detected from the file header. When an image is detected, other models are turned off to improve speed.

  • JPEG. Files are further compressed by partially uncompressing back to the DCT coefficients to provide context for the next Huffman code. Only baseline DCT-Huffman coded files are modeled. (This ia about 90% of images, the others are usually progresssive coded). JPEG images embedded in other files (quite common) are detected by headers. The baseline JPEG coding process is:

    • Convert to grayscale and 2 chroma colorspace.
    • Sometimes downsample the chroma images 2:1 or 4:1 in X and/or Y.
    • Divide each of the 3 images into 8x8 blocks.
    • Convert using 2-D discrete cosine transform (DCT) to 64 12-bit signed coefficients.
    • Quantize the coefficients by integer division (lossy).
    • Split the image into horizontal slices coded independently, separated by restart codes.
    • Scan each block starting with the DC (0,0) coefficient in zigzag order to the (7,7) coefficient, interleaving the 3 color components in order to scan the whole image left to right starting at the top.
    • Subtract the previous DC component from the current in each color.
    • Code the coefficients using RS codes, where R is a run of R zeros (0-15) and S indicates 0-11 bits of a signed value to follow. (There is a special RS code (EOB) to indicate the rest of the 64 coefficients are 0).
    • Huffman code the RS symbol, followed by S literal bits. The most useful contexts are the current partially coded Huffman code (including S following bits) combined with the coefficient position (0-63), color (0-2), and last few RS codes.
  • Match. When a context match of 400 bytes or longer is detected, the next bit of the match is predicted and other models are turned off to improve speed.

  • Exe. When a x86 file (.exe, .obj, .dll) is detected, sparse contexts with gaps of 1-12 selecting only the prefix, opcode, and the bits of the modR/M byte that are relevant to parsing are selected. This model is turned off otherwise.

  • Indirect. The history of the last 1-3 bytes in the context of the last 1-2 bytes is combined with this 1-2 byte context.

  • DMC. A bitwise n-th order context is built from a state machine using DMC, described in https://plg.uwaterloo.ca/~ftp/dmc/dmc.c The effect is to extend a single context, one bit at a time and predict the next bit based on the history in this context. The model here differs in that two predictors are used. One is a pair of counts as in the original DMC. The second predictor is a bit history state mapped adaptively to a probability as as in a Nonstationary Map.

ARCHITECTURE

The context models are mixed by several of several hundred neural networks selected by a low-order context. The outputs of these networks are combined using a second neural network, then fed through several stages of adaptive probability maps (APM) before arithmetic coding.

For images, only one neural network is used and its context is fixed.

An APM is a stationary map combining a context and an input probability. The input probability is stretched and divided into 32 segments to combine with other contexts. The output is interpolated between two adjacent quantized values of stretch(p1). There are 2 APM stages in series:

p1 := (p1 + 3 APM(order 0, p1)) / 4. p1 := (APM(order 1, p1) + 2 APM(order 2, p1) + APM(order 3, p1)) / 4.

PREPROCESSING

fp8sk uses preprocessing transforms on certain data types to improve compression. To improve reliability, the decoding transform is tested during compression to ensure that the input file can be restored. If the decoder output is not identical to the input file due to a bug, then the transform is abandoned and the data is compressed without a transform so that it will still decompress correctly.

The input is split into blocks with the format where is 1 byte (0 = no transform), is the size of the data after decoding, which may be different than the size of . Blocks do not span file boundaries, and have a maximum size of 4MB to 2GB depending on compression level. Large files are split into blocks of this size. The preprocessor has 3 parts:

  • Detector. Splits the input into smaller blocks depending on data type.

  • Coder. Input is a block to be compressed. Output is a temporary file. The coder determines whether a transform is to be applied based on file type, and if so, which one. A coder may use lots of resources (memory, time) and make multiple passes through the input file. The file type is stored (as one byte) during compression.

  • Decoder. Performs the inverse transform of the coder. It uses few resorces (fast, low memory) and runs in a single pass (stream oriented). It takes input either from a file or the arithmetic decoder. Each call to the decoder returns a single decoded byte.

The following transforms are used:

  • EXE: CALL (0xE8) and JMP (0xE9) address operands are converted from relative to absolute address. The transform is to replace the sequence E8/E9 xx xx xx 00/FF by adding file offset modulo 2^25 (signed range, little-endian format). Data to transform is identified by trying the transform and applying a crude compression test: testing whether the byte following the E8/E8 (LSB of the address) occurred more recently in the transformed data than the original and within 4KB 4 times in a row. The block ends when this does not happen for 4KB.

  • JPEG: detected by SOI and SOF and ending with EOI or any nondecodable data. No transform is applied. The purpose is to separate images embedded in execuables to block the EXE transform, and for a future place to insert a transform.

IMPLEMENTATION

Hash tables are designed to minimize cache misses, which consume most of the CPU time.

Most of the memory is used by the nonstationary context models. Contexts are represented by 32 bits, possibly a hash. These are mapped to a bit history, represented by 1 byte. The hash table is organized into 64-byte buckets on cache line boundaries. Each bucket contains 7 x 7 bit histories, 7 16-bit checksums, and a 2 element LRU queue packed into one byte. Each 7 byte element represents 7 histories for a context ending on a 3-bit boundary plus 0-2 more bits. One element (for bits 0-1, which have 4 unused bytes) also contains a run model consisting of the last byte seen and a count (as 1 byte each).

Run models use 4 byte hash elements consisting of a 2 byte checksum, a repeat count (0-255) and the byte value. The count also serves as a priority.

Stationary models are most appropriate for small contexts, so the context is used as a direct table lookup without hashing.

The match model maintains a pointer to the last match until a mismatching bit is found. At the start of the next byte, the hash table is referenced to find another match. The hash table of pointers is updated after each whole byte. There is no checksum. Collisions are detected by comparing the current and matched context in a rotating buffer.

The inner loops of the neural network prediction (1) and training (2) algorithms are implemented in MMX assembler, which computes 4 elements at a time. Using assembler is 8 times faster than C++ for this code and 1/3 faster overall. (However I found that SSE2 code on an AMD-64, which computes 8 elements at a time, is not any faster).

DIFFERENCES FROM PAQ7

An .exe model and filter are added. Context maps are improved using 16-bit checksums to reduce collisions. The state table uses probabilistic updates for large counts, more states that remember the last bit, and decreased discounting of the opposite count. It is implemented as a fixed table. There are also many minor changes.

DIFFERENCES FROM PAQ8A

The user interface supports directory compression and drag and drop. The preprocessor segments the input into blocks and uses more robust EXE detection. An indirect context model was added. There is no dictionary preprocesor like PAQ8B/C/D/E.

DIFFERENCES FROM PAQ8F

Different models, usually from paq8hp*. Also changed rate from 8 to 7. A bug in Array was fixed that caused the program to silently crash upon exit.

DIFFERENCES FROM PAQ8J

  1. Slightly improved sparse model.
  2. Added new family of sparse contexts. Each byte mapped to 3-bit value, where different values corresponds to different byte classes. For example, input byte 0x00 transformed into 0, all bytes that less then 16 -- into 5, all punctuation marks (ispunct(c)!=0) -- into 2 etc. Then this flags from 11 previous bytes combined into 32-bit pseudo-context.

All this improvements gives only 62 byte on BOOK1, but on binaries archive size reduced on 1-2%.

DIFFERENCES FROM PAQ8JA

Introduced distance model. Distance model uses distance to last occurence of some anchor char (0x00, space, newline, 0xff), combined with previous charactes as context. This slightly improves compression of files with variable-width record data.

DIFFERENCES FROM PAQ8JB

Restored recordModel(), broken in paq8hp*. Slightly tuned indirectModel().

DIFFERENCES FROM PAQ8JC

Changed the APMs in the Predictor. Up to a 0.2% improvement for some files.

DIFFERENCES FROM PAQ8JD

Added DMCModel. Removed some redundant models from SparseModel and other minor tuneups. Changes introduced in PAQ8K were not carried over.

PAQ8L v.2

Changed Mixer::p() to p() to fix a compiler error in Linux (patched by Indrek Kruusa, Apr. 15, 2007).

DIFFERENCES FROM PAQ8L, PAQ8M

Modified JPEG model by Jan Ondrus (paq8fthis2). The new model improves compression by using decoded pixel values of current and adjacent blocks as context. PAQ8M was an earlier version of the new JPEG model (from paq8fthis).

DIFFERENCES FROM PAQ8N

Improved bmp model. Slightly faster.

DIFFERENCES FROM PAQ8O

Modified JPEG model by Jan Ondrus (paq8fthis4). Added PGM (grayscale image) model form PAQ8I. Added grayscale BMP model to PGM model. Ver. 2 can be compiled using either old or new "for" loop scoping rules. Added APM and StateMap from LPAQ1 Code optimizations from Enrico Zeidler Detection of BMP 4,8,24 bit and PGM 8 bit images before compress On non BMP,PGM,JPEG data mem is lower Fixed bug in BMP 8-bit detection in other files like .exe 15. oct 2007 Updates JPEG model by Jan Ondrus PGM detection bug fix 22. oct 2007 improved JPEG model by Jan Ondrus 16. feb 2008 fixed bmp detection bug added .rgb file support (uncompressed grayscale)

DIFFERENCES FROM PAQ8O9

Added wav Model. Slightly improved bmp model.

DIFFERENCES FROM PAQ8P

Added nestModel from paq8p3 Modified wordModel from paq8p3 Modified .pbm, .pgm, .ppm, .bmp, .rgb detection (from paq8p3) Modified WAV model (from paq8p_) Modified JPEG model (from paq8p2) Added im1bitModel (1-bit) (from paq8p3) Added compression of PPM, PBM images (from paq8p3) Removed pic model Modified EXE transformation (e8/e9) Changed image and audio data handling (separated in blocks) Added compression of (8-bit, 24-bit) TGA image data Improved TIFF image detection Added zlib stream recompression Added base64 transform (from paq8pxd) Added gif recompression

DIFFERENCES FROM PAQ8PX

Reduced number of models Only 2 APM stages Reduced nubmer of prediction added to mixer per context (from 6 to 2)

*/

Releases

No releases published

Packages

No packages published