Skip to content

nathanielltaylor/data-compression

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 

Repository files navigation

Data Compression

Huffman Compression

  • Compresses text by building a binary tree of all of the characters used in the input text based on their frequency, which it then uses to assign each of these characters a prefix-free bit code
  • The compressed text is generated by encoding each character as a string sequence of 1s and 0s unique to it; the full sequence is then unpacked into binary data
  • Since none of these codes are prefixes of each other, decompression can be achieved by packing the binary data back into a string, parsing it, and translating each recognizable code back to its character using the character-code pairs from the binary tree

LZW Compression

  • Compresses text by assigning each character subsequence in it to integers, starting with the 266 single ASCII characters. Each subsequent subsequence is then assigned incrementing integers, beginning with 266 (ASCII numbering starts at 0)
  • The process saves this dictionary so subsequences are assigned the same code each time they reappear
  • Fortunately, and in contrast to Huffman compression, the dictionary does not need to be stored in order to later decompress the text. The decompression process rebuilds it from scratch, again using the ASCII characters as basic building blocks, to invert the process

About

Text compression algorithms

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages