Skip to content

SteamedGit/huffman

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 

Repository files navigation

Huffman Encoding in Python

This is an implementation of Huffman Encoding written in Python 3 from scratch. It makes use of techniques like Canonical Encoding and Lookup Tables in order to make file headers smaller and decoding faster.

Note that the code also explains each step of the process.

Usage

  • Compress: -c input file output binary file
  • Decompress: -d input file output text file
  • Validate: -v original decompressed

Performance

The code was mainly tested on files from the Cantebury Corpus, specifically the original 1997 version and the Large Corpus.

The bible provided in the Large Corpus is compressed from 3 953KB to 2 167KB. Alice in Wonderland is compressed from from 149KB to 83KB.

Other considerations

Due to different newline endings on Linux(\n) and Windows(\r\n) decompressed files may be a different size to the original despite their content being identical.

Example Usage

python huffman.py -c bible.txt compressed
python huffman.py -d compressed decompressed.txt
python huffman.py -v bible.txt decompressed.txt

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Languages