
Data Compression

A project exploring different compression algorithms and approaches, such as Huffman coding and entropy modification. It also analyzes the distribution of symbols in the compressed files (images, e-books, etc.).

Examples

1. Symbols Distribution
  • Bytes Distribution
  • Entropy
2. Text Compression
  • Semantic Compression
  • NLP Preprocessing Approach
  • Decompression and Validation
3. File Compression
  • Huffman Code from Scratch
  • Compress Image with Huffman Code
  • Compress Text File with Huffman Code
  • Decompress File with Huffman Code
    • Simple approach
    • Probabilistic approach
    • Pseudo-random approach
4. Compression & Entropy
  • Compression with current Entropy
  • Changing Entropy for higher Compression
  • Restoring Entropy for Decompression
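Sections 1 and 4 rely on measuring how bytes are distributed in a file and how much information they carry. The following is a minimal sketch of that measurement, assuming the file is read as raw bytes. It uses Shannon entropy in bits per byte, H = -Σ p·log2(p). The helper names `byte_distribution` and `shannon_entropy` are hypothetical and are not taken from the project's notebooks. The project lists `scipy.stats.entropy` as a dependency. Calling `scipy.stats.entropy(list(probs.values()), base=2)` on the same probabilities should give the same value. This sketch uses only the standard library so it runs on its own.

```python
import math
from collections import Counter


def byte_distribution(data: bytes) -> dict[int, float]:
    """Relative frequency of each byte value (0-255) that appears in data."""
    counts = Counter(data)  # iterating over bytes yields ints
    total = len(data)
    return {byte: n / total for byte, n in counts.items()}


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per symbol: H = -sum(p * log2(p))."""
    if not data:
        return 0.0
    return -sum(p * math.log2(p) for p in byte_distribution(data).values())


if __name__ == "__main__":
    # In practice data would come from open(path, "rb").read(), e.g. a PNG or a book.
    sample = "abracadabra".encode("utf-8")
    print(byte_distribution(sample))
    print(f"{shannon_entropy(sample):.4f} bits/symbol")
```

A value close to 8 bits/byte means the bytes are nearly uniform and leave little room for symbol-level compression. Lower values indicate more redundancy that a code such as Huffman can exploit.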
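The "Huffman Code from Scratch" step can be illustrated with the textbook heap-based construction: repeatedly merge the two least frequent subtrees until one tree remains. The following is a minimal sketch and not necessarily how the project's notebook implements it. The function names (`build_huffman_code`, `encode`, `decode`) are hypothetical. Bits are kept as a `str` for readability. A real compressor would pack them into bytes and also store the code table (or the frequencies) so the decompressor can rebuild the code.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_code(data: bytes) -> dict[int, str]:
    """Build a prefix-free Huffman code {byte: bitstring} from symbol frequencies."""
    freqs = Counter(data)
    if not freqs:
        return {}
    if len(freqs) == 1:
        # Degenerate case: a single distinct symbol still needs a 1-bit code.
        return {next(iter(freqs)): "0"}
    tiebreak = count()  # unique counter so the heap never compares the dicts
    heap = [(n, next(tiebreak), {sym: ""}) for sym, n in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0/1 to their codes.
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


def encode(data: bytes, code: dict[int, str]) -> str:
    """Concatenate the codeword of every byte."""
    return "".join(code[b] for b in data)


def decode(bits: str, code: dict[int, str]) -> bytes:
    """Walk the bit string; since the code is prefix-free, the first match is a symbol."""
    reverse = {v: k for k, v in code.items()}
    out, buffer = bytearray(), ""
    for bit in bits:
        buffer += bit
        if buffer in reverse:
            out.append(reverse[buffer])
            buffer = ""
    return bytes(out)


if __name__ == "__main__":
    text = "abracadabra".encode("utf-8")
    code = build_huffman_code(text)
    bits = encode(text, code)
    assert decode(bits, code) == text  # lossless round trip
    print(code)
    print(f"{len(bits)} bits vs {8 * len(text)} bits uncompressed")
```

For "abracadabra" the encoded message takes 23 bits instead of 88. That is about 2.09 bits per symbol, just above the entropy of roughly 2.04 bits per symbol. This matches the known bound H ≤ L < H + 1 for the average Huffman code length L.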

Data

PNG images and plain-text books of different sizes and in different languages (English and Spanish).

Python Dependencies

    import io
    import math
    import timeit
    import numpy as np
    import pandas as pd
    from collections import Counter
    from PIL import Image
    from scipy.stats import entropy
    import seaborn as sns
    import matplotlib.pyplot as plt

Acknowledgment

To Ramses Coraspe, for his good ideas and for validating the compression/decompression processes used in this project.

Contributing and Feedback

Any kind of feedback or criticism would be greatly appreciated (algorithm design, documentation, improvement ideas, spelling mistakes, etc.).

Authors

  • Created by Andrés Segura Tinoco
  • Created on June 17, 2019

License

This project is licensed under the terms of the MIT license.
