Skip to content

juliuskunze/shuffle-coding

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 

Repository files navigation

This is accompanying code for our workshop paper "Entropy Coding of Unordered Data Structures", which was accepted for oral presentation at the ICML 2023 Neural Compression Workshop.

Shuffle Coding

Shuffle coding is a general method for optimal compression of sequences of unordered objects using bits-back coding. Data structures that can be compressed using shuffle coding include multisets, graphs, hypergraphs, and others. This implementation can easily be adapted to different data types and statistical models, and achieves state-of-the-art compression rates on a range of graph datasets including molecular data.

Run

Apart from Rust, you need a C compiler to build this project due to dependency on nauty and Traces.

Use the following commands to replicate all experiments from the paper:

cargo run --release -- --symstats --stats --er --shuffle AIDS..mit_ct2 reddit_threads..TRIANGLES
cargo run --release -- --symstats --stats --er --eru --pu --pur --shuffle MUTAG PTC_MR
cargo run --release -- --symstats --stats --er --eru --pu --pur --shuffle --plain MUTAG PTC_MR ZINC_full..ZINC_val PROTEINS IMDB-BINARY IMDB-MULTI
cargo run --release -- --symstats --stats --er --pu --shuffle SZIP

Graph datasets are downloaded automatically as needed (from TU and SZIP). Full results can be found here.

To retrieve standard deviations for the stochastic Polya urn models, you can set a random seed (0, 1 and 2 were used for the paper):

cargo run --release -- MUTAG PTC_MR ZINC_full..ZINC_val PROTEINS IMDB-BINARY IMDB-MULTI --plain --pu --shuffle --seed=0
cargo run --release -- MUTAG PTC_MR --pu --shuffle --seed=0 
cargo run --release -- SZIP --pu --shuffle --seed=0

Use cargo run --help to see all available options.

About

Entropy coding of unordered data structures

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages