Tile-based Lightweight Integer Compression in GPU

A key constraint of GPU-based data analytics today is the limited memory capacity of GPU devices. Data compression can mitigate this limitation in two ways:

Fitting more data into GPU memory
Speeding up data transfer between CPU and GPU

This package implements three optimized, bit-packing-based compression formats for GPUs, together with their decompression routines: GPU-FOR, GPU-DFOR, and GPU-RFOR. The work was presented at SIGMOD '22. Please read the paper for more details.

@inproceedings{gpubitpacking,
  author = {Shanbhag, Anil and Yogatama, Bobbi W. and Yu, Xiangyao and Madden, Samuel},
  title = {Tile-Based Lightweight Integer Compression in GPU},
  year = {2022},
  isbn = {9781450392495},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3514221.3526132},
  doi = {10.1145/3514221.3526132},
  booktitle = {Proceedings of the 2022 International Conference on Management of Data},
  pages = {1390–1403},
  numpages = {14},
  keywords = {GPU data analytics, GPU data compression, bit-packing},
  location = {Philadelphia, PA, USA},
  series = {SIGMOD '22}
}
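
To make the idea behind the formats concrete, here is a minimal host-side sketch of frame-of-reference (FOR) bit-packing, the general technique that the GPU-FOR name refers to. It is an illustration only: the `ForBlock` struct and the `ForEncode` / `ForDecode` functions are hypothetical names invented for this example, and the layout (one reference value and one bit width for the whole input) is an assumption that is much simpler than the tiled format described in the paper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for one encoded block (not the repository's layout).
struct ForBlock {
    uint32_t reference = 0;       // minimum value, subtracted from every element
    uint32_t bit_width = 0;       // bits stored per element (0..32)
    uint32_t count = 0;           // number of encoded elements
    std::vector<uint32_t> words;  // packed offsets, least-significant bit first
};

// Number of bits needed to represent v (0 when v == 0).
static uint32_t BitsNeeded(uint32_t v) {
    uint32_t bits = 0;
    while (v != 0) { ++bits; v >>= 1; }
    return bits;
}

// Subtract the minimum from every value and pack the offsets back to back.
ForBlock ForEncode(const std::vector<uint32_t>& values) {
    assert(!values.empty());
    const uint32_t lo = *std::min_element(values.begin(), values.end());
    const uint32_t hi = *std::max_element(values.begin(), values.end());

    ForBlock block;
    block.reference = lo;
    block.bit_width = BitsNeeded(hi - lo);
    block.count = static_cast<uint32_t>(values.size());
    const uint64_t total_bits = static_cast<uint64_t>(block.count) * block.bit_width;
    block.words.assign(static_cast<std::size_t>((total_bits + 31) / 32), 0u);

    uint64_t bit_pos = 0;
    for (uint32_t v : values) {
        if (block.bit_width > 0) {
            const uint64_t offset_value = v - lo;
            const std::size_t word = static_cast<std::size_t>(bit_pos / 32);
            const uint32_t shift = static_cast<uint32_t>(bit_pos % 32);
            block.words[word] |= static_cast<uint32_t>(offset_value << shift);
            // Spill the high bits into the next word when the element straddles two words.
            if (shift + block.bit_width > 32) {
                block.words[word + 1] |= static_cast<uint32_t>(offset_value >> (32 - shift));
            }
        }
        bit_pos += block.bit_width;
    }
    return block;
}

// Extract each packed offset and add the reference value back.
std::vector<uint32_t> ForDecode(const ForBlock& block) {
    std::vector<uint32_t> out(block.count);
    const uint64_t mask = (1ull << block.bit_width) - 1;
    uint64_t bit_pos = 0;
    for (uint32_t i = 0; i < block.count; ++i) {
        uint64_t offset_value = 0;
        if (block.bit_width > 0) {
            const std::size_t word = static_cast<std::size_t>(bit_pos / 32);
            const uint32_t shift = static_cast<uint32_t>(bit_pos % 32);
            uint64_t window = block.words[word];
            if (shift + block.bit_width > 32) {
                window |= static_cast<uint64_t>(block.words[word + 1]) << 32;
            }
            offset_value = (window >> shift) & mask;
        }
        out[i] = block.reference + static_cast<uint32_t>(offset_value);
        bit_pos += block.bit_width;
    }
    return out;
}
```

For example, encoding `{100, 103, 101, 107, 100}` with this sketch uses a reference of 100 and 3 bits per element, so the five values fit in a single 32-bit word instead of five. The package's actual routines decode on the GPU and use their own data layout; see the paper for the real formats.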

Usage

The decompression routines are implemented as device functions. Use LoadBinPack, LoadDBinPack, or LoadRBinPack in place of a BlockLoad routine and point it at the appropriate compressed column. Because they are device functions, you can also use them directly in your own programs.

To generate the test distributions:

For the uniform distribution and distributions d1 and d2:

mkdir -p bin/bench
mkdir -p obj/bench
make bench/gen bench/gen_d1 bench/gen_d2
./bin/bench/gen <num_bits>
./bin/bench/gen_d1 <num_bits>
./bin/bench/gen_d2 <num_bits>

For d3, run the bench/gen_d3.py script.

Note that these are written out as flat files to the DATA_DIR defined in ssb/ssb_utils.h.

To generate Star Schema Benchmark data:

Follow the instructions here

Before Starting the Experiment

mkdir -p bin/bench
mkdir -p obj/bench
mkdir -p bin/ssb
mkdir -p obj/ssb

To encode the data to GPU-* format

The two data generation steps above produce flat files containing arrays of 4-byte integers. To generate the encoded columns:

# For test distributions
make bench/binpack
make bench/deltabinpack
make bench/rlebinpack

./bin/bench/binpack <num_bits>
./bin/bench/deltabinpack <num_bits>
./bin/bench/rlebinpack <num_bits>

# For SSB columns
make ssb/binpack
make ssb/deltabinpack
make ssb/rlebinpack

./bin/ssb/binpack <col_name>
./bin/ssb/deltabinpack <col_name>
./bin/ssb/rlebinpack <col_name>
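
To inspect a flat input file before or after encoding it, a small host-side reader can help. The sketch below assumes the flat files are raw arrays of 4-byte integers in the machine's native byte order with no header, which is an interpretation of the description above rather than something taken from the repository's code. `ReadIntColumn` is a hypothetical helper name, not a function provided by this package.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: load a headerless file of native-endian 4-byte integers.
std::vector<int32_t> ReadIntColumn(const std::string& path) {
    // Open at the end so that tellg() reports the file size in bytes.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    const std::streamsize bytes = in.tellg();
    assert(bytes >= 0);
    if (static_cast<std::size_t>(bytes) % sizeof(int32_t) != 0) {
        throw std::runtime_error("file size is not a multiple of 4 bytes: " + path);
    }

    std::vector<int32_t> values(static_cast<std::size_t>(bytes) / sizeof(int32_t));
    in.seekg(0, std::ios::beg);
    in.read(reinterpret_cast<char*>(values.data()), bytes);
    if (!in) {
        throw std::runtime_error("short read on " + path);
    }
    return values;
}
```

Calling `ReadIntColumn` on a generated file and printing the first few values, or the minimum and maximum, gives a quick sanity check that a column has the value range expected for the chosen `<num_bits>`.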

To compile and run test_perf_rle and test_match_rle

make bin/ssb/test_perf_rle
make bin/ssb/test_match_rle

./bin/ssb/test_perf_rle <col_name>
./bin/ssb/test_match_rle <col_name>

To compile and run SSB queries

make bin/ssb/q11r
make bin/ssb/q21r
make bin/ssb/q31r
make bin/ssb/q41r

./bin/ssb/q11r
./bin/ssb/q21r
./bin/ssb/q31r
./bin/ssb/q41r
