Fast Exact Leverage Score Sampling from Khatri-Rao Products with Applications to Tensor Decomposition
This repository contains the code for the paper *Fast Exact Leverage Score Sampling from Khatri-Rao Products*, which appeared at NeurIPS 2023 (main conference, poster).
You can draw samples from the Khatri-Rao product of matrices with tens of millions of rows each, provided the rank is at most 100-200 (and you have enough memory to store the inputs). These samples are drawn according to the exact distribution of leverage scores on the rows of the Khatri-Rao product, which have attractive properties for randomized linear least-squares.
On LBNL Perlmutter CPU nodes, our code can draw 50,000 samples from a Khatri-Rao product with
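For intuition, the leverage score of row $i$ of a matrix $A$ is $a_i^\top (A^\top A)^+ a_i$, and for a Khatri-Rao product the Gram matrix $A^\top A$ equals the elementwise (Hadamard) product of the factor Gram matrices. The following brute-force helper (our illustration, not part of this repository) computes the exact leverage-score distribution on tiny inputs, which is useful for sanity-checking any sampler:

```python
import numpy as np
from functools import reduce

def krp_leverage_scores(matrices):
    # Gram matrix of the KRP = Hadamard product of factor Gram matrices.
    gram = reduce(lambda X, Y: X * Y, [U.T @ U for U in matrices])
    ginv = np.linalg.pinv(gram)
    # Build the full Khatri-Rao product explicitly. This is exponential
    # in the number of factors, so it is only feasible for tiny tests.
    krp = matrices[0]
    for U in matrices[1:]:
        krp = np.einsum('ir,jr->ijr', krp, U).reshape(-1, U.shape[1])
    # Row-wise quadratic forms a_i^T (A^T A)^+ a_i.
    return np.einsum('ij,jk,ik->i', krp, ginv, krp)
```

For a full-column-rank Khatri-Rao product, these scores sum to the rank $R$, and normalizing by $R$ gives the sampling distribution described above.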
Here's a smaller test case that can run on a laptop:

```python
import numpy as np
import cppimport.import_hook
from cpp_ext.als_module import Sampler

if __name__ == '__main__':
    # Take J = 10,000 samples from the KRP of N = 4 matrices,
    # each with dimensions I x R, where I = 1000 and R = 8.
    I, N, J, R = 1000, 4, 10000, 8
    matrices = [np.random.normal(size=(I, R)) for _ in range(N)]
    sampler = Sampler(matrices, J, R, "efficient")
    samples = np.zeros((N, J), dtype=np.uint64)
    sampler.KRPDrawSamples(N + 1, samples)
```
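Each column of the index matrix `samples` identifies one sampled row of the Khatri-Rao product, and that row is the elementwise (Hadamard) product of the corresponding rows of the input matrices. A small helper (our sketch, not part of the repository API) can materialize the sampled rows for downstream least-squares solves:

```python
import numpy as np

def materialize_sampled_rows(matrices, samples):
    # `samples` has shape (N, J); column j holds the row index selected
    # from each of the N factor matrices for the j-th sample.
    N, J = samples.shape
    R = matrices[0].shape[1]
    rows = np.ones((J, R))
    for n in range(N):
        # Multiply in the selected rows of factor n, elementwise.
        rows *= matrices[n][samples[n], :]
    return rows
```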
The raw data is stored in JSON format in the folder `submission_data`. These files were output directly by the scripts in the `benchmarks` folder. You can reproduce the figures using the material in the `plotting` folder. From within that folder, run

[zsh]> python generate_images.py

which will output all figures in PDF format to `plotting/paper_images`. This script also prints the tables used in the work to STDOUT. Alternatively, use the Jupyter notebook `generate_images.ipynb` to generate the plots inside a notebook.
If you would like to generate the data yourself, use the scripts in the `benchmarks` folder. The accuracy and runtime benchmarks are easy to reproduce, but you may need a machine with several hundred gigabytes of RAM to reproduce the experiments involving the largest sparse tensors (on our benchmark platform, each LBNL Perlmutter node has 512GB of RAM).
This repository contains two implementations of the data structure in the paper. The first, in the folder `reference_implementation`, is written entirely in Python with Numpy. It is slow, but the structure of the code closely matches the pseudocode in our paper. You can use it to verify the correctness of our algorithm or pick apart how the data structure works. You only need Python, Numpy, and Matplotlib to test the reference implementation; see the README inside the folder for more instructions.
The second implementation is a fast version written in C++, with Pybind11 Python bindings and compatibility with Numpy for easy testing. All benchmarks were conducted with the fast version, and all instructions detail how to build the fast version.
This repository contains a copy of `json.hpp` from Niels Lohmann's repository https://github.com/nlohmann/json, published under an MIT license.
You need GCC 11.2+, OpenMP, and an installation of BLAS and LAPACK. This code has been tested with OpenBLAS, and we make no guarantees for Intel MKL or other BLAS implementations. If you decide to use MKL, either a) try using the Intel C++ compiler instead of GCC, or b) if you use GCC, link MKL in sequential mode. We have observed segfaults when OpenMP threading is enabled in conjunction with MKL (when compiled with GCC).
We strongly recommend that you install Intel Threading Building Blocks (TBB), although it is not required. We rely on TBB for fast parallel sorting during sparse tensor decomposition.
Our C++ code could stand alone, but right now, you need Python as well to run it.
Clone the repository and `cd` into it:

[zsh]> git clone https://github.com/vbharadwaj-bk/fast_tensor_leverage.git
[zsh]> cd fast_tensor_leverage
Install Python dependencies with the following command:
[zsh]> pip install -r requirements.txt
We rely on the Pybind11 and cppimport packages. We use the HDF5 format to store sparse tensors, so you need the h5py package if you want to perform sparse tensor decomposition.
Within the repository, run the following command:
[zsh]> python configure.py
This will create two files in the repository root: `config.json` and `env.sh`. Edit the configuration JSON file with include / link flags for your LAPACK install. If you have TBB installed (strongly recommended for good performance), fill in the appropriate entries. If you do not have TBB installed (or if you are using a compiler that automatically links the BLAS), set those JSON entries to empty lists.
The file `env.sh` sets up the runtime environment and must be sourced every time you start a new shell to run our code. First, set the variables `CC` and `CXX` to your C and C++ compilers; the C++ extension module is compiled with these when it is imported by Python at runtime.
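A typical new-shell session might look like the following (the compiler names here are placeholders; substitute whatever is installed on your system):

```shell
# Placeholder compilers; substitute your own.
export CC=gcc
export CXX=g++
# Then set up the runtime environment (repeat in every new shell):
# source ./env.sh
```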
You're ready to test! The C++ extension compiles automatically the first time you run the code and is not recompiled on subsequent runs. Run the following:
[zsh]> python compare_distributions.py
If all goes well, you should see a new graph in the `plotting` folder called `distribution_comparison_test_generated.png` (ignore the file `distribution_comparison.png`, which is the control image for the README). The graph should look like this:
This image compares the true probability distribution of leverage scores of the Khatri-Rao product
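As a rough sketch of the kind of comparison such a test performs (our illustrative code, not the repository's script), agreement between an exact probability vector and the empirical distribution of drawn sample indices can be quantified with the total variation distance:

```python
import numpy as np

def tv_distance(p_exact, sampled_indices):
    # Empirical histogram of the sampled row indices, normalized.
    counts = np.bincount(sampled_indices, minlength=len(p_exact))
    p_emp = counts / counts.sum()
    # Total variation distance between the two distributions.
    return 0.5 * np.abs(p_exact - p_emp).sum()
```

A correct sampler's empirical distribution should drive this quantity toward zero as the number of samples grows.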