DSNALS: Distributed Sketched Non-Negative Matrix Factorization

This repo is an C implementation of the following paper:

Yuqiu Qian, Conghui Tan, Nikos Mamoulis, David Cheung. DSANLS: Accelerating Distributed Non-Negative Matrix Factorization via Sketching. WSDM 2018.

Please cite this paper if you use this code in your published research project.

Prerequisite

This implementation requires:

Message Passing Interface (MPI)
Intel Math Kernel Library (MKL)

Before compilation, make sure the environment variable "MKLROOT" is properly set, and MPI can be found by your compiler.

Usage

Basic usage:

./distNMF <input file name> k [optional arguments]

Optional arguments may be:

-i <number of iterations>
-s <row sketch ratio> <column sketch ratio>, sketch ratio is a float between 0 - 1
-o <verbose frequency>
-m <sketch method>, choices include
- 0 - Subsample
- 1 - Gaussian
- 2 - Uniform
-g, use projected gradient descent to solve subproblem (default solver is regularized coordinate descent)
-t <alpha> <beta>, where alpha and beta are parameters to decide the change rate of step sizes
- For coordinate descent, mu = beta * iter ^ alpha
- For gradient descent, eta = alpha / (1.0 + iter * beta)
-u <bound>, upper bound of the random numbers in the initalization of U and V

Format of the input matrix:

The input matrix is a binary file and stored on the root node (the program will automatically distribute the data to other nodes).

Desne matrix:

Dense matrix is grouped as follows:

a byte of integer 0
two 32-bits integers: m, n
m*n of float numbers, indicating the entries of the input matrix, stored in row-majored order

Sparse matrix:

The sparse matrix is stored in zero-based CSR format:

a byte of integer 1
three 32-bits integers: m, n, nnz, where nnz is the number of non-zero elements
(m+1) of 32-bits integers, indicating the row index
nnz of 32-bits integers: the column numbers of each non-zero entries
nnz of 32-bits integers: the values of the non-zero entries

Precision of float number

Precision of the floats (single-precision or double-precision) can be set by changing the macro "DOUBLE_PRECISION" in "common.h". Note that the precision of the float numbers in the input file must be consistent with this.

Updated by Yuqiu Qian

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

DSNALS: Distributed Sketched Non-Negative Matrix Factorization

Prerequisite

Usage

Format of the input matrix:

Desne matrix:

Sparse matrix:

Precision of float number

Files

README.md

Latest commit

History

README.md

File metadata and controls

DSNALS: Distributed Sketched Non-Negative Matrix Factorization

Prerequisite

Usage

Format of the input matrix:

Desne matrix:

Sparse matrix:

Precision of float number