⚡ sparsely ⚡

sparsely is a sklearn-compatible Python module for sparse linear regression and classification. It uses an efficient cutting-plane algorithm to optimize feature selection, which scales to thousands of samples and features. This implementation follows Bertsimas & Van Parys (2017) for regression, and Bertsimas, Pauphilet & Van Parys (2021) for classification.
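To make the underlying problem concrete, here is a minimal sketch of best-subset selection by exhaustive search, using only NumPy. It is not the cutting-plane algorithm that sparsely implements, and it uses plain least squares with no regularization. The function name `best_subset_brute_force` and the toy data are hypothetical, chosen for illustration. The search cost grows combinatorially with the number of features, which is why a more efficient method is needed at scale.

```python
from itertools import combinations

import numpy as np


def best_subset_brute_force(X, y, k):
    """Try every support of size k and keep the one with the lowest residual error.

    Illustration only: the number of candidate supports is C(n_features, k).
    """
    best_support, best_rss, best_coef = None, np.inf, None
    for support in combinations(range(X.shape[1]), k):
        cols = list(support)
        # Ordinary least squares restricted to the candidate columns
        coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        rss = float(np.sum((y - X[:, cols] @ coef) ** 2))
        if rss < best_rss:
            best_support, best_rss, best_coef = support, rss, coef
    return best_support, best_coef


# Toy data: only features 1 and 4 influence the target
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
true_coef = np.zeros(8)
true_coef[[1, 4]] = [3.0, -2.0]
y = X @ true_coef + 0.01 * rng.normal(size=50)

support, coef = best_subset_brute_force(X, y, k=2)
print(support, coef)
```

With 8 features and k=2 there are only 28 candidate supports, so the search is instant. With 100 features and k=10 there are roughly 1.7e13, which is far too many to enumerate.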

Full API documentation can be found here.

Quick start

You can install sparsely using pip as follows:

pip install sparsely

Here is a simple example of how to use a sparsely estimator:

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sparsely import SparseLinearRegressor

X, y = make_regression(n_samples=1000, n_features=100, n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

estimator = SparseLinearRegressor(k=10)  # k is the max number of non-zero coefficients
estimator.fit(X_train, y_train)
print(estimator.score(X_test, y_test))
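Beyond the test score, it can be useful to check whether the selected features match the truly informative ones. The sketch below is one way to do that with a hypothetical helper, `support_metrics`, which compares the non-zero pattern of two coefficient vectors. Ground-truth coefficients are available from `make_regression` by passing `coef=True`. Passing `estimator.coef_` as the estimate assumes the fitted estimator exposes a `coef_` attribute in the usual scikit-learn style, which is not confirmed here.

```python
import numpy as np


def support_metrics(true_coef, est_coef, tol=1e-8):
    """Return (precision, recall) of the estimated non-zero pattern."""
    true_support = np.abs(np.asarray(true_coef)) > tol
    est_support = np.abs(np.asarray(est_coef)) > tol
    hits = int(np.sum(true_support & est_support))
    # Guard against division by zero when a support is empty
    precision = hits / max(int(est_support.sum()), 1)
    recall = hits / max(int(true_support.sum()), 1)
    return precision, recall


# Hand-built example: true support is {1, 3}, estimated support is {1, 2}
precision, recall = support_metrics([0.0, 2.0, 0.0, -1.0], [0.0, 1.9, 0.3, 0.0])
print(precision, recall)

# Hypothetical usage with the estimator above (assumes a coef_ attribute):
# precision, recall = support_metrics(true_coef, estimator.coef_)
```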

Development

Clone the repository using git:

git clone https://github.com/joshivanhoe/sparsely

Create a fresh virtual environment using venv or conda. Activate the environment and navigate to the cloned sparsely directory. Install a locally editable version of the package using pip:

pip install -e .

To check the installation has worked, you can run the tests (with coverage metrics) using pytest as follows:

pytest --cov=sparsely tests/

Contributions are welcome! To see our development priorities, refer to the open issues. Please submit a pull request with a clear description of the changes you've made.