ENH - Add Pinball datafit #134

Merged
merged 31 commits into from
Dec 9, 2022
413ef54
remove sqrt n_samples
Badr-MOUFAD Nov 30, 2022
2ef5eb7
update unittest
Badr-MOUFAD Nov 30, 2022
5c0bedc
info comment statsmodels
Badr-MOUFAD Dec 1, 2022
ca6ece7
add prox subdiff to sqrt df
Badr-MOUFAD Dec 1, 2022
a6303e5
implement ``PDCD_WS``
Badr-MOUFAD Dec 1, 2022
e8fcee3
r sqrt_n from CB
Badr-MOUFAD Dec 1, 2022
339e98f
Merge branch 'r-sqrt-n' of https://github.com/Badr-MOUFAD/skglm into …
Badr-MOUFAD Dec 1, 2022
19a0ea9
bug w and subdiff
Badr-MOUFAD Dec 1, 2022
e01451d
unittest sqrt
Badr-MOUFAD Dec 1, 2022
dd36b88
add docs
Badr-MOUFAD Dec 1, 2022
523419b
fix docs SqrtQuadratic
Badr-MOUFAD Dec 1, 2022
71de179
Merge branch 'main' of https://github.com/scikit-learn-contrib/skglm …
Badr-MOUFAD Dec 2, 2022
63a547b
subdiff --> fixed_point
Badr-MOUFAD Dec 4, 2022
f78d17d
efficient prox conjugate && fix tests
Badr-MOUFAD Dec 5, 2022
d0ae3a4
remove go
Badr-MOUFAD Dec 5, 2022
ad36485
MM remarks
Badr-MOUFAD Dec 5, 2022
f60bd59
fix test && clean ups
Badr-MOUFAD Dec 5, 2022
5a5f1ba
MM round 2 remarks
Badr-MOUFAD Dec 5, 2022
4f27c56
CI Trigger
Badr-MOUFAD Dec 5, 2022
fe45faa
implement pinball
Badr-MOUFAD Dec 6, 2022
3ce886f
unittest
Badr-MOUFAD Dec 6, 2022
6928502
fix pinball value && ST step
Badr-MOUFAD Dec 6, 2022
1271288
more unittest
Badr-MOUFAD Dec 6, 2022
bd1984a
fix bug prox pinball
Badr-MOUFAD Dec 6, 2022
36100c7
Merge branch 'main' of https://github.com/scikit-learn-contrib/skglm …
Badr-MOUFAD Dec 8, 2022
1a03c60
MM remarks
Badr-MOUFAD Dec 8, 2022
4b3ea45
Update skglm/experimental/quantile_regression.py
mathurinm Dec 8, 2022
9cf2216
pinball expression
Badr-MOUFAD Dec 8, 2022
626b71d
Merge branch 'pinball-df' of https://github.com/Badr-MOUFAD/skglm int…
Badr-MOUFAD Dec 8, 2022
8e93720
sqrt --> pinball
Badr-MOUFAD Dec 8, 2022
0a247f0
quantile --> quantile_level
Badr-MOUFAD Dec 9, 2022
MM round 2 remarks
Badr-MOUFAD committed Dec 5, 2022
commit 5a5f1baf2fee50ad584eebece34066ec36abf26a
37 changes: 27 additions & 10 deletions skglm/experimental/pdcd_ws.py
@@ -2,6 +2,7 @@
 
 import numpy as np
 from numpy.linalg import norm
+from scipy.sparse import issparse
 
 from numba import njit
 from skglm.utils.jit_compilation import compiled_clone
@@ -13,20 +14,24 @@ class PDCD_WS:
 
     It solves::
 
-        \min_w F(Xw) + G(w) \Leftrightarrow \min_w \max_z <Xw, z> + G(w) - F^*(z)
+        \min_w F(Xw) + G(w)
 
-    where :math:`F` is the datafit term (:math:`F^*` it's Fenchel conjugate)
+    using a primal-dual method on the saddle point problem::
+
+        \min_w \max_z <Xw, z> + G(w) - F^*(z)
+
+    where :math:`F` is the datafit term (:math:`F^*` its Fenchel conjugate)
     and :math:`G` is the penalty term.
 
     The datafit is required to be convex and proximable. Also, the penalty
-    is also required to be convex, separable, and proximable.
+    is required to be convex, separable, and proximable.
 
-    The solver is inspired by [1] and uses working sets [2].
+    The solver is an adaptation of algorithm [1] to working sets [2].
     The working sets are built using a fixed point distance strategy
-    where each feature is assigned a score based how much it maps
-    to itself when performing a primal update::
+    where each feature is assigned a score based on how much its coefficient varies
+    when performing a primal update::
 
-        \text{score}_j = \abs{w_j - prox_{G_j}(w_j - \tau_j <X_j, z>)}
+        \text{score}_j = \abs{w_j - prox_{\tau_j, G_j}(w_j - \tau_j <X_j, z>)}
 
     where :math:`\tau_j` is the primal step associated with the j-th feature.
 
@@ -39,6 +44,10 @@ class PDCD_WS:
     max_epochs : int, optional
         Maximum number of primal CD epochs on each subproblem.
 
+    dual_init : array, shape (n_samples,) default None
+        The initialization of dual variables.
+        If None, they are initialized as the 0 vector ``np.zeros(n_samples)``.
+
     p0 : int, optional
         First working set size.
 
@@ -62,15 +71,19 @@ class PDCD_WS:
         https://arxiv.org/abs/2204.07826
     """
 
-    def __init__(self, max_iter=1000, max_epochs=1000,
+    def __init__(self, max_iter=1000, max_epochs=1000, dual_init=None,
                  p0=100, tol=1e-6, verbose=False):
         self.max_iter = max_iter
         self.max_epochs = max_epochs
+        self.dual_init = dual_init
         self.p0 = p0
         self.tol = tol
         self.verbose = verbose
 
     def solve(self, X, y, datafit_, penalty_, w_init=None, Xw_init=None):
+        if issparse(X):
+            raise ValueError("Sparse matrices are not yet supported in PDCD_WS solver.")
+
         datafit, penalty = PDCD_WS._validate_init(datafit_, penalty_)
         n_samples, n_features = X.shape
 
@@ -86,8 +99,12 @@ def solve(self, X, y, datafit_, penalty_, w_init=None, Xw_init=None):
         Xw = np.zeros(n_samples) if Xw_init is None else Xw_init
 
         # dual vars
-        z = y.copy()
-        z_bar = y.copy()
+        if self.dual_init is None:
+            z = np.zeros(n_samples)
+            z_bar = np.zeros(n_samples)
+        else:
+            z = self.dual_init.copy()
+            z_bar = self.dual_init.copy()
 
         p_objs = []
         stop_crit = 0.
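
Note: the fixed point score described in the `PDCD_WS` docstring can be sketched in plain NumPy for an L1 penalty, whose prox is soft thresholding. This is only an illustration under stated assumptions: the helper names `soft_threshold` and `fixed_point_scores`, the random data, and the choice of primal steps as inverse column norms are hypothetical and are not taken from the solver's code.

```python
import numpy as np


def soft_threshold(x, level):
    """Prox of level * |.|, applied elementwise."""
    return np.sign(x) * np.maximum(np.abs(x) - level, 0.0)


def fixed_point_scores(X, w, z, alpha, primal_steps):
    """score_j = |w_j - prox_{tau_j, G_j}(w_j - tau_j <X_j, z>)| for G_j = alpha * |.|."""
    w_updated = soft_threshold(w - primal_steps * (X.T @ z), primal_steps * alpha)
    return np.abs(w - w_updated)


# Toy data (assumption: dense X, arbitrary dual variable z)
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
w = np.zeros(10)
z = rng.standard_normal(50)

# Assumption for this sketch: one primal step per feature, tau_j = 1 / ||X_j||
primal_steps = 1.0 / np.linalg.norm(X, axis=0)

scores = fixed_point_scores(X, w, z, alpha=0.1, primal_steps=primal_steps)

# A working set of size 3 would keep the features with the largest scores
working_set = np.argsort(scores)[-3:]
```

A score of zero means the coefficient is left unchanged by a primal update, so features with the largest scores are the natural candidates for the working set.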
2 changes: 1 addition & 1 deletion skglm/experimental/sqrt_lasso.py
@@ -59,7 +59,7 @@ def prox(self, w, step, y):
         return y - BST(y - w, step)
 
     def prox_conjugate(self, z, step, y):
-        """Prox of ||y - . ||^* with step."""
+        """Prox of ||y - . ||^* with step `step`."""
         return proj_L2ball(z - step * y)
 
     def subdiff_distance(self, Xw, z, y):
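
Note: the closed form used in `prox_conjugate` can be checked numerically against the Moreau decomposition, `prox_{step F^*}(z) = z - step * prox_{F / step}(z / step)`, with `F = ||y - .||`. The sketch below is self-contained. `block_soft_threshold` and `proj_l2_ball` are stand-ins written for this example, assumed to behave like the `BST` and `proj_L2ball` helpers referenced in the diff.

```python
import numpy as np


def block_soft_threshold(x, level):
    """Shrink the whole vector x toward 0 by `level` in L2 norm (assumed BST behavior)."""
    norm_x = np.linalg.norm(x)
    if norm_x <= level:
        return np.zeros_like(x)
    return (1.0 - level / norm_x) * x


def proj_l2_ball(x):
    """Projection onto the unit L2 ball (assumed proj_L2ball behavior)."""
    return x / max(1.0, np.linalg.norm(x))


def prox_datafit(w, step, y):
    """Prox of step * ||y - .||, mirroring `prox` in the diff."""
    return y - block_soft_threshold(y - w, step)


def prox_conjugate(z, step, y):
    """Prox of step * (||y - .||)^*, mirroring `prox_conjugate` in the diff."""
    return proj_l2_ball(z - step * y)


rng = np.random.default_rng(0)
y = rng.standard_normal(20)
z = rng.standard_normal(20)
step = 0.3

# Moreau decomposition: prox of the conjugate obtained from the prox of the datafit
via_moreau = z - step * prox_datafit(z / step, 1.0 / step, y)
direct = prox_conjugate(z, step, y)

assert np.allclose(via_moreau, direct)
```

The direct formula needs only one projection, which is presumably why the closed form is preferred over going through the datafit prox.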
7 changes: 5 additions & 2 deletions skglm/experimental/tests/test_sqrt_lasso.py
@@ -59,14 +59,17 @@ def test_prox_newton_cp():
     np.testing.assert_allclose(clf.coef_, w)
 
 
-def test_PDCD_WS():
+@pytest.mark.parametrize('with_dual_init', [True, False])
+def test_PDCD_WS(with_dual_init):
     n_samples, n_features = 50, 10
     X, y, _ = make_correlated_data(n_samples, n_features, random_state=0)
 
     alpha_max = norm(X.T @ y, ord=np.inf) / norm(y)
     alpha = alpha_max / 10
 
-    w = PDCD_WS().solve(X, y, SqrtQuadratic(), L1(alpha))[0]
+    dual_init = y / norm(y) if with_dual_init else None
+
+    w = PDCD_WS(dual_init=dual_init).solve(X, y, SqrtQuadratic(), L1(alpha))[0]
     clf = SqrtLasso(alpha=alpha, tol=1e-12).fit(X, y)
     np.testing.assert_allclose(clf.coef_, w, atol=1e-6)
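
Note: the test's `alpha_max` is the smallest regularization strength for which `w = 0` is a solution, assuming the objective is `||y - Xw||_2 + alpha * ||w||_1` (the unnormalized form is an assumption of this sketch). At `w = 0` the gradient of the datafit is `-X.T @ y / norm(y)`, so zero is optimal exactly when `norm(X.T @ y, inf) / norm(y) <= alpha`. The check below uses random data instead of `make_correlated_data` to stay self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 50, 10
X = rng.standard_normal((n_samples, n_features))
y = rng.standard_normal(n_samples)


def objective(w, alpha):
    # Assumed square-root Lasso objective: ||y - Xw||_2 + alpha * ||w||_1
    return np.linalg.norm(y - X @ w) + alpha * np.abs(w).sum()


alpha_max = np.linalg.norm(X.T @ y, ord=np.inf) / np.linalg.norm(y)
obj_at_zero = np.linalg.norm(y)  # objective at w = 0, for any alpha

# At alpha_max, w = 0 is a minimizer: random perturbations never lower the objective
perturbed = [objective(0.01 * rng.standard_normal(n_features), alpha_max)
             for _ in range(100)]
zero_is_optimal = min(perturbed) >= obj_at_zero - 1e-12

# Below alpha_max, a small step along the most correlated feature decreases it
j = np.argmax(np.abs(X.T @ y))
w_step = np.zeros(n_features)
w_step[j] = 1e-3 * np.sign(X[:, j] @ y)
descent_exists = objective(w_step, alpha_max / 10) < obj_at_zero
```

This is why the test uses `alpha = alpha_max / 10`: the solution is then nonzero, so comparing `PDCD_WS` against `SqrtLasso` is not a trivial comparison of zero vectors.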
