
ENH - Add Pinball datafit #134

Merged (31 commits) on Dec 9, 2022

Conversation


@Badr-MOUFAD (Collaborator) commented on Dec 6, 2022

Now that we have a solver that handles non-smooth datafits (#131), this adds a Pinball datafit that can be used to fit a QuantileRegressor, and hence, as a by-product, a LAD-Lasso (quantile=0.5).

closes #133
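
For reference, here is a minimal NumPy sketch of the loss this datafit implements (variable names are illustrative, not the actual skglm API, and a possible normalization by n_samples is omitted); at quantile=0.5 it reduces to half the absolute error, i.e. the LAD datafit:

```python
import numpy as np

def pinball_loss(y, Xw, quantile):
    """Pinball (quantile) loss, summed over samples."""
    residual = y - Xw
    return (quantile * np.maximum(residual, 0)
            + (1 - quantile) * np.maximum(-residual, 0)).sum()

# at quantile=0.5 the pinball loss is 0.5 * ||y - Xw||_1 (LAD)
y, Xw = np.array([1.0, -2.0, 3.0]), np.array([0.5, 0.0, 4.0])
assert np.isclose(pinball_loss(y, Xw, 0.5), 0.5 * np.abs(y - Xw).sum())
```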

@Badr-MOUFAD (Collaborator, Author) commented on Dec 8, 2022

Benchmark results on the L1 quantile regression problem, for the following setups:

  • datasets: MEG, Simulated
  • regularization parameter: [0.5, 0.1, 0.05] * lambda_max
  • solvers: PDCD-WS, scipy.linprog solvers (a sketch of the linprog baseline formulation is given after this comment)

@lorentzenchr, your feedback is valuable 🙏
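
As context for the comparison, here is a rough sketch of how a scipy.linprog baseline for this problem can be formulated (an illustrative linear-programming reformulation, not the actual benchmark code; scaling and normalization conventions may differ):

```python
import numpy as np
from scipy.optimize import linprog

def quantile_lasso_linprog(X, y, quantile, alpha):
    """L1-penalized quantile regression solved as a linear program.

    Variables: w = w_pos - w_neg, and the residual y - Xw split into its
    positive part u and negative part v (all variables >= 0), so that the
    objective is the pinball loss plus alpha * ||w||_1.
    """
    n_samples, n_features = X.shape
    c = np.concatenate([alpha * np.ones(2 * n_features),
                        quantile * np.ones(n_samples),
                        (1 - quantile) * np.ones(n_samples)])
    # equality constraint: X (w_pos - w_neg) + u - v = y
    A_eq = np.hstack([X, -X, np.eye(n_samples), -np.eye(n_samples)])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    return res.x[:n_features] - res.x[n_features:2 * n_features]
```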


The datafit reads::

quantile * max(y - Xw, 0) + (1 - quantile) * max(Xw - y, 0)
Collaborator commented:

nitpick: the real value does not involve np.max, it is np.maximum(...).sum(). Maybe rewrite it as a sum and use _i to denote sample indices? Check how sklearn does it.
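
To make the distinction concrete (illustrative snippet, not from the PR): np.max reduces the whole array to its single largest entry, while the datafit needs the elementwise positive parts followed by a sum:

```python
import numpy as np

r = np.array([1.5, -0.3, 2.0])   # residuals y - Xw
np.max(r)                        # 2.0: a single scalar, the largest residual
np.maximum(r, 0).sum()           # 3.5: sum of elementwise positive parts
```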

Collaborator (Author) replied:

I looked up the scikit-learn source code, but they don't specify the expression.

I'm in favor of using a sum and _i to denote samples.

@lorentzenchr left a comment:

I can't comment on the core functions like prox, as I'm not too familiar with skglm. A few more tests couldn't hurt.


The datafit reads::

sum_i quantile * max(y_i - Xw_i, 0) + (1 - quantile) * max(Xw_i - y_i, 0)

@lorentzenchr:

The naming "quantile" is unfortunate; it should be "quantile level", i.e. the quantile at level 50% is the median. The "quantile" in the formula is not the quantile itself; rather, Xw_i is an estimate of the quantile.

Collaborator replied:

Thanks @lorentzenchr for the insightful remark, we'll change that.

@mathurinm merged commit 4bee85f into scikit-learn-contrib:main on Dec 9, 2022
Successfully merging this pull request may close these issues: FEAT add Pinball loss (#133).

3 participants