
[Question] Why does gamma * beta stand for L2 in LogisticRegression._NLL_grad #53

Closed
eromoe opened this issue Jul 13, 2020 · 5 comments
Labels: bug

@eromoe
eromoe commented Jul 13, 2020

Hello, this is a great project. I am learning how to implement models without sklearn/tensorflow, and it has really helped me a lot.

I have a question about this line:

d_penalty = gamma * beta if p == "l2" else gamma * l1norm(beta) * np.sign(beta)

Since the p-norm is defined as

\|x\|_p = \left( \sum_i |x_i|^p \right)^{1/p}

l1norm(self.beta) means the sum of the absolute values of the elements of self.beta. I don't quite understand why the simple gamma * beta stands for L2?
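For context, here is a minimal sketch of where this d_penalty term enters the gradient. Only the d_penalty line is taken from the code quoted above; the surrounding function, the names nll_grad_sketch and _sigmoid, and the exact normalization are assumed textbook logistic regression, not the project's actual implementation:

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll_grad_sketch(X, y, beta, gamma, p="l2"):
    # Hypothetical reconstruction of a penalized logistic-regression gradient;
    # only the d_penalty line comes from the issue, the rest is textbook form.
    N = X.shape[0]
    y_pred = _sigmoid(X @ beta)
    l1norm = lambda v: np.abs(v).sum()
    d_penalty = gamma * beta if p == "l2" else gamma * l1norm(beta) * np.sign(beta)
    # gradient of (NLL + penalty) / N with respect to beta
    return (X.T @ (y_pred - y) + d_penalty) / N
```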

PS: May I ask what IDE and code-documentation plugin you are using? I see some annotations are written in raw LaTeX; it would be nicer to see rendered math symbols than raw LaTeX :)

@ddbourgin (Owner)

For linear regression, the L2-regularization term is gamma * np.sqrt(beta @ beta).
The gradient of the L2 penalty wrt beta is then simply gamma * beta.

Keep in mind that d_penalty is the gradient of the penalty term wrt the coefficients, not the penalty itself :)

I don't use a special IDE, unfortunately. The equations are formatted for display as Sphinx reStructuredText. You can see the rendered equations in the online documentation, or build it yourself from the source in the docs directory. There may also be IDE plugins that will try to render them, but I am not aware of any :)

@eromoe (Author)

eromoe commented Jul 18, 2020

@ddbourgin Thank you for the reply.

From https://towardsdatascience.com/intuitions-on-l1-and-l2-regularisation-235f2db4c261

[image from the article: the L1- and L2-regularized loss functions]

The L1-regularization term is gamma * np.absolute(beta).sum()
The L2-regularization term is gamma * np.power(np.sqrt(beta @ beta), 2) (I think you miswrote this in your previous comment)

The gradient of the L1 penalty wrt beta is then gamma * np.sign(beta)
The gradient of the L2 penalty wrt beta is then gamma * 2 * beta, which is proportional to gamma * beta.

Actually, I originally thought the L2-regularization term was gamma * np.sqrt(beta @ beta), so I expected the gradient of the L2 term to be ±1 too. In my head the L2 norm was sometimes beta^2 and sometimes np.sqrt(beta^2); the L2 norm and the L2-regularization term look so similar that I mixed them up, but now I have it clear.
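A quick finite-difference sanity check of those two gradients (just a sketch; the num_grad helper, gamma, and beta are illustrative values, not from the library):

```python
import numpy as np

rng = np.random.default_rng(0)
beta, gamma, eps = rng.normal(size=5), 0.1, 1e-6

l2_penalty = lambda b: gamma * (b @ b)          # gamma * sum(beta_i ** 2)
l1_penalty = lambda b: gamma * np.abs(b).sum()  # gamma * sum(|beta_i|)

def num_grad(f, b):
    # central finite differences, one coordinate at a time
    g = np.zeros_like(b)
    for i in range(b.size):
        e = np.zeros_like(b)
        e[i] = eps
        g[i] = (f(b + e) - f(b - e)) / (2 * eps)
    return g

print(np.allclose(num_grad(l2_penalty, beta), 2 * gamma * beta))                  # True
print(np.allclose(num_grad(l1_penalty, beta), gamma * np.sign(beta), atol=1e-4))  # True (away from 0)
```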

But there is one remaining problem: why do you multiply by l1norm(beta) in the L1 case? Since the gradient of the L1 penalty is gamma * np.sign(beta), this confuses me.

@ddbourgin (Owner)

ddbourgin commented Jul 25, 2020

Whoops, yup, that's what I get for being hasty! The regularization penalty is (gamma / 2) * np.sqrt(beta @ beta) ** 2, which gives a gradient of gamma * beta.

In the L1 case, I'd recommend explicitly writing down the L1 penalty (not just the l1 norm) and then trying to derive the gradient wrt beta. It should quickly become clear why there is an l1norm term in the calc :)
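Carrying that derivation out for the penalty as it is currently implemented (the L1 norm squared, per the penalty line quoted in the next comment) would give, as a sketch:

```latex
% Penalty as implemented: the L1 norm is squared, like the L2 case
P(\beta) = \frac{\gamma}{2}\,\lVert\beta\rVert_1^2
         = \frac{\gamma}{2}\Big(\sum_i \lvert\beta_i\rvert\Big)^2

% Chain rule (away from beta_j = 0): the square pulls out the full L1 norm
\frac{\partial P}{\partial \beta_j}
  = \gamma \Big(\sum_i \lvert\beta_i\rvert\Big)\operatorname{sign}(\beta_j)
  = \gamma\,\lVert\beta\rVert_1\,\operatorname{sign}(\beta_j)
```

which is exactly the gamma * l1norm(beta) * np.sign(beta) term in the code above.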

@eromoe (Author)

eromoe commented Jul 27, 2020

@ddbourgin Sorry, but I don't quite understand why the penalty in the L1 case needs to be squared the way the L2 penalty is:

penalty = 0.5 * self.gamma * np.linalg.norm(self.beta, ord=order) ** 2   # the norm remains squared even in the L1 case

All the articles I saw use an L1 term (penalty) like \lambda \sum_i |\beta_i|, and its derivative is ±\lambda.
Now I am very confused.

@ddbourgin (Owner)

Oh! I see what you're saying. You're right, the square of the L1 norm is not what we want. The proper L1 penalty is

gamma * np.abs(beta).sum()

which gives a gradient of

gamma * np.sign(beta)

I'll make a PR to fix this. Thank you very much for pointing this out :)
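For reference, a sketch of what the fixed penalty/gradient pair might look like (penalty_and_grad is a hypothetical helper for illustration; the actual PR may differ):

```python
import numpy as np

def penalty_and_grad(beta, gamma, p="l2"):
    """Regularization penalty and its gradient wrt beta (sketch, not the actual patch)."""
    if p == "l2":
        penalty = 0.5 * gamma * (beta @ beta)   # (gamma / 2) * ||beta||_2 ** 2
        d_penalty = gamma * beta
    else:
        penalty = gamma * np.abs(beta).sum()    # gamma * ||beta||_1, not squared
        d_penalty = gamma * np.sign(beta)
    return penalty, d_penalty
```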

ddbourgin reopened this Jul 27, 2020
ddbourgin added the bug label Jul 27, 2020
RaulMurillo added a commit to RaulMurillo/numpy-ml that referenced this issue Mar 4, 2021
This reverts commit b537fac.