
[Question] Why does gamma * beta stand for L2 in LogisticRegression._NLL_grad #53

Closed
eromoe opened this issue Jul 13, 2020 · 5 comments
Labels: bug

@eromoe
eromoe commented Jul 13, 2020

Hello, this is a great project. I am learning how to implement models without sklearn/tensorflow, and it has really helped me a lot.

I have a question about this line:

d_penalty = gamma * beta if p == "l2" else gamma * l1norm(beta) * np.sign(beta)

Since the p-norm is defined as

\|x\|_p = \left( \sum_i |x_i|^p \right)^{1/p}

l1norm(self.beta) means the sum of the absolute values of the elements of self.beta. I don't quite understand why the simple gamma * beta stands for L2?
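For context, here is a minimal sketch of where this d_penalty term enters the gradient. Only the d_penalty line is taken from the code quoted above; the surrounding function, the names nll_grad_sketch and _sigmoid, and the exact normalization are assumed textbook logistic regression, not the project's actual implementation:

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll_grad_sketch(X, y, beta, gamma, p="l2"):
    # Hypothetical reconstruction of a penalized logistic-regression gradient;
    # only the d_penalty line comes from the issue, the rest is textbook form.
    N = X.shape[0]
    y_pred = _sigmoid(X @ beta)
    l1norm = lambda v: np.abs(v).sum()
    d_penalty = gamma * beta if p == "l2" else gamma * l1norm(beta) * np.sign(beta)
    # gradient of (NLL + penalty) / N with respect to beta
    return (X.T @ (y_pred - y) + d_penalty) / N
```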

PS: May I ask what IDE and code-documentation plugin you are using? I see some annotations are written in raw LaTeX; it would be nicer to see rendered math symbols than raw LaTeX :)

@ddbourgin (Owner)

For linear regression, the L2-regularization term is gamma * np.sqrt(beta @ beta).
The gradient of the L2 penalty wrt beta is then simply gamma * beta.

Keep in mind that d_penalty is the gradient of the penalty term wrt the coefficients, not the penalty itself :)

I don't use a special IDE, unfortunately. The equations are formatted for display as Sphinx reStructuredText. You can see the rendered equations in the online documentation, or build it yourself from the source in the docs directory. There may also be IDE plugins that will try to render them, but I am not aware of any :)

@eromoe (Author)

eromoe commented Jul 18, 2020

@ddbourgin Thank you for the reply.

From https://towardsdatascience.com/intuitions-on-l1-and-l2-regularisation-235f2db4c261

[image from the article: the L1- and L2-regularized loss functions]

The L1-regularization term is gamma * np.absolute(beta).sum()
The L2-regularization term is gamma * np.power(np.sqrt(beta @ beta), 2) (I think you miswrote this in your previous comment)

The gradient of the L1 penalty wrt beta is then gamma * np.sign(beta)
The gradient of the L2 penalty wrt beta is then gamma * 2 * beta, which is proportional to gamma * beta.

Actually, I originally thought the L2-regularization term was gamma * np.sqrt(beta @ beta), so I expected the gradient of the L2 term to be ±1 too. In my head the L2 norm was sometimes beta^2 and sometimes np.sqrt(beta^2); the L2 norm and the L2-regularization term look so similar that I mixed them up, but now I have it clear.
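A quick finite-difference sanity check of those two gradients (just a sketch; the num_grad helper, gamma, and beta are illustrative values, not from the library):

```python
import numpy as np

rng = np.random.default_rng(0)
beta, gamma, eps = rng.normal(size=5), 0.1, 1e-6

l2_penalty = lambda b: gamma * (b @ b)          # gamma * sum(beta_i ** 2)
l1_penalty = lambda b: gamma * np.abs(b).sum()  # gamma * sum(|beta_i|)

def num_grad(f, b):
    # central finite differences, one coordinate at a time
    g = np.zeros_like(b)
    for i in range(b.size):
        e = np.zeros_like(b)
        e[i] = eps
        g[i] = (f(b + e) - f(b - e)) / (2 * eps)
    return g

print(np.allclose(num_grad(l2_penalty, beta), 2 * gamma * beta))                  # True
print(np.allclose(num_grad(l1_penalty, beta), gamma * np.sign(beta), atol=1e-4))  # True (away from 0)
```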

But there is one remaining problem: why do you multiply by l1norm(beta) in the L1 case? Since the gradient of the L1 penalty is gamma * np.sign(beta), this confuses me.

@ddbourgin (Owner)

ddbourgin commented Jul 25, 2020

Whoops, yup, that's what I get for being hasty! The regularization penalty is (gamma / 2) * np.sqrt(beta @ beta) ** 2, which gives a gradient of gamma * beta.

In the L1 case, I'd recommend explicitly writing down the L1 penalty (not just the l1 norm) and then trying to derive the gradient wrt beta. It should quickly become clear why there is an l1norm term in the calc :)
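Carrying that derivation out for the penalty as it is currently implemented (the L1 norm squared, per the penalty line quoted in the next comment) would give, as a sketch:

```latex
% Penalty as implemented: the L1 norm is squared, like the L2 case
P(\beta) = \frac{\gamma}{2}\,\lVert\beta\rVert_1^2
         = \frac{\gamma}{2}\Big(\sum_i \lvert\beta_i\rvert\Big)^2

% Chain rule (away from beta_j = 0): the square pulls out the full L1 norm
\frac{\partial P}{\partial \beta_j}
  = \gamma \Big(\sum_i \lvert\beta_i\rvert\Big)\operatorname{sign}(\beta_j)
  = \gamma\,\lVert\beta\rVert_1\,\operatorname{sign}(\beta_j)
```

which is exactly the gamma * l1norm(beta) * np.sign(beta) term in the code above.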

@eromoe (Author)

eromoe commented Jul 27, 2020

@ddbourgin Sorry, but I don't quite understand why the penalty in the L1 case needs to be squared the way the L2 penalty is:

penalty = 0.5 * self.gamma * np.linalg.norm(self.beta, ord=order) ** 2   # the norm remains squared even in the L1 case

All the articles I saw use an L1 term (penalty) like \lambda \sum_i |\beta_i|, and its derivative is ±\lambda.
Now I am very confused.

@ddbourgin (Owner)

Oh! I see what you're saying. You're right, the square of the L1 norm is not what we want. The proper L1 penalty is

gamma * np.abs(beta).sum()

which gives a gradient of

gamma * np.sign(beta)

I'll make a PR to fix this. Thank you very much for pointing this out :)
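For reference, a sketch of what the fixed penalty/gradient pair might look like (penalty_and_grad is a hypothetical helper for illustration; the actual PR may differ):

```python
import numpy as np

def penalty_and_grad(beta, gamma, p="l2"):
    """Regularization penalty and its gradient wrt beta (sketch, not the actual patch)."""
    if p == "l2":
        penalty = 0.5 * gamma * (beta @ beta)   # (gamma / 2) * ||beta||_2 ** 2
        d_penalty = gamma * beta
    else:
        penalty = gamma * np.abs(beta).sum()    # gamma * ||beta||_1, not squared
        d_penalty = gamma * np.sign(beta)
    return penalty, d_penalty
```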

ddbourgin reopened this Jul 27, 2020
ddbourgin added the bug label Jul 27, 2020
RaulMurillo added a commit to RaulMurillo/numpy-ml that referenced this issue Mar 4, 2021
This reverts commit b537fac.