tikhonov_from_prior treats zero_indices incorrectly #356

Open
ivarzap opened this issue Jan 14, 2020 · 7 comments

Comments

@ivarzap

ivarzap commented Jan 14, 2020

In tikhonov_from_prior, the line
S_inv[nonzero_indices] = 1. / S_inv[nonzero_indices]

should be:
S_inv = 1. / S_inv

Otherwise, the smallest singular values (those at or below the threshold) are barely regularized at all.
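A minimal NumPy sketch of the two rules, assuming (as described later in this thread) that singular values at or below the cutoff are first set to `threshold`, and assuming a default threshold of 1e-4. `current_rule` and `proposed_rule` are hypothetical helpers, not pyglmnet functions, and for simplicity the cutoff is applied to the raw singular values:

```python
import numpy as np

THRESHOLD = 1e-4  # assumed default cutoff


def current_rule(S, threshold=THRESHOLD):
    """Hypothetical reconstruction: clamp small values, reciprocate only the large ones."""
    S_inv = np.asarray(S, dtype=float).copy()
    nonzero_mask = S_inv > threshold              # computed before any modification
    S_inv[~nonzero_mask] = threshold              # small SVs are shifted to threshold...
    S_inv[nonzero_mask] = 1. / S_inv[nonzero_mask]  # ...and never inverted
    return S_inv


def proposed_rule(S, threshold=THRESHOLD):
    """The issue's proposal: clamp small values, then reciprocate every entry."""
    S_inv = np.maximum(np.asarray(S, dtype=float), threshold)
    return 1. / S_inv


S = np.array([1.0, 2e-4, 1e-6])
print(current_rule(S))   # approximately [1, 5000, 1e-4]
print(proposed_rule(S))  # approximately [1, 5000, 1e4]
```

Under the current rule the weight drops from 5000 to 1e-4 as soon as a singular value crosses below the cutoff; under the proposal it keeps growing, up to 1e4.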

@jasmainak
Member

This is the standard way to invert a (possibly singular) matrix: it's the Moore-Penrose pseudoinverse. Why do you see a problem with it?
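For comparison, the textbook SVD pseudoinverse reciprocates the singular values above a relative cutoff and sets the remaining reciprocals to zero (rather than replacing them with a small constant). A minimal NumPy sketch; `pinv_via_svd` is a hypothetical helper and the cutoff `rcond=1e-4` is an arbitrary choice:

```python
import numpy as np


def pinv_via_svd(A, rcond=1e-4):
    """Moore-Penrose pseudoinverse: reciprocate singular values above
    rcond * max(S) and set the remaining reciprocals to zero."""
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    keep = S > rcond * S.max()
    S_inv = np.zeros_like(S)
    S_inv[keep] = 1. / S[keep]
    return Vt.T @ np.diag(S_inv) @ U.T


# Hypothetical rank-2 symmetric 3x3 matrix (one singular value is ~0).
B = np.array([[1., 0.], [0., 2.], [1., 1.]])
A = B @ B.T

P = pinv_via_svd(A)
print(np.allclose(P, np.linalg.pinv(A, rcond=1e-4)))  # True
print(np.allclose(A @ P @ A, A))                       # True
```

The second check is one of the defining Moore-Penrose conditions (A A⁺ A = A).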

@ivarzap
Author

ivarzap commented Jan 17, 2020

What is the reason for using the Moore-Penrose pseudoinverse to build a regularizer?

In that case, say you have a singular value slightly above the threshold (e.g. 0.0002 with the current defaults). That direction will be heavily regularized, as expected. The singular values that do not survive the cutoff (<= threshold), however, will hardly be regularized at all: they only get the tiny weight threshold.
I think this could be undesirable behavior for regularization when prior_cov is close to singular.
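To make the argument concrete, here is a sketch that measures the penalty, the squared norm of `Tau @ beta`, for a unit-norm coefficient vector `beta` along each right singular direction. It assumes the Tikhonov matrix is built as `diag(S_inv) @ Vt` from the thresholded weights (an assumption about the construction, not a quote of pyglmnet), a threshold of 1e-4, and hypothetical singular values and vectors:

```python
import numpy as np

threshold = 1e-4                   # assumed default cutoff
S = np.array([1.0, 2e-4, 1e-6])    # hypothetical singular values of prior_cov

# Hypothetical orthonormal right singular vectors (rows of Vt) from a QR factorization.
Q, _ = np.linalg.qr(np.array([[2., 1., 0.], [1., 3., 1.], [0., 1., 4.]]))
Vt = Q.T

# Thresholded weights as discussed in the thread: small SVs clamped to threshold, not inverted.
S_inv = np.where(S > threshold, 1. / np.maximum(S, threshold), threshold)
Tau = np.diag(S_inv) @ Vt          # assumed construction of the Tikhonov matrix

# Penalty ||Tau @ beta||^2 for a unit-norm beta along each singular direction.
penalties = np.array([np.sum((Tau @ Vt[i]) ** 2) for i in range(len(S))])
print(penalties)  # approximately [1, 2.5e7, 1e-8]
```

The direction just above the cutoff receives the largest penalty, while the one just below it is essentially free, which is the discontinuity described in the comment above.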

@jasmainak
Member

Do you observe this problem in your data? What's the standard way to deal with it? If you just invert a zero singular value of a singular matrix, it will blow up.
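A quick NumPy check of that point, on a hypothetical 2x2 covariance whose two features are identical:

```python
import numpy as np

cov = np.array([[1., 1.], [1., 1.]])  # rank 1: the two features are identical

# Plain inversion fails outright on an exactly singular matrix.
try:
    np.linalg.inv(cov)
    inv_failed = False
except np.linalg.LinAlgError:
    inv_failed = True

# Reciprocating every singular value instead: the (near-)zero one explodes.
S = np.linalg.svd(cov, compute_uv=False)
with np.errstate(divide="ignore"):
    recip = 1. / S
print(inv_failed, recip)
```

This is why the small singular values have to be either dropped or clamped before reciprocating.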

@ivarzap
Author

ivarzap commented Jan 17, 2020

I am not using your code yet to regularize an elastic-net model, but I'm planning to implement it very soon.

My question is theoretical: in which situation should the small singular value (SV) directions be left almost unregularized while the larger SV directions are regularized?

In my current understanding, one introduces a ridge regularizer (a Tikhonov matrix proportional to the identity) to shift X.T.dot(X) before taking its (Moore-Penrose) inverse, which stabilizes the regression by strongly suppressing the small SV directions.
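The ridge picture can be sketched with NumPy on hypothetical data containing two nearly collinear features. With a Tikhonov matrix equal to sqrt(lam) times the identity, the solution of the shifted normal equations equals an SVD filter in which each direction is damped by s / (s² + lam), so the small-SV directions are strongly suppressed. `lam` is an arbitrary illustrative value:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X[:, 2] = X[:, 0] + 1e-3 * rng.normal(size=50)  # nearly collinear column
y = rng.normal(size=50)
lam = 1.0  # hypothetical ridge strength; Tikhonov matrix = sqrt(lam) * I

# Direct ridge solution: shift X^T X by lam * I before inverting.
beta_direct = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

# Same solution via the SVD of X: each direction is damped by s / (s**2 + lam).
U, s, Vt = np.linalg.svd(X, full_matrices=False)
beta_svd = Vt.T @ ((s / (s**2 + lam)) * (U.T @ y))

filter_factors = s**2 / (s**2 + lam)  # close to 1 for large s, close to 0 for small s
print(np.allclose(beta_direct, beta_svd))  # True
print(filter_factors)
```

The nearly collinear direction ends up with a filter factor close to zero, which is the "strong suppression of small SVs" described above.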

On the other hand, the computation in tikhonov_from_prior inverts only the sufficiently large singular values of prior_cov. This amounts to adding a bias against the small SV directions, but the really small ones (<= threshold) are not regularized at all. I would like to understand the rationale for using the regularizer this way.

In my proposal, because the small SVs are shifted to threshold, inverting those indices as well would make the Tikhonov matrix very large (~1e4) in those directions, so that for all practical purposes they would drop out of the regression.

@jasmainak
Member

jasmainak commented Jan 22, 2020

@pavanramkumar do you have any comments here?

@pavanramkumar
Collaborator

pavanramkumar commented Feb 23, 2020

@ivarzap thanks for your question.

  • What we are calling the Tikhonov matrix is the matrix square root of the inverse of the prior covariance matrix. In other words, if Γ is the Tikhonov matrix and Σ the prior covariance, then Γᵀ Γ = Σ⁻¹. Notation from Wikipedia: https://en.wikipedia.org/wiki/Tikhonov_regularization

  • We are using SVD to compute the inverse of the covariance matrix. If the covariance matrix is not full rank, some singular values will be zero or very close to it, which prevents inverting the diagonal matrix S by simply reciprocating all singular values. As @jasmainak said above, the standard practice for inverting a diagonal matrix in this situation is to reciprocate only the singular values above a threshold. Here is a blog post that walks through the inversion step by step: https://www.johndcook.com/blog/2018/05/05/svd/

  • Approximating the inverse this way is equivalent to approximating the covariance matrix by a low-rank matrix (and inverting that), i.e. discarding the left and right singular vectors whose singular values fall below the threshold.
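The three points above can be checked numerically. The sketch assumes NumPy, a relative cutoff `threshold * S.max()`, and a hypothetical prior covariance that is rank 2 plus a tiny ridge. It builds the Tikhonov matrix as the square root of the thresholded pseudoinverse (following the first bullet's definition, not necessarily pyglmnet's exact code) and compares it with the pseudoinverse of the rank-k approximation:

```python
import numpy as np

threshold = 1e-4  # relative cutoff on S / S.max() (assumed)

# Hypothetical prior covariance over 3 features: rank 2 plus a tiny ridge,
# so one singular value is ~1e-6 and falls below the cutoff.
B = np.array([[1., 0.], [0., 2.], [1., 1.]])
prior_cov = B @ B.T + 1e-6 * np.eye(3)

U, S, Vt = np.linalg.svd(prior_cov)
keep = S > threshold * S.max()        # directions that survive the cutoff
U_k, S_k = U[:, keep], S[keep]

# Tikhonov matrix as the square root of the (pseudo)inverse covariance,
# so that Tau.T @ Tau = U_k @ diag(1 / S_k) @ U_k.T.
Tau = np.diag(1. / np.sqrt(S_k)) @ U_k.T

# Rank-k approximation of the covariance: discard the small singular directions.
cov_k = U_k @ np.diag(S_k) @ U_k.T

inner = Tau.T @ Tau
print(np.allclose(inner, np.linalg.pinv(prior_cov, rcond=threshold)))  # True
print(np.allclose(inner, np.linalg.pinv(cov_k, rcond=threshold)))      # True
# Approximation error (spectral norm) equals the largest discarded singular value.
print(np.linalg.norm(prior_cov - cov_k, 2), S[~keep].max())
```

The last line illustrates the low-rank view: the covariance that is implicitly inverted differs from the original by the largest discarded singular value, measured in spectral norm.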

Hope this helps!

@jasmainak
Member

@pavanramkumar I do think what we are doing is a bit non-standard. Shouldn't the threshold be 0 by default? In any case, we should use scipy's pinv instead of rolling our own.
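Whether the default cutoff is 0 or 1e-4 decides if a near-zero direction is inverted into a huge penalty weight or dropped. A sketch with NumPy's `np.linalg.pinv`, which plays the same role as `scipy.linalg.pinv` (SciPy's cutoff keyword names have changed between versions, so it is not used here), on a hypothetical nearly singular covariance:

```python
import numpy as np

# Hypothetical nearly singular prior covariance with eigenvalues 2.0 and 1e-6.
Q = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2.)
prior_cov = Q @ np.diag([2.0, 1e-6]) @ Q.T

weights = {}
for rcond in (0.0, 1e-4):
    P = np.linalg.pinv(prior_cov, rcond=rcond, hermitian=True)
    # Largest eigenvalue of the pseudoinverse = heaviest penalty weight.
    weights[rcond] = np.linalg.eigvalsh(P).max()
    print(rcond, weights[rcond])
# rcond=0.0 inverts the 1e-6 direction (weight about 1e6);
# rcond=1e-4 drops it, leaving a largest weight of 1 / 2.0 = 0.5.
```

With a zero cutoff only exactly zero singular values are discarded, so the choice of default effectively decides how strongly nearly degenerate prior directions are penalized.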
