
[MRG+2] Modification of GaussianMixture class. #7123

Merged: 5 commits into scikit-learn:master on Aug 10, 2016

Conversation

tguillemot
Copy link
Contributor

This PR is to simplify the integration of the BayesianGaussianMixture class (#6651).
I've simplified the GaussianMixture class by integrating a function which computes the determinant of the Cholesky decomposition of the precision matrix (which will be useful for BayesianGaussianMixture).
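For illustration only, a minimal sketch of such a helper for the 'full' covariance type, assuming each precision matrix factors as L @ L.T with L lower triangular (the function name and the convention are assumptions of this sketch, not the exact scikit-learn internals):

import numpy as np

def log_det_precision_cholesky(precisions_chol):
    """Log-determinant of each Cholesky factor of the precision matrices.

    precisions_chol : array, shape (n_components, n_features, n_features)
        Lower-triangular factors L_k with precision_k = L_k @ L_k.T
        (illustrative convention).
    """
    # The determinant of a triangular matrix is the product of its diagonal,
    # so its log-determinant is the sum of the logs of the diagonal entries.
    return np.sum(np.log(np.diagonal(precisions_chol, axis1=1, axis2=2)), axis=1)

# Toy usage: one component with identity precision, log-det of its factor is 0.
chol = np.linalg.cholesky(np.eye(3))[np.newaxis, :, :]
print(log_det_precision_cholesky(chol))  # [0.]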

I've also corrected a bug in the EM process: normally, the lower bound should be computed after the M-step, not after the E-step. It's not a problem for GaussianMixture, but it creates problems for BayesianGaussianMixture.
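To make the ordering concrete, here is a self-contained toy 1-D EM loop (illustrative only, not scikit-learn code; the "bound" is simply the mean log-likelihood here): the value used for the convergence check is assigned only after the M-step, so it always refers to the freshly updated parameters.

import numpy as np
from scipy.stats import norm

rng = np.random.RandomState(0)
X = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])

weights = np.array([0.5, 0.5])
means = np.array([-1.0, 1.0])
stds = np.array([1.0, 1.0])

lower_bound = -np.inf
for n_iter in range(100):
    prev_lower_bound = lower_bound

    # E-step: responsibilities under the current parameters.
    weighted = weights * norm.pdf(X[:, None], means, stds)
    resp = weighted / weighted.sum(axis=1, keepdims=True)

    # M-step: update the parameters from the responsibilities.
    nk = resp.sum(axis=0)
    weights = nk / X.shape[0]
    means = (resp * X[:, None]).sum(axis=0) / nk
    stds = np.sqrt((resp * (X[:, None] - means) ** 2).sum(axis=0) / nk)

    # The bound (mean log-likelihood of the updated model) is computed
    # after the M-step, mirroring the ordering fix described above.
    weighted = weights * norm.pdf(X[:, None], means, stds)
    lower_bound = np.log(weighted.sum(axis=1)).mean()

    if abs(lower_bound - prev_lower_bound) < 1e-6:
        break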

@agramfort @ogrisel @amueller Can you have a look?

The purpose here is to prepare the integration of BayesianGaussianMixture.
for init in range(n_init):
    self._print_verbose_msg_init_beg(init)

    if do_init:
        self._initialize_parameters(X)
    current_log_likelihood, resp = self._e_step(X)
    self.lower_bound_ = np.infty
Member

shouldn't you document the new attribute?

Contributor Author

Indeed, sorry for that mistake.

Member

I didn't know np.infty also returns np.inf... Interesting...

Contributor

Shouldn't it be self.lower_bound_ = -np.infty?

Contributor

It has been merged, but I re-ask the question: shouldn't it be -np.infty?

Contributor Author

@ngoix Sorry, indeed I forgot that. I don't know why I missed your comment, sorry about that.
I've fixed it in #7180.
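For context, the sign matters because the best run over the n_init initializations is selected by keeping the maximum lower bound, and that running maximum has to start at -inf so that the first finite value always replaces it. A toy sketch (the helper name and signature are hypothetical, not the scikit-learn code):

import numpy as np

def best_of_inits(run_one_init, n_init=5):
    """Keep the result of the initialization with the highest lower bound.

    run_one_init is assumed to return a (lower_bound, params) pair for a
    single initialization; both the name and signature are hypothetical.
    """
    max_lower_bound = -np.inf  # starting at +inf would never be improved upon
    best_params = None
    for _ in range(n_init):
        lower_bound, params = run_one_init()
        if lower_bound > max_lower_bound:
            max_lower_bound, best_params = lower_bound, params
    return max_lower_bound, best_params

# Toy usage with a fake "EM run" that just draws a random score.
rng = np.random.RandomState(0)
print(best_of_inits(lambda: (rng.randn(), None)))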

@agramfort
Member

that's it for me

@tguillemot tguillemot changed the title Modification of GaussianMixture class. [MRG] Modification of GaussianMixture class. Aug 1, 2016
@tguillemot tguillemot changed the title [MRG] Modification of GaussianMixture class. [MRG+1] Modification of GaussianMixture class. Aug 2, 2016
@tguillemot
Contributor Author

@amueller @ogrisel @raghavrv If you have time to do another review :)

@@ -136,7 +136,7 @@ def _initialize_parameters(self, X):
        ----------
        X : array-like, shape (n_samples, n_features)
        """
        n_samples = X.shape[0]
        n_samples, _ = X.shape
Member

Why this change?

Member

Sorry if this was suggested before...

Contributor Author
tguillemot commented Aug 2, 2016

It's just that I prefer it like this, and it also checks that the shape of X has exactly two dimensions.
Sorry for these little modifications.

Member

Okay. Thx!

Member

it also checks implicitly that X.ndim == 2. I do the same in my code.
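As a small standalone illustration of that implicit check (not code from the PR):

import numpy as np

X = np.arange(6.0).reshape(3, 2)
n_samples, _ = X.shape         # fine: X is 2-D, n_samples == 3

X_1d = np.arange(3.0)
try:
    n_samples, _ = X_1d.shape  # 1-D input: the unpacking fails
except ValueError as exc:
    print(exc)                 # not enough values to unpack (expected 2, got 1)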

@@ -563,6 +553,9 @@ class GaussianMixture(BaseMixture):

    n_iter_ : int
        Number of step used by the best fit of EM to reach the convergence.

    lower_bound_ : float
Member

lower_bound_ -> best_log_likelihood_ ?

Contributor Author

In fact, for GMM this is the best log-likelihood, but for VBGMM it is the lower bound.
I've chosen lower_bound_ because it is the most understandable name for both.

Contributor Author

Done.
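For reference, a sketch of how the attribute entry might read in the class docstring; the wording is illustrative, not the exact text that was merged:

class GaussianMixture:
    """(docstring excerpt)

    Attributes
    ----------
    lower_bound_ : float
        Lower bound value on the log-likelihood of the best fit of EM.
        For plain GaussianMixture this is the best log-likelihood itself;
        for the variational (Bayesian) variant it is the variational
        lower bound.
    """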

@tguillemot
Contributor Author

@amueller @ogrisel I really need this to be merged before you review #6651.

@tguillemot
Contributor Author

@ngoix This PR is related to the Bayesian Gaussian Mixture; could you review it if you have some time?
Thx in advance :)

@agramfort
Member

one more +1 and we're good here...


for n_iter in range(self.max_iter):
    prev_log_likelihood = current_log_likelihood
    prev_log_likelihood = self.lower_bound_
Contributor

prev_log_likelihood -> prev_lower_bound ?

@ngoix
Contributor

ngoix commented Aug 8, 2016

While testing the code, I realized that there is a problem which was here before this PR:
in gaussian_mixture._set_parameters(), self.covariances_ is not updated. As a result, with n_init > 1 the covariances returned do not correspond to the means.


    def _estimate_log_weights(self):
        return np.log(self.weights_)

    def _compute_lower_bound(self, _, log_prob_norm):
        return log_prob_norm

Contributor

In _set_parameters below, compute self.covariances_ from self.precisions_.
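A minimal sketch of how the covariances could be recovered from the precision Cholesky factors for the 'full' covariance type, assuming the convention precision_k = L_k @ L_k.T with L_k lower triangular (the function name and the convention are assumptions of this sketch, not the scikit-learn internals):

import numpy as np
from scipy import linalg

def covariances_from_precisions_cholesky(precisions_chol):
    """Recover full covariance matrices from Cholesky factors of the precisions."""
    covariances = np.empty_like(precisions_chol)
    for k, prec_chol in enumerate(precisions_chol):
        # covariance = inv(precision) = inv(L @ L.T) = inv(L).T @ inv(L)
        inv_l = linalg.solve_triangular(
            prec_chol, np.eye(prec_chol.shape[0]), lower=True)
        covariances[k] = inv_l.T @ inv_l
    return covariances

# Toy check: for a diagonal precision, the covariance is its element-wise inverse.
prec = np.diag([4.0, 1.0])
chol = np.linalg.cholesky(prec)[np.newaxis]
print(covariances_from_precisions_cholesky(chol))  # [[[0.25, 0.], [0., 1.]]]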

@ngoix
Contributor

ngoix commented Aug 8, 2016

That's all from me!

@tguillemot
Contributor Author

@ngoix Indeed, you're right; I'd missed that. Thanks.
I've corrected it and added a test to check it.

@tguillemot
Contributor Author

@agramfort merge?

@tguillemot tguillemot changed the title [MRG+1] Modification of GaussianMixture class. [MRG+2] Modification of GaussianMixture class. Aug 8, 2016
@tguillemot
Contributor Author

@amueller Can you merge, please?

@agramfort agramfort merged commit 65b7d7a into scikit-learn:master Aug 10, 2016
@agramfort
Member

Thanks @tguillemot

@tguillemot
Contributor Author

@agramfort @ngoix Thanks

TomDLT pushed a commit to TomDLT/scikit-learn that referenced this pull request Oct 3, 2016
* Modification of GaussianMixture class.

The purpose here is to prepare the integration of BayesianGaussianMixture.

* Fix comments.

* Modification of the Docstring.

* Add license and author.

* Fix review and add tests for init.