-
Notifications
You must be signed in to change notification settings - Fork 602
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mask for pca, normalize_pearson_residuals_pca, scatterplot, scale #2272
Conversation
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #2272 +/- ##
==========================================
+ Coverage 73.12% 73.25% +0.12%
==========================================
Files 111 111
Lines 12127 12200 +73
==========================================
+ Hits 8868 8937 +69
- Misses 3259 3263 +4
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for getting this together!
Questions
If we want the default to be using highly_variable
if it's present, how do we let people not use a mask if "highly_variable"
is in var
?
I'm thinking the default may need to be the _Empty
singleton, so we can explicitly check if mask=None
was passed.
Requested changes
Collapse mask arguments
Instead of having two mask related arguments, you can collapse them into one:
mask: Union[np.ndarray, str, None] = None,
Which is validated via some function, kinda like:
def check_mask(
adata: AnnData,
mask: Union[str, np.ndarray],
axis: int = 0,
) -> np.ndarray: # Could also be a series, but should be one or the other
"""
Validate mask argument
Params
------
adata
mask
The mask. Either an appropriatley sized boolean array, or name of a column which will be used to mask.
axis
The axis being masked
"""
if isinstance(mask, str):
annot = ("obs", "var")[axis]
mask_array = annot[mask].values
else:
if len(mask) != adata.shape[axis]:
raise ValueError(
... # Shapes don't match
)
mask_array = mask
if not pd.api.types.is_bool_dtype(mask_array):
raise ValueError(
... # mask array must be boolean, was {whatever dtype is was}
)
return mask_array
Tests
I think a number of the tests may have broken from this PR since the logic for the masks isn't quite right. A good way to help debugging this is to split up the cases from the mask test into multiple ones. E.g., one that checks for the default behavior.
Extra data files
Some extra data files got added. Could you remove those?
Co-authorship
@tothmarcella should be a co-author on this PR, right? Could you make a commit with her listed as a co-author? Here're some docs on how to do that.
Co-authored-by: tothmarcella [email protected]
Co-authored-by: tothmarcella [email protected]
Co-authored-by: tothmarcella [email protected]
merge updates
Possible TODO:
@ivirshup I reverted the change in a6290ee where you changed -X_pca = np.zeros((X.shape[0], n_comps), X.dtype)
+X_pca = np.zeros((adata_comp.shape[0], n_comps), adata.X.dtype) the commit message is “Fix up pca tests”, but that change doesn’t seem to impact tests and it takes properties from several different object without reasoning. |
mask parameter added to pca method in _pca.py
test_pca_mask added to test_pca.py
Deprecation warning on use_highly_variable parameter added to test_deprecations.py
rendered docs