Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mask for pca, normalize_pearson_residuals_pca, scatterplot, scale #2272

Merged
merged 71 commits into from
Nov 21, 2023

Conversation

chelseabright96
Copy link
Contributor

@chelseabright96 chelseabright96 commented Jun 14, 2022

mask parameter added to pca method in _pca.py
test_pca_mask added to test_pca.py
Deprecation warning on use_highly_variable parameter added to test_deprecations.py

rendered docs

@codecov
Copy link

codecov bot commented Jun 14, 2022

Codecov Report

Merging #2272 (7839a73) into master (05dcf68) will increase coverage by 0.12%.
The diff coverage is 89.65%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2272      +/-   ##
==========================================
+ Coverage   73.12%   73.25%   +0.12%     
==========================================
  Files         111      111              
  Lines       12127    12200      +73     
==========================================
+ Hits         8868     8937      +69     
- Misses       3259     3263       +4     
Files Coverage Δ
scanpy/experimental/pp/_normalization.py 95.12% <100.00%> (+1.29%) ⬆️
scanpy/get/__init__.py 100.00% <100.00%> (ø)
scanpy/preprocessing/_docs.py 100.00% <100.00%> (ø)
scanpy/preprocessing/_utils.py 46.87% <100.00%> (+1.71%) ⬆️
scanpy/tools/_rank_genes_groups.py 94.35% <100.00%> (+0.13%) ⬆️
scanpy/tools/_umap.py 71.64% <ø> (ø)
scanpy/tools/_utils.py 71.95% <ø> (ø)
scanpy/_utils/__init__.py 65.24% <50.00%> (-0.09%) ⬇️
scanpy/get/get.py 92.63% <94.44%> (+0.18%) ⬆️
scanpy/preprocessing/_simple.py 82.89% <94.73%> (+0.67%) ⬆️
... and 3 more

... and 1 file with indirect coverage changes

Copy link
Member

@ivirshup ivirshup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for getting this together!

Questions

If we want the default to be using highly_variable if it's present, how do we let people not use a mask if "highly_variable" is in var?

I'm thinking the default may need to be the _Empty singleton, so we can explicitly check if mask=None was passed.

Requested changes

Collapse mask arguments

Instead of having two mask related arguments, you can collapse them into one:

    mask: Union[np.ndarray, str, None] = None,

Which is validated via some function, kinda like:

def check_mask(
    adata: AnnData,
    mask: Union[str, np.ndarray],
    axis: int = 0,
) -> np.ndarray:  # Could also be a series, but should be one or the other
    """
    Validate mask argument

    Params
    ------
    adata
    mask
        The mask. Either an appropriatley sized boolean array, or name of a column which will be used to mask.
    axis
        The axis being masked
    """
    if isinstance(mask, str):
        annot = ("obs", "var")[axis]
        mask_array = annot[mask].values
    else:
        if len(mask) != adata.shape[axis]:
            raise ValueError(
                ... # Shapes don't match
            )
        mask_array = mask

    if not pd.api.types.is_bool_dtype(mask_array):
        raise ValueError(
            ... # mask array must be boolean, was {whatever dtype is was}
        )

    return mask_array

Tests

I think a number of the tests may have broken from this PR since the logic for the masks isn't quite right. A good way to help debugging this is to split up the cases from the mask test into multiple ones. E.g., one that checks for the default behavior.

Extra data files

Some extra data files got added. Could you remove those?

Co-authorship

@tothmarcella should be a co-author on this PR, right? Could you make a commit with her listed as a co-author? Here're some docs on how to do that.

scanpy/preprocessing/_pca.py Show resolved Hide resolved
scanpy/preprocessing/_pca.py Outdated Show resolved Hide resolved
scanpy/preprocessing/_pca.py Outdated Show resolved Hide resolved
scanpy/preprocessing/_pca.py Outdated Show resolved Hide resolved
scanpy/tests/test_deprecations.py Outdated Show resolved Hide resolved
scanpy/tests/test_pca.py Outdated Show resolved Hide resolved
@flying-sheep
Copy link
Member

flying-sheep commented Nov 13, 2023

Possible TODO:

  • normalize_pearson_residuals_pca

@ivirshup I reverted the change in a6290ee where you changed

-X_pca = np.zeros((X.shape[0], n_comps), X.dtype)
+X_pca = np.zeros((adata_comp.shape[0], n_comps), adata.X.dtype)

the commit message is “Fix up pca tests”, but that change doesn’t seem to impact tests and it takes properties from several different object without reasoning.

@flying-sheep flying-sheep changed the title Mask for pca, scatterplot, scale Mask for pca, normalize_pearson_residuals_pca, scatterplot, scale Nov 14, 2023
@flying-sheep flying-sheep merged commit 973c4c3 into scverse:master Nov 21, 2023
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants