Mask for pca, normalize_pearson_residuals_pca, scatterplot, scale #2272

chelseabright96 · 2022-06-14T06:40:19Z

mask parameter added to pca method in _pca.py
test_pca_mask added to test_pca.py
Deprecation warning on use_highly_variable parameter added to test_deprecations.py

rendered docs

codecov · 2022-06-14T06:55:26Z

Codecov Report

Merging #2272 (7839a73) into master (05dcf68) will increase coverage by 0.12%.
The diff coverage is 89.65%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2272      +/-   ##
==========================================
+ Coverage   73.12%   73.25%   +0.12%     
==========================================
  Files         111      111              
  Lines       12127    12200      +73     
==========================================
+ Hits         8868     8937      +69     
- Misses       3259     3263       +4

Files	Coverage Δ
scanpy/experimental/pp/_normalization.py	`95.12% <100.00%> (+1.29%)`	⬆️
scanpy/get/__init__.py	`100.00% <100.00%> (ø)`
scanpy/preprocessing/_docs.py	`100.00% <100.00%> (ø)`
scanpy/preprocessing/_utils.py	`46.87% <100.00%> (+1.71%)`	⬆️
scanpy/tools/_rank_genes_groups.py	`94.35% <100.00%> (+0.13%)`	⬆️
scanpy/tools/_umap.py	`71.64% <ø> (ø)`
scanpy/tools/_utils.py	`71.95% <ø> (ø)`
scanpy/_utils/__init__.py	`65.24% <50.00%> (-0.09%)`	⬇️
scanpy/get/get.py	`92.63% <94.44%> (+0.18%)`	⬆️
scanpy/preprocessing/_simple.py	`82.89% <94.73%> (+0.67%)`	⬆️
... and 3 more

... and 1 file with indirect coverage changes

ivirshup

Thanks for getting this together!

Questions

If we want the default to be using highly_variable if it's present, how do we let people not use a mask if "highly_variable" is in var?

I'm thinking the default may need to be the _Empty singleton, so we can explicitly check if mask=None was passed.

Requested changes

Collapse mask arguments

Instead of having two mask related arguments, you can collapse them into one:

    mask: Union[np.ndarray, str, None] = None,

Which is validated via some function, kinda like:

from typing import Union

import numpy as np
import pandas as pd
from anndata import AnnData


def check_mask(
    adata: AnnData,
    mask: Union[str, np.ndarray],
    axis: int = 0,
) -> np.ndarray:  # Could also be a series, but should be one or the other
    """
    Validate mask argument

    Params
    ------
    adata
    mask
        The mask. Either an appropriately sized boolean array, or the name of a column which will be used to mask.
    axis
        The axis being masked
    """
    if isinstance(mask, str):
        # Look the column up in adata.obs (axis=0) or adata.var (axis=1)
        annot = getattr(adata, ("obs", "var")[axis])
        mask_array = annot[mask].values
    else:
        if len(mask) != adata.shape[axis]:
            raise ValueError(
                f"Mask length {len(mask)} does not match axis length {adata.shape[axis]}."
            )
        mask_array = np.asarray(mask)

    if not pd.api.types.is_bool_dtype(mask_array):
        raise ValueError(
            f"Mask array must be boolean, was {mask_array.dtype}."
        )

    return mask_array

Tests

I think a number of the tests may have been broken by this PR, since the logic for the masks isn't quite right. A good way to debug this is to split the cases in the mask test into separate tests, e.g. one that checks only the default behavior.
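As a rough illustration of that splitting, the sketch below gives each behavior its own test function, so a failure points at exactly one case. `validate_mask` is a hypothetical stand-in that works on plain lists; it is not the function in this PR, and a real suite would likely use `pytest.raises` rather than `try`/`except`. The default-behavior case would get its own test in the same way.

```python
from typing import List, Sequence


def validate_mask(mask: Sequence[object], n: int) -> List[bool]:
    """Hypothetical stand-in for the mask validation under test."""
    if len(mask) != n:
        raise ValueError(f"mask has length {len(mask)}, expected {n}")
    if not all(isinstance(m, bool) for m in mask):
        raise ValueError("mask must contain only booleans")
    return list(mask)


def test_valid_mask_is_returned_unchanged():
    assert validate_mask([True, False, True], n=3) == [True, False, True]


def test_wrong_length_raises():
    try:
        validate_mask([True, False], n=3)
    except ValueError as err:
        assert "length" in str(err)
    else:
        raise AssertionError("expected ValueError")


def test_non_boolean_raises():
    try:
        validate_mask([1, 0, 1], n=3)
    except ValueError as err:
        assert "booleans" in str(err)
    else:
        raise AssertionError("expected ValueError")


# Run each case on its own so a failure names exactly one behavior.
for test in (
    test_valid_mask_is_returned_unchanged,
    test_wrong_length_raises,
    test_non_boolean_raises,
):
    test()
```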

Extra data files

Some extra data files got added. Could you remove those?

Co-authorship

@tothmarcella should be a co-author on this PR, right? Could you make a commit with her listed as a co-author? Here're some docs on how to do that.

flying-sheep · 2023-11-13T09:26:23Z

Possible TODO:

normalize_pearson_residuals_pca

@ivirshup I reverted the change in a6290ee where you changed

-X_pca = np.zeros((X.shape[0], n_comps), X.dtype)
+X_pca = np.zeros((adata_comp.shape[0], n_comps), adata.X.dtype)

The commit message is “Fix up pca tests”, but that change doesn’t seem to affect the tests, and it takes properties from several different objects without explanation.
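The concern about mixing objects can be sketched in plain NumPy: the output's shape and dtype are both taken from the one array that is actually decomposed. This is only an illustration under assumptions; `mask_var`, `X_sub`, and the zero-filled loadings for masked-out genes are hypothetical choices here, not a statement of what scanpy does.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 5)).astype(np.float32)          # 6 observations, 5 genes
mask_var = np.array([True, False, True, True, False])   # hypothetical gene mask
n_comps = 2

# Decompose only the masked-in genes.
X_sub = X[:, mask_var]
X_centered = X_sub - X_sub.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Shape and dtype both come from the array that was decomposed.
X_pca = np.zeros((X_sub.shape[0], n_comps), X_sub.dtype)
X_pca[:] = U[:, :n_comps] * S[:n_comps]

# Scatter the loadings back so masked-out genes get all-zero rows.
PCs = np.zeros((X.shape[1], n_comps), X.dtype)
PCs[mask_var] = Vt[:n_comps].T
```

A mask over genes leaves the number of rows in `X_pca` unchanged; only the loadings need to be mapped back to the full gene axis.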

addressed

chelseabright96 added 3 commits June 12, 2022 17:19

add mask to PCA

f0cc7a8

add tests for mask param

c075516

fix bugs on test_pca_mask

cf7cc19

chelseabright96 added 2 commits June 14, 2022 10:08

run pre-commit

142631f

fix failing tests

c3c9b6f

ivirshup previously requested changes Jun 15, 2022

chelseabright96 and others added 6 commits June 16, 2022 10:09

fix failing tests

a9a7327

Co-authored-by: tothmarcella [email protected]

add changes from review comments

45ab2e0

Co-authored-by: tothmarcella [email protected]

add changes from review comments

40a14b2

Co-authored-by: tothmarcella [email protected]

Merge pull request #1 from scverse/master

865f78b

merge updates

Merge branch 'master' into mask

6de3161

Merge remote-tracking branch 'origin/mask' into mask

dbef528

ivirshup reviewed Jun 17, 2022

scanpy/preprocessing/_pca.py (outdated, resolved)

tothmarcella and others added 5 commits June 17, 2022 20:43

fix test, add None option to mask

03522d3

fix test and add none option for mask

92d622c

fix bugs for tests

da5033b

fix bugs for tests

2fc1f1b

fix bugs for tests

b7e0de3

chelseabright96 force-pushed the mask branch from 92d622c to b7e0de3 Compare June 18, 2022 05:45

chelseabright96 and others added 10 commits June 18, 2022 11:59

fix tests

f35fa26

fix tests for mask

2c47282

add tests for umap mask

3e5a733

add tests for umap mask

0b08f44

add mask to umap

a54a2a3

none option for mask

9d20fd2

none option for mask

f3976f8

trying to remove files

271968f

Merge branch 'mask_plotting' of https://github.com/chelseabright96/scanpy into mask_plotting

9a72b84

fixes, checks, tests

b75a78b

remove duplicated piece of code

b5c007b

flying-sheep reviewed Nov 10, 2023

scanpy/plotting/_tools/scatterplots.py (outdated, resolved)

flying-sheep added 3 commits November 10, 2023 15:13

dedupe warning msg

7afbeb7

Dedupe scale_array

ae2426e

simplify _check_mask

65b6f75

flying-sheep added 17 commits November 13, 2023 10:55

extract mask param handling

d10c6a9

revert weird change to X_pca

6d690de

update docs

b3446e6

Merge branch 'master' into mask

2418c6a

empty repr

6b9eb91

fix 3.9 compat

f303727

Merge branch 'master' into mask

4d3417e

remove duplicated parameter set

f73b523

more compact tests

20afeeb

PCA test improvements

71828ca

Streamline rank_genes_groups tests

cea8e11

File extensions

4f86e8b

Merge branch 'master' into mask

e2b48a1

no need for _empty in rank_genes_groups

1a0d2b7

Add mask support to experimental.normalize_pearson_residuals_pca

0b5a54d

oops

e3456be

relnotes update

7e7b9c9

flying-sheep changed the title ~~Mask for pca, scatterplot, scale~~ Mask for pca, normalize_pearson_residuals_pca, scatterplot, scale Nov 14, 2023

flying-sheep added 4 commits November 14, 2023 13:59

Better tests

60016bb

prepare tests for mask

6c4fdb0

add mask test

643a84f

clean up changed import lines

7839a73

flying-sheep merged commit 973c4c3 into scverse:master Nov 21, 2023
11 checks passed
