Normalization and gene selection by analytical Pearson residuals #1715

jlause · 2021-03-03T22:21:20Z

Hi everyone,

a while back, @giovp asked me to create a pull request that integrates analytical Pearson residuals in scanpy. We already discussed a bit with @LuckyMD and @ivirshup over at berenslab/umi-normalization#1 how to structure it, and now I made a version that should be ready for review.

As discussed earlier, this pull request implements two core methods:

sc.pp.normalize_pearson_residuals(), which applies the method to adata.X. Overall, the function is very similar in structure to sc.pp.normalize_total() (support for layers, inplace operation etc).
sc.pp.highly_variable_genes(flavor='pearson_residuals'), which selects genes based on Pearson residual variance. The "inner" function _highly_variable_pearson_residuals() is structured similarly to _highly_variable_seurat_v3() (support for multiple batches, median ranks for tie breaking). It includes the chunksize argument to allow for memory-efficient computation of the residual variance.

We discussed quite a lot how to implement a third function that would bundle gene selection, normalization by analytical residuals and PCA. This PR includes the two options that emerged at the end of that discussion, so now we have to choose ;)

sc.pp.recipe_pearson_residuals() which does HVG selection and normalization both via Pearson residuals prior to PCA
sc.pp.normalize_pearson_residuals_pca() which applies any HVG selection if the user previously added one to the adata object, and then normalizes via Pearson residuals and does PCA.

Both functions retain the raw input counts as adata.X and add fields for PCA/Normalization/HVG selection results (or return them) as applicable, most importantly the X_pca in adata.obsm['pearson_residuals_X_pca']. I hope this addresses some of the issues we discussed over at the other repo in a scanpy-y way.

Let me know what you think and where you think improvements are needed!

Cheers, Jan.

giovp · 2021-03-04T10:55:12Z

thanks @jlause ! Really excited for this!
shall be able to start reviewing during weekend. quick q to @ivirshup , does #1667 somehow potentially conflict with this?

thank you!

jlause · 2021-03-04T14:17:38Z

thanks @jlause ! Really excited for this!
shall be able to start reviewing during weekend. quick q to @ivirshup , does #1667 somehow potentially conflict with this?

thank you!

cool, looking forward to your feedback!
I had a brief look at #1667 - since I use the same layer management that is now to be changed in normalize_total(), it would make sense if I mirror the change in normalize_pearson_residuals(), right? I believe doing that will even simplify the function further. If @ivirshup agrees, I could quickly do that :)

ivirshup · 2021-03-05T04:15:35Z

it would make sense if I mirror the change in normalize_pearson_residuals(), right? I believe doing that will even simplify the function further. If ivirshup agrees, I could quickly do that :)

Sounds good to me!

jlause · 2021-03-05T16:19:54Z

I finished the edits to conform with #1667! Now looking forward to your thoughts :)

One note on the coding style checks (which I'm not very experienced with): When I activate pre-commit locally, it finds quite a number of style violations in parts of the code that I did not touch and automatically fixes them. This causes many changes that are unrelated to the code I wrote. That's why I disabled pre-commit again (so you don't have to go over all these changes), and tried to follow the style guide manually as good as possible. Hope that is ok for now..

Do you have any advice how to handle that? Should I just do one "style" commit (that fixes all these issues throughout the files I work on here) once you've checked the new parts of the code I wrote? Or should the style be ok in all "old" parts of the code, implying that I set up pre-commit wrong? I'm new to it so that could very well be the case as well..

ivirshup · 2021-03-07T04:17:27Z

One note on the coding style checks (which I'm not very experienced with): When I activate pre-commit locally, it finds quite a number of style violations in parts of the code that I did not touch and automatically fixes them

Could you share the output you're getting here? The output I'm seeing on the pre-commit CI looks pretty normal, so it'd be helpful to know more exactly what was happening locally. We've also just activated pre-commit, so it's a bit new to us too!

jlause · 2021-03-09T16:48:07Z

Hey @ivirshup,
below is the output after making a small test edit to _highly_variable_genes.py and trying to commit.
Could it be that the file has not been style-checked so far (or not on the branch/version I was forking from?), leading to all these automatic changes/style violations in the old part of the code?

hope this helps, let me know if you need anything else.

jlause@8b38045532aa:~/libs/scanpy/scanpy/preprocessing$ git commit -m "test commit"
black....................................................................Failed
- hook id: black
- files were modified by this hook

reformatted scanpy/preprocessing/_highly_variable_genes.py
All done! ✨ 🍰 ✨
1 file reformatted.

flake8...................................................................Failed
- hook id: flake8
- exit code: 1

scanpy/preprocessing/_highly_variable_genes.py:33:80: E501 line too long (84 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:37:80: E501 line too long (81 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:49:80: E501 line too long (102 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:51:80: E501 line too long (89 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:108:80: E501 line too long (82 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:123:80: E501 line too long (83 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:204:27: W291 trailing whitespace
scanpy/preprocessing/_highly_variable_genes.py:219:5: F821 undefined name 'view_to_actual'
scanpy/preprocessing/_highly_variable_genes.py:220:9: F821 undefined name '_get_obs_rep'
scanpy/preprocessing/_highly_variable_genes.py:244:80: E501 line too long (85 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:257:80: E501 line too long (85 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:282:80: E501 line too long (88 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:291:80: E501 line too long (85 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:299:80: E501 line too long (84 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:305:80: E501 line too long (85 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:335:80: E501 line too long (86 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:341:80: E501 line too long (80 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:350:80: E501 line too long (81 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:365:80: E501 line too long (84 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:392:80: E501 line too long (82 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:417:80: E501 line too long (83 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:424:80: E501 line too long (87 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:440:80: E501 line too long (86 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:449:80: E501 line too long (88 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:460:23: F541 f-string is missing placeholders
scanpy/preprocessing/_highly_variable_genes.py:463:80: E501 line too long (82 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:512:80: E501 line too long (81 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:518:80: E501 line too long (85 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:519:80: E501 line too long (81 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:520:80: E501 line too long (81 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:521:80: E501 line too long (87 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:529:80: E501 line too long (90 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:531:80: E501 line too long (85 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:531:86: W291 trailing whitespace
scanpy/preprocessing/_highly_variable_genes.py:534:80: E501 line too long (90 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:535:79: W291 trailing whitespace
scanpy/preprocessing/_highly_variable_genes.py:538:80: E501 line too long (90 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:539:79: W291 trailing whitespace
scanpy/preprocessing/_highly_variable_genes.py:542:80: E501 line too long (90 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:543:79: W291 trailing whitespace
scanpy/preprocessing/_highly_variable_genes.py:546:80: E501 line too long (90 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:547:79: W291 trailing whitespace
scanpy/preprocessing/_highly_variable_genes.py:550:80: E501 line too long (87 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:556:67: W291 trailing whitespace
scanpy/preprocessing/_highly_variable_genes.py:559:80: E501 line too long (87 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:560:80: E501 line too long (88 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:560:89: W291 trailing whitespace
scanpy/preprocessing/_highly_variable_genes.py:563:80: E501 line too long (90 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:564:1: W293 blank line contains whitespace
scanpy/preprocessing/_highly_variable_genes.py:565:80: E501 line too long (81 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:569:1: W293 blank line contains whitespace
scanpy/preprocessing/_highly_variable_genes.py:571:80: E501 line too long (89 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:572:80: E501 line too long (88 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:572:89: W291 trailing whitespace
scanpy/preprocessing/_highly_variable_genes.py:575:80: E501 line too long (83 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:576:80: E501 line too long (83 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:579:80: E501 line too long (83 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:584:80: E501 line too long (97 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:585:80: E501 line too long (86 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:586:80: E501 line too long (84 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:587:80: E501 line too long (88 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:588:80: E501 line too long (90 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:589:76: W291 trailing whitespace
scanpy/preprocessing/_highly_variable_genes.py:590:80: E501 line too long (89 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:592:1: W293 blank line contains whitespace
scanpy/preprocessing/_highly_variable_genes.py:596:80: E501 line too long (85 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:608:80: E501 line too long (84 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:613:80: E501 line too long (93 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:618:80: E501 line too long (80 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:621:80: E501 line too long (89 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:623:80: E501 line too long (93 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:656:80: E501 line too long (80 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:693:80: E501 line too long (80 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:713:80: E501 line too long (88 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:735:80: E501 line too long (82 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:737:80: E501 line too long (83 > 79 characters)
scanpy/preprocessing/_highly_variable_genes.py:742:80: E501 line too long (80 > 79 characters)

git status and git diff show the automatic changes pre-commit makes:

jlause@8b38045532aa:~/libs/scanpy/scanpy/preprocessing$ git status
On branch pearson_residuals_1.7
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	modified:   _highly_variable_genes.py

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   _highly_variable_genes.py

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	../../.pre-commit-config.yaml

jlause@8b38045532aa:~/libs/scanpy/scanpy/preprocessing$ git diff _highly_variable_genes.py 
diff --git a/scanpy/preprocessing/_highly_variable_genes.py b/scanpy/preprocessing/_highly_variable_genes.py
index 03b01940..e2851f50 100644
--- a/scanpy/preprocessing/_highly_variable_genes.py
+++ b/scanpy/preprocessing/_highly_variable_genes.py
@@ -15,7 +15,8 @@ from ._utils import _get_mean_var
 from ._distributed import materialize_as_ndarray
 from ._simple import filter_genes
 
-#testedit
+# testedit
+
 
 def _highly_variable_genes_seurat_v3(
     adata: AnnData,
@@ -98,7 +99,9 @@ def _highly_variable_genes_seurat_v3(
         else:
             clip_val_broad = np.broadcast_to(clip_val, batch_counts.shape)
             np.putmask(
-                batch_counts, batch_counts > clip_val_broad, clip_val_broad,
+                batch_counts,
+                batch_counts > clip_val_broad,
+                clip_val_broad,
             )
 
         if sp_sparse.issparse(batch_counts):
@@ -173,6 +176,7 @@ def _highly_variable_genes_seurat_v3(
             df = df.drop(['highly_variable_nbatches'], axis=1)
         return df
 
+
 def _highly_variable_pearson_residuals(
     adata: AnnData,
     layer: Optional[str] = None,
@@ -182,7 +186,7 @@ def _highly_variable_pearson_residuals(
     clip: Union[Literal['auto', 'none'], float] = 'auto',
     chunksize: int = 100,
     subset: bool = False,
-    inplace: bool = True
+    inplace: bool = True,
 ) -> Optional[pd.DataFrame]:
     """\
     See `highly_variable_genes`.
@@ -212,7 +216,7 @@ def _highly_variable_pearson_residuals(
         in all batches
     """
 
-    view_to_actual(adata)        
+    view_to_actual(adata)
     X = _get_obs_rep(adata, layer=layer)
     computed_on = layer if layer else 'adata.X'
 
@@ -328,8 +332,7 @@ def _highly_variable_pearson_residuals(
     df = df.loc[adata.var_names]
 
     if inplace or subset:
-        adata.uns['hvg'] = {'flavor': 'pearson_residuals',
-                            'computed_on':computed_on}
+        adata.uns['hvg'] = {'flavor': 'pearson_residuals', 'computed_on': computed_on}
         logg.hint(
             'added\n'
             '    \'highly_variable\', boolean vector (adata.var)\n'

...skipping 1 line
                 ['highly_variable_nbatches', 'highly_variable_intersection'], axis=1
             )
         return df
-    
-    
-    
+
 
 def _highly_variable_genes_single_batch(
     adata: AnnData,
@@ -479,7 +480,6 @@ def _highly_variable_genes_single_batch(
     return df
 
 
-
 def highly_variable_genes(
     adata: AnnData,
     layer: Optional[str] = None,
@@ -493,7 +493,9 @@ def highly_variable_genes(
     theta: float = 100,
     clip: Union[Literal['auto', 'none'], float] = 'auto',
     chunksize: int = 1000,
-    flavor: Literal['seurat', 'cell_ranger', 'seurat_v3', 'pearson_residuals'] = 'seurat',
+    flavor: Literal[
+        'seurat', 'cell_ranger', 'seurat_v3', 'pearson_residuals'
+    ] = 'seurat',
     subset: bool = False,
     inplace: bool = True,
     batch_key: Optional[str] = None,
@@ -651,21 +653,20 @@ def highly_variable_genes(
     if flavor == 'pearson_residuals':
         if n_top_genes is None:
             raise ValueError(
-            "`pp.highly_variable_genes` requires the argument `n_top_genes`"
-            " for `flavor='pearson_residuals'`"
+                "`pp.highly_variable_genes` requires the argument `n_top_genes`"
+                " for `flavor='pearson_residuals'`"
             )
         return _highly_variable_pearson_residuals(
             adata,
-            layer = layer,
-            n_top_genes = n_top_genes,
-            batch_key = batch_key,
-            theta = theta,
-            clip = clip,
-            chunksize= chunksize,
-            subset = subset,
-            inplace = inplace,
+            layer=layer,
+            n_top_genes=n_top_genes,
+            batch_key=batch_key,
+            theta=theta,
+            clip=clip,
+            chunksize=chunksize,
+            subset=subset,
+            inplace=inplace,
         )
-        
 
     if batch_key is None:
         df = _highly_variable_genes_single_batch(

...skipping 1 line
 
             # Add 0 values for genes that were filtered out
             missing_hvg = pd.DataFrame(
-                np.zeros((np.sum(~filt), len(hvg.columns))), columns=hvg.columns,
+                np.zeros((np.sum(~filt), len(hvg.columns))),
+                columns=hvg.columns,
             )
             missing_hvg['highly_variable'] = missing_hvg['highly_variable'].astype(bool)
             missing_hvg['gene'] = gene_list[~filt]

LuckyMD · 2021-03-09T18:20:50Z

Just a quick note as I just read up on this... I think we have set up black with a line length of 88 chars, and not 79 at the moment.

ivirshup · 2021-03-10T03:06:03Z

Ah I think I see the issue! Feature branches should be based off master and directing the pull request there! I think what's happening is that a pre-commit hook was installed, but the config only exists on the master branch.

I think this should largely be manageable by rebasing onto master (e.g. git rebase --onto master 1.7.x) and changing the branch the PR is targeting via the github interface:

Side note: We're considering separating the highly_variable_genes interface into multiple functions, since the arguments to the different methods don't always overlap in meaningful or intuitive ways. There's nothing you need to do about this right now, but just a heads up to keep the logic for this method separate from the main function.

…ation and hvg selection

…al() from scverse#1667

…rson_residuals() as well

jlause · 2021-03-10T14:14:22Z

Ah I think I see the issue! Feature branches should be based off master and directing the pull request there! I think what's happening is that a pre-commit hook was installed, but the config only exists on the master branch.

I think this should largely be manageable by rebasing onto master (e.g. git rebase --onto master 1.7.x) and changing the branch the PR is targeting via the github interface:

Thanks a lot, I rebased and changed the PR target to master so I hope everything is on track now!
The pre-commit style checks were working as expected now (auto-edits only in the files / parts I edited).

Side note: We're considering separating the highly_variable_genes interface into multiple functions, since the arguments to the different methods don't always overlap in meaningful or intuitive ways. There's nothing you need to do about this right now, but just a heads up to keep the logic for this method separate from the main function.

Sounds good!

codecov · 2021-03-10T14:22:13Z

Codecov Report

Merging #1715 (970b0fa) into master (0728d55) will increase coverage by 0.48%.
The diff coverage is 96.56%.

@@            Coverage Diff             @@
##           master    #1715      +/-   ##
==========================================
+ Coverage   71.46%   71.95%   +0.48%     
==========================================
  Files          92       98       +6     
  Lines       11305    11531     +226     
==========================================
+ Hits         8079     8297     +218     
- Misses       3226     3234       +8

Impacted Files	Coverage Δ
scanpy/preprocessing/__init__.py	`100.00% <ø> (ø)`
scanpy/preprocessing/_recipes.py	`92.59% <ø> (ø)`
scanpy/experimental/pp/_normalization.py	`93.82% <93.82%> (ø)`
scanpy/experimental/pp/_highly_variable_genes.py	`97.08% <97.08%> (ø)`
scanpy/__init__.py	`100.00% <100.00%> (ø)`
scanpy/experimental/__init__.py	`100.00% <100.00%> (ø)`
scanpy/experimental/_docs.py	`100.00% <100.00%> (ø)`
scanpy/experimental/pp/__init__.py	`100.00% <100.00%> (ø)`
scanpy/experimental/pp/_recipes.py	`100.00% <100.00%> (ø)`
scanpy/preprocessing/_highly_variable_genes.py	`94.27% <100.00%> (+0.02%)`	⬆️
... and 1 more

giovp

thank you @jlause for the PR! This is really exciting and super useful!
This is a first round of review, most comments are re types, args and function behviour. I think it looks really good overall and maybe it's time to start writing tests ?
please let me know if anything unclear and also thanks in advance for code explanations!

scanpy/preprocessing/_highly_variable_genes.py

scanpy/preprocessing/_normalization.py

giovp · 2021-03-12T10:38:29Z

scanpy/preprocessing/_normalization.py

+def normalize_pearson_residuals(
+ adata: AnnData,
+ theta: float = 100,
+ clip: Union[Literal['auto', 'none'], float] = 'auto',


same as above re clipping

giovp · 2021-03-12T10:39:20Z

scanpy/preprocessing/_normalization.py

+def normalize_pearson_residuals_pca(
+ adata: AnnData,
+ theta: float = 100,
+ clip: Union[Literal['auto', 'none'], float] = 'auto',


scanpy/preprocessing/_normalization.py

scanpy/preprocessing/_recipes.py

jlause · 2021-03-12T11:37:14Z

thank you @jlause for the PR! This is really exciting and super useful!
This is a first round of review, most comments are re types, args and function behviour. I think it looks really good overall and maybe it's time to start writing tests ?
please let me know if anything unclear and also thanks in advance for code explanations!

Hey @giovp ,
thanks a lot for the review, this looks very helpful! I'll address the single points above one-by-one and make the required changes over the next few days! Will also add some first tests - are there formal guidelines what you expect to be tested? After looking at the tests for highly_variable_genes, from my naive perspective I'd test the following:

tests that check if combinations of input arguments lead to expected output (in terms of returned shapes/columns/...) and don't break the function
tests that check if warnings/errors are raised for "common mistakes" (inappropriate data, nonsense input argument combinations..)

Any advice/ideas what else should be tested?

giovp · 2021-03-12T12:32:25Z

tests that check if combinations of input arguments lead to expected output (in terms of returned shapes/columns/...) and don't break the function
tests that check if warnings/errors are raised for "common mistakes" (inappropriate data, nonsense input argument combinations..)

yes both makes sense, it would also be useful to come up with a dummy example for which the actual output could be tested against. This is done in seurat_v3 for instance, but in that case it's kind of straightforward because the "expected" is the output computed with original implementation (and as you catched in #1732 it's still might not be enough 😄 ).
another random thing that comes to mind re this specific case is to make sure that indexing etc. is consistent and robust, as you seem to have to sort and resort a fair bit in the hvg implementation.

on another note, I was thinking if it makes sense to also release a short tutorial together with the PR (that would be on theislab/scanpy_tutorials) ? I think that for a lot of people the term "pearson residuals" could be alienating, and so they'd rather stick to normalize_total for comfort (but they shouldn't!). So maybe just something easy like pearson res norm + umap and hvg plots ? curious to hear what you and the others @ivirshup @LuckyMD think about it.

…m and warning as in scverse#1642

giovp · 2022-02-28T15:29:52Z

for normalize_pearson_residual, i think it makes sense to keep normalize in, as it's not the same type of transformation compared to log1p.

Isn't this quite similar to what log1p does though? In that it's a transformation of the matrix?

I think it should stay normalize_pearson_residuals because it mirrors normalize_total.

for the rest, I think we are at a good stage, I'd ask @jlause to build docs locally cd scanpy/docs and then make clean and make html see https://scanpy.readthedocs.io/en/stable/dev/documentation.html#building-the-docs
and check that:

arguments and doc params match
typo and other minor issues still present (e.g. difficult phrasing).

if this gets approval, before merging to master todo:

add release note
go over scanpy_tutorials and re run tutorial and merge
link tutorial in docs

p.s. docs are failing for reasons I have haven't figured out yet

jlause · 2022-03-11T17:52:13Z

Hi!

for normalize_pearson_residual, i think it makes sense to keep normalize in, as it's not the same type of transformation compared to log1p.

Isn't this quite similar to what log1p does though? In that it's a transformation of the matrix?

I think it should stay normalize_pearson_residuals because it mirrors normalize_total.

I agree.

for the rest, I think we are at a good stage, I'd ask @jlause to build docs locally cd scanpy/docs and then make clean and make html see https://scanpy.readthedocs.io/en/stable/dev/documentation.html#building-the-docs and check that:
* arguments and doc params match

* typo and other minor issues still present (e.g. difficult phrasing).

I started doing that and will finish up tomorrow - there Qs in advance if you happen to look at this before:

sometimes we have math expressions like var = mean * mean^2 etc. in the docs. Is there a convention for scanpy docs if those should be in code format or just plain text? e.g. in the adata docstring the matrix shape is described as n_obs x n_var, but elsewhere we say "clipping is done by sqrt(n). I can consistently format them into code if you agree.
I think the .._pca function is missing from the release note. should I add it there?
The ..pca function also did not use shared docs params yet. I started adding them and can commit tomorrow - is that okay if I just do it like that?

if this gets approval, before merging to master todo:
* [x]  add release note

* [ ]  go over scanpy_tutorials and re run tutorial and merge

I've looked at that and commented in the respective github :)

Very happy we are getting this wrapped up now :)
Best! Jan

giovp · 2022-03-11T18:28:47Z

sometimes we have math expressions like var = mean * mean^2 etc. in the docs. Is there a convention for scanpy docs if those should be in code format or just plain text?

I'm not sure, I think with math is nicer but not aware of any convention. @ivirshup ?

I think the .._pca function is missing from the release note. should I add it there?
The ..pca function also did not use shared docs params yet. I started adding them and can commit tomorrow - is that okay if I just do it like that?

must say I missed those sorry, feel free to add and I'll take a look again tomorrow and wrap it up.

jlause · 2022-03-12T12:56:08Z

Hi, I worked over the docs completely once but still need to do a few small things (release note, final spell check) but need to leave my desk now - will do the final bits either later today or tomorrow!
Best jan

jlause · 2022-03-13T00:48:30Z

Hi @giovp ,
I'm done from my side of things: I have re-worded some parts of the docstrings (hopefully to better readability ;) ), added the missing function to the release note and tried to make the returns sections of the docs a bit more consistent.

Also, it seems that building the docs is failing again on github (locally it works with some warnings). Again I'm not sure why / if it is even related to my changes 🤔

Let me know if I can help with fixing that or if anything else comes up!

Best, Jan.

giovp · 2022-03-16T18:54:44Z

ok tutorial is merged, you can have a look how it renders here: https://scanpy-tutorials.readthedocs.io/en/latest/tutorial_pearson_residuals.html

I've fixed the tutorial.rst page and the release note. To me it looks good, I'd like to get @ivirshup approval on this before merging.

I'm done from my side of things: I have re-worded some parts of the docstrings (hopefully to better readability ;) ), added the missing function to the release note and tried to make the returns sections of the docs a bit more consistent.

really clear and coincise btw, great job

ivirshup · 2022-03-29T17:01:00Z

LGTM

Thanks for pushing and getting this through! And thanks for picking up the review on this @giovp!

giovp · 2022-03-29T17:27:31Z

thank you @ivirshup !

jlause · 2022-03-30T06:16:09Z

nice! thanks to both of you @giovp @ivirshup :)

Starlitnightly · 2023-06-06T10:33:30Z

Hi, @jlause

There's a issue when using normalize_pearson_residuals, it seems that we can't calculated the log2foldchange in rank_genes_groups will be failed. That's because np.expm1 can't restore the adata.X after normalize_pearson_residuals. Could you solve this issue that completed the downstream currently?

jlause mentioned this pull request Mar 4, 2021

Any interest in getting SCTransform up on Scanpy? #1643

Open

5 tasks

jlause force-pushed the pearson_residuals_1.7 branch from b9f0424 to 76073cc Compare March 5, 2021 14:24

jlause mentioned this pull request Mar 9, 2021

sc.pp.highly_variable_genes with flavor='seurat_v3' returns wrong means/variances when batch_key argument is used #1725

Closed

3 tasks

jlause added 7 commits March 10, 2021 14:42

adding core functions and documentation for pearson residual normaliz…

12e751d

…ation and hvg selection

adding Pearson residual+PCA bundles, minor bug fixes

5d57961

some style cleanup, minor fixes

fced3f2

adapting _normalize_pearson_residuals() to cleaned-up _normalized_tot…

977b6cf

…al() from scverse#1667

updating layer management as in scverse#1667 for _highly_variable_pea…

d8d724c

…rson_residuals() as well

slight performance improvement for sparse input

e23ea6c

style cleanup

fc49c25

jlause force-pushed the pearson_residuals_1.7 branch from 5b7dd99 to fc49c25 Compare March 10, 2021 14:07

jlause changed the base branch from 1.7.x to master March 10, 2021 14:08

gokceneraslan mentioned this pull request Mar 11, 2021

More hierarchical API #1739

Open

giovp requested changes Mar 12, 2021

View reviewed changes

giovp mentioned this pull request Mar 12, 2021

Drop python 3.6 for 1.8 #1697

Closed

jlause added 3 commits March 12, 2021 16:18

fixing import issue, fixing docstring style, adding check_values para…

f91f2fe

…m and warning as in scverse#1642

fixed small NameError, simplified clip argument

60de21d

remove pd.categorical()

1f86989

giovp added 2 commits February 28, 2022 15:43

fix precommit

bdd37cd

fix docs

aba3906

giovp added 6 commits March 9, 2022 21:30

fix reference and add notebook to tutorials

c9dbf48

add release note

e335966

add release note

bf7fb25

fix release note

1045d98

typo

f7d4c49

remove duplicate reference

5f76cdf

fixing black flake etc requirements

19b018c

jlause added 2 commits March 13, 2022 00:42

add _pca function to release note

ce9ee43

last edits to docs

7ffdec3

jlause mentioned this pull request Mar 15, 2022

Gene selection plot for Pearson Residuals #2176

Open

fix release and tutorial image

a0aaf96

giovp added 2 commits March 16, 2022 19:56

Merge branch 'master' into pearson_residuals_1.7

e7e92a1

try fix pre-commit

ad81e29

giovp approved these changes Mar 16, 2022

View reviewed changes

giovp and others added 3 commits March 16, 2022 20:43

minor docs

d74a0e6

Merge branch 'master' into pearson_residuals_1.7

92d675c

Remove accidentally included files from merge

970b0fa

ivirshup merged commit 636316d into scverse:master Mar 29, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Normalization and gene selection by analytical Pearson residuals #1715

Normalization and gene selection by analytical Pearson residuals #1715

jlause commented Mar 3, 2021 •

edited

Loading

giovp commented Mar 4, 2021

jlause commented Mar 4, 2021

ivirshup commented Mar 5, 2021

jlause commented Mar 5, 2021 •

edited

Loading

ivirshup commented Mar 7, 2021

jlause commented Mar 9, 2021 •

edited by ivirshup

Loading

LuckyMD commented Mar 9, 2021

ivirshup commented Mar 10, 2021 •

edited

Loading

jlause commented Mar 10, 2021 •

edited

Loading

codecov bot commented Mar 10, 2021 •

edited

Loading

giovp left a comment

giovp Mar 12, 2021

jlause Mar 12, 2021

giovp Mar 12, 2021

jlause Mar 12, 2021

jlause commented Mar 12, 2021

giovp commented Mar 12, 2021

giovp commented Feb 28, 2022 •

edited

Loading

jlause commented Mar 11, 2022

giovp commented Mar 11, 2022

jlause commented Mar 12, 2022

jlause commented Mar 13, 2022

giovp commented Mar 16, 2022 •

edited

Loading

ivirshup commented Mar 29, 2022

giovp commented Mar 29, 2022

jlause commented Mar 30, 2022

Starlitnightly commented Jun 6, 2023 •

edited

Loading

Normalization and gene selection by analytical Pearson residuals #1715

Normalization and gene selection by analytical Pearson residuals #1715

Conversation

jlause commented Mar 3, 2021 • edited Loading

giovp commented Mar 4, 2021

jlause commented Mar 4, 2021

ivirshup commented Mar 5, 2021

jlause commented Mar 5, 2021 • edited Loading

ivirshup commented Mar 7, 2021

jlause commented Mar 9, 2021 • edited by ivirshup Loading

LuckyMD commented Mar 9, 2021

ivirshup commented Mar 10, 2021 • edited Loading

jlause commented Mar 10, 2021 • edited Loading

codecov bot commented Mar 10, 2021 • edited Loading

Codecov Report

giovp left a comment

Choose a reason for hiding this comment

giovp Mar 12, 2021

Choose a reason for hiding this comment

jlause Mar 12, 2021

Choose a reason for hiding this comment

giovp Mar 12, 2021

Choose a reason for hiding this comment

jlause Mar 12, 2021

Choose a reason for hiding this comment

jlause commented Mar 12, 2021

giovp commented Mar 12, 2021

giovp commented Feb 28, 2022 • edited Loading

jlause commented Mar 11, 2022

giovp commented Mar 11, 2022

jlause commented Mar 12, 2022

jlause commented Mar 13, 2022

giovp commented Mar 16, 2022 • edited Loading

ivirshup commented Mar 29, 2022

giovp commented Mar 29, 2022

jlause commented Mar 30, 2022

Starlitnightly commented Jun 6, 2023 • edited Loading

jlause commented Mar 3, 2021 •

edited

Loading

jlause commented Mar 5, 2021 •

edited

Loading

jlause commented Mar 9, 2021 •

edited by ivirshup

Loading

ivirshup commented Mar 10, 2021 •

edited

Loading

jlause commented Mar 10, 2021 •

edited

Loading

codecov bot commented Mar 10, 2021 •

edited

Loading

giovp commented Feb 28, 2022 •

edited

Loading

giovp commented Mar 16, 2022 •

edited

Loading

Starlitnightly commented Jun 6, 2023 •

edited

Loading