Add Seurat/MULTI-seq #6168

Open
wants to merge 37 commits into
base: master

@mari-ga mari-ga commented Aug 13, 2024

PR checklist

Closes #XXX

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Emit the versions.yml file.
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • For modules:
      • nf-core modules test <MODULE> --profile docker
      • nf-core modules test <MODULE> --profile singularity
      • nf-core modules test <MODULE> --profile conda
    • For subworkflows:
      • nf-core subworkflows test <SUBWORKFLOW> --profile docker
      • nf-core subworkflows test <SUBWORKFLOW> --profile singularity
      • nf-core subworkflows test <SUBWORKFLOW> --profile conda

@mari-ga mari-ga requested a review from a team as a code owner August 13, 2024 13:56
@mari-ga mari-ga requested review from LeuThrAsp and removed request for a team August 13, 2024 13:56

mari-ga commented Aug 13, 2024

  1. MULTI-seq is a hashing demultiplexing method from Seurat. I created the structure Seurat/Multi-seq because I intend to add another Seurat hashing demultiplexing module (HTODemux) to the same folder, so that both tools from the same library live in one place. Is that right?
  2. These modules need many arguments to run their different functionalities, which I added to the nextflow.config file. Any suggestions about this, especially regarding the tests?
  3. Also, the code used to generate the plots for both tools is very similar. Any suggestions about that?


@pinin4fjords pinin4fjords left a comment

Mostly recommend some tidying up.

I always worry a little about smushing lots of custom stuff (the plots here) into a module that should really be as thin-as-possible a wrapper around some underlying function. But might be a me thing.

Also just wanted to flag the efforts that have gone on to build CLI parts for Seurat, in case you're interested.


# All values from ext.args are stored as strings
# Function to transform strings to the correct class
convert_element <- function(x) {

Member

Could you move the functions to the top, with the other function? Would help with readability.

Also, add the proper roxygen-style function documentation to all.
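
As an illustration of the roxygen-style documentation requested here, the sketch below documents a string-conversion helper. Only the first line of convert_element appears in this excerpt, so the body shown (try logical, then numeric, else keep the string) is an assumption and not the PR's actual implementation.

```r
#' Convert a string argument to a more specific R type
#'
#' Values passed through `ext.args` arrive as character strings. This helper
#' maps "true"/"false" (case-insensitive) to logical, numeric-looking strings
#' to numeric, and leaves everything else as character.
#' NOTE: illustrative body only; the PR's real logic is not shown in the thread.
#'
#' @param x A length-one character vector.
#' @return A logical, numeric, or character value of length one.
#' @examples
#' convert_element("0.7")
#' convert_element("false")
#' convert_element("CLR")
convert_element <- function(x) {
    lowered <- tolower(x)
    if (lowered %in% c("true", "false")) {
        return(lowered == "true")
    }
    # as.numeric() yields NA (with a warning) for non-numeric strings
    number <- suppressWarnings(as.numeric(x))
    if (!is.na(number)) {
        return(number)
    }
    x
}
```

With the `#'` comments placed directly above the definition, roxygen2 can pick up the description, parameter, and return value without any change to the function itself.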

Contributor Author

I totally agree with you on having a wrapper that is as thin as possible, and I'll check whether some parts of the CLI can be applied here. I also thought from the beginning that the block of code for the plots could be too large for the module. However, splitting the code would imply adding an "intermediate layer" in which we would obtain the demultiplexing results as an RDS object and produce the plots from it. The problem is that not all plots included in the tool are available under the CLI.

modules/nf-core/seurat/multiseq/templates/MULTIseq.R Outdated Show resolved Hide resolved
then {
assertAll(
{ assert process.success },
{ assert path(process.out.assignment.get(0).get(1)).exists() },

Contributor

Is there nothing stable in this file to check other than its existence?

Contributor Author

@mari-ga mari-ga Sep 13, 2024

It is a CSV file that contains different values for each sample. I tried { assert path(process.out.classification.get(0).get(1)).exists() }, but do you have any idea of another test that might be useful here?

modules/nf-core/seurat/multiseq/tests/main.nf.test Outdated Show resolved Hide resolved
Comment on lines 3 to 33
assay = "HTO"
nfeatures = 2000
quantile = 0.7
autoThresh = false
maxiter = 5
qrange_from = 0.1
qrange_to = 0.9
qrange_by = 0.05
verbose = true
selection_method = "mean.var.plot"
normalization_method = "CLR"

// Parameters to generate plots
group_cells_feature_scatter = "MULTI_ID"
feature_scatter_feature_1 = "MS-11"
feature_scatter_feature_2 = "MS-12"
number_of_features_ridge_plot = 2
number_of_cols_ridge_plot = 2
group_cells_violin_plot = "MULTI_classification"
features_violin_plot = "nCount_RNA"
pt_size = 0.1
log = true
subset_idents = "Negative"
subset_invert = true
tsne_scale_data_verbose = false
run_pca_approx = false
run_tsne_dim_max = 2
run_tsne_perplexity = 100
check_duplicates_tsne = false
resolution = 0.6
singlet_identities_tsne = "MULTI_classification"

Contributor

Are these all optional? Does it work without this?

Contributor Author

The tool provides default values for all the parameters before // Parameters to generate plots.
After // Parameters to generate plots, some values such as feature_scatter_feature_1 = "MS-11" depend on the data and therefore must be given if the option produce_plots is true. Should I include those parameters as inputs in main.nf?
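
One way to picture the split described above is the sketch below: defaults are applied for the tool parameters, and the data-dependent plot parameters are only demanded when plots are requested. The helper resolve_args, the defaults list, and required_for_plots are hypothetical names introduced for illustration; the default values are copied from the config quoted earlier in this thread, and nothing here is taken from the PR's template.

```r
# Hypothetical defaults, copied from the config excerpt quoted in the review.
defaults <- list(
    assay = "HTO",
    quantile = 0.7,
    autoThresh = FALSE,
    maxiter = 5
)

# Hypothetical list of parameters that depend on the data and have no default.
required_for_plots <- c("feature_scatter_feature_1", "feature_scatter_feature_2")

# Merge user-supplied arguments over the defaults and, when plots are
# requested, fail early if a data-dependent parameter is missing.
resolve_args <- function(user_args, produce_plots = FALSE) {
    args <- utils::modifyList(defaults, user_args)
    if (produce_plots) {
        missing_args <- setdiff(required_for_plots, names(args))
        if (length(missing_args) > 0) {
            stop("Missing plot parameters: ", paste(missing_args, collapse = ", "))
        }
    }
    args
}

# Runs with defaults only, since no plots are requested.
args_no_plots <- resolve_args(list())

# Plots requested, so the feature names must be supplied.
args_plots <- resolve_args(
    list(feature_scatter_feature_1 = "MS-11", feature_scatter_feature_2 = "MS-12"),
    produce_plots = TRUE
)
```

Under this arrangement a test could run the module with no extra arguments at all, and a second test could supply only the data-dependent names.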
