Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new input to multiqc module for use with --rename_samples #5973

Merged
merged 6 commits into from
Jul 12, 2024

Conversation

pinin4fjords
Copy link
Member

@pinin4fjords pinin4fjords commented Jul 12, 2024

Supplying a TSV to multiqc with --rename-samples allows you to substitute sample identifiers to allow for consistency across a report.

PR checklist

Closes #XXX

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Emit the versions.yml file.
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • For modules:
      • nf-core modules test <MODULE> --profile docker
      • nf-core modules test <MODULE> --profile singularity
      • nf-core modules test <MODULE> --profile conda
    • For subworkflows:
      • nf-core subworkflows test <SUBWORKFLOW> --profile docker
      • nf-core subworkflows test <SUBWORKFLOW> --profile singularity
      • nf-core subworkflows test <SUBWORKFLOW> --profile conda

@MatthiasZepper
Copy link
Member

MatthiasZepper commented Jul 12, 2024

Given how frequently some form of name cleaning is required, I think it is very useful to have a separate channel to supply the alternative names in a straightforward manner. (In contrast to collecting a Nextflow channel with the sample names into a file, referencing that path in a config file and mixing that file into the extra_multiqc_config channel).

However, I think that the --sample_names argument will be used almost as frequently as --rename_samples.

Hence, I wonder if it would be advisable to change the channel from a path channel to a map like [ val(argument), path(config) ]?

Inside the module, instead of

def replace = replace_names ? "--replace-names ${replace_names}" : ''

we would have something along the line

def replace = sampleMap ? sampleMap.collect { argument, value -> "--${argument} ${value}"}.join(' ') : ''

which would work on an input channel like

def sampleMap = [
    replace_names: '/path/to/sample_aliases.tsv'
]

In that case, when using the MultiQC module, the pipeline authors could .map the path to the argument they wish to use?

Basically what we already did in the rnaseq pipeline for the extra_args, just this time inside the module?

@pinin4fjords
Copy link
Member Author

@MatthiasZepper I'm not convinced, that's not a pattern I've seen elsewhere in nf-core. Can you imagine explaining that in the meta.yml? If we need another file input I think we should have another file input- I'll add it now

@MatthiasZepper
Copy link
Member

I'm not convinced, that's not a pattern I've seen elsewhere in nf-core.

But maybe it should become a more common pattern? I agree that this requires a larger discussion, e.g. on the #modules channel prior to merging it into such an important module, but I have always felt that the ext.args pattern is hard to wrap your head around, in particular for end users of the pipelines.

I do not recall how often I have explained how to add extra arguments to a tool on various Slack channels (many times) and the extra_star_align_args & Co. parameters are really popular in the rnaseq pipeline - at least I have seen them quite often being used in param files.

Having a key -value map that is consolidated on a module level with the ext.args could be really neat for pipeline developers to support such extra_[...]_arg pipeline parameters without much headache. It would also help to avoid duplication, if e.g. the same module is used in two different subworkflows that represent two routes of a pipeline. It would become possible to mix in their specific settings into a default map that applies to both.

Can you imagine explaining that in the meta.yml? If we need another file input I think we should have another file input- I'll add it now

I have not yet thought about that problem, but it seems solvable to me. If the code itself is brief and works reliably, it should not be a dealbreaker to describe that functionality schematically.

What worries me more is duplicating the boilerplate consolidation code in every module. I guess that should then be some central function like check_max(). for the configs.

I'll add it now

Fair enough. Hardcoding too many input channels to me also doesn't seem to be desirable, but I guess two additional ones would work.

@pinin4fjords
Copy link
Member Author

Yep, I agree there are things we could improve in the conventions, but that that's a discussion for elsewhere.

Merging for now.

@pinin4fjords pinin4fjords added this pull request to the merge queue Jul 12, 2024
Merged via the queue into master with commit b80f5fd Jul 12, 2024
12 checks passed
@pinin4fjords pinin4fjords deleted the multiqc_rename_input branch July 12, 2024 16:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants