Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow index format specification in samtools view #4326

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

CharlotteAnne
Copy link
Contributor

adding an option so that if write_indexes is used the user can choose what index type they want generated.

PR checklist

Closes #XXX

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Emit the versions.yml file.
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • PROFILE=docker pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware
    • PROFILE=singularity pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware
    • PROFILE=conda pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware

adding an option so that if write_indexes is used the user can choose what index type they want generated.
@CharlotteAnne
Copy link
Contributor Author

tests are failing, but not sure how they passed before, because of the naming of the test files??

Artifact name is valid!
Error: Artifact path is not valid: /pytest_workflow_d1yqn3zt/bam_split_by_region_test_bam_split_by_region/work/23/0a9a0152fc1b39ea5074ad4ee5717a/test_chr21:1-23[35](https://github.com/nf-core/modules/actions/runs/6853404603/job/18634280320?pr=4326#step:15:36)4000.bam. Contains the following character:  Colon :
          
Invalid characters include:  Double quote ", Colon :, Less than <, Greater than >, Vertical bar |, Asterisk *, Question mark ?, Carriage return \r, Line feed \n
          
The following characters are not allowed in files that are uploaded due to limitations with certain file systems such as NTFS. To maintain file system agnostic behavior, these characters are intentionally not allowed to prevent potential problems with downloads on different file systems.

nvnieuwk
nvnieuwk previously approved these changes May 15, 2024
Copy link
Contributor

@nvnieuwk nvnieuwk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I don't really like this solution, I can't think of a better way to handle this (this is such a weird syntax)

@maxulysse
Copy link
Member

not sure about the ext.index_type, why not go with ext.suffix?

@mahesh-panchal
Copy link
Member

I think we should avoid making any ext.* unless it's really needed. This should be an input: parameter if possible.

@nvnieuwk
Copy link
Contributor

I think we should avoid making any ext.* unless it's really needed. This should be an input: parameter if possible.

I like this solution way better!

@nvnieuwk nvnieuwk dismissed their stale review May 15, 2024 07:43

A better solution has been presented

@SPPearce
Copy link
Contributor

Mmm, I'm generally happy to define additional ext. values when relevant. I agree that it doesn't want to be ext.suffix though.

@mahesh-panchal
Copy link
Member

I guess the question here is, does the user need to be able to set this, or should this be fixed in the workflow?
To me it looks like the latter, and so I would say best not expose anything unnecessarily.

@SPPearce
Copy link
Contributor

I also wondered about whether it should be e.g.${prefix}.bam.baior just ${prefix}.bai.

@mahesh-panchal
Copy link
Member

I also wondered about whether it should be e.g.${prefix}.bam.baior just ${prefix}.bai.

I would say ${prefix}.bam.bai

However I also just realized that index_type should be based on file_type right? It doesn't make sense to have bam.csi

@CharlotteAnne
Copy link
Contributor Author

CharlotteAnne commented May 15, 2024 via email

@mahesh-panchal
Copy link
Member

My last question though was do we even need to have an input for this? Isn't it determined by the file_type?

@CharlotteAnne
Copy link
Contributor Author

CharlotteAnne commented May 15, 2024 via email

@SPPearce SPPearce changed the title Update main.nf Allow index format specification in samtools view May 15, 2024
@SPPearce
Copy link
Contributor

Currently it is set to have a default, that will just generate a .csi index. You don't need to use ext.args.index_type if you don't want to, but it lets you have the choice. It isn't a mandatory input value, and is only used in pretty niche circumstances.

@maxulysse
Copy link
Member

I'd say the basic default should be bai, as csi are mainly used for LARGE bam file index only as far as I know

@CharlotteAnne
Copy link
Contributor Author

+1 on default being bai for bam

@fellen31
Copy link
Contributor

At the same time, the default for --write-index is csi. Just curious, is there any downside using csi for bam?

@CharlotteAnne
Copy link
Contributor Author

CharlotteAnne commented May 15, 2024 via email

@fellen31
Copy link
Contributor

No, not at all. But I was thinking about adding --write-index to some bcftools modules as well, where it would be csi vs tbi.

@SPPearce
Copy link
Contributor

I've never used csi, only bai for everything.

@fellen31
Copy link
Contributor

I tried using csi a bit but just encountered a tool that expects and needs bai, and I'm sure there might be others like it. Maybe bai is the best default.

@SPPearce
Copy link
Contributor

Yeah, bai is best

@fellen31
Copy link
Contributor

I added a val bam_index_extensionto the minimap2/align module, together with:

def bam_index = bam_index_extension ? "${prefix}.bam##idx##${prefix}.bam.${bam_index_extension} --write-index" : "${prefix}.bam"

In the samtools modules it would feel a bit strange to specify the output type in the config and then the index type in the workflow, but I don't really see a way around it without using ext.

Would be great to find a solution though, so the same can be added to other samtools modules.

@SPPearce
Copy link
Contributor

I'm leaning towards the idea of just adding another value channel which determines if the index is made, and if so what type; if empty no index is generated.

@SPPearce
Copy link
Contributor

I'm leaning towards the idea of just adding another value channel which determines if the index is made, and if so what type; if empty no index is generated.

How does this sound to everyone? Using val(index_format) to specify the generation of the index explicitly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

6 participants