Add validation to BAM/FASTQ input #64

Open
bbimber opened this issue Jan 24, 2024 · 0 comments
bbimber commented Jan 24, 2024

My comment is primarily a concern for BAMs, but if it can be added to both code paths, great:

  • The input data fed to nimble should be unique pairs of sequence reads, where each read name is represented once per pair.
  • A BAM file is a format for storing alignments, which are not strictly the same as reads. A given read can in theory be present more than once in a BAM if it has more than one alignment. I don't think cellranger does this, but the BAM would be technically valid if it did.
  • Nimble already sorts the BAMs prior to input based on UMI and read name. It then iterates over the sorted BAM.

Would it be practical for the reader code of nimble to remember the name of the last read it encountered, and throw an exception if the next read has the same name? If simple to implement, this would provide cheap insurance against a category of issue that is easy to introduce and hard to detect.
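To make the proposal concrete, here is a minimal sketch of the check in Python. It is not nimble's reader code: `Record`, `DuplicateReadError`, and `check_unique_reads` are hypothetical names used only for illustration. It assumes records arrive sorted so that all records sharing a read name are adjacent, and that the two mates of a pair share a name and are distinguished by a mate number (1 or 2). Under that second assumption, the check is "same name and same mate seen twice in a row" rather than "same name twice".

```python
from typing import Iterable, Iterator, NamedTuple, Optional, Set


class Record(NamedTuple):
    """Hypothetical stand-in for one BAM/FASTQ record."""
    name: str      # read name (QNAME in a BAM)
    mate: int      # 1 for the first mate of a pair, 2 for the second
    sequence: str


class DuplicateReadError(ValueError):
    """Raised when the same mate of the same read name appears twice."""


def check_unique_reads(records: Iterable[Record]) -> Iterator[Record]:
    """Yield records unchanged, raising if a read is repeated.

    Assumes the input is sorted so that records with the same name are
    adjacent. Only the current name and its seen mates are kept in memory.
    """
    last_name: Optional[str] = None
    seen_mates: Set[int] = set()
    for rec in records:
        if rec.name != last_name:
            # New read name: reset the per-name state.
            last_name = rec.name
            seen_mates = set()
        if rec.mate in seen_mates:
            raise DuplicateReadError(
                f"read {rec.name!r} mate {rec.mate} appears more than once"
            )
        seen_mates.add(rec.mate)
        yield rec


# Usage: wrap the record stream before handing it to the aligner.
reads = [
    Record("r1", 1, "ACGT"), Record("r1", 2, "TTGA"),
    Record("r2", 1, "GGCC"), Record("r2", 2, "AATT"),
]
validated = list(check_unique_reads(reads))
```

Because the check only remembers the most recent read name, the cost per record is constant, which fits the "cheap insurance" goal. The tradeoff is that it only catches duplicates that are adjacent after sorting.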
