Number of de-duplicated reads indicated by filterdup log does not match number of lines in output BED file. #650

Open
callum-b opened this issue Jun 20, 2024 · 0 comments

I'm unsure whether this should be filed as a bug or a question, but something doesn't seem to be working properly, so I filed it as a bug.

From the docs:
"The filterdup command takes an input alignment file and produces an output file in BED format with duplicate reads removed according to the setting."

I ran macs3 filterdup -f BAM --keep-dup=1 -i my/file.bam -o my/file_filterdup.bed

The logs indicate that there are 41436468 de-duplicated reads, but the output BED file is 41436476 lines long (using wc -l). As I understand it, these two values should match.

BAM file used (expires 5th of July 2024): https://filesender.renater.fr/?s=download&token=4490ed9b-04d3-4e4a-aafa-60afca608c9c
Its index was generated with default params by samtools index.

Where are these 8 extra lines coming from?

  • OS: Ubuntu 20.04.6 LTS
  • Python version 3.10.14
  • Numpy version 1.26.4
  • MACS Version 3.0.1

PS: I just ran the command on two other BAM files and got differences of 6 and 8 lines, respectively.
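
One way to narrow down where the extra lines come from is to check whether the output contains anything other than unique read records. The script below is only a rough sketch, not part of MACS: `summarize_bed` is a hypothetical helper, and it assumes the output is tab-separated six-column BED with the strand in column 6. It counts the lines, collects any that do not parse as a record with non-negative integer coordinates, and reports any (chrom, start, end, strand) combination that appears more than once.

```python
import sys
from collections import Counter


def summarize_bed(lines):
    """Return (total line count, non-record lines, repeated record keys).

    Assumes tab-separated BED6: chrom, start, end, name, score, strand.
    """
    total = 0
    non_records = []
    keys = Counter()
    for line in lines:
        total += 1
        fields = line.rstrip("\n").split("\t")
        # Anything without six columns and non-negative integer
        # coordinates is set aside for manual inspection.
        if len(fields) < 6 or not fields[1].isdigit() or not fields[2].isdigit():
            non_records.append(line.rstrip("\n"))
            continue
        keys[(fields[0], fields[1], fields[2], fields[5])] += 1
    # Keep only the keys that occur more than once.
    repeated = {key: n for key, n in keys.items() if n > 1}
    return total, non_records, repeated


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_bed.py my/file_filterdup.bed
    with open(sys.argv[1]) as handle:
        total, non_records, repeated = summarize_bed(handle)
    print("total lines:", total)
    print("lines that are not BED6 records:", len(non_records))
    print("record keys seen more than once:", len(repeated))
    for key, n in list(repeated.items())[:20]:
        print(key, n)
```

If both of the last two counts are zero, every line is a distinct record and the mismatch is more likely in how the log computes its number than in the file itself.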
