Multimapper filter vs malformed SAM records - how many alignments is "too many"? #243
Comments
Removing too many/few alignments in the context of malformed alignments means getting rid of alignments which Arriba cannot match properly. For example, in paired-end sequencing there should always be a first and a second mate. When Arriba finds two first mates, but only one second mate, the alignments are considered malformed. Another example is a split read with a missing supplementary alignment or two supplementary alignments. There should be exactly one. This has nothing to do with multimapping reads (i.e., secondary alignments). Multimapping reads pass the removal of malformed alignments provided that Arriba can always match the paired-end mates and supplementary alignments. Is this explanation clear? |
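The rule described above can be illustrated with a small sketch. This is not Arriba's actual code. It is a simplified Python illustration that assumes records are grouped by read name, that secondary alignments are ignored for the check, and that a split read is recognized by the presence of an `SA` tag on a primary record. The `Record` class and `is_well_formed` function are hypothetical names. The flag bits are the standard SAM ones.

```python
from dataclasses import dataclass

# Standard SAM FLAG bits
FIRST_MATE = 0x40
SECOND_MATE = 0x80
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


@dataclass
class Record:
    """Hypothetical minimal view of one SAM record of a read-name group."""
    flag: int
    has_sa_tag: bool = False  # assumption: an SA tag marks a split read


def is_well_formed(records):
    """Return True if mates and supplementary alignments can be matched.

    Simplification: secondary alignments (multimappers) are skipped,
    because they are handled by a separate filter.
    """
    considered = [r for r in records if not r.flag & SECONDARY]
    primaries = [r for r in considered if not r.flag & SUPPLEMENTARY]
    supplementaries = [r for r in considered if r.flag & SUPPLEMENTARY]

    # Paired-end data: exactly one first mate and exactly one second mate.
    first = sum(1 for r in primaries if r.flag & FIRST_MATE)
    second = sum(1 for r in primaries if r.flag & SECOND_MATE)
    if first != 1 or second != 1:
        return False

    # A split read needs exactly one supplementary alignment; otherwise none.
    is_split = any(r.has_sa_tag for r in primaries)
    return len(supplementaries) == (1 if is_split else 0)
```

In this sketch, a group with two first mates and one second mate is rejected, and so is a split read with zero or two supplementary records, while extra secondary records do not affect the outcome.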
Perfectly clear, thank you!
|
Actually, I have one further question about the malformed alignments. In the samples I'm currently running, Arriba seems to report an abnormally high number of malformed SAM entries: around 38M per 100M reads (~85% of these reads are properly mapped with STAR). First, is this number expected? Second, is there a way to determine where the malformed alignments come from? I'm using a custom reference with a few additional contigs containing heavily duplicated sequence. Could reads mapping to these contigs be throwing Arriba off? |
If you're using STAR, there should be only a small percentage of malformed alignments. Which version do you use? What are your alignment parameters? I can prepare a debug version of Arriba to get some stats about why the reads are discarded as malformed. |
I'm running STAR with the same parameters as in the demo workflow, but outputting to a separate BAM. I can provide the STAR/Arriba log files if that would be helpful. |
If you're using the parameters from the demo, then that's not the problem. Which version of STAR do you use? |
I'm on STAR version: 2.7.11b |
I see, thank you. Indeed, Arriba reports a negative |
Yes, probably. It should certainly make the warning about malformed alignments go away and could improve the detection rate slightly. |
Adapter trimming with skewer successfully reduced the number of malformed alignments reported by Arriba to <0.01% of the aligned reads. Thank you for your help! |
Sorry if this isn't the right place to ask this, but in the Event-level Filters section of the Arriba readthedocs, it states for multimappers that "Multi-mapping reads are reduced to a single alignment. For this purpose, only the alignment with the best alignment score is retained."
However, in read_chimeric_alignments.cpp, the comment above remove_malformed_alignments states "remove alignments when supplementary flags are missing or when there are too many/few alignment records".
My question is: how many is "too many alignment records" for a SAM record to be considered malformed, and thus not considered, vs getting passed to the multimapper filter?
Thank you for the help!
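For context, the documented multimapper rule quoted above ("only the alignment with the best alignment score is retained") might look roughly like the following sketch. It is an illustration only, not taken from Arriba's source. The `Alignment` tuple and `keep_best_alignment` function are hypothetical, the score is assumed to come from the SAM `AS` tag, and the tie-breaking behavior (first record wins) is an assumption.

```python
from typing import List, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical minimal view of one alignment of a multimapping read."""
    contig: str
    position: int
    score: int  # assumed to be the value of the SAM AS tag


def keep_best_alignment(alignments: List[Alignment]) -> Alignment:
    """Reduce a multimapping read to its single best-scoring alignment.

    max() returns the first maximal element, so ties are resolved in
    favor of the record that appears first (an assumption of this sketch).
    """
    if not alignments:
        raise ValueError("read has no alignments")
    return max(alignments, key=lambda a: a.score)


candidates = [
    Alignment("chr1", 1000, 95),
    Alignment("chr7", 5000, 98),
    Alignment("chrUn_dup", 42, 98),
]
best = keep_best_alignment(candidates)
```

Here `best` is the `chr7` alignment: it shares the top score with the `chrUn_dup` record but comes first in the list.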