possibly incorrect determination of the frame #190
Comments
What's the transcript that Arriba reports in column |
ENST00000530298 is the transcript_id2, which is annotated in Ensembl as a retained intron... which is weird, because Arriba also says that site2 is CDS. Is this an instance of incorrect transcript selection, as you mentioned in your comment? How is a particular transcript prioritized? |
I have taken a close look at this case. Arriba chooses the intron retention transcript because there is some intron retention going on. Namely, the bases

Arriba chooses the transcript based on the expression level and splice patterns. Basically, it assembles a transcript sequence from the fusion-supporting reads. When multiple transcripts are expressed at the same time, it picks the one with the highest expression. Then it tries to match this transcript sequence to the most closely matching annotated transcript ID. In this case, Arriba concludes that it is the transcript ID with intron retention, because a small stretch of intron is transcribed.

This is a bit of an ambiguous case, because there is both intronic transcription and splicing. The transcribed bases do not match any annotated transcript perfectly. Although it's not entirely wrong of Arriba to choose an intron retention transcript, it could have chosen one of the transcripts without intron retention, and that would have been just as good a match. I suspect that Arriba did not determine the highest expressed transcript correctly. This may be the real mistake here. Can you send me an IGV screenshot of the region |
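To make the ambiguity described above concrete, here is a minimal Python sketch of "match the transcribed bases to the closest annotated transcript". It is not Arriba's actual implementation: the scoring rule (count of bases that differ between the observed and annotated coverage), the function names, the transcript IDs `TX_SPLICED` and `TX_RETAINED`, and all coordinates are assumptions made up for illustration.

```python
def covered_bases(intervals):
    """Return the set of positions covered by inclusive (start, end) intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end + 1))
    return bases


def best_matching_transcripts(transcribed, annotation):
    """Score each annotated transcript by the number of mismatching bases.

    A base is a mismatch if it is transcribed but not annotated as exonic,
    or annotated as exonic but not transcribed. Only the span covered by
    the reads is compared. Returns (lowest score, sorted list of tied IDs).
    """
    observed = covered_bases(transcribed)
    lo, hi = min(observed), max(observed)
    scores = {}
    for transcript_id, exons in annotation.items():
        annotated = {p for p in covered_bases(exons) if lo <= p <= hi}
        scores[transcript_id] = len(observed ^ annotated)
    best = min(scores.values())
    return best, sorted(t for t, s in scores.items() if s == best)


# Hypothetical toy data: reads cover an exon, 10 intronic bases, then splice.
transcribed = [(100, 209), (300, 399)]
annotation = {
    "TX_SPLICED": [(100, 199), (300, 399)],   # no intron retention
    "TX_RETAINED": [(100, 219), (300, 399)],  # 20 intronic bases retained
}

print(best_matching_transcripts(transcribed, annotation))
```

In this toy case neither transcript matches perfectly and both score the same, so a tie-breaker such as the estimated expression level decides which ID is reported, and with it the predicted reading frame.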
arriba.zip
Is this IGV report OK for checking out the reads? From what I can see in the alignment, several of the split reads from the fusion are skipping the intronic regions, but I'm not practiced at looking at these. |
Apologies for the long delay and also for the fact that I still have not come up with a solution yet despite the long time. However, your provided data was very helpful. I can confirm that Arriba did indeed pick the wrong transcript and that this is the underlying cause of the wrong reading frame. I will need some time to play with this example and see if I can improve the transcript selection. My problem is that I have hardly any spare time for this at the moment due to many other obligations. I will get back to you as soon as I can. |
We have a fusion called SCP2-NTRK1 and we are curious as to why the fusion is being called as out of frame. The nt sequence as reported by arriba is as follows:
The sequence between the breakpoint "|" and the last "___" before it is actually intron sequence (site1 == intron and site2 == CDS). The aa sequence that arriba predicts is as follows:
We are a bit confused because the protein sequence after the breakpoint looks to be in-frame, so I'm just wondering what about this causes arriba to say it is out-of-frame.
by the way, we are using 2.3.0
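For readers following along, the sketch below shows in Python why the choice of transcript can flip an in-frame call to out-of-frame. It is a generic illustration of reading-frame arithmetic, not Arriba's code; the function name `is_in_frame` and all numbers are hypothetical.

```python
def is_in_frame(coding_bases_5prime, extra_bases, cds_offset_3prime):
    """Check whether the 3' partner is translated in its native frame.

    coding_bases_5prime: coding bases of the 5' gene upstream of the breakpoint
    extra_bases: non-coding bases (e.g. retained intron) translated before
                 the breakpoint
    cds_offset_3prime: 0-based offset of the first 3' base within its own CDS
    """
    upstream = coding_bases_5prime + extra_bases
    return upstream % 3 == cds_offset_3prime % 3


# Hypothetical numbers: the same breakpoint judged against two transcripts.
print(is_in_frame(300, 0, 150))   # transcript without intronic bases
print(is_in_frame(300, 10, 150))  # transcript that adds 10 intronic bases
```

The two calls differ only in how many bases are assumed to precede the breakpoint, which is exactly what changes when a different annotated transcript is selected.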