Inconsistencies across intron/exon boundaries #655
Comments
@cassiemk @reece can likely explain this better, but I will give it a shot! You are correct: when the variant nomenclature includes an intronic offset, as in your top 3 examples, the package returns no protein change. I believe this is working as designed, because we cannot guarantee that an intronic variant leaves every splice site/region intact. I am unsure whether there are plans to change this or to add edge cases for specific variants where the coding regions are not affected. I will defer to Reece to comment on that.
@katiestahl Currently, the behavior is inconsistent. The 4 examples from the original issue are all insertions right at the boundary. 3 of them are treated as intronic, and 1 is treated as CDS. The difference seems to be arbitrary, based only on whether the c. nomenclature includes an intronic position or not. So a conscious decision has not yet been made.

We have a developer working on updating the logic so that insertions at the boundary are treated as CDS, and we are planning a pull request soon, once all of our tests pass. This seems to be the more common choice, and it is the choice our users seem to expect.

So the immediate task is to see whether we can agree on which decision is most appropriate for insertions right at the boundary. As far as I can tell, HGVS (the society) doesn't have any guidance on this situation (they don't say much about the right decisions for edge cases when projecting DNA changes onto transcripts). Reasons we think these insertions should be treated as part of the coding region:
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for 7 days with no activity.
Would it be possible to re-open this issue? This is a real flaw, and there is a PR out to fix it.
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.
We have a number of variants at the intron/exon or exon/intron boundary that return no var_p (no protein change). We believe they should be treated as coding, because the splice site and splice region remain completely intact.
In [1]: hgvs_c = "NM_004380.2:c.3251-1dup"
In [2]: var_c = parse(hgvs_c)
In [3]: c_to_p(var_c)
Out[3]: SequenceVariant(ac=NP_004371.2, type=p, posedit=None, gene=None)
In [4]: hgvs_c = "NM_004380.2:c.3250_3250+1insT"
In [5]: var_c = parse(hgvs_c)
In [6]: c_to_p(var_c)
Out[6]: SequenceVariant(ac=NP_004371.2, type=p, posedit=None, gene=None)
In [7]: hgvs_c = "NM_004380.2:c.3251-1_3251insA"
In [8]: var_c = parse(hgvs_c)
In [9]: c_to_p(var_c)
Out[9]: SequenceVariant(ac=NP_004371.2, type=p, posedit=None, gene=None)
Other variants at the same boundary, however, do return a protein change:
In [10]: hgvs_c = "NM_004380.2:c.3251dup"
In [11]: var_c = parse(hgvs_c)
In [12]: c_to_p(var_c)
Out[12]: SequenceVariant(ac=NP_004371.2, type=p, posedit=(Phe1085LeufsTer2), gene=None)
The package seems to decide whether a variant is coding based on the var_c nomenclature (the presence of a +/-1 offset in this case) rather than on biology.
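To make the point concrete, here is a minimal sketch of how the four examples could be reduced to the bases that flank the inserted sequence, instead of keying off whether an offset appears in the string. It does not use the hgvs package or any transcript data. The names `CPos`, `insertion_flanks`, and `cds_insertion_site` are hypothetical helpers invented for this illustration, and the regex only handles simple `ins`/`dup` edits at positive CDS positions. It is an illustration of the idea under those assumptions, not the package's logic or the fix in the pending PR.

```python
import re
from typing import NamedTuple, Optional, Tuple


class CPos(NamedTuple):
    base: int    # c. position of the anchoring exonic base
    offset: int  # 0 = exonic; +n / -n = intronic offset from that base


# Simplified grammar (assumption): positive CDS positions, ins or dup only.
_POS = r"\d+(?:[+-]\d+)?"
_VAR = re.compile(rf"^[^:]+:c\.({_POS})(?:_({_POS}))?(ins[ACGT]+|dup)$")


def _parse_pos(text: str) -> CPos:
    m = re.fullmatch(r"(\d+)([+-]\d+)?", text)
    return CPos(int(m.group(1)), int(m.group(2) or 0))


def insertion_flanks(hgvs_c: str) -> Tuple[CPos, CPos]:
    """Return the two positions that flank the inserted sequence."""
    m = _VAR.match(hgvs_c)
    if m is None:
        raise ValueError(f"unsupported variant for this sketch: {hgvs_c}")
    start = _parse_pos(m.group(1))
    end = _parse_pos(m.group(2)) if m.group(2) else start
    if m.group(3).startswith("ins"):
        if m.group(2) is None:
            raise ValueError("an insertion needs two flanking positions")
        return start, end
    # dup: the copy lands immediately 3' of the duplicated span.
    if end.offset == 0:
        # ASSUMPTION: the next base is exonic (exon structure is unknown here).
        return end, CPos(end.base + 1, 0)
    if end.offset == -1:
        return end, CPos(end.base, 0)
    return end, CPos(end.base, end.offset + 1)


def cds_insertion_site(hgvs_c: str) -> Optional[Tuple[int, int]]:
    """Flanking CDS bases if the insertion touches an exonic base, else None."""
    left, right = insertion_flanks(hgvs_c)
    if left.offset == 0 and right.offset == 0:
        return left.base, right.base
    if left.offset == 0 and right == CPos(left.base, 1):
        return left.base, left.base + 1    # exon/intron (donor) side
    if right.offset == 0 and left == CPos(right.base, -1) and right.base > 1:
        return right.base - 1, right.base  # intron/exon (acceptor) side
    return None                            # insertion point is fully intronic


for v in ("c.3251-1dup", "c.3250_3250+1insT", "c.3251-1_3251insA", "c.3251dup"):
    print(v, cds_insertion_site("NM_004380.2:" + v))
```

Under these assumptions, the three variants that currently return `posedit=None` all resolve to the same junction between c.3250 and c.3251, which is the sense in which they sit "right at the boundary". Whether such an insertion should then be translated as CDS is the policy question discussed in this thread; the sketch only shows that the question can be asked of the insertion point rather than of the nomenclature.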