Inconsistencies across intron/exon boundaries #655

Open
cassiemk opened this issue May 23, 2023 · 6 comments · May be fixed by #748
Labels
keep alive exempt issue from staleness checks

Comments

@cassiemk

We have a number of variants at the intron/exon or exon/intron boundary for which c_to_p returns no protein change (no var_p). We believe they should be treated as coding, because the splice site and splice region remain completely intact.

In [1]: hgvs_c = "NM_004380.2:c.3251-1dup"
In [2]: var_c = parse(hgvs_c)
In [3]: c_to_p(var_c)
Out[3]: SequenceVariant(ac=NP_004371.2, type=p, posedit=None, gene=None)

In [4]: hgvs_c = "NM_004380.2:c.3250_3250+1insT"
In [5]: var_c = parse(hgvs_c)
In [6]: c_to_p(var_c)
Out[6]: SequenceVariant(ac=NP_004371.2, type=p, posedit=None, gene=None)

In [7]: hgvs_c = "NM_004380.2:c.3251-1_3251insA"
In [8]: var_c = parse(hgvs_c)
In [9]: c_to_p(var_c)
Out[9]: SequenceVariant(ac=NP_004371.2, type=p, posedit=None, gene=None)

Other variants at the boundary, however, do return a protein change:

In [10]: hgvs_c = "NM_004380.2:c.3251dup"
In [11]: var_c = parse(hgvs_c)
In [12]: c_to_p(var_c)
Out[12]: SequenceVariant(ac=NP_004371.2, type=p, posedit=(Phe1085LeufsTer2), gene=None)

It seems like the package decides whether a variant is coding based on the var_c nomenclature (the presence of a +1/-1 offset in this case) rather than on the biology.
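
The pattern in the four examples above is consistent with a purely nomenclature-based check. The snippet below is only a sketch of such a check, not the package's actual code: `has_intronic_offset` is a hypothetical helper, and it assumes a simple `accession:c.positions+edit` string.

```python
import re

# Hypothetical helper, not part of the hgvs package.
# Matches one c. position ("3251", "3251-1", "3250+1") and captures the offset.
_POSITION = re.compile(r"(\d+)([+-]\d+)?")


def has_intronic_offset(hgvs_c: str) -> bool:
    """Return True if any position in the c. expression carries a +N/-N offset."""
    posedit = hgvs_c.split(":c.", 1)[1]
    # Keep only the leading position part so inserted bases are never scanned.
    positions = re.match(r"[\d+\-_]+", posedit).group(0)
    return any(m.group(2) for m in _POSITION.finditer(positions))


examples = [
    "NM_004380.2:c.3251-1dup",
    "NM_004380.2:c.3250_3250+1insT",
    "NM_004380.2:c.3251-1_3251insA",
    "NM_004380.2:c.3251dup",
]
for hgvs_c in examples:
    print(hgvs_c, has_intronic_offset(hgvs_c))
```

Under this sketch the first three expressions carry an offset and the last one does not, which matches the split between posedit=None and a reported frameshift in the outputs above.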

@katiestahl
Contributor

@cassiemk, @reece can likely explain this better, but I will give it a shot!

You are correct; the package does return no protein change for converted sequence variants based on the nomenclature when offsets are provided, like in your top 3 examples.

I believe this is working as designed, because we cannot guarantee that intronic variants will leave every splice site/region intact.

I am unsure if there are plans to change this or add edge cases for specific variants where the coding regions are not affected. I will defer to Reece to comment on that.

@gostachowiak

@katiestahl
When there's an insertion right at the intron/exon boundary, there is a choice to make: should the inserted material be treated as part of the coding region (because the canonical splice site, and in fact the entire intron, is intact), or as part of the intron (because it is adjacent to the canonical splice site)?

Currently, the behavior is inconsistent.

The 4 examples from the original issue are all insertions right at the boundary. 3 of them are treated as intronic, and 1 is treated as CDS. The difference seems to be arbitrary, depending only on whether the c. nomenclature includes an intronic position. So a conscious decision has not yet been made.
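
To illustrate why a duplication of the first exonic base (as in c.3251dup) can also be read as an insertion right at the boundary, here is a small sketch on a made-up sequence. The bases below are purely illustrative and are not taken from NM_004380.2; `insert_at` is a hypothetical helper.

```python
# Toy sequence: NOT the real transcript, just an illustration.
intron_tail = "CCAG"   # last bases of a toy intron
exon_head = "TGCA"     # first bases of the following toy exon
seq = intron_tail + exon_head
boundary = len(intron_tail)  # index of the first exonic base


def insert_at(sequence: str, index: int, bases: str) -> str:
    """Insert `bases` so that they start at position `index` of `sequence`."""
    return sequence[:index] + bases + sequence[index:]


first_exon_base = seq[boundary]

# Duplication: the copy is placed immediately 3' of the original base.
dup_first_exon_base = insert_at(seq, boundary + 1, first_exon_base)

# Insertion of the same base between the last intronic and first exonic base.
ins_at_boundary = insert_at(seq, boundary, first_exon_base)

# Both descriptions produce the same resulting sequence.
print(dup_first_exon_base == ins_at_boundary)
```

Since the two descriptions yield the same sequence, whether an intronic position shows up in the name depends on how the change is written rather than on where the new base ends up.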

We have a developer working on updating the logic so that insertions at the boundary are treated as CDS, and we're planning a pull request soon, once we get all of our tests passing. This seems to be the more common choice, and it is the choice that our users seem to expect.

So the immediate task would be to see if we can agree on which decision is most appropriate for insertions right at the boundary. As far as I can tell, HGVS (the society) doesn't have any guidance on this situation (they don't talk much about the right decisions to make for edge cases when projecting DNA changes onto transcripts).

Reasons we think these insertions should be treated as part of the coding region:

  • the canonical splice site and the entire intron are fully intact
  • it's very difficult to say what actually happens in the cell; the outcome is certainly context dependent, and that is context we don't have. So the default should be the most "visible" change, i.e. the one with an amino acid change, for example so that the variant is not dropped by filters that eliminate non-coding variants
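
The proposed rule could be sketched roughly as follows. This is not the code from the pending pull request; `CPos` and `classify_insertion` are hypothetical names. The sketch assumes the insertion has already been reduced to its two flanking c. positions (for example, c.3251-1dup places the copy between 3251-1 and 3251) and that both flanks lie within the CDS range.

```python
from typing import NamedTuple


class CPos(NamedTuple):
    """A c. position: nearest exonic base plus an intronic offset (0 = exonic)."""
    base: int
    offset: int = 0


def classify_insertion(left: CPos, right: CPos) -> str:
    """Classify an insertion between two adjacent flanking positions.

    Proposed rule (assumption): if at least one flank is exonic, the inserted
    bases are treated as CDS, because the intron itself is left untouched.
    """
    left_exonic = left.offset == 0
    right_exonic = right.offset == 0
    if left_exonic and right_exonic:
        return "cds"        # insertion inside an exon
    if left_exonic or right_exonic:
        return "cds"        # insertion on the exon/intron junction
    return "intronic"       # both flanks are inside the intron


# c.3250_3250+1insT: between the last exonic and the first intronic base
print(classify_insertion(CPos(3250), CPos(3250, +1)))
# c.3251-1_3251insA: between the last intronic and the first exonic base
print(classify_insertion(CPos(3251, -1), CPos(3251)))
```

With this rule, only insertions whose two flanks are both intronic stay intronic, so the result no longer depends on which equivalent notation was used for a boundary insertion.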


github-actions bot commented Dec 8, 2023

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale Issue is stale and subject to automatic closing label Dec 8, 2023

This issue was closed because it has been stalled for 7 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Dec 16, 2023
@gostachowiak

Would it be possible to re-open this issue? It describes a real flaw, and there is a PR out to fix it.

@reece reece reopened this Dec 18, 2023
@github-actions github-actions bot removed closed-by-stale stale Issue is stale and subject to automatic closing labels Dec 19, 2023

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale Issue is stale and subject to automatic closing label Mar 19, 2024
@jsstevenson jsstevenson added keep alive exempt issue from staleness checks and removed stale Issue is stale and subject to automatic closing labels Mar 19, 2024
5 participants