GAF2.2 Documentation, issue for col 17 #1838

Open
hattrill opened this issue Apr 21, 2022 · 1 comment
@hattrill (Contributor)

The documentation for column 17 is somewhat incomplete and, in places, ambiguous.

I think that the explicit intention for this field is to show that a biological property is associated with a specific isoform/proteoform, which is how the GAF2.2 specs read.

In the column 17 wiki doc, there are some confusing and unfinished statements:

"Col 17: Spliceform

This is the meat of the proposal. A new column is introduced for representing the specific spliceform of the gene product to which the annotation in col 5 (GO ID) applies.

This column is optional. Where no information is known about the specific gene product spliceform, the column may be blank. [I think this should say "should be blank" as it needs to be a statement relating to a specific isoform, that can be clearly identified and labelled]

If this column is present, then the referenced spliceform must be a gene product of the gene referenced in col 2 (if col 2 has a reference to an abstract protein then the spliceform must be a gene product of the gene that bears a 1-1 relation to the abstract protein)

The meaning of a line in a GAF now becomes:

The entity in col 17 has either the function or localization indicated by the GO ID
The gene referenced in col 2 encodes the gene product in col 17

This allows greater specificity in annotation. It is technically not a change in semantics, as the gene in col 2 was always intended as a proxy for referencing the gene product.

Thus col 17 can be ignored with only a loss of specificity, not correctness.

The identifier used in col 17 must be a standard 2-part global identifier, see Identifiers

This identifier should be stable and dereferenceable in the usual way. For example, if UniProtKB:QVCS5-1 is the spliceform ID then there must be a stable UniProtKB record with this ID, with its own web page

[this section with 'issues' is redundant and confusing]

**Issues:**

- Can this be blank? YES. If we don't know the isoform involved. _[should be in the main text]_
- Can the generic UniProtKB protein ID go in here? YES. If we don't know the specific isoform but we know the parent UniProtKB we can put this in here. _[really, that should be 'no': the GPI file can be used for mapping to the parent UniProtKB ID; this column should only be used for isoforms and modified forms]_
- Can non-UniProtKB IDs go in here?"

_[This is left unanswered. The [GAF2.2 doc](https://github.com/geneontology/geneontology.github.io/blob/issue-go-annotation-2917-gaf-2_2-doc/_docs/go-annotation-file-gaf-format-22.md) states:]_

> The identifier used must be a standard 2-part global identifier, e.g. UniProtKB:OK0206-2.
>
> When the Gene Product Form ID is filled with a protein identifier, the value in DB Object Type (column 12) must be protein. Protein identifiers can include UniProtKB accession numbers, NCBI NP identifiers or Protein Ontology (PRO) identifiers.
>
> When the Gene Product Form ID is filled with a functional RNA identifier, the DB Object Type (column 12) must be either ncRNA, rRNA, tRNA, snRNA, or snoRNA.
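To make the quoted GAF2.2 rules concrete, here is a minimal Python sketch of a col 17 check. It is not part of any GO tooling; the names `check_col17`, `guess_form_kind`, `GLOBAL_ID` and `ALLOWED_TYPES` are hypothetical. It assumes a "2-part global identifier" means `PREFIX:LOCAL_ID` (the exact allowed characters are an assumption), and it uses a made-up prefix heuristic to recognise protein identifiers. It treats a blank col 17 as valid because the column is optional, and it does not check dereferenceability or the col 2 gene relationship.

```python
import re
from typing import List, Optional

# Assumed shape of a "standard 2-part global identifier": PREFIX:LOCAL_ID,
# e.g. UniProtKB:OK0206-2. The allowed character set is an assumption.
GLOBAL_ID = re.compile(r"^[A-Za-z][A-Za-z0-9_.-]*:\S+$")

# Allowed col 12 (DB Object Type) values per kind of col 17 identifier,
# as listed in the GAF2.2 doc draft quoted above.
ALLOWED_TYPES = {
    "protein": {"protein"},
    "rna": {"ncRNA", "rRNA", "tRNA", "snRNA", "snoRNA"},
}


def guess_form_kind(gp_form_id: str) -> Optional[str]:
    """Hypothetical heuristic: classify an identifier as protein by prefix."""
    prefix, _, local = gp_form_id.partition(":")
    if prefix in {"UniProtKB", "PR"}:
        return "protein"
    if prefix == "RefSeq" and local.startswith("NP_"):
        return "protein"
    return None  # unknown kind: the caller has to supply it


def check_col17(gp_form_id: str, object_type: str,
                form_kind: Optional[str] = None) -> List[str]:
    """Return a list of problems with a col 17 value (empty list means OK)."""
    value = gp_form_id.strip()
    if not value:
        return []  # col 17 is optional; blank asserts no specific form
    if not GLOBAL_ID.match(value):
        return [f"{value!r} is not a PREFIX:LOCAL_ID identifier"]
    problems = []
    kind = form_kind or guess_form_kind(value)
    # Only enforce the col 12 rule when the identifier's kind is known.
    if kind in ALLOWED_TYPES and object_type not in ALLOWED_TYPES[kind]:
        allowed = ", ".join(sorted(ALLOWED_TYPES[kind]))
        problems.append(
            f"col 12 is {object_type!r} but a {kind} form requires one of: {allowed}"
        )
    return problems
```

For example, `check_col17("UniProtKB:OK0206-2", "gene")` reports a col 12 mismatch, while `check_col17("", "protein")` returns an empty list. Whether a bare parent accession (no isoform suffix) should be rejected is exactly the open question above, so the sketch deliberately does not decide it.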

Also, further down the doc, there is another hanging question: "**Issues:** col 17 - can it be left blank?"
@kltm (Member) commented Apr 21, 2022

Tagging @vanaukenk , @cmungall , and @balhoff .
