RNAC RNA types are getting mangled by the pipeline (tested by gorule-0000001) #2246
Comments
@pgaudet Could we add this to the GORULEs project, as it may be due to a filter? |
You mean there is a GORULE that changes the entity type? Thanks, Pascale |
(I don't have permissions to add this to the GO-rules project; @kltm would you please do it?) |
@pgaudet To clarify, I suspect the issue is that there is a "silent rule" that is converting (or dropping and re-adding information) such that field value
With that definition, this would then include https://www.sequenceontology.org/browser/current_release/term/SO:0000655, which is My guess for whatever is going on is that the parser for col 12 is mistakenly bumping |
Can we come up with a fixed static list of types? Saying any subtype of ncRNA is not good; there are 20 subtypes of tRNA, and no one should be using these. There is also the issue of labels potentially changing. The number of annotatable, distinct, meaningful ncRNA types should be small. |
I used @cmungall's example and was able to reproduce. The parser is doing a lookup and defaulting to gene_product (as given in the specs). Currently, there is an entry for 'lnc_RNA' mapped to SO:0001877, but not for 'lncRNA'. I can add an entry for 'lncRNA' and map it to SO:0001877. @pgaudet, please create a lookup for the supported types; I want to ensure all allowed types are mapped. |
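A minimal sketch of the lookup behaviour described above, for illustration only. The table and the function name `lookup_entity_type` are hypothetical and are not the pipeline's actual code; the only mapping taken from this thread is 'lnc_RNA' to SO:0001877, with 'lncRNA' added as the proposed extra entry.

```python
from typing import Optional, Tuple

# Hypothetical lookup table. Only the lnc_RNA -> SO:0001877 pair comes from
# the thread; 'lncRNA' is the extra entry proposed in the comment above.
TYPE_CURIES = {
    "lnc_RNA": "SO:0001877",
    "lncRNA": "SO:0001877",
}

# The thread states that unknown labels default to gene_product.
FALLBACK_LABEL = "gene_product"


def lookup_entity_type(label: str) -> Tuple[str, Optional[str]]:
    """Return (label, CURIE) for a known label, else (fallback label, None)."""
    curie = TYPE_CURIES.get(label)
    if curie is None:
        return FALLBACK_LABEL, None
    return label, curie


print(lookup_entity_type("lncRNA"))
print(lookup_entity_type("some_unmapped_type"))
```

Without the 'lncRNA' entry, the first call would fall through to the default, which matches the mangling reported in this issue.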
@mugitty @pgaudet According to spec, it's a limited list plus a set of entries from the SO. As a compromise (#2246), as we're not actively using SO and likely have never done so, let's pull the "used" subset from the current SO and make our used list static for the moment to prevent drift and issues like we're currently having. |
@mugitty Can you give me all the types you find? And which ones are not mapped. It seems lncRNA should simply be a synonym of lnc_RNA. I can see if I find matches that are more informative than 'gene product'. Thanks, Pascale |
After discussion with @mugitty, I am attaching the allowed entity types and the suggestions for replacement for others. We will first check errors with this list, and we can change the list if needed. |
Thanks @pgaudet, I will update the parser to use this list and output a warning when defaulting to gene_product. |
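One way the "warn when defaulting" behaviour could look, as a rough sketch. The allowed set here is a placeholder (the attached list is not reproduced in this thread), and `normalize_type` and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("entity_types")  # hypothetical logger name

# Placeholder subset; the real allowed list is the one attached to the issue.
ALLOWED_TYPES = {"gene_product", "protein_coding_gene", "lnc_RNA"}
FALLBACK = "gene_product"


def normalize_type(raw: str, line_number: int) -> str:
    """Keep an allowed type as-is; otherwise warn and fall back."""
    if raw in ALLOWED_TYPES:
        return raw
    logger.warning(
        "line %d: entity type %r is not in the allowed list, defaulting to %s",
        line_number, raw, FALLBACK,
    )
    return FALLBACK
```

The warning keeps the conversion visible, so it is no longer a "silent rule".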
As part of the "gaf tests", it would be good to add something to make sure that the synonyms are mapping back to the proper ID (i.e. |
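A sketch of what such a synonym check might look like, assuming synonyms are stored as a map from synonym to canonical label. The data structures and the names `curie_for` and `broken_synonyms` are hypothetical; only the lnc_RNA / SO:0001877 pair comes from this thread.

```python
# Hypothetical tables: canonical labels with CURIEs, and synonyms pointing
# at canonical labels.
CANONICAL_CURIES = {"lnc_RNA": "SO:0001877"}
SYNONYMS = {"lncRNA": "lnc_RNA"}


def curie_for(label):
    """Resolve a label or synonym to its CURIE, or None if unmapped."""
    canonical = SYNONYMS.get(label, label)
    return CANONICAL_CURIES.get(canonical)


def broken_synonyms():
    """List synonyms whose canonical label has no CURIE in the table."""
    return [syn for syn in SYNONYMS if curie_for(syn) is None]


# A test would assert that no synonym is left dangling.
assert broken_synonyms() == []
```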
@pgaudet Clarifying that you're removing
What are these expected to map to? Without digging in, I think with the list you have |
@pgaudet, I noticed a test for MGI that was failing with the proposed code update. For example, if there is a GAF line as follows: "gene" will be converted to "gene_product". Is this expected? |
Hi @mugitty whether entity types OTHER than the following are present: protein_coding_gene SO:0001217 and spit out any entity type that doesn't match these, on a file-by-file basis. |
Alternatively - or in addition, could you give me a count of these different types: protein_coding_gene SO:0001217
Thanks, Pascale |
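A rough sketch of the per-file report requested above, assuming tab-separated GAF lines with the entity type in column 12 and comment lines starting with "!". The allowed set is a placeholder containing only a type named in this thread, and the function names are hypothetical.

```python
from collections import Counter

# Placeholder; the full allowed list is not reproduced in this thread.
ALLOWED = {"protein_coding_gene"}


def count_entity_types(gaf_lines):
    """Count the values found in column 12 of the given GAF lines."""
    counts = Counter()
    for line in gaf_lines:
        if line.startswith("!") or not line.strip():
            continue  # skip header/comment and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue  # malformed line; a real check would report it
        counts[cols[11]] += 1
    return counts


def disallowed_types(counts, allowed=ALLOWED):
    """Return only the types (with counts) that are not in the allowed set."""
    return {t: n for t, n in counts.items() if t not in allowed}
```

Running `count_entity_types` once per file and printing `disallowed_types` for each would give the file-by-file listing, and the raw counter gives the counts.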
(Noting that reactome and zfin need to [obviously] fix their GAF.) |
Isn't this gorule-0000001 ? |
For reference - these are the types that GOA loads from RNA central rRNA 12894606 |
@kltm should I make a new GO rule for entity types? |
@pgaudet That would be great. |
Hi @mugitty Here are repairs we should implement:
These have to be added to the GO list of CURIEs:
That should take care of many issues. However, these types are not in SO:
@mugitty and I propose to continue to change them to 'gene product' and output a warning. Thanks, Pascale |
I need to add tests for the entity types; can we first check the snapshot to see if disallowed types are being reported?
Source:
what we end up publishing:
Aside for @alexsign (should probably be its own ticket):
Why don't we get gene symbols for RNA types? This one (Xist) clearly has one https://www.genenames.org/data/gene-symbol-report/#!/hgnc_id/HGNC:12810 - why don't we just propagate across from HGNC?
And not to overstuff this issue, but there are issues with general RNAC/HGNC propagation on AGR. Recall AGR uses HGNCs:
https://www.alliancegenome.org/gene/HGNC:12810
No GO annotation
Even though this gene obviously has a known function:
https://amigo.geneontology.org/amigo/gene_product/RNAcentral:URS000075D95B_9606