Remove unused GO_REFs #1329

Open
pgaudet opened this issue Jan 23, 2020 · 21 comments

@pgaudet
Contributor

pgaudet commented Jan 23, 2020

Hello,

I cannot find these references in either the ontology or the annotations. Can we delete them?

GO_REF ID  Description
GO_REF:0000006 Gene Ontology annotation by the MGI curatorial staff, Mouse Locus Catalog
GO_REF:0000010 Gene Ontology annotation by the MGI curatorial staff, mouse gene nomenclature
GO_REF:0000018 dictyBase 'Inferred from Electronic Annotation (BLAST method)'
GO_REF:0000019 Automatic transfer of experimentally verified manual GO annotation data to orthologs using Ense
GO_REF:0000023 Gene Ontology annotation based on UniProtKB Subcellular Location vocabulary mapping.
GO_REF:0000026 Improving the representation of muscle biology in the biological process and cellular component
GO_REF:0000029 Gene Ontology annotation based on information extracted from curated UniProtKB entries
GO_REF:0000030 Portable Annotation Rules
GO_REF:0000031 NIAID Cell Ontology Workshop
GO_REF:0000032 Inference of Biological Process annotations from inter-ontology links
GO_REF:0000033 (Obsolete) Annotation inferences using phylogenetic trees
GO_REF:0000035 Automatic transfer of experimentally verified manual GO annotation data to plant orthologs usin
GO_REF:0000042 Gene Ontology annotation through association of InterPro records with GO terms, accompanied by
GO_REF:0000043 Gene Ontology annotation based on UniProtKB/Swiss-Prot keyword mapping, accompanied by conserva
GO_REF:0000044 Gene Ontology annotation based on UniProtKB/Swiss-Prot Subcellular Location vocabulary mapping,
GO_REF:0000045 Gene Ontology annotation based on UniProtKB/TrEMBL entries keyword mapping, accompanied by cons
GO_REF:0000046 Gene Ontology annotation based on UniProtKB/TrEMBL Subcellular Location vocabulary mapping, acc
GO_REF:0000047 Gene Ontology annotation based on absence of key sequence residues.
GO_REF:0000048 TIGR's Eukaryotic Manual Gene Ontology Assignment Method
GO_REF:0000049 Automatic transfer of experimentally verified manual GO annotation data to fungal orthologs usi
GO_REF:0000053 Automatic classification of GO using the ELK reasoner
GO_REF:0000055 Gene Ontology Cellular Component annotation based on cellular fractionation.
GO_REF:0000056 Taxon constraints to detect inconsistencies in annotation and ontology structure.
GO_REF:0000057 Gene Ontology annotations inferred by curators' judgment using experimental data and prior know
GO_REF:0000077 Representation of transport of a cellular component as biological process in the Gene Ontology
GO_REF:0000082 Representation of plant maturation as biological process in the Gene Ontology
GO_REF:0000098 Gene Ontology annotation based on research conference abstracts
GO_REF:0000099 Gene Ontology annotation based on DNA/RNA sequence records
GO_REF:0000100 Gene Ontology annotation by SEA-PHAGE biocurators
GO_REF:0000102 Representation of cellular component binding as molecular functions in the Gene Ontology
GO_REF:0000103 Representation of cellular component organization as biological process in the Gene Ontology
GO_REF:0000106 Gene Ontology annotation based on protein sequence records.
GO_REF:0000112 Gene Ontology annotation by CACAO biocurators

@hdrabkin @PFey @sandyl27 @alexsign

Do you need any of those references?

Thanks, Pascale
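
A check like the one described in this comment can be approximated by counting how often each GO_REF appears in the reference column of annotation files. The sketch below assumes GAF 2.x tab-separated input, where column 6 holds pipe-separated `DB:Reference` values. The function names and the sample lines (including the accessions and the PMID) are made up for illustration and are not part of any GO tooling.

```python
from collections import Counter


def count_go_refs(gaf_lines):
    """Count GO_REF occurrences in the reference column (column 6) of GAF lines."""
    counts = Counter()
    for line in gaf_lines:
        # Skip blank lines and GAF header/comment lines, which start with "!".
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 6:
            continue
        # Column 6 may hold several references separated by "|".
        for ref in cols[5].split("|"):
            if ref.startswith("GO_REF:"):
                counts[ref] += 1
    return counts


def unused_refs(candidates, counts):
    """Return the candidate GO_REF IDs that never appear in the counted annotations."""
    return [ref for ref in candidates if counts[ref] == 0]


# Invented sample data, only to show the input shape.
sample = [
    "!gaf-version: 2.1",
    "UniProtKB\tP12345\tABC1\t\tGO:0005515\tGO_REF:0000042\tIEA",
    "UniProtKB\tP67890\tXYZ2\t\tGO:0005634\tPMID:123456|GO_REF:0000045\tIEA",
]
counts = count_go_refs(sample)
print(unused_refs(["GO_REF:0000006", "GO_REF:0000042", "GO_REF:0000045"], counts))
```

An empty result for a candidate only shows it is absent from the files scanned; references used solely in the ontology or in another group's database would need a separate check.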

@kltm
Member

kltm commented Jan 23, 2020

@pgaudet Don't we treat GO refs as we do ontology entities and not delete them?

@hdrabkin
Contributor

@ukemi would GO_REF:0000010 be for the RCA (Riken) annotations?

pgaudet self-assigned this Jan 24, 2020

@pgaudet
Contributor Author

pgaudet commented Jan 24, 2020

@kltm Sure - I will change the status to 'is_obsolete = true'

@pgaudet
Contributor Author

pgaudet commented Jan 24, 2020

However, can we filter obsolete GO_REFs from our products? For example, GO_REF:0000033 should not be exported: http://release.geneontology.org/2020-01-01/metadata/gorefs/index.html

Thanks, Pascale

@kltm
Member

kltm commented Jan 24, 2020

If there's no longer a landing spot for it, that would miss the point of having the obsolete flag and notice. Just like an ontology term, we don't want to be taking these things "away", just marking them and adding commentary.

The current best landing site would be https://github.com/geneontology/go-site/blob/master/metadata/gorefs/README.md
https://github.com/geneontology/go-site/blob/master/metadata/gorefs/README.md#goref0000010
(unfortunate that the markdown is a little wonky there)
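
The "mark, don't delete" policy described here, together with the request to filter obsolete entries from some products, could be sketched as below. The record layout is hypothetical: a list of dicts with `id`, `title`, an optional `comments` list, and the `is_obsolete` flag mentioned earlier in the thread. The actual go-site metadata format may differ, and the comment text is invented.

```python
def obsolete_refs(records, ids_to_obsolete, comment):
    """Return copies of the records with the given IDs flagged obsolete; nothing is removed."""
    updated = []
    for rec in records:
        rec = dict(rec)  # shallow copy so the caller's records are left untouched
        if rec["id"] in ids_to_obsolete:
            rec["is_obsolete"] = True
            rec["comments"] = rec.get("comments", []) + [comment]
        updated.append(rec)
    return updated


def active_refs(records):
    """Filtered view for products that should list only current references."""
    return [rec for rec in records if not rec.get("is_obsolete", False)]


records = [
    {"id": "GO_REF:0000010",
     "title": "Gene Ontology annotation by the MGI curatorial staff, mouse gene nomenclature"},
    {"id": "GO_REF:0000053",
     "title": "Automatic classification of GO using the ELK reasoner"},
]
updated = obsolete_refs(records, {"GO_REF:0000010"}, "Illustrative obsoletion note.")

print(len(updated))                               # every record still has a landing spot
print([rec["id"] for rec in active_refs(updated)])  # only non-obsolete IDs
```

Keeping the full list as the source of truth and filtering at export time would satisfy both positions: the obsolete entry remains resolvable, while individual products can choose whether to show it.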

@pgaudet
Contributor Author

pgaudet commented Jan 27, 2020

@alexsign Are you using any one of those GO_REFs?

@pgaudet
Contributor Author

pgaudet commented Jan 27, 2020

GO_REF:0000042, GO_REF:0000043, GO_REF:0000044
GO_REF:0000045, GO_REF:0000046, GO_REF:0000047

Used in UniProt

@alexsign
Contributor

@pgaudet yes, I do for three of them. See below, with the number of annotations:
GO_REF:0000042 130
GO_REF:0000045 66
GO_REF:0000046 1

@pgaudet
Contributor Author

pgaudet commented Jan 27, 2020

@alexsign
Perhaps those should be merged?

GO_REF:0000042 InterPro -> GO_REF:0000002
GO_REF:0000043 SPKW mappings -> GO_REF:0000004
GO_REF:0000046 SubCell -> GO_REF:0000023

What do you think?
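
A merge like the one proposed in this comment amounts to rewriting reference IDs in existing annotations. Here is a minimal sketch using the three mappings listed; the function name and the assumption that references arrive as a pipe-separated string are illustrative only, not a description of how GOA performs the remapping.

```python
# Source -> target pairs taken from the proposal in this comment.
MERGE_MAP = {
    "GO_REF:0000042": "GO_REF:0000002",
    "GO_REF:0000043": "GO_REF:0000004",
    "GO_REF:0000046": "GO_REF:0000023",
}


def remap_references(ref_field, merge_map=MERGE_MAP):
    """Replace merged GO_REF IDs in a pipe-separated reference field.

    Order is preserved, and duplicates created by the merge are dropped.
    """
    remapped = []
    for ref in ref_field.split("|"):
        target = merge_map.get(ref, ref)  # unmapped references pass through unchanged
        if target not in remapped:
            remapped.append(target)
    return "|".join(remapped)


print(remap_references("GO_REF:0000042"))
print(remap_references("GO_REF:0000002|GO_REF:0000042"))
```

Dropping duplicates matters when an annotation already cites the target reference, since the merge would otherwise list it twice.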

@alexsign
Contributor

@pgaudet sounds reasonable, I'll let you know when it's remapped, but you can go ahead and remove them now

@pgaudet
Contributor Author

pgaudet commented Jan 27, 2020

thanks !

@pgaudet
Contributor Author

pgaudet commented Jan 27, 2020

  • Do we need
    GO_REF:0000037 UniProt Keywords2GO (UniProtKB/Swiss-Prot entries)
    GO_REF:0000038 UniProt Keywords2GO (UniProtKB/TrEMBL entries)
    in addition to GO_REF:0000043 UniProt Keywords2GO (UniProtKB/Swiss-Prot entries, conservatively modified by UniProt)?

  • Do we need
    GO_REF:0000039 UniProt Subcellular Location2GO (UniProtKB/Swiss-Prot entries)
    GO_REF:0000040 UniProt Subcellular Location2GO (UniProtKB/TrEMBL entries)
    in addition to GO_REF:0000044 Gene Ontology annotation based on UniProtKB/Swiss-Prot Subcellular Location vocabulary mapping, accompanied by conservative changes to GO terms applied by UniProt?

If you don't need them all, let me know which one you'd pick for each group.

Thanks, Pascale

@alexsign
Contributor

alexsign commented Jan 27, 2020

@pgaudet the 3 GO_REFs that I posted are the only GO_REFs in the GOA database from the full list of GO_REFs above.

@alexsign
Contributor

@pgaudet here is the annotation data for 37-40:
GO_REF:0000038 283932240
GO_REF:0000039 426283
GO_REF:0000040 23391318
GO_REF:0000037 2366409

@pgaudet
Contributor Author

pgaudet commented Jan 27, 2020

ok thanks !

@pgaudet
Contributor Author

pgaudet commented Jan 28, 2020

Hi @alexsign
@sylvainpoux agrees with the merge:

  • GO_REF:0000037 UniProt Keywords2GO (UniProtKB/Swiss-Prot entries)

  • GO_REF:0000038 UniProt Keywords2GO (UniProtKB/TrEMBL entries)

  • into GO_REF:0000043 UniProt Keywords2GO (UniProtKB/Swiss-Prot entries, conservatively modified by UniProt)

  • GO_REF:0000039 UniProt Subcellular Location2GO (UniProtKB/Swiss-Prot entries)

  • GO_REF:0000040 UniProt Subcellular Location2GO (UniProtKB/TrEMBL entries)

  • into GO_REF:0000044 Gene Ontology annotation based on UniProtKB/Swiss-Prot Subcellular Location vocabulary mapping, accompanied by conservative changes to GO terms applied by UniProt.

Let me know if I can remove references 37, 38, 39 and 40 or if you need to merge them at your end first.

Thanks, Pascale
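
For a sense of scale, the annotation counts posted earlier in the thread for GO_REF:0000037 through GO_REF:0000040 can be tallied per merge target. This is only arithmetic on those posted figures; the variable names are illustrative.

```python
# Annotation counts as reported earlier in this thread.
annotation_counts = {
    "GO_REF:0000037": 2366409,
    "GO_REF:0000038": 283932240,
    "GO_REF:0000039": 426283,
    "GO_REF:0000040": 23391318,
}

# Merge targets and their sources, as listed in this comment.
merges = {
    "GO_REF:0000043": ["GO_REF:0000037", "GO_REF:0000038"],
    "GO_REF:0000044": ["GO_REF:0000039", "GO_REF:0000040"],
}

# Total number of annotations that would move to each target.
moved = {
    target: sum(annotation_counts[source] for source in sources)
    for target, sources in merges.items()
}

for target, total in moved.items():
    print(f"{target}: {total:,} annotations")
```

With these figures, roughly 286 million annotations would move to GO_REF:0000043 and about 24 million to GO_REF:0000044, which helps explain why the remapping has to be done in the database before the old references are obsoleted.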

@alexsign
Contributor

alexsign commented Jan 28, 2020

@pgaudet please hold on for now. I need to make all the necessary changes in the database and unload the data to QuickGO first.

@pgaudet
Contributor Author

pgaudet commented Jan 28, 2020

Sounds good.

@alexsign
Contributor

@pgaudet it's all good now in GOA and QuickGO. Please go ahead and obsolete the GO_REFs discussed above. Thanks.

@suzialeksander
Contributor

Looks like trying to annotate a yeast protein to "GO:0045053 protein retention in Golgi apparatus" is getting

Annotation failed taxon constraint GO:0032507 (maintenance of protein location in cell) and its descendants should ONLY be used with gene products from the taxonomic group cellular organisms

Is this something on our (GO's) end or P2GO's?

@pgaudet
Contributor Author

pgaudet commented Oct 5, 2021

Obsolete GO_REF:0000004 - has xrefs to SGD, ZFIN and MGI

Used by ZFIN and MGI
