Change sources: Human, chicken, dog, pig, cow files #2214

Closed
pgaudet opened this issue Dec 21, 2023 · 9 comments

Comments

@pgaudet
Contributor

pgaudet commented Dec 21, 2023

Hello,

@alexsign / GOA is now producing 'combined' files for human, chicken, dog, pig, and cow, containing all Swiss-Prot isoforms (not the TrEMBL isoforms), complexes, and RNAs.

The links are here:

We need to change where our 'sources' get this data from.

Thanks, Pascale

@kltm
Member

kltm commented Jan 8, 2024

After talking with @pgaudet, we'll wait for the next release to pass and then push the change. Possible points of friction:

  • neo
  • downloads
  • stats

@kltm
Member

kltm commented May 7, 2024

@pgaudet I noticed goa_pdb (https://ftp.ebi.ac.uk/pub/databases/GO/goa/PDB/goa_pdb.gaf.gz) in the metadata. Is it used for anything? I don't think we use it; the only references to it I can find are problems it has caused, going back to 2019.

@pgaudet
Contributor Author

pgaudet commented May 7, 2024

The files in the first comment are correct. GOA produces various files for various groups; we can ignore these.

@kltm
Member

kltm commented May 8, 2024

Initial changes have been made and we're waiting on a snapshot run to test.

kltm added six commits to geneontology/pipeline that referenced this issue on May 8, 2024, and one more on May 13, 2024.
@kltm
Member

kltm commented May 15, 2024

After talking with @pgaudet: the stats look good.
The test downloads page (https://snapshot.geneontology.org/products/pages/downloads.html) also looks good, ignoring the links on it.

The final item to ensure is the NEO build. Building now.

@kltm
Copy link
Member

kltm commented May 21, 2024

New NEO build:
1734706857 golr-index-contents.tgz
Current one on the machine:
1738730937 golr_new.tgz
Given how close these sizes are, I think it's reasonable to conclude that nothing extreme happened. Allowing the snapshot to proceed.
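The "how close these are" check can be made concrete with a little arithmetic. The sketch below assumes the two numbers are byte counts from a file listing (the thread does not state the units), and the 1% tolerance is a hypothetical threshold chosen for illustration, not a value used by the pipeline.

```python
# Sizes quoted in the comment above. Assumed to be byte counts;
# the thread does not state the units.
new_build = 1734706857   # golr-index-contents.tgz (fresh NEO build)
on_machine = 1738730937  # golr_new.tgz (file currently on the machine)

# Hypothetical tolerance for illustration only; not taken from the thread.
TOLERANCE = 0.01  # 1%


def relative_difference(a: int, b: int) -> float:
    """Absolute difference between a and b as a fraction of the smaller value."""
    return abs(a - b) / min(a, b)


diff = abs(on_machine - new_build)
rel = relative_difference(new_build, on_machine)
looks_reasonable = rel <= TOLERANCE

print(f"absolute difference: {diff}")
print(f"relative difference: {rel:.4%}")
print(f"within tolerance: {looks_reasonable}")
```

The two archives differ by 4,024,080 units, roughly 0.23% of the smaller one, which is consistent with the judgment that nothing extreme happened. Dividing by the smaller value makes the measure symmetric in which file is treated as the baseline.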

@pgaudet
Copy link
Contributor Author

pgaudet commented Jun 13, 2024

A single file each for human, dog, cow, chicken, and pig. :)

[Image]

Compared to the 2024-04-24 release:

[Image]

@suzialeksander
Contributor

I think this is complete? The only remaining concern I see is that the entity type is incorrect: it is currently "protein", when the file is a mix of proteins, various RNAs, "gene_product", etc. But I think the requirements of this ticket itself are complete.

@pgaudet
Copy link
Contributor Author

pgaudet commented Jun 20, 2024

Right; next, we need to fix the downloads page and the documentation.

Projects
Status: 2024-06-17 Snapshot
Development

No branches or pull requests

3 participants