This repository has been archived by the owner on Sep 25, 2018. It is now read-only.

Add a HPOA tab parser #27

Open
cmungall opened this issue Mar 22, 2016 · 10 comments

@cmungall
Member

Redundant functionality with
monarch-initiative/dipper#277

But useful to have in one library

@balhoff
Member

balhoff commented Jul 5, 2016

@cmungall I started implementing this in pxftools. Should I move it here?

Also, some questions about the format. I have been putting the disease as the entity in the phenopacket phenotype profile. So if there is a value in the row for the "Gene ID", where does it go in the phenopacket? Likewise, how do I use a value for "Sex ID"?

cc @drseb

@cmungall
Member Author

cmungall commented Jul 6, 2016

There are actually two separate TSVs - HPOA and the phenote-HPO TSV.

When I wrote the ticket I believe I intended HPOA, but the phenote TSV is now higher priority for webphenote migration.

@DoctorBud had the same questions when verifying that the d2p forms were complete w.r.t. phenote functionality.

For HPOA, we don't have GeneID, SexID, GenotypeID. Where we have these in phenote, they have only been used historically, and get ignored when @drseb compiles the phenote TSVs into HPOA-TSV.

We should do the same for our purposes here. We will have the original phenote TSVs archived if we ever want to go back and use information that has been populated here.
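A minimal sketch of the "ignore these columns" step described above, assuming the phenote TSV has a header row. "Gene ID" and "Sex ID" are the column names quoted earlier in this thread; "Genotype ID" is a guessed header, and `drop_ignored_columns` is a hypothetical helper, not part of pxftools or the HPO pipeline.

```python
import csv
import io

# "Gene ID" and "Sex ID" are quoted in this thread; "Genotype ID" is an
# assumed header name and may differ in the real phenote files.
IGNORED_COLUMNS = {"Gene ID", "Sex ID", "Genotype ID"}


def drop_ignored_columns(tsv_text, ignored=IGNORED_COLUMNS):
    """Return the TSV text with the ignored columns removed (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = [name for name in reader.fieldnames if name not in ignored]
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=kept,
        delimiter="\t",
        extrasaction="ignore",  # silently skip the columns we dropped
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```

The original files stay untouched, so the archived phenote TSVs remain available if the dropped values are ever needed.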

@drseb
Member

drseb commented Jul 6, 2016

I saw that @pnrobinson still puts information into the gene field. Not sure why. Can you explain?

I think it is better to pull gene-disease links from other sources (e.g. orphanet)

@balhoff
Member

balhoff commented Jul 7, 2016

Download a pxftools release and you can convert the Phenote TSVs to phenopackets like this:

pxftools --informat=hpo-phenote --out=OMIM-300869.yaml convert OMIM-300869.tab

Creator and date are left out until we add those to the API.

@cmungall
Member Author

cmungall commented Jul 7, 2016

Nice! @jnguyenx can we set up a Jenkins job to do this over the whole repo? Or we can do this on the Charité Jenkins if @drseb prefers.
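A rough sketch of what such a job could run, assuming `pxftools` is on the PATH and accepts exactly the invocation shown in the previous comment. The directory layout, the `*.tab` glob, and the helper names `build_command` and `convert_all` are assumptions for illustration only.

```python
import subprocess
from pathlib import Path


def build_command(tab_file, out_dir):
    """Build the pxftools call for one Phenote TSV, mirroring the command above."""
    out_file = Path(out_dir) / (Path(tab_file).stem + ".yaml")
    return [
        "pxftools",
        "--informat=hpo-phenote",
        f"--out={out_file}",
        "convert",
        str(tab_file),
    ]


def convert_all(source_dir, out_dir):
    """Convert every *.tab file in source_dir; return the names that failed."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    failures = []
    for tab_file in sorted(Path(source_dir).glob("*.tab")):
        result = subprocess.run(build_command(tab_file, out_dir))
        if result.returncode != 0:
            failures.append(tab_file.name)
    return failures
```

Collecting the failures instead of stopping at the first one means a single malformed file does not block the rest of the run.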

@jnguyenx
Contributor

jnguyenx commented Jul 7, 2016

Sure, it's doable in the Monarch Jenkins. If you want to go ahead with this instance, let me know where to get the source files.

@balhoff
Member

balhoff commented Jul 7, 2016

I've noticed that I'm missing some evidence code translations for IDs used in the files. The HPO TSVs use IDs like IEA, ITM, TAS, ICE, etc. I am using ECO and searching for exact synonyms. But many aren't in there. What is the source for these?

@balhoff
Member

balhoff commented Jul 7, 2016

Aha, I've found many of them in the HPO documentation. But TAE and TEA are missing, and many others don't have ECO IRIs. So it seems like these will need IRIs from somewhere.

@drseb
Member

drseb commented Jul 8, 2016

I fixed TAE -> TAS. Not sure about TEA. Has been used intentionally and very often. @pnrobinson ?

@pnrobinson

If we have a lot of TEA codes, then this must be an odd Phenote Bug.
Apart from errors, the HPO should have only the five evidence codes listed here.
https://human-phenotype-ontology.github.io/documentation.html#annot
We should probably integrate with the Evidence and Conclusion Ontology (ECO), as Jim implies. Are there obvious candidates for the five codes we have been using? (Published clinical experience might be difficult, but we can make a request.)
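Until the ECO mapping is settled, a small audit like the following could show how widespread the stray codes are. This is only a sketch: IEA, ITM, TAS and ICE are the codes named in this thread, PCS is assumed to be the fifth code from the HPO documentation linked above and should be verified, and the only correction applied is the TAE -> TAS fix already mentioned. TEA is reported as unknown rather than guessed at. No ECO IRIs are assigned here.

```python
from collections import Counter

# Four codes come from this thread; PCS is an assumption to be checked
# against the HPO documentation.
ALLOWED_CODES = {"IEA", "ICE", "ITM", "PCS", "TAS"}

# Only the correction confirmed in this thread.
CORRECTIONS = {"TAE": "TAS"}


def normalize_code(code):
    """Trim, upper-case, and apply known corrections to one evidence code."""
    cleaned = code.strip().upper()
    return CORRECTIONS.get(cleaned, cleaned)


def audit_codes(codes):
    """Return the normalized codes plus a count of codes outside ALLOWED_CODES."""
    normalized = []
    unknown = Counter()
    for code in codes:
        fixed = normalize_code(code)
        normalized.append(fixed)
        if fixed not in ALLOWED_CODES:
            unknown[fixed] += 1
    return normalized, unknown
```

Running `audit_codes` over the evidence column of each file would give a per-code count of anything that still needs a decision, such as TEA.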


5 participants