gorule-0000027 misses some invalid ID in the with/field #2063

Open
pgaudet opened this issue Sep 6, 2023 · 9 comments

@pgaudet
Contributor

pgaudet commented Sep 6, 2023

Hello,

@alexsign reported that some 'with' data in the exported Noctua GPADs contain "MGI" rather than "MGI:MGI".
https://github.com/geneontology/go-site/blob/master/metadata/rules/gorule-0000027.md mentions that all db prefixes should be found in the dbxref file

Note that the rule states

In all cases, the prefix MUST be in db-xrefs.yaml. The prefix SHOULD be identical (case-sensitive match) to the database field. If it does not match then it MUST be identical (case-sensitive) to one of the synonyms.

However for MGI the database field is MGI, not MGI:MGI.

@kltm do we need to change the dbxref to align with this?
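For reference, the prefix check quoted from the rule could be sketched roughly as below. This is only an illustration: `DB_XREFS` is a hand-written stand-in for db-xrefs.yaml (the `EXAMPLE_DB` entry and its synonym are made up), and `check_prefix` is a hypothetical helper, not the actual ontobio code.

```python
# Hand-written stand-in for db-xrefs.yaml; the entries are illustrative only.
DB_XREFS = [
    {"database": "MGI", "synonyms": []},
    {"database": "EXAMPLE_DB", "synonyms": ["ExDB"]},  # hypothetical entry
]


def check_prefix(prefix, db_xrefs=DB_XREFS):
    """Return 'ok', 'synonym', or 'invalid' for a prefix (case-sensitive)."""
    for entry in db_xrefs:
        if prefix == entry["database"]:
            return "ok"  # SHOULD case: identical to the database field
    for entry in db_xrefs:
        if prefix in entry.get("synonyms", []):
            return "synonym"  # allowed, but not the preferred form
    return "invalid"  # prefix MUST be in db-xrefs.yaml
```

Under this sketch both `MGI:MGI:96182` and `MGI:96182` carry the prefix `MGI` and pass, so a prefix-only check would not flag the shortened form.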

@pgaudet
Contributor Author

pgaudet commented Sep 6, 2023

Assigning @kltm because we need your input to proceed with this.

@balhoff
Member

balhoff commented Sep 6, 2023

Isn't the prefix MGI? And the local ID values themselves also contain MGI:. Like this: (MGI:)(MGI:96182). So if the second MGI is missing, it's not a dbxrefs file problem, but a problem with the software or data? Sorry if I'm jumping into something without context!
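To make the (MGI:)(MGI:96182) reading concrete, here is a minimal sketch that splits an identifier on its first colon and checks the MGI special case. The function names are hypothetical, and the assumption that every MGI local ID must itself start with `MGI:` comes from this thread, not from any parser code.

```python
def split_id(identifier):
    """Split on the first colon into (prefix, local_id)."""
    prefix, sep, local_id = identifier.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a prefixed identifier: {identifier!r}")
    return prefix, local_id


def mgi_local_id_ok(identifier):
    """True unless the prefix is MGI and the local ID lacks its own 'MGI:'."""
    prefix, local_id = split_id(identifier)
    if prefix != "MGI":
        return True  # the special case only applies to MGI
    return local_id.startswith("MGI:")
```

With this, `MGI:MGI:96182` splits into prefix `MGI` and local ID `MGI:96182` and passes, while `MGI:96182` has the local ID `96182` and is flagged.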

@kltm
Member

kltm commented Sep 6, 2023

Re: "software or data?"
The answer is both: the data is wrong according to our standards and we are not fixing it. In a perfect world, the first is not true and the failure of the second is not necessary. But, alas...
IIRC, there is an issue about examining the IDs in the "with" column in the GORULES tracker somewhere. There should be some basic checking there, although we have never used the regexps that were added after our pipeline was established (IIRC, added by Tony later on to align metadata a little).
MGI has always been a special case and, until we purge that historical choice from the data stream, it's something that we just have to deal with.

We'd have to look at the flow, but I believe all files (sans uniprot) pass through ontobio at some point and are parsed, so that would probably be the most expeditious place to catch things: python parse. Ideally, our internally produced files are not making the mistake when emitting data (i.e. minerva and PANTHER/PAINT), but as long as it doesn't make it out to end users, it doesn't matter too much. Unfortunately, that means that GO-CAM files /do/ get out as there is no QC occurring there--a running frustration.

I think that the best thing to do for the moment would be to:

  • make sure that minerva emits the correct identifier into TTL and the produced GPAD as a special case
  • that the rule is added and enforced in the python parsing
  • we do a one-time update of current TTL, if this error exists on our side

Again, any TTL/GO-CAM issues are "invisible" to us for the time being, so it's better to err on the side of caution.
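A rough sketch of what the parse-time check and the one-time fix might look like, independent of how ontobio actually does its parsing. It assumes the with/from value separates IDs with `|` and `,` as in GAF, and that a bare `MGI:<digits>` always means the second `MGI:` was dropped. All function names here are hypothetical.

```python
import re

# Assumption: a bare "MGI:<digits>" means the local-ID "MGI:" was dropped.
BARE_MGI = re.compile(r"^MGI:(\d+)$")


def split_with(with_field):
    """Split a with/from value on '|' and ',' into individual IDs."""
    return [tok for tok in re.split(r"[|,]", with_field) if tok]


def find_bad_mgi(with_field):
    """Return the IDs that look like MGI IDs missing the second 'MGI:'."""
    return [tok for tok in split_with(with_field) if BARE_MGI.match(tok)]


def repair_with(with_field):
    """Rewrite bare MGI IDs, keeping the original separators in place."""
    parts = re.split(r"([|,])", with_field)  # capture group keeps separators
    return "".join(BARE_MGI.sub(r"MGI:MGI:\1", part) for part in parts)
```

`find_bad_mgi` would back the rule check at parse time, and `repair_with` is the sort of rewrite a one-time update could apply. Whether to repair or just report is a separate decision.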

@kltm
Member

kltm commented Sep 6, 2023

Noting too that the GPAD currently emitted by minerva is a bit between specs, IIRC. That makes it a little harder to define what should happen, but that's fine for the moment as long as it is internally consistent.

@pgaudet
Contributor Author

pgaudet commented Sep 7, 2023

Noting that GOA filters out this data (i.e. 'with' values that have a single "MGI:" as the prefix).

@pgaudet pgaudet moved this from TODO to Discussion in GORULES (low-hanging fruit) Sep 26, 2023
@pgaudet pgaudet moved this from Discussion to TODO in GORULES (low-hanging fruit) Sep 26, 2023
@pgaudet
Contributor Author

pgaudet commented Sep 26, 2023

Related or same as #1218

@pgaudet
Contributor Author

pgaudet commented Sep 26, 2023

From the test GAF,
tests #4-9 are not failing.

  • test 4: Database prefix not in /db-xrefs.yaml
  • test 5: Assigned by not in groups
  • tests 6-9: checks on references

@pgaudet
Contributor Author

pgaudet commented Nov 29, 2023

@mugitty
It looks like at least the namespace of the 'with' (GAF column 8) is checked in gorule-0000001 (GORULE_TEST:0000001-19)

@pgaudet
Contributor Author

pgaudet commented Nov 29, 2023

So we should define exactly what is checked in gorule-0000001 and narrow the scope of gorule-0000027

GORULE_TEST:0000027-1
GORULE_TEST:0000027-2
GORULE_TEST:0000027-3
GORULE_TEST:0000027-8 are failing gorule-0000001

@pgaudet pgaudet moved this from To spec out & prioritize to TODO in GORULES (low-hanging fruit) Jan 8, 2024
@pgaudet pgaudet moved this from TODO to In progress in GORULES (low-hanging fruit) Jan 11, 2024
mugitty added a commit to geneontology/pipeline that referenced this issue Feb 21, 2024
mugitty added a commit to geneontology/pipeline that referenced this issue Feb 21, 2024
mugitty added a commit to geneontology/pipeline that referenced this issue Feb 21, 2024
@pgaudet pgaudet moved this from In progress to Clearing - needs testing in GORULES (low-hanging fruit) May 6, 2024
@mugitty mugitty moved this from Clearing - needs testing to In progress in GORULES (low-hanging fruit) Jun 24, 2024

3 participants