
Add entity type 'protein' for IKR evidence #1882

Open
pgaudet opened this issue Jul 7, 2022 · 14 comments

Comments

@pgaudet
Contributor

pgaudet commented Jul 7, 2022

We want to do this so that MEROPS (and perhaps other resources) can be used as evidence for IKR annotations.

Specific example from @sjm41:

In that example:
Entity: A1ZBU9 (CG13423/FBgn0034513)
Qualifier: NOT|enables
GO: GO:0008233 (peptidase activity)
Evidence: IKR
Reference: GO_REF:0000047
With/from: MEROPS:C01.UNB [<- this is the bit I’d like to add, but I can’t in P2GO]

That MEROPS ID corresponds to this non-peptidase family entry at MEROPS:
https://www.ebi.ac.uk/merops/cgi-bin/pepsum?mid=C01.UNB
https://www.ebi.ac.uk/merops/cgi-bin/sequence_data?mid=C01.UNB

So in summary, I’d like to make 'NOT peptidase activity' annotations to proteins that have peptidase domains but are classified as 'non-peptidase homologs' at MEROPS because they lack key catalytic residues.

PR is here: #1881

Thanks, Pascale

@sjm41

sjm41 commented Jul 8, 2022

Thanks @pgaudet. This sounds like it would fix things for my use case. I think the alternative is to change the "type_name" of the MEROPS entry in the GO xrefs metadata to "gene/protein family" (see #1875). I'm not sure which solution is better or easier.

FYI: Looking at all MF annotations in QuickGO, I don't see any annotations using 'MEROPS' in the with/from column, so any change here won't affect any existing annotations - it will just determine how we can use MEROPS IDs in IKR annotations going forward.

@pgaudet
Contributor Author

pgaudet commented Jul 8, 2022

> Looking at all MF annotations in QuickGO, I don't see any annotations using 'MEROPS' in the with/from column, so any change here won't affect any existing annotations - it will just determine how we can use MEROPS IDs in IKR annotations going forward.

Right! Since it wasn't allowed until now. Thanks for checking!

@sjm41

sjm41 commented Jul 8, 2022

Right, but given there's a MEROPS entry in the GO xrefs metadata file (#1875) and I don't know how long that's been there, I wanted to check if there's been any usage of MEROPS IDs in any context (e.g. ISS?) to date. Seems not.

@pgaudet
Contributor Author

pgaudet commented Sep 22, 2022

Hi @sjm41
This is working using the syntax provided by identifiers.org (see https://registry.identifiers.org/registry/merops#), which is something like C01.### (where ### is a number).

Do you need the identifiers with 3 letters after the dot? We cannot find an 'official source' for that format, but we can relax the rules if needed.

Thanks, Pascale

@sjm41

sjm41 commented Sep 22, 2022

Hi @pgaudet
Thanks for following up. Yes, the ### part should be relaxed to allow any three letters or numbers. It's true that the MEROPS ID is defined [here](https://www.ebi.ac.uk/merops/about/glossary.shtml#MeropsID) with the ### part being three numbers, but as the example at the top of this page shows (C01.UNB), that's not always the case in practice.

@sjm41

sjm41 commented Sep 22, 2022

Looking at (at least) the MEROPS IDs I want to use, the ### part matches one of three patterns:

  • 3 numbers (e.g. 984)
  • 3 letters (e.g. UPA)
  • 1 letter followed by two numbers (e.g. A65)
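The three suffix patterns above can be combined into one regular expression. This is a minimal sketch and **not** the regex that was actually committed in #1908. It assumes that the family prefix is always one uppercase letter followed by two digits, as in every example in this thread (C01, S09, S01, C85). The helper name `is_merops_id` is hypothetical.

```python
import re

# Hypothetical pattern built from the examples in this thread, not the
# committed go-site regex. Family prefix assumed to be 1 uppercase letter
# + 2 digits (e.g. "C01"), followed by a dot and one of three suffix forms.
MEROPS_ID = re.compile(
    r"[A-Z]\d{2}\."      # family part, e.g. "C01."
    r"(?:\d{3}"          # 3 numbers, e.g. 984
    r"|[A-Z]{3}"         # 3 letters, e.g. UPA
    r"|[A-Z]\d{2})"      # 1 letter followed by 2 numbers, e.g. A65
)


def is_merops_id(value: str) -> bool:
    """Return True if value has the XXX.YYY shape of a MEROPS peptidase ID."""
    # fullmatch anchors the pattern at both ends of the string.
    return MEROPS_ID.fullmatch(value) is not None


for example in ["C01.UNB", "S09.986", "S01.A18", "C85.UNA", "C01.1234"]:
    print(example, is_merops_id(example))
```

Using `fullmatch` rather than `^...$` avoids accepting a value with a trailing newline. Mixed suffixes such as `U1B` are rejected because they match none of the three listed patterns.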

@sjm41

sjm41 commented Sep 22, 2022

> This is working using the syntax provided by identifiers.org

I just checked in Protein2GO and I don't (yet) see MEROPS as an option in the 'with' drop-down menu when using IKR (the only option is still 'InterPro').

I guess this may be a separate, downstream issue, perhaps specific to P2GO?
I tried reviewing the email thread "GO_REF attribution for internal NOT annotations (IKR)", but I'm still unclear what needs to be done for P2GO to allow a MEROPS ID in this use case. Do we need another chat with Alex?

pgaudet added a commit that referenced this issue Sep 29, 2022
@pgaudet
Contributor Author

pgaudet commented Sep 29, 2022

@alexsign and I edited the regular expression again, see
#1908

Let's hope this time it works!

pgaudet added a commit that referenced this issue Sep 29, 2022
@sjm41

sjm41 commented Sep 29, 2022

Thanks @pgaudet and @alexsign for continuing to work on this!

Good news is that I now see 'MEROPS' (and 'MEROPS_fam') in the drop-down menu on protein2GO:
[Screenshot of the Protein2GO drop-down menu, 2022-09-29]

Other good news is that IDs like "S09.986" (i.e. three numbers after the dot) or "S01.A18" (one letter + two numbers after the dot) are valid entries (entered with 'MEROPS' selected in the drop-down).

The bad news is that IDs like "C85.UNA" (i.e. three letters after the dot) aren't allowed. Can you check the regex again? Many thanks!!

@sjm41

sjm41 commented Sep 29, 2022

Can we also clarify the difference between 'MEROPS' and 'MEROPS_fam' in the menu?

@pgaudet
Contributor Author

pgaudet commented Oct 6, 2022

Should now work

@sjm41

sjm41 commented Oct 6, 2022

Yep, confirming that IDs like "C85.UNA" (i.e. three letters after the dot) now work. Thanks!

@sjm41

sjm41 commented Oct 6, 2022

Notes from quick chat with Pascale:

  • as we know, the "MEROPS" field is for the "MEROPS ID" (XXX.YYY), which identifies a specific peptidase
  • the 'MEROPS_fam' field seems to be for a family ID, i.e. just the XXX part (e.g. S01 or C85). However, a MEROPS family contains many different peptidases, both active and inactive, so we can't see how or why a curator would use a family ID in a GO annotation. Pascale will look into removing this option.
  • we need to add documentation explaining the currently valid entry types in the 'with/from' field, with example entries
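The distinction in these notes can be shown with a small sketch. A family ID is only the XXX part, while a peptidase-level MEROPS ID adds a .YYY suffix, so the family can always be recovered from the full ID but not the other way round. The patterns below rest on the same assumptions as the rest of this thread (family = one uppercase letter + two digits). The function names `classify` and `family_of` are hypothetical, and the return labels simply mirror the two Protein2GO drop-down options.

```python
import re

# Hypothetical patterns: a family ID is just "XXX" (e.g. "S01", "C85");
# a peptidase-level MEROPS ID is "XXX.YYY" with the suffix forms seen here.
FAMILY = re.compile(r"[A-Z]\d{2}")
PEPTIDASE = re.compile(r"([A-Z]\d{2})\.(?:\d{3}|[A-Z]{3}|[A-Z]\d{2})")


def classify(value: str) -> str:
    """Say whether value is a peptidase ID, a family ID, or neither."""
    if PEPTIDASE.fullmatch(value):
        return "MEROPS"      # one specific peptidase (or non-peptidase homolog)
    if FAMILY.fullmatch(value):
        return "MEROPS_fam"  # whole family: mixes active and inactive members
    return "invalid"


def family_of(merops_id: str) -> str:
    """Return the family prefix (XXX) of a peptidase-level MEROPS ID."""
    match = PEPTIDASE.fullmatch(merops_id)
    if match is None:
        raise ValueError(f"not a MEROPS peptidase ID: {merops_id!r}")
    return match.group(1)


print(classify("C85.UNA"), classify("C85"), family_of("S01.A18"))
```

Because a family ID cannot tell an active peptidase from a non-peptidase homolog, only the "MEROPS" case carries the information an IKR NOT annotation needs.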

@pgaudet
Contributor Author

pgaudet commented Oct 6, 2022

Remove protein family for IKR

https://github.com/geneontology/go-site/compare/master...pgaudet-patch-71?quick_pull=1

Hopefully this should do it.
