Stats for GO-CAM models #2339

Open
ValWood opened this issue Jul 1, 2024 · 5 comments

@ValWood
Contributor

ValWood commented Jul 1, 2024

Can we have a metric on the website showing the number of genes in GO-CAM models, by species?

That is, a non-redundant list of genes that are causally connected (obviously, some genes will appear in multiple models). It would be useful to have a way to quickly assess proteome coverage.

cc @pgaudet @vanaukenk

@pgaudet
Contributor

pgaudet commented Jul 15, 2024

@kltm suggests using the GPAD to derive these statistics.

Is this what you had in mind?

  • Number of models by group
  • Number of models by curator
  • Number of MFs per model (activity units)
  • Number of relations per model

Something like this? What else? We need to define which stats we need before we can get started.

@sylvainpoux you probably have some suggestions as well.
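As a rough illustration of the four tallies listed above, here is a minimal sketch. It does not parse GPAD (suggested by @kltm); it assumes the models have already been loaded into a hypothetical `GoCamModel` structure, and every name and value in it is illustrative only.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class GoCamModel:
    """Hypothetical in-memory summary of one GO-CAM model (not a real API)."""
    model_id: str
    group: str                 # contributing group
    curators: list[str]        # contributors recorded on the model
    activities: list[str] = field(default_factory=list)  # one MF per activity unit
    # Causal relations between activity units: (from_index, relation, to_index)
    relations: list[tuple[int, str, int]] = field(default_factory=list)


def model_stats(models: list[GoCamModel]) -> dict:
    """Tally the four statistics proposed in this thread."""
    return {
        "models_by_group": Counter(m.group for m in models),
        # set() so a curator listed twice on one model counts once for that model
        "models_by_curator": Counter(c for m in models for c in set(m.curators)),
        "mfs_per_model": {m.model_id: len(m.activities) for m in models},
        "relations_per_model": {m.model_id: len(m.relations) for m in models},
    }


# Toy data, for illustration only
models = [
    GoCamModel("gomodel:0001", "groupA", ["curatorA"],
               ["protein kinase activity", "DNA-binding transcription factor activity"],
               [(0, "directly positively regulates", 1)]),
    GoCamModel("gomodel:0002", "groupA", ["curatorA", "curatorB"],
               ["GTPase activity"]),
    GoCamModel("gomodel:0003", "groupB", ["curatorC"],
               ["protein kinase activity", "phosphatase activity", "ubiquitin ligase activity"],
               [(0, "directly positively regulates", 1),
                (1, "directly negatively regulates", 2)]),
]
stats = model_stats(models)
```

Grouping by curator uses `Counter` over a per-model `set` so that each model contributes at most once to a given curator's count.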

@ValWood
Contributor Author

ValWood commented Jul 15, 2024

For me, to track pathway curation I'm primarily interested in coverage, so the number of genes covered by models. By "in a model" I mean genes that are causally connected to another gene (not just standard annotations, or a gene connected to an activity and a process).

For example, Reactome covers 11279 human proteins (https://reactome.org/about/statistics). That's really useful to know.
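A minimal sketch of the coverage metric described above: count each gene once per species, and only if it is causally connected to a different gene. It assumes a hypothetical flattened input of one tuple per causal relation; proteome sizes must be supplied by the caller (the value below is a toy number, not real data).

```python
from collections import defaultdict

# Hypothetical flattened input: one tuple per causal relation between two
# activities in a GO-CAM model: (model_id, taxon, subject_gene, relation, object_gene)
CausalEdge = tuple[str, str, str, str, str]


def genes_in_causal_models(edges: list[CausalEdge]) -> dict[str, set[str]]:
    """Non-redundant set of causally connected genes per taxon.

    A gene counts once per taxon however many models or relations it is in.
    Relations between two activities of the same gene are skipped, since they
    do not connect it to another gene.
    """
    genes: dict[str, set[str]] = defaultdict(set)
    for _model_id, taxon, subject, _relation, obj in edges:
        if subject != obj:
            genes[taxon].update((subject, obj))
    return dict(genes)


def proteome_coverage(genes_by_taxon: dict[str, set[str]],
                      proteome_sizes: dict[str, int]) -> dict[str, float]:
    """Fraction of each proteome covered; proteome_sizes comes from the caller."""
    return {
        taxon: len(genes) / proteome_sizes[taxon]
        for taxon, genes in genes_by_taxon.items()
        if taxon in proteome_sizes
    }


# Toy data, for illustration only
edges = [
    ("gomodel:0001", "NCBITaxon:4896", "geneA", "directly positively regulates", "geneB"),
    ("gomodel:0002", "NCBITaxon:4896", "geneB", "directly negatively regulates", "geneC"),
    ("gomodel:0003", "NCBITaxon:4896", "geneD", "directly positively regulates", "geneD"),
    ("gomodel:0004", "NCBITaxon:9606", "geneX", "directly positively regulates", "geneY"),
]
genes = genes_in_causal_models(edges)
counts = {taxon: len(g) for taxon, g in genes.items()}
# counts == {"NCBITaxon:4896": 3, "NCBITaxon:9606": 2}
coverage = proteome_coverage(genes, {"NCBITaxon:4896": 10})  # toy proteome size
```

Using a set per taxon is what makes the list non-redundant: `geneB` appears in two models but is counted once.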

@deustp01

The two suggested statistics tally different things. The number of gene products with annotations of any sort indicates, roughly, how much of the organism's genome is covered. The set of tallies earlier in the thread measures aspects of curator activity.

@sylvainpoux

Hi @pgaudet,

I think these different propositions make sense.

Statistics are essential to measure activity, but they should not be misused: the significant over-annotation we have observed over the last 20 years is mainly due to a tendency to chase numbers at the expense of quality.

In my opinion, the real added value in GO-CAM is connecting genes together (or connecting genes with small molecules). From that point of view, I would suggest considering only high-quality models: those with connections, complete annotation units/annotons (at least one MF and one BP), and evidence. Other annotations could be counted as classic GO annotations.
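One possible reading of these quality criteria, expressed as a filter, is sketched here. The `ActivityUnit`/`Model` structures are hypothetical, and interpreting "at least one MF and one BP" as applying to every activity unit is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ActivityUnit:
    """Hypothetical annotation unit (annoton) within a model."""
    gene: str
    mf: Optional[str]   # molecular function term, if any
    bp: Optional[str]   # biological process term, if any
    evidence: list[str] = field(default_factory=list)  # supporting evidence codes


@dataclass
class Model:
    model_id: str
    units: list[ActivityUnit]
    causal_relations: int  # number of causal links between units


def is_high_quality(model: Model) -> bool:
    """Connected, every unit has both MF and BP, and every unit has evidence."""
    return (
        model.causal_relations > 0
        and len(model.units) > 0
        and all(u.mf and u.bp and u.evidence for u in model.units)
    )


# Toy data, for illustration only
good = Model("gomodel:0001", [
    ActivityUnit("geneA", "protein kinase activity", "signal transduction", ["IDA"]),
    ActivityUnit("geneB", "DNA-binding transcription factor activity", "signal transduction", ["IMP"]),
], causal_relations=1)
missing_bp = Model("gomodel:0002", [ActivityUnit("geneC", "protein kinase activity", None, ["IDA"])],
                   causal_relations=1)
unconnected = Model("gomodel:0003", [ActivityUnit("geneD", "GTPase activity", "signal transduction", ["IDA"])],
                    causal_relations=0)
high_quality = [m.model_id for m in (good, missing_bp, unconnected) if is_high_quality(m)]
```

Models failing the filter would then fall back to being counted as classic GO annotations rather than in the GO-CAM statistics.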

@vanaukenk
Contributor

Pascale and I suggest that we first gather more specific requirements for Noctua statistics from curators and then we can come back to the software team.

We'll plan for this discussion on an annotation call.
