Assertions not working in Biocache Store #393

Open
javier-molina opened this issue Jun 19, 2020 · 1 comment

@javier-molina
Contributor

This is a placeholder to document data assertions that either don't work or do not work correctly according to their intended purpose.

The full list of assertions, and decisions on what to do about them, is kept in the Confluence page of the same name.

@Mesibov
Mesibov commented Nov 24, 2020

I did a series of assertion tests in December 2015 on a large sample dataset. Results and problems were reported in biocache-store #100, #101, #102, #103, #104, #105, #106, #107, #108. I would do a check like this very differently today and on a much larger dataset, but note that the problems in those 9 issues are all still open 5 years later (and, like most of my ALA GitHub issue postings, have never been labelled, assigned or addressed).

Two different approaches to "assertions not working" are illustrated: one here in #393 (and on the Confluence page), the other in my 2015 effort. The first is to pick up "not workings" opportunistically. The second is to systematically analyse whether or not assertions work as intended, which sounds to me like quality control in ALA's data processing, and which does not seem to have been an ALA priority.

A related question - for which a GitHub issues page is not the appropriate place - is which (if any) of ALA's assertions have any value. Has ALA ever systematically examined whether and how data providers respond to assertions in their records? How confident can end-users be in the 2020 "data quality" filtering initiative, that the exclusions are validly "bad" and the inclusions validly "good"?
