Testing DQ checks: spatial - geography #108

Open
Mesibov opened this issue Dec 15, 2015 · 0 comments

Tested on 457075 beetle records downloaded 2 December 2015. I had considerable trouble understanding what some of these checks actually do, and as with many of the other spatial data checks I doubt that users will find many of them helpful.

I had particular trouble with

Coordinates dont match supplied country ('Latitude and Longitude do not match the supplied country name')
454636 false
2439 true

All of the 'true' cases have passed the 'Supplied country not recognised' test, including 2028 records where the original lat/lon is '0 0' and no country has been parsed, and 65 records with blank lat/lons assigned to 'Australia'. Among the remaining 346 records,
106 are from Christmas Island: 18 of these have the country field blank, 88 have 'Christmas Island'
12 are from Heard/McDonald Islands: 2 have country blank, 10 have the island names

What's going on? It turns out that the supplied country in these cases was 'Australia', which is politically correct, but ALA apparently only accepts geographical units according to a reference list. Is that list available to data providers?

There are other oddities picked up by this flag (I haven't checked them all), including this mistake:

https://biocache.ala.org.au/occurrences/01c7ddd4-e61e-4512-aeba-7c361b4a1e25

The supplied geography (Location remarks) reads: "Italy, Frosinone, Vallerotonda, Monte Pagano (41° 31' 59" N, 19° 55' 59" E), Müller, Franklin(Collector), Field Collected - Terrestrial". The lat/lon is incorrect in the degrees longitude figure, which should be 13°. ALA has ignored the supplied text, found the incorrect lat/lon in Albania and assigned this record there.

(1) Supplied country not recognised ('Name of the country is not recognised')
456806 false
269 true

(2) Country inferred from coordinates ('The country name has been derrived [sic] from supplied latritude [sic] and longitude')
386681 false
70394 true

There is no overlap at all between the 2 'true' classes, leaving the user wondering how the country name was derived for those records where country name wasn't recognised, if not from lat/lon. In fact, among the 95 unique locations in the 'Supplied country not recognised' = true, 'Country inferred from coordinates' = false records,

70 had lat/lons supplied and country names were evidently derived from these
7 had lat/lons supplied but no country names were derived (but could have been)
18 had no lat/lons supplied and no country names were derived
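
The overlap check between the two flags can be repeated on any download with a simple cross-tabulation. This is only a minimal sketch: the column names `country_not_recognised` and `country_inferred` are hypothetical stand-ins for whatever the flag columns are called in the downloaded file, and the sample rows are made up for illustration.

```python
from collections import Counter

def is_true(value):
    """Treat the text 'true' (any case) as True; everything else as False."""
    return str(value).strip().lower() == "true"

def cross_tabulate(records, flag_a, flag_b):
    """Count records for each (flag_a, flag_b) combination of true/false."""
    return Counter((is_true(r[flag_a]), is_true(r[flag_b])) for r in records)

# Hypothetical rows; in practice these would come from csv.DictReader
# over the downloaded records file.
sample = [
    {"country_not_recognised": "true", "country_inferred": "false"},
    {"country_not_recognised": "false", "country_inferred": "true"},
    {"country_not_recognised": "false", "country_inferred": "false"},
]

table = cross_tabulate(sample, "country_not_recognised", "country_inferred")
overlap = table[(True, True)]  # records where both flags are 'true'
print(table)
print("overlap:", overlap)
```

A zero in the (True, True) cell is the "no overlap" result described above.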

Coordinates dont match supplied state ('Latitude and Longitude do not match the supplied state name')
455274 false
1801 true

The 284 unique values among the 1801 include State border cases (e.g. ACT/NSW) where the uncertainty in the lat/lon would cover the border. This is a common problem with old ANIC records where lat/lon was databased to the nearest minute (DM).
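
To give a rough idea of how much uncertainty minute-level precision implies, here is a small sketch. It assumes the coordinate was rounded (not truncated) to the nearest minute and uses the approximation that one minute of latitude is about 1.852 km; the function name and the example latitude are my own, not anything from ALA.

```python
import math

KM_PER_ARC_MINUTE = 1.852  # one minute of latitude is roughly one nautical mile

def rounding_error_km(latitude_deg, precision_minutes=1.0):
    """Worst-case offset (km) caused by rounding lat and lon to the given precision."""
    half = precision_minutes / 2.0
    north_south = half * KM_PER_ARC_MINUTE
    # A minute of longitude shrinks with the cosine of the latitude.
    east_west = half * KM_PER_ARC_MINUTE * math.cos(math.radians(latitude_deg))
    return math.hypot(north_south, east_west)

# Example latitude near the ACT/NSW border (illustrative value only).
print(round(rounding_error_km(-35.5), 2), "km")
```

This covers rounding alone, so it is a lower bound on the real uncertainty. Any record closer to a State border than this distance could plausibly fall on either side.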

The 'true' cases also include a fair few mistakes like the Italy > Albania one noted above, where the data provider gives a State name in text and an incorrect lat/lon, and ALA processes the record based only on lat/lon. Here are 2 records with same taxon, same collector, same day in NSW, but ALA has taken the incorrect lat/lon in the second record and put the occurrence in Tasmania:

https://biocache.ala.org.au/occurrences/2c057350-afb4-4612-ac13-82e127496d1d
https://biocache.ala.org.au/occurrences/156a38ec-8398-461f-b9b7-326619616d2e

Here a Queensland collection was shifted by ALA to WA:

https://biocache.ala.org.au/occurrences/04a6d044-743f-47f2-b55c-a40f9014770a
