DwC fields not being indexed #391

Closed
3 tasks done
nickdos opened this issue Jun 16, 2020 · 7 comments

Comments

@nickdos
Contributor

nickdos commented Jun 16, 2020

See support ticket https://support.ehelp.edu.au/a/tickets/81984.

User flagged that some DwC fields do not appear in a download file but the fields can be seen on an individual record page.

EDIT: Outstanding tasks moved to #394

See https://biocache-ws.ala.org.au/ws/occurrences/search?q=data_resource_uid%3Adr342&facets=georeferenced_by,georeference_protocol,georeferenced_date,georeference_sources&pageSize=0

Only georeferenced_date shows values and this is also the only column populated for CSV downloads. All the georef* fields are marked as being indexed and stored - https://biocache.ala.org.au/fields?filter=georef*.

Investigate why these fields are not being added to the SOLR index.
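The facet check above can be scripted so it is easy to repeat for other data resources. This is a minimal sketch, not the project's own tooling: the helper names are hypothetical, and the response shape (a `facetResults` list whose entries carry `fieldName` and `fieldResult`) is an assumption about the biocache-ws JSON that should be confirmed against a real response.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL quoted in this issue.
BASE_URL = "https://biocache-ws.ala.org.au/ws/occurrences/search"


def facet_query_url(data_resource_uid, fields):
    """Build a facet-only search URL (pageSize=0) for one data resource."""
    params = {
        "q": f"data_resource_uid:{data_resource_uid}",
        "facets": ",".join(fields),
        "pageSize": 0,
    }
    # Keep commas unescaped so the URL matches the form used in the issue.
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


def fields_without_values(response, fields):
    """Return the requested facet fields that came back with no values.

    Assumes (unverified) that `response` is the parsed JSON body and has the
    shape {"facetResults": [{"fieldName": ..., "fieldResult": [...]}, ...]}.
    """
    populated = {
        facet["fieldName"]
        for facet in response.get("facetResults", [])
        if facet.get("fieldResult")
    }
    return [field for field in fields if field not in populated]
```

Fetching the URL and passing the decoded JSON to `fields_without_values` would, under those assumptions, list the georef* fields that are missing from the index.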

@charvolant
Contributor

The raw fields get indexed. https://biocache-ws.ala.org.au/ws/occurrences/search?q=data_resource_uid%3Adr342&facets=raw_georeferenced_by,raw_georeference_protocol,raw_georeferenced_date,raw_georeference_sources&pageSize=0

Looking at the cassandra table, georeferencedBy_p is not being updated from georeferencedBy. However, georeferencedDate_p is.
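To find other fields with the same symptom, a row could be scanned for raw columns that have a value while the matching `_p` column is empty. The sketch below is illustrative only: it takes a row as a plain dict, the function name is hypothetical, and the sample values are invented rather than taken from dr342.

```python
def unprocessed_fields(row, fields, suffix="_p"):
    """Return fields whose raw value is set but whose processed value is not.

    `row` is a mapping of column name to value, e.g. one Cassandra row
    converted to a dict. The `_p` suffix follows the column names quoted
    in this thread.
    """
    missing = []
    for field in fields:
        raw = row.get(field)
        processed = row.get(field + suffix)
        if raw and not processed:
            missing.append(field)
    return missing


# Invented sample row, for illustration only.
row = {
    "georeferencedBy": "A. Collector",
    "georeferencedDate": "2015-03-01",
    "georeferencedDate_p": "2015-03-01",
}
print(unprocessed_fields(row, ["georeferencedBy", "georeferencedDate"]))
```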

@nickdos
Contributor Author

nickdos commented Jun 19, 2020

@charvolant the user came back and said samplingProtocol is also not showing up. Should I create a new issue or leave it here?

@Mesibov

Mesibov commented Jun 19, 2020

@nickdos wrote: "User flagged that some DwC fields do not appear in a download file but the fields can be seen on an individual record page."

From 2018 paper (https://doi.org/10.3897/zookeys.751.24791)

"identifiedBy: ...The original identifiedBy_raw data item appears on the ALA webpage as “Identified by” for the record but is missing from the standard (recommended) download."
"locality: ...The original locality_raw data item appears on the ALA webpage as “Locality” for the record but is missing from the standard (recommended) download."

These two were subsequently fixed, but was no automated check put in place to ensure that downloaded fields matched the databased fields, or at least that fields populated in the database were not empty in the download? Was it left to users to spot instead?
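A check of the "empty vs non-empty" kind could be quite small. The following is a rough sketch under stated assumptions, not an existing ALA test: it assumes the download is available as CSV text, that the list of fields known to be populated in the database is supplied by the caller, and that download column headers use the same names as those fields. The function name and sample data are hypothetical.

```python
import csv
import io


def columns_lost_in_download(csv_text, populated_fields):
    """Return fields populated in the database but absent or empty in a download.

    `populated_fields` lists fields known to hold at least one value in the
    source records. A field is reported when its column is missing from the
    CSV header or when every row has a blank value for it.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    header = reader.fieldnames or []
    lost = []
    for field in populated_fields:
        if field not in header:
            lost.append(field)
        elif not any((row.get(field) or "").strip() for row in rows):
            lost.append(field)
    return lost


# Invented download content, for illustration only.
download = (
    "locality,identifiedBy,samplingProtocol\n"
    "Site A,,\n"
    "Site B,,pitfall\n"
)
expected_populated = ["locality", "identifiedBy", "samplingProtocol", "georeferencedBy"]
print(columns_lost_in_download(download, expected_populated))
```

Comparing only empty versus non-empty keeps the check independent of any value normalisation done during processing, which is the weaker of the two guarantees asked about above.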

@timhicks-ala

Additional fields to add if applicable:

  • num_identification_agreements, eg "2"
  • identification_verification_status, eg "research"

These are related to iNaturalist and the community identification of a sighting. Neither of these is currently exported in any download, making it impossible to determine the community's confidence in a record's ID from any downloaded set of iNat data.

Issue raised in helpdesk ticket 84773 as I couldn't advise the user to specifically use those fields in a download to gauge accuracy of records.

@ansell
Contributor

ansell commented Jul 8, 2020

AtlasOfLivingAustralia/biocache-service#317 is still an issue even though it was closed at one point due to confusion about the nature of the bug.

The sampling protocol processed field is not consistently populated with the raw values, so downloads look odd and are missing values in the "samplingProtocol" column because of the bug.

@nickdos
Contributor Author

nickdos commented Jul 28, 2020

Not yet appearing in prod SOLR. Keeping in QA

  • test on sandbox
  • test on nectar

@nickdos
Contributor Author

nickdos commented Nov 5, 2020

Facets now have values.

nickdos closed this as completed Nov 5, 2020

5 participants