Managing DOIs with a postgres database and the DataCite API.

NeotomaDB/neotoma_doi

lifecycle NSF-1550707 NSF-1550855 NSF-1541002

Neotoma Data DOI Generation

Overview

This repository acts as the central management point for a set of repositories that are used to generate digital object identifiers (DOIs) for datasets in the Neotoma Paleoecology Database.

DOIs are generated at the level of a dataset, which in Neotoma consists of all measurements of a given data type for a single collection unit at a site (e.g., all vertebrate fossils from a bone pile in a cave, or all fossil pollen samples from a core in a lake). All DOIs are associated with a landing page.

Linked repositories include:

Contributors

This project is currently under development. All participants are expected to follow the code of conduct for this project.

NOTE: The DataCite XML validation files in the data/ folder (and include subfolder) were obtained from the DataCite GitHub Schema repository.

Background

DOIs, Dataset Versioning, and Frozen Records in Neotoma

For any single dataset, the DOI provides access to three related elements:

  • The live record (accessed from Neotoma via the various APIs)
  • The frozen record (saved one week from dataset submission)
  • The DOI metadata (posted to DataCite)

The live record exists as the set of relationships among elements in the database, linked to the datasets table. Thus, the live record can change over time, as taxonomies or linked chronologies change.

The frozen record is generated within a week of dataset submission. It represents the state of the record at the time of upload. This version supports journal requirements for data submissions and aligns with data-management best practices. The frozen record lives in the doi schema of the database and is stored as a (Postgres) jsonb data type, along with the datasetid, the date created and the date modified (if necessary).
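
The difference between the live and the frozen record can be illustrated with a minimal Python sketch. It is not the repository's implementation: the `FrozenRecord` class, the `freeze` and `refreeze` helpers, and the example record contents are hypothetical stand-ins for a row holding a JSON snapshot, the datasetid, and the created and modified dates.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FrozenRecord:
    """Hypothetical in-memory stand-in for one frozen-record row."""
    datasetid: int
    record: str                         # JSON text, as it might be stored in a jsonb column
    created: datetime
    modified: Optional[datetime] = None


def freeze(datasetid: int, live_record: dict, now: datetime) -> FrozenRecord:
    """Serialize the live record so that later edits to it do not alter the snapshot."""
    return FrozenRecord(datasetid, json.dumps(live_record, sort_keys=True), created=now)


def refreeze(frozen: FrozenRecord, live_record: dict, now: datetime) -> FrozenRecord:
    """Replace the snapshot, keep the creation date, and set the modified date."""
    snapshot = json.dumps(live_record, sort_keys=True)
    return FrozenRecord(frozen.datasetid, snapshot, frozen.created, modified=now)


# Illustrative data only; real Neotoma records are far richer.
live = {"datasetid": 1001, "taxa": ["Picea"]}
frozen = freeze(1001, live, datetime(2020, 1, 8, tzinfo=timezone.utc))

live["taxa"].append("Pinus")              # the live record changes over time
print(json.loads(frozen.record)["taxa"])  # ['Picea'] (the snapshot is unchanged)
```

Storing the snapshot as serialized JSON rather than as a reference to the live structure is what keeps the frozen version stable when taxonomies or chronologies change.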

The DOI metadata is stored with DataCite and is generated from a script in this repository. When a new DOI is minted, the DOI and its related datasetid are added to the datasetdoi table.
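
As a rough illustration of what a DOI metadata document might contain, the sketch below builds a small DataCite-style XML record with Python's standard library. It is an assumption-laden example, not the script in this repository: the `build_datacite_xml` function, the element selection (based on the mandatory properties of the DataCite kernel-4 schema as commonly described), and the placeholder DOI, title, and names are all illustrative. Real output should be validated against the schema files in the data/ folder.

```python
import xml.etree.ElementTree as ET

# Namespace of the DataCite kernel-4 metadata schema.
DATACITE_NS = "http://datacite.org/schema/kernel-4"


def build_datacite_xml(doi, title, creators, publisher, year):
    """Return a minimal DataCite-style XML string (illustrative, not schema-validated)."""
    ET.register_namespace("", DATACITE_NS)  # serialize with a default namespace

    def tag(name):
        return f"{{{DATACITE_NS}}}{name}"

    resource = ET.Element(tag("resource"))
    identifier = ET.SubElement(resource, tag("identifier"), {"identifierType": "DOI"})
    identifier.text = doi

    creators_el = ET.SubElement(resource, tag("creators"))
    for name in creators:
        creator = ET.SubElement(creators_el, tag("creator"))
        ET.SubElement(creator, tag("creatorName")).text = name

    titles = ET.SubElement(resource, tag("titles"))
    ET.SubElement(titles, tag("title")).text = title
    ET.SubElement(resource, tag("publisher")).text = publisher
    ET.SubElement(resource, tag("publicationYear")).text = str(year)
    resource_type = ET.SubElement(
        resource, tag("resourceType"), {"resourceTypeGeneral": "Dataset"}
    )
    resource_type.text = "Dataset"
    return ET.tostring(resource, encoding="unicode")


# Placeholder values; the DOI below is not a real identifier.
xml_text = build_datacite_xml(
    doi="10.5072/example-1001",
    title="Example pollen dataset",
    creators=["Doe, Jane"],
    publisher="Neotoma Paleoecology Database",
    year=2020,
)
print(xml_text)
```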

Workflow For DOI Assignment

  1. A Neotoma data steward uploads a dataset to Neotoma (Tilia -> Tilia API -> NeotomaDB)

  2. A cron job running in data-dev checks for all records generated at least one week ago that have no "frozen" version (the query is in the neotoma_doi repository).

  • The script generates a frozen version of the dataset in the table doi.frozen in the database.
  • The function returns a list of aggregated datasetids along with the contact information for the dataset PI.
  • [not currently implemented] An email will be sent to each dataset PI with a listed email address. The email will confirm that a DOI or a set of DOIs has been reserved and that the PI has one week to review the relevant data. It will also indicate that certain metadata (ORCIDs, email, site notes or descriptions) would improve the usefulness of the data, and it will provide a link to the Explorer and Landing Pages for the data record and a link to (?something?) to facilitate adding the required metadata.
  3. The PI of record can contact the steward to update the metadata (or a token can be generated to allow the PI to update things?)

  4. The same cron job as in step 2 will identify records where the ndb.dataset entry is older than 14 days, the dataset has an entry in doi.frozen, and there is no entry in ndb.datasetdoi. This assumes that PIs and stewards have had an opportunity to revise their datasets.

  • For each entry UPDATE the frozen dataset using doi.doifreeze().
  • For each entry run the function assign_doi() to build the DataCite XML file and post the DOI metadata to DataCite.
  • Send an email to each dataset PI indicating the DOIs have been successfully minted.
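
The two time-based checks in the workflow above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the query used in the repository: the `DatasetState` class and the `needs_freeze` and `ready_for_doi` functions are hypothetical, and the boolean fields stand in for the presence of rows in doi.frozen and ndb.datasetdoi.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DatasetState:
    """Hypothetical summary of one dataset as the scheduled job might see it."""
    datasetid: int
    created: datetime   # when the dataset entry was created
    frozen: bool        # True if a frozen version exists
    has_doi: bool       # True if a DOI has already been recorded


def needs_freeze(ds: DatasetState, now: datetime) -> bool:
    """First check: at least one week old and not yet frozen."""
    return not ds.frozen and now - ds.created >= timedelta(days=7)


def ready_for_doi(ds: DatasetState, now: datetime) -> bool:
    """Second check: older than 14 days, frozen, and no DOI recorded."""
    return ds.frozen and not ds.has_doi and now - ds.created > timedelta(days=14)


now = datetime(2020, 2, 1)
datasets = [
    DatasetState(1, datetime(2020, 1, 30), frozen=False, has_doi=False),  # too recent
    DatasetState(2, datetime(2020, 1, 20), frozen=False, has_doi=False),  # due for freezing
    DatasetState(3, datetime(2020, 1, 10), frozen=True, has_doi=False),   # due for a DOI
    DatasetState(4, datetime(2019, 12, 1), frozen=True, has_doi=True),    # already done
]

print([d.datasetid for d in datasets if needs_freeze(d, now)])   # [2]
print([d.datasetid for d in datasets if ready_for_doi(d, now)])  # [3]
```

Keeping the two checks as separate predicates mirrors the two stages of the workflow: a dataset is frozen first, and only a frozen dataset without a DOI becomes eligible for minting a week later.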

Funding

This work has been supported by grants from the National Science Foundation: NSF-1541002, NSF-1550855 and NSF-1550707.
