Releases: CDRH/datura
ingest documentation, text spacing, and date_standardize
This release takes v0.2.0 out of beta, and makes some minor changes:
Added
- minor test for Datura::Helpers.date_standardize
- documentation for web scraping
- documentation for CsvToEs (transforming CSV files and posting to elasticsearch)
- instructions for installing JavaScript Runtime files for Saxon
Changed
- date_standardize now relies on strftime instead of manual zero padding for month, day
- minor corrections to documentation
- XPath: "text" is now ingested as an array and will be displayed delimited by spaces
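To illustrate the date_standardize change, here is a minimal sketch that contrasts manual zero padding with strftime. It uses only Ruby's standard Date class; the method names manual_pad and with_strftime are hypothetical and are not part of Datura.

```ruby
require "date"

# Manual zero padding: each component is padded by hand.
# (Hypothetical helper for illustration only.)
def manual_pad(year, month, day)
  "#{year}-#{month.to_s.rjust(2, '0')}-#{day.to_s.rjust(2, '0')}"
end

# strftime: the %m and %d directives zero-pad to two digits on their own.
# (Hypothetical helper for illustration only.)
def with_strftime(year, month, day)
  Date.new(year, month, day).strftime("%Y-%m-%d")
end

puts manual_pad(1887, 3, 4)
puts with_strftime(1887, 3, 4)
```

In this sketch, building a Date object also means that impossible dates such as February 30 raise an error instead of being formatted as-is.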
Migration
- check that the "text" xpath still produces the desired behavior
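When checking the "text" behavior, it may help to picture the matches as an array joined by spaces. The sketch below uses plain Ruby strings as stand-ins for XPath matches; the helper name display_text and the sample values are hypothetical.

```ruby
# Hypothetical helper: join an array of text matches into one
# space-delimited string, as it would be displayed.
def display_text(parts)
  parts.join(" ")
end

# Made-up values standing in for the strings matched by a "text" xpath.
matches = ["Letter to Annie", "Lincoln, Nebraska", "1887"]
puts display_text(matches)
```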
Changes to field and xpath behavior
This is considered a beta release, and some issues are expected to come up.
Added
- Fields (and therefore methods) for ES JSON, such as extent, alternative, spatial, etc
- Methods to xToES format fields to accommodate default behavior
- ES JSON uri now populated using default Orchid item path
- Tests and fixtures for all supported formats except CustomToEs
- get_elements returns nodeset given xpath arguments
- spatial nested fields spatial.type and spatial.title
Changed
- Arguments for get_text, get_list, and get_xpaths
- XPaths for VRA and TEI to Elasticsearch
- Default behavior for CsvToEs for some fields
- Documentation updated
- Changed Install instructions to include RVM and gemset naming conventions
- API field coverage_spatial is now just spatial
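As a rough illustration of the coverage_spatial rename, the sketch below moves the value from the old key to the new one in a plain Ruby hash. The helper name and the sample document are hypothetical, and the nested values are placeholders rather than the real field contents.

```ruby
# Hypothetical helper: return a copy of the document hash with
# "coverage_spatial" renamed to "spatial". The nested value is carried
# over unchanged.
def rename_coverage_spatial(doc)
  migrated = doc.dup
  if migrated.key?("coverage_spatial")
    migrated["spatial"] = migrated.delete("coverage_spatial")
  end
  migrated
end

# Made-up document; only the key names come from the release notes.
old_doc = { "identifier" => "doc.001", "coverage_spatial" => [{ "title" => "Lincoln" }] }
new_doc = rename_coverage_spatial(old_doc)
puts new_doc.key?("spatial")
puts new_doc.key?("coverage_spatial")
```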
Migration
- Change coverage_spatial nested field to spatial
- get_text, get_list, and get_xpaths require changing arguments to keyword (like xml and keep_tags)
- Recommend checking xpaths and behavior of fields after updating to this version, as some defaults have changed
- Possible to refactor previous FileCsv overrides to use new CsvToEs abilities, but not necessary
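The move to keyword arguments can be pictured with a stand-in method. This is a sketch only: get_text_sketch is hypothetical and is not Datura's get_text, and its defaults are assumptions. Only the keyword names xml and keep_tags come from the notes above.

```ruby
# Hypothetical stand-in, NOT Datura's get_text. It only echoes its
# arguments to show the keyword-argument calling convention.
def get_text_sketch(xpaths, xml: nil, keep_tags: false)
  { xpaths: xpaths, xml: xml, keep_tags: keep_tags }
end

# Keyword style: each optional argument is named at the call site.
p get_text_sketch("//title", xml: "<doc/>", keep_tags: true)

# Passing the options positionally is rejected by Ruby, which is the kind
# of error an un-migrated override might hit.
begin
  get_text_sketch("//title", true)
rescue ArgumentError => e
  puts "positional call rejected: #{e.message}"
end
```

Overrides that call these methods with positional options would need each option named in this way.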
Improvements to CSV, WEBS transformers and adds Custom transformer
Added
- CsvToEs class added which imitates style of other XToEs classes for easier overriding / maintenance
- Custom formats now supported, although no functionality provided since the type of format cannot be predicted
- Adds documentation for custom format setup
Changed
- CSV to ES transformation no longer accepts default column names, but instead looks for columns matching ES fields to use
- FileType elasticsearch transform now has swappable component when reading XML-type files. Webscraping script altered to manipulate HTML instead of XML object type
Removed
- CSV to ES transformation used to automatically assume columns as ES fields; this functionality has been removed
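One reading of the CSV change is that only columns whose headers match known Elasticsearch field names are used. The sketch below shows that idea with Ruby's csv library. The field list, the helper row_to_doc, and the sample data are all hypothetical and do not reflect the CsvToEs implementation.

```ruby
require "csv"

# Hypothetical field list; the real set of Elasticsearch fields is not
# defined by this sketch.
KNOWN_FIELDS = %w[identifier title date].freeze

# Made-up CSV with one column that matches no known field.
SAMPLE_CSV = <<~CSV
  identifier,title,internal_notes
  doc.001,Letter to Annie,not for publication
CSV

# Keep only the columns whose header matches a known field name.
def row_to_doc(row)
  row.to_h.select { |header, _value| KNOWN_FIELDS.include?(header) }
end

CSV.parse(SAMPLE_CSV, headers: true).each do |row|
  p row_to_doc(row)
end
```

Columns such as internal_notes are ignored here rather than being treated as fields.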
VRA to Solr Alterations
Minimal fixes and alterations to fields in VRA to Solr XSLT transformation.
PB Update
Changed
- Removed match on pb/@xml:id for tei-to-html
IIIF Manifests
Added
- IIIF output format and documentation
- Changelog
Changed
- nokogiri gem restricted to moving minor version instead of patch
Removed
- pkg builds of gem
- outdated comment line
Web scraping support, post by update time fix
Added webs format for minimal support of web scraping by specific apps
- currently collections using this feature will need to write all of their own code for process
- no defaults or recommendations about config settings implemented at this time
Fixed --update flag, which was broken
- added "today" shortcut for those who don't wish to type in the entire date
Misc other typo fixes, etc
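The "today" shortcut for the --update flag could work along these lines. This is a minimal sketch assuming the flag value is either the word "today" or a date string; resolve_update_date is a hypothetical helper and not Datura's code.

```ruby
require "date"

# Hypothetical helper: turn an --update style value into a Date.
# "today" resolves to the current date; anything else is parsed as a date.
# The today: keyword exists only so the current date can be injected in tests.
def resolve_update_date(value, today: Date.today)
  return today if value.strip.downcase == "today"

  Date.parse(value)
end

puts resolve_update_date("today")
puts resolve_update_date("2019-01-02")
```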
Pre and post file transformation hooks
Adds ability to manipulate files before and after transformation
Accommodates ruby 2.6.x
Datura Gem Launch
Implements previous "data" repository functionality as a ruby gem, "datura"
Original Data Repository
This release contains the code of the data repository as it was when it was a collection of scripts. After this point, it is a gem named datura.