This repository has been archived by the owner on Feb 1, 2023. It is now read-only.

Add command to flag data formatting issues #22

Open
wants to merge 2 commits into
base: main
from

Conversation

fwextensions
Member

@fwextensions fwextensions commented Apr 16, 2022

  • Add lint npm script.
  • Log detailed info for sources with crosswalk fields that don't appear in their .csv files.
  • Add Fuse.js package to search for possible matches of missing fields.
  • Check for JSON appearing in .csv files.
  • Add support for custom ogr2ogr command in .env file, to handle Windows usage.
  • Update README.md with Windows install info.
  • Fix "numberic" and other typos.
  • Remove the double format in identity-source.js.

Running `npm run download` and then `npm run lint` checks the crosswalk fields specified in each source against its downloaded .csv file. Fields that aren't found in the header row of the file are listed, along with any column names that are similar. Example output for Escondido:

    escondido
        C:\path\to\repo\wtt_area\data\raw\escondido.csv
        ref: TREEID
        scientific: BOTANICAL => Maybe: BOTANICAL_NAME
        common: COMMON => Maybe: COMMON_NAME
        height: Height => Maybe: HEIGHT_RANGE, ACTUALHEIGHT
        health: CONDITION => Maybe: TREE_CONDITION
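
For illustration, the check described above can be sketched roughly as follows. This is not the code in the PR: the PR uses Fuse.js for fuzzy matching, while this sketch swaps in a plain case-insensitive substring match so that it runs without dependencies. The function names, the shape of the crosswalk object, and the sample header row are all assumptions.

```javascript
// Stand-in for the PR's Fuse.js search: a case-insensitive substring match,
// used only so this sketch has no dependencies.
function suggestColumns(field, header) {
  const needle = field.toLowerCase();
  return header.filter((column) => column.toLowerCase().includes(needle));
}

// Returns one entry per crosswalk field that is absent from the header row.
// Assumes `crosswalk` maps output names to source column names; non-string
// values (for example functions) are skipped.
function findMissingFields(crosswalk, header) {
  const columns = new Set(header);
  return Object.entries(crosswalk)
    .filter(([, field]) => typeof field === "string" && !columns.has(field))
    .map(([name, field]) => ({ name, field, maybe: suggestColumns(field, header) }));
}

// Hypothetical header row and crosswalk, loosely modeled on the Escondido example.
const header = ["TREE_ID", "BOTANICAL_NAME", "COMMON_NAME", "HEIGHT_RANGE", "ACTUALHEIGHT"];
const crosswalk = { ref: "TREEID", scientific: "BOTANICAL", height: "Height" };

for (const { name, field, maybe } of findMissingFields(crosswalk, header)) {
  const hint = maybe.length ? ` => Maybe: ${maybe.join(", ")}` : "";
  console.log(`${name}: ${field}${hint}`);
}
```

A fuzzy matcher such as Fuse.js would also catch misspellings and reordered words, which a substring test cannot.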

#20 and #21 should be merged before this PR.

@fwextensions fwextensions force-pushed the jdunning/feature/2022/data-lint branch 2 times, most recently from 069ed5c to 960f208 Compare April 17, 2022 02:20
@fwextensions fwextensions marked this pull request as ready for review April 17, 2022 02:21
@zoobot
Member

zoobot commented Apr 17, 2022

Here are the code conventions for this project.
Please separate different issues into separate PRs if you want contributions reviewed and merged, thanks.
This looks like it can be broken up into 3-5 different PRs.

@zoobot
Member

zoobot commented Apr 27, 2022

@fwextensions Mind if I branch off this and break it up into multiple PRs?
I want to have linting separate from custom ogr2ogr and source formatting.

Use proper file URLs when dynamically importing JS files, instead of relying on absolute macOS file paths looking like parts of URLs.
Generate paths only in config.js.
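
As a rough illustration of the file URL change in the commit message above: a Windows path such as `C:\repo\sources\x.js` passed straight to `import()` is parsed as a URL with the scheme `c:` and rejected, whereas a `file:` URL works on every platform. The sketch below uses Node's `pathToFileURL`. The helper names and the example path are hypothetical, not taken from the PR.

```javascript
// CommonJS require keeps the sketch standalone; in an ES module the
// equivalent is `import { pathToFileURL } from "node:url"`.
const { pathToFileURL } = require("node:url");
const path = require("node:path");

// Converts a filesystem path into a file: URL string that import() accepts,
// including for Windows drive-letter paths.
function toImportSpecifier(filePath) {
  return pathToFileURL(path.resolve(filePath)).href;
}

// Hypothetical helper: dynamically import a module given its path on disk.
async function importFromPath(filePath) {
  return import(toImportSpecifier(filePath));
}

// Example path only; it does not need to exist to build the URL.
console.log(toImportSpecifier("sources/example.js"));
```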
@fwextensions fwextensions force-pushed the jdunning/feature/2022/data-lint branch 2 times, most recently from 70a89f7 to efd34aa Compare April 28, 2022 01:25
Remove the double `format` in identity-source.js.
Log detailed info for sources with missing fields.
Add Fuse.js package to search for possible matches of missing fields.
Fix "numberic" and other typos.
Check for JSON appearing in .csv files.
Add support for custom ogr2ogr command in .env file, to handle Windows usage.
Update README.md with Windows install info.
@fwextensions fwextensions force-pushed the jdunning/feature/2022/data-lint branch from efd34aa to 35d6417 Compare April 28, 2022 01:28
@fwextensions
Member Author

Making ogr work on Windows is just a few lines of code, but feel free.
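
To give a sense of scale, a minimal sketch of such an override might look like the following. It assumes a hypothetical `OGR2OGR_COMMAND` variable loaded from the .env file into `process.env`. The actual variable name and lookup used in the PR are not shown in this conversation.

```javascript
// OGR2OGR_COMMAND is a hypothetical name; the PR does not show the real one.
// Falls back to plain "ogr2ogr" (resolved via PATH) when no override is set.
function getOgr2ogrCommand(env = process.env) {
  const custom = (env.OGR2OGR_COMMAND || "").trim();
  return custom || "ogr2ogr";
}

console.log(getOgr2ogrCommand());
```

On Windows the override would typically point at the full path of an installed `ogr2ogr.exe`.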

2 participants