feature/tree-sources-staging #24

Merged: 6 commits, Mar 29, 2023


tzinckgraf (Contributor)

Added a new merge step to the ETL process. The new merge step takes all the data from the GeoJSON file and moves it into a staging table called tree_sources_staging. This keeps the raw data from the sources in a database table.

This PR uses ogr2ogr to move data into the database. By default, data is moved 20k rows at a time. Postgres folds unquoted identifiers to lowercase and the JavaScript code uses camelCase, so we cannot easily use underscores in names without some bigger changes. However, that did not seem to be an issue.
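
For readers unfamiliar with ogr2ogr, here is a minimal sketch of what such a load could look like from Node. It only builds the argument list and does not run anything. The function name `buildOgr2ogrArgs`, the file path, and the connection values are hypothetical and not taken from this PR. `-f`, `-nln`, `-append`, and `-gt` are standard ogr2ogr options, and the 20,000 batch size is passed explicitly here rather than relying on any default.

```javascript
// Sketch only: builds (does not run) an ogr2ogr argument list.
// buildOgr2ogrArgs and all values passed to it are hypothetical.
function buildOgr2ogrArgs({ geojsonPath, pgConnection, table, batchSize = 20000 }) {
  return [
    "-f", "PostgreSQL",        // output driver
    `PG:${pgConnection}`,      // destination datasource
    geojsonPath,               // source datasource
    "-nln", table,             // name of the destination table (layer)
    "-append",                 // append to the table rather than creating a new one
    "-gt", String(batchSize),  // number of features grouped per transaction
  ];
}

const exampleArgs = buildOgr2ogrArgs({
  geojsonPath: "data/merged.geojson",          // assumed path
  pgConnection: "host=localhost dbname=trees", // assumed connection values
  table: "tree_sources_staging",
});
console.log(["ogr2ogr", ...exampleArgs].join(" "));
```

Keeping the arguments in an array, rather than one long string, avoids shell quoting problems when the command is eventually launched.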

feature/tree-sources-staging
Removed unused new library

src/stages/merge.js (2 review threads, outdated and resolved)

zoobot (Member) left a comment

This is great! So happy to see this in the source pipeline!!
Left a few comments but LGTM!

- added a new PG_USE_COPY argument, which improves performance by about 2-3x
- cleaned up the command to use shorter strings
- moved the PG config to a separate variable
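
The changes in this commit might look roughly like the sketch below. `PG_USE_COPY` is a real GDAL configuration option that makes the PostgreSQL driver load rows with COPY instead of INSERT, and `--config` is the standard way to set it on the command line. Everything else here (`PG_CONFIG`, `toPgDatasource`, `buildCopyArgs`, and the connection values) is hypothetical and not copied from the PR.

```javascript
// Hypothetical connection settings, kept apart from the command itself.
const PG_CONFIG = { host: "localhost", port: 5432, dbname: "trees", user: "postgres" };

// Turn the settings object into a GDAL PostgreSQL datasource string.
function toPgDatasource(config) {
  const pairs = Object.entries(config).map(([key, value]) => `${key}=${value}`);
  return `PG:${pairs.join(" ")}`;
}

// GDAL configuration option that enables COPY-based loading.
const COPY_OPTION = ["--config", "PG_USE_COPY", "YES"];

// Assemble the full argument list from short, named pieces.
function buildCopyArgs(geojsonPath, table) {
  return [
    ...COPY_OPTION,
    "-f", "PostgreSQL",
    toPgDatasource(PG_CONFIG),
    geojsonPath,
    "-nln", table,
  ];
}

console.log(buildCopyArgs("data/merged.geojson", "tree_sources_staging"));
```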

zoobot (Member) commented Mar 14, 2023

Feel free to merge this when you are ready, @tzinckgraf. I think you have to resolve all conversations first, then use squash and merge. Thanks for this awesome contribution!!

zoobot (Member) commented Mar 20, 2023

@tzinckgraf Is it ready to go in or still in draft?

tzinckgraf (Contributor, Author) commented Mar 20, 2023

@zoobot I have to go through and resolve the conversations, then this is good to go. I can have that done by Wednesday night's meeting. Working on getting the demo / PR ready for this Wednesday first.
I think the only thing left is converting that exec to a spawn, then this is good.
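
For context on the exec-to-spawn change: `child_process.exec` runs the command through a shell and buffers its output in memory, while `child_process.spawn` takes the arguments as an array and streams output, which tends to suit a long-running load. A minimal sketch of a promise wrapper around `spawn` follows. The helper name `runCommand` is hypothetical, and the Node binary stands in for ogr2ogr so the example runs anywhere.

```javascript
const { spawn } = require("child_process");

// Run a command without a shell; resolve on exit code 0, reject otherwise.
function runCommand(command, args) {
  return new Promise((resolve, reject) => {
    // Discard stdin and stream the child's stdout/stderr straight to ours.
    const child = spawn(command, args, { stdio: ["ignore", "inherit", "inherit"] });
    child.on("error", reject); // e.g. the binary is not installed
    child.on("close", (code) => {
      if (code === 0) resolve();
      else reject(new Error(`${command} exited with code ${code}`));
    });
  });
}

// Usage: a harmless command standing in for ogr2ogr.
runCommand(process.execPath, ["-e", "console.log('load finished')"])
  .then(() => console.log("command succeeded"));
```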

src/stages/merge.js (review thread, outdated and resolved)
@tzinckgraf tzinckgraf marked this pull request as draft March 26, 2023 22:07
- update the ogr2ogr command to transform names in the SQL from camelCase to snake_case
- rename the database table to treedata_staging
- use spawn instead of exec
- truncate the staging table as the first step
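
The camelCase to snake_case step could be sketched as below: a SELECT statement that aliases each field, which ogr2ogr can accept through its `-sql` option. This is an illustration under assumptions, not the PR's code. The helper names, the layer name `merged`, and the field names are all hypothetical, and it assumes the SQL dialect in use accepts double-quoted identifiers.

```javascript
// Convert a camelCase identifier to snake_case, e.g. "heightMax" -> "height_max".
function toSnakeCase(name) {
  return name.replace(/([a-z0-9])([A-Z])/g, "$1_$2").toLowerCase();
}

// Build a SELECT that aliases each camelCase field to its snake_case name.
// Double quotes keep the original case of the source field names.
function buildSelectSql(layer, fields) {
  const columns = fields.map((field) => `"${field}" AS ${toSnakeCase(field)}`);
  return `SELECT ${columns.join(", ")} FROM "${layer}"`;
}

// Statement to empty the staging table before a new load.
const TRUNCATE_SQL = "TRUNCATE TABLE treedata_staging";

// Hypothetical layer and field names.
const selectSql = buildSelectSql("merged", ["idSourceName", "heightMax", "common"]);
console.log(TRUNCATE_SQL);
console.log(selectSql);
```

Truncating first means each run reloads the staging table from scratch, so rows from a previous run are not duplicated. How the PR actually issues the TRUNCATE statement is not shown in this conversation.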

Added a new line for a little clean up
@tzinckgraf tzinckgraf marked this pull request as ready for review March 27, 2023 01:18
src/stages/merge.js (review thread, outdated and resolved)

zoobot (Member) left a comment

LGTM! nice work @tzinckgraf

src/stages/merge.js (review thread, resolved)
@tzinckgraf tzinckgraf merged commit 7830a73 into main Mar 29, 2023
@tzinckgraf tzinckgraf deleted the feature/tree-sources-staging branch March 29, 2023 03:47