Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merge trees to database #48

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from
Draft

Merge trees to database #48

wants to merge 4 commits into from

Conversation

tzinckgraf
Copy link
Contributor

@tzinckgraf tzinckgraf commented Feb 1, 2023

This is the first iteration for keeping the tree data in the database.

This PR introduces two new tables, trees and trees_staging.
The trees table would ideally replace the treedata table. The treedata table is currently a mix of time based data and reference data. There was a discussion around normalizing it further. However, we can also use the original treedata table instead of the proposed tree table with minimal updates. There is a migration file that shows how the data is interchangeable between those two.
The trees_staging table is a unlogged table. This is similar to a temporary table in that there is no write to the WAL. It differs in that the data is global. This means you can write with one connection and then use that data with another. A temporary table would need to maintain data and operations on the same connection, which is a problem when using something like ogr2ogr for writing data.
There is also a stored procedure that merges the two tables. This does the full upsert logic in three parts using a temporary table. The procedure is idempotent.

- added a script to fetch the tables dynamically and pull schemas into
files
- added the table schema for individual tables
- updated README with database update instructions
- First iteration to create a table to hold all tree data
- data was taken based on the fields pulled in the tree-source repo
- added a merge stored procedure split into 3 different pieces of the
upserts for performance
- staging table is an unlogged table to prevent WAL writes
- indexes added similar to the treedata table
- migration script to move the treedata data into the tree table
Copy link
Member

@zoobot zoobot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice start to this!!

For now we probably should use for db serial id_source_staging or something to adhere to the convention of the other table's serials so as not to be confusing. We are using id as the id that get's passed around to the front end. In the tree-sources code base id is used as the city id(basically the city name) and also id for the treedata's tree id that we create(not the db's serialized version) Basically we are currently using the db's serial for nothing at all.

Are tree and tree_staging temporary tables?
If these are temporary tables, can we name them after the tables they are merging into?

Can you add an in depth commit description for this? :)

Also just putting this here for your perusal:
https://standards.opencouncildata.org/#/trees
We've deviated from it. We should maybe submit a PR requesting more fields...

drop table if exists _tree;

create temporary table _tree as
SELECT t.id as id_tree,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id and id_tree are different. id is the id that we create with the tree-id repo, id_tree is the db serial

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id_tree for the purposes of this stored procedure just let me know we are using the id that connects to the tree table, as opposed to the staging table. The difference is mainly used below for determining missing / matching data.

create temporary table _tree as
SELECT t.id as id_tree,
ts.id as id_tree_staging,
ts.ref as id_reference,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id_reference is what we were using and then we moved it to ref because of the open data standards. Unfortunately it's special in react so ref kind of is confusing as an open data standard, imo.

@@ -0,0 +1,86 @@
-- drop table public.tree
CREATE TABLE public.tree (
id bigserial NOT NULL primary key,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id_treedata_staging

Copy link
Contributor Author

@tzinckgraf tzinckgraf Feb 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add an id based on one we create

@@ -0,0 +1,86 @@
-- drop table public.tree
CREATE TABLE public.tree (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

treedata_staging

SELECT 1;

-- drop table public.tree_staging;
CREATE UNLOGGED TABLE public.tree_staging (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tree_sources_staging

country character varying(255),
neighborhood character varying(255),
health character varying(255),
dbh double precision,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dbh_min and dbh_max

@zoobot
Copy link
Member

zoobot commented Jul 27, 2023

HI @tzinckgraf Are you interested in working on this anymore or should I un-assign?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Help Wanted
Development

Successfully merging this pull request may close these issues.

Sources Integration with FE and vector tiles and db
2 participants