Looker Dependency Graphs: getting around business queries as easily as getting around your own backyard.


This repo contains scripts that manipulate LookML files and generate a dependency graph for each explore.

Current workflow:

  1. Script parser.py contains file-manipulation actions, including splitting multi-view files into files containing single views, splitting explore files into single explores, and extracting joined views from explore statements (a simplified sketch of this splitting appears after this list).

  2. Script parsing_model.py contains these steps:

    1) Split model files into explore payloads.

    2) Parse explores and retrieve explore metadata.

    3) Before processing views, traverse all child folders and move view files into the main view folder.

    4) Split view files into base views.

    5) Parse base views and retrieve view metadata.

  3. Script sourcing_explore.py generates a "map" for each view referenced within each explore, including the specific connection that holds details about the data warehouse and schemas.

  4. Script graphing_explore.py generates a graph for each explore, with nodes representing view names and base table names (see the graph-building sketch after this list).

  • Tests are stored in test_folder, with sample model, explore, and view files for testing the src scripts.
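
A minimal sketch of the kind of splitting and join extraction described in step 1. The regexes and file layout here are illustrative assumptions, not the repo's actual implementation; a dedicated parser such as the lkml package handles edge cases far more robustly.

```python
import re
from pathlib import Path

# Illustrative regexes only: they assume simple, well-formed top-level
# `view: <name> {` and `join: <name> {` declarations.
VIEW_RE = re.compile(r"^view:\s+(\w+)\s*\{", re.MULTILINE)
JOIN_RE = re.compile(r"\bjoin:\s+(\w+)\s*\{")


def split_views(lookml_path: Path, out_dir: Path) -> list[Path]:
    """Split a multi-view .view.lkml file into one file per view."""
    text = lookml_path.read_text()
    starts = [m.start() for m in VIEW_RE.finditer(text)] + [len(text)]
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for begin, end in zip(starts, starts[1:]):
        block = text[begin:end].strip()
        name = VIEW_RE.match(block).group(1)
        target = out_dir / f"{name}.view.lkml"
        target.write_text(block + "\n")
        written.append(target)
    return written


def joined_views(explore_block: str) -> list[str]:
    """Return the names of views joined inside a single explore block."""
    return JOIN_RE.findall(explore_block)
```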
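And a sketch of how the per-explore graph from step 4 could be assembled from the view-to-table map produced in step 3; the map shape shown below is an assumption for illustration only.

```python
import networkx as nx

# Assumed (illustrative) shape of the per-explore map produced by
# sourcing_explore.py: each joined view points at its base table path.
EXAMPLE_MAP = {
    "explore": "orders",
    "views": {
        "orders": "ANALYTICS.PROD.ORDERS",
        "users": "ANALYTICS.PROD.USERS",
    },
}


def build_explore_graph(explore_map: dict) -> nx.DiGraph:
    """Build a directed graph: explore -> joined views -> base tables."""
    graph = nx.DiGraph(name=explore_map["explore"])
    explore_node = f"explore:{explore_map['explore']}"
    graph.add_node(explore_node, kind="explore")
    for view_name, base_table in explore_map["views"].items():
        graph.add_node(view_name, kind="view")
        graph.add_node(base_table, kind="table")
        graph.add_edge(explore_node, view_name)
        graph.add_edge(view_name, base_table)
    return graph
```

With a directed graph, any node can be traced back to its base tables with nx.descendants, and the result can be rendered with Graphviz for the static webpage described below.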

Diagram as of 2019-10-15:

(diagram image)

Sample dependency chart:

(dependency chart image)

Ideal goals:

The deliverable will be a static webpage (similar to what dbt provides for model lineage visualization) that parses LookML files from an S3 bucket. The bucket is updated on a set schedule (hourly/daily) to reflect near real-time business logic within the Looker ecosystem; a sketch of pulling the LookML files from that bucket follows below.
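
A hedged sketch of how a build step for that page might pull the LookML files down with boto3; the bucket name, prefix, and destination folder are placeholders, not the real configuration.

```python
import os

import boto3

# Placeholder configuration; in practice this would run on the same
# hourly/daily schedule that refreshes the bucket.
s3 = boto3.client("s3")


def download_lookml(bucket: str, prefix: str, dest: str) -> list[str]:
    """Download every .lkml object under `prefix` into the `dest` folder."""
    os.makedirs(dest, exist_ok=True)
    local_paths = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith(".lkml"):
                local = os.path.join(dest, os.path.basename(key))
                s3.download_file(bucket, key, local)
                local_paths.append(local)
    return local_paths
```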

A tree-shaped diagram shows all dependencies from the main (parent) node. Each data source is traced back to either a PDT or a raw event table in the Snowflake databases.

There are two parts to LookML lineage:

  1. Front-end: dashboards, Looks, explores (renamed via view_label), and explore label groups. Content Validator is the testing tool.

  2. Back-end: models with multiple explores (using the original explore names), view names (referencing view file names), and folder names.

On the Snowflake side, all lineage is considered "back-end". The destination event tables are quoted with their real table paths within the databases, except for the persistent derived tables (PDTs) generated by Looker. For those, we extract the human-readable PDT name ending in the _pdt suffix, avoiding the random strings that are generated at each runtime (see the sketch below).
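
A small sketch of that extraction, assuming (purely for illustration, since Looker's scratch-table naming varies by version and configuration) that the generated table name embeds a random hash between a fixed prefix and the stable, human-readable name ending in _pdt.

```python
import re

# Assumed table shape: <schema>.LR_<random hash>_<human readable name>_pdt
# The hash changes on every rebuild, so only the stable suffix is kept.
PDT_RE = re.compile(r"(?:^|\.)LR[_$][0-9A-Za-z]+_(?P<name>\w+_pdt)$", re.IGNORECASE)


def human_pdt_name(table_path: str) -> str:
    """Return the stable, human-readable PDT name, or the path unchanged."""
    match = PDT_RE.search(table_path)
    return match.group("name") if match else table_path


# e.g. human_pdt_name("SCRATCH.LR_a1b2c3_daily_orders_pdt") -> "daily_orders_pdt"
```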

One question to tackle when we move on to Phase Two:

How do we stitch together the front-end and back-end lineage into an end-to-end monitoring and tracking process?

The even bigger picture of this project consists of at least the following parts:

  1. An end-to-end, near real-time tracking system for all data lineage from the raw event sources in the BI databases (Snowflake).

  2. A unit-testing system, based on the main (most popular) queries, embedded into the GitHub PR process alongside Content Validator.

  3. An alerting system for any yellow-light or red-light data paths experiencing latency, performance issues, or failures at query time.

  4. Cost analysis: the query cost per explore, tied back to each team's usage of Looker.
