Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(looker): represent looker views dependencies from derived tables #21689

Merged
merged 1 commit into from
May 10, 2024

Conversation

rexledesma
Copy link
Contributor

@rexledesma rexledesma commented May 7, 2024

Summary & Motivation

Introduce sqlglot to parse Looker views that are defined with a derived_table.

The annoying part here is that the SQL here is potentially templated: https://cloud.google.com/looker/docs/sql-and-referring-to-lookml#substitution_operator_. In this case, don't give it upstream dependencies.

In a future PR, we can potentially look at a way to use regex to template the names into file, using a context dictionary assembled from the total aggregate of *.view.lkml files.

How I Tested These Changes

pytest

  • SQL with the $ operator will have an empty set of upstream dependencies
  • SQL without the $ operator but cannot be parsed by sqlglot will have an empty set of upstream dependencies

Copy link
Contributor Author

rexledesma commented May 7, 2024

@rexledesma rexledesma force-pushed the rl/represent-looker-views-with-derived-tables branch from 0ea1fd8 to a4e4013 Compare May 7, 2024 16:25
@rexledesma rexledesma force-pushed the rl/represent-looker-views branch 2 times, most recently from f24a790 to e02258d Compare May 7, 2024 18:28
Base automatically changed from rl/represent-looker-views to master May 7, 2024 18:29
@rexledesma rexledesma force-pushed the rl/represent-looker-views-with-derived-tables branch 2 times, most recently from 776a176 to 5adb9a7 Compare May 9, 2024 22:17
@rexledesma rexledesma requested review from benpankow and sryza May 9, 2024 22:21
@rexledesma rexledesma marked this pull request as ready for review May 9, 2024 22:21
@rexledesma rexledesma force-pushed the rl/represent-looker-views-with-derived-tables branch from 5adb9a7 to 3f96520 Compare May 10, 2024 15:08
Copy link
Member

@benpankow benpankow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation looks good to me - would be good to test some of the failure cases (unparsable SQL and also templated SQL that can't be parsed) to make sure we get the expected fallback/no dependencies & error message.

# https://cloud.google.com/looker/docs/derived-tables
derived_table_sql = view.get("derived_table", {}).get("sql")

if derived_table_sql and "$" in derived_table_sql:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way we can more definitively check if this is a $ operator vs e.g. a $ embedded in a string? Can't imagine it'd be common, but there's probably some cases where it's appearing in a query that's not indicative of templating.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Silly example being some column where a user is concatenating a dollar sign string to an integer value to represent a cost.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is the tricky part... Looker SQL is potentially parameterized by two differing mechanisms:

  1. The ${...} substitution operator: https://cloud.google.com/looker/docs/sql-and-referring-to-lookml
  2. Liquid variable referencing: https://cloud.google.com/looker/docs/liquid-variable-reference

We can probably detect the presence of (1) using regex, instead of a naive "$" in sql check.

Unsure how to resolve (2) other than actually rendering the templates with an environment of variables.

# https://cloud.google.com/looker/docs/derived-tables
derived_table_sql = view.get("derived_table", {}).get("sql")

if derived_table_sql and "$" in derived_table_sql:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way we can more definitively check if this is a $ operator vs e.g. a $ embedded in a string? Can't imagine it'd be common, but there's probably some cases where it's appearing in a query that's not indicative of templating.

@rexledesma rexledesma force-pushed the rl/represent-looker-views-with-derived-tables branch from 3f96520 to 89027bf Compare May 10, 2024 19:24
@rexledesma rexledesma force-pushed the rl/represent-looker-views-with-derived-tables branch from 89027bf to 2331be9 Compare May 10, 2024 20:46
@rexledesma rexledesma merged commit cb2badb into master May 10, 2024
1 check was pending
@rexledesma rexledesma deleted the rl/represent-looker-views-with-derived-tables branch May 10, 2024 20:49
danielgafni pushed a commit to danielgafni/dagster that referenced this pull request Jun 18, 2024
…dagster-io#21689)

## Summary & Motivation

Introduce `sqlglot` to parse Looker views that are defined with a
`derived_table`.

The annoying part here is that the SQL here is potentially templated:
https://cloud.google.com/looker/docs/sql-and-referring-to-lookml#substitution_operator_.
In this case, don't give it upstream dependencies.

In a future PR, we can potentially look at a way to use regex to
template the names into file, using a context dictionary assembled from
the total aggregate of `*.view.lkml` files.

## How I Tested These Changes
pytest

- SQL with the `$` operator will have an empty set of upstream
dependencies
- SQL without the `$` operator but cannot be parsed by `sqlglot` will
have an empty set of upstream dependencies
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants