Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[graph diffing] Class for computing diff between parent and branch asset graphs- FOU-22 #19681

Merged
merged 20 commits into from
Feb 20, 2024

Conversation

jamiedemaria
Copy link
Contributor

@jamiedemaria jamiedemaria commented Feb 8, 2024

Summary & Motivation

Class that can compute the diff between a parent deployment asset graph and a branch deployment asset graph.

This first version can detect three types of changes: a completely new asset, an updated code version, and modified inputs.

To Dos:

  • Run a test with a completely new code location
  • Confirm if there are any duplicate key issues to consider

How I Tested These Changes

@jamiedemaria
Copy link
Contributor Author

jamiedemaria commented Feb 8, 2024

Current dependencies on/for this PR:

This stack of pull requests is managed by Graphite.

@jamiedemaria jamiedemaria changed the title Basic Graph Diff resolver [graph diffing] Basic Graph Diff resolver Feb 8, 2024
@jamiedemaria jamiedemaria changed the title [graph diffing] Basic Graph Diff resolver [graph diffing] Basic Graph Diff resolver - FOU-22 Feb 8, 2024
@jamiedemaria jamiedemaria marked this pull request as ready for review February 14, 2024 16:22
@jamiedemaria jamiedemaria changed the base branch from jamie/asset-graph-diff-simple-test to master February 14, 2024 16:25
@jamiedemaria jamiedemaria marked this pull request as draft February 14, 2024 16:25
Copy link

Deploy preview for dagit-core-storybook ready!

✅ Preview
https://dagit-core-storybook-ote12d0uv-elementl.vercel.app
https://jamie-basic-resolver.core-storybook.dagster-docs.io

Built with commit cd12333.
This pull request is being automatically deployed with vercel-action

Copy link

Deploy preview for dagit-storybook ready!

✅ Preview
https://dagit-storybook-k4pdudi5u-elementl.vercel.app
https://jamie-basic-resolver.components-storybook.dagster-docs.io

Built with commit cd12333.
This pull request is being automatically deployed with vercel-action

@erinkcochran87 erinkcochran87 removed their request for review February 14, 2024 16:30
@jamiedemaria jamiedemaria changed the title [graph diffing] Basic Graph Diff resolver - FOU-22 [graph diffing] Class for computing diff between parent and branch asset graphs- FOU-22 Feb 14, 2024
@jamiedemaria jamiedemaria force-pushed the jamie/basic-resolver branch 2 times, most recently from 5b13e97 to f70d1b5 Compare February 15, 2024 15:57
Copy link
Member

@alangenfeld alangenfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dropping some comments even though this is still in draft, overall i think this is pretty much landable as a base to get started

Comment on lines 1 to 14
from dagster import Definitions, asset


@asset
def upstream():
return 1


@asset
def downstream(upstream):
return upstream + 1


defs = Definitions(assets=[upstream, downstream])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[1]

@jamiedemaria jamiedemaria marked this pull request as ready for review February 15, 2024 21:04
class ChangeReason(Enum):
NEW = "NEW"
CODE_VERSION = "CODE_VERSION"
INPUTS = "INPUTS"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

odd to see inputs but not outputs but we can get that in a follow up

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i wasn't quite sure what would count as a change in output since each asset key should only have one output. we could inspect the type of the output maybe? it seemed worthy of more consideration so i didn't include it in this pass

Comment on lines 78 to 102
branch_workspace: BaseWorkspaceRequestContext,
parent_workspace: Optional[BaseWorkspaceRequestContext] = None,
) -> Optional["ParentAssetGraphDiffer"]:
"""Constructs a ParentAssetGraphDiffer for a particular repository in a code location for two
deployment workspaces, the parent deployment and the branch deployment. If the
parent_workspace is None, then we are not in a branch deployment and will not create a ParentAssetGraphDiffer.
"""
if parent_workspace is not None:
branch_repo = _get_external_repo_from_context(
branch_workspace, code_location_name, repository_name
)
if branch_repo is None:
raise DagsterInvariantViolationError(
f"Repository {repository_name} does not exist in code location {code_location_name} for the branch deployment."
)
parent_repo = _get_external_repo_from_context(
parent_workspace, code_location_name, repository_name
)
return ParentAssetGraphDiffer(
branch_asset_graph=lambda: ExternalAssetGraph.from_external_repository(branch_repo),
parent_asset_graph=(
lambda: ExternalAssetGraph.from_external_repository(parent_repo)
)
if parent_repo is not None
else None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might want to just use ExternalAssetGraph.from_workspace and remove this code location scoping, probably call this method from_workspaces

what led you to code location scope?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that if i just did from workspaces, I ran into issues with duplicate asset keys. I think it might be a larger issue in the AssetGraph maybe?

Let's say the parent deployment looks like this:

code_location_1: 
     - asset_1
     - asset_2
     - asset3
code_location_2:
    - asset_1
    - asset_2
    - asset_b

and the branch deployment we only modify asset_1 and asset_2 in code_location_1.

I found that when i did from_workspace the resulting AssetGraph would only have one occurrence of asset_1 and asset_2, and that it could be from code_location_2 instead of code_location_1 which would make the comparison of the changes incorrect

@jamiedemaria jamiedemaria merged commit 2c7052b into master Feb 20, 2024
1 check passed
@jamiedemaria jamiedemaria deleted the jamie/basic-resolver branch February 20, 2024 17:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants