-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[graph diffing] Class for computing diff between parent and branch asset graphs- FOU-22 #19681
Conversation
Current dependencies on/for this PR:
This stack of pull requests is managed by Graphite. |
f2727b0
to
0d98de7
Compare
0d98de7
to
cd12333
Compare
Deploy preview for dagit-core-storybook ready! ✅ Preview Built with commit cd12333. |
Deploy preview for dagit-storybook ready! ✅ Preview Built with commit cd12333. |
cd12333
to
846f9b3
Compare
5b13e97
to
f70d1b5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dropping some comments even though this is still in draft, overall i think this is pretty much landable as a base to get started
python_modules/dagster/dagster/_core/definitions/parent_asset_graph_differ.py
Outdated
Show resolved
Hide resolved
python_modules/dagster/dagster/_core/definitions/parent_asset_graph_differ.py
Outdated
Show resolved
Hide resolved
.../dagster_tests/general_tests/parent_asset_graph_differ_tests/test_parent_asset_graph_diff.py
Outdated
Show resolved
Hide resolved
.../dagster_tests/general_tests/parent_asset_graph_differ_tests/test_parent_asset_graph_diff.py
Outdated
Show resolved
Hide resolved
.../dagster_tests/general_tests/parent_asset_graph_differ_tests/test_parent_asset_graph_diff.py
Outdated
Show resolved
Hide resolved
from dagster import Definitions, asset | ||
|
||
|
||
@asset | ||
def upstream(): | ||
return 1 | ||
|
||
|
||
@asset | ||
def downstream(upstream): | ||
return upstream + 1 | ||
|
||
|
||
defs = Definitions(assets=[upstream, downstream]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[1]
.../dagster_tests/general_tests/parent_asset_graph_differ_tests/test_parent_asset_graph_diff.py
Outdated
Show resolved
Hide resolved
python_modules/dagster/dagster/_core/definitions/parent_asset_graph_differ.py
Outdated
Show resolved
Hide resolved
class ChangeReason(Enum): | ||
NEW = "NEW" | ||
CODE_VERSION = "CODE_VERSION" | ||
INPUTS = "INPUTS" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
odd to see inputs but not outputs but we can get that in a follow up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah i wasn't quite sure what would count as a change in output since each asset key should only have one output. we could inspect the type of the output maybe? it seemed worthy of more consideration so i didn't include it in this pass
branch_workspace: BaseWorkspaceRequestContext, | ||
parent_workspace: Optional[BaseWorkspaceRequestContext] = None, | ||
) -> Optional["ParentAssetGraphDiffer"]: | ||
"""Constructs a ParentAssetGraphDiffer for a particular repository in a code location for two | ||
deployment workspaces, the parent deployment and the branch deployment. If the | ||
parent_workspace is None, then we are not in a branch deployment and will not create a ParentAssetGraphDiffer. | ||
""" | ||
if parent_workspace is not None: | ||
branch_repo = _get_external_repo_from_context( | ||
branch_workspace, code_location_name, repository_name | ||
) | ||
if branch_repo is None: | ||
raise DagsterInvariantViolationError( | ||
f"Repository {repository_name} does not exist in code location {code_location_name} for the branch deployment." | ||
) | ||
parent_repo = _get_external_repo_from_context( | ||
parent_workspace, code_location_name, repository_name | ||
) | ||
return ParentAssetGraphDiffer( | ||
branch_asset_graph=lambda: ExternalAssetGraph.from_external_repository(branch_repo), | ||
parent_asset_graph=( | ||
lambda: ExternalAssetGraph.from_external_repository(parent_repo) | ||
) | ||
if parent_repo is not None | ||
else None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we might want to just use ExternalAssetGraph.from_workspace
and remove this code location scoping, probably call this method from_workspaces
what led you to code location scope?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that if i just did from workspaces, I ran into issues with duplicate asset keys. I think it might be a larger issue in the AssetGraph
maybe?
Let's say the parent deployment looks like this:
code_location_1:
- asset_1
- asset_2
- asset3
code_location_2:
- asset_1
- asset_2
- asset_b
and the branch deployment we only modify asset_1
and asset_2
in code_location_1
.
I found that when i did from_workspace
the resulting AssetGraph would only have one occurrence of asset_1
and asset_2
, and that it could be from code_location_2
instead of code_location_1
which would make the comparison of the changes incorrect
… type of from_external_repositories
09feae0
to
e75f292
Compare
Summary & Motivation
Class that can compute the diff between a parent deployment asset graph and a branch deployment asset graph.
This first version can detect three types of changes: a completely new asset, an updated code version, and modified inputs.
To Dos:
How I Tested These Changes