Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[freshness-refactor][2/n] Reorganize/Rename/Remove/Document CachingInstanceQueryer methods #12813

Merged
merged 11 commits into from
Mar 9, 2023

Conversation

OwenKephart
Copy link
Contributor

@OwenKephart OwenKephart commented Mar 8, 2023

Summary & Motivation

This does a few things:

  • Before, the methods on this object were essentially randomly distributed. This sorts/organizes them into general sections, to aid readability.
  • Some methods were defined, but never used anywhere in the code base. These were removed
  • Renamed some methods
  • We had an overly-complicated caching system. In particular, I removed the _no_materializations_after_cursor_cache from the class. Instead, to answer queries of the form "what is the latest materialization for asset_partition Y after cursor X", we simply look at the latest materialization in general for asset_partition Y, then compare its record_id to the cursor. If the cursor is greater than that record id, we know there are no materializations after that cursor, and return None. If the cursor is less than that record id, the latest materialization after that cursor is the latest overall materialization.
  • In general, I refactored the get_latest_materialization_record method to make more sense (at least to my brain)
  • Broke out the prefetch methods into two separate things. Ideally, we would only pass in asset keys associated with partitioned assets into the second method, but that's not a huge deal.
  • I added docstrings to a lot of methods. While some of these docstrings are probably unnecessary (as the functions they document are pretty trivial), I erred on the side of more documentation. This also (subjectively) helps visually break up the class.
  • Updated the function signatures of things that use cached_method to be of the form def foo(self, *, arg1, arg2) (stole this from @smackesey). This properly indicates to the user/type system that all arguments must be passed in as kwargs.

How I Tested These Changes

Unscientific testing seems to indicate that this has either zero or mildly-positive performance impact

@vercel
Copy link

vercel bot commented Mar 8, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

2 Ignored Deployments
Name Status Preview Comments Updated
dagit-storybook ⬜️ Ignored (Inspect) Mar 9, 2023 at 9:12PM (UTC)
dagster ⬜️ Ignored (Inspect) Mar 9, 2023 at 9:12PM (UTC)

# RECONCILIATION
####################

def get_latest_storage_id(self, event_type: DagsterEventType) -> Optional[int]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is get_latest_storage_id particularly related to reconciliation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reasonable question -- considering this function is not actually caching anything and is only used once (in asset backfill land), it's debatable if it seems like it might be more appropriate to move this out of the class entirely.

Base automatically changed from owen/caching-refactor-1 to master March 8, 2023 23:22
@OwenKephart OwenKephart merged commit 796279a into master Mar 9, 2023
@OwenKephart OwenKephart deleted the owen/caching-refactor-2 branch March 9, 2023 21:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants