support value in MaterializeResult #16887

Open
jamiedemaria opened this issue Sep 28, 2023 · 5 comments

Comments

@jamiedemaria
Contributor

jamiedemaria commented Sep 28, 2023

What's the use case?

With the introduction of MaterializeResult, we anticipate that the ability to store scalar values and access them in downstream assets will be a requested feature.

Example use case:

from dagster import MaterializeResult, asset

@asset
def table_1():
    full_table_name = "my_schema.table_1"
    snowflake.query(f"CREATE TABLE {full_table_name} ...")
    # `value` is the proposed new parameter on MaterializeResult
    return MaterializeResult(value=full_table_name)

@asset(
    deps=[table_1]
)
def table_2(context):
    # some API for getting the value from MaterializeResult
    table_1_table_name = context.get_materialize_result_value(table_1)
    snowflake.query(f"CREATE TABLE my_schema.table_2 AS SELECT * FROM {table_1_table_name} ...")

Some things to consider:

  • How will this value be stored? Will we just store the value or the entire MaterializeResult?
  • If the user's default IO manager can't store scalars or full MaterializeResult objects (for example, the DB IO managers can only store tabular data), how should we handle that? MaterializeResult is supposed to be a way out of the IO manager world, so needing to do this
@asset(
    io_manager_key="file_system"
)
def table_1():
    full_table_name = "my_schema.table_1"
    snowflake.query(f"CREATE TABLE {full_table_name} ...")
    return MaterializeResult(value=full_table_name)

would go against the purpose of MaterializeResult in the first place.
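
To illustrate the concern in the second bullet, here is a toy sketch in plain Python. `TabularOnlyIOManager` is a hypothetical stand-in for a DB-backed IO manager, not a Dagster class; it only shows why a scalar routed through such a manager would be rejected.

```python
from typing import Any, Dict, List


class TabularOnlyIOManager:
    """Hypothetical stand-in for a DB IO manager that only stores rows."""

    def __init__(self) -> None:
        self.tables: Dict[str, List[dict]] = {}

    def handle_output(self, asset_key: str, obj: Any) -> None:
        # Accept only a list of row dicts; anything else is rejected.
        if not (isinstance(obj, list) and all(isinstance(r, dict) for r in obj)):
            raise TypeError(
                f"cannot store {type(obj).__name__} for {asset_key}: tabular data only"
            )
        self.tables[asset_key] = obj


io_manager = TabularOnlyIOManager()
io_manager.handle_output("rows_asset", [{"id": 1}, {"id": 2}])  # tabular, accepted

try:
    # A scalar such as the table name from table_1 is not tabular.
    io_manager.handle_output("table_1", "my_schema.table_1")
    scalar_stored = True
except TypeError:
    scalar_stored = False
```

In this toy setup the only way to store the table name would be to point the asset at a different IO manager, which is the workaround described above.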

Ideas of implementation

  1. Instead of converting MaterializeResult into Output(None), convert it to Output(MaterializeResult(...)) and store the full object via the IO manager.
  2. Extract the value from MaterializeResult and convert that to Output(materialize_result.value). Store that via the IO manager.
    2a. As Sandy mentioned below, use the metadata system to store the scalar instead. I'm liking this idea at the moment as it sidesteps the IO managers entirely, which stays more in line with the purpose of MaterializeResult.
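
A rough sketch of the data flow behind idea 2a, to make it concrete. This is plain Python with no Dagster imports: `ToyResult`, `ToyEventLog`, and `latest_metadata` are hypothetical stand-ins, not Dagster APIs, and the metadata key `full_table_name` is just an assumed name. What the real lookup API would look like is still an open question in this issue.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class ToyResult:
    """Hypothetical stand-in for a result object that carries only metadata."""
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ToyEventLog:
    """Hypothetical stand-in for an event log that records materializations."""
    events: List[Tuple[str, Dict[str, Any]]] = field(default_factory=list)

    def record(self, asset_key: str, result: ToyResult) -> None:
        # Only the metadata is persisted; no IO manager is involved.
        self.events.append((asset_key, dict(result.metadata)))

    def latest_metadata(self, asset_key: str) -> Optional[Dict[str, Any]]:
        # Walk the log backwards so the most recent materialization wins.
        for key, metadata in reversed(self.events):
            if key == asset_key:
                return metadata
        return None


def table_1() -> ToyResult:
    full_table_name = "my_schema.table_1"
    # The CREATE TABLE query would run here.
    return ToyResult(metadata={"full_table_name": full_table_name})


def table_2(log: ToyEventLog) -> str:
    upstream = log.latest_metadata("table_1")
    if upstream is None:
        raise RuntimeError("table_1 has not been materialized yet")
    # Build the downstream query from the upstream scalar.
    return (
        "CREATE TABLE my_schema.table_2 AS "
        f"SELECT * FROM {upstream['full_table_name']}"
    )


log = ToyEventLog()
log.record("table_1", table_1())
query = table_2(log)
```

The point of the sketch is that the scalar travels with the materialization record rather than through an IO manager, so the default IO manager's capabilities stop mattering.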

Additional information

No response

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

@sryza
Contributor

sryza commented Sep 28, 2023

I wonder if it makes sense to use the metadata system for this? At least in the case above, materialization metadata would work well, and would also have the advantage of being consumable via the UI.

@jamiedemaria
Contributor Author

Depending on the approach taken for storing value, we may also want to do something like this #16136 and bypass the IO manager when MaterializeResult is returned.

@yuhan
Contributor

yuhan commented Dec 14, 2023

I wonder if it makes sense to use the metadata system for this? At least in the case above, materialization metadata would work well, and would also have the advantage of being consumable via the UI.

@sryza did you mean #8521 could be an alternative to enabling this?

@sryza
Contributor

sryza commented Dec 14, 2023

@yuhan exactly

@yuhan
Contributor

yuhan commented Mar 28, 2024

With #20091, could we now expose upstream metadata through AssetExecutionContext?


3 participants