Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change "free outputs" to also return MemoryDataSet entries from the catalog #1900

Closed
merelcht opened this issue Oct 5, 2022 · 3 comments
Closed
Assignees
Labels
Issue: Bug Report 🐞 Bug that needs to be fixed

Comments

@merelcht
Copy link
Member

merelcht commented Oct 5, 2022

Description

session.run() currently returns "free outputs". What free_output means in the code is just "output that's not defined in the catalog", which is a subset of "output that's not a MemoryDataSet".

free_outputs = pipeline.outputs() - set(catalog.list())
unregistered_ds = pipeline.data_sets() - set(catalog.list())
for ds_name in unregistered_ds:
catalog.add(ds_name, self.create_default_data_set(ds_name))
if self._is_async:
self._logger.info(
"Asynchronous mode is enabled for loading and saving data"
)
self._run(pipeline, catalog, hook_manager, session_id)
self._logger.info("Pipeline execution completed successfully.")
return {ds_name: catalog.load(ds_name) for ds_name in free_outputs}

There was agreement that the "free outputs" output from session isn't very clear. It was suggested to simply return all output from nodes that is not consumed, even if it's defined in the catalog. However, this could lead to very large amounts of data being returned. Instead we'll change it to return all free outputs and additionally any MemoryDataSets that are defined in the catalog.

Context

#1802

@datajoely
Copy link
Contributor

The last line - are those explicitly defined MemoryDataSets or implicit ones?

@noklam
Copy link
Contributor

noklam commented Oct 5, 2022

I think this is a bug fix, rather than any behavioral changes. This happens if someone put MemoryDataSet in the catalog and session.run removes it from the output

free_outputs = pipeline.outputs() - set(catalog.list()) 
# This will be changed to
free_outputs = pipelines.outputs() - set(catalog_excluding_memory_dataset.list())

@noklam
Copy link
Contributor

noklam commented Jan 11, 2024

Already implemented in #3475.

@noklam noklam closed this as completed Jan 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue: Bug Report 🐞 Bug that needs to be fixed
Projects
Archived in project
Development

No branches or pull requests

4 participants