Change "free outputs" to also return `MemoryDataSet` entries from the catalog #1900

merelcht · 2022-10-05T14:08:12Z

Description

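`session.run()` currently returns "free outputs". In the code, a free output is just an output that's not defined in the catalog. Those outputs end up in default in-memory datasets, so they are only a subset of the outputs held in a `MemoryDataSet`: any `MemoryDataSet` declared explicitly in the catalog is left out.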

kedro/kedro/runner/runner.py

Lines 78 to 91 in f491420

    free_outputs = pipeline.outputs() - set(catalog.list())
    unregistered_ds = pipeline.data_sets() - set(catalog.list())
    for ds_name in unregistered_ds:
        catalog.add(ds_name, self.create_default_data_set(ds_name))

    if self._is_async:
        self._logger.info(
            "Asynchronous mode is enabled for loading and saving data"
        )
    self._run(pipeline, catalog, hook_manager, session_id)

    self._logger.info("Pipeline execution completed successfully.")

    return {ds_name: catalog.load(ds_name) for ds_name in free_outputs}

There was agreement that the "free outputs" returned by `session.run()` aren't very clear. One suggestion was to simply return every node output that is not consumed by another node, even if it's defined in the catalog. However, this could lead to very large amounts of data being returned. Instead, we'll change it to return all free outputs plus any `MemoryDataSet` entries that are defined in the catalog.
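The difference between the current and the proposed return value can be sketched with plain set arithmetic. This is only an illustration, not Kedro's implementation: the catalog is modelled as a dict, `MemoryDataSet` and `CSVDataSet` are simplified stand-in classes, and `free_outputs` and the dataset names are hypothetical.

```python
class MemoryDataSet:
    """Stand-in for an in-memory dataset (simplified, not Kedro's class)."""


class CSVDataSet:
    """Stand-in for any persisted dataset type (simplified)."""


def free_outputs(pipeline_outputs, catalog):
    """Names of the pipeline outputs that would be returned after a run.

    `catalog` is assumed to be a dict mapping dataset name -> dataset object.
    """
    # Everything in the catalog except in-memory datasets counts as persisted.
    persisted = {
        name for name, ds in catalog.items() if not isinstance(ds, MemoryDataSet)
    }
    # Outputs missing from the catalog and explicit MemoryDataSet entries remain.
    return set(pipeline_outputs) - persisted


# Hypothetical catalog: one persisted entry, one explicit in-memory entry.
catalog = {"model_input": CSVDataSet(), "metrics": MemoryDataSet()}
outputs = {"model_input", "metrics", "predictions"}

old = outputs - set(catalog)          # current behaviour: catalog entries dropped
new = free_outputs(outputs, catalog)  # proposed behaviour

print(sorted(old))
print(sorted(new))
```

Under these assumptions, `metrics` is dropped by the current logic because it appears in the catalog, while the proposed logic keeps it alongside `predictions`.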

Context

#1802


datajoely · 2022-10-05T14:34:32Z

The last line - are those explicitly defined MemoryDataSets or implicit ones?

noklam · 2022-10-05T15:00:29Z

I think this is a bug fix rather than a behavioural change. It happens when someone puts a `MemoryDataSet` in the catalog: `session.run()` then drops it from the returned output.

free_outputs = pipeline.outputs() - set(catalog.list())
# This will be changed to (catalog_excluding_memory_dataset is a placeholder name)
free_outputs = pipeline.outputs() - set(catalog_excluding_memory_dataset.list())

noklam · 2024-01-11T16:57:28Z

Already implemented in #3475.

merelcht mentioned this issue Oct 5, 2022

Should we change the output of session.run? #1802

Closed

1 task

merelcht added this to the Improve the Interactive Jupyter notebook workflow milestone Feb 6, 2023

noklam added the Issue: Bug Report 🐞 Bug that needs to be fixed label Mar 22, 2023

noklam modified the milestones: Improve the Interactive Jupyter notebook workflow, Improving the debugging experience with Notebook Mar 22, 2023

merelcht mentioned this issue Apr 12, 2023

Make return value of session run until dataset consistent #2106

Closed

SajidAlamQB self-assigned this Jan 2, 2024

SajidAlamQB mentioned this issue Jan 3, 2024

Add MemoryDataset entries to free_outputs #3475

Merged

7 tasks

noklam closed this as completed Jan 11, 2024
