[KED-1642] `index` option in `pandas.ParquetDataSet` #352

juan-carlos-calvo · 2020-05-04T23:11:39Z

Description

ParquetLocalDataSet accepts an index save option, which specifies whether the DataFrame index is written to the Parquet file. pandas.ParquetDataSet has no equivalent option at the moment.

Context

As the datasets in kedro.io are going to be deprecated and moved to kedro.extras.datasets, Kedro should stay backward compatible with their load_args and save_args options. The index option is also useful in its own right.

Possible Implementation

In the _save method of pandas.ParquetDataSet, replace:

table = pa.Table.from_pandas(data)

with

preserve_index = self._save_args.pop('index', False)
table = pa.Table.from_pandas(data, preserve_index=preserve_index)
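A minimal sketch of the proposed mapping, written as a standalone function so it runs without Kedro or pyarrow. The name split_index_option is hypothetical. It copies the save arguments before popping index, because calling pop directly on self._save_args removes the key after the first save, so any later save on the same dataset instance would silently fall back to the default.

```python
from typing import Any, Dict, Tuple


def split_index_option(save_args: Dict[str, Any]) -> Tuple[Any, Dict[str, Any]]:
    """Return (preserve_index, remaining_args) without mutating save_args.

    Hypothetical helper: the "index" key is translated into the value that
    would be passed as preserve_index to pa.Table.from_pandas, and the
    remaining keys are left for the Parquet writer.
    """
    remaining = dict(save_args)  # shallow copy so the caller's dict is untouched
    preserve_index = remaining.pop("index", False)  # default mirrors the proposal
    return preserve_index, remaining


stored_args = {"index": True, "compression": "snappy"}
first = split_index_option(stored_args)
second = split_index_option(stored_args)  # same result on every save
print(first)        # (True, {'compression': 'snappy'})
print(second)       # (True, {'compression': 'snappy'})
print(stored_args)  # {'index': True, 'compression': 'snappy'}
```

Here compression only stands in for any option meant for the writer; the point is that the stored arguments are identical before and after each call.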


921kiyo · 2020-05-05T09:55:34Z

@juan-carlos-calvo Thank you for opening the issue! This makes sense to me. I've logged this in our backlog and will fix it :)

andrii-ivaniuk · 2020-05-27T09:39:00Z

Thanks @juan-carlos-calvo for reporting this.
It was fixed in commit 5330b45.

Arguments for from_pandas() should now be passed through the nested from_pandas key in save_args, e.g.: save_args = {"from_pandas": {"preserve_index": False}}
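The nested-key convention can be sketched as follows. This is an illustration under assumptions, not the code from commit 5330b45: the helper name partition_save_args is hypothetical, and it assumes every key other than from_pandas is meant for the Parquet writer.

```python
from typing import Any, Dict, Tuple


def partition_save_args(
    save_args: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split save_args into (from_pandas kwargs, writer kwargs).

    Hypothetical helper: arguments under the nested "from_pandas" key are
    meant for the DataFrame-to-Arrow conversion, everything else for the
    Parquet writer. The input dict is not modified.
    """
    writer_kwargs = dict(save_args)  # shallow copy; caller's dict stays intact
    from_pandas_kwargs = dict(writer_kwargs.pop("from_pandas", {}))
    return from_pandas_kwargs, writer_kwargs


save_args = {"from_pandas": {"preserve_index": False}, "compression": "snappy"}
conversion, writer = partition_save_args(save_args)
print(conversion)  # {'preserve_index': False}
print(writer)      # {'compression': 'snappy'}
```

In a dataset built this way, the first dictionary would feed pa.Table.from_pandas(data, **conversion) and the second the call that writes the table. Keeping the conversion options under their own key avoids guessing which flat keys belong to which call.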

Release 0.15.5

juan-carlos-calvo added the Issue: Feature Request New feature or improvement to existing feature label May 4, 2020

921kiyo changed the title ~~index option in pandas.ParquetDataSet~~ [KED-1642] index option in pandas.ParquetDataSet May 5, 2020

andrii-ivaniuk closed this as completed May 27, 2020

pull bot pushed a commit to FoundryAI/kedro that referenced this issue Jul 17, 2020

Merge pull request kedro-org#352 from quantumblacklabs/release/0.15.5

98a6c8f

Release 0.15.5
