GH-36537: [Python] Ensure dataset writer follows default Parquet version of 2.6 #36538

Merged: 2 commits, Jul 7, 2023

@jorisvandenbossche jorisvandenbossche commented Jul 7, 2023

Rationale for this change

When the default Parquet write version was bumped from 1.0 to 2.4 and then to 2.6, we forgot to bump the same default in the pyarrow.dataset Parquet writer (ParquetFileWriteOptions).

This PR bumps that default directly from 1.0 to 2.6 so that it follows the default of the pyarrow.parquet module.

Follow-up on #36137

Are these changes tested?

Yes

Are there any user-facing changes?

Yes: with a different default version, the dataset writer can store different types in the Parquet file than before.


github-actions bot commented Jul 7, 2023

⚠️ GitHub issue #36537 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Jul 7, 2023
@jorisvandenbossche
Member Author

@github-actions crossbow submit -g integration


github-actions bot commented Jul 7, 2023

Revision: ea69499

Submitted crossbow builds: ursacomputing/crossbow @ actions-87a2b14740

| Task | Status |
|---|---|
| test-conda-python-3.10-hdfs-2.9.2 | Github Actions |
| test-conda-python-3.10-hdfs-3.2.1 | Github Actions |
| test-conda-python-3.10-pandas-latest | Github Actions |
| test-conda-python-3.10-pandas-nightly | Github Actions |
| test-conda-python-3.10-spark-master | Github Actions |
| test-conda-python-3.11-dask-latest | Github Actions |
| test-conda-python-3.11-dask-upstream_devel | Github Actions |
| test-conda-python-3.11-pandas-upstream_devel | Github Actions |
| test-conda-python-3.8-pandas-1.0 | Github Actions |
| test-conda-python-3.8-spark-v3.1.2 | Github Actions |
| test-conda-python-3.9-pandas-latest | Github Actions |
| test-conda-python-3.9-spark-v3.2.0 | Github Actions |

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jul 7, 2023
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Jul 7, 2023
@jorisvandenbossche jorisvandenbossche merged commit f8256bd into apache:main Jul 7, 2023
10 of 11 checks passed
@jorisvandenbossche jorisvandenbossche removed the awaiting changes Awaiting changes label Jul 7, 2023
@jorisvandenbossche jorisvandenbossche deleted the gh-36537 branch July 7, 2023 13:45
@conbench-apache-arrow

Conbench analyzed the 6 benchmark runs on commit f8256bd6.

There was 1 benchmark result indicating a performance regression:

The full Conbench report has more details.

Development

Successfully merging this pull request may close these issues.

[Python] Writing Parquet with pyarrow.dataset still defaults to Parquet version 1.0
2 participants