Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix failure when hidden partition column name conflicted in Iceberg #22366

Merged
merged 1 commit into from
Jul 4, 2024

Conversation

ebyhr
Copy link
Member

@ebyhr ebyhr commented Jun 12, 2024

Description

Iceberg connector caused query failure when the hidden partition column name conflicts with other columns.

The approach is different from Spark. They throw an exception when conflicts happen:

CREATE TABLE test (ts timestamp, ts_day int) USING iceberg PARTITIONED BY (day(ts));
Failed in [create table test (ts timestamp, ts_day int) using iceberg partitioned by (day(ts))]
java.lang.IllegalArgumentException: Cannot create partition from name that exists in schema: ts_day

Fixes #22351

Release notes

(x) Release notes are required, with the following suggested text:

# Iceberg
* Fix failure when hidden partition name conflicted. ({issue}`22351`)

@cla-bot cla-bot bot added the cla-signed label Jun 12, 2024
@github-actions github-actions bot added docs iceberg Iceberg connector labels Jun 12, 2024
@ebyhr ebyhr marked this pull request as draft June 12, 2024 06:11
@ebyhr ebyhr force-pushed the ebi/iceberg-column-name-conflicts branch from 318913c to a125e1a Compare June 12, 2024 06:41
@ebyhr ebyhr marked this pull request as ready for review June 12, 2024 06:41
@ebyhr ebyhr force-pushed the ebi/iceberg-column-name-conflicts branch from a125e1a to b5a32cb Compare June 12, 2024 07:41
@findepi
Copy link
Member

findepi commented Jun 12, 2024

I am not against supporting explicit aliases, but i also think that auto-generated names shouldn't conflict with user-provided names without extra effort from the user, so I am not convinced this PR is needed.

IMO the logic in parsePartitionField should automatically deconflict the names.

Once we have this, the only remaning source of name collisions is when user wants to add a new column that overlaps a partitioning field. If we wanted to solve this problem too, the right resolution for this would be allow user to rename existing partitioning field.

@ebyhr ebyhr force-pushed the ebi/iceberg-column-name-conflicts branch 2 times, most recently from a9f00e5 to ff080bf Compare June 19, 2024 06:26
@ebyhr ebyhr changed the title Add support for alias in Iceberg partitioning Fix failure when hidden partition column name conflicted in Iceberg Jun 19, 2024
@ebyhr ebyhr force-pushed the ebi/iceberg-column-name-conflicts branch from ff080bf to 5e54bf8 Compare June 24, 2024 04:29
@ebyhr ebyhr force-pushed the ebi/iceberg-column-name-conflicts branch from 5e54bf8 to 6cd1cc9 Compare June 26, 2024 00:59
@ebyhr ebyhr requested review from raunaqmorarka and removed request for findepi July 3, 2024 04:09
@ebyhr ebyhr merged commit eb161e4 into trinodb:master Jul 4, 2024
49 checks passed
@ebyhr ebyhr deleted the ebi/iceberg-column-name-conflicts branch July 4, 2024 23:08
@github-actions github-actions bot added this to the 452 milestone Jul 4, 2024
return;
}
catch (IllegalArgumentException e) {
if (e.getMessage().contains("Cannot create partition from name that exists in schema")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you by chance open a ticket on iceberg to have a more structured way of identifying such a problem?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4 participants