Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use the actual partition name from the storage in syncing partition metadata #22484

Merged

Conversation

findinpath
Copy link
Contributor

@findinpath findinpath commented Jun 24, 2024

Description

The canonical partition name built from the column name and values of the partition may differ from the actual storage location of the partition. This can lead to inconsistencies when syncing partition metadata, like identifying a partition to add/remove when there is actually nothing to be done.

Rely on the partition name from the storage location for identifying the partitions to sync via the sync_partition_metadata procedure call.

Additional context and related issues

This fix applies specifically where there is a partition name case sensitive variation between the canonical name and the storage location of the partition.

See io.trino.plugin.hive.TestHive3OnDataLake#testSyncPartitionCaseSensitivePathVariation for details.

Follow-up from #21168

Release notes

(x) Release notes are required, with the following suggested text:

# Hive
* Fix `sync_partition_metadata` to cope with case sensitive variations of the partition names on the storage. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Jun 24, 2024
@github-actions github-actions bot added the hive Hive connector label Jun 24, 2024
@findinpath findinpath requested review from findepi and ebyhr June 24, 2024 09:56
.map(ImmutableSet::copyOf)
.orElseThrow(() -> new TableNotFoundException(schemaTableName));
String tableLocationDirectory = table.getStorage().getLocation().endsWith("/") ? table.getStorage().getLocation() : table.getStorage().getLocation() + "/";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use tableLocation here?
Also you could move this logic to getPartitionNameFromPartitionLocation as it's only used there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly modified the code.

@pajaks besides code cosmetics, do you have any concerns with regards to the bugfix?

@findinpath findinpath force-pushed the findinpath/sync_partition_metadata_bugfix branch from b99d0b8 to d55f9bf Compare July 1, 2024 17:14
@findinpath findinpath force-pushed the findinpath/sync_partition_metadata_bugfix branch from d55f9bf to 0e5eab6 Compare July 1, 2024 20:03
@findinpath findinpath force-pushed the findinpath/sync_partition_metadata_bugfix branch 2 times, most recently from 6dcc49c to 0bf6b32 Compare July 2, 2024 13:53
…etadata

The canonical partition name built from the column name and
values of the partition may differ in case from the actual storage
location of the partition.
This can lead to inconsistencies when syncing partition
metadata, like identifying a partition to add/remove when
there is actually nothing to be done.

In case of dealing with conventional partition locations,
rely on the partition name from the storage location for
identifying the partitions to sync via
the `sync_partition_metadata` procedure call.

An example of how to add through Hive a partition with
non-conventional location:

```
ALTER TABLE ... ADD PARTITION (...) LOCATION '...'
```
@findinpath findinpath force-pushed the findinpath/sync_partition_metadata_bugfix branch from 0bf6b32 to 96a822d Compare July 2, 2024 14:06
@@ -116,6 +119,45 @@ public void testSyncPartitionMetadataWithNullArgument()
super.testSyncPartitionMetadataWithNullArgument();
}

@Test(groups = SMOKE)
public void testAddNonConventionalHivePartition()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@ebyhr ebyhr merged commit fb6f9be into trinodb:master Jul 3, 2024
57 checks passed
@github-actions github-actions bot added this to the 452 milestone Jul 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed hive Hive connector
Development

Successfully merging this pull request may close these issues.

None yet

5 participants