Use the actual partition name from the storage in syncing partition metadata #22484

findinpath · 2024-06-24T09:53:15Z

Description

The canonical partition name, built from the partition's column names and values, may differ from the name used in the partition's actual storage location. This can lead to inconsistencies when syncing partition metadata, such as identifying a partition to add or remove when there is actually nothing to be done.

Rely on the partition name from the storage location for identifying the partitions to sync via the sync_partition_metadata procedure call.
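The mismatch described above can be illustrated with a small, self-contained sketch. It is not the code in `SyncPartitionMetadataProcedure`; the class, the helper names, and the sample partition `Part_Key=a` are hypothetical, and it assumes the directory on storage uses a different letter case than the column name recorded in the metastore.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration only; names and sample values are assumptions.
final class PartitionSyncSketch
{
    private PartitionSyncSketch() {}

    // Builds a "column=value" name from the column name as the metastore reports it.
    static String canonicalName(String column, String value)
    {
        return column + "=" + value;
    }

    // Returns the elements of 'left' that are missing from 'right'.
    static Set<String> difference(Set<String> left, Set<String> right)
    {
        Set<String> result = new LinkedHashSet<>(left);
        result.removeAll(right);
        return result;
    }

    public static void main(String[] args)
    {
        // Directory name found on storage, written with an upper-case column name.
        Set<String> onStorage = Set.of("Part_Key=a");

        // The same partition as registered in the metastore, named two ways.
        Set<String> canonicalNames = Set.of(canonicalName("part_key", "a"));
        Set<String> storageNames = Set.of("Part_Key=a");

        // Comparing against canonical names reports work that does not exist.
        System.out.println("to add (canonical): " + difference(onStorage, canonicalNames));
        System.out.println("to drop (canonical): " + difference(canonicalNames, onStorage));

        // Comparing against names taken from the storage location reports nothing.
        System.out.println("to add (storage): " + difference(onStorage, storageNames));
        System.out.println("to drop (storage): " + difference(storageNames, onStorage));
    }
}
```

With canonical names, the single partition shows up both as something to add and as something to drop; with names derived from the storage location, both differences are empty.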

Additional context and related issues

This fix applies specifically when the canonical partition name and the partition's storage location differ only in letter case.

See io.trino.plugin.hive.TestHive3OnDataLake#testSyncPartitionCaseSensitivePathVariation for details.

Follow-up from #21168

Release notes

(x) Release notes are required, with the following suggested text:

# Hive
* Fix `sync_partition_metadata` to handle case-sensitive variations of partition names on storage. ({issue}`issuenumber`)

pajaks · 2024-06-27T14:45:11Z

.../trino-hive/src/main/java/io/trino/plugin/hive/procedure/SyncPartitionMetadataProcedure.java

 .map(ImmutableSet::copyOf)
 .orElseThrow(() -> new TableNotFoundException(schemaTableName));
+ String tableLocationDirectory = table.getStorage().getLocation().endsWith("/") ? table.getStorage().getLocation() : table.getStorage().getLocation() + "/";


Can you use tableLocation here?
Also you could move this logic to getPartitionNameFromPartitionLocation as it's only used there.

Slightly modified the code.

@pajaks besides code cosmetics, do you have any concerns with regard to the bugfix?
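For context on the `tableLocationDirectory` line in the diff above: a minimal sketch of why a trailing slash matters if the table location is later used as a path prefix. The prefix check, the class, the method names, and the sample locations are assumptions made for illustration; the thread only states that the value is used in `getPartitionNameFromPartitionLocation`.

```java
// Hypothetical illustration only; not the code under review.
final class TableLocationSketch
{
    private TableLocationSketch() {}

    // Appends "/" only when it is missing, so the result always names a directory.
    static String withTrailingSlash(String location)
    {
        return location.endsWith("/") ? location : location + "/";
    }

    // Naive check: also matches sibling directories that share the same prefix.
    static boolean naiveIsUnder(String tableLocation, String partitionLocation)
    {
        return partitionLocation.startsWith(tableLocation);
    }

    // Safer check: the normalized prefix ends at a directory boundary.
    static boolean isUnder(String tableLocation, String partitionLocation)
    {
        return partitionLocation.startsWith(withTrailingSlash(tableLocation));
    }

    public static void main(String[] args)
    {
        String table = "s3://bucket/warehouse/orders";
        String sibling = "s3://bucket/warehouse/orders_archive/day=1";

        System.out.println(naiveIsUnder(table, sibling)); // sibling is wrongly accepted
        System.out.println(isUnder(table, sibling));      // sibling is rejected
        System.out.println(isUnder(table, "s3://bucket/warehouse/orders/day=1"));
    }
}
```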

.../trino-hive/src/main/java/io/trino/plugin/hive/procedure/SyncPartitionMetadataProcedure.java

…etadata

The canonical partition name built from the column name and values of the partition may differ in case from the actual storage location of the partition. This can lead to inconsistencies when syncing partition metadata, such as identifying a partition to add or remove when there is actually nothing to be done.

For partitions with conventional locations, rely on the partition name from the storage location to identify the partitions to sync via the `sync_partition_metadata` procedure call.

An example of how to add a partition with a non-conventional location through Hive:

```
ALTER TABLE ... ADD PARTITION (...) LOCATION '...'
```
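The distinction between conventional and non-conventional locations could be sketched roughly as follows. This is not the merged implementation: the class, the method name, and the rule used here (a location counts as conventional only when its path relative to the table directory equals the canonical name ignoring case) are assumptions chosen to match the commit message.

```java
// Hypothetical illustration only; the "conventional" rule below is an assumption.
final class PartitionNameSketch
{
    private PartitionNameSketch() {}

    // Picks the name used to compare a registered partition with the directories on storage.
    static String partitionNameForSync(String tableLocation, String partitionLocation, String canonicalName)
    {
        String tableDirectory = tableLocation.endsWith("/") ? tableLocation : tableLocation + "/";

        // Partition stored outside the table directory: keep the canonical name.
        if (!partitionLocation.startsWith(tableDirectory)) {
            return canonicalName;
        }

        // Path of the partition relative to the table directory, without trailing slashes.
        String relative = partitionLocation.substring(tableDirectory.length());
        while (relative.endsWith("/")) {
            relative = relative.substring(0, relative.length() - 1);
        }

        // Use the storage spelling only when it differs from the canonical name in case alone.
        return relative.equalsIgnoreCase(canonicalName) ? relative : canonicalName;
    }

    public static void main(String[] args)
    {
        // Conventional location with a case variation: the storage spelling is used.
        System.out.println(partitionNameForSync("/warehouse/t", "/warehouse/t/Col_X=a", "col_x=a"));
        // Location set explicitly via ADD PARTITION ... LOCATION: the canonical name is used.
        System.out.println(partitionNameForSync("/warehouse/t", "/elsewhere/custom_dir", "col_x=a"));
    }
}
```

Falling back to the canonical name keeps partitions with explicitly assigned locations out of the path-based comparison, which is one plausible reading of "in case of dealing with conventional partition locations" in the commit message.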

anusudarsan · 2024-07-02T14:24:40Z

...o-product-tests/src/main/java/io/trino/tests/product/hive/TestHdfsSyncPartitionMetadata.java

@@ -116,6 +119,45 @@ public void testSyncPartitionMetadataWithNullArgument()
 super.testSyncPartitionMetadataWithNullArgument();
 }

+ @Test(groups = SMOKE)
+ public void testAddNonConventionalHivePartition()


cla-bot bot added the cla-signed label Jun 24, 2024

github-actions bot added the hive Hive connector label Jun 24, 2024

findinpath requested review from findepi and ebyhr June 24, 2024 09:56

pajaks reviewed Jun 27, 2024

findinpath force-pushed the findinpath/sync_partition_metadata_bugfix branch from b99d0b8 to d55f9bf Compare July 1, 2024 17:14

anusudarsan reviewed Jul 1, 2024

electrum approved these changes Jul 1, 2024

findinpath force-pushed the findinpath/sync_partition_metadata_bugfix branch from d55f9bf to 0e5eab6 Compare July 1, 2024 20:03

findinpath requested review from anusudarsan and electrum July 1, 2024 20:04

ebyhr reviewed Jul 2, 2024

findinpath force-pushed the findinpath/sync_partition_metadata_bugfix branch 2 times, most recently from 6dcc49c to 0bf6b32 Compare July 2, 2024 13:53

findinpath added 2 commits July 2, 2024 16:06

Test syncing partition metadata on partition values with special characters (96a822d)

findinpath force-pushed the findinpath/sync_partition_metadata_bugfix branch from 0bf6b32 to 96a822d Compare July 2, 2024 14:06

anusudarsan approved these changes Jul 2, 2024

ebyhr approved these changes Jul 3, 2024

ebyhr merged commit fb6f9be into trinodb:master Jul 3, 2024
57 checks passed

github-actions bot added this to the 452 milestone Jul 3, 2024

colebow mentioned this pull request Jul 3, 2024

Add Trino 452 release notes #22573

Open
