Glue v2 #20657

Merged
merged 6 commits into from
May 10, 2024

dain
Member

@dain dain commented Feb 11, 2024

Description

New metastore implementation based on the Glue v2 APIs.
New caching system built directly into Glue, which supports caching of the full objects returned during listing operations.
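
To illustrate the idea of caching the full objects returned by a listing call, here is a minimal sketch. It is not the code from this PR: the `ListingCacheSketch` class, the `Table` record, and the `lister` function are hypothetical names made up for this example, and the real cache may differ in eviction, invalidation, and structure. The sketch only shows that a single listing call can populate per-object entries so later lookups need no further remote call.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch, not the cache implemented in this PR.
class ListingCacheSketch {
    // Hypothetical stand-in for a full table object returned by a listing call.
    record Table(String database, String name, String location) {}

    // Assumed remote call: database name -> full table objects.
    private final Function<String, List<Table>> lister;
    private final Map<String, List<String>> listings = new ConcurrentHashMap<>();
    private final Map<String, Table> tables = new ConcurrentHashMap<>();

    ListingCacheSketch(Function<String, List<Table>> lister) {
        this.lister = lister;
    }

    // Lists table names; the first call per database also caches every full object.
    List<String> listTableNames(String database) {
        return listings.computeIfAbsent(database, db -> {
            List<Table> full = lister.apply(db);
            full.forEach(table -> tables.put(key(db, table.name()), table));
            return full.stream().map(Table::name).toList();
        });
    }

    // Returns a table only if an earlier listing already cached it.
    Optional<Table> getCachedTable(String database, String name) {
        return Optional.ofNullable(tables.get(key(database, name)));
    }

    private static String key(String database, String name) {
        return database + "." + name;
    }
}
```

In this sketch a second `listTableNames` call for the same database, and any `getCachedTable` call for a listed table, is served from memory.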

Release notes

(X) Release notes are required, with the following suggested text:

# Hive/Delta (not Iceberg)
* Update Glue to V2 REST interface. The old implementation can be restored by setting `hive.metastore=glue-v1`, but this will eventually be removed. ({issue}`20657`)
* Add custom cache for Glue metadata.

@cla-bot cla-bot bot added the cla-signed label Feb 11, 2024
@github-actions github-actions bot added tests:hive iceberg Iceberg connector delta-lake Delta Lake connector hive Hive connector labels Feb 11, 2024
@dain dain force-pushed the glue-v2 branch 3 times, most recently from 898ec70 to db30322 Compare February 12, 2024 05:50
@findinpath
Contributor

FYI there is also another PR tackling the AWS Glue library update #17866

@dain
Member Author

dain commented Feb 12, 2024

@findinpath this is closer to a full rewrite than an upgrade, so this would invalidate that other PR

github-actions bot commented Mar 5, 2024

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Mar 5, 2024
@dain dain force-pushed the glue-v2 branch 3 times, most recently from 3689143 to 7a088a5 Compare March 17, 2024 20:15
@github-actions github-actions bot removed the stale label Mar 18, 2024
@dain dain changed the title WIP: Glue v2 (DO NOT REVIEW) Glue v2 Mar 18, 2024
@dain dain requested a review from electrum March 18, 2024 22:04
@dain dain force-pushed the glue-v2 branch 2 times, most recently from 8240afc to 4c7100c Compare March 21, 2024 02:32
@dain dain merged commit 94dedf5 into master May 10, 2024
104 of 107 checks passed
@dain dain deleted the glue-v2 branch May 10, 2024 05:47
@github-actions github-actions bot added this to the 448 milestone May 10, 2024
@@ -199,7 +199,7 @@ public GlueHiveMetastore(
 config.getPartitionSegments(),
 config.isAssumeCanonicalPartitionKeys(),
 visibleTableKinds,
-taskWrapping(newThreadPerTaskExecutor(Thread.ofVirtual().name("glue-", 0L).factory())));
+newFixedThreadPool(config.getThreads(), Thread.ofPlatform().name("glue-", 0L).factory()));
Member

Switch Glue metastore to platform threads

why this particular change?

(i like it because it makes it possible to detect thread leaks. the API used by #21470 doesn't report virtual threads currently, but i assume this is not the reason)
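
For readers comparing the two executor styles in the hunk above, here is a minimal standalone sketch, assuming Java 21 or later. The `GlueExecutorSketch` class and its method names are hypothetical, the pool size of 4 stands in for `config.getThreads()`, and Trino's `taskWrapping` helper is left out. It only shows that the first executor runs each task on a fresh virtual thread while the second runs tasks on a bounded pool of named platform threads.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch; not the Trino code itself.
class GlueExecutorSketch {
    // Old style: a new virtual thread per submitted task, with no upper bound.
    static ExecutorService virtualPerTask() {
        return Executors.newThreadPerTaskExecutor(Thread.ofVirtual().name("glue-", 0L).factory());
    }

    // New style: a fixed number of platform threads named glue-0, glue-1, ...
    static ExecutorService fixedPlatformPool(int threads) {
        return Executors.newFixedThreadPool(threads, Thread.ofPlatform().name("glue-", 0L).factory());
    }

    // Runs one task and reports whether its worker thread was virtual.
    static boolean runsOnVirtualThread(ExecutorService executor) {
        Callable<Boolean> task = () -> Thread.currentThread().isVirtual();
        try {
            return executor.submit(task).get();
        }
        catch (Exception e) {
            throw new IllegalStateException(e);
        }
        finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("virtual executor: " + runsOnVirtualThread(virtualPerTask()));
        System.out.println("platform pool: " + runsOnVirtualThread(fixedPlatformPool(4)));
    }
}
```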

@sudohainguyen
Contributor

sudohainguyen commented May 24, 2024

JMX metrics io.trino.plugin.hive.metastore.glue.GlueHiveMetastore and io.trino.plugin.hive.metastore.cache are gone since I upgraded to 448,
possibly caused by this change, I think

@findepi
Member

findepi commented May 24, 2024

@sudohainguyen GlueHiveMetastore isn't used when Glue v2 is used.
io.trino.plugin.hive.metastore.glue.InMemoryGlueCache is used instead and the code looks like it exports some metrics. please verify

@sudohainguyen
Contributor

sudohainguyen commented May 24, 2024

hmm I didn't see the classname you mentioned when exploring in jconsole
this is all I got
[image]

@findepi
Member

findepi commented May 24, 2024

maybe it's not exported because it's not public? i don't know how this works. you will need to experiment a bit

@sudohainguyen
Contributor

yeah I tried 😅 but not able to get metrics from InMemoryGlueCache
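
One way to check which MBeans a JVM actually exports, as an alternative to browsing in jconsole, is to query the MBean server by domain. The sketch below is an assumption-laden illustration: the `MBeanListingSketch` class and `mbeanNames` method are hypothetical, and the domain string is taken from the class names mentioned in this thread, not from verified Trino output. Run locally it only sees the current JVM; inspecting a running Trino server would need a remote `MBeanServerConnection` instead.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper for listing registered MBeans by JMX domain.
class MBeanListingSketch {
    // Returns the canonical names of all MBeans registered under the given domain.
    static Set<String> mbeanNames(MBeanServerConnection server, String domain) {
        try {
            Set<String> names = new TreeSet<>();
            for (ObjectName name : server.queryNames(new ObjectName(domain + ":*"), null)) {
                names.add(name.getCanonicalName());
            }
            return names;
        }
        catch (IOException | MalformedObjectNameException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed domain, based on the package name discussed above.
        String domain = "io.trino.plugin.hive.metastore.glue";
        mbeanNames(ManagementFactory.getPlatformMBeanServer(), domain)
                .forEach(System.out::println);
    }
}
```

An empty result for a domain means nothing is registered under it in the queried JVM, which would be consistent with the metrics not being exported.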

"hive.metastore.glue.read-statistics-threads",
"hive.metastore.glue.write-statistics-threads",
"hive.metastore.glue.proxy-api-id",
"hive.metastore.glue.aws-credentials-provider",
Contributor

@rohanag12 rohanag12 Jun 3, 2024

This PR removed the ability to specify a custom AWS credentials provider for Glue v2. Is that intentional @dain @findepi? Custom provider is useful for working around some corner cases, like #15267, which still applies to AWS SDK v2. Can we get a replacement for this similar to what's being done for S3 in #22162 and #22163?

Member

sounds good to me

Contributor

Opened #22425 to address this.

Labels
cla-signed delta-lake Delta Lake connector hive Hive connector iceberg Iceberg connector

6 participants