Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add retry and jmx metrics while generating Thrift's delegation token #21000

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

osscm
Copy link
Contributor

@osscm osscm commented Mar 9, 2024

Description

Additional context and related issues

fixes : #20999

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`20999`)

@cla-bot cla-bot bot added the cla-signed label Mar 9, 2024
@github-actions github-actions bot added tests:hive hive Hive connector labels Mar 9, 2024
@osscm osscm changed the title Add retry and jmx means for the Thrift's delegation token generation Add retry and jmx metrics while generating Thrift's delegation token Mar 9, 2024
@osscm
Copy link
Contributor Author

osscm commented Mar 9, 2024

cc @electrum if you can please help with the review, thanks a lot!

@osscm osscm force-pushed the thrift-dt-retry branch 3 times, most recently from ba0ed22 to 7b7d7b8 Compare April 8, 2024 21:43
@osscm
Copy link
Contributor Author

osscm commented Apr 9, 2024

@electrum will cross check the pt failures.

.withMaxDuration(java.time.Duration.of(30, SECONDS))
.withMaxAttempts(maxRetries + 1)
.withBackoff(minBackoffDelay.toMillis(), maxBackoffDelay.toMillis(), MILLIS, backoffScaleFactor)
.abortOn(throwable -> Throwables.getCausalChain(throwable).stream().anyMatch(TrinoException.class::isInstance))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what errors the metastore throws for getDelegationToken(). The only declared exception is MetaException. If that is always a permanent error, then we should abort on that. Otherwise, we should retry everything. So this should be

.abortOn(MetaException.class)

or remove it if MetaException is thrown for transient errors

Copy link
Contributor Author

@osscm osscm Apr 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering, if we can abort on TException?

as getDT throws MetaException, TException,
and MetaException extends TException that extends Exception

public String getDelegationToken(String token_owner, String renewer_kerberos_principal_name) throws MetaException, TException {
            this.sendGetDelegationToken(token_owner, renewer_kerberos_principal_name);
            return this.recvGetDelegationToken();
        }

so, .abortOn(TException.class)

like it is done above, make sense?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@electrum thanks for the review and help!

Apart from this, I have tried to take care of all the other comments, if you can please check once.

Please see if .abortOn(TException.class) make sense or not as TException is also thrown.
Or may be I am missing something?

thanks

Copy link

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label May 10, 2024
@mosabua
Copy link
Member

mosabua commented May 13, 2024

@osscm .. can you address the final remarks from @electrum .. I think this is ready to merge afterwards since he approved. Any concerns @findinpath ?

@github-actions github-actions bot removed the stale label May 13, 2024
@osscm
Copy link
Contributor Author

osscm commented May 15, 2024 via email

@osscm osscm force-pushed the thrift-dt-retry branch 2 times, most recently from 35f8695 to 7024c36 Compare May 27, 2024 23:30
Copy link

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Jun 18, 2024
Copy link

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

@github-actions github-actions bot closed this Jul 10, 2024
@mosabua mosabua reopened this Jul 10, 2024
@mosabua
Copy link
Member

mosabua commented Jul 10, 2024

@electrum can you look .. seems like you approved already. Could you merge? I will see what the CI build does ...

@mosabua mosabua added stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed. and removed stale labels Jul 10, 2024
@@ -58,6 +66,12 @@ public TokenFetchingMetastoreClientFactory(
.maximumSize(thriftConfig.getDelegationTokenCacheMaximumSize()),
CacheLoader.from(this::loadDelegationToken));
this.refreshPeriod = Duration.ofMinutes(1).toNanos();
retryPolicy = RetryPolicy.builder()
.withMaxDuration(java.time.Duration.of(30, SECONDS))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use maxRetryTime from ThriftHiveMetastore

.withMaxDuration(java.time.Duration.of(30, SECONDS))
.withMaxAttempts(thriftConfig.getMaxRetries() + 1)
.withBackoff(thriftConfig.getMinBackoffDelay().toMillis(), thriftConfig.getMaxBackoffDelay().toMillis(), MILLIS, thriftConfig.getBackoffScaleFactor())
.abortOn(throwable -> Throwables.getCausalChain(throwable).stream().anyMatch(TrinoException.class::isInstance))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we figure which exceptions are permanent or temporary?
Is MetaException permanent?
Can TException be temporary?

Would getDelegationToken throw TrinoExceptipon at all (I don't think it does)?

}
catch (TException e) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just

catch (TException e) {
  throw new TrinoException(HIVE_METASTORE_ERROR, e);
}

?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed hive Hive connector stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed.
Development

Successfully merging this pull request may close these issues.

Add retry and jmx metrics while generating Thrift's delegation token
6 participants