[FLINK-7580] Automatically retry failed gateway retrievals #4643

tillrohrmann
Contributor

What is the purpose of the change

The LeaderGatewayRetriever implementations, AkkaJobManagerRetriever and RpcGatewayRetriever, now automatically retry the gateway retrieval operation a fixed number of times, waiting for a retry delay between attempts, before completing the gateway future with an exception.
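
To illustrate the retry-with-delay behaviour described above, here is a minimal sketch using only the JDK. It is not the code from this PR and not Flink's `FutureUtils`; the class name `RetrySketch`, the method `retryWithDelay`, and its parameters are hypothetical names chosen for the example. It assumes the retrieval operation is exposed as a `Supplier` of a `CompletableFuture`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper, for illustration only.
final class RetrySketch {

    /**
     * Runs the operation once and, if it fails, re-runs it after delayMillis,
     * up to `retries` additional times. The returned future completes with the
     * first successful value, or with the last failure once retries run out.
     */
    static <T> CompletableFuture<T> retryWithDelay(
            Supplier<CompletableFuture<T>> operation,
            int retries,
            long delayMillis,
            ScheduledExecutorService scheduler) {
        CompletableFuture<T> result = new CompletableFuture<>();
        attempt(operation, retries, delayMillis, scheduler, result);
        return result;
    }

    private static <T> void attempt(
            Supplier<CompletableFuture<T>> operation,
            int retriesLeft,
            long delayMillis,
            ScheduledExecutorService scheduler,
            CompletableFuture<T> result) {
        if (result.isDone()) {
            return; // the caller already completed or cancelled the result
        }
        CompletableFuture<T> current;
        try {
            current = operation.get();
        } catch (Throwable t) {
            // Treat a synchronous failure like a failed future.
            current = new CompletableFuture<>();
            current.completeExceptionally(t);
        }
        current.whenComplete((value, failure) -> {
            if (failure == null) {
                result.complete(value);
            } else if (retriesLeft > 0) {
                // Schedule the next attempt after the retry delay.
                scheduler.schedule(
                        () -> attempt(operation, retriesLeft - 1, delayMillis, scheduler, result),
                        delayMillis,
                        TimeUnit.MILLISECONDS);
            } else {
                // Retries exhausted: surface the last failure.
                result.completeExceptionally(failure);
            }
        });
    }
}
```

A retriever could wrap its gateway lookup in such a helper, so that callers only see an exceptionally completed gateway future after all attempts have failed.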

Verifying this change

This change is already covered by existing tests, such as FutureUtilsTest (retryWithDelay tests).

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

@tillrohrmann tillrohrmann force-pushed the retryingLeaderGatewayRetrieverImpl branch 2 times, most recently from 1c90ddc to 36df55a Compare September 5, 2017 14:04
Contributor

@zentol zentol left a comment

+1

Introduce a generalized GatewayRetriever replacing the JobManagerRetriever. The
GatewayRetriever fulfills the same purpose as the JobManagerRetriever with the
ability to retrieve the gateway for an arbitrary endpoint type.

This closes apache#4549.
Introduce RedirectHandler which can be extended to add redirection functionality to all
SimpleInboundChannelHandlers. This allows sharing the same functionality across the
StaticFileServerHandler and the RuntimeMonitorHandlerBase which could now be removed.
In the future, the AbstractRestHandler will also extend the RedirectHandler.

This closes apache#4551.
Add test case

Only log LeaderGatewayRetriever exception on Debug log level

Properly fail outdated gateway retrieval operations

This closes apache#4602.
@tillrohrmann
Contributor Author

Thanks for the review @zentol. Do I assume correctly that you also approved #4602 with your review?

Rebasing onto the latest master. If Travis gives the green light, I would like to merge this PR.

The LeaderGatewayRetriever implementations, AkkaJobManagerRetriever and the
RpcGatewayRetriever, now automatically retry the gateway retrieval operation
a fixed number of times with a retry delay before completing the gateway
future with an exception.

Retry AkkaJobManagerRetriever

Retry RpcGatewayRetriever

This closes apache#4643.
@tillrohrmann tillrohrmann force-pushed the retryingLeaderGatewayRetrieverImpl branch from 36df55a to 80cfebb Compare September 18, 2017 13:19
@zentol
Contributor

zentol commented Sep 18, 2017

@tillrohrmann No, I haven't looked at #4602 yet.
