[FLINK-21382][doc] Update documentation for standalone Flink on Kubernetes with standby JobManagers #15248
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community review your pull request.

Automated Checks
Last check on commit 3f30a89 (Wed Mar 17 03:20:18 UTC 2021): ✅ no warnings. Mention the bot in a comment to re-run the automated checks.

Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. The bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands
The @flinkbot bot supports the following commands:
3f30a89 to 138d7fb
cc @tillrohrmann, could you please have a look at this documentation change?
Thanks for creating this PR @wangyang0918. The changes look good to me. I had a few minor comments. +1 for merging this PR once they are resolved.
@@ -218,11 +218,19 @@ data:
Moreover, you have to start the JobManager and TaskManager pods with a service account which has the permissions to create, edit, delete ConfigMaps.
See [how to configure service accounts for pods](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/) for more information.

When the High-Availability is enabled, JobManager pods should be started with IP address instead of Kubernetes service as its `jobmanager.rpc.address`.
- When the High-Availability is enabled, JobManager pods should be started with IP address instead of Kubernetes service as its `jobmanager.rpc.address`.
+ When High-Availability is enabled, Flink will use its own HA-services for service discovery. Therefore, JobManager pods should be started with their IP address instead of a Kubernetes service as its `jobmanager.rpc.address`.
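The hunk above requires a service account that can create, edit, and delete ConfigMaps. A minimal sketch of what such a setup might look like follows. The names `flink-service-account`, `flink-configmap-editor`, and the `default` namespace are hypothetical and not taken from this PR, and the verb list is one plausible reading of "create, edit, delete" rather than the exact set Flink needs.

```yaml
# Sketch only: all names and the namespace below are hypothetical.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flink-service-account
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: flink-configmap-editor
  namespace: default
rules:
- apiGroups: [""]            # ConfigMaps live in the core API group
  resources: ["configmaps"]
  # Assumed mapping of "create, edit, delete" onto Kubernetes verbs
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: flink-configmap-editor-binding
  namespace: default
subjects:
- kind: ServiceAccount
  name: flink-service-account
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: flink-configmap-editor
```

The JobManager and TaskManager pod specs would then reference the account through `serviceAccountName`.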
#### Standby JobManagers

Usually, it is enough to only have one JobManager. Since Kubernetes will launch a new one once the current JobManager pod crashed exceptionally.
- Usually, it is enough to only have one JobManager. Since Kubernetes will launch a new one once the current JobManager pod crashed exceptionally.
+ Usually, it is enough to only start a single JobManager pod, because Kubernetes will restart it once the pod crashes.
#### Standby JobManagers

Usually, it is enough to only have one JobManager. Since Kubernetes will launch a new one once the current JobManager pod crashed exceptionally.
If you want to achieve faster recovery, configure the `replicas` in `jobmanager-session-deployment-ha.yaml` or `parallelism` in `jobmanager-application-ha.yaml` to a value greater than one to start standby JobManagers.
- If you want to achieve faster recovery, configure the `replicas` in `jobmanager-session-deployment-ha.yaml` or `parallelism` in `jobmanager-application-ha.yaml` to a value greater than one to start standby JobManagers.
+ If you want to achieve faster recovery, configure the `replicas` in `jobmanager-session-deployment-ha.yaml` or `parallelism` in `jobmanager-application-ha.yaml` to a value greater than `1` to start standby JobManagers.
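As an illustration of the sentence under review, a session-mode Deployment with one standby JobManager might look roughly like the sketch below. The labels, container name, and image tag are placeholders chosen for the example, not values confirmed by this PR. For application mode, the analogous knob would be `parallelism` in the `spec` of the Job in `jobmanager-application-ha.yaml`.

```yaml
# Sketch of a session-mode JobManager Deployment; labels, names and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-jobmanager
spec:
  replicas: 2                  # greater than 1: the extra pod acts as a standby JobManager
  selector:
    matchLabels:
      app: flink
      component: jobmanager
  template:
    metadata:
      labels:
        app: flink
        component: jobmanager
    spec:
      containers:
      - name: jobmanager
        image: flink:latest    # placeholder image tag
        args: ["jobmanager"]
```

With HA enabled, only one of the pods holds leadership at a time. The others wait, so recovery does not have to wait for Kubernetes to schedule and start a fresh pod.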
### Enabling Queryable State

You can access the queryable state of TaskManager if you create a `NodePort` service for it:
1. Run `kubectl create -f taskmanager-query-state-service.yaml` to create the `NodePort` service for the `taskmanager` pod. The example of `taskmanager-query-state-service.yaml` can be found in [appendix](#common-cluster-resource-definitions).
2. Run `kubectl get svc flink-taskmanager-query-state` to get the `<node-port>` of this service. Then you can create the [QueryableStateClient(<public-node-ip>, <node-port>]({{< ref "docs/dev/datastream/fault-tolerance/queryable_state" >}}#querying-state) to submit the state queries.
- 2. Run `kubectl get svc flink-taskmanager-query-state` to get the `<node-port>` of this service. Then you can create the [QueryableStateClient(<public-node-ip>, <node-port>]({{< ref "docs/dev/datastream/fault-tolerance/queryable_state" >}}#querying-state) to submit the state queries.
+ 2. Run `kubectl get svc flink-taskmanager-query-state` to get the `<node-port>` of this service. Then you can create the [QueryableStateClient(<public-node-ip>, <node-port>]({{< ref "docs/dev/datastream/fault-tolerance/queryable_state" >}}#querying-state) to submit state queries.
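For context on the steps quoted above, a `NodePort` service along the lines of `taskmanager-query-state-service.yaml` could be sketched as follows. Only the service name `flink-taskmanager-query-state` comes from the quoted text. The port numbers and selector labels are assumptions: port 6125 presumes the Flink configuration sets `queryable-state.proxy.ports` to 6125, and 30025 is an arbitrary value inside the default NodePort range.

```yaml
# Sketch only: ports and selector labels are assumptions, not taken from this PR.
apiVersion: v1
kind: Service
metadata:
  name: flink-taskmanager-query-state
spec:
  type: NodePort
  ports:
  - name: query-state
    port: 6125          # assumed queryable state proxy port
    targetPort: 6125
    nodePort: 30025     # the <node-port> reported by `kubectl get svc`
  selector:
    app: flink
    component: taskmanager
```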
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        args: ["jobmanager", "$(POD_IP)"]
Maybe add a comment that this will set the `POD_IP` as `jobmanager.rpc.address`.
Or better, that this overwrites the value configured in the configuration config map.
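To make the two review comments above concrete, here is a hedged sketch of how the surrounding container spec and the requested comment might read. The container name and image are placeholders, and the comment wording is one possible phrasing, not the text that was eventually merged.

```yaml
# Excerpt of a JobManager container spec; container name and image are placeholders.
containers:
- name: jobmanager
  image: flink:latest
  env:
  - name: POD_IP
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: status.podIP   # resolved by Kubernetes to this pod's IP
  # The second argument sets jobmanager.rpc.address to the pod IP,
  # overwriting the value configured in the configuration ConfigMap.
  args: ["jobmanager", "$(POD_IP)"]
```

Kubernetes expands `$(POD_IP)` in `args` from the container's environment, which is why the variable has to be declared under `env` in the same container.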
138d7fb to 8a5ae18
…netes with standby JobManagers
8a5ae18 to 291ea8b
@tillrohrmann Thanks for your review. Comments addressed.
Thanks for updating this PR @wangyang0918. LGTM. Merging this PR into `master` and `release-1.12`.
…rnetes with standby JobManagers This closes #15248.
This PR tries to update the documentation for standalone Flink on Kubernetes for HA with standby JobManagers.
Note: Even if we have only one JobManager, we should still use the pod IP instead of the Kubernetes service when HA is enabled. This is also the current behavior of the native Kubernetes integration.