
[FLINK-21382][doc] Update documentation for standalone Flink on Kubernetes with standby JobManagers #15248

Closed

Conversation

wangyang0918
Contributor

This PR updates the documentation for standalone Flink on Kubernetes to cover HA with standby JobManagers.

Note: Even if we have just one JobManager, we should still use the pod IP instead of the Kubernetes service when HA is enabled. This is also the current behavior of the native Kubernetes integration.
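
For context, enabling Kubernetes HA for a standalone cluster comes down to a few options in `flink-conf.yaml`. A minimal sketch, assuming the HA services factory class from the Flink docs of this release line; the storage path and cluster id are placeholders:

```yaml
# Minimal flink-conf.yaml fragment for Kubernetes HA (sketch; path and id are placeholders)
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
high-availability.storageDir: s3://flink-bucket/recovery   # durable storage for HA metadata
kubernetes.cluster-id: my-standalone-cluster               # must be unique per cluster
```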

@flinkbot
Collaborator

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 3f30a89 (Wed Mar 17 03:20:18 UTC 2021)

✅ no warnings

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@flinkbot
Collaborator

flinkbot commented Mar 17, 2021

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

@wangyang0918
Contributor Author

cc @tillrohrmann could you please have a look at this documentation change?

Contributor

@tillrohrmann left a comment


Thanks for creating this PR @wangyang0918. The changes look good to me. I had a few minor comments. +1 for merging this PR once they are resolved.

@@ -218,11 +218,19 @@ data:
Moreover, you have to start the JobManager and TaskManager pods with a service account which has the permissions to create, edit, delete ConfigMaps.
See [how to configure service accounts for pods](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/) for more information.

When the High-Availability is enabled, JobManager pods should be started with IP address instead of Kubernetes service as its `jobmanager.rpc.address`.
Contributor


Suggested change
When the High-Availability is enabled, JobManager pods should be started with IP address instead of Kubernetes service as its `jobmanager.rpc.address`.
When High-Availability is enabled, Flink will use its own HA-services for service discovery. Therefore, JobManager pods should be started with their IP address instead of a Kubernetes service as its `jobmanager.rpc.address`.
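
The diff above also notes that the JobManager and TaskManager pods must run with a service account that can create, edit, and delete ConfigMaps. A sketch of what such an RBAC setup could look like; the role, binding, and service account names are made up for illustration and are not from this PR:

```yaml
# Illustrative Role granting the ConfigMap permissions the Kubernetes HA services need
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: flink-ha-configmaps
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["create", "get", "list", "watch", "update", "patch", "delete"]
---
# Bind the Role to the service account used by the JobManager and TaskManager pods
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: flink-ha-configmaps-binding
subjects:
  - kind: ServiceAccount
    name: flink-service-account   # placeholder name
roleRef:
  kind: Role
  name: flink-ha-configmaps
  apiGroup: rbac.authorization.k8s.io
```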


#### Standby JobManagers

Usually, it is enough to only have one JobManager. Since Kubernetes will launch a new one once the current JobManager pod crashed exceptionally.
Contributor


Suggested change
Usually, it is enough to only have one JobManager. Since Kubernetes will launch a new one once the current JobManager pod crashed exceptionally.
Usually, it is enough to only start a single JobManager pod, because Kubernetes will restart it once the pod crashes.

#### Standby JobManagers

Usually, it is enough to only have one JobManager. Since Kubernetes will launch a new one once the current JobManager pod crashed exceptionally.
If you want to achieve faster recovery, configure the `replicas` in `jobmanager-session-deployment-ha.yaml` or `parallelism` in `jobmanager-application-ha.yaml` to a value greater than one to start standby JobManagers.
Contributor


Suggested change
If you want to achieve faster recovery, configure the `replicas` in `jobmanager-session-deployment-ha.yaml` or `parallelism` in `jobmanager-application-ha.yaml` to a value greater than one to start standby JobManagers.
If you want to achieve faster recovery, configure the `replicas` in `jobmanager-session-deployment-ha.yaml` or `parallelism` in `jobmanager-application-ha.yaml` to a value greater than `1` to start standby JobManagers.
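
As a concrete illustration of the suggestion above, the relevant part of `jobmanager-session-deployment-ha.yaml` could look roughly like this; only the fields relevant to standby JobManagers are shown, and the labels are illustrative rather than taken from the appendix:

```yaml
# Fragment of a JobManager Deployment (sketch): replicas > 1 starts standby JobManagers
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-jobmanager
spec:
  replicas: 2   # one leader plus one standby JobManager
  selector:
    matchLabels:
      app: flink
      component: jobmanager
  template:
    metadata:
      labels:
        app: flink
        component: jobmanager
    # container spec omitted; see the appendix manifests
```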

### Enabling Queryable State

You can access the queryable state of TaskManager if you create a `NodePort` service for it:
1. Run `kubectl create -f taskmanager-query-state-service.yaml` to create the `NodePort` service for the `taskmanager` pod. The example of `taskmanager-query-state-service.yaml` can be found in [appendix](#common-cluster-resource-definitions).
2. Run `kubectl get svc flink-taskmanager-query-state` to get the `&lt;node-port&gt;` of this service. Then you can create the [QueryableStateClient(&lt;public-node-ip&gt;, &lt;node-port&gt;]({{< ref "docs/dev/datastream/fault-tolerance/queryable_state" >}}#querying-state) to submit the state queries.
2. Run `kubectl get svc flink-taskmanager-query-state` to get the `<node-port>` of this service. Then you can create the [QueryableStateClient(&lt;public-node-ip&gt;, &lt;node-port&gt;]({{< ref "docs/dev/datastream/fault-tolerance/queryable_state" >}}#querying-state) to submit the state queries.
Contributor


Suggested change
2. Run `kubectl get svc flink-taskmanager-query-state` to get the `<node-port>` of this service. Then you can create the [QueryableStateClient(&lt;public-node-ip&gt;, &lt;node-port&gt;]({{< ref "docs/dev/datastream/fault-tolerance/queryable_state" >}}#querying-state) to submit the state queries.
2. Run `kubectl get svc flink-taskmanager-query-state` to get the `<node-port>` of this service. Then you can create the [QueryableStateClient(&lt;public-node-ip&gt;, &lt;node-port&gt;]({{< ref "docs/dev/datastream/fault-tolerance/queryable_state" >}}#querying-state) to submit state queries.
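
For reference, a `NodePort` service along the lines of the `taskmanager-query-state-service.yaml` mentioned above might look like the sketch below; the port number (Flink's default queryable state proxy port) and the selector labels are assumptions, the actual manifest is in the docs appendix:

```yaml
# Sketch of a NodePort service exposing the TaskManager queryable state proxy
apiVersion: v1
kind: Service
metadata:
  name: flink-taskmanager-query-state
spec:
  type: NodePort
  ports:
    - name: query-state
      port: 6125          # assumed default queryable state proxy port
      targetPort: 6125
  selector:
    app: flink
    component: taskmanager
```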

fieldRef:
apiVersion: v1
fieldPath: status.podIP
args: ["jobmanager", "$(POD_IP)"]
Contributor


Maybe add a comment that this will set the POD_IP as jobmanager.rpc.address.

Contributor


Or better, that this overwrites the value configured in the configuration config map.
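
For illustration, the snippet with the comment the review asks for could read roughly as follows; the exact wording merged into the docs may differ:

```yaml
env:
  - name: POD_IP
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: status.podIP
# The args below overwrite the jobmanager.rpc.address configured in the configuration ConfigMap with the pod IP.
args: ["jobmanager", "$(POD_IP)"]
```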

@wangyang0918
Contributor Author

@tillrohrmann Thanks for your review. Comments addressed.

Contributor

@tillrohrmann left a comment


Thanks for updating this PR @wangyang0918. LGTM. Merging this PR into master and release-1.12.

tillrohrmann pushed a commit that referenced this pull request Mar 23, 2021