From 59ebdb2746cba1e90fb7ef3136b48f9b06b7295d Mon Sep 17 00:00:00 2001
From: Piotr Nowojski
Date: Wed, 14 Nov 2018 13:57:44 +0100
Subject: [PATCH] [FLINK-10874][kafka-docs] Document likely cause of
 UnknownTopicOrPartitionException

---
 docs/dev/connectors/kafka.md | 62 ++++++++++++++++++++----------------
 1 file changed, 34 insertions(+), 28 deletions(-)

diff --git a/docs/dev/connectors/kafka.md b/docs/dev/connectors/kafka.md
index 0630c6ec7d6c7..351a4dc2d4126 100644
--- a/docs/dev/connectors/kafka.md
+++ b/docs/dev/connectors/kafka.md
@@ -660,19 +660,6 @@ we recommend setting the number of retries to a higher value.
 **Note**: There is currently no transactional producer for Kafka, so Flink can not
 guarantee exactly-once delivery into a Kafka topic.
 
-<div class="alert alert-warning">
-  <strong>Attention:</strong> Depending on your Kafka configuration, even after Kafka acknowledges
-  writes you can still experience data loss. In particular keep in mind the following Kafka settings:
-  <ul>
-    <li><tt>acks</tt></li>
-    <li><tt>log.flush.interval.messages</tt></li>
-    <li><tt>log.flush.interval.ms</tt></li>
-    <li><tt>log.flush.*</tt></li>
-  </ul>
-  Default values for the above options can easily lead to data loss. Please refer to Kafka documentation
-  for more explanation.
-</div>
-
 #### Kafka 0.11 and newer
 
 With Flink's checkpointing enabled, the `FlinkKafkaProducer011` (`FlinkKafkaProducer` for Kafka >= 1.0.0 versions) can provide
@@ -690,21 +677,6 @@ chosen by passing appropriate `semantic` parameter to the `FlinkKafkaProducer011
 or `read_uncommitted` - the latter one is the default value) for any application consuming
 records from Kafka.
 
-
-<div class="alert alert-warning">
-  <strong>Attention:</strong> Depending on your Kafka configuration, even after Kafka acknowledges
-  writes you can still experience data losses. In particular keep in mind about following properties
-  in Kafka config:
-  <ul>
-    <li><tt>acks</tt></li>
-    <li><tt>log.flush.interval.messages</tt></li>
-    <li><tt>log.flush.interval.ms</tt></li>
-    <li><tt>log.flush.*</tt></li>
-  </ul>
-  Default values for the above options can easily lead to data loss. Please refer to the Kafka documentation
-  for more explanation.
-</div>
-
 ##### Caveats
 
 `Semantic.EXACTLY_ONCE` mode relies on the ability to commit transactions
@@ -831,4 +803,38 @@ A mismatch in service name between client and server configuration will cause the
 For more information on Flink configuration for Kerberos security, please see [here]({{ site.baseurl}}/ops/config.html).
 You can also find [here]({{ site.baseurl}}/ops/security-kerberos.html) further details on how Flink internally setups Kerberos-based security.
 
+## Troubleshooting
+
+<div class="alert alert-warning">
+If you have a problem with Kafka when using Flink, keep in mind that Flink only wraps the
+<tt>KafkaConsumer</tt> or <tt>KafkaProducer</tt> client. Your problem might therefore be
+independent of Flink, and can sometimes be solved by upgrading the Kafka brokers, by
+reconfiguring the Kafka brokers, or by reconfiguring the <tt>KafkaConsumer</tt> or
+<tt>KafkaProducer</tt> used by Flink. Some examples of common problems are listed below.
+</div>
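+
+Before diving into specific problems: both connectors accept a standard `java.util.Properties`
+object whose entries are forwarded to the wrapped Kafka client, so most client settings can be
+adjusted without any Flink-specific API. As a minimal sketch (the topic name, addresses and the
+Kafka 0.11 connector classes below are placeholders, adjust them to your setup):
+
+{% highlight java %}
+Properties properties = new Properties();
+properties.setProperty("bootstrap.servers", "localhost:9092");
+properties.setProperty("group.id", "test");
+// further entries are handed to the wrapped KafkaConsumer as-is,
+// e.g. to raise its request timeout:
+properties.setProperty("request.timeout.ms", "60000");
+
+FlinkKafkaConsumer011<String> consumer =
+    new FlinkKafkaConsumer011<>("topic", new SimpleStringSchema(), properties);
+{% endhighlight %}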
+
+### Data loss
+
+Depending on your Kafka configuration, you can still experience data loss even after Kafka has
+acknowledged writes. In particular, keep in mind the following properties in the Kafka
+configuration (see the first producer sketch below):
+
+- `acks`
+- `log.flush.interval.messages`
+- `log.flush.interval.ms`
+- `log.flush.*`
+
+Note that `acks` is a producer-side setting, while the `log.flush.*` options are broker-side
+settings. The default values of the above options can easily lead to data loss.
+Please refer to the Kafka documentation for more explanation.
+
+### UnknownTopicOrPartitionException
+
+One possible cause of this error is an ongoing leader election,
+for example while a Kafka broker is being restarted.
+This is a retriable exception, so the Flink job should be able to restart and resume normal operation.
+It can also be circumvented by increasing the `retries` property in the producer settings
+(see the second producer sketch below).
+However, this might cause reordering of messages, which in turn, if undesired,
+can be avoided by setting `max.in.flight.requests.per.connection` to 1.
+
 {% top %}
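+
+The following is a minimal sketch of the producer-side part of the data loss discussion above;
+the topic name, addresses and the Kafka 0.11 connector class are placeholders:
+
+{% highlight java %}
+Properties producerProperties = new Properties();
+producerProperties.setProperty("bootstrap.servers", "localhost:9092");
+// wait for acknowledgement from all in-sync replicas instead of only the leader;
+// the log.flush.* options are broker-side settings and belong in server.properties
+producerProperties.setProperty("acks", "all");
+
+FlinkKafkaProducer011<String> producer =
+    new FlinkKafkaProducer011<>("topic", new SimpleStringSchema(), producerProperties);
+{% endhighlight %}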
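+
+Along the same lines, a sketch of the retry trade-off described under
+`UnknownTopicOrPartitionException`; the values are illustrative, not recommendations:
+
+{% highlight java %}
+Properties producerProperties = new Properties();
+producerProperties.setProperty("bootstrap.servers", "localhost:9092");
+// retry transient failures such as UnknownTopicOrPartitionException internally
+// instead of failing the whole Flink job
+producerProperties.setProperty("retries", "5");
+// retries may reorder batches; allow only one in-flight request per connection
+// if the original ordering of messages must be preserved
+producerProperties.setProperty("max.in.flight.requests.per.connection", "1");
+
+FlinkKafkaProducer011<String> producer =
+    new FlinkKafkaProducer011<>("topic", new SimpleStringSchema(), producerProperties);
+{% endhighlight %}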