
Commit

[FLINK-17340][docs] Update docs related to default planner changing.

This closes apache#12429
KurtYoung committed Jun 2, 2020
1 parent 03b82f9 commit 1c78ab3
Showing 8 changed files with 60 additions and 71 deletions.
4 changes: 2 additions & 2 deletions docs/dev/table/catalogs.md
@@ -63,7 +63,7 @@ Set a `JdbcCatalog` with the following parameters:
<div data-lang="Java" markdown="1">
{% highlight java %}

EnvironmentSettings settings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
EnvironmentSettings settings = EnvironmentSettings.newInstance().inStreamingMode().build();
TableEnvironment tableEnv = TableEnvironment.create(settings);

String name = "mypg";
@@ -82,7 +82,7 @@ tableEnv.useCatalog("mypg");
<div data-lang="Scala" markdown="1">
{% highlight scala %}

val settings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build()
val settings = EnvironmentSettings.newInstance().inStreamingMode().build()
val tableEnv = TableEnvironment.create(settings)

val name = "mypg"
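For orientation, a complete registration flow with the new default settings might look like the sketch below. This is an illustrative assembly of the snippets above, not part of the diff: the connection values (`mydb`, `postgres`, `secret`, the `baseUrl`) are placeholders, and the five-argument `JdbcCatalog` constructor and import path are assumptions based on Flink 1.11.

{% highlight java %}
import org.apache.flink.connector.jdbc.catalog.JdbcCatalog;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

// The Blink planner in streaming mode is the default as of Flink 1.11,
// so useBlinkPlanner() is no longer required.
EnvironmentSettings settings =
    EnvironmentSettings.newInstance().inStreamingMode().build();
TableEnvironment tableEnv = TableEnvironment.create(settings);

// Hypothetical connection parameters.
String name = "mypg";
String defaultDatabase = "mydb";
String username = "postgres";
String password = "secret";
String baseUrl = "jdbc:postgresql://localhost:5432/";

JdbcCatalog catalog =
    new JdbcCatalog(name, defaultDatabase, username, password, baseUrl);
tableEnv.registerCatalog(name, catalog);
tableEnv.useCatalog(name);
{% endhighlight %}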
4 changes: 2 additions & 2 deletions docs/dev/table/catalogs.zh.md
@@ -59,7 +59,7 @@ Set a `JdbcCatalog` with the following parameters:
<div data-lang="Java" markdown="1">
{% highlight java %}

EnvironmentSettings settings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();
EnvironmentSettings settings = EnvironmentSettings.newInstance().inStreamingMode().build();
TableEnvironment tableEnv = TableEnvironment.create(settings);

String name = "mypg";
@@ -78,7 +78,7 @@ tableEnv.useCatalog("mypg");
<div data-lang="Scala" markdown="1">
{% highlight scala %}

val settings = EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build()
val settings = EnvironmentSettings.newInstance().inStreamingMode().build()
val tableEnv = TableEnvironment.create(settings)

val name = "mypg"
52 changes: 23 additions & 29 deletions docs/dev/table/common.md
@@ -32,12 +32,11 @@ Main Differences Between the Two Planners

1. Blink treats batch jobs as a special case of streaming. As such, the conversion between `Table` and `DataSet` is not supported either, and batch jobs are not translated into `DataSet` programs but into `DataStream` programs, the same as streaming jobs.
2. The Blink planner does not support `BatchTableSource`; use a bounded `StreamTableSource` instead.
3. The Blink planner only supports the brand-new `Catalog` and does not support `ExternalCatalog`, which is deprecated.
4. The implementations of `FilterableTableSource` for the old planner and the Blink planner are incompatible. The old planner pushes `PlannerExpression`s down into `FilterableTableSource`, while the Blink planner pushes down `Expression`s.
5. String-based key-value config options (see the documentation on [Configuration]({{ site.baseurl }}/dev/table/config.html) for details) are used only by the Blink planner.
6. The implementations (`CalciteConfig`) of `PlannerConfig` differ between the two planners.
7. The Blink planner optimizes multiple sinks into one DAG (supported only on `TableEnvironment`, not on `StreamTableEnvironment`). The old planner always optimizes each sink into a new DAG, where all DAGs are independent of each other.
8. The old planner does not currently support catalog statistics, while the Blink planner does.
3. The implementations of `FilterableTableSource` for the old planner and the Blink planner are incompatible. The old planner pushes `PlannerExpression`s down into `FilterableTableSource`, while the Blink planner pushes down `Expression`s.
4. String-based key-value config options (see the documentation on [Configuration]({{ site.baseurl }}/dev/table/config.html) for details) are used only by the Blink planner, as the sketch after this list illustrates.
5. The implementations (`CalciteConfig`) of `PlannerConfig` differ between the two planners.
6. The Blink planner optimizes multiple sinks into one DAG (supported only on `TableEnvironment`, not on `StreamTableEnvironment`). The old planner always optimizes each sink into a new DAG, where all DAGs are independent of each other.
7. The old planner does not currently support catalog statistics, while the Blink planner does.
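As a sketch of point 4, string-based options are set through the planner's `Configuration`. The option key below is one illustrative example and should be checked against the Configuration page:

{% highlight java %}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

EnvironmentSettings settings =
    EnvironmentSettings.newInstance().inStreamingMode().build();
TableEnvironment tableEnv = TableEnvironment.create(settings);

// String-based key-value option, honored only by the Blink planner.
// The key is illustrative; see the Configuration docs for the full list.
tableEnv.getConfig().getConfiguration()
    .setString("table.exec.mini-batch.enabled", "true");
{% endhighlight %}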


Structure of Table API and SQL Programs
@@ -831,6 +830,19 @@ Translate and Execute a Query
The behavior of translating and executing a query is different for the two planners.

<div class="codetabs" markdown="1">

<div data-lang="Blink planner" markdown="1">
Table API and SQL queries are translated into [DataStream]({{ site.baseurl }}/dev/datastream_api.html) programs whether their input is streaming or batch. A query is internally represented as a logical query plan and is translated in two phases:

1. Optimization of the logical plan,
2. Translation into a DataStream program.

A Table API or SQL query is translated when:

* `TableEnvironment.execute()` is called. A `Table` (emitted to a `TableSink` through `Table.insertInto()`) or a SQL update query (specified through `TableEnvironment.sqlUpdate()`) is first buffered in the `TableEnvironment`. All sinks are optimized into one DAG; a sketch follows this list.
* A `Table` is translated when it is converted into a `DataStream` (see [Integration with DataStream and DataSet API](#integration-with-datastream-and-dataset-api)). Once translated, it is a regular DataStream program and is executed when `StreamExecutionEnvironment.execute()` is called.
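A minimal sketch of this buffering behavior, assuming a source table and two sink tables that are already registered in the catalog (all table names are hypothetical):

{% highlight java %}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

EnvironmentSettings settings =
    EnvironmentSettings.newInstance().inStreamingMode().build();
TableEnvironment tableEnv = TableEnvironment.create(settings);

// Both statements are only buffered here; nothing is translated yet.
tableEnv.sqlUpdate("INSERT INTO sink_a SELECT id, name FROM source_table");
tableEnv.sqlUpdate("INSERT INTO sink_b SELECT id, COUNT(*) FROM source_table GROUP BY id");

// Translation happens now: both sinks are optimized into a single DAG.
tableEnv.execute("multi-sink-job");
{% endhighlight %}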
</div>

<div data-lang="Old planner" markdown="1">
Table API and SQL queries are translated into [DataStream]({{ site.baseurl }}/dev/datastream_api.html) or [DataSet]({{ site.baseurl }}/dev/batch) programs, depending on whether their input is streaming or batch. A query is internally represented as a logical query plan and is translated in two phases:

@@ -849,22 +861,8 @@ For batch, a Table API or SQL query is translated when:
* a `Table` is converted into a `DataSet` (see [Integration with DataStream and DataSet API](#integration-with-datastream-and-dataset-api)).

Once translated, a Table API or SQL query is handled like a regular DataSet program and is executed when `ExecutionEnvironment.execute()` is called.
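As a sketch of the batch path under the pre-1.11 `BatchTableEnvironment` (the table name, field names, and output path are assumptions, and the import path of `BatchTableEnvironment` varies across Flink versions):

{% highlight java %}
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.BatchTableEnvironment; // version-dependent path
import org.apache.flink.types.Row;

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
BatchTableEnvironment tableEnv = BatchTableEnvironment.create(env);

Table result = tableEnv.sqlQuery("SELECT id, name FROM source_table");

// The query is translated here, at the point of conversion.
DataSet<Row> rows = tableEnv.toDataSet(result, Row.class);
rows.writeAsText("/tmp/result"); // hypothetical output path

// From here on it is a regular DataSet program.
env.execute("old-planner-batch-job");
{% endhighlight %}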

</div>

<div data-lang="Blink planner" markdown="1">
Table API and SQL queries are translated into [DataStream]({{ site.baseurl }}/dev/datastream_api.html) programs whether their input is streaming or batch. A query is internally represented as a logical query plan and is translated in two phases:

1. Optimization of the logical plan,
2. Translation into a DataStream program.

A Table API or SQL query is translated when:

* `TableEnvironment.execute()` is called. A `Table` (emitted to a `TableSink` through `Table.insertInto()`) or a SQL update query (specified through `TableEnvironment.sqlUpdate()`) will be buffered in `TableEnvironment` first. All sinks will be optimized into one DAG.
* A `Table` is translated when it is converted into a `DataStream` (see [Integration with DataStream and DataSet API](#integration-with-datastream-and-dataset-api)). Once translated, it's a regular DataStream program and is executed when `StreamExecutionEnvironment.execute()` is called.


</div>
</div>

{% top %}
@@ -1407,16 +1405,7 @@ Query Optimization
------------------

<div class="codetabs" markdown="1">
<div data-lang="Old planner" markdown="1">

Apache Flink leverages Apache Calcite to optimize and translate queries. The optimizations currently performed include projection and filter push-down, subquery decorrelation, and other kinds of query rewriting. The old planner does not yet optimize the order of joins; it executes them in the order defined in the query (the order of tables in the `FROM` clause and/or the order of join predicates in the `WHERE` clause).

It is possible to tweak the set of optimization rules applied in the different phases by providing a `CalciteConfig` object. It is created via a builder by calling `CalciteConfig.createBuilder()` and is provided to the TableEnvironment by calling `tableEnv.getConfig.setPlannerConfig(calciteConfig)`.

</div>

<div data-lang="Blink planner" markdown="1">

Apache Flink leverages and extends Apache Calcite to perform sophisticated query optimization.
This includes a series of rule-based and cost-based optimizations such as:

@@ -1436,7 +1425,12 @@ This includes a series of rule-based and cost-based optimizations such as:
The optimizer makes intelligent decisions based not only on the plan but also on rich statistics available from the data sources and fine-grained costs for each operator, such as IO, CPU, network, and memory.

Advanced users may provide custom optimizations via a `CalciteConfig` object, which can be provided to the table environment by calling `TableEnvironment#getConfig#setPlannerConfig`.
</div>

<div data-lang="Old planner" markdown="1">
Apache Flink leverages Apache Calcite to optimize and translate queries. The optimizations currently performed include projection and filter push-down, subquery decorrelation, and other kinds of query rewriting. The old planner does not yet optimize the order of joins; it executes them in the order defined in the query (the order of tables in the `FROM` clause and/or the order of join predicates in the `WHERE` clause).

It is possible to tweak the set of optimization rules applied in the different phases by providing a `CalciteConfig` object. It is created via a builder by calling `CalciteConfig.createBuilder()` and is provided to the TableEnvironment by calling `tableEnv.getConfig.setPlannerConfig(calciteConfig)`.
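A minimal Java sketch of this wiring, using only the two calls named above; the builder's `build()` method is an assumption, and the rule-set tweaking calls are omitted because their exact names vary:

{% highlight java %}
// Create an (empty) planner configuration via the builder named above
// and hand it to the TableEnvironment. build() is assumed here.
CalciteConfig calciteConfig = CalciteConfig.createBuilder().build();
tableEnv.getConfig().setPlannerConfig(calciteConfig);
{% endhighlight %}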
</div>
</div>

51 changes: 23 additions & 28 deletions docs/dev/table/common.zh.md
@@ -32,12 +32,11 @@ Table API and SQL are integrated in a joint API. The core concept of this API is the `Tab…

1. Blink treats batch jobs as a special case of stream processing. Strictly speaking, conversion between `Table` and `DataSet` is not supported, and batch jobs are translated into `DataStream` programs rather than `DataSet` programs, just like streaming jobs.
2. The Blink planner does not support `BatchTableSource`; a bounded `StreamTableSource` is used instead.
3. The Blink planner only supports the brand-new `Catalog` and does not support the deprecated `ExternalCatalog`.
4. The implementations of `FilterableTableSource` in the old planner and the Blink planner are incompatible. The old planner pushes `PlannerExpression`s down into `FilterableTableSource`, while the Blink planner pushes down `Expression`s.
5. String-based key-value configuration options are used only by the Blink planner. (See [Configuration]({{ site.baseurl }}/zh/dev/table/config.html) for details.)
6. The implementations (`CalciteConfig`) of `PlannerConfig` differ between the two planners.
7. The Blink planner optimizes multiple sinks into a single directed acyclic graph (DAG) (supported only on `TableEnvironment`, not on `StreamTableEnvironment`). The old planner always optimizes each sink into a new DAG, and all DAGs are independent of each other.
8. The old planner currently does not support catalog statistics, while Blink does.
3. The implementations of `FilterableTableSource` in the old planner and the Blink planner are incompatible. The old planner pushes `PlannerExpression`s down into `FilterableTableSource`, while the Blink planner pushes down `Expression`s.
4. String-based key-value configuration options are used only by the Blink planner. (See [Configuration]({{ site.baseurl }}/zh/dev/table/config.html) for details.)
5. The implementations (`CalciteConfig`) of `PlannerConfig` differ between the two planners.
6. The Blink planner optimizes multiple sinks into a single directed acyclic graph (DAG) (supported only on `TableEnvironment`, not on `StreamTableEnvironment`). The old planner always optimizes each sink into a new DAG, and all DAGs are independent of each other.
7. The old planner currently does not support catalog statistics, while Blink does.


Structure of Table API and SQL Programs
@@ -810,6 +809,19 @@ result.insert_into("CsvSinkTable")
The two planners translate and execute queries differently.

<div class="codetabs" markdown="1">

<div data-lang="Blink planner" markdown="1">
不论输入数据源是流式的还是批式的,Table API 和 SQL 查询都会被转换成 [DataStream]({{ site.baseurl }}/zh/dev/datastream_api.html) 程序。查询在内部表示为逻辑查询计划,并被翻译成两个阶段:

1. 优化逻辑执行计划
2. 翻译成 DataStream 程序

Table API 或者 SQL 查询在下列情况下会被翻译:

*`TableEnvironment.execute()` 被调用时。`Table` (通过 `Table.insertInto()` 输出给 `TableSink`)和 SQL (通过调用 `TableEnvironment.sqlUpdate()`)会先被缓存到 `TableEnvironment` 中,所有的 sink 会被优化成一张有向无环图。
* `Table` 被转换成 `DataStream` 时(参阅[与 DataStream 和 DataSet API 结合](#integration-with-datastream-and-dataset-api))。转换完成后,它就成为一个普通的 DataStream 程序,并且会在调用 `StreamExecutionEnvironment.execute()` 的时候被执行。
</div>

<div data-lang="Old planner" markdown="1">
Table API 和 SQL 查询会被翻译成 [DataStream]({{ site.baseurl }}/zh/dev/datastream_api.html) 或者 [DataSet]({{ site.baseurl }}/zh/dev/batch) 程序, 这取决于它们的输入数据源是流式的还是批式的。查询在内部表示为逻辑查询计划,并被翻译成两个阶段:

Expand All @@ -828,21 +840,8 @@ Table API 和 SQL 查询会被翻译成 [DataStream]({{ site.baseurl }}/zh/dev/d
* `Table` 被转换成 `DataSet` 时(参阅[与 DataStream 和 DataSet API 结合](#integration-with-datastream-and-dataset-api))。

翻译完成后,Table API 或者 SQL 查询会被当做普通的 DataSet 程序对待并且会在调用 `ExecutionEnvironment.execute()` 的时候被执行。

</div>

<div data-lang="Blink planner" markdown="1">
不论输入数据源是流式的还是批式的,Table API 和 SQL 查询都会被转换成 [DataStream]({{ site.baseurl }}/zh/dev/datastream_api.html) 程序。查询在内部表示为逻辑查询计划,并被翻译成两个阶段:

1. 优化逻辑执行计划
2. 翻译成 DataStream 程序

Table API 或者 SQL 查询在下列情况下会被翻译:

*`TableEnvironment.execute()` 被调用时。`Table` (通过 `Table.insertInto()` 输出给 `TableSink`)和 SQL (通过调用 `TableEnvironment.sqlUpdate()`)会先被缓存到 `TableEnvironment` 中,所有的 sink 会被优化成一张有向无环图。
* `Table` 被转换成 `DataStream` 时(参阅[与 DataStream 和 DataSet API 结合](#integration-with-datastream-and-dataset-api))。转换完成后,它就成为一个普通的 DataStream 程序,并且会在调用 `StreamExecutionEnvironment.execute()` 的时候被执行。

</div>
</div>

{% top %}
@@ -1388,16 +1387,7 @@ val table: Table = tableEnv.fromDataStream(stream, $"name" as "myName")
------------------

<div class="codetabs" markdown="1">
<div data-lang="Old planner" markdown="1">

Apache Flink leverages Apache Calcite to optimize and translate queries. The optimizations currently performed include projection and filter push-down, subquery decorrelation, and other kinds of query rewriting. The old planner does not yet optimize the order of joins; it executes them in the order defined in the query (the order of tables in the `FROM` clause and/or the order of join predicates in the `WHERE` clause).

The set of optimization rules applied in the different phases can be adjusted by providing a `CalciteConfig` object. It is created via a builder by calling `CalciteConfig.createBuilder()` and is provided to the TableEnvironment by calling `tableEnv.getConfig.setPlannerConfig(calciteConfig)`.

</div>

<div data-lang="Blink planner" markdown="1">

Apache Flink leverages and extends Apache Calcite to perform sophisticated query optimization.
This includes a series of rule-based and cost-based optimizations such as:

@@ -1417,7 +1407,12 @@ Apache Flink leverages and extends Apache Calcite to perform sophisticated query optimization.
The optimizer makes intelligent decisions based not only on the plan but also on rich statistics available from the data sources and fine-grained costs for each operator, such as IO, CPU, network, and memory.

Advanced users may provide custom optimizations via a `CalciteConfig` object, which can be provided to the TableEnvironment by calling `TableEnvironment#getConfig#setPlannerConfig`.
</div>

<div data-lang="Old planner" markdown="1">
Apache Flink leverages Apache Calcite to optimize and translate queries. The optimizations currently performed include projection and filter push-down, subquery decorrelation, and other kinds of query rewriting. The old planner does not yet optimize the order of joins; it executes them in the order defined in the query (the order of tables in the `FROM` clause and/or the order of join predicates in the `WHERE` clause).

The set of optimization rules applied in the different phases can be adjusted by providing a `CalciteConfig` object. It is created via a builder by calling `CalciteConfig.createBuilder()` and is provided to the TableEnvironment by calling `tableEnv.getConfig.setPlannerConfig(calciteConfig)`.
</div>
</div>

4 changes: 2 additions & 2 deletions docs/dev/table/hive/index.md
@@ -311,7 +311,7 @@ Take Hive version 2.3.4 for example:
<div data-lang="Java" markdown="1">
{% highlight java %}

EnvironmentSettings settings = EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build();
EnvironmentSettings settings = EnvironmentSettings.newInstance().inBatchMode().build();
TableEnvironment tableEnv = TableEnvironment.create(settings);

String name = "myhive";
@@ -329,7 +329,7 @@ tableEnv.useCatalog("myhive");
<div data-lang="Scala" markdown="1">
{% highlight scala %}

val settings = EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build()
val settings = EnvironmentSettings.newInstance().inBatchMode().build()
val tableEnv = TableEnvironment.create(settings)

val name = "myhive"
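For orientation, a complete registration flow with the new default batch settings might look like this sketch. The database name and Hive conf directory are placeholders, and the three-argument `HiveCatalog` constructor is an assumption based on the surrounding docs:

{% highlight java %}
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.catalog.hive.HiveCatalog;

// Batch mode on the (now default) Blink planner.
EnvironmentSettings settings =
    EnvironmentSettings.newInstance().inBatchMode().build();
TableEnvironment tableEnv = TableEnvironment.create(settings);

String name = "myhive";
String defaultDatabase = "mydatabase";  // placeholder
String hiveConfDir = "/opt/hive-conf";  // placeholder: directory containing hive-site.xml

HiveCatalog hive = new HiveCatalog(name, defaultDatabase, hiveConfDir);
tableEnv.registerCatalog(name, hive);
tableEnv.useCatalog(name);
{% endhighlight %}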
4 changes: 2 additions & 2 deletions docs/dev/table/hive/index.zh.md
@@ -307,7 +307,7 @@ Apache Hive is built on top of Hadoop, so you first need the Hadoop dependencies
<div data-lang="Java" markdown="1">
{% highlight java %}

EnvironmentSettings settings = EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build();
EnvironmentSettings settings = EnvironmentSettings.newInstance().inBatchMode().build();
TableEnvironment tableEnv = TableEnvironment.create(settings);

String name = "myhive";
@@ -325,7 +325,7 @@ tableEnv.useCatalog("myhive");
<div data-lang="Scala" markdown="1">
{% highlight scala %}

val settings = EnvironmentSettings.newInstance().useBlinkPlanner().inBatchMode().build()
val settings = EnvironmentSettings.newInstance().inBatchMode().build()
val tableEnv = TableEnvironment.create(settings)

val name = "myhive"
6 changes: 3 additions & 3 deletions docs/dev/table/index.md
@@ -38,7 +38,7 @@ Starting from Flink 1.9, Flink provides two different planner implementations for
translating relational operators into an executable, optimized Flink job. Both of the planners come with different optimization rules and runtime classes.
They may also differ in the set of supported features.

<span class="label label-danger">Attention</span> For production use cases, we recommend the old planner that was present before Flink 1.9 for now.
<span class="label label-danger">Attention</span> For production use cases, we recommend the blink planner that has become the default planner since 1.11.

All Table API and SQL components are bundled in the `flink-table` or `flink-table-blink` Maven artifacts.

@@ -49,8 +49,8 @@ The following dependencies are relevant for most projects:
* `flink-table-api-scala`: The Table & SQL API for pure table programs using the Scala programming language (in early development stage, not recommended!).
* `flink-table-api-java-bridge`: The Table & SQL API with DataStream/DataSet API support using the Java programming language.
* `flink-table-api-scala-bridge`: The Table & SQL API with DataStream/DataSet API support using the Scala programming language.
* `flink-table-planner`: The table program planner and runtime. This was the only planner of Flink before the 1.9 release. It is still the recommended one.
* `flink-table-planner-blink`: The new Blink planner.
* `flink-table-planner`: The table program planner and runtime. This was the only planner of Flink before the 1.9 release. It is no longer recommended as of Flink 1.11.
* `flink-table-planner-blink`: The new Blink planner, which has become the default planner since Flink 1.11 (see the sketch after this list).
* `flink-table-runtime-blink`: The new Blink runtime.
* `flink-table-uber`: Packages the API modules above plus the old planner into a distribution for most Table & SQL API use cases. The uber JAR file `flink-table-*.jar` is located in the `/lib` directory of a Flink release by default.
* `flink-table-uber-blink`: Packages the API modules above plus the Blink specific modules into a distribution for most Table & SQL API use cases. The uber JAR file `flink-table-blink-*.jar` is located in the `/lib` directory of a Flink release by default.
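For readers tracking the recommendation change, a short sketch of how the planner choice surfaces in code: with Blink as the default since 1.11, `useBlinkPlanner()` becomes redundant, and `useOldPlanner()` opts back into the legacy planner.

{% highlight java %}
import org.apache.flink.table.api.EnvironmentSettings;

// Default since Flink 1.11: the Blink planner.
EnvironmentSettings blink =
    EnvironmentSettings.newInstance().inStreamingMode().build();

// Equivalent explicit form.
EnvironmentSettings blinkExplicit =
    EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build();

// Opting back into the legacy planner (no longer recommended).
EnvironmentSettings legacy =
    EnvironmentSettings.newInstance().useOldPlanner().inStreamingMode().build();
{% endhighlight %}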
