[FLINK-15999][doc] Remove programming-model section and incorporate into overview

This also adds an explanation of the concepts section.
aljoscha committed Feb 21, 2020
1 parent ac4e757 commit 87e6cc2
Showing 4 changed files with 112 additions and 288 deletions.
58 changes: 56 additions & 2 deletions docs/concepts/index.md
@@ -4,8 +4,7 @@ nav-id: concepts
nav-pos: 2
nav-title: '<i class="fa fa-map-o title appetizer" aria-hidden="true"></i> Concepts'
nav-parent_id: root
layout: redirect
redirect: /concepts/programming-model.html
nav-show_overview: true
permalink: /concepts/index.html
always-expand: true
---
@@ -27,3 +26,58 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

Flink offers different levels of abstraction for developing streaming/batch applications.

<img src="{{ site.baseurl }}/fig/levels_of_abstraction.svg" alt="Programming levels of abstraction" class="offset" width="80%" />

- The lowest-level abstraction offers **stateful streaming**. It is
embedded into the [DataStream API]({{ site.baseurl }}{% link
dev/datastream_api.md %}) via the [Process Function]({{ site.baseurl }}{%
link dev/stream/operators/process_function.md %}). It allows users to freely
process events from one or more streams and to use consistent, fault-tolerant
*state*. In addition, users can register event-time and processing-time
callbacks, allowing programs to implement sophisticated computations.

- In practice, most applications do not need the low-level abstraction
described above and instead program against the **Core APIs**, such as the
[DataStream API]({{ site.baseurl }}{% link dev/datastream_api.md %})
(bounded/unbounded streams) and the [DataSet API]({{ site.baseurl }}{% link
dev/batch/index.md %}) (bounded data sets). These fluent APIs offer the
common building blocks for data processing, such as various forms of
user-specified transformations, joins, aggregations, windows, and state.
Data types processed in these APIs are represented as classes in the
respective programming languages.

  The low-level *Process Function* integrates with the *DataStream API*,
  making it possible to drop down to the lower-level abstraction for only
  those operations that need it. The *DataSet API* offers additional
  primitives on bounded data sets, such as loops/iterations.

- The **Table API** is a declarative DSL centered around *tables*, which may
change dynamically (when they represent streams). The [Table
API]({{ site.baseurl }}{% link dev/table/index.md %}) follows the
(extended) relational model: tables have a schema attached (similar to
tables in relational databases), and the API offers comparable operations,
such as select, project, join, group-by, and aggregate. Table API
programs declaratively define *what logical operation should be done*
rather than specifying exactly *how the code for the operation looks*.
Although the Table API can be extended with various types of user-defined
functions, it is less expressive than the *Core APIs*, but more concise to
use (less code to write). In addition, Table API programs go through
an optimizer that applies optimization rules before execution.

  One can seamlessly convert between tables and *DataStream*/*DataSet*,
  allowing programs to mix the *Table API* with the *DataStream* and
  *DataSet* APIs.

- The highest-level abstraction offered by Flink is **SQL**. This abstraction
is similar to the *Table API* both in semantics and expressiveness, but
represents programs as SQL query expressions. The [SQL]({{ site.baseurl }}{%
link dev/table/index.md %}#sql) abstraction closely interacts with the
Table API, and SQL queries can be executed over tables defined in the
*Table API*.
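To make the lowest-level abstraction from the first bullet more concrete, here is a rough, hypothetical model in plain Python (not Flink's `ProcessFunction` API) of a keyed process function that keeps per-key state and registers event-time callbacks. It counts events per key and emits `(key, count)` once a key has seen no new event for `timeout` units of event time; timers fire as an assumed watermark advances. `ToyKeyedProcessor` and its methods are made-up names, and the checkpointing that makes Flink's state fault tolerant is not modeled.

```python
import heapq
from collections import defaultdict


class ToyKeyedProcessor:
    """Hypothetical single-machine model of a keyed process function:
    per-key state plus event-time timers that fire as the watermark advances.
    Not Flink's API; fault tolerance is not modeled."""

    def __init__(self, timeout):
        self.timeout = timeout          # event-time gap after which a key is "done"
        self.count = defaultdict(int)   # per-key state: number of events seen
        self.last_seen = {}             # per-key state: timestamp of the latest event
        self.timers = []                # min-heap of (fire_time, key)
        self.output = []                # emitted (key, count) records

    def process_element(self, key, timestamp):
        # Update this key's state and register an event-time callback.
        self.count[key] += 1
        self.last_seen[key] = timestamp
        heapq.heappush(self.timers, (timestamp + self.timeout, key))

    def advance_watermark(self, watermark):
        # Fire every timer whose time is <= the watermark, earliest first.
        while self.timers and self.timers[0][0] <= watermark:
            fire_time, key = heapq.heappop(self.timers)
            self.on_timer(key, fire_time)

    def on_timer(self, key, fire_time):
        # Emit only if no newer event for this key arrived after this timer
        # was registered; stale timers are ignored.
        if key in self.last_seen and self.last_seen[key] + self.timeout == fire_time:
            self.output.append((key, self.count.pop(key)))
            del self.last_seen[key]


p = ToyKeyedProcessor(timeout=10)
p.process_element("a", 1)
p.process_element("b", 2)
p.process_element("a", 5)   # supersedes the timer registered for "a" at t=1
p.advance_watermark(12)     # (11, "a") is stale; (12, "b") emits ("b", 1)
p.advance_watermark(20)     # (15, "a") emits ("a", 2)
print(p.output)             # [('b', 1), ('a', 2)]
```

The heap keeps timers ordered by time, so callbacks fire in event-time order regardless of the order in which elements arrived.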

This _concepts_ section explains the basic concepts behind the different APIs,
that is, the concepts behind Flink as a stateful and timely stream processing
system.
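The Table API bullet above notes that Table API programs state *what* to compute and pass through an optimizer that applies rules before execution. The toy below sketches that idea in plain Python. It is not Flink's planner or Table API: the operator classes, the `on_group_key_only` flag, and the single rewrite rule (pushing a filter on the grouping key below an aggregation) are hypothetical, chosen only to show that a rule can change *how* a query runs without changing its result.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Scan:
    rows: tuple                      # input rows of (user, amount)


@dataclass
class SumByKey:
    child: "Plan"                    # GROUP BY user, SUM(amount)


@dataclass
class Filter:
    child: "Plan"
    predicate: Callable[[tuple], bool]
    on_group_key_only: bool          # hypothetical flag: predicate reads only row[0]


Plan = Union[Scan, SumByKey, Filter]


def execute(plan):
    """Naive interpreter: runs each logical operator exactly as written."""
    if isinstance(plan, Scan):
        return list(plan.rows)
    if isinstance(plan, Filter):
        return [row for row in execute(plan.child) if plan.predicate(row)]
    if isinstance(plan, SumByKey):
        totals = {}
        for key, amount in execute(plan.child):
            totals[key] = totals.get(key, 0) + amount
        return sorted(totals.items())
    raise TypeError(f"unknown operator: {plan!r}")


def optimize(plan):
    """Applies one rewrite rule bottom-up: a filter that reads only the
    grouping key can run before the aggregation instead of after it."""
    if isinstance(plan, Filter):
        child = optimize(plan.child)
        if plan.on_group_key_only and isinstance(child, SumByKey):
            pushed = Filter(child.child, plan.predicate, True)
            return SumByKey(optimize(pushed))
        return Filter(child, plan.predicate, plan.on_group_key_only)
    if isinstance(plan, SumByKey):
        return SumByKey(optimize(plan.child))
    return plan


rows = (("alice", 5), ("bot", 100), ("alice", 7), ("bob", 3))
# "What" to compute: per-user totals, excluding the user "bot".
query = Filter(SumByKey(Scan(rows)), lambda row: row[0] != "bot", True)
optimized = optimize(query)

print(type(optimized).__name__, type(optimized.child).__name__)  # SumByKey Filter
print(execute(query))       # [('alice', 12), ('bob', 3)]
print(execute(optimized))   # [('alice', 12), ('bob', 3)]
```

Because the query is data rather than hand-written loops, the optimizer is free to reorder operators; in a Core API program, that kind of reordering is left to the programmer.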
58 changes: 56 additions & 2 deletions docs/concepts/index.zh.md
@@ -4,8 +4,7 @@ nav-id: concepts
nav-pos: 2
nav-title: '<i class="fa fa-map-o title appetizer" aria-hidden="true"></i> 概念'
nav-parent_id: root
layout: redirect
redirect: /concepts/programming-model.html
nav-show_overview: true
permalink: /concepts/index.html
always-expand: true
---
@@ -27,3 +26,58 @@ KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

Flink offers different levels of abstraction for developing streaming/batch applications.

<img src="{{ site.baseurl }}/fig/levels_of_abstraction.svg" alt="Programming levels of abstraction" class="offset" width="80%" />

- The lowest-level abstraction offers **stateful streaming**. It is
embedded into the [DataStream API]({{ site.baseurl }}{% link
dev/datastream_api.md %}) via the [Process Function]({{ site.baseurl }}{%
link dev/stream/operators/process_function.md %}). It allows users to freely
process events from one or more streams and to use consistent, fault-tolerant
*state*. In addition, users can register event-time and processing-time
callbacks, allowing programs to implement sophisticated computations.

- In practice, most applications do not need the low-level abstraction
described above and instead program against the **Core APIs**, such as the
[DataStream API]({{ site.baseurl }}{% link dev/datastream_api.md %})
(bounded/unbounded streams) and the [DataSet API]({{ site.baseurl }}{% link
dev/batch/index.md %}) (bounded data sets). These fluent APIs offer the
common building blocks for data processing, such as various forms of
user-specified transformations, joins, aggregations, windows, and state.
Data types processed in these APIs are represented as classes in the
respective programming languages.

  The low-level *Process Function* integrates with the *DataStream API*,
  making it possible to drop down to the lower-level abstraction for only
  those operations that need it. The *DataSet API* offers additional
  primitives on bounded data sets, such as loops/iterations.

- The **Table API** is a declarative DSL centered around *tables*, which may
change dynamically (when they represent streams). The [Table
API]({{ site.baseurl }}{% link dev/table/index.md %}) follows the
(extended) relational model: tables have a schema attached (similar to
tables in relational databases), and the API offers comparable operations,
such as select, project, join, group-by, and aggregate. Table API
programs declaratively define *what logical operation should be done*
rather than specifying exactly *how the code for the operation looks*.
Although the Table API can be extended with various types of user-defined
functions, it is less expressive than the *Core APIs*, but more concise to
use (less code to write). In addition, Table API programs go through
an optimizer that applies optimization rules before execution.

  One can seamlessly convert between tables and *DataStream*/*DataSet*,
  allowing programs to mix the *Table API* with the *DataStream* and
  *DataSet* APIs.

- The highest-level abstraction offered by Flink is **SQL**. This abstraction
is similar to the *Table API* both in semantics and expressiveness, but
represents programs as SQL query expressions. The [SQL]({{ site.baseurl }}{%
link dev/table/index.md %}#sql) abstraction closely interacts with the
Table API, and SQL queries can be executed over tables defined in the
*Table API*.

This _concepts_ section explains the basic concepts behind the different APIs,
that is, the concepts behind Flink as a stateful and timely stream processing
system.
67 changes: 0 additions & 67 deletions docs/concepts/programming-model.md

This file was deleted.
