--- title: "Table API" nav-parent_id: tableapi nav-pos: 20 --- The Table API is a unified, relational API for stream and batch processing. Table API queries can be run on batch or streaming input without modifications. The Table API is a super set of the SQL language and is specially designed for working with Apache Flink. The Table API is a language-integrated API for Scala and Java. Instead of specifying queries as String values as common with SQL, Table API queries are defined in a language-embedded style in Java or Scala with IDE support like autocompletion and syntax validation. The Table API shares many concepts and parts of its API with Flink's SQL integration. Have a look at the [Common Concepts & API]({{ site.baseurl }}/dev/table/common.html) to learn how to register tables or to create a `Table` object. The [Streaming Concepts](./streaming) pages discuss streaming specific concepts such as dynamic tables and time attributes. The following examples assume a registered table called `Orders` with attributes `(a, b, c, rowtime)`. The `rowtime` field is either a logical [time attribute](./streaming/time_attributes.html) in streaming or a regular timestamp field in batch. * This will be replaced by the TOC {:toc} Overview & Examples ----------------------------- The Table API is available for Scala and Java. The Scala Table API leverages on Scala expressions, the Java Table API is based on strings which are parsed and converted into equivalent expressions. The following example shows the differences between the Scala and Java Table API. The table program is executed in a batch environment. It scans the `Orders` table, groups by field `a`, and counts the resulting rows per group. The result of the table program is converted into a `DataSet` of type `Row` and printed.
Operators | Description |
---|---|
Scan Batch Streaming |
Similar to the FROM clause in a SQL query. Performs a scan of a registered table. {% highlight java %} Table orders = tableEnv.scan("Orders"); {% endhighlight %} |
Select Batch Streaming |
Similar to a SQL SELECT statement. Performs a select operation. {% highlight java %} Table orders = tableEnv.scan("Orders"); Table result = orders.select("a, c as d"); {% endhighlight %}You can use star ( |
As Batch Streaming |
Renames fields. {% highlight java %} Table orders = tableEnv.scan("Orders"); Table result = orders.as("x, y, z, t"); {% endhighlight %} |
Where / Filter Batch Streaming |
Similar to a SQL WHERE clause. Filters out rows that do not pass the filter predicate. {% highlight java %} Table orders = tableEnv.scan("Orders"); Table result = orders.where("b === 'red'"); {% endhighlight %} or {% highlight java %} Table orders = tableEnv.scan("Orders"); Table result = orders.filter("a % 2 === 0"); {% endhighlight %} |
Operators | Description |
---|---|
Scan Batch Streaming |
Similar to the FROM clause in a SQL query. Performs a scan of a registered table. {% highlight scala %} val orders: Table = tableEnv.scan("Orders") {% endhighlight %} |
Select Batch Streaming |
Similar to a SQL SELECT statement. Performs a select operation. {% highlight scala %} val orders: Table = tableEnv.scan("Orders") val result = orders.select('a, 'c as 'd) {% endhighlight %}You can use star ( |
As Batch Streaming |
Renames fields. {% highlight scala %} val orders: Table = tableEnv.scan("Orders").as('x, 'y, 'z, 't) {% endhighlight %} |
Where / Filter Batch Streaming |
Similar to a SQL WHERE clause. Filters out rows that do not pass the filter predicate. {% highlight scala %} val orders: Table = tableEnv.scan("Orders") val result = orders.filter('a % 2 === 0) {% endhighlight %} or {% highlight scala %} val orders: Table = tableEnv.scan("Orders") val result = orders.where('b === "red") {% endhighlight %} |
操作 | 描述 |
---|---|
Scan 批处理 流处理 |
类似于SQL请求中的FROM子句,将一个环境中已注册的表转换成Table对象。 {% highlight python %} orders = table_env.scan("Orders"); {% endhighlight %} |
Select 批处理 流处理 |
类似于SQL请求中的SELECT子句,执行一个select操作。 {% highlight python %} orders = table_env.scan("Orders"); result = orders.select("a, c as d"); {% endhighlight %}您可以使用星号 ( |
Alias 批处理 流处理 |
重命名字段。 {% highlight python %} orders = table_env.scan("Orders"); result = orders.alias("x, y, z, t"); {% endhighlight %} |
Where / Filter 批处理 流处理 |
类似于SQL请求中的WHERE子句,过滤掉表中不满足条件的行。 {% highlight python %} orders = table_env.scan("Orders"); result = orders.where("b === 'red'"); {% endhighlight %} or {% highlight python %} orders = table_env.scan("Orders"); result = orders.filter("a % 2 === 0"); {% endhighlight %} |
Operators | Description |
---|---|
AddColumns Batch Streaming |
Performs a field add operation. It will throw an exception if the added fields already exist. {% highlight java %} Table orders = tableEnv.scan("Orders"); Table result = orders.addColumns("concat(c, 'sunny')"); {% endhighlight %} |
AddOrReplaceColumns Batch Streaming |
Performs a field add operation. Existing fields will be replaced if add columns name is the same as the existing column name. Moreover, if the added fields have duplicate field name, then the last one is used. {% highlight java %} Table orders = tableEnv.scan("Orders"); Table result = orders.addOrReplaceColumns("concat(c, 'sunny') as desc"); {% endhighlight %} |
DropColumns Batch Streaming |
Performs a field drop operation. The field expressions should be field reference expressions, and only existing fields can be dropped. {% highlight java %} Table orders = tableEnv.scan("Orders"); Table result = orders.dropColumns("b, c"); {% endhighlight %} |
RenameColumns Batch Streaming |
Performs a field rename operation. The field expressions should be alias expressions, and only the existing fields can be renamed. {% highlight java %} Table orders = tableEnv.scan("Orders"); Table result = orders.renameColumns("b as b2, c as c2"); {% endhighlight %} |
Operators | Description |
---|---|
AddColumns Batch Streaming |
Performs a field add operation. It will throw an exception if the added fields already exist. {% highlight scala %} val orders = tableEnv.scan("Orders"); val result = orders.addColumns(concat('c, "Sunny")) {% endhighlight %} |
AddOrReplaceColumns Batch Streaming |
Performs a field add operation. Existing fields will be replaced if add columns name is the same as the existing column name. Moreover, if the added fields have duplicate field name, then the last one is used. {% highlight scala %} val orders = tableEnv.scan("Orders"); val result = orders.addOrReplaceColumns(concat('c, "Sunny") as 'desc) {% endhighlight %} |
DropColumns Batch Streaming |
Performs a field drop operation. The field expressions should be field reference expressions, and only existing fields can be dropped. {% highlight scala %} val orders = tableEnv.scan("Orders"); val result = orders.dropColumns('b, 'c) {% endhighlight %} |
RenameColumns Batch Streaming |
Performs a field rename operation. The field expressions should be alias expressions, and only the existing fields can be renamed. {% highlight scala %} val orders = tableEnv.scan("Orders"); val result = orders.renameColumns('b as 'b2, 'c as 'c2) {% endhighlight %} |
Operators | Description |
---|---|
GroupBy Aggregation Batch Streaming Result Updating |
Similar to a SQL GROUP BY clause. Groups the rows on the grouping keys with a following running aggregation operator to aggregate rows group-wise. {% highlight java %} Table orders = tableEnv.scan("Orders"); Table result = orders.groupBy("a").select("a, b.sum as d"); {% endhighlight %}Note: For streaming queries the required state to compute the query result might grow infinitely depending on the type of aggregation and the number of distinct grouping keys. Please provide a query configuration with valid retention interval to prevent excessive state size. See Query Configuration for details. |
GroupBy Window Aggregation Batch Streaming |
Groups and aggregates a table on a group window and possibly one or more grouping keys. {% highlight java %} Table orders = tableEnv.scan("Orders"); Table result = orders .window(Tumble.over("5.minutes").on("rowtime").as("w")) // define window .groupBy("a, w") // group by key and window .select("a, w.start, w.end, w.rowtime, b.sum as d"); // access window properties and aggregate {% endhighlight %} |
Over Window Aggregation Streaming |
Similar to a SQL OVER clause. Over window aggregates are computed for each row, based on a window (range) of preceding and succeeding rows. See the over windows section for more details. {% highlight java %} Table orders = tableEnv.scan("Orders"); Table result = orders // define window .window(Over .partitionBy("a") .orderBy("rowtime") .preceding("UNBOUNDED_RANGE") .following("CURRENT_RANGE") .as("w")) .select("a, b.avg over w, b.max over w, b.min over w"); // sliding aggregate {% endhighlight %}Note: All aggregates must be defined over the same window, i.e., same partitioning, sorting, and range. Currently, only windows with PRECEDING (UNBOUNDED and bounded) to CURRENT ROW range are supported. Ranges with FOLLOWING are not supported yet. ORDER BY must be specified on a single time attribute. |
Distinct Aggregation Batch Streaming Result Updating |
Similar to a SQL DISTINCT aggregation clause such as COUNT(DISTINCT a). Distinct aggregation declares that an aggregation function (built-in or user-defined) is only applied on distinct input values. Distinct can be applied to GroupBy Aggregation, GroupBy Window Aggregation and Over Window Aggregation. {% highlight java %} Table orders = tableEnv.scan("Orders"); // Distinct aggregation on group by Table groupByDistinctResult = orders .groupBy("a") .select("a, b.sum.distinct as d"); // Distinct aggregation on time window group by Table groupByWindowDistinctResult = orders .window(Tumble.over("5.minutes").on("rowtime").as("w")).groupBy("a, w") .select("a, b.sum.distinct as d"); // Distinct aggregation on over window Table result = orders .window(Over .partitionBy("a") .orderBy("rowtime") .preceding("UNBOUNDED_RANGE") .as("w")) .select("a, b.avg.distinct over w, b.max over w, b.min over w"); {% endhighlight %}User-defined aggregation function can also be used with DISTINCT modifiers. To calculate the aggregate results only for distinct values, simply add the distinct modifier towards the aggregation function. {% highlight java %} Table orders = tEnv.scan("Orders"); // Use distinct aggregation for user-defined aggregate functions tEnv.registerFunction("myUdagg", new MyUdagg()); orders.groupBy("users").select("users, myUdagg.distinct(points) as myDistinctResult"); {% endhighlight %}Note: For streaming queries the required state to compute the query result might grow infinitely depending on the number of distinct fields. Please provide a query configuration with valid retention interval to prevent excessive state size. See Query Configuration for details. |
Distinct Batch Streaming Result Updating |
Similar to a SQL DISTINCT clause. Returns records with distinct value combinations. {% highlight java %} Table orders = tableEnv.scan("Orders"); Table result = orders.distinct(); {% endhighlight %}Note: For streaming queries the required state to compute the query result might grow infinitely depending on the number of distinct fields. Please provide a query configuration with valid retention interval to prevent excessive state size. See Query Configuration for details. |
Operators | Description |
---|---|
GroupBy Aggregation Batch Streaming Result Updating |
Similar to a SQL GROUP BY clause. Groups the rows on the grouping keys with a following running aggregation operator to aggregate rows group-wise. {% highlight scala %} val orders: Table = tableEnv.scan("Orders") val result = orders.groupBy('a).select('a, 'b.sum as 'd) {% endhighlight %}Note: For streaming queries the required state to compute the query result might grow infinitely depending on the type of aggregation and the number of distinct grouping keys. Please provide a query configuration with valid retention interval to prevent excessive state size. See Query Configuration for details. |
GroupBy Window Aggregation Batch Streaming |
Groups and aggregates a table on a group window and possibly one or more grouping keys. {% highlight scala %} val orders: Table = tableEnv.scan("Orders") val result: Table = orders .window(Tumble over 5.minutes on 'rowtime as 'w) // define window .groupBy('a, 'w) // group by key and window .select('a, w.start, 'w.end, 'w.rowtime, 'b.sum as 'd) // access window properties and aggregate {% endhighlight %} |
Over Window Aggregation Streaming |
Similar to a SQL OVER clause. Over window aggregates are computed for each row, based on a window (range) of preceding and succeeding rows. See the over windows section for more details. {% highlight scala %} val orders: Table = tableEnv.scan("Orders") val result: Table = orders // define window .window(Over partitionBy 'a orderBy 'rowtime preceding UNBOUNDED_RANGE following CURRENT_RANGE as 'w) .select('a, 'b.avg over 'w, 'b.max over 'w, 'b.min over 'w) // sliding aggregate {% endhighlight %}Note: All aggregates must be defined over the same window, i.e., same partitioning, sorting, and range. Currently, only windows with PRECEDING (UNBOUNDED and bounded) to CURRENT ROW range are supported. Ranges with FOLLOWING are not supported yet. ORDER BY must be specified on a single time attribute. |
Distinct Aggregation Batch Streaming Result Updating |
Similar to a SQL DISTINCT AGGREGATION clause such as COUNT(DISTINCT a). Distinct aggregation declares that an aggregation function (built-in or user-defined) is only applied on distinct input values. Distinct can be applied to GroupBy Aggregation, GroupBy Window Aggregation and Over Window Aggregation. {% highlight scala %} val orders: Table = tableEnv.scan("Orders"); // Distinct aggregation on group by val groupByDistinctResult = orders .groupBy('a) .select('a, 'b.sum.distinct as 'd) // Distinct aggregation on time window group by val groupByWindowDistinctResult = orders .window(Tumble over 5.minutes on 'rowtime as 'w).groupBy('a, 'w) .select('a, 'b.sum.distinct as 'd) // Distinct aggregation on over window val result = orders .window(Over partitionBy 'a orderBy 'rowtime preceding UNBOUNDED_RANGE as 'w) .select('a, 'b.avg.distinct over 'w, 'b.max over 'w, 'b.min over 'w) {% endhighlight %}User-defined aggregation function can also be used with DISTINCT modifiers. To calculate the aggregate results only for distinct values, simply add the distinct modifier towards the aggregation function. {% highlight scala %} val orders: Table = tEnv.scan("Orders"); // Use distinct aggregation for user-defined aggregate functions val myUdagg = new MyUdagg(); orders.groupBy('users).select('users, myUdagg.distinct('points) as 'myDistinctResult); {% endhighlight %}Note: For streaming queries the required state to compute the query result might grow infinitely depending on the number of distinct fields. Please provide a query configuration with valid retention interval to prevent excessive state size. See Query Configuration for details. |
Distinct Batch |
Similar to a SQL DISTINCT clause. Returns records with distinct value combinations. {% highlight scala %} val orders: Table = tableEnv.scan("Orders") val result = orders.distinct() {% endhighlight %}Note: For streaming queries the required state to compute the query result might grow infinitely depending on the number of distinct fields. Please provide a query configuration with valid retention interval to prevent excessive state size. See Query Configuration for details. |
Operators | Description |
---|---|
Inner Join Batch Streaming |
Similar to a SQL JOIN clause. Joins two tables. Both tables must have distinct field names and at least one equality join predicate must be defined through join operator or using a where or filter operator. {% highlight java %} Table left = tableEnv.fromDataSet(ds1, "a, b, c"); Table right = tableEnv.fromDataSet(ds2, "d, e, f"); Table result = left.join(right).where("a = d").select("a, b, e"); {% endhighlight %}Note: For streaming queries the required state to compute the query result might grow infinitely depending on the number of distinct input rows. Please provide a query configuration with valid retention interval to prevent excessive state size. See Query Configuration for details. |
Outer Join Batch Streaming Result Updating |
Similar to SQL LEFT/RIGHT/FULL OUTER JOIN clauses. Joins two tables. Both tables must have distinct field names and at least one equality join predicate must be defined. {% highlight java %} Table left = tableEnv.fromDataSet(ds1, "a, b, c"); Table right = tableEnv.fromDataSet(ds2, "d, e, f"); Table leftOuterResult = left.leftOuterJoin(right, "a = d").select("a, b, e"); Table rightOuterResult = left.rightOuterJoin(right, "a = d").select("a, b, e"); Table fullOuterResult = left.fullOuterJoin(right, "a = d").select("a, b, e"); {% endhighlight %}Note: For streaming queries the required state to compute the query result might grow infinitely depending on the number of distinct input rows. Please provide a query configuration with valid retention interval to prevent excessive state size. See Query Configuration for details. |
Time-windowed Join Batch Streaming |
Note: Time-windowed joins are a subset of regular joins that can be processed in a streaming fashion. A time-windowed join requires at least one equi-join predicate and a join condition that bounds the time on both sides. Such a condition can be defined by two appropriate range predicates ( For example, the following predicates are valid window join conditions:
|
Inner Join with Table Function Batch Streaming |
Joins a table with the results of a table function. Each row of the left (outer) table is joined with all rows produced by the corresponding call of the table function. A row of the left (outer) table is dropped, if its table function call returns an empty result. {% highlight java %} // register User-Defined Table Function TableFunction |
Left Outer Join with Table Function Batch Streaming |
Joins a table with the results of a table function. Each row of the left (outer) table is joined with all rows produced by the corresponding call of the table function. If a table function call returns an empty result, the corresponding outer row is preserved and the result padded with null values. Note: Currently, the predicate of a table function left outer join can only be empty or literal |
Join with Temporal Table Streaming |
Temporal tables are tables that track changes over time. A temporal table function provides access to the state of a temporal table at a specific point in time. The syntax to join a table with a temporal table function is the same as in Inner Join with Table Function. Currently only inner joins with temporal tables are supported. {% highlight java %} Table ratesHistory = tableEnv.scan("RatesHistory"); // register temporal table function with a time attribute and primary key TemporalTableFunction rates = ratesHistory.createTemporalTableFunction( "r_proctime", "r_currency"); tableEnv.registerFunction("rates", rates); // join with "Orders" based on the time attribute and key Table orders = tableEnv.scan("Orders"); Table result = orders .joinLateral("rates(o_proctime)", "o_currency = r_currency") {% endhighlight %}For more information please check the more detailed temporal tables concept description. |
Operators | Description |
---|---|
Inner Join Batch Streaming |
Similar to a SQL JOIN clause. Joins two tables. Both tables must have distinct field names and at least one equality join predicate must be defined through join operator or using a where or filter operator. {% highlight scala %} val left = ds1.toTable(tableEnv, 'a, 'b, 'c) val right = ds2.toTable(tableEnv, 'd, 'e, 'f) val result = left.join(right).where('a === 'd).select('a, 'b, 'e) {% endhighlight %}Note: For streaming queries the required state to compute the query result might grow infinitely depending on the number of distinct input rows. Please provide a query configuration with valid retention interval to prevent excessive state size. See Query Configuration for details. |
Outer Join Batch Streaming Result Updating |
Similar to SQL LEFT/RIGHT/FULL OUTER JOIN clauses. Joins two tables. Both tables must have distinct field names and at least one equality join predicate must be defined. {% highlight scala %} val left = tableEnv.fromDataSet(ds1, 'a, 'b, 'c) val right = tableEnv.fromDataSet(ds2, 'd, 'e, 'f) val leftOuterResult = left.leftOuterJoin(right, 'a === 'd).select('a, 'b, 'e) val rightOuterResult = left.rightOuterJoin(right, 'a === 'd).select('a, 'b, 'e) val fullOuterResult = left.fullOuterJoin(right, 'a === 'd).select('a, 'b, 'e) {% endhighlight %}Note: For streaming queries the required state to compute the query result might grow infinitely depending on the number of distinct input rows. Please provide a query configuration with valid retention interval to prevent excessive state size. See Query Configuration for details. |
Time-windowed Join Batch Streaming |
Note: Time-windowed joins are a subset of regular joins that can be processed in a streaming fashion. A time-windowed join requires at least one equi-join predicate and a join condition that bounds the time on both sides. Such a condition can be defined by two appropriate range predicates ( For example, the following predicates are valid window join conditions:
|
Inner Join with Table Function Batch Streaming |
Joins a table with the results of a table function. Each row of the left (outer) table is joined with all rows produced by the corresponding call of the table function. A row of the left (outer) table is dropped, if its table function call returns an empty result. {% highlight scala %} // instantiate User-Defined Table Function val split: TableFunction[_] = new MySplitUDTF() // join val result: Table = table .joinLateral(split('c) as ('s, 't, 'v)) .select('a, 'b, 's, 't, 'v) {% endhighlight %} |
Left Outer Join with Table Function Batch Streaming |
Joins a table with the results of a table function. Each row of the left (outer) table is joined with all rows produced by the corresponding call of the table function. If a table function call returns an empty result, the corresponding outer row is preserved and the result padded with null values. Note: Currently, the predicate of a table function left outer join can only be empty or literal |
Join with Temporal Table Streaming |
Temporal tables are tables that track their changes over time. A temporal table function provides access to the state of a temporal table at a specific point in time. The syntax to join a table with a temporal table function is the same as in Inner Join with Table Function. Currently only inner joins with temporal tables are supported. {% highlight scala %} val ratesHistory = tableEnv.scan("RatesHistory") // register temporal table function with a time attribute and primary key val rates = ratesHistory.createTemporalTableFunction('r_proctime, 'r_currency) // join with "Orders" based on the time attribute and key val orders = tableEnv.scan("Orders") val result = orders .joinLateral(rates('o_rowtime), 'r_currency === 'o_currency) {% endhighlight %}For more information please check the more detailed temporal tables concept description. |
Operators | Description |
---|---|
Union Batch |
Similar to a SQL UNION clause. Unions two tables with duplicate records removed. Both tables must have identical field types. {% highlight java %} Table left = tableEnv.fromDataSet(ds1, "a, b, c"); Table right = tableEnv.fromDataSet(ds2, "a, b, c"); Table result = left.union(right); {% endhighlight %} |
UnionAll Batch Streaming |
Similar to a SQL UNION ALL clause. Unions two tables. Both tables must have identical field types. {% highlight java %} Table left = tableEnv.fromDataSet(ds1, "a, b, c"); Table right = tableEnv.fromDataSet(ds2, "a, b, c"); Table result = left.unionAll(right); {% endhighlight %} |
Intersect Batch |
Similar to a SQL INTERSECT clause. Intersect returns records that exist in both tables. If a record is present one or both tables more than once, it is returned just once, i.e., the resulting table has no duplicate records. Both tables must have identical field types. {% highlight java %} Table left = tableEnv.fromDataSet(ds1, "a, b, c"); Table right = tableEnv.fromDataSet(ds2, "d, e, f"); Table result = left.intersect(right); {% endhighlight %} |
IntersectAll Batch |
Similar to a SQL INTERSECT ALL clause. IntersectAll returns records that exist in both tables. If a record is present in both tables more than once, it is returned as many times as it is present in both tables, i.e., the resulting table might have duplicate records. Both tables must have identical field types. {% highlight java %} Table left = tableEnv.fromDataSet(ds1, "a, b, c"); Table right = tableEnv.fromDataSet(ds2, "d, e, f"); Table result = left.intersectAll(right); {% endhighlight %} |
Minus Batch |
Similar to a SQL EXCEPT clause. Minus returns records from the left table that do not exist in the right table. Duplicate records in the left table are returned exactly once, i.e., duplicates are removed. Both tables must have identical field types. {% highlight java %} Table left = tableEnv.fromDataSet(ds1, "a, b, c"); Table right = tableEnv.fromDataSet(ds2, "a, b, c"); Table result = left.minus(right); {% endhighlight %} |
MinusAll Batch |
Similar to a SQL EXCEPT ALL clause. MinusAll returns the records that do not exist in the right table. A record that is present n times in the left table and m times in the right table is returned (n - m) times, i.e., as many duplicates as are present in the right table are removed. Both tables must have identical field types. {% highlight java %} Table left = tableEnv.fromDataSet(ds1, "a, b, c"); Table right = tableEnv.fromDataSet(ds2, "a, b, c"); Table result = left.minusAll(right); {% endhighlight %} |
In Batch Streaming |
Similar to a SQL IN clause. In returns true if an expression exists in a given table sub-query. The sub-query table must consist of one column. This column must have the same data type as the expression. {% highlight java %} Table left = ds1.toTable(tableEnv, "a, b, c"); Table right = ds2.toTable(tableEnv, "a"); // using implicit registration Table result = left.select("a, b, c").where("a.in(" + right + ")"); // using explicit registration tableEnv.registerTable("RightTable", right); Table result = left.select("a, b, c").where("a.in(RightTable)"); {% endhighlight %}Note: For streaming queries the operation is rewritten in a join and group operation. The required state to compute the query result might grow infinitely depending on the number of distinct input rows. Please provide a query configuration with valid retention interval to prevent excessive state size. See Query Configuration for details. |
Operators | Description |
---|---|
Union Batch |
Similar to a SQL UNION clause. Unions two tables with duplicate records removed, both tables must have identical field types. {% highlight scala %} val left = ds1.toTable(tableEnv, 'a, 'b, 'c) val right = ds2.toTable(tableEnv, 'a, 'b, 'c) val result = left.union(right) {% endhighlight %} |
UnionAll Batch Streaming |
Similar to a SQL UNION ALL clause. Unions two tables, both tables must have identical field types. {% highlight scala %} val left = ds1.toTable(tableEnv, 'a, 'b, 'c) val right = ds2.toTable(tableEnv, 'a, 'b, 'c) val result = left.unionAll(right) {% endhighlight %} |
Intersect Batch |
Similar to a SQL INTERSECT clause. Intersect returns records that exist in both tables. If a record is present in one or both tables more than once, it is returned just once, i.e., the resulting table has no duplicate records. Both tables must have identical field types. {% highlight scala %} val left = ds1.toTable(tableEnv, 'a, 'b, 'c) val right = ds2.toTable(tableEnv, 'e, 'f, 'g) val result = left.intersect(right) {% endhighlight %} |
IntersectAll Batch |
Similar to a SQL INTERSECT ALL clause. IntersectAll returns records that exist in both tables. If a record is present in both tables more than once, it is returned as many times as it is present in both tables, i.e., the resulting table might have duplicate records. Both tables must have identical field types. {% highlight scala %} val left = ds1.toTable(tableEnv, 'a, 'b, 'c) val right = ds2.toTable(tableEnv, 'e, 'f, 'g) val result = left.intersectAll(right) {% endhighlight %} |
Minus Batch |
Similar to a SQL EXCEPT clause. Minus returns records from the left table that do not exist in the right table. Duplicate records in the left table are returned exactly once, i.e., duplicates are removed. Both tables must have identical field types. {% highlight scala %} val left = ds1.toTable(tableEnv, 'a, 'b, 'c) val right = ds2.toTable(tableEnv, 'a, 'b, 'c) val result = left.minus(right) {% endhighlight %} |
MinusAll Batch |
Similar to a SQL EXCEPT ALL clause. MinusAll returns the records that do not exist in the right table. A record that is present n times in the left table and m times in the right table is returned (n - m) times, i.e., as many duplicates as are present in the right table are removed. Both tables must have identical field types. {% highlight scala %} val left = ds1.toTable(tableEnv, 'a, 'b, 'c) val right = ds2.toTable(tableEnv, 'a, 'b, 'c) val result = left.minusAll(right) {% endhighlight %} |
In Batch Streaming |
Similar to a SQL IN clause. In returns true if an expression exists in a given table sub-query. The sub-query table must consist of one column. This column must have the same data type as the expression. {% highlight scala %} val left = ds1.toTable(tableEnv, 'a, 'b, 'c) val right = ds2.toTable(tableEnv, 'a) val result = left.select('a, 'b, 'c).where('a.in(right)) {% endhighlight %}Note: For streaming queries the operation is rewritten in a join and group operation. The required state to compute the query result might grow infinitely depending on the number of distinct input rows. Please provide a query configuration with valid retention interval to prevent excessive state size. See Query Configuration for details. |
Operators | Description |
---|---|
Order By Batch |
Similar to a SQL ORDER BY clause. Returns records globally sorted across all parallel partitions. {% highlight java %} Table in = tableEnv.fromDataSet(ds, "a, b, c"); Table result = in.orderBy("a.asc"); {% endhighlight %} |
Offset & Fetch Batch |
Similar to the SQL OFFSET and FETCH clauses. Offset and Fetch limit the number of records returned from a sorted result. Offset and Fetch are technically part of the Order By operator and thus must be preceded by it. {% highlight java %} Table in = tableEnv.fromDataSet(ds, "a, b, c"); // returns the first 5 records from the sorted result Table result1 = in.orderBy("a.asc").fetch(5); // skips the first 3 records and returns all following records from the sorted result Table result2 = in.orderBy("a.asc").offset(3); // skips the first 10 records and returns the next 5 records from the sorted result Table result3 = in.orderBy("a.asc").offset(10).fetch(5); {% endhighlight %} |
Operators | Description |
---|---|
Order By Batch |
Similar to a SQL ORDER BY clause. Returns records globally sorted across all parallel partitions. {% highlight scala %} val in = ds.toTable(tableEnv, 'a, 'b, 'c) val result = in.orderBy('a.asc) {% endhighlight %} |
Offset & Fetch Batch |
Similar to the SQL OFFSET and FETCH clauses. Offset and Fetch limit the number of records returned from a sorted result. Offset and Fetch are technically part of the Order By operator and thus must be preceded by it. {% highlight scala %} val in = ds.toTable(tableEnv, 'a, 'b, 'c) // returns the first 5 records from the sorted result val result1: Table = in.orderBy('a.asc).fetch(5) // skips the first 3 records and returns all following records from the sorted result val result2: Table = in.orderBy('a.asc).offset(3) // skips the first 10 records and returns the next 5 records from the sorted result val result3: Table = in.orderBy('a.asc).offset(10).fetch(5) {% endhighlight %} |
Operators | Description |
---|---|
Insert Into Batch Streaming |
Similar to the INSERT INTO clause in a SQL query. Performs a insertion into a registered output table. Output tables must be registered in the TableEnvironment (see Register a TableSink). Moreover, the schema of the registered table must match the schema of the query. {% highlight java %} Table orders = tableEnv.scan("Orders"); orders.insertInto("OutOrders"); {% endhighlight %} |
Operators | Description |
---|---|
Insert Into Batch Streaming |
Similar to the INSERT INTO clause in a SQL query. Performs a insertion into a registered output table. Output tables must be registered in the TableEnvironment (see Register a TableSink). Moreover, the schema of the registered table must match the schema of the query. {% highlight scala %} val orders: Table = tableEnv.scan("Orders") orders.insertInto("OutOrders") {% endhighlight %} |
Method | Description |
---|---|
over |
Defines the length the window, either as time or row-count interval. |
on |
The time attribute to group (time interval) or sort (row count) on. For batch queries this might be any Long or Timestamp attribute. For streaming queries this must be a declared event-time or processing-time time attribute. |
as |
Assigns an alias to the window. The alias is used to reference the window in the following groupBy() clause and optionally to select window properties such as window start, end, or rowtime timestamps in the select() clause. |
Method | Description |
---|---|
over |
Defines the length of the window, either as time or row-count interval. |
every |
Defines the slide interval, either as time or row-count interval. The slide interval must be of the same type as the size interval. |
on |
The time attribute to group (time interval) or sort (row count) on. For batch queries this might be any Long or Timestamp attribute. For streaming queries this must be a declared event-time or processing-time time attribute. |
as |
Assigns an alias to the window. The alias is used to reference the window in the following groupBy() clause and optionally to select window properties such as window start, end, or rowtime timestamps in the select() clause. |
Method | Description |
---|---|
withGap |
Defines the gap between two windows as time interval. |
on |
The time attribute to group (time interval) or sort (row count) on. For batch queries this might be any Long or Timestamp attribute. For streaming queries this must be a declared event-time or processing-time time attribute. |
as |
Assigns an alias to the window. The alias is used to reference the window in the following groupBy() clause and optionally to select window properties such as window start, end, or rowtime timestamps in the select() clause. |
Method | Required | Description |
---|---|---|
partitionBy |
Optional |
Defines a partitioning of the input on one or more attributes. Each partition is individually sorted and aggregate functions are applied to each partition separately. Note: In streaming environments, over window aggregates can only be computed in parallel if the window includes a partition by clause. Without |
orderBy |
Required |
Defines the order of rows within each partition and thereby the order in which the aggregate functions are applied to rows. Note: For streaming queries this must be a declared event-time or processing-time time attribute. Currently, only a single sort attribute is supported. |
preceding |
Optional |
Defines the interval of rows that are included in the window and precede the current row. The interval can either be specified as time or row-count interval. Bounded over windows are specified with the size of the interval, e.g., Unbounded over windows are specified using a constant, i.e., If the |
following |
Optional |
Defines the window interval of rows that are included in the window and follow the current row. The interval must be specified in the same unit as the preceding interval (time or row-count). At the moment, over windows with rows following the current row are not supported. Instead you can specify one of two constants:
If the |
as |
Required |
Assigns an alias to the over window. The alias is used to reference the over window in the following |
Operators | Description |
---|---|
Map Batch Streaming |
Performs a map operation with a user-defined scalar function or built-in scalar function. The output will be flattened if the output type is a composite type. {% highlight java %} ScalarFunction func = new MyMapFunction(); tableEnv.registerFunction("func", func); Table table = input .map(func("c")).as("a, b") {% endhighlight %} |
Operators | Description |
---|---|
Map Batch Streaming |
Performs a map operation with a user-defined scalar function or built-in scalar function. The output will be flattened if the output type is a composite type. {% highlight scala %} val func: ScalarFunction = new MyMapFunction() val table = input .map(func('c)).as('a, 'b) {% endhighlight %} |