
Commit

[FLINK-22119][hive][doc] Update document for hive dialect
This closes apache#15630
lirui-apache committed Apr 25, 2021
1 parent a4dcd91 commit aea79b9
Showing 4 changed files with 192 additions and 30 deletions.
81 changes: 70 additions & 11 deletions docs/content.zh/docs/connectors/table/hive/hive_dialect.md
@@ -335,26 +335,85 @@ CREATE FUNCTION function_name AS class_name;
DROP FUNCTION [IF EXISTS] function_name;
```

## DML & DQL _`Beta`_

Hive dialect supports the commonly used Hive [DML](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML)
and [DQL](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select). The following lists some of the
HiveQL syntax supported by the Hive dialect.

- [SORT/CLUSTER/DISTRIBUTE BY](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy)
- [Group By](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy)
- [Join](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins)
- [Union](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Union)
- [LATERAL VIEW](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView)
- [Window Functions](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics)
- [SubQueries](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries)
- [CTE](https://cwiki.apache.org/confluence/display/Hive/Common+Table+Expression)
- [INSERT INTO dest schema](https://issues.apache.org/jira/browse/HIVE-9481)
- [Implicit type conversions](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-AllowedImplicitConversions)
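
As an illustration, several of these constructs can appear together in one HiveQL query. This is only a syntax sketch; the table `orders` and its columns are hypothetical:

```sql
-- Hypothetical table: orders(id INT, tags ARRAY<STRING>, amount DOUBLE).
-- Combines a CTE, LATERAL VIEW with the explode UDTF, GROUP BY and CLUSTER BY.
WITH exploded AS (
  SELECT o.id, t.tag, o.amount
  FROM orders o
  LATERAL VIEW explode(o.tags) t AS tag
)
SELECT tag, count(*) AS cnt, sum(amount) AS total
FROM exploded
GROUP BY tag
CLUSTER BY tag;
```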

For better syntax and semantic compatibility, it is highly recommended to use [HiveModule]({{< ref "docs/connectors/table/hive/hive_functions" >}}#use-hive-built-in-functions-via-hivemodule)
and place it first in the module list, so that Hive built-in functions are picked up first during function resolution.
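
For example, the module order can be set up in the SQL client like this (the same statements appear in the full example below):

```sql
-- Load the Hive module and list it before the core module, so that
-- Hive built-in functions are resolved first during function lookup.
LOAD MODULE hive;
USE MODULES hive, core;
```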

Hive dialect no longer supports [Flink SQL queries]({{< ref "docs/dev/table/sql/queries" >}}). Switch to the `default`
dialect if you want to write in Flink syntax.
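
For example:

```sql
-- Statements in Flink syntax fail under the Hive dialect; switch back first.
SET table.sql-dialect=default;
```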

The following is an example of using the Hive dialect.

```bash
Flink SQL> create catalog myhive with ('type' = 'hive', 'hive-conf-dir' = '/opt/hive-conf');
[INFO] Execute statement succeed.

Flink SQL> use catalog myhive;
[INFO] Execute statement succeed.

Flink SQL> load module hive;
[INFO] Execute statement succeed.

Flink SQL> use modules hive,core;
[INFO] Execute statement succeed.

Flink SQL> set table.sql-dialect=hive;
[INFO] Session property has been set.

Flink SQL> select explode(array(1,2,3)); -- call hive udtf
+-----+
| col |
+-----+
| 1 |
| 2 |
| 3 |
+-----+
3 rows in set

Flink SQL> create table tbl (key int,value string);
[INFO] Execute statement succeed.

Flink SQL> insert overwrite table tbl values (5,'e'),(1,'a'),(1,'a'),(3,'c'),(2,'b'),(3,'c'),(3,'c'),(4,'d');
[INFO] Submitting SQL update statement to the cluster...
[INFO] SQL update statement has been successfully submitted to the cluster:

Flink SQL> select * from tbl cluster by key; -- run cluster by
2021-04-22 16:13:57,005 INFO org.apache.hadoop.mapred.FileInputFormat [] - Total input paths to process : 1
+-----+-------+
| key | value |
+-----+-------+
| 1 | a |
| 1 | a |
| 5 | e |
| 2 | b |
| 3 | c |
| 3 | c |
| 3 | c |
| 4 | d |
+-----+-------+
8 rows in set
```
## Notice
The following are some precautions for using the Hive dialect.
- Hive dialect should only be used to process Hive meta objects, and requires the current catalog to be a [HiveCatalog]({{< ref "docs/connectors/table/hive/hive_catalog" >}}).
- Hive dialect only supports 2-part identifiers like `db.table`; identifiers with a catalog name are not supported.
- While all Hive versions support the same syntax, whether a specific feature is available still depends on the [Hive version]({{< ref "docs/connectors/table/hive/overview" >}}#支持的hive版本) you use. For example, updating database location is only supported in Hive-2.4.0 or later.
- Hive and Calcite have different sets of reserved keywords. For example, `default` is a reserved keyword in Calcite but not in Hive. Even with the Hive dialect, such keywords must be quoted with backticks ( ` ) to be used as identifiers.
- Due to expanded query incompatibility, views created in Flink cannot be queried in Hive.
- Use [HiveModule]({{< ref "docs/connectors/table/hive/hive_functions" >}}#use-hive-built-in-functions-via-hivemodule) when executing DML and DQL.
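
A short sketch of the keyword and identifier rules above (the table name `tbl` is hypothetical):

```sql
-- `default` is reserved in Calcite, so back-quote it even under the Hive dialect.
USE `default`;
-- Only 2-part identifiers are accepted; a catalog-qualified name such as
-- myhive.`default`.tbl would be rejected.
SELECT * FROM `default`.tbl;
```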
24 changes: 24 additions & 0 deletions docs/content.zh/docs/connectors/table/hive/overview.md
@@ -127,6 +127,9 @@ export HADOOP_CLASSPATH=`hadoop classpath`
// Hive dependencies
hive-exec-2.3.4.jar
// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar
```
{{< /tab >}}
{{< tab "Hive 1.0.0" >}}
@@ -146,6 +149,9 @@ export HADOOP_CLASSPATH=`hadoop classpath`
orc-core-1.4.3-nohive.jar
aircompressor-0.8.jar // transitive dependency of orc-core
// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar
```
{{< /tab >}}
{{< tab "Hive 1.1.0" >}}
@@ -165,6 +171,9 @@ export HADOOP_CLASSPATH=`hadoop classpath`
orc-core-1.4.3-nohive.jar
aircompressor-0.8.jar // transitive dependency of orc-core
// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar
```
{{< /tab >}}
{{< tab "Hive 1.2.1" >}}
@@ -184,6 +193,9 @@ export HADOOP_CLASSPATH=`hadoop classpath`
orc-core-1.4.3-nohive.jar
aircompressor-0.8.jar // transitive dependency of orc-core
// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar
```
{{< /tab >}}
{{< tab "Hive 2.0.0" >}}
@@ -197,6 +209,9 @@ export HADOOP_CLASSPATH=`hadoop classpath`
// Hive dependencies
hive-exec-2.0.0.jar
// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar
```
{{< /tab >}}
{{< tab "Hive 2.1.0" >}}
@@ -210,6 +225,9 @@ export HADOOP_CLASSPATH=`hadoop classpath`
// Hive dependencies
hive-exec-2.1.0.jar
// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar
```
{{< /tab >}}
{{< tab "Hive 2.2.0" >}}
@@ -227,6 +245,9 @@ export HADOOP_CLASSPATH=`hadoop classpath`
orc-core-1.4.3.jar
aircompressor-0.8.jar // transitive dependency of orc-core
// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar
```
{{< /tab >}}
{{< tab "Hive 3.1.0" >}}
@@ -241,6 +262,9 @@ export HADOOP_CLASSPATH=`hadoop classpath`
hive-exec-3.1.0.jar
libfb303-0.9.3.jar // libfb303 is not packed into hive-exec in some versions, need to add it separately
// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar
```
{{< /tab >}}
{{< /tabs >}}
93 changes: 74 additions & 19 deletions docs/content/docs/connectors/table/hive/hive_dialect.md
@@ -300,8 +300,6 @@ CREATE VIEW [IF NOT EXISTS] view_name [(column_name, ...) ]

#### Alter

##### Rename

```sql
@@ -346,33 +344,90 @@ CREATE FUNCTION function_name AS class_name;
DROP FUNCTION [IF EXISTS] function_name;
```

## DML & DQL _`Beta`_

Hive dialect supports a commonly-used subset of Hive's [DML](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML)
and [DQL](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select). The following lists some examples of
HiveQL supported by the Hive dialect.

- [SORT/CLUSTER/DISTRIBUTE BY](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy)
- [Group By](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy)
- [Join](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins)
- [Union](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Union)
- [LATERAL VIEW](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView)
- [Window Functions](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics)
- [SubQueries](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries)
- [CTE](https://cwiki.apache.org/confluence/display/Hive/Common+Table+Expression)
- [INSERT INTO dest schema](https://issues.apache.org/jira/browse/HIVE-9481)
- [Implicit type conversions](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-AllowedImplicitConversions)
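
As an illustration, several of these constructs can appear together in one HiveQL query. This is only a syntax sketch; the table `orders` and its columns are hypothetical:

```sql
-- Hypothetical table: orders(id INT, tags ARRAY<STRING>, amount DOUBLE).
-- Combines a CTE, LATERAL VIEW with the explode UDTF, GROUP BY and CLUSTER BY.
WITH exploded AS (
  SELECT o.id, t.tag, o.amount
  FROM orders o
  LATERAL VIEW explode(o.tags) t AS tag
)
SELECT tag, count(*) AS cnt, sum(amount) AS total
FROM exploded
GROUP BY tag
CLUSTER BY tag;
```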

In order to have better syntax and semantic compatibility, it's highly recommended to use [HiveModule]({{< ref "docs/connectors/table/hive/hive_functions" >}}#use-hive-built-in-functions-via-hivemodule)
and place it first in the module list, so that Hive built-in functions can be picked up during function resolution.
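
For example, the module order can be set up in the SQL client like this (the same statements appear in the full example below):

```sql
-- Load the Hive module and list it before the core module, so that
-- Hive built-in functions are resolved first during function lookup.
LOAD MODULE hive;
USE MODULES hive, core;
```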

Hive dialect no longer supports [Flink SQL queries]({{< ref "docs/dev/table/sql/queries" >}}). Please switch to `default`
dialect if you'd like to write in Flink syntax.
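
For example:

```sql
-- Statements in Flink syntax fail under the Hive dialect; switch back first.
SET table.sql-dialect=default;
```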

The following is an example of using the Hive dialect to run some queries.

```bash
Flink SQL> create catalog myhive with ('type' = 'hive', 'hive-conf-dir' = '/opt/hive-conf');
[INFO] Execute statement succeed.

Flink SQL> use catalog myhive;
[INFO] Execute statement succeed.

Flink SQL> load module hive;
[INFO] Execute statement succeed.

Flink SQL> use modules hive,core;
[INFO] Execute statement succeed.

Flink SQL> set table.sql-dialect=hive;
[INFO] Session property has been set.

Flink SQL> select explode(array(1,2,3)); -- call hive udtf
+-----+
| col |
+-----+
| 1 |
| 2 |
| 3 |
+-----+
3 rows in set

Flink SQL> create table tbl (key int,value string);
[INFO] Execute statement succeed.

Flink SQL> insert overwrite table tbl values (5,'e'),(1,'a'),(1,'a'),(3,'c'),(2,'b'),(3,'c'),(3,'c'),(4,'d');
[INFO] Submitting SQL update statement to the cluster...
[INFO] SQL update statement has been successfully submitted to the cluster:

Flink SQL> select * from tbl cluster by key; -- run cluster by
2021-04-22 16:13:57,005 INFO org.apache.hadoop.mapred.FileInputFormat [] - Total input paths to process : 1
+-----+-------+
| key | value |
+-----+-------+
| 1 | a |
| 1 | a |
| 5 | e |
| 2 | b |
| 3 | c |
| 3 | c |
| 3 | c |
| 4 | d |
+-----+-------+
8 rows in set
```
## Notice
The following are some precautions for using the Hive dialect.
- Hive dialect should only be used to process Hive meta objects, and requires the current catalog to be a
[HiveCatalog]({{< ref "docs/connectors/table/hive/hive_catalog" >}}).
- Hive dialect only supports 2-part identifiers like `db.table`; identifiers with a catalog name are not supported.
- While all Hive versions support the same syntax, whether a specific feature is available still depends on the
[Hive version]({{< ref "docs/connectors/table/hive/overview" >}}#supported-hive-versions) you use. For example, updating database
location is only supported in Hive-2.4.0 or later.
- Hive and Calcite have different sets of reserved keywords. For example, `default` is a reserved keyword in Calcite and
a non-reserved keyword in Hive. Even with Hive dialect, you have to quote such keywords with backtick ( ` ) in order to
use them as identifiers.
- Due to expanded query incompatibility, views created in Flink cannot be queried in Hive.
- Use [HiveModule]({{< ref "docs/connectors/table/hive/hive_functions" >}}#use-hive-built-in-functions-via-hivemodule)
to run DML and DQL.
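
A short sketch of the keyword and identifier rules above (the table name `tbl` is hypothetical):

```sql
-- `default` is reserved in Calcite, so back-quote it even under the Hive dialect.
USE `default`;
-- Only 2-part identifiers are accepted; a catalog-qualified name such as
-- myhive.`default`.tbl would be rejected.
SELECT * FROM `default`.tbl;
```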
24 changes: 24 additions & 0 deletions docs/content/docs/connectors/table/hive/overview.md
@@ -131,6 +131,9 @@ Please find the required dependencies for different Hive major versions below.
// Hive dependencies
hive-exec-2.3.4.jar
// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar
```
{{< /tab >}}
{{< tab "Hive 1.0.0" >}}
@@ -150,6 +153,9 @@ Please find the required dependencies for different Hive major versions below.
orc-core-1.4.3-nohive.jar
aircompressor-0.8.jar // transitive dependency of orc-core
// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar
```
{{< /tab >}}
{{< tab "Hive 1.1.0" >}}
@@ -169,6 +175,9 @@ Please find the required dependencies for different Hive major versions below.
orc-core-1.4.3-nohive.jar
aircompressor-0.8.jar // transitive dependency of orc-core
// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar
```
{{< /tab >}}
{{< tab "Hive 1.2.1" >}}
@@ -188,6 +197,9 @@ Please find the required dependencies for different Hive major versions below.
orc-core-1.4.3-nohive.jar
aircompressor-0.8.jar // transitive dependency of orc-core
// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar
```
{{< /tab >}}
{{< tab "Hive 2.0.0" >}}
@@ -201,6 +213,9 @@ Please find the required dependencies for different Hive major versions below.
// Hive dependencies
hive-exec-2.0.0.jar
// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar
```
{{< /tab >}}
{{< tab "Hive 2.1.0" >}}
@@ -214,6 +229,9 @@ Please find the required dependencies for different Hive major versions below.
// Hive dependencies
hive-exec-2.1.0.jar
// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar
```
{{< /tab >}}
{{< tab "Hive 2.2.0" >}}
@@ -231,6 +249,9 @@ Please find the required dependencies for different Hive major versions below.
orc-core-1.4.3.jar
aircompressor-0.8.jar // transitive dependency of orc-core
// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar
```
{{< /tab >}}
{{< tab "Hive 3.1.0" >}}
@@ -245,6 +266,9 @@ Please find the required dependencies for different Hive major versions below.
hive-exec-3.1.0.jar
libfb303-0.9.3.jar // libfb303 is not packed into hive-exec in some versions, need to add it separately
// add antlr-runtime if you need to use hive dialect
antlr-runtime-3.5.2.jar
```
{{< /tab >}}
{{< /tabs >}}
