[FLINK-17285][python][docs] Translate "Python Table API" page into Chinese (data type, metrics, etc)

This closes apache#13116.
billyrrr authored and dianfu committed Aug 17, 2020
1 parent 29bd21a commit b757816
Showing 5 changed files with 83 additions and 90 deletions.
39 changes: 18 additions & 21 deletions docs/dev/table/python/conversion_of_pandas.zh.md
@@ -22,60 +22,57 @@ specific language governing permissions and limitations
under the License.
-->

Conversions between PyFlink Table and Pandas DataFrame are supported.

* This will be replaced by the TOC
{:toc}

## Convert Pandas DataFrame to PyFlink Table

A PyFlink Table can be created from a Pandas DataFrame. Internally, the Pandas DataFrame is serialized
into the Arrow columnar format on the client side, and the serialized data is deserialized and processed
by the Arrow source during execution. The Arrow source can also be used in streaming jobs; it handles
checkpoints properly and provides exactly-once guarantees.

The following example shows how to create a PyFlink Table from a Pandas DataFrame:

{% highlight python %}
import pandas as pd
import numpy as np

from pyflink.table import DataTypes

# Create a Pandas DataFrame
pdf = pd.DataFrame(np.random.rand(1000, 2))

# Create a PyFlink Table from a Pandas DataFrame
table = t_env.from_pandas(pdf)

# Create a PyFlink Table from a Pandas DataFrame with the specified column names
table = t_env.from_pandas(pdf, ['f0', 'f1'])

# Create a PyFlink Table from a Pandas DataFrame with the specified column types
table = t_env.from_pandas(pdf, [DataTypes.DOUBLE(), DataTypes.DOUBLE()])

# Create a PyFlink Table from a Pandas DataFrame with the specified row type
table = t_env.from_pandas(pdf,
                          DataTypes.ROW([DataTypes.FIELD("f0", DataTypes.DOUBLE()),
                                         DataTypes.FIELD("f1", DataTypes.DOUBLE())]))
{% endhighlight %}

## Convert PyFlink Table to Pandas DataFrame

A PyFlink Table can also be converted to a Pandas DataFrame. Internally, the results of the table are
materialized and serialized into multiple Arrow batches of the Arrow columnar format on the client side. The maximum Arrow batch size
is determined by the config option [python.fn-execution.arrow.batch.size]({{ site.baseurl }}/zh/dev/table/python/python_config.html#python-fn-execution-arrow-batch-size).
The serialized data is then converted to a Pandas DataFrame. This collects the content of the table to
the client side, so please make sure that the content of the table fits in memory before calling this method.
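
For example, this option can be set on the table environment before calling `to_pandas` (a minimal sketch; the value 10000 is illustrative, not a recommendation from this page):

{% highlight python %}
# Adjust the maximum Arrow batch size used when serializing table results.
# Assumption: t_env is an existing TableEnvironment; 10000 is an illustrative value.
t_env.get_config().get_configuration().set_string(
    "python.fn-execution.arrow.batch.size", "10000")
{% endhighlight %}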

The following example shows how to convert a PyFlink Table to a Pandas DataFrame:

{% highlight python %}
import pandas as pd
import numpy as np

# Create a PyFlink Table
pdf = pd.DataFrame(np.random.rand(1000, 2))
table = t_env.from_pandas(pdf, ["a", "b"]).filter("a > 0.5")

# Convert the PyFlink Table to a Pandas DataFrame
pdf = table.to_pandas()
{% endhighlight %}
2 changes: 1 addition & 1 deletion docs/dev/table/python/metrics.md
@@ -146,7 +146,7 @@ You can refer to the Java metric document for more details on [Scope definition]

### User Scope

You can define a user scope by calling `MetricGroup.add_group(key: str, value: str = None)`. If value is not None, this creates a new key-value MetricGroup pair. The key group is added to this group's sub-groups, while the value group is added to the key group's sub-groups. In this case, the value group is returned and a user variable is defined.

<div class="codetabs" markdown="1">
<div data-lang="python" markdown="1">
70 changes: 36 additions & 34 deletions docs/dev/table/python/metrics.zh.md
@@ -22,24 +22,24 @@ specific language governing permissions and limitations
under the License.
-->

PyFlink exposes a metric system that allows gathering and exposing metrics to external systems.

* This will be replaced by the TOC
{:toc}

## Registering metrics

You can access the metric system from a [User-defined Function]({{ site.baseurl }}/zh/dev/table/python/python_udfs.html) by calling `function_context.get_metric_group()` in the `open` method.
The `get_metric_group()` method returns a `MetricGroup` object on which you can create and register new metrics.

### Metric types

PyFlink supports `Counters`, `Gauges`, `Distributions` and `Meters`.

#### Counter

A `Counter` is used to count something. The current value can be incremented or decremented using `inc()`/`inc(n: int)` or `dec()`/`dec(n: int)`.
You can create and register a `Counter` by calling `counter(name: str)` on a `MetricGroup`.

<div class="codetabs" markdown="1">
<div data-lang="python" markdown="1">
@@ -63,9 +63,9 @@ class MyUDF(ScalarFunction):

</div>
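
For reference, a minimal counter sketch inside a scalar UDF — the class name, `udf` declaration and data types here are illustrative assumptions, not part of the collapsed example above:

{% highlight python %}
from pyflink.table import DataTypes
from pyflink.table.udf import ScalarFunction, udf

class CounterUDF(ScalarFunction):  # illustrative name

    def __init__(self):
        self.counter = None

    def open(self, function_context):
        super().open(function_context)
        # Create and register a counter named "my_counter"
        self.counter = function_context.get_metric_group().counter("my_counter")

    def eval(self, i):
        self.counter.inc(i)  # increment the counter by i
        return i

my_udf = udf(CounterUDF(), input_types=[DataTypes.BIGINT()], result_type=DataTypes.BIGINT())
{% endhighlight %}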

#### Gauge

A `Gauge` provides a value on demand. You can register a gauge by calling `gauge(name: str, obj: Callable[[], int])` on a `MetricGroup`. The Callable object will be used to report the values. Gauge metrics are restricted to integer-only values.

<div class="codetabs" markdown="1">
<div data-lang="python" markdown="1">
@@ -88,9 +88,9 @@ class MyUDF(ScalarFunction):

</div>
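
A minimal gauge sketch along the same lines (names and types are illustrative assumptions):

{% highlight python %}
from pyflink.table import DataTypes
from pyflink.table.udf import ScalarFunction, udf

class GaugeUDF(ScalarFunction):  # illustrative name

    def __init__(self):
        self.length = 0

    def open(self, function_context):
        super().open(function_context)
        # The callable is invoked each time the gauge value is reported
        function_context.get_metric_group().gauge("my_gauge", lambda: self.length)

    def eval(self, i):
        self.length = i  # the value reported on the next gauge read
        return i

my_udf = udf(GaugeUDF(), input_types=[DataTypes.BIGINT()], result_type=DataTypes.BIGINT())
{% endhighlight %}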

#### Distribution

A `Distribution` reports information (sum, count, min, max and mean) about the distribution of reported values. The value can be updated using `update(n: int)`. You can register a distribution by calling `distribution(name: str)` on a `MetricGroup`. Distribution metrics are restricted to integer-only distributions.

<div class="codetabs" markdown="1">
<div data-lang="python" markdown="1">
@@ -113,9 +113,10 @@ class MyUDF(ScalarFunction):

</div>
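
A minimal distribution sketch (names and types are illustrative assumptions):

{% highlight python %}
from pyflink.table import DataTypes
from pyflink.table.udf import ScalarFunction, udf

class DistributionUDF(ScalarFunction):  # illustrative name

    def __init__(self):
        self.distribution = None

    def open(self, function_context):
        super().open(function_context)
        # Create and register a distribution named "my_distribution"
        self.distribution = function_context.get_metric_group().distribution("my_distribution")

    def eval(self, i):
        self.distribution.update(i)  # record i in the distribution
        return i

my_udf = udf(DistributionUDF(), input_types=[DataTypes.BIGINT()], result_type=DataTypes.BIGINT())
{% endhighlight %}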

#### Meter

A `Meter` measures an average throughput. An occurrence of an event can be registered with the `mark_event()` method; the occurrence of multiple events at the same time can be registered with the `mark_event(n: int)` method. You can register a meter by calling `meter(self, name: str, time_span_in_seconds: int = 60)` on a `MetricGroup`. The default value of `time_span_in_seconds` is 60.

<div class="codetabs" markdown="1">
<div data-lang="python" markdown="1">
@@ -129,7 +130,7 @@ class MyUDF(ScalarFunction):

def open(self, function_context):
super().open(function_context)
# an average rate of events per second over 120s, default is 60s.
self.meter = function_context.get_metric_group().meter("my_meter", time_span_in_seconds=120)

def eval(self, i):
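# The body below is a plausible completion (assumption), not shown in the diff:
self.meter.mark_event(i)  # register i occurrences of the event
return i - 1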
@@ -140,13 +141,14 @@ class MyUDF(ScalarFunction):

</div>

## Scope

You can refer to the Java metric document for more details on [Scope definition]({{ site.baseurl }}/zh/monitoring/metrics.html#Scope).

### User Scope

You can define a user scope by calling `MetricGroup.add_group(key: str, value: str = None)`. If value is not None, this creates a new key-value MetricGroup pair. The key group is added to this group's sub-groups, while the value group is added to the key group's sub-groups. In this case, the value group is returned and a user variable is defined.

<div class="codetabs" markdown="1">
<div data-lang="python" markdown="1">
@@ -167,19 +169,19 @@ function_context

</div>
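
A minimal sketch, reusing the `function_context` variable from the UDF examples above: the counter is registered under the key/value user scope, and the value group defines a user variable.

{% highlight python %}
function_context \
    .get_metric_group() \
    .add_group("my_metrics_key", "my_metrics_value") \
    .counter("my_counter")
{% endhighlight %}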

### System Scope

You can refer to the Java metric document for more details on [System Scope]({{ site.baseurl }}/zh/monitoring/metrics.html#system-scope).

### List of all Variables

You can refer to the Java metric document for more details on [List of all Variables]({{ site.baseurl }}/zh/monitoring/metrics.html#list-of-all-variables).

### User Variables

You can define a user variable by calling `MetricGroup.add_group(key: str, value: str = None)` and specifying the value parameter.

**Important:** User variables cannot be used in scope formats.

<div class="codetabs" markdown="1">
<div data-lang="python" markdown="1">
@@ -193,15 +195,15 @@ function_context

</div>

## Common part between PyFlink and Flink

You can refer to the Java metric document for more details on the following sections:

* [Reporter]({{ site.baseurl }}/zh/monitoring/metrics.html#reporter)
* [System metrics]({{ site.baseurl }}/zh/monitoring/metrics.html#system-metrics)
* [Latency tracking]({{ site.baseurl }}/zh/monitoring/metrics.html#latency-tracking)
* [REST API integration]({{ site.baseurl }}/zh/monitoring/metrics.html#rest-api-integration)
* [Dashboard integration]({{ site.baseurl }}/zh/monitoring/metrics.html#dashboard-integration)


{% top %}
23 changes: 9 additions & 14 deletions docs/dev/table/python/python_types.zh.md
@@ -22,33 +22,28 @@ specific language governing permissions and limitations
under the License.
-->

This page describes the data types supported in PyFlink Table API.

* This will be replaced by the TOC
{:toc}

Data Type
---------

A *data type* describes the logical type of a value in the table ecosystem. It can be used to declare input and/or
output types of Python user-defined functions. Users of the Python Table API work with instances of
`pyflink.table.types.DataType` within the Python Table API or when defining user-defined functions.

A `DataType` instance declares the **logical type** which does not imply a concrete physical representation for transmission
or storage. All pre-defined data types are available in `pyflink.table.types` and can be instantiated with the utility methods
defined in `pyflink.table.types.DataTypes`.
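
For example, a row type with two fields can be instantiated as follows (the field names and types are illustrative):

{% highlight python %}
from pyflink.table.types import DataTypes

# Build a row type from the DataTypes utility methods
row_type = DataTypes.ROW([DataTypes.FIELD("id", DataTypes.BIGINT()),
                          DataTypes.FIELD("score", DataTypes.DOUBLE())])
{% endhighlight %}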

A list of all pre-defined data types can be found [below]({{ site.baseurl }}/zh/dev/table/types.html#list-of-data-types).

Data Type and Python Type Mapping
------------------

A *data type* can be used to declare input and/or output types of Python user-defined functions. The inputs
will be converted to Python objects corresponding to the data type, and the type of the user-defined function's
result must also match the defined data type.

For vectorized Python UDFs, the input types and output type are `pandas.Series`. The element type
of the `pandas.Series` corresponds to the specified data type.
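
A minimal sketch of such a vectorized UDF, assuming the `udf_type="pandas"` declaration style of this PyFlink version (the function and its types are illustrative):

{% highlight python %}
from pyflink.table import DataTypes
from pyflink.table.udf import udf

# i and j arrive as pandas.Series; the result must be a pandas.Series
# of the same length.
@udf(input_types=[DataTypes.BIGINT(), DataTypes.BIGINT()],
     result_type=DataTypes.BIGINT(),
     udf_type="pandas")
def add(i, j):
    return i + j
{% endhighlight %}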

| Data Type | Python Type | Pandas Type |
|:-----------------|:------------|:------------|
