[FLINK-7234] [docs] Fix CombineHint documentation

The CombineHint documentation applies to DataSet#reduce, not DataSet#reduceGroup, and should also be noted for DataSet#distinct. Also correct the described usage: the CombineHint is set with setCombineHint rather than passed alongside the user-defined function parameter. This closes #4372
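The corrected documentation sets the hint on the operator that `reduce` returns, for example `input.groupBy(0).reduce(fn).setCombineHint(CombineHint.HASH)` in the DataSet API (and likewise `input.distinct().setCombineHint(...)`). The sketch below illustrates why the hash strategy helps when there are few distinct keys. It uses only the JDK and is not Flink's implementation. `HashCombineSketch`, `KV` and `hashCombine` are hypothetical names, and a plain `HashMap` stands in for Flink's managed-memory hash table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

/**
 * Hypothetical illustration (not Flink code) of a hash-based combiner: every incoming
 * (key, value) record is merged into a per-key partial result kept in a hash table,
 * so the combiner emits at most one record per distinct key.
 */
public class HashCombineSketch {

    /** Simple key/value pair standing in for a Tuple2<Integer, Integer>. */
    public static final class KV {
        public final int key;
        public final int value;

        public KV(int key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Pre-aggregates records per key with the given reduce function. */
    public static Map<Integer, Integer> hashCombine(List<KV> input, BinaryOperator<Integer> reduceFn) {
        Map<Integer, Integer> partial = new HashMap<>();
        for (KV kv : input) {
            // merge() stores the value for a new key, or applies reduceFn(old, new) otherwise
            partial.merge(kv.key, kv.value, reduceFn);
        }
        return partial;
    }

    public static void main(String[] args) {
        // 1,000 input records spread over only 10 distinct keys
        List<KV> input = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            input.add(new KV(i % 10, 1));
        }
        Map<Integer, Integer> combined = hashCombine(input, Integer::sum);
        System.out.println(input.size() + " records combined into " + combined.size());
        System.out.println("count for key 0: " + combined.get(0));
    }
}
```

Running `main` prints `1000 records combined into 10` and `count for key 0: 100`: only one partial sum per key leaves the combiner instead of 1,000 records, which is the few-keys situation in which the documentation recommends the hash strategy.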
apache · Jul 26, 2017 · 4a88f65
1 parent 8695a21

Showing 2 changed files with 16 additions and 11 deletions.
diff --git a/docs/dev/batch/index.md b/docs/dev/batch/index.md
@@ -205,20 +205,25 @@ data.filter(new FilterFunction<Integer>() {
   <td><strong>Reduce</strong></td>
   <td>
   <p>Combines a group of elements into a single element by repeatedly combining two elements
-  into one. Reduce may be applied on a full data set, or on a grouped data set.</p>
+  into one. Reduce may be applied on a full data set or on a grouped data set.</p>
 {% highlight java %}
 data.reduce(new ReduceFunction<Integer>() {
  public Integer reduce(Integer a, Integer b) { return a + b; }
 });
 {% endhighlight %}
+ <p>If the reduce was applied to a grouped data set, then you can specify the way that the
+ runtime executes the combine phase of the reduce by supplying a <code>CombineHint</code> to
+ <code>setCombineHint</code>. The hash-based strategy should be faster in most cases,
+ especially if the number of different keys is small compared to the number of input
+ elements (e.g. 1/10).</p>
  </td>
  </tr>
 
  <tr>
   <td><strong>ReduceGroup</strong></td>
   <td>
   <p>Combines a group of elements into one or more elements. ReduceGroup may be applied on a
-  full data set, or on a grouped data set.</p>
+  full data set or on a grouped data set.</p>
 {% highlight java %}
 data.reduceGroup(new GroupReduceFunction<Integer, Integer>() {
  public void reduce(Iterable<Integer> values, Collector<Integer> out) {
@@ -230,10 +235,6 @@ data.reduceGroup(new GroupReduceFunction<Integer, Integer> {
  }
 });
 {% endhighlight %}
- <p>If the reduce was applied to a grouped data set, you can specify the way that the
- runtime executes the combine phase of the reduce via supplying a CombineHint as a second
- parameter. The hash-based strategy should be faster in most cases, especially if the
- number of different keys is small compared to the number of input elements (eg. 1/10).</p>
  </td>
  </tr>
 
@@ -260,9 +261,14 @@ DataSet<Tuple3<Integer, String, Double>> output = input.sum(0).andMin(2);
   <td>
   <p>Returns the distinct elements of a data set. It removes the duplicate entries
   from the input DataSet, with respect to all fields of the elements, or a subset of fields.</p>
- {% highlight java %}
-  data.distinct();
- {% endhighlight %}
+{% highlight java %}
+data.distinct();
+{% endhighlight %}
+ <p>Distinct is implemented using a reduce function. You can specify the way that the
+ runtime executes the combine phase of the reduce by supplying a <code>CombineHint</code> to
+ <code>setCombineHint</code>. The hash-based strategy should be faster in most cases,
+ especially if the number of different keys is small compared to the number of input
+ elements (e.g. 1/10).</p>
  </td>
  </tr>
 

diff --git a/flink-core/src/main/java/org/apache/flink/api/common/operators/base/ReduceOperatorBase.java b/flink-core/src/main/java/org/apache/flink/api/common/operators/base/ReduceOperatorBase.java
@@ -79,8 +79,7 @@ public enum CombineHint {
  HASH,
 
  /**
- * Disable the use of a combiner. This can be faster in cases when the number of different keys
- * is very small compared to the number of input elements (eg. 1/100).
+ * Disable the use of a combiner.
  */
  NONE
  }
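The `CombineHint` enum edited above selects how the combine phase runs. Besides `HASH` and `NONE`, which appear in the diff context, the enum also offers a sort-based `SORT` option (and `OPTIMIZER_CHOOSES`). For contrast with hash-based combining, the sketch below shows the sort-based idea: buffer records, sort them by key, then fold each run of equal keys. It is a simplified in-memory illustration with hypothetical names (`SortCombineSketch`, `sortCombine`), not Flink's spilling sort combiner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntBinaryOperator;

/** Hypothetical illustration (not Flink code) of a sort-based combiner. */
public class SortCombineSketch {

    /** Sorts buffered {key, value} pairs by key, then reduces each run of equal keys. */
    public static List<int[]> sortCombine(List<int[]> buffer, IntBinaryOperator reduceFn) {
        List<int[]> sorted = new ArrayList<>(buffer);
        sorted.sort((a, b) -> Integer.compare(a[0], b[0]));
        List<int[]> out = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            int key = sorted.get(i)[0];
            int acc = sorted.get(i)[1];
            int j = i + 1;
            // fold all consecutive records that share the same key
            while (j < sorted.size() && sorted.get(j)[0] == key) {
                acc = reduceFn.applyAsInt(acc, sorted.get(j)[1]);
                j++;
            }
            out.add(new int[] {key, acc});
            i = j;
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> buffer = new ArrayList<>();
        for (int k : new int[] {3, 1, 3, 2, 1, 3}) {
            buffer.add(new int[] {k, 1});
        }
        for (int[] kv : sortCombine(buffer, Integer::sum)) {
            System.out.println(kv[0] + " -> " + kv[1]);
        }
    }
}
```

For the keys `{3, 1, 3, 2, 1, 3}` this prints `1 -> 2`, `2 -> 1` and `3 -> 3`; with `NONE`, all six records would instead be forwarded unchanged. Both combining strategies emit one record per key, but the sort variant pays for sorting each buffer while the hash variant does a single table lookup per record, which is one plausible reason the hash strategy tends to win when keys are few.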