[FLINK-7234] [docs] Fix CombineHint documentation
The CombineHint documentation applies to DataSet#reduce, not
DataSet#reduceGroup, and should also be noted for DataSet#distinct. Also
correct the usage: the CombineHint is set with setCombineHint rather than
passed alongside the user-defined function parameter.

This closes #4372
greghogan committed Jul 26, 2017
1 parent 8695a21 commit 4a88f65
Showing 2 changed files with 16 additions and 11 deletions.
docs/dev/batch/index.md (24 changes: 15 additions and 9 deletions)
@@ -205,20 +205,25 @@ data.filter(new FilterFunction<Integer>() {
 <td><strong>Reduce</strong></td>
 <td>
 <p>Combines a group of elements into a single element by repeatedly combining two elements
-into one. Reduce may be applied on a full data set, or on a grouped data set.</p>
+into one. Reduce may be applied on a full data set or on a grouped data set.</p>
 {% highlight java %}
 data.reduce(new ReduceFunction<Integer> {
 public Integer reduce(Integer a, Integer b) { return a + b; }
 });
 {% endhighlight %}
+<p>If the reduce was applied to a grouped data set then you can specify the way that the
+runtime executes the combine phase of the reduce by supplying a <code>CombineHint</code> to
+<code>setCombineHint</code>. The hash-based strategy should be faster in most cases,
+especially if the number of different keys is small compared to the number of input
+elements (eg. 1/10).</p>
 </td>
 </tr>

 <tr>
 <td><strong>ReduceGroup</strong></td>
 <td>
 <p>Combines a group of elements into one or more elements. ReduceGroup may be applied on a
-full data set, or on a grouped data set.</p>
+full data set or on a grouped data set.</p>
 {% highlight java %}
 data.reduceGroup(new GroupReduceFunction<Integer, Integer> {
 public void reduce(Iterable<Integer> values, Collector<Integer> out) {
@@ -230,10 +235,6 @@ data.reduceGroup(new GroupReduceFunction<Integer, Integer> {
 }
 });
 {% endhighlight %}
-<p>If the reduce was applied to a grouped data set, you can specify the way that the
-runtime executes the combine phase of the reduce via supplying a CombineHint as a second
-parameter. The hash-based strategy should be faster in most cases, especially if the
-number of different keys is small compared to the number of input elements (eg. 1/10).</p>
 </td>
 </tr>
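For reference, the corrected usage attaches the hint to the operator returned by `reduce` on a grouped data set, for example `input.groupBy(0).reduce(fn).setCombineHint(CombineHint.HASH)`, where `input` and `fn` are placeholders. To show what the two strategies compute, here is a plain-Java sketch with no Flink dependency of a combine phase for a grouped sum on a single partition. The class and method names are hypothetical and this is not Flink's combiner implementation: the sort-based variant sorts by key and folds adjacent records, the hash-based variant keeps one running partial result per key in a hash table, and both yield the same partial aggregates.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical illustration of a combine phase for groupBy(key).reduce(fn)
// on one partition. Not Flink's implementation.
public class CombineSketch {

    // Sort-based combine: sort records by key, then fold adjacent records
    // that share a key into a single partial result.
    public static List<Map.Entry<String, Integer>> sortCombine(
            List<Map.Entry<String, Integer>> records, BinaryOperator<Integer> reduce) {
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(records);
        sorted.sort(Map.Entry.comparingByKey());
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        String currentKey = null;
        Integer acc = null;
        for (Map.Entry<String, Integer> r : sorted) {
            if (currentKey != null && currentKey.equals(r.getKey())) {
                acc = reduce.apply(acc, r.getValue()); // same key: keep folding
            } else {
                if (currentKey != null) {
                    out.add(Map.entry(currentKey, acc)); // key changed: emit previous group
                }
                currentKey = r.getKey();
                acc = r.getValue();
            }
        }
        if (currentKey != null) {
            out.add(Map.entry(currentKey, acc)); // emit the last group
        }
        return out;
    }

    // Hash-based combine: one running partial result per key in a hash
    // table; the input never has to be sorted.
    public static Map<String, Integer> hashCombine(
            List<Map.Entry<String, Integer>> records, BinaryOperator<Integer> reduce) {
        Map<String, Integer> partials = new HashMap<>();
        for (Map.Entry<String, Integer> r : records) {
            partials.merge(r.getKey(), r.getValue(), reduce);
        }
        return partials;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> records = List.of(
            Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3),
            Map.entry("c", 4), Map.entry("b", 5), Map.entry("a", 6));
        System.out.println(sortCombine(records, Integer::sum)); // [a=10, b=7, c=4]
        System.out.println(hashCombine(records, Integer::sum));
    }
}
```

The hash variant skips the sort, and its table stays small when there are few distinct keys relative to the input, which is consistent with the documentation's advice to prefer the hash-based strategy in that case.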

@@ -260,9 +261,14 @@ DataSet<Tuple3<Integer, String, Double>> output = input.sum(0).andMin(2);
 <td>
 <p>Returns the distinct elements of a data set. It removes the duplicate entries
 from the input DataSet, with respect to all fields of the elements, or a subset of fields.</p>
-{% highlight java %}
-data.distinct();
-{% endhighlight %}
+{% highlight java %}
+data.distinct();
+{% endhighlight %}
+<p>Distinct is implemented using a reduce function. You can specify the way that the
+runtime executes the combine phase of the reduce by supplying a <code>CombineHint</code> to
+<code>setCombineHint</code>. The hash-based strategy should be faster in most cases,
+especially if the number of different keys is small compared to the number of input
+elements (eg. 1/10).</p>
 </td>
 </tr>
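Since the new text says distinct is implemented with a reduce function, the same hint can be set on it, e.g. `data.distinct().setCombineHint(CombineHint.HASH)`. A minimal plain-Java sketch of that idea, assuming two hypothetical input partitions and a reduce that keeps the first of two elements with equal keys: each partition is combined with a hash table, the partial results are concatenated as a stand-in for the shuffle, and a final reduce removes duplicates across partitions. The key function covers both cases in the docs, all fields (the element itself) or a subset of fields. All names are hypothetical, not Flink API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: distinct() as a keyed reduce whose reduce function
// keeps the first of two elements with equal keys. Not Flink's implementation.
public class DistinctSketch {

    // Hash-based combine on one partition: keep one element per key,
    // dropping later duplicates (the "reduce" is (a, b) -> a).
    public static <T, K> List<T> hashCombineDistinct(List<T> partition, Function<T, K> keyOf) {
        Map<K, T> firstByKey = new LinkedHashMap<>(); // preserves first-seen order
        for (T element : partition) {
            firstByKey.putIfAbsent(keyOf.apply(element), element);
        }
        return new ArrayList<>(firstByKey.values());
    }

    // Combine every partition locally, concatenate the partial results
    // (standing in for the shuffle), then deduplicate across partitions.
    public static <T, K> List<T> distinct(List<List<T>> partitions, Function<T, K> keyOf) {
        List<T> shuffled = new ArrayList<>();
        for (List<T> partition : partitions) {
            shuffled.addAll(hashCombineDistinct(partition, keyOf));
        }
        return hashCombineDistinct(shuffled, keyOf);
    }

    public static void main(String[] args) {
        List<List<String>> partitions = List.of(
            List.of("a", "b", "a", "c"),
            List.of("c", "d", "a"));
        // Distinct with respect to all fields: the key is the element itself.
        System.out.println(distinct(partitions, s -> s)); // [a, b, c, d]
    }
}
```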

@@ -79,8 +79,7 @@ public enum CombineHint {
 HASH,

 /**
- * Disable the use of a combiner. This can be faster in cases when the number of different keys
- * is very small compared to the number of input elements (eg. 1/100).
+ * Disable the use of a combiner.
 */
 NONE
 }
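The updated Javadoc now only states that `NONE` disables the combiner. A rough way to see what that gives up is to count the records each partition hands to the shuffle for a keyed reduce. The plain-Java sketch below uses hypothetical names and no Flink classes: with a combiner, a partition emits one partial result per distinct key it holds; with `CombineHint.NONE`, every input record is shuffled.

```java
import java.util.HashSet;
import java.util.List;

// Hypothetical sketch: records sent to the shuffle by a keyed reduce,
// with and without a local combine phase. Not Flink's implementation.
public class ShuffleVolumeSketch {

    // With a combiner, each partition emits one partial result per distinct key.
    public static int recordsShuffledWithCombiner(List<List<String>> partitionKeys) {
        int total = 0;
        for (List<String> keys : partitionKeys) {
            total += new HashSet<>(keys).size();
        }
        return total;
    }

    // Without a combiner (the NONE case), every input record is shuffled.
    public static int recordsShuffledWithoutCombiner(List<List<String>> partitionKeys) {
        int total = 0;
        for (List<String> keys : partitionKeys) {
            total += keys.size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> partitionKeys = List.of(
            List.of("a", "a", "a", "b"),
            List.of("b", "b", "c"));
        System.out.println(recordsShuffledWithCombiner(partitionKeys));    // 4
        System.out.println(recordsShuffledWithoutCombiner(partitionKeys)); // 7
    }
}
```

When almost every key is unique, the two counts are nearly equal, so the combine phase does work without shrinking the shuffle; that is the kind of input where disabling it may be worthwhile.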