Update doc with Scala First-N and Partition/Rebalance

mbalassi · Sep 29, 2014 · 2b4d779
1 parent 66f236d
Showing 2 changed files with 83 additions and 17 deletions.
diff --git a/docs/dataset_transformations.md b/docs/dataset_transformations.md
@@ -284,7 +284,7 @@ When using Case Classes you can also specify the grouping key using the names of
 
 ~~~scala
 case class MyClass(val a: String, b: Int, c: Double)
-val tuples = DataSet[MyClass]] = // [...]
+val tuples: DataSet[MyClass] = // [...]
 // group on the first and second field
 val reducedTuples = tuples.groupBy("a", "b").reduce { ... }
 ~~~
@@ -1103,15 +1103,11 @@ val unioned = vals1.union(vals2).union(vals3)
 </div>
 </div>
 
-### Rebalance (Java API Only)
-
+### Rebalance
 Evenly rebalances the parallel partitions of a DataSet to eliminate data skew.
-Only Map-like transformations may follow a rebalance transformation, i.e.,
 
-- Map
-- FlatMap
-- Filter
-- MapPartition
+<div class="codetabs" markdown="1">
+<div data-lang="java" markdown="1">
 
 ~~~java
 DataSet<String> in = // [...]
@@ -1120,16 +1116,26 @@ DataSet<Tuple2<String, String>> out = in.rebalance()
  .map(new Mapper());
 ~~~
 
-### Hash-Partition (Java API Only)
+</div>
+<div data-lang="scala" markdown="1">
+
+~~~scala
+val in: DataSet[String] = // [...]
+// rebalance DataSet and apply a Map transformation.
+val out = in.rebalance().map { ... }
+~~~
+
+</div>
+</div>
+
+
+### Hash-Partition
 
 Hash-partitions a DataSet on a given key. 
 Keys can be specified as key-selector functions or field position keys (see [Reduce examples](#reduce-on-grouped-dataset) for how to specify keys).
-Only Map-like transformations may follow a hash-partition transformation, i.e.,
 
-- Map
-- FlatMap
-- Filter
-- MapPartition
+<div class="codetabs" markdown="1">
+<div data-lang="java" markdown="1">
 
 ~~~java
 DataSet<Tuple2<String, Integer>> in = // [...]
@@ -1138,10 +1144,25 @@ DataSet<Tuple2<String, String>> out = in.partitionByHash(0)
  .mapPartition(new PartitionMapper());
 ~~~
 
-### First-n (Java API Only)
+</div>
+<div data-lang="scala" markdown="1">
+
+~~~scala
+val in: DataSet[(String, Int)] = // [...]
+// hash-partition DataSet by String value and apply a MapPartition transformation.
+val out = in.partitionByHash(0).mapPartition { ... }
+~~~
+
+</div>
+</div>
+
+### First-n
 
 Returns the first n (arbitrary) elements of a DataSet. First-n can be applied on a regular DataSet, a grouped DataSet, or a grouped-sorted DataSet. Grouping keys can be specified as key-selector functions or field position keys (see [Reduce examples](#reduce-on-grouped-dataset) for how to specify keys).
 
+<div class="codetabs" markdown="1">
+<div data-lang="java" markdown="1">
+
 ~~~java
 DataSet<Tuple2<String, Integer>> in = // [...]
 // Return the first five (arbitrary) elements of the DataSet
@@ -1155,4 +1176,22 @@ DataSet<Tuple2<String, Integer>> out2 = in.groupBy(0)
 DataSet<Tuple2<String, Integer>> out3 = in.groupBy(0)
  .sortGroup(1, Order.ASCENDING)
  .first(3);
-~~~
+~~~
+
+</div>
+<div data-lang="scala" markdown="1">
+
+~~~scala
+val in: DataSet[(String, Int)] = // [...]
+// Return the first five (arbitrary) elements of the DataSet
+val out1 = in.first(5)
+
+// Return the first two (arbitrary) elements of each String group
+val out2 = in.groupBy(0).first(2)
+
+// Return the first three elements of each String group ordered by the Integer field
+val out3 = in.groupBy(0).sortGroup(1, Order.ASCENDING).first(3)
+~~~
+
+</div>
+</div>
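
Note on the First-n examples above: the following is a minimal plain-Java sketch of what `groupBy(0).sortGroup(1, Order.ASCENDING).first(3)` is described as returning. It has no Flink dependency. The class `FirstNSketch` and the method `firstNPerSortedGroup` are hypothetical names introduced only for illustration, and tuples are modeled as `Map.Entry<String, Integer>`. It models the documented result, not how Flink executes the operator.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FirstNSketch {

    // Hypothetical helper: groups records by their String key, sorts each
    // group by the Integer field in ascending order, and keeps at most n
    // records per group.
    static Map<String, List<Map.Entry<String, Integer>>> firstNPerSortedGroup(
            List<Map.Entry<String, Integer>> in, int n) {
        // Step 1: group by key (models groupBy(0)).
        Map<String, List<Map.Entry<String, Integer>>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : in) {
            groups.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e);
        }
        // Steps 2 and 3: sort each group by value, then truncate to n elements.
        Map<String, List<Map.Entry<String, Integer>>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<Map.Entry<String, Integer>>> g : groups.entrySet()) {
            List<Map.Entry<String, Integer>> sorted = new ArrayList<>(g.getValue());
            sorted.sort(Map.Entry.comparingByValue());
            int limit = Math.min(n, sorted.size());
            out.put(g.getKey(), new ArrayList<>(sorted.subList(0, limit)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> in = List.of(
                Map.entry("a", 3), Map.entry("b", 7),
                Map.entry("a", 1), Map.entry("a", 2), Map.entry("a", 9));
        // Group "a" has four records, so only its three smallest are kept;
        // group "b" has one record, which is returned as-is.
        System.out.println(firstNPerSortedGroup(in, 3));
    }
}
```

Without the sort step, which elements survive in each group is arbitrary, which is why the docs describe plain `first(n)` as returning "arbitrary" elements.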
diff --git a/docs/programming_guide.md b/docs/programming_guide.md
@@ -608,7 +608,7 @@ DataSet<String> result = in.rebalance()
  <tr>
  <td><strong>Hash-Partition</strong></td>
  <td>
- <p>Hash-partitions a data set on a given key. Keys can be specified as key-selector functions or field position keys. Only Map-like transformations may follow a hash-partition transformation. (Java API Only)</p>
+ <p>Hash-partitions a data set on a given key. Keys can be specified as key-selector functions or field position keys.</p>
 {% highlight java %}
 DataSet<Tuple2<String,Integer>> in = // [...]
 DataSet<Integer> result = in.partitionByHash(0)
@@ -804,6 +804,33 @@ val result: DataSet[(Int, String)] = data1.cross(data2)
  <p>Produces the union of two data sets.</p>
 {% highlight scala %}
 data.union(data2)
+{% endhighlight %}
+ </td>
+ </tr>
+ <tr>
+ <td><strong>Hash-Partition</strong></td>
+ <td>
+ <p>Hash-partitions a data set on a given key. Keys can be specified as key-selector functions, tuple positions
+ or case class fields.</p>
+{% highlight scala %}
+val in: DataSet[(Int, String)] = // [...]
+val result = in.partitionByHash(0).mapPartition { ... }
+{% endhighlight %}
+ </td>
+ </tr>
+ <tr>
+ <td><strong>First-n</strong></td>
+ <td>
+ <p>Returns the first n (arbitrary) elements of a data set. First-n can be applied on a regular data set, a grouped data set, or a grouped-sorted data set. Grouping keys can be specified as key-selector functions,
+ tuple positions or case class fields.</p>
+{% highlight scala %}
+val in: DataSet[(Int, String)] = // [...]
+// regular data set
+val result1 = in.first(3)
+// grouped data set
+val result2 = in.groupBy(0).first(3)
+// grouped-sorted data set
+val result3 = in.groupBy(0).sortGroup(1, Order.ASCENDING).first(3)
 {% endhighlight %}
  </td>
  </tr>
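
Note on the Hash-Partition rows above: the sketch below illustrates, in plain Java with no Flink dependency, the property that hash-partitioning provides, namely that all records with the same key end up in the same partition. The class `HashPartitionSketch`, its methods, and the modulo assignment are assumptions made for illustration; the hash function Flink uses internally may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class HashPartitionSketch {

    // Assumption: a simple hashCode-modulo scheme. floorMod keeps the
    // result non-negative even when hashCode() is negative.
    static int partitionFor(Object key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Hypothetical helper: distributes (Int, String) records over
    // numPartitions buckets based on the key in position 0.
    static List<List<Map.Entry<Integer, String>>> partitionByKey(
            List<Map.Entry<Integer, String>> in, int numPartitions) {
        List<List<Map.Entry<Integer, String>>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (Map.Entry<Integer, String> e : in) {
            partitions.get(partitionFor(e.getKey(), numPartitions)).add(e);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> in = List.of(
                Map.entry(1, "x"), Map.entry(5, "y"),
                Map.entry(2, "z"), Map.entry(1, "w"));
        // Both records with key 1 always share a partition, whatever
        // else is assigned to it.
        System.out.println(partitionByKey(in, 4));
    }
}
```

A partition-wise operation such as `mapPartition` can then process all records of a key together, which is the usual reason to hash-partition before it.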