[FLINK-10153] [docs] Add Tutorials section and rework structure.
- Add a Tutorials section
- Move tutorial & quickstart guides into the Tutorials section
- Add a "Building & Developing Flink" section for Flink contributors
- Remove the Project Setup section and move its content to relevant sections
- Update the Examples section
- Update links and add redirects for moved pages
- Fix a few broken links

This closes apache#6565.
fhueske committed Aug 24, 2018
1 parent 1de8600 commit 52cbe07
Showing 36 changed files with 336 additions and 156 deletions.
2 changes: 1 addition & 1 deletion docs/dev/api_concepts.md
@@ -510,7 +510,7 @@ data.map(new MapFunction<String, Integer> () {

#### Java 8 Lambdas

Flink also supports Java 8 Lambdas in the Java API. Please see the full [Java 8 Guide]({{ site.baseurl }}/dev/java8.html).
Flink also supports Java 8 Lambdas in the Java API.

{% highlight java %}
data.filter(s -> s.startsWith("https://"));
{% endhighlight %}
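
A hypothetical extension of this snippet, not part of the original page: when a lambda returns a generic type such as `Tuple2`, Java's type erasure can keep Flink from inferring the result type, so a type hint may be needed.

{% highlight java %}
// Hypothetical sketch: a lambda with a generic result type.
// The .returns(...) hint is assumed to be needed because Java erases
// the Tuple2 type parameters from the lambda's signature.
DataSet<Tuple2<String, Integer>> wordsWithLength = data
    .map(s -> new Tuple2<>(s, s.length()))
    .returns(new TypeHint<Tuple2<String, Integer>>(){});
{% endhighlight %}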
101 changes: 1 addition & 100 deletions docs/dev/batch/examples.md
Expand Up @@ -27,8 +27,7 @@ The following example programs showcase different applications of Flink
from simple word counting to graph algorithms. The code samples illustrate the
use of [Flink's DataSet API]({{ site.baseurl }}/dev/batch/index.html).

The full source code of the following and more examples can be found in the __flink-examples-batch__
or __flink-examples-streaming__ module of the Flink source repository.
The full source code of the following and more examples can be found in the {% gh_link flink-examples/flink-examples-batch "flink-examples-batch" %} module of the Flink source repository.

* This will be replaced by the TOC
{:toc}
@@ -420,102 +419,4 @@ Input files are plain text files and must be formatted as follows:
- Edges are represented as pairs of vertex IDs separated by space characters. Edges are separated by new-line characters:
* For example `"1 2\n2 12\n1 12\n42 63\n"` gives four (undirected) links (1)-(2), (2)-(12), (1)-(12), and (42)-(63).
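
As an illustration — a sketch that assumes the edges sit in a text file at `edgesPath`, not code from this page — such input can be read into a `DataSet` like this:

{% highlight java %}
// Hypothetical sketch: read space-delimited vertex-ID pairs, one edge per line.
DataSet<Tuple2<Long, Long>> edges = env
    .readCsvFile(edgesPath)        // records are newline-separated by default
    .fieldDelimiter(" ")           // vertex IDs within an edge are space-separated
    .types(Long.class, Long.class);
{% endhighlight %}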

## Relational Query

The Relational Query example assumes two tables, one with `orders` and the other with `lineitems`, as specified by the [TPC-H decision support benchmark](https://www.tpc.org/tpch/). TPC-H is a standard benchmark in the database industry. See below for instructions on how to generate the input data.

The example implements the following SQL query.

{% highlight sql %}
SELECT l_orderkey, o_shippriority, sum(l_extendedprice) as revenue
FROM orders, lineitem
WHERE l_orderkey = o_orderkey
AND o_orderstatus = "F"
AND YEAR(o_orderdate) > 1993
AND o_orderpriority LIKE "5%"
GROUP BY l_orderkey, o_shippriority;
{% endhighlight %}

The Flink program that implements the above query looks as follows.

<div class="codetabs" markdown="1">
<div data-lang="java" markdown="1">

{% highlight java %}
// get orders data set: (orderkey, orderstatus, orderdate, orderpriority, shippriority)
DataSet<Tuple5<Integer, String, String, String, Integer>> orders = getOrdersDataSet(env);
// get lineitem data set: (orderkey, extendedprice)
DataSet<Tuple2<Integer, Double>> lineitems = getLineitemDataSet(env);

// orders filtered by year: (orderkey, shippriority)
DataSet<Tuple2<Integer, Integer>> ordersFilteredByYear =
// filter orders
orders.filter(
new FilterFunction<Tuple5<Integer, String, String, String, Integer>>() {
@Override
public boolean filter(Tuple5<Integer, String, String, String, Integer> t) {
// status filter
if(!t.f1.equals(STATUS_FILTER)) {
return false;
// year filter
} else if(Integer.parseInt(t.f2.substring(0, 4)) <= YEAR_FILTER) {
return false;
// order priority filter
} else if(!t.f3.startsWith(OPRIO_FILTER)) {
return false;
}
return true;
}
})
// project fields out that are no longer required
.project(0,4).types(Integer.class, Integer.class);

// join orders with lineitems: (orderkey, shippriority, extendedprice)
DataSet<Tuple3<Integer, Integer, Double>> lineitemsOfOrders =
ordersFilteredByYear.joinWithHuge(lineitems)
.where(0).equalTo(0)
.projectFirst(0,1).projectSecond(1)
.types(Integer.class, Integer.class, Double.class);

// extendedprice sums: (orderkey, shippriority, sum(extendedprice))
DataSet<Tuple3<Integer, Integer, Double>> priceSums =
// group by order and sum extendedprice
lineitemsOfOrders.groupBy(0,1).aggregate(Aggregations.SUM, 2);

// emit result
priceSums.writeAsCsv(outputPath);
{% endhighlight %}

The {% gh_link /flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/relational/TPCHQuery10.java "Relational Query program" %} implements the above query. It requires the following parameters to run: `--orders <path> --lineitem <path> --output <path>`.
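
For illustration, a hedged sketch of how such arguments are commonly parsed with Flink's `ParameterTool` — an assumption, not the program's verbatim code:

{% highlight java %}
// Hypothetical sketch: parse the required command-line parameters.
ParameterTool params = ParameterTool.fromArgs(args);
String ordersPath = params.getRequired("orders");
String lineitemPath = params.getRequired("lineitem");
String outputPath = params.getRequired("output");
{% endhighlight %}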

</div>
<div data-lang="scala" markdown="1">

The {% gh_link /flink-examples/flink-examples-batch/src/main/scala/org/apache/flink/examples/scala/relational/TPCHQuery3.scala "Relational Query program" %} implements the above query. It requires the following parameters to run: `--orders <path> --lineitem <path> --output <path>`.

</div>
</div>

The orders and lineitem files can be generated using the [TPC-H benchmark](https://www.tpc.org/tpch/) suite's data generator tool (DBGEN).
Take the following steps to generate arbitrarily large input files for the provided Flink programs:

1. Download and unpack DBGEN
2. Make a copy of *makefile.suite* called *Makefile* and perform the following changes:

{% highlight bash %}
DATABASE = DB2
MACHINE = LINUX
WORKLOAD = TPCH
CC = gcc
{% endhighlight %}

3. Build DBGEN using *make*
4. Generate lineitem and orders relations using dbgen. A scale factor
   (-s) of 1 results in a generated data set of about 1 GB in size.

{% highlight bash %}
./dbgen -T o -s 1
{% endhighlight %}

{% top %}
2 changes: 1 addition & 1 deletion docs/dev/best_practices.md
@@ -192,7 +192,7 @@ public class MyClass implements MapFunction {

In all cases where classes are executed with a classpath created by a dependency manager such as Maven, Flink will pull log4j into the classpath.

Therefore, you will need to exclude log4j from Flink's dependencies. The following description will assume a Maven project created from a [Flink quickstart](../quickstart/java_api_quickstart.html).
Therefore, you will need to exclude log4j from Flink's dependencies. The following description will assume a Maven project created from a [Flink quickstart](./projectsetup/java_api_quickstart.html).

Change your project's `pom.xml` file like this:
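
As a sketch of the general technique — the coordinates below are assumptions, not copied from this commit — a log4j exclusion on a Flink dependency looks roughly like this:

{% highlight xml %}
<!-- Hypothetical sketch: exclude log4j 1.x and its SLF4J binding. -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>${flink.version}</version>
    <exclusions>
        <exclusion>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
    </exclusions>
</dependency>
{% endhighlight %}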

1 change: 1 addition & 0 deletions docs/dev/index.md
@@ -4,6 +4,7 @@ nav-id: dev
nav-title: '<i class="fa fa-code title maindish" aria-hidden="true"></i> Application Development'
nav-parent_id: root
nav-pos: 5
section-break: true
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
@@ -1,6 +1,6 @@
---
title: "Configuring Dependencies, Connectors, Libraries"
nav-parent_id: start
nav-parent_id: projectsetup
nav-pos: 2
---
<!--
@@ -57,8 +57,8 @@ As with most systems that run user-defined applications, there are two broad cat
## Setting up a Project: Basic Dependencies

At a bare minimum, every Flink application needs the API dependencies to develop against.
For Maven, you can use the [Java Project Template]({{ site.baseurl }}/quickstart/java_api_quickstart.html)
or [Scala Project Template]({{ site.baseurl }}/quickstart/scala_api_quickstart.html) to create
For Maven, you can use the [Java Project Template]({{ site.baseurl }}/dev/projectsetup/java_api_quickstart.html)
or [Scala Project Template]({{ site.baseurl }}/dev/projectsetup/scala_api_quickstart.html) to create
a program skeleton with these initial dependencies.

When setting up a project manually, you need to add the following dependencies for the Java/Scala API
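
A minimal sketch of those dependencies in Maven syntax (the version and Scala suffix are assumptions; adjust them for your setup):

{% highlight xml %}
<!-- Hypothetical sketch: core API dependencies, marked 'provided'
     because the Flink runtime already ships these classes. -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
</dependency>
{% endhighlight %}
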
@@ -136,8 +136,8 @@ We recommend to package the application code and all its required dependencies i
we refer to as the *application jar*. The application jar can be submitted to an already running Flink cluster,
or added to a Flink application container image.

Projects created from the [Java Project Template]({{ site.baseurl }}/quickstart/java_api_quickstart.html) or
[Scala Project Template]({{ site.baseurl }}/quickstart/scala_api_quickstart.html) are configured to automatically include
Projects created from the [Java Project Template]({{ site.baseurl }}/dev/projectsetup/java_api_quickstart.html) or
[Scala Project Template]({{ site.baseurl }}/dev/projectsetup/scala_api_quickstart.html) are configured to automatically include
the application dependencies into the application jar when running `mvn clean package`. For projects that are
not set up from those templates, we recommend adding the Maven Shade Plugin (as listed in the Appendix below)
to build the application jar with all required dependencies.
@@ -159,7 +159,7 @@ Scala version that they are built for, for example `flink-streaming-scala_2.11`.
Developers who only use Java can pick any Scala version; Scala developers need to
pick the Scala version that matches their application's Scala version.

Please refer to the [build guide]({{ site.baseurl }}/start/building.html#scala-versions)
Please refer to the [build guide]({{ site.baseurl }}/flinkdev/building.html#scala-versions)
for details on how to build Flink for a specific Scala version.

**Note:** Because of major breaking changes in Scala 2.12, Flink 1.5 currently builds only for Scala 2.11.
25 changes: 25 additions & 0 deletions docs/dev/projectsetup/index.md
@@ -0,0 +1,25 @@
---
title: "Project Build Setup"
nav-id: projectsetup
nav-title: 'Project Build Setup'
nav-parent_id: dev
nav-pos: 0
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
@@ -1,7 +1,7 @@
---
title: "Project Template for Java"
nav-title: Project Template for Java
nav-parent_id: start
nav-parent_id: projectsetup
nav-pos: 0
---
<!--
@@ -124,7 +124,7 @@ can run the application from the JAR file without additionally specifying the m
Write your application!

If you are writing a streaming application and you are looking for inspiration on what to write,
take a look at the [Stream Processing Application Tutorial]({{ site.baseurl }}/quickstart/run_example_quickstart.html#writing-a-flink-program).
take a look at the [Stream Processing Application Tutorial]({{ site.baseurl }}/tutorials/datastream_api.html#writing-a-flink-program).

If you are writing a batch processing application and you are looking for inspiration on what to write,
take a look at the [Batch Application Examples]({{ site.baseurl }}/dev/batch/examples.html).
@@ -133,7 +133,7 @@ For a complete overview of the APIs, have a look at the
[DataStream API]({{ site.baseurl }}/dev/datastream_api.html) and
[DataSet API]({{ site.baseurl }}/dev/batch/index.html) sections.

[Here]({{ site.baseurl }}/quickstart/setup_quickstart.html) you can find out how to run an application outside the IDE on a local cluster.
[Here]({{ site.baseurl }}/tutorials/local_setup.html) you can find out how to run an application outside the IDE on a local cluster.

If you have any trouble, ask on our
[Mailing List](https://mail-archives.apache.org/mod_mbox/flink-user/).
@@ -1,7 +1,7 @@
---
title: "Project Template for Scala"
nav-title: Project Template for Scala
nav-parent_id: start
nav-parent_id: projectsetup
nav-pos: 1
---
<!--
@@ -212,7 +212,7 @@ can run the application from the JAR file without additionally specifying the m
Write your application!

If you are writing a streaming application and you are looking for inspiration on what to write,
take a look at the [Stream Processing Application Tutorial]({{ site.baseurl }}/quickstart/run_example_quickstart.html#writing-a-flink-program)
take a look at the [Stream Processing Application Tutorial]({{ site.baseurl }}/tutorials/datastream_api.html#writing-a-flink-program)

If you are writing a batch processing application and you are looking for inspiration on what to write,
take a look at the [Batch Application Examples]({{ site.baseurl }}/dev/batch/examples.html)
@@ -221,7 +221,7 @@ For a complete overview of the APIs, have a look at the
[DataStream API]({{ site.baseurl }}/dev/datastream_api.html) and
[DataSet API]({{ site.baseurl }}/dev/batch/index.html) sections.

[Here]({{ site.baseurl }}/quickstart/setup_quickstart.html) you can find out how to run an application outside the IDE on a local cluster.
[Here]({{ site.baseurl }}/tutorials/local_setup.html) you can find out how to run an application outside the IDE on a local cluster.

If you have any trouble, ask on our
[Mailing List](https://mail-archives.apache.org/mod_mbox/flink-user/).
2 changes: 1 addition & 1 deletion docs/dev/stream/python.md
@@ -624,7 +624,7 @@ env.execute()

A system-wide default parallelism for all execution environments can be defined by setting the
`parallelism.default` property in `./conf/flink-conf.yaml`. See the
[Configuration]({{ site.baseurl }}/setup/config.html) documentation for details.
[Configuration]({{ site.baseurl }}/ops/config.html) documentation for details.
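
For instance, a one-line sketch of that entry in `flink-conf.yaml` (the value is an arbitrary example):

{% highlight yaml %}
# Hypothetical example: system-wide default parallelism.
parallelism.default: 10
{% endhighlight %}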

{% top %}

2 changes: 1 addition & 1 deletion docs/dev/table/sourceSinks.md
@@ -664,7 +664,7 @@ connector.debug=true

### Use a TableFactory in the Table & SQL API

For a type-safe, programmatic approach with explanatory Scaladoc/Javadoc, the Table & SQL API offers descriptors in `org.apache.flink.table.descriptors` that translate into string-based properties. See the [built-in descriptors](connect.md) for sources, sinks, and formats as a reference.
For a type-safe, programmatic approach with explanatory Scaladoc/Javadoc, the Table & SQL API offers descriptors in `org.apache.flink.table.descriptors` that translate into string-based properties. See the [built-in descriptors](connect.html) for sources, sinks, and formats as a reference.

A connector for `MySystem` in our example can extend `ConnectorDescriptor` as shown below:
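
A rough sketch of the idea — the constructor arguments and the overridden hook below are assumptions, since the exact `ConnectorDescriptor` API varies between Flink versions:

{% highlight java %}
import java.util.HashMap;
import java.util.Map;
import org.apache.flink.table.descriptors.ConnectorDescriptor;

// Hypothetical sketch: a descriptor that turns typed settings into
// string-based properties. Names below are assumptions, not the docs' code.
public class MySystemConnector extends ConnectorDescriptor {

    public final boolean isDebug;

    public MySystemConnector(boolean isDebug) {
        super("my-system", 1, false); // assumed: (type, version, needs-format)
        this.isDebug = isDebug;
    }

    @Override
    protected Map<String, String> toConnectorProperties() { // assumed hook name
        Map<String, String> properties = new HashMap<>();
        properties.put("connector.debug", Boolean.toString(isDebug));
        return properties;
    }
}
{% endhighlight %}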

21 changes: 15 additions & 6 deletions docs/examples/index.md
@@ -25,15 +25,24 @@ specific language governing permissions and limitations
under the License.
-->

[Sample Project in Java]({{ site.baseurl }}/quickstart/java_api_quickstart.html) and [Sample Project in Scala]({{ site.baseurl }}/quickstart/scala_api_quickstart.html) are guides to setting up Maven and SBT projects and include simple implementations of a word count application.

[Monitoring Wikipedia Edits]({{ site.baseurl }}/quickstart/run_example_quickstart.html) is a more complete example of a streaming analytics application.

[Building real-time dashboard applications with Apache Flink, Elasticsearch, and Kibana](https://www.elastic.co/blog/building-real-time-dashboard-applications-with-apache-flink-elasticsearch-and-kibana) is a blog post at elastic.co showing how to build a real-time dashboard solution for streaming data analytics using Apache Flink, Elasticsearch, and Kibana.

The [Flink training website](https://training.data-artisans.com/) from data Artisans has a number of examples. See the hands-on sections, and the exercises.

The Flink sources include a number of examples for both **streaming** ([java](https://github.com/apache/flink/tree/master/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples) / [scala](https://github.com/apache/flink/tree/master/flink-examples/flink-examples-streaming/src/main/scala/org/apache/flink/streaming/scala/examples)) and **batch** ([java](https://github.com/apache/flink/tree/master/flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java) / [scala](https://github.com/apache/flink/tree/master/flink-examples/flink-examples-batch/src/main/scala/org/apache/flink/examples/scala)). These [instructions]({{ site.baseurl }}/dev/batch/examples.html#running-an-example) explain how to run the examples.

## Bundled Examples

The Flink sources include many examples for Flink's different APIs:

* DataStream applications ({% gh_link flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples "Java" %} / {% gh_link flink-examples/flink-examples-streaming/src/main/scala/org/apache/flink/streaming/scala/examples "Scala" %})
* DataSet applications ({% gh_link flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java "Java" %} / {% gh_link flink-examples/flink-examples-batch/src/main/scala/org/apache/flink/examples/scala "Scala" %})
* Table API / SQL queries ({% gh_link flink-examples/flink-examples-table/src/main/java/org/apache/flink/table/examples/java "Java" %} / {% gh_link flink-examples/flink-examples-table/src/main/scala/org/apache/flink/table/examples/scala "Scala" %})

These [instructions]({{ site.baseurl }}/dev/batch/examples.html#running-an-example) explain how to run the examples.

## Examples on the Web

There are also a few blog posts published online that discuss example applications:

* [How to build stateful streaming applications with Apache Flink](https://www.infoworld.com/article/3293426/big-data/how-to-build-stateful-streaming-applications-with-apache-flink.html) presents an event-driven application implemented with the DataStream API and two SQL queries for streaming analytics.

* [Building real-time dashboard applications with Apache Flink, Elasticsearch, and Kibana](https://www.elastic.co/blog/building-real-time-dashboard-applications-with-apache-flink-elasticsearch-and-kibana) is a blog post at elastic.co showing how to build a real-time dashboard solution for streaming data analytics using Apache Flink, Elasticsearch, and Kibana.

* The [Flink training website](https://training.data-artisans.com/) from data Artisans has a number of examples. Check out the hands-on sections and the exercises.
2 changes: 1 addition & 1 deletion docs/start/building.md → docs/flinkDev/building.md
@@ -1,6 +1,6 @@
---
title: Building Flink from Source
nav-parent_id: start
nav-parent_id: flinkdev
nav-pos: 20
---
<!--
8 changes: 4 additions & 4 deletions docs/internals/ide_setup.md → docs/flinkDev/ide_setup.md
@@ -1,6 +1,6 @@
---
title: "IDE Setup"
nav-parent_id: start
title: "Importing Flink into an IDE"
nav-parent_id: flinkdev
nav-pos: 3
---
<!--
@@ -27,8 +27,8 @@ under the License.

The sections below describe how to import the Flink project into an IDE
for the development of Flink itself. For writing Flink programs, please
refer to the [Java API]({{ site.baseurl }}/quickstart/java_api_quickstart.html)
and the [Scala API]({{ site.baseurl }}/quickstart/scala_api_quickstart.html)
refer to the [Java API]({{ site.baseurl }}/dev/projectsetup/java_api_quickstart.html)
and the [Scala API]({{ site.baseurl }}/dev/projectsetup/scala_api_quickstart.html)
quickstart guides.

**NOTE:** Whenever something is not working in your IDE, try with the Maven
8 changes: 4 additions & 4 deletions docs/start/index.md → docs/flinkDev/index.md
@@ -1,10 +1,10 @@
---
section-break: true
nav-title: '<i class="fa fa-cogs title maindish" aria-hidden="true"></i> Project Setup'
title: "Project Setup"
nav-id: "start"
nav-title: '<i class="fa fa-cogs title dessert" aria-hidden="true"></i> Flink Development'
title: "Flink Development"
nav-id: "flinkdev"
nav-parent_id: root
nav-pos: 4
nav-pos: 8
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
4 changes: 3 additions & 1 deletion docs/index.md
@@ -33,7 +33,9 @@ Apache Flink is an open source platform for distributed stream and batch data pr

- **Concepts**: Start with the basic concepts of Flink's [Dataflow Programming Model](concepts/programming-model.html) and [Distributed Runtime Environment](concepts/runtime.html). This will help you understand other parts of the documentation, including the setup and programming guides. We recommend you read these sections first.

- **Quickstarts**: [Run an example program](quickstart/setup_quickstart.html) on your local machine or [study some examples](examples/index.html).
- **Tutorials**:
* [Implement and run a DataStream application](./tutorials/datastream_api.html)
* [Set up a local Flink cluster](./tutorials/local_setup.html)

- **Programming Guides**: You can read our guides about [basic API concepts](dev/api_concepts.html) and the [DataStream API](dev/datastream_api.html) or the [DataSet API](dev/batch/index.html) to learn how to write your first Flink programs.
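
To give a flavor of such a first program — a minimal, hypothetical sketch, not taken from this commit — a DataStream word count can look like this:

{% highlight java %}
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class WordCountSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.socketTextStream("localhost", 9999)   // assumed source: a local socket
            .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                @Override
                public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                    // emit (word, 1) for every word in the input line
                    for (String word : line.toLowerCase().split("\\W+")) {
                        if (!word.isEmpty()) {
                            out.collect(new Tuple2<>(word, 1));
                        }
                    }
                }
            })
            .keyBy(0)   // key by the word
            .sum(1)     // running count per word
            .print();
        env.execute("WordCount sketch");
    }
}
{% endhighlight %}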

