[FLINK-10153] [docs] Add Tutorials section and rework structure.
- Add a Tutorials section
- Move tutorial & quickstart guides into the Tutorials section
- Add a "Building & Developing Flink" section for Flink contributors
- Remove the Project Setup section and move its content to relevant sections
- Update the Examples section
- Update links and add redirects for moved pages
- Fix a few broken links

This closes apache#6565.
fhueske committed Aug 24, 2018
1 parent 1de8600 commit 52cbe07
Showing 36 changed files with 336 additions and 156 deletions.
2 changes: 1 addition & 1 deletion docs/dev/api_concepts.md
@@ -510,7 +510,7 @@ data.map(new MapFunction<String, Integer> () {

#### Java 8 Lambdas

Flink also supports Java 8 Lambdas in the Java API. Please see the full [Java 8 Guide]({{ site.baseurl }}/dev/java8.html).
Flink also supports Java 8 Lambdas in the Java API.

{% highlight java %}
data.filter(s -> s.startsWith("https://"));
{% endhighlight %}
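
A hypothetical extension of this snippet, not part of the original page: when a lambda returns a generic type such as `Tuple2`, Java's type erasure can keep Flink from inferring the result type, so a type hint may be needed.

{% highlight java %}
// Hypothetical sketch: a lambda with a generic result type.
// The .returns(...) hint is assumed to be needed because Java erases
// the Tuple2 type parameters from the lambda's signature.
DataSet<Tuple2<String, Integer>> wordsWithLength = data
    .map(s -> new Tuple2<>(s, s.length()))
    .returns(new TypeHint<Tuple2<String, Integer>>(){});
{% endhighlight %}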
101 changes: 1 addition & 100 deletions docs/dev/batch/examples.md
Expand Up @@ -27,8 +27,7 @@ The following example programs showcase different applications of Flink
from simple word counting to graph algorithms. The code samples illustrate the
use of [Flink's DataSet API]({{ site.baseurl }}/dev/batch/index.html).

The full source code of the following and more examples can be found in the __flink-examples-batch__
or __flink-examples-streaming__ module of the Flink source repository.
The full source code of the following and more examples can be found in the {% gh_link flink-examples/flink-examples-batch "flink-examples-batch" %} module of the Flink source repository.

* This will be replaced by the TOC
{:toc}
@@ -420,102 +419,4 @@ Input files are plain text files and must be formatted as follows:
- Edges are represented as pairs of vertex IDs separated by space characters. Edges are separated by new-line characters:
* For example `"1 2\n2 12\n1 12\n42 63\n"` gives four (undirected) links (1)-(2), (2)-(12), (1)-(12), and (42)-(63).
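
As an illustration — a sketch that assumes the edges sit in a text file at `edgesPath`, not code from this page — such input can be read into a `DataSet` like this:

{% highlight java %}
// Hypothetical sketch: read space-delimited vertex-ID pairs, one edge per line.
DataSet<Tuple2<Long, Long>> edges = env
    .readCsvFile(edgesPath)        // records are newline-separated by default
    .fieldDelimiter(" ")           // vertex IDs within an edge are space-separated
    .types(Long.class, Long.class);
{% endhighlight %}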

## Relational Query

The Relational Query example assumes two tables, one with `orders` and the other with `lineitems`, as specified by the [TPC-H decision support benchmark](https://www.tpc.org/tpch/). TPC-H is a standard benchmark in the database industry. See below for instructions on how to generate the input data.

The example implements the following SQL query.

{% highlight sql %}
SELECT l_orderkey, o_shippriority, sum(l_extendedprice) as revenue
FROM orders, lineitem
WHERE l_orderkey = o_orderkey
AND o_orderstatus = "F"
AND YEAR(o_orderdate) > 1993
AND o_orderpriority LIKE "5%"
GROUP BY l_orderkey, o_shippriority;
{% endhighlight %}

The Flink program that implements the above query looks as follows.

<div class="codetabs" markdown="1">
<div data-lang="java" markdown="1">

{% highlight java %}
// get orders data set: (orderkey, orderstatus, orderdate, orderpriority, shippriority)
DataSet<Tuple5<Integer, String, String, String, Integer>> orders = getOrdersDataSet(env);
// get lineitem data set: (orderkey, extendedprice)
DataSet<Tuple2<Integer, Double>> lineitems = getLineitemDataSet(env);

// orders filtered by year: (orderkey, shippriority)
DataSet<Tuple2<Integer, Integer>> ordersFilteredByYear =
// filter orders
orders.filter(
new FilterFunction<Tuple5<Integer, String, String, String, Integer>>() {
@Override
public boolean filter(Tuple5<Integer, String, String, String, Integer> t) {
// status filter
if(!t.f1.equals(STATUS_FILTER)) {
return false;
// year filter
} else if(Integer.parseInt(t.f2.substring(0, 4)) <= YEAR_FILTER) {
return false;
// order priority filter
} else if(!t.f3.startsWith(OPRIO_FILTER)) {
return false;
}
return true;
}
})
// project fields out that are no longer required
.project(0,4).types(Integer.class, Integer.class);

// join orders with lineitems: (orderkey, shippriority, extendedprice)
DataSet<Tuple3<Integer, Integer, Double>> lineitemsOfOrders =
ordersFilteredByYear.joinWithHuge(lineitems)
.where(0).equalTo(0)
.projectFirst(0,1).projectSecond(1)
.types(Integer.class, Integer.class, Double.class);

// extendedprice sums: (orderkey, shippriority, sum(extendedprice))
DataSet<Tuple3<Integer, Integer, Double>> priceSums =
// group by order and sum extendedprice
lineitemsOfOrders.groupBy(0,1).aggregate(Aggregations.SUM, 2);

// emit result
priceSums.writeAsCsv(outputPath);
{% endhighlight %}

The {% gh_link /flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java/relational/TPCHQuery10.java "Relational Query program" %} implements the above query. It requires the following parameters to run: `--orders <path> --lineitem <path> --output <path>`.
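
For illustration, a hedged sketch of how such arguments are commonly parsed with Flink's `ParameterTool` — an assumption, not the program's verbatim code:

{% highlight java %}
// Hypothetical sketch: parse the required command-line parameters.
ParameterTool params = ParameterTool.fromArgs(args);
String ordersPath = params.getRequired("orders");
String lineitemPath = params.getRequired("lineitem");
String outputPath = params.getRequired("output");
{% endhighlight %}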

</div>
<div data-lang="scala" markdown="1">

The {% gh_link /flink-examples/flink-examples-batch/src/main/scala/org/apache/flink/examples/scala/relational/TPCHQuery3.scala "Relational Query program" %} implements the above query. It requires the following parameters to run: `--orders <path> --lineitem <path> --output <path>`.

</div>
</div>

The orders and lineitem files can be generated using the [TPC-H benchmark](https://www.tpc.org/tpch/) suite's data generator tool (DBGEN).
Take the following steps to generate arbitrarily large input files for the provided Flink programs:

1. Download and unpack DBGEN
2. Make a copy of *makefile.suite* called *Makefile* and perform the following changes:

{% highlight bash %}
DATABASE = DB2
MACHINE = LINUX
WORKLOAD = TPCH
CC = gcc
{% endhighlight %}

3. Build DBGEN using *make*
4. Generate lineitem and orders relations using dbgen. A scale factor
   (-s) of 1 results in a generated data set of about 1 GB in size.

{% highlight bash %}
./dbgen -T o -s 1
{% endhighlight %}

{% top %}
2 changes: 1 addition & 1 deletion docs/dev/best_practices.md
@@ -192,7 +192,7 @@ public class MyClass implements MapFunction {

In all cases where classes are executed with a classpath created by a dependency manager such as Maven, Flink will pull log4j into the classpath.

Therefore, you will need to exclude log4j from Flink's dependencies. The following description will assume a Maven project created from a [Flink quickstart](../quickstart/java_api_quickstart.html).
Therefore, you will need to exclude log4j from Flink's dependencies. The following description will assume a Maven project created from a [Flink quickstart](./projectsetup/java_api_quickstart.html).

Change your project's `pom.xml` file like this:
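
As a sketch of the general technique — the coordinates below are assumptions, not copied from this commit — a log4j exclusion on a Flink dependency looks roughly like this:

{% highlight xml %}
<!-- Hypothetical sketch: exclude log4j 1.x and its SLF4J binding. -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>${flink.version}</version>
    <exclusions>
        <exclusion>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
    </exclusions>
</dependency>
{% endhighlight %}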

1 change: 1 addition & 0 deletions docs/dev/index.md
@@ -4,6 +4,7 @@ nav-id: dev
nav-title: '<i class="fa fa-code title maindish" aria-hidden="true"></i> Application Development'
nav-parent_id: root
nav-pos: 5
section-break: true
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
@@ -1,6 +1,6 @@
---
title: "Configuring Dependencies, Connectors, Libraries"
nav-parent_id: start
nav-parent_id: projectsetup
nav-pos: 2
---
<!--
@@ -57,8 +57,8 @@ As with most systems that run user-defined applications, there are two broad cat
## Setting up a Project: Basic Dependencies

At a bare minimum, every Flink application needs the API dependencies to develop against.
For Maven, you can use the [Java Project Template]({{ site.baseurl }}/quickstart/java_api_quickstart.html)
or [Scala Project Template]({{ site.baseurl }}/quickstart/scala_api_quickstart.html) to create
For Maven, you can use the [Java Project Template]({{ site.baseurl }}/dev/projectsetup/java_api_quickstart.html)
or [Scala Project Template]({{ site.baseurl }}/dev/projectsetup/scala_api_quickstart.html) to create
a program skeleton with these initial dependencies.

When setting up a project manually, you need to add the following dependencies for the Java/Scala API
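
A minimal sketch of those dependencies in Maven syntax (the version and Scala suffix are assumptions; adjust them for your setup):

{% highlight xml %}
<!-- Hypothetical sketch: core API dependencies, marked 'provided'
     because the Flink runtime already ships these classes. -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>1.6.0</version>
    <scope>provided</scope>
</dependency>
{% endhighlight %}
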
@@ -136,8 +136,8 @@ We recommend to package the application code and all its required dependencies i
we refer to as the *application jar*. The application jar can be submitted to an already running Flink cluster,
or added to a Flink application container image.

Projects created from the [Java Project Template]({{ site.baseurl }}/quickstart/java_api_quickstart.html) or
[Scala Project Template]({{ site.baseurl }}/quickstart/scala_api_quickstart.html) are configured to automatically include
Projects created from the [Java Project Template]({{ site.baseurl }}/dev/projectsetup/java_api_quickstart.html) or
[Scala Project Template]({{ site.baseurl }}/dev/projectsetup/scala_api_quickstart.html) are configured to automatically include
the application dependencies into the application jar when running `mvn clean package`. For projects that are
not set up from those templates, we recommend adding the Maven Shade Plugin (as listed in the Appendix below)
to build the application jar with all required dependencies.
@@ -159,7 +159,7 @@ Scala version that they are built for, for example `flink-streaming-scala_2.11`.
Developers who only use Java can pick any Scala version; Scala developers need to
pick the Scala version that matches their application's Scala version.

Please refer to the [build guide]({{ site.baseurl }}/start/building.html#scala-versions)
Please refer to the [build guide]({{ site.baseurl }}/flinkdev/building.html#scala-versions)
for details on how to build Flink for a specific Scala version.

**Note:** Because of major breaking changes in Scala 2.12, Flink 1.5 currently builds only for Scala 2.11.
25 changes: 25 additions & 0 deletions docs/dev/projectsetup/index.md
@@ -0,0 +1,25 @@
---
title: "Project Build Setup"
nav-id: projectsetup
nav-title: 'Project Build Setup'
nav-parent_id: dev
nav-pos: 0
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
@@ -1,7 +1,7 @@
---
title: "Project Template for Java"
nav-title: Project Template for Java
nav-parent_id: start
nav-parent_id: projectsetup
nav-pos: 0
---
<!--
@@ -124,7 +124,7 @@ can run the application from the JAR file without additionally specifying the m
Write your application!

If you are writing a streaming application and you are looking for inspiration on what to write,
take a look at the [Stream Processing Application Tutorial]({{ site.baseurl }}/quickstart/run_example_quickstart.html#writing-a-flink-program).
take a look at the [Stream Processing Application Tutorial]({{ site.baseurl }}/tutorials/datastream_api.html#writing-a-flink-program).

If you are writing a batch processing application and you are looking for inspiration on what to write,
take a look at the [Batch Application Examples]({{ site.baseurl }}/dev/batch/examples.html).
@@ -133,7 +133,7 @@ For a complete overview of the APIs, have a look at the
[DataStream API]({{ site.baseurl }}/dev/datastream_api.html) and
[DataSet API]({{ site.baseurl }}/dev/batch/index.html) sections.

[Here]({{ site.baseurl }}/quickstart/setup_quickstart.html) you can find out how to run an application outside the IDE on a local cluster.
[Here]({{ site.baseurl }}/tutorials/local_setup.html) you can find out how to run an application outside the IDE on a local cluster.

If you have any trouble, ask on our
[Mailing List](https://mail-archives.apache.org/mod_mbox/flink-user/).
@@ -1,7 +1,7 @@
---
title: "Project Template for Scala"
nav-title: Project Template for Scala
nav-parent_id: start
nav-parent_id: projectsetup
nav-pos: 1
---
<!--
@@ -212,7 +212,7 @@ can run the application from the JAR file without additionally specifying the m
Write your application!

If you are writing a streaming application and you are looking for inspiration on what to write,
take a look at the [Stream Processing Application Tutorial]({{ site.baseurl }}/quickstart/run_example_quickstart.html#writing-a-flink-program)
take a look at the [Stream Processing Application Tutorial]({{ site.baseurl }}/tutorials/datastream_api.html#writing-a-flink-program)

If you are writing a batch processing application and you are looking for inspiration on what to write,
take a look at the [Batch Application Examples]({{ site.baseurl }}/dev/batch/examples.html)
@@ -221,7 +221,7 @@ For a complete overview of the APIs, have a look at the
[DataStream API]({{ site.baseurl }}/dev/datastream_api.html) and
[DataSet API]({{ site.baseurl }}/dev/batch/index.html) sections.

[Here]({{ site.baseurl }}/quickstart/setup_quickstart.html) you can find out how to run an application outside the IDE on a local cluster.
[Here]({{ site.baseurl }}/tutorials/local_setup.html) you can find out how to run an application outside the IDE on a local cluster.

If you have any trouble, ask on our
[Mailing List](https://mail-archives.apache.org/mod_mbox/flink-user/).
2 changes: 1 addition & 1 deletion docs/dev/stream/python.md
@@ -624,7 +624,7 @@ env.execute()

A system-wide default parallelism for all execution environments can be defined by setting the
`parallelism.default` property in `./conf/flink-conf.yaml`. See the
[Configuration]({{ site.baseurl }}/setup/config.html) documentation for details.
[Configuration]({{ site.baseurl }}/ops/config.html) documentation for details.
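
For instance, a one-line sketch of that entry in `flink-conf.yaml` (the value is an arbitrary example):

{% highlight yaml %}
# Hypothetical example: system-wide default parallelism.
parallelism.default: 10
{% endhighlight %}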

{% top %}

2 changes: 1 addition & 1 deletion docs/dev/table/sourceSinks.md
@@ -664,7 +664,7 @@ connector.debug=true

### Use a TableFactory in the Table & SQL API

For a type-safe, programmatic approach with explanatory Scaladoc/Javadoc, the Table & SQL API offers descriptors in `org.apache.flink.table.descriptors` that translate into string-based properties. See the [built-in descriptors](connect.md) for sources, sinks, and formats as a reference.
For a type-safe, programmatic approach with explanatory Scaladoc/Javadoc, the Table & SQL API offers descriptors in `org.apache.flink.table.descriptors` that translate into string-based properties. See the [built-in descriptors](connect.html) for sources, sinks, and formats as a reference.

A connector for `MySystem` in our example can extend `ConnectorDescriptor` as shown below:
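
A rough sketch of the idea — the constructor arguments and the overridden hook below are assumptions, since the exact `ConnectorDescriptor` API varies between Flink versions:

{% highlight java %}
import java.util.HashMap;
import java.util.Map;
import org.apache.flink.table.descriptors.ConnectorDescriptor;

// Hypothetical sketch: a descriptor that turns typed settings into
// string-based properties. Names below are assumptions, not the docs' code.
public class MySystemConnector extends ConnectorDescriptor {

    public final boolean isDebug;

    public MySystemConnector(boolean isDebug) {
        super("my-system", 1, false); // assumed: (type, version, needs-format)
        this.isDebug = isDebug;
    }

    @Override
    protected Map<String, String> toConnectorProperties() { // assumed hook name
        Map<String, String> properties = new HashMap<>();
        properties.put("connector.debug", Boolean.toString(isDebug));
        return properties;
    }
}
{% endhighlight %}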

21 changes: 15 additions & 6 deletions docs/examples/index.md
@@ -25,15 +25,24 @@ specific language governing permissions and limitations
under the License.
-->

[Sample Project in Java]({{ site.baseurl }}/quickstart/java_api_quickstart.html) and [Sample Project in Scala]({{ site.baseurl }}/quickstart/scala_api_quickstart.html) are guides to setting up Maven and SBT projects and include simple implementations of a word count application.

[Monitoring Wikipedia Edits]({{ site.baseurl }}/quickstart/run_example_quickstart.html) is a more complete example of a streaming analytics application.

[Building real-time dashboard applications with Apache Flink, Elasticsearch, and Kibana](https://www.elastic.co/blog/building-real-time-dashboard-applications-with-apache-flink-elasticsearch-and-kibana) is a blog post at elastic.co showing how to build a real-time dashboard solution for streaming data analytics using Apache Flink, Elasticsearch, and Kibana.

The [Flink training website](https://training.data-artisans.com/) from data Artisans has a number of examples. See the hands-on sections, and the exercises.

The Flink sources include a number of examples for both **streaming** ([java](https://github.com/apache/flink/tree/master/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples) / [scala](https://github.com/apache/flink/tree/master/flink-examples/flink-examples-streaming/src/main/scala/org/apache/flink/streaming/scala/examples)) and **batch** ([java](https://github.com/apache/flink/tree/master/flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java) / [scala](https://github.com/apache/flink/tree/master/flink-examples/flink-examples-batch/src/main/scala/org/apache/flink/examples/scala)). These [instructions]({{ site.baseurl }}/dev/batch/examples.html#running-an-example) explain how to run the examples.

## Bundled Examples

The Flink sources include many examples for Flink's different APIs:

* DataStream applications ({% gh_link flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples "Java" %} / {% gh_link flink-examples/flink-examples-streaming/src/main/scala/org/apache/flink/streaming/scala/examples "Scala" %})
* DataSet applications ({% gh_link flink-examples/flink-examples-batch/src/main/java/org/apache/flink/examples/java "Java" %} / {% gh_link flink-examples/flink-examples-batch/src/main/scala/org/apache/flink/examples/scala "Scala" %})
* Table API / SQL queries ({% gh_link flink-examples/flink-examples-table/src/main/java/org/apache/flink/table/examples/java "Java" %} / {% gh_link flink-examples/flink-examples-table/src/main/scala/org/apache/flink/table/examples/scala "Scala" %})

These [instructions]({{ site.baseurl }}/dev/batch/examples.html#running-an-example) explain how to run the examples.

## Examples on the Web

There are also a few blog posts published online that discuss example applications:

* [How to build stateful streaming applications with Apache Flink](https://www.infoworld.com/article/3293426/big-data/how-to-build-stateful-streaming-applications-with-apache-flink.html) presents an event-driven application implemented with the DataStream API and two SQL queries for streaming analytics.

* [Building real-time dashboard applications with Apache Flink, Elasticsearch, and Kibana](https://www.elastic.co/blog/building-real-time-dashboard-applications-with-apache-flink-elasticsearch-and-kibana) is a blog post at elastic.co showing how to build a real-time dashboard solution for streaming data analytics using Apache Flink, Elasticsearch, and Kibana.

* The [Flink training website](https://training.data-artisans.com/) from data Artisans has a number of examples. Check out the hands-on sections and the exercises.
2 changes: 1 addition & 1 deletion docs/start/building.md → docs/flinkDev/building.md
@@ -1,6 +1,6 @@
---
title: Building Flink from Source
nav-parent_id: start
nav-parent_id: flinkdev
nav-pos: 20
---
<!--
8 changes: 4 additions & 4 deletions docs/internals/ide_setup.md → docs/flinkDev/ide_setup.md
@@ -1,6 +1,6 @@
---
title: "IDE Setup"
nav-parent_id: start
title: "Importing Flink into an IDE"
nav-parent_id: flinkdev
nav-pos: 3
---
<!--
@@ -27,8 +27,8 @@ under the License.

The sections below describe how to import the Flink project into an IDE
for the development of Flink itself. For writing Flink programs, please
refer to the [Java API]({{ site.baseurl }}/quickstart/java_api_quickstart.html)
and the [Scala API]({{ site.baseurl }}/quickstart/scala_api_quickstart.html)
refer to the [Java API]({{ site.baseurl }}/dev/projectsetup/java_api_quickstart.html)
and the [Scala API]({{ site.baseurl }}/dev/projectsetup/scala_api_quickstart.html)
quickstart guides.

**NOTE:** Whenever something is not working in your IDE, try with the Maven
8 changes: 4 additions & 4 deletions docs/start/index.md → docs/flinkDev/index.md
@@ -1,10 +1,10 @@
---
section-break: true
nav-title: '<i class="fa fa-cogs title maindish" aria-hidden="true"></i> Project Setup'
title: "Project Setup"
nav-id: "start"
nav-title: '<i class="fa fa-cogs title dessert" aria-hidden="true"></i> Flink Development'
title: "Flink Development"
nav-id: "flinkdev"
nav-parent_id: root
nav-pos: 4
nav-pos: 8
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
4 changes: 3 additions & 1 deletion docs/index.md
@@ -33,7 +33,9 @@ Apache Flink is an open source platform for distributed stream and batch data pr

- **Concepts**: Start with the basic concepts of Flink's [Dataflow Programming Model](concepts/programming-model.html) and [Distributed Runtime Environment](concepts/runtime.html). This will help you understand other parts of the documentation, including the setup and programming guides. We recommend you read these sections first.

- **Quickstarts**: [Run an example program](quickstart/setup_quickstart.html) on your local machine or [study some examples](examples/index.html).
- **Tutorials**:
* [Implement and run a DataStream application](./tutorials/datastream_api.html)
* [Set up a local Flink cluster](./tutorials/local_setup.html)

- **Programming Guides**: You can read our guides about [basic API concepts](dev/api_concepts.html) and the [DataStream API](dev/datastream_api.html) or the [DataSet API](dev/batch/index.html) to learn how to write your first Flink programs.
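
To give a flavor of such a first program — a minimal, hypothetical sketch, not taken from this commit — a DataStream word count can look like this:

{% highlight java %}
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class WordCountSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.socketTextStream("localhost", 9999)   // assumed source: a local socket
            .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                @Override
                public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                    // emit (word, 1) for every word in the input line
                    for (String word : line.toLowerCase().split("\\W+")) {
                        if (!word.isEmpty()) {
                            out.collect(new Tuple2<>(word, 1));
                        }
                    }
                }
            })
            .keyBy(0)   // key by the word
            .sum(1)     // running count per word
            .print();
        env.execute("WordCount sketch");
    }
}
{% endhighlight %}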

