Commit 9ca6511

This closes apache#2281: Clean up dataflow/google references/URLs in examples

kennknowles committed Mar 23, 2017
2 parents ddc7595 + 81bcbb4

Showing 7 changed files with 42 additions and 36 deletions.
61 changes: 34 additions & 27 deletions examples/java/README.md
@@ -20,25 +20,23 @@
# Example Pipelines

The examples included in this module serve to demonstrate the basic
-functionality of Google Cloud Dataflow, and act as starting points for
+functionality of Apache Beam, and act as starting points for
the development of more complex pipelines.

## Word Count

A good starting point for new users is our set of
-[word count](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples) examples, which computes word frequencies. This series of four successively more detailed pipelines is described in detail in the accompanying [walkthrough](https://cloud.google.com/dataflow/examples/wordcount-example).
+[word count](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples) examples, which computes word frequencies. This series of four successively more detailed pipelines is described in detail in the accompanying [walkthrough](https://beam.apache.org/get-started/wordcount-example/).

-1. [`MinimalWordCount`](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java) is the simplest word count pipeline and introduces basic concepts like [Pipelines](https://cloud.google.com/dataflow/model/pipelines),
-[PCollections](https://cloud.google.com/dataflow/model/pcollection),
-[ParDo](https://cloud.google.com/dataflow/model/par-do),
-and [reading and writing data](https://cloud.google.com/dataflow/model/reading-and-writing-data) from external storage.
+1. [`MinimalWordCount`](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/MinimalWordCount.java) is the simplest word count pipeline and introduces basic concepts like [Pipelines](https://beam.apache.org/documentation/programming-guide/#pipeline),
+[PCollections](https://beam.apache.org/documentation/programming-guide/#pcollection),
+[ParDo](https://beam.apache.org/documentation/programming-guide/#transforms-pardo),
+and [reading and writing data](https://beam.apache.org/documentation/programming-guide/#io) from external storage.

-1. [`WordCount`](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WordCount.java) introduces Dataflow best practices like [PipelineOptions](https://cloud.google.com/dataflow/pipelines/constructing-your-pipeline#Creating) and custom [PTransforms](https://cloud.google.com/dataflow/model/composite-transforms).
+1. [`WordCount`](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WordCount.java) introduces best practices like [PipelineOptions](https://beam.apache.org/documentation/programming-guide/#pipeline) and custom [PTransforms](https://beam.apache.org/documentation/programming-guide/#transforms-composite).

1. [`DebuggingWordCount`](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/DebuggingWordCount.java)
-shows how to view live aggregators in the [Dataflow Monitoring Interface](https://cloud.google.com/dataflow/pipelines/dataflow-monitoring-intf), get the most out of
-[Cloud Logging](https://cloud.google.com/dataflow/pipelines/logging) integration, and start writing
-[good tests](https://cloud.google.com/dataflow/pipelines/testing-your-pipeline).
+demonstrates some best practices for instrumenting your pipeline code.

1. [`WindowedWordCount`](https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WindowedWordCount.java) shows how to run the same pipeline over either unbounded PCollections in streaming mode or bounded PCollections in batch mode.
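
A minimal sketch of the concepts that list introduces (Pipeline, PCollection, ParDo, text IO), assuming a current Beam Java SDK; the class name, file names, and regex below are illustrative, not the examples' exact code:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;

public class TinyWordCount {
  public static void main(String[] args) {
    // A Pipeline holds the graph of transforms; options are parsed from the command line.
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply(TextIO.read().from("input.txt"))          // PCollection<String> of lines
        .apply(ParDo.of(new DoFn<String, String>() {  // ParDo: one line in, many words out
          @ProcessElement
          public void processElement(ProcessContext c) {
            for (String word : c.element().split("[^a-zA-Z']+")) {
              if (!word.isEmpty()) {
                c.output(word);
              }
            }
          }
        }))
        .apply(Count.<String>perElement())            // KV<word, occurrence count>
        .apply(ParDo.of(new DoFn<KV<String, Long>, String>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            c.output(c.element().getKey() + ": " + c.element().getValue());
          }
        }))
        .apply(TextIO.write().to("wordcounts"));      // write sharded text output

    p.run().waitUntilFinish();
  }
}
```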

@@ -50,46 +48,55 @@ Change directory into `examples/java` and run the examples:
-Dexec.mainClass=<MAIN CLASS> \
-Dexec.args="<EXAMPLE-SPECIFIC ARGUMENTS>"

-For example, you can execute the `WordCount` pipeline on your local machine as follows:
+Alternatively, you may choose to bundle all dependencies into a single JAR and
+execute it outside of the Maven environment.

+### Direct Runner

+You can execute the `WordCount` pipeline on your local machine as follows:

mvn compile exec:java \
-Dexec.mainClass=org.apache.beam.examples.WordCount \
-Dexec.args="--inputFile=<LOCAL INPUT FILE> --output=<LOCAL OUTPUT FILE>"

-Once you have followed the general Cloud Dataflow
-[Getting Started](https://cloud.google.com/dataflow/getting-started) instructions, you can execute
-the same pipeline on fully managed resources in Google Cloud Platform:
+To create the bundled JAR of the examples and execute it locally:

+mvn package

+java -cp examples/java/target/beam-examples-java-bundled-<VERSION>.jar \
+org.apache.beam.examples.WordCount \
+--inputFile=<INPUT FILE PATTERN> --output=<OUTPUT FILE>

+### Google Cloud Dataflow Runner

+After you have followed the general Cloud Dataflow
+[prerequisites and setup](https://beam.apache.org/documentation/runners/dataflow/), you can execute
+the pipeline on fully managed resources in Google Cloud Platform:

mvn compile exec:java \
-Dexec.mainClass=org.apache.beam.examples.WordCount \
-Dexec.args="--project=<YOUR CLOUD PLATFORM PROJECT ID> \
--tempLocation=<YOUR CLOUD STORAGE LOCATION> \
---runner=BlockingDataflowRunner"
+--runner=DataflowRunner"

Make sure to use your project id, not the project number or the descriptive name.
-The Cloud Storage location should be entered in the form of
+The Google Cloud Storage location should be entered in the form of
`gs:https://bucket/path/to/staging/directory`.

-Alternatively, you may choose to bundle all dependencies into a single JAR and
-execute it outside of the Maven environment. For example, you can execute the
-following commands to create the
-bundled JAR of the examples and execute it both locally and in Cloud
-Platform:
+To create the bundled JAR of the examples and execute it in Google Cloud Platform:

mvn package

-java -cp examples/java/target/beam-examples-java-bundled-<VERSION>.jar \
-org.apache.beam.examples.WordCount \
---inputFile=<INPUT FILE PATTERN> --output=<OUTPUT FILE>

java -cp examples/java/target/beam-examples-java-bundled-<VERSION>.jar \
org.apache.beam.examples.WordCount \
--project=<YOUR CLOUD PLATFORM PROJECT ID> \
--tempLocation=<YOUR CLOUD STORAGE LOCATION> \
---runner=BlockingDataflowRunner
+--runner=DataflowRunner

## Other Examples

Other examples can be run similarly by replacing the `WordCount` class name with the example class, e.g.
`org.apache.beam.examples.cookbook.BigQueryTornadoes`,
`org.apache.beam.examples.cookbook.CombinePerKeyExamples`,
and adjusting runtime options under the `-Dexec.args` parameter, as specified in
the example itself.
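
For instance, a `BigQueryTornadoes` run on the Dataflow runner might look like the following (assuming the example's `--output` BigQuery table option; the placeholder values are ours):

    mvn compile exec:java \
    -Dexec.mainClass=org.apache.beam.examples.cookbook.BigQueryTornadoes \
    -Dexec.args="--project=<YOUR CLOUD PLATFORM PROJECT ID> \
    --tempLocation=<YOUR CLOUD STORAGE LOCATION> \
    --runner=DataflowRunner \
    --output=<PROJECT>:<DATASET>.<TABLE>"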

examples/java/src/main/java/org/apache/beam/examples/DebuggingWordCount.java

@@ -151,7 +151,7 @@ public static void main(String[] args) {
* <p>Below we verify that the set of filtered words matches our expected counts. Note
* that PAssert does not provide any output and that successful completion of the
* Pipeline implies that the expectations were met. Learn more at
- * https://cloud.google.com/dataflow/pipelines/testing-your-pipeline on how to test
+ * https://beam.apache.org/documentation/pipelines/test-your-pipeline/ on how to test
* your Pipeline and see {@link DebuggingWordCountTest} for an example unit test.
*/
List<KV<String, Long>> expectedResults = Arrays.asList(
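
As a self-contained illustration of that testing style (the words and counts here are chosen by us, assuming a current Beam SDK):

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

public class PAssertSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    PCollection<KV<String, Long>> counts =
        p.apply(Create.of("flourish", "stomach", "flourish"))
            .apply(Count.<String>perElement());

    // PAssert attaches the assertion to the pipeline itself and prints nothing;
    // if the expectation is not met, the pipeline's run fails.
    PAssert.that(counts).containsInAnyOrder(
        KV.of("flourish", 2L), KV.of("stomach", 1L));

    p.run().waitUntilFinish();
  }
}
```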

examples/java/src/main/java/org/apache/beam/examples/cookbook/README.md

@@ -21,7 +21,7 @@

This directory holds simple "cookbook" examples, which show how to define
commonly-used data analysis patterns that you would likely incorporate into a
-larger Dataflow pipeline. They include:
+larger Apache Beam pipeline. They include:

<ul>
<li><a href="https://github.com/apache/beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/cookbook/BigQueryTornadoes.java">BigQueryTornadoes</a>

@@ -20,10 +20,10 @@
# 'Gaming' examples


-This directory holds a series of example Dataflow pipelines in a simple 'mobile
+This directory holds a series of example Apache Beam pipelines in a simple 'mobile
gaming' domain. They all require Java 8. Each pipeline successively introduces
new concepts, and gives some examples of using Java 8 syntax in constructing
-Dataflow pipelines. Other than usage of Java 8 lambda expressions, the concepts
+Beam pipelines. Other than usage of Java 8 lambda expressions, the concepts
that are used apply equally well in Java 7.
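
A sketch of the Java 8 style this paragraph refers to, assuming a current Beam SDK (the CSV lines and field index are invented for the sketch):

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

public class LambdaSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Java 8: a one-line lambda where Java 7 would need an anonymous
    // SimpleFunction or DoFn subclass.
    p.apply(Create.of("user1,TeamA,12", "user2,TeamB,7"))
        .apply(MapElements.into(TypeDescriptors.strings())
            .via((String line) -> line.split(",")[1])); // extract the team field

    p.run().waitUntilFinish();
  }
}
```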

In the gaming scenario, many users play, as members of different teams, over
@@ -58,7 +58,7 @@ the day's cutoff point.

The next pipeline in the series is `HourlyTeamScore`. This pipeline also
processes data collected from gaming events in batch. It builds on `UserScore`,
-but uses [fixed windows](https://cloud.google.com/dataflow/model/windowing), by
+but uses [fixed windows](https://beam.apache.org/documentation/programming-guide/#windowing), by
default an hour in duration. It calculates the sum of scores per team, for each
window, optionally allowing specification of two timestamps before and after
which data is filtered out. This allows a model where late data collected after
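
The fixed-windowing step described above can be sketched roughly as follows (the element type and helper method are ours, not `HourlyTeamScore`'s exact code):

```java
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

public class HourlyWindowSketch {
  // Assign each element to the hour-long fixed window containing its event-time timestamp.
  static PCollection<String> intoHourlyWindows(PCollection<String> scoreEvents) {
    return scoreEvents.apply(
        Window.<String>into(FixedWindows.of(Duration.standardHours(1))));
  }
}
```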

Injector.java

@@ -260,8 +260,7 @@ private static String generateEvent(Long currTime, int delayInMillis) {
user = team.getRandomUser();
}
String event = user + "," + teamName + "," + random.nextInt(MAX_SCORE);
-// Randomly introduce occasional parse errors. You can see a custom counter tracking the number
-// of such errors in the Dataflow Monitoring UI, as the example pipeline runs.
+// Randomly introduce occasional parse errors.
if (random.nextInt(parseErrorRate) == 0) {
System.out.println("Introducing a parse error.");
event = "THIS LINE REPRESENTS CORRUPT DATA AND WILL CAUSE A PARSE ERROR";

GameStatsTest.java

@@ -38,7 +38,7 @@
* Tests of GameStats.
* Because the pipeline was designed for easy readability and explanations, it lacks good
* modularity for testing. See our testing documentation for better ideas:
- * https://cloud.google.com/dataflow/pipelines/testing-your-pipeline.
+ * https://beam.apache.org/documentation/pipelines/test-your-pipeline/
*/
@RunWith(JUnit4.class)
public class GameStatsTest implements Serializable {

HourlyTeamScoreTest.java

@@ -45,7 +45,7 @@
* Tests of HourlyTeamScore.
* Because the pipeline was designed for easy readability and explanations, it lacks good
* modularity for testing. See our testing documentation for better ideas:
- * https://cloud.google.com/dataflow/pipelines/testing-your-pipeline.
+ * https://beam.apache.org/documentation/pipelines/test-your-pipeline/
*/
@RunWith(JUnit4.class)
public class HourlyTeamScoreTest implements Serializable {