Katas - Convert task description from HTML to Markdown #11736

Merged
merged 42 commits into from
May 19, 2020
dce14f8
Support ZetaSQL DATE type as a Beam LogicalType
robinyqiu Mar 24, 2020
401f213
[BEAM-6733] Add pipeline option to flush bundle data before checkpoin…
mxm May 11, 2020
f4a0f66
Remove all answer placeholder checks as they can be confusing at time…
henryken May 16, 2020
849721f
Update course in Stepik
henryken May 16, 2020
fc5c981
[BEAM-10018] Fix timestamps in windowing kata
iht May 16, 2020
db5004c
[BEAM-10018] Kata failing due to failed parsing
iht May 16, 2020
af2d850
Convert html task description to md for "Hello Beam" and "Core Transf…
henryken May 17, 2020
f214352
Remove unused import
iht May 17, 2020
b18ea2a
Add missing dependency
iht May 17, 2020
80bc613
Fix member variable name in Kata documentation
iht May 17, 2020
45a0b85
Fix placeholder location
iht May 17, 2020
ab42e55
Convert html task description to md for "Core Transforms" remaining l…
henryken May 17, 2020
ee4a44e
Convert html task description to md for "Common Transforms" lessons
henryken May 17, 2020
5ea0940
Convert html task description to md for remaining Python Katas lessons
henryken May 17, 2020
d5606be
Convert html task description to md for most of Java Katas lessons
henryken May 17, 2020
f9ae024
Convert html task description to md for Java Katas "Common Transforms…
henryken May 17, 2020
6c73dbe
Convert html task description to md for Java Katas "Core Transforms" …
henryken May 17, 2020
d773f8c
[BEAM-2530] Implement Zeta SQL precommit compile tests and run on jav…
pawelpasterz May 18, 2020
7c80ecb
Merge pull request #11678: [BEAM-6733] Add pipeline option to flush b…
mxm May 18, 2020
64414b8
Python3 fix - convert dict.keys() to list before indexing (#11733)
chamikaramj May 18, 2020
1aa715c
Updates google-apitools and httplib2 (#11726)
tvalentyn May 18, 2020
1f21a4c
Merge pull request #11731 from [BEAM-10018] Fix timestamps in two win…
pabloem May 18, 2020
ddf2927
Merge pull request #11730 from henryken/katas-python-remove-answer-pl…
pabloem May 18, 2020
de9177e
[BEAM-9964] Update CHANGES.md (#11743)
omarismail94 May 18, 2020
47c246b
Merge pull request #11272: [BEAM-9641] Support ZetaSQL DATE type as a…
apilloud May 18, 2020
76fbe45
[BEAM-9577] Artifact v2 support for uber jars. (#11708)
robertwb May 18, 2020
7c81b93
Populate all SpannerIO batching parameters in display data.
nielm Apr 26, 2020
9ded9e2
Fix capitalization, clarify descriptions
TheNeuralBit May 14, 2020
192e9ad
fix capitalization, clarify description Grouped
TheNeuralBit May 14, 2020
30a68f5
Refactor to extract single method for popuplating displayData
nielm May 18, 2020
decd50a
[BEAM-9821] Populate all SpannerIO batching parameters in display dat…
TheNeuralBit May 19, 2020
c89f188
Convert html task description to md for "Hello Beam" and "Core Transf…
henryken May 17, 2020
3f5a48c
Convert html task description to md for "Core Transforms" remaining l…
henryken May 17, 2020
714d82f
Convert html task description to md for "Common Transforms" lessons
henryken May 17, 2020
1d55f6f
Convert html task description to md for remaining Python Katas lessons
henryken May 17, 2020
1a5afa1
Convert html task description to md for most of Java Katas lessons
henryken May 17, 2020
23f419f
Convert html task description to md for Java Katas "Common Transforms…
henryken May 17, 2020
353abda
Convert html task description to md for Java Katas "Core Transforms" …
henryken May 17, 2020
b739cb7
Resolve merge conflict
henryken May 19, 2020
2c0c4c9
Merge remote-tracking branch 'origin/katas-convert-html-desc-to-md' i…
henryken May 19, 2020
80a490e
Update Python Katas on Stepik
henryken May 19, 2020
bfcf1b4
Update Beam Katas Java on Stepik
henryken May 19, 2020
Convert html task description to md for most of Java Katas lessons
henryken committed May 17, 2020
commit d5606be2bbea63c69b1d4800e9963e5cbb65c766
Original file line number Diff line number Diff line change
Expand Up @@ -16,21 +16,18 @@
~ limitations under the License.
-->

<html>
<h2>Word Count Pipeline</h2>
<p>
<b>Kata:</b> Create a pipeline that counts the number of words.
</p>
<p>
Please output the count of each word in the following format:
</p>
<pre>
word:count
ball:5
book:3
</pre>
<br>
Word Count Pipeline
-------------------

**Kata:** Create a pipeline that counts the number of words.

Please output the count of each word in the following format:
```text
word:count
ball:5
book:3
```

<div class="hint">
Refer to your katas above.
</div>
</html>
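
As an illustration of the `word:count` output format requested in the Word Count task, here is a minimal plain-Java sketch. It does not use the Beam SDK; the class name `WordCountFormatSketch` and the method `countWords` are hypothetical names chosen for this example. The output is sorted only to make it deterministic here, which is an assumption of the sketch and not something the task requires.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the katas: shows the "word:count" format only.
final class WordCountFormatSketch {

    // Splits each line on whitespace, counts every word, and renders
    // one "word:count" string per distinct word, sorted by word.
    static List<String> countWords(List<String> lines) {
        Map<String, Long> counts = lines.stream()
            .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
            .filter(word -> !word.isEmpty())
            .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));

        return counts.entrySet().stream()
            .map(entry -> entry.getKey() + ":" + entry.getValue())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints one formatted line per distinct word.
        countWords(Arrays.asList("ball book ball", "book ball"))
            .forEach(System.out::println);
    }
}
```

In a real pipeline the same three steps (split into words, count per word, format as a string) would each be expressed as a Beam transform.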
Expand Up @@ -16,18 +16,14 @@
~ limitations under the License.
-->

<html>
<h2>Built-in I/Os</h2>
<p>
Beam SDKs provide many out-of-the-box I/O transforms that can be used to read from many
different sources and write to many different sinks.
</p>
<p>
See the <a href="https://beam.apache.org/documentation/io/built-in/">Beam-provided I/O
Transforms</a> page for a list of the currently available I/O transforms.
</p>
<p>
<b>Note:</b> There is no kata for this task. Please click the "Check" button and
proceed to the next task.
</p>
</html>
Built-in I/Os
-------------

Beam SDKs provide many out-of-the-box I/O transforms that can be used to read from many different
sources and write to many different sinks.

See the [Beam-provided I/O Transforms](https://beam.apache.org/documentation/io/built-in/) page for
a list of the currently available I/O transforms.

**Note:** There is no kata for this task. Please click the "Check" button and proceed to the next
task.
Expand Up @@ -16,32 +16,29 @@
~ limitations under the License.
-->

<html>
<h2>TextIO Read</h2>
<p>
When you create a pipeline, you often need to read data from some external source, such as a file
or a database. Likewise, you may want your pipeline to output its result data to an external
storage system. Beam provides read and write transforms for a number of common data storage types.
If you want your pipeline to read from or write to a data storage format that isn’t supported by
the built-in transforms, you can implement your own read and write transforms.
</p>
<p>
To read a PCollection from one or more text files, use TextIO.read() to instantiate a transform
and use TextIO.Read.from(String) to specify the path of the file(s) to be read.
</p>
<p>
<b>Kata:</b> Read the 'countries.txt' file and convert each country name into uppercase.
</p>
<br>
TextIO Read
-----------

When you create a pipeline, you often need to read data from some external source, such as a file
or a database. Likewise, you may want your pipeline to output its result data to an external
storage system. Beam provides read and write transforms for a number of common data storage types.
If you want your pipeline to read from or write to a data storage format that isn’t supported by
the built-in transforms, you can implement your own read and write transforms.

To read a PCollection from one or more text files, use TextIO.read() to instantiate a transform
and use TextIO.Read.from(String) to specify the path of the file(s) to be read.

**Kata:** Read the 'countries.txt' file and convert each country name into uppercase.

<div class="hint">
Use <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/TextIO.html">
TextIO</a> and its corresponding
<a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/TextIO.html#read--">
TextIO.read()</a> method.
</div>

<div class="hint">
Refer to the Beam Programming Guide
<a href="https://beam.apache.org/documentation/programming-guide/#pipeline-io-reading-data">
"Reading input data"</a> section for more information.
</div>
</html>
Expand Up @@ -16,38 +16,34 @@
~ limitations under the License.
-->

<html>
<h2>Hello Beam Pipeline</h2>
<p>
Apache Beam is an open source, unified model for defining both batch and streaming data-parallel
processing pipelines. Using one of the open source Beam SDKs, you build a program that defines the
pipeline. The pipeline is then executed by one of Beam’s supported distributed processing
back-ends, which include Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow.
</p>
<p>
Beam is particularly useful for Embarrassingly Parallel data processing tasks, in which the
problem can be decomposed into many smaller bundles of data that can be processed independently
and in parallel. You can also use Beam for Extract, Transform, and Load (ETL) tasks and pure data
integration. These tasks are useful for moving data between different storage media and data
sources, transforming data into a more desirable format, or loading data onto a new system.
</p>
<p>
To learn more about Apache Beam, refer to
<a href="https://beam.apache.org/get-started/beam-overview/">Apache Beam Overview</a>.
</p>
<p>
<b>Kata:</b> Your first kata is to create a simple pipeline that takes a hardcoded input element
"Hello Beam".
</p>
<br>
Welcome To Apache Beam
----------------------

Apache Beam is an open source, unified model for defining both batch and streaming data-parallel
processing pipelines. Using one of the open source Beam SDKs, you build a program that defines the
pipeline. The pipeline is then executed by one of Beam’s supported distributed processing back-ends,
which include Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow.

Beam is particularly useful for Embarrassingly Parallel data processing tasks, in which the problem
can be decomposed into many smaller bundles of data that can be processed independently and in
parallel. You can also use Beam for Extract, Transform, and Load (ETL) tasks and pure data
integration. These tasks are useful for moving data between different storage media and data
sources, transforming data into a more desirable format, or loading data onto a new system.

To learn more about Apache Beam, refer to
[Apache Beam Overview](https://beam.apache.org/get-started/beam-overview/).

**Kata:** Your first kata is to create a simple pipeline that takes a hardcoded input element
"Hello Beam".

<div class="hint">
Hardcoded input can be created using
<a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/Create.html">
Create</a>.
</div>

<div class="hint">
Refer to the Beam Programming Guide
<a href="https://beam.apache.org/documentation/programming-guide/#creating-pcollection-in-memory">
"Creating a PCollection from in-memory data"</a> section for more information.
</div>
</html>
Expand Up @@ -16,44 +16,45 @@
~ limitations under the License.
-->

<html>
<h2>Early Triggers</h2>
<p>
Triggers allow Beam to emit early results, before all the data in a given window has arrived.
For example, a trigger can emit after a certain amount of time elapses, or after a certain
number of elements arrive.
</p>
<p>
<b>Kata:</b> Given that events are being generated every second and a fixed window of 1-day
duration, please implement an early trigger that emits the event count immediately after
each new element is processed.
</p>
<br>
Early Triggers
--------------

Triggers allow Beam to emit early results, before all the data in a given window has arrived. For
example, a trigger can emit after a certain amount of time elapses, or after a certain number of
elements arrive.

**Kata:** Given that events are being generated every second and a fixed window of 1-day duration,
please implement an early trigger that emits the event count immediately after each new
element is processed.

<div class="hint">
Use <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/windowing/AfterWatermark.AfterWatermarkEarlyAndLate.html#withEarlyFirings-org.apache.beam.sdk.transforms.windowing.Trigger.OnceTrigger-">
withEarlyFirings</a> to set early firing triggers.
</div>

<div class="hint">
Use <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/windowing/FixedWindows.html">
FixedWindows</a> with 1-day duration using
<a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/windowing/AfterWatermark.html#pastEndOfWindow--">
AfterWatermark.pastEndOfWindow()</a> trigger.
</div>

<div class="hint">
Set the <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/windowing/Window.html#withAllowedLateness-org.joda.time.Duration-">
allowed lateness</a> to 0 with
<a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/windowing/Window.html#discardingFiredPanes--">
discarding accumulation mode</a>.
</div>

<div class="hint">
Use <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/Combine.html#globally-org.apache.beam.sdk.transforms.CombineFnBase.GlobalCombineFn-">
Combine.globally</a> and
<a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/Count.html#combineFn--">
Count.combineFn</a> to calculate the count of events.
</div>

<div class="hint">
Refer to the Beam Programming Guide
<a href="https://beam.apache.org/documentation/programming-guide/#event-time-triggers">
"Event time triggers"</a> section for more information.
</div>
</html>
Expand Up @@ -16,63 +16,57 @@
~ limitations under the License.
-->

<html>
<h2>Event Time Triggers</h2>
<p>
When collecting and grouping data into windows, Beam uses triggers to determine when to emit the
aggregated results of each window (referred to as a pane). If you use Beam’s default windowing
configuration and default trigger, Beam outputs the aggregated result when it estimates all data
has arrived, and discards all subsequent data for that window.
</p>
<p>
You can set triggers for your PCollections to change this default behavior. Beam provides a
number of pre-built triggers that you can set:
</p>
<div>
<ul>
<li>Event time triggers</li>
<li>Processing time triggers</li>
<li>Data-driven triggers</li>
<li>Composite triggers</li>
</ul>
</div>
<p>
Event time triggers operate on the event time, as indicated by the timestamp on each data
element. Beam’s default trigger is event time-based.
</p>
<p>
The AfterWatermark trigger operates on event time. The AfterWatermark trigger emits the contents
of a window after the watermark passes the end of the window, based on the timestamps attached
to the data elements. The watermark is a global progress metric, and is Beam’s notion of input
completeness within your pipeline at any given point. AfterWatermark.pastEndOfWindow() only fires
when the watermark passes the end of the window.
</p>
<p>
<b>Kata:</b> Given that events are being generated every second, please implement a trigger that
emits the number of events within a fixed window of 5-second duration.
</p>
<br>
Event Time Triggers
-------------------

When collecting and grouping data into windows, Beam uses triggers to determine when to emit the
aggregated results of each window (referred to as a pane). If you use Beam’s default windowing
configuration and default trigger, Beam outputs the aggregated result when it estimates all data
has arrived, and discards all subsequent data for that window.

You can set triggers for your PCollections to change this default behavior. Beam provides a number
of pre-built triggers that you can set:

* Event time triggers
* Processing time triggers
* Data-driven triggers
* Composite triggers

Event time triggers operate on the event time, as indicated by the timestamp on each data element.
Beam’s default trigger is event time-based.

The AfterWatermark trigger operates on event time. The AfterWatermark trigger emits the contents
of a window after the watermark passes the end of the window, based on the timestamps attached to
the data elements. The watermark is a global progress metric, and is Beam’s notion of input
completeness within your pipeline at any given point. AfterWatermark.pastEndOfWindow() only fires
when the watermark passes the end of the window.

**Kata:** Given that events are being generated every second, please implement a trigger that emits
the number of events within a fixed window of 5-second duration.

<div class="hint">
Use <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/windowing/FixedWindows.html">
FixedWindows</a> with 5-second duration using
<a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/windowing/AfterWatermark.html#pastEndOfWindow--">
AfterWatermark.pastEndOfWindow()</a> trigger.
</div>

<div class="hint">
Set the <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/windowing/Window.html#withAllowedLateness-org.joda.time.Duration-">
allowed lateness</a> to 0 with
<a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/windowing/Window.html#discardingFiredPanes--">
discarding accumulation mode</a>.
</div>

<div class="hint">
Use <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/Combine.html#globally-org.apache.beam.sdk.transforms.CombineFnBase.GlobalCombineFn-">
Combine.globally</a> and
<a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/Count.html#combineFn--">
Count.combineFn</a> to calculate the count of events.
</div>

<div class="hint">
Refer to the Beam Programming Guide
<a href="https://beam.apache.org/documentation/programming-guide/#event-time-triggers">
"Event time triggers"</a> section for more information.
</div>
</html>
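
To make the fixed-window counting in the Event Time Triggers task concrete, the following plain-Java sketch assigns event timestamps to 5-second windows and counts the events in each. It is a simplified model and does not use the Beam SDK: it ignores watermarks, triggers, and late data, and assumes non-negative millisecond timestamps with windows aligned to zero. The names `FixedWindowCountSketch`, `windowStart`, and `countPerWindow` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of fixed windows, not Beam code.
final class FixedWindowCountSketch {

    static final long WINDOW_MILLIS = 5_000L;

    // Start of the 5-second window that contains the timestamp
    // (assumes timestampMillis >= 0 and windows aligned to 0).
    static long windowStart(long timestampMillis) {
        return timestampMillis - (timestampMillis % WINDOW_MILLIS);
    }

    // Maps each window start to the number of events that fall in that window.
    static Map<Long, Long> countPerWindow(List<Long> eventTimestampsMillis) {
        Map<Long, Long> counts = new TreeMap<>();
        for (long timestamp : eventTimestampsMillis) {
            counts.merge(windowStart(timestamp), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // One event per second for 12 seconds, starting at t = 0.
        List<Long> timestamps = new ArrayList<>();
        for (long second = 0; second < 12; second++) {
            timestamps.add(second * 1_000L);
        }
        System.out.println(countPerWindow(timestamps));
    }
}
```

With one event per second, each complete window holds five events. In Beam, the per-window result would be emitted only once the watermark passes the end of that window.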
Expand Up @@ -16,48 +16,50 @@
~ limitations under the License.
-->

<html>
<h2>Window Accumulation Mode</h2>
<p>
When you specify a trigger, you must also set the window’s accumulation mode. When a trigger
fires, it emits the current contents of the window as a pane. Since a trigger can fire multiple
times, the accumulation mode determines whether the system accumulates the window panes as the
trigger fires, or discards them.
</p>
<p>
<b>Kata:</b> Given that events are being generated every second and a fixed window of 1-day
duration, please implement an early trigger that emits the event count immediately after
each new element is processed, in accumulating mode.
</p>
<br>
Window Accumulation Mode
------------------------

When you specify a trigger, you must also set the window’s accumulation mode. When a trigger
fires, it emits the current contents of the window as a pane. Since a trigger can fire multiple
times, the accumulation mode determines whether the system accumulates the window panes as the
trigger fires, or discards them.

**Kata:** Given that events are being generated every second and a fixed window of 1-day duration,
please implement an early trigger that emits the event count immediately after each new
element is processed, in accumulating mode.

<div class="hint">
Use <a href="https://beam.apache.org/releases/javadoc/2.13.0/org/apache/beam/sdk/transforms/windowing/Window.html#accumulatingFiredPanes--">
accumulatingFiredPanes()</a> to set a window to accumulate the panes that are produced when the
trigger fires.
</div>

<div class="hint">
Use <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/windowing/AfterWatermark.AfterWatermarkEarlyAndLate.html#withEarlyFirings-org.apache.beam.sdk.transforms.windowing.Trigger.OnceTrigger-">
withEarlyFirings</a> to set early firing triggers.
</div>

<div class="hint">
Use <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/windowing/FixedWindows.html">
FixedWindows</a> with 1-day duration using
<a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/windowing/AfterWatermark.html#pastEndOfWindow--">
AfterWatermark.pastEndOfWindow()</a> trigger.
</div>

<div class="hint">
Set the <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/windowing/Window.html#withAllowedLateness-org.joda.time.Duration-">
allowed lateness</a> to 0.
</div>

<div class="hint">
Use <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/Combine.html#globally-org.apache.beam.sdk.transforms.CombineFnBase.GlobalCombineFn-">
Combine.globally</a> and
<a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/transforms/Count.html#combineFn--">
Count.combineFn</a> to calculate the count of events.
</div>

<div class="hint">
Refer to the Beam Programming Guide
<a href="https://beam.apache.org/documentation/programming-guide/#event-time-triggers">
"Event time triggers"</a> section for more information.
</div>
</html>
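
The difference between accumulating and discarding fired panes can be sketched with a small plain-Java simulation. It does not use the Beam SDK; it models a single window whose trigger fires after every element and records the count carried by each emitted pane. The class `AccumulationModeSketch` and the method `paneCounts` are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of pane contents, not Beam code.
final class AccumulationModeSketch {

    // Simulates a trigger that fires after every element in one window and
    // returns the element count reported by each emitted pane.
    static List<Long> paneCounts(int elements, boolean accumulating) {
        List<Long> panes = new ArrayList<>();
        long buffered = 0;
        for (int i = 0; i < elements; i++) {
            buffered++;          // a new element arrives in the window
            panes.add(buffered); // the trigger fires and emits a pane
            if (!accumulating) {
                buffered = 0;    // discarding mode drops what was already emitted
            }
        }
        return panes;
    }

    public static void main(String[] args) {
        System.out.println("accumulating: " + paneCounts(3, true));  // [1, 2, 3]
        System.out.println("discarding:   " + paneCounts(3, false)); // [1, 1, 1]
    }
}
```

In accumulating mode each pane reports the running total for the window, whereas in discarding mode each pane reports only the elements that arrived since the previous firing.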