flink-connector-wikiedits

A non-parallel source that parses a live stream of Wikipedia edits.

Metadata about the edits is mirrored to the IRC channel #en.wikipedia. The source connects to this IRC channel and parses each message into a WikipediaEditEvent instance.
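To illustrate the parsing step, here is a minimal, self-contained sketch that turns one channel message into a small event object. It does not reuse the connector's code: the `EditSketch` class, its fields, the regular expression, and the sample line are all assumptions made for illustration, based on a message shape of roughly `[[Title]] flags url * user * (+bytes) summary`. The actual `WikipediaEditEvent` parser may differ.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for an edit event; not the connector's class. */
public class EditSketch {
    // Assumed message shape: [[Title]] FLAGS URL * USER * (+DIFF) SUMMARY
    private static final Pattern LINE = Pattern.compile(
        "\\[\\[(.*?)\\]\\]\\s(\\S*)\\s(\\S+)\\s\\*\\s(.*?)\\s\\*\\s\\(([+-]?\\d+)\\)\\s?(.*)");

    public final String title;
    public final String flags;
    public final String user;
    public final int byteDiff;
    public final String summary;

    private EditSketch(String title, String flags, String user,
                       int byteDiff, String summary) {
        this.title = title;
        this.flags = flags;
        this.user = user;
        this.byteDiff = byteDiff;
        this.summary = summary;
    }

    /** Returns an event if the line matches the assumed shape, else empty. */
    public static Optional<EditSketch> parse(String raw) {
        Matcher m = LINE.matcher(raw);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new EditSketch(
            m.group(1),                   // page title
            m.group(2),                   // flags such as "M" (may be empty)
            m.group(4),                   // user name
            Integer.parseInt(m.group(5)), // signed byte difference
            m.group(6)));                 // edit summary
    }

    public static void main(String[] args) {
        // Made-up sample line, not captured from the real channel.
        String sample = "[[Apache Flink]] M "
            + "https://en.wikipedia.org/w/index.php?diff=1&oldid=0"
            + " * ExampleUser * (+42) fixed typo";
        parse(sample).ifPresent(e ->
            System.out.println(e.user + " edited " + e.title
                + " (" + e.byteDiff + " bytes)"));
    }
}
```

Returning an `Optional` lets lines that do not match the assumed shape (for example, server notices) be dropped without throwing.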

The purpose of this source is to ease the setup of demos of the DataStream API with live data.

The original idea is from the Hello Samza project of Apache Samza. The Samza code for this is located in the samza-hello-samza repository.

Example

Add the following dependency to your project:

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-wikiedits</artifactId>
  <version>1.0-SNAPSHOT</version>
</dependency>

You can use the source like any regular source:

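import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.wikiedits.WikipediaEditEvent;
import org.apache.flink.streaming.connectors.wikiedits.WikipediaEditsSource;

StreamExecutionEnvironment env = StreamExecutionEnvironment
    .getExecutionEnvironment();

// Attach the Wikipedia edits source; each element is one parsed edit.
DataStream<WikipediaEditEvent> edits = env
    .addSource(new WikipediaEditsSource());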

Remember that this is a non-parallel source, so it always runs with parallelism 1.