
ramanbagga/ddf-flink

 
 


DDF with Flink

This project depends on DDF and uses the Apache Flink engine.

DDF

Distributed DataFrame: Productivity = Power x Simplicity For Big Data Scientists & Engineers


Getting Started

This project depends on DDF v1.4.0-SNAPSHOT, which must be installed before this project can run. To get DDF v1.4.0-SNAPSHOT, clone the DDF repo and check out the tuplejump-integration branch.

$ git clone git@github.com:ddf-project/DDF.git
$ cd DDF
$ git fetch
$ git checkout tuplejump-integration

No changes are required when installing DDF using Maven.

Before installing DDF using SBT, add the following setting on a new line after line 482 of project/RootBuild.scala, and make sure line 482 itself ends with a comma:

  ),
  publishArtifact in (Compile, packageDoc) := false

This setting turns off publishing of the docs artifact, which avoids the error that otherwise occurs when publishing docs through SBT.
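The manual edit described above could also be scripted. The following is a minimal sketch, assuming the target line number (482 in the text above) still matches your checkout; `patch_root_build` is a hypothetical helper name, not something DDF provides, and the four-space indentation of the inserted line is a guess. Line numbers shift between revisions, so inspect the file after running it.

```shell
# Hypothetical helper (not part of DDF): insert the publishArtifact setting
# after line N of a build file, adding a trailing comma to line N if missing.
# Usage: patch_root_build <file> <line-number>
patch_root_build() {
  file=$1
  line=$2
  tmp=$(mktemp) || return 1
  awk -v n="$line" '
    NR == n {
      # The preceding settings entry must end with a comma.
      if ($0 !~ /,[ \t]*$/) sub(/[ \t]*$/, ",")
      print
      print "    publishArtifact in (Compile, packageDoc) := false"
      next
    }
    { print }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return $status
}
```

With the function loaded, `patch_root_build project/RootBuild.scala 482` would apply the change in place. Writing the result back with `cat` rather than `mv` keeps the original file's permissions.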

DDF can then be installed as follows:

$ bin/run-once.sh
# using Maven
$ mvn package install -DskipTests
# or using SBT
$ sbt publishLocal
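Before running the install commands, it may help to confirm that the tools they rely on are on your PATH. The following is a small sketch; `require_tools` is a hypothetical helper, and the list of tools to pass is an assumption drawn only from the commands shown in this README (git, plus mvn or sbt).

```shell
# Hypothetical helper: report any required command-line tools that are
# missing from PATH.
# Usage: require_tools <tool>...   (returns non-zero if any are missing)
require_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For the Maven route this could be called as `require_tools git mvn`, and for the SBT route as `require_tools git sbt`.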

ddf-with-flink can then be installed as follows:

$ git clone git@github.com:tuplejump/ddf-with-flink.git
$ cd ddf-with-flink
$ bin/run-once.sh
$ mvn package install -DskipTests

Running tests

Tests can be run through either SBT or Maven:

$ sbt test
$ mvn test

# running a single test

$ sbt "testOnly *FlinkDDFManagerSpec*"

$ mvn test -Dsuites='io.ddf.flink.FlinkDDFManagerSpec'
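The two single-test commands differ in how they name the suite: the SBT form uses a glob around the simple class name, while the Maven form uses the fully qualified name. A small sketch that derives both from one suite name is shown here; `single_test_cmd` is a hypothetical helper that only prints the command, it does not run it.

```shell
# Hypothetical helper: print the single-suite test command for a build tool.
# Usage: single_test_cmd <sbt|mvn> <fully.qualified.SuiteName>
single_test_cmd() {
  tool=$1
  suite=$2
  case "$tool" in
    sbt)
      # testOnly accepts a glob, so the simple class name is enough.
      printf 'sbt "testOnly *%s*"\n' "${suite##*.}"
      ;;
    mvn)
      # -Dsuites is given the fully qualified suite name.
      printf "mvn test -Dsuites='%s'\n" "$suite"
      ;;
    *)
      echo "unknown build tool: $tool" >&2
      return 1
      ;;
  esac
}
```

For example, `single_test_cmd sbt io.ddf.flink.FlinkDDFManagerSpec` prints the same SBT command given in this section.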

Starting ddf-shell with flink engine

Execute the following only after installing ddf-with-flink:

$ sbt package
$ bin/ddf-shell

Running sbt package first is required because it generates the lib_managed directory, which the scripts depend on.
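Since the scripts depend on lib_managed, a launcher could check for it before starting. The following is a minimal sketch, assuming lib_managed sits in the directory the scripts are run from (the README does not state its location); `has_lib_managed` is a hypothetical helper.

```shell
# Hypothetical guard: succeed only if the lib_managed directory exists and
# contains at least one entry. The default path is an assumption; pass your
# own as the first argument if it lives elsewhere.
# Usage: has_lib_managed [dir]
has_lib_managed() {
  dir=${1:-lib_managed}
  [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}
```

A typical use would be `has_lib_managed || sbt package` followed by `bin/ddf-shell`, so the packaging step is skipped when the directory is already populated.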

Running the example:

$ sbt package
$ bin/run-flink-example io.ddf.flink.examples.FlinkDDFExample

As with ddf-shell, sbt package is needed here because it generates the lib_managed directory that the scripts depend on.

Todo

  1. Test the ML method getConfusionMatrix.
  2. Implement transformPython and flattenDDF for TransformationHandler, and test the R functions.
  3. Implement the methods r2score, residuals, roc and rmse for MLMetricsSupporter.
