Apache Spark is a popular open-source big data processing engine. It provides a machine learning library (MLlib) that makes machine learning scalable and easy. This project aims to show how to use the Apache Spark machine learning library.
If you are new to Apache Spark, please learn the basics of the tool and look at some example code before getting started.
The following tools need to be installed before using the project:
- Java SDK >= 7.0
- Maven >= 3.0.0
- Scala SDK >= 2.11
- Apache Spark >= 2.0.0 (only if you want to submit jobs to Spark on a local machine)
- Install Java if it is not already installed
- Install Maven by following its installation instructions
- Download and install Apache Spark
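Once the tools are installed, a quick check can confirm that they are reachable from the terminal. The following is a minimal sketch, assuming the standard command names `java`, `mvn`, `scala`, and `spark-submit`; the helper `check_tool` is a hypothetical name introduced here and is not part of the project.

```shell
# check_tool is a hypothetical helper: it reports whether a command is on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# spark-submit is only needed when submitting to a local Spark installation.
for tool in java mvn scala spark-submit; do
  check_tool "$tool" || true   # keep going so every tool is reported
done
```

For any tool reported as found, `java -version`, `mvn -version`, `scala -version`, and `spark-submit --version` print the installed version, which can be compared against the minimum versions listed above.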
- Open a terminal
- Clone the project
git clone [email protected]:shihabuddinbuet/machine-learning-spark.git
- Run
cd machine-learning-spark
- Run
mvn clean -DskipTests package
to build the project
- Submit the jar to Spark for any of the main apps
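As an illustration of the submit step, the sketch below wraps a `spark-submit` call in a small function. The main class name and jar path are placeholders, not values taken from this project: replace them with one of the project's main apps and the jar that Maven writes to `target/`. The function name `submit_app` is also hypothetical.

```shell
# Placeholder values (assumptions): adjust them to match the project.
MAIN_CLASS="com.example.ml.ExampleApp"        # hypothetical main class
APP_JAR="target/machine-learning-spark.jar"   # hypothetical jar name; check target/ after the build
MASTER="local[*]"                             # run Spark locally on all available cores

# submit_app is a hypothetical wrapper around spark-submit.
submit_app() {
  spark-submit --class "$MAIN_CLASS" --master "$MASTER" "$APP_JAR"
}

# Show what will be submitted; call submit_app to actually run it.
echo "class=$MAIN_CLASS jar=$APP_JAR master=$MASTER"
```

Running `submit_app` from the project root after the build submits the chosen app. Keeping the class and jar in variables makes it easy to switch between the different main apps without retyping the whole command.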
- Md shihab uddin - Initial work - shihabuddinbuet
See also the list of contributors who participated in this project.