Simple Spark Application

A simple Spark application that counts the occurrence of each word in a corpus and then counts the occurrence of each character in the most popular words. Includes the same program implemented in Java and Scala.

To make a jar:

mvn package

To run from a gateway node in a CDH5 cluster:

spark-submit --class com.cloudera.sparkwordcount.SparkWordCount --master local \
  target/sparkwordcount-0.0.1-SNAPSHOT.jar <input file> 2

This will run the application in a single local process. If the cluster is running a Spark standalone cluster manager, you can replace "--master local" with "--master spark:https://<master host>:<master port>".

If the cluster is running YARN, you can replace "--master local" with "--master yarn".

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
data		data
src/main		src/main
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Simple Spark Application

About

Releases

Packages

Contributors 3

Languages

License

sryza/simplesparkapp

Folders and files

Latest commit

History

Repository files navigation

Simple Spark Application

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages