Yeema / WordCount


using Hadoop to rank vocabulary by Aa-Zz


WordCount

Goal: use Hadoop to count the occurrences of each starting letter and route each key to its corresponding reducer
Implementation:

mapper
reducer
custom partitioner and comparator

Count the occurrences of each starting letter

Words are separated by whitespace characters
Ignore words that do not start with a letter
Use 2 reducers: the first reducer processes words starting with Aa~Gg, and the second reducer processes the remaining words
Results are case-sensitive
Sort by A → a → B → b ...
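The counting rules above can be sketched in plain Java, without any Hadoop types, to make the expected behaviour concrete. This is only an illustration, not the code in src/: the class name `StartLetterCounter` and its methods are hypothetical, and it assumes "letter" means the ASCII letters A-Z and a-z. Ordering of the output is deliberately left out here.

```java
import java.util.HashMap;
import java.util.Map;

/** Plain-Java sketch of the counting rules; no Hadoop types involved. */
final class StartLetterCounter {
    private StartLetterCounter() {}

    /** Assumption: "letter" means the ASCII letters A-Z and a-z. */
    static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    /** Counts words by their first character, case-sensitively. */
    static Map<Character, Integer> countStartLetters(String text) {
        Map<Character, Integer> counts = new HashMap<>();
        // Split on runs of whitespace; trim() avoids an empty leading token.
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens when the input is blank
            }
            char first = word.charAt(0);
            if (isAsciiLetter(first)) {
                // 'A' and 'a' are separate keys, so the result is case-sensitive.
                counts.merge(first, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

For the input `"Apple apple 3rd Banana avocado"`, this counts A once, a twice, and B once, and skips `3rd` because it does not start with a letter. In the MapReduce job, the split-and-filter step would correspond to the mapper and the summation to the reducer.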

Components

Mapper: generates <K, V> pairs
src/WordCountMapper.java
Partitioner: assigns each key to a specific reducer
src/WordCountPartitioner.java
Key Comparator: comparison function used for sorting keys
src/WordCountKeyComparator.java
Return a negative value (e.g. -1) if the first key should sort before the second
Return zero if the two keys are equal
Return a positive value (e.g. 1) if the first key should sort after the second
Reducer: aggregates the values of each key and outputs the final result
src/WordCountReducer.java
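The routing and ordering rules behind the partitioner and key comparator can be sketched as two plain-Java functions. This is a hedged illustration rather than the contents of the files above: `StartLetterRules`, `partitionFor`, and `compareLetters` are hypothetical names, and the sketch assumes each key is a single ASCII letter. In a Hadoop job, logic like this would typically be called from `Partitioner.getPartition` and `WritableComparator.compare`.

```java
/** Plain-Java sketch of the routing and ordering rules for single-letter keys. */
final class StartLetterRules {
    private StartLetterRules() {}

    /** Reducer 0 takes A-G and a-g; reducer 1 takes every other letter. */
    static int partitionFor(char letter) {
        char lower = Character.toLowerCase(letter);
        return (lower >= 'a' && lower <= 'g') ? 0 : 1;
    }

    /** Orders letters as A, a, B, b, ..., Z, z. */
    static int compareLetters(char x, char y) {
        // First compare ignoring case, so that A and a both come before B and b.
        int byLetter = Character.compare(Character.toLowerCase(x), Character.toLowerCase(y));
        if (byLetter != 0) {
            return byLetter;
        }
        // Same letter: uppercase has the smaller code point, so it sorts first.
        return Character.compare(x, y);
    }
}
```

Comparing case-insensitively first and breaking ties by code point is one simple way to get the A → a → B → b order; the default character order would instead place all uppercase letters before all lowercase ones.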

Execution (in WordCount/)

make clean;

make

sh execute.sh

Note: the input path needs to be modified before running
lab5-judge-wordcount
