adelavina/BookBuilder

BookBuilder with MapReduce

This application aims to solve Kata 14, "Tom Swift Under the Milkwood", described here: https://codekata.com/kata/kata14-tom-swift-under-the-milkwood/

The problem is split into two parts, which are solved separately and then chained together sequentially.

The first problem: BookReading

The first problem requires the application to read through a large amount of text in order to build the trigram map described in the kata. The goal is to obtain the list of trigrams used by the sampled books and merge them into a consistent data structure. As long as this structure is sorted, the solution is deterministic: for any given set of books, the application may run N times and produce an equivalent map every time.
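The determinism argument can be illustrated with a minimal sketch. The class and method names (`TrigramMapBuilder`, `build`, `tokenize`) are hypothetical, and splitting on whitespace while ignoring punctuation is a simplifying assumption. The map is keyed by the space-joined BiGram, and both the keys (via `TreeMap`) and each follower list are sorted, so the result depends neither on hashing nor on the order in which the books are read.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Hypothetical helper: builds a sorted trigram map from raw book text.
class TrigramMapBuilder {

    // Splits text into words on whitespace; punctuation handling is left out of this sketch.
    static String[] tokenize(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Maps each "w1 w2" BiGram key to the list of words seen right after it.
    static TreeMap<String, List<String>> build(List<String> books) {
        // TreeMap keeps keys sorted, so iteration order never depends on hashing.
        TreeMap<String, List<String>> map = new TreeMap<>();
        for (String book : books) {
            String[] words = tokenize(book);
            for (int i = 0; i + 2 < words.length; i++) {
                String bigram = words[i] + " " + words[i + 1];
                map.computeIfAbsent(bigram, k -> new ArrayList<>()).add(words[i + 2]);
            }
        }
        // Sorting each follower list makes the result independent of book order.
        for (List<String> followers : map.values()) {
            Collections.sort(followers);
        }
        return map;
    }
}
```

With the kata's sample phrase "I wish I may I wish I might", this sketch maps "wish I" to [may, might] and "I wish" to [I, I], with keys iterated as "I may", "I wish", "may I", "wish I".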

The first problem is solved with a MapReduce implementation. The Mapping class reads each word of every book and emits a `<key, value>` line, where each key is a BiGram and each value is the third word that completes the TriGram. For each key, the Reducer instances emit a `<key, values>` row, where the values are the concatenation of every value provided by the Map implementation.
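The data flow between the Mapping and Reducer steps can be simulated in memory. This is a hypothetical plain-Java sketch, not the repository's classes and not any MapReduce framework's API: `map` emits one (BiGram, third word) pair per trigram, `shuffle` groups the pairs by key in sorted order (standing in for the framework's sort step), and `reduce` concatenates the followers into one row. The tab and comma separators in the output rows are assumptions.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the Mapping and Reducer classes.
class TrigramMapReduce {

    // Map phase: for one book, emit a (BiGram, third word) pair for every trigram.
    static List<Map.Entry<String, String>> map(String book) {
        String[] words = book.trim().split("\\s+");
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (int i = 0; i + 2 < words.length; i++) {
            out.add(new SimpleEntry<>(words[i] + " " + words[i + 1], words[i + 2]));
        }
        return out;
    }

    // Shuffle phase: group emitted values by key; TreeMap keeps the keys sorted.
    static TreeMap<String, List<String>> shuffle(List<Map.Entry<String, String>> pairs) {
        TreeMap<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: concatenate every follower of one BiGram into a single row.
    static String reduce(String bigram, List<String> followers) {
        return bigram + "\t" + String.join(",", followers);
    }

    // Runs map over every book, then shuffle, then reduce once per key.
    static List<String> run(List<String> books) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (String book : books) {
            pairs.addAll(map(book));
        }
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : shuffle(pairs).entrySet()) {
            rows.add(reduce(e.getKey(), e.getValue()));
        }
        return rows;
    }
}
```

For the kata's sample phrase, `run` returns four rows, one of which is "wish I\tmay,might". In a distributed framework the order of values within a key is generally not guaranteed, so the reducer would also need to sort the followers to keep the map deterministic as described above.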

The second problem: BookWriting

The second problem consists of writing text structures (sentences and paragraphs) starting from a randomly chosen BiGram seed and, at each iteration, picking a word that can follow the current BiGram. Every time a BiGram is produced, its TriGram is completed pseudo-randomly with one of the possible following words. This problem is NOT deterministic: for any given TriGram map, the output is one of N possible texts, selected at random.
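A single generation step can be sketched as two small helpers: one picks a random BiGram seed, the other completes a TriGram with a pseudo-randomly chosen follower. The names `TrigramStep`, `randomSeed` and `nextWord`, and the space-joined key format, are hypothetical and not taken from the repository.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical helpers illustrating one generation step.
class TrigramStep {

    // Picks a random BiGram key and splits it into its two words.
    // Assumes the map is non-empty (Random.nextInt(0) would throw).
    static String[] randomSeed(Map<String, List<String>> trigrams, Random rng) {
        List<String> keys = new ArrayList<>(trigrams.keySet());
        String key = keys.get(rng.nextInt(keys.size()));
        return key.split(" ", 2);
    }

    // Completes the TriGram (w1, w2, ?) with a pseudo-randomly chosen follower,
    // or returns null when the BiGram has no recorded followers.
    static String nextWord(Map<String, List<String>> trigrams, String w1, String w2, Random rng) {
        List<String> followers = trigrams.get(w1 + " " + w2);
        if (followers == null || followers.isEmpty()) {
            return null;
        }
        return followers.get(rng.nextInt(followers.size()));
    }
}
```

After a word is chosen, the next BiGram is formed from the second seed word and the chosen word. Passing a `java.util.Random` built with a fixed seed makes a run reproducible, which can help when testing an otherwise non-deterministic generator.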

The second problem is solved with an in-memory HashMap<Bigram,List> that lists, for each combination of two words, the words that could follow it. A random starting point is taken from the map, and each iteration of the BuildSentence method picks one of the words available in the HashMap's list for the current BiGram. Sentences and paragraphs end when the HashMap offers no further way to continue the TriGram chain, or when the maximum number of words per sentence, sentences per paragraph, or paragraphs per book is reached.
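The sentence loop and its two stopping conditions (no continuation in the map, or the word limit reached) can be sketched as follows. This is a hypothetical `buildSentence` with an assumed `maxWords` parameter, not the repository's BuildSentence method; the paragraph and book limits would wrap it in analogous outer loops.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of a sentence builder over a BiGram -> followers map.
class SentenceBuilder {

    // Starts from the seed (w1, w2) and keeps completing TriGrams until a dead end
    // or until the sentence holds maxWords words (the two seed words always appear).
    static List<String> buildSentence(Map<String, List<String>> trigrams,
                                      String w1, String w2, int maxWords, Random rng) {
        List<String> sentence = new ArrayList<>(List.of(w1, w2));
        String first = w1;
        String second = w2;
        while (sentence.size() < maxWords) {
            List<String> followers = trigrams.get(first + " " + second);
            if (followers == null || followers.isEmpty()) {
                break; // dead end: no TriGram continues this BiGram
            }
            String next = followers.get(rng.nextInt(followers.size()));
            sentence.add(next);
            // Slide the BiGram window one word to the right.
            first = second;
            second = next;
        }
        return sentence;
    }
}
```

With a map containing only "a b" -> [c] and "b c" -> [d], starting from ("a", "b") yields [a, b, c, d] when the limit is generous and [a, b, c] when `maxWords` is 3.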

The implementation

The application lets the user scan the books and generate the in-file map only once, and then run the solution to the second problem N times to generate potentially N different books. This avoids repeating the MapReduce step when it is not needed.
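Writing the map to a file once and reloading it for every generation run could look like the sketch below. The file format (one "w1 w2", tab, comma-separated followers row per BiGram) and the `TrigramFile` name are assumptions rather than the repository's actual format; only the standard `java.nio.file.Files` methods `write` and `readAllLines` are used.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical persistence helper; the row format is an assumption.
class TrigramFile {

    // Writes one "w1 w2<TAB>f1,f2,..." row per BiGram.
    static void save(Map<String, List<String>> trigrams, Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : trigrams.entrySet()) {
            lines.add(e.getKey() + "\t" + String.join(",", e.getValue()));
        }
        Files.write(file, lines);
    }

    // Reads the rows back into a sorted map, skipping rows without a tab.
    static TreeMap<String, List<String>> load(Path file) throws IOException {
        TreeMap<String, List<String>> trigrams = new TreeMap<>();
        for (String line : Files.readAllLines(file)) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // malformed row
            }
            String key = line.substring(0, tab);
            List<String> followers = new ArrayList<>(Arrays.asList(line.substring(tab + 1).split(",")));
            trigrams.put(key, followers);
        }
        return trigrams;
    }
}
```

This simple format breaks if a word itself contains a tab or a comma, so a real implementation would need to escape or exclude such characters during tokenization.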
