mjalajel/luigi-mapreduce

luigi-mapreduce

An attempt to create real MapReduce with luigi

Problem

Create a MapReduce job that finds the maximum value of the first field in a large CSV file.

Components

  • 1x mapper:
    • input: 0.large-file.txt
    • output: 1.part-00.txt … part-09.txt
  • 10x transformer:
    • input: part-xx.txt
    • output: 2.transformed-xx.txt
  • 10x reducer:
    • input: transformed-xx.txt
    • output: 3.solved-xx.txt
  • 1x collector:
    • input: solved-xx.txt
    • output: 4.solution.txt
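
The four stages above can be sketched in plain Python, without luigi, to make the data flow concrete. This is only an illustration under stated assumptions, not the repository's code: the function names, the round-robin split, and the assumptions that the first field is an integer and that the file has no header row are all hypothetical. The file names follow the component list.

```python
import csv
from pathlib import Path

N_PARTS = 10  # one part per transformer/reducer, as in the component list


def mapper(src: Path, workdir: Path) -> list:
    """Split the large file into N_PARTS part files (round-robin: an assumption)."""
    parts = [workdir / f"1.part-{i:02d}.txt" for i in range(N_PARTS)]
    handles = [p.open("w") for p in parts]
    try:
        with src.open() as f:
            for n, line in enumerate(f):
                handles[n % N_PARTS].write(line)
    finally:
        for h in handles:
            h.close()
    return parts


def transformer(part: Path, workdir: Path) -> Path:
    """Keep only the first CSV field of every row (assumed to be an integer)."""
    out = workdir / part.name.replace("1.part-", "2.transformed-")
    with part.open(newline="") as f, out.open("w") as g:
        for row in csv.reader(f):
            if row:
                g.write(f"{int(row[0])}\n")
    return out


def reducer(transformed: Path, workdir: Path) -> Path:
    """Write the maximum of one transformed part; an empty part yields an empty file."""
    out = workdir / transformed.name.replace("2.transformed-", "3.solved-")
    with transformed.open() as f:
        values = [int(line) for line in f if line.strip()]
    out.write_text(f"{max(values)}\n" if values else "")
    return out


def collector(solved: list, workdir: Path) -> Path:
    """Combine the per-part maxima into the final answer."""
    out = workdir / "4.solution.txt"
    texts = [p.read_text().strip() for p in solved]
    maxima = [int(t) for t in texts if t]
    out.write_text(f"{max(maxima)}\n")  # raises ValueError if the input was empty
    return out


def run_pipeline(src: Path, workdir: Path) -> int:
    """Run mapper -> transformers -> reducers -> collector sequentially."""
    parts = mapper(src, workdir)
    transformed = [transformer(p, workdir) for p in parts]
    solved = [reducer(t, workdir) for t in transformed]
    return int(collector(solved, workdir).read_text())
```

Calling `run_pipeline(Path("0.large-file.txt"), Path("."))` would write the intermediate files next to the input and return the maximum. Everything runs sequentially here; the challenges below are about removing that restriction.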

Challenges:

  1. All transformers and reducers should run in parallel [resolved in 3.luigi-mapreduce]
  2. Transformers and reducers should start working before the mapper finishes [resolved with a hack in 4.luigi-mapreduce-smooth]
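
To illustrate what the two challenges ask for, here is a minimal standard-library sketch in which ten workers run concurrently and begin consuming lines while the mapper is still distributing them. It is not the approach taken in 3.luigi-mapreduce or 4.luigi-mapreduce-smooth, and it does not use luigi at all. The names `worker`, `streaming_max`, and `N_WORKERS` are hypothetical, and the first field is again assumed to be an integer.

```python
import queue
import threading

N_WORKERS = 10      # mirrors the 10 transformers/reducers
_DONE = object()    # sentinel telling a worker that the mapper has finished


def worker(inbox: queue.Queue, results: list, slot: int) -> None:
    """Transform and reduce one partition incrementally, as lines arrive."""
    best = None
    while True:
        line = inbox.get()
        if line is _DONE:
            break
        value = int(line.split(",", 1)[0])  # transformer step: first field
        if best is None or value > best:    # reducer step: running maximum
            best = value
    results[slot] = best


def streaming_max(lines) -> int:
    """Feed lines to the workers round-robin, then collect their maxima."""
    inboxes = [queue.Queue(maxsize=1000) for _ in range(N_WORKERS)]
    results = [None] * N_WORKERS
    threads = [
        threading.Thread(target=worker, args=(inboxes[i], results, i))
        for i in range(N_WORKERS)
    ]
    for t in threads:
        t.start()  # workers are waiting before the mapper emits anything

    # Mapper: workers start processing while this loop is still running.
    for n, line in enumerate(lines):
        inboxes[n % N_WORKERS].put(line)
    for q in inboxes:
        q.put(_DONE)

    for t in threads:
        t.join()
    # Collector: combine the partial maxima (ValueError on empty input).
    return max(r for r in results if r is not None)
```

For example, `streaming_max(open("0.large-file.txt"))` would stream the file through the workers without writing intermediate files. In CPython, threads mainly illustrate the scheduling here; CPU-bound work would need processes to gain real parallelism.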
