Group 10 Project, Fall 2020, CS 6240: Large-Scale Parallel Data Processing, Khoury College of Computer Sciences, Northeastern University
Updated Jul 12, 2021 - Scala
Parsing the Common Crawl database using Scala and Spark.
Spark code used for my master's thesis. Runs on AWS EMR clusters.
Half-baked implementation of a cluster manager for EMR.
Hadoop MapReduce
Hadoop MapReduce Programs using Scala to process log files.
Offline Elasticsearch index generator
Infrastructure: The projects herein simplify the repeated use of a variety of frameworks, cloud services, and platforms.
Link prediction is the task of predicting future connections in a graph. In this project, it means predicting whether two authors will collaborate on a future paper, given a graph of authors who have co-authored at least one paper together.
A boilerplate for Spark projects, with Docker support for local development and scripts for running on EMR.
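To make the link prediction description above more concrete, here is a minimal sketch in plain Scala of two common neighborhood-based scores (common neighbors and Jaccard similarity) on a co-authorship graph. It is not taken from the listed repository, which may use a different method entirely. The object name `LinkPredictionSketch`, the helper names, and the author data are all hypothetical.

```scala
// Illustrative sketch only: names and data are hypothetical, not from the project.
object LinkPredictionSketch {
  // Undirected co-authorship graph: author -> set of co-authors.
  type Graph = Map[String, Set[String]]

  // Build an undirected adjacency map from co-authorship pairs.
  def buildGraph(edges: Seq[(String, String)]): Graph =
    edges
      .flatMap { case (a, b) => Seq(a -> b, b -> a) }
      .groupBy(_._1)
      .map { case (author, pairs) => author -> pairs.map(_._2).toSet }

  // Common-neighbors score: how many co-authors the two authors share.
  def commonNeighbors(g: Graph, a: String, b: String): Int = {
    val na = g.getOrElse(a, Set.empty[String])
    val nb = g.getOrElse(b, Set.empty[String])
    (na intersect nb).size
  }

  // Jaccard score: shared co-authors divided by all co-authors of either author.
  def jaccard(g: Graph, a: String, b: String): Double = {
    val na = g.getOrElse(a, Set.empty[String])
    val nb = g.getOrElse(b, Set.empty[String])
    val unionSize = (na union nb).size
    if (unionSize == 0) 0.0 else (na intersect nb).size.toDouble / unionSize
  }

  def main(args: Array[String]): Unit = {
    // Toy data: alice and dave have never co-authored, but share two co-authors.
    val edges = Seq(
      "alice" -> "bob",
      "alice" -> "carol",
      "bob"   -> "dave",
      "carol" -> "dave"
    )
    val g = buildGraph(edges)
    println(s"common neighbors (alice, dave): ${commonNeighbors(g, "alice", "dave")}")
    println(s"jaccard (alice, dave): ${jaccard(g, "alice", "dave")}")
  }
}
```

A higher score suggests that the pair is more likely to collaborate in the future. On a large graph the same scores could be computed with Spark, but that is beyond this sketch.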