Spark: Work with Big Data and Build Machine Learning Models at Scale

tweichle/Spark-for-Big-Data

Spark-for-Big-Data

Udacity Course

This repository demonstrates how to use Spark to work with big data and build machine learning models at scale.

Goals

  • Practice processing and cleaning datasets to get comfortable with Spark’s SQL and DataFrame APIs (Spark SQL, PySpark).
  • Debug and optimize for data skew when running on a cluster.
  • Use Spark’s Machine Learning Library (MLlib) to train machine learning models at scale.
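
To make the data-skew goal above concrete, here is a minimal sketch of key salting, a common way to spread one "hot" key across several partitions. It is plain Python rather than Spark code: `partition_of` is a hypothetical stand-in for a hash partitioner, and the dataset, the partition count, and the salt count are all assumptions chosen for illustration. In Spark the same idea is usually applied by adding a salt column before a `groupBy` or join, then aggregating a second time without the salt.

```python
import zlib
from collections import Counter


def partition_of(key: str, num_partitions: int) -> int:
    """Hypothetical stand-in for a hash partitioner (deterministic CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys, num_partitions):
    """Count how many records land in each partition."""
    sizes = Counter(partition_of(k, num_partitions) for k in keys)
    return [sizes.get(p, 0) for p in range(num_partitions)]


def salt_keys(keys, num_salts):
    """Append a round-robin salt suffix so one key becomes num_salts keys."""
    return [f"{k}#{i % num_salts}" for i, k in enumerate(keys)]


def count_by_key_salted(keys, num_salts):
    """Two-stage count: partial counts per salted key, then merge by original key."""
    partial = Counter(salt_keys(keys, num_salts))  # stage 1: per salted key
    totals = Counter()
    for salted, n in partial.items():              # stage 2: strip the salt and merge
        original, _, _ = salted.rpartition("#")
        totals[original] += n
    return dict(totals)


# Assumed skewed dataset: one hot key holds 90% of the records.
keys = ["user_hot"] * 9000 + [f"user_{i}" for i in range(1000)]
NUM_PARTITIONS = 8
NUM_SALTS = 8

before = partition_sizes(keys, NUM_PARTITIONS)
after = partition_sizes(salt_keys(keys, NUM_SALTS), NUM_PARTITIONS)

print("largest partition without salting:", max(before))
print("largest partition with salting:   ", max(after))
```

Without salting, every record for the hot key hashes to the same partition, so one task does most of the work. Salting trades that imbalance for a second, much smaller aggregation step that merges the partial results.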