Notes on Apache Spark (pyspark)
Updated Mar 3, 2019
Apache Spark is an open source distributed general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Apache Spark™ and Scala Workshops
Toolkit for Apache Spark ML: feature clean-up, feature-importance calculation, information-gain selection, distributed SMOTE, model selection and training, hyperparameter optimization, and model interpretability.
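The SMOTE technique named above synthesizes new minority-class samples by interpolating between a point and one of its nearest neighbours. A single-machine sketch of that core step (the toolkit distributes it; this function and its data are hypothetical):

```python
import random

def smote_sample(minority, k=2, n_new=3, seed=0):
    """Sketch of SMOTE's interpolation step on a list of coordinate tuples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest neighbours of x by squared Euclidean distance, excluding x
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new = smote_sample(pts)
```

Each synthetic point lies on the segment between two real minority points, so it stays inside the minority region rather than duplicating existing samples.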
Companion to the Learning Hadoop and Learning Spark courses on LinkedIn Learning
Scalable Data Science: course sets in big data using Apache Spark on Databricks, with their mathematical, statistical, and computational foundations in SageMath.
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
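The MapReduce pattern taught in courses like the one above can be sketched in plain Python, no Hadoop or Spark required; the three phases and the sample documents here are purely illustrative:

```python
from collections import defaultdict
from itertools import chain

docs = ["big data modeling", "map reduce and spark", "big data with pyspark"]

# Map phase: emit (word, 1) pairs from each document.
mapped = chain.from_iterable(((w, 1) for w in d.split()) for d in docs)

# Shuffle phase: group values by key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce phase: sum the counts per word.
counts = {word: sum(vals) for word, vals in groups.items()}
```

In a real cluster the map and reduce phases run on many workers and the shuffle moves keyed data between them; the per-key grouping shown here is exactly what the shuffle guarantees.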
Powerful, rapid automatic EDA and feature-engineering library with a very easy-to-use API 🌟
Big Data Workshop (in Spanish)
MLFlow End to End Workshop at Chandigarh University
Rails application for the Archives Unleashed Cloud.
Dockerizing and Consuming an Apache Livy environment
Example applications of spark-trend-calculus
Serene Data Integration Platform
NiFi, Data Engineering, Data Ingest, REST, ETL, Mapping, ELT, SQL, Spark, Kafka for Good
Example applications of GDELT mass media intelligence data
UC Davis Distributed Computing with Spark SQL (with Databricks) and Databricks Apache Spark SQL for Data Analysts
Github blog about AI and Big Data
Time series forecasting using Prophet and Apache Spark
Infant Mortality Data Prediction and Analysis
Exploratory analysis of an Amazon product-reviews dataset comprising various categories spanning 14 years
Created by Matei Zaharia
Released May 26, 2014