A code-based tutorial for production-level data streaming with PySpark plus Optimus for data cleaning, Confluent Kafka, & Apache Drill, run with Docker, with Cassandra (NoSQL DB) for storage. This setup allows for fast feature engineering and data cleaning.

daddydrac/PySpark-Confluent-Kafka-Apache-Drill-

PySpark + Optimus, Confluent Kafka, Apache Drill, Cassandra/NoSQL + Docker code example

A code-based tutorial on setting up production-grade data streams with PySpark, Optimus, Confluent Kafka, & Drill using Docker, with Cassandra (NoSQL) as storage.

(See the code and the README.md files in the nested folders.)
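The Kafka-to-Spark-to-Cassandra flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's code: the broker address, topic, keyspace, table, message schema, and checkpoint path are all hypothetical, and plain PySpark column functions stand in for the Optimus cleaning step. Running `start_stream` assumes the Spark Kafka source (`org.apache.spark:spark-sql-kafka-0-10_2.12`) and the DataStax Spark Cassandra Connector (`com.datastax.spark:spark-cassandra-connector_2.12`) are on the classpath, and that `spark.cassandra.connection.host` is set on the session.

```python
"""Minimal sketch of a Kafka -> PySpark -> Cassandra stream.

Assumptions (not taken from the repository): the broker address, topic,
keyspace, table, column names, and checkpoint path are hypothetical.
Plain PySpark functions stand in for the Optimus cleaning step.
"""


def kafka_source_options(bootstrap_servers, topic, starting_offsets="latest"):
    # Options understood by Spark's built-in Kafka source (format "kafka").
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": starting_offsets,
    }


def cassandra_sink_options(keyspace, table):
    # Options understood by the DataStax Spark Cassandra Connector.
    return {"keyspace": keyspace, "table": table}


def start_stream(spark, bootstrap_servers="localhost:9092", topic="events",
                 keyspace="demo", table="events",
                 checkpoint="/tmp/checkpoints/events"):
    # Imported lazily so the option helpers above work without PySpark.
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType, StructField, StructType

    # Hypothetical message schema: JSON objects with an id and a name.
    schema = StructType([
        StructField("id", StringType()),
        StructField("name", StringType()),
    ])

    raw = (spark.readStream.format("kafka")
           .options(**kafka_source_options(bootstrap_servers, topic))
           .load())

    # Kafka delivers the payload as binary; cast it to a string and parse JSON.
    parsed = (raw.select(F.from_json(F.col("value").cast("string"), schema)
                         .alias("data"))
              .select("data.*"))

    # Simple cleaning: trim and lowercase names, drop rows without an id.
    cleaned = (parsed.withColumn("name", F.lower(F.trim(F.col("name"))))
               .dropna(subset=["id"]))

    def write_batch(batch_df, batch_id):
        # Each micro-batch is appended with the connector's batch writer.
        (batch_df.write.format("org.apache.spark.sql.cassandra")
         .options(**cassandra_sink_options(keyspace, table))
         .mode("append")
         .save())

    return (cleaned.writeStream
            .foreachBatch(write_batch)
            .option("checkpointLocation", checkpoint)
            .start())
```

A possible usage, again with hypothetical host names: build a session with `SparkSession.builder.config("spark.cassandra.connection.host", "cassandra").getOrCreate()`, call `query = start_stream(spark, bootstrap_servers="broker:9092")`, then `query.awaitTermination()`. `foreachBatch` is used here so the sketch depends only on the connector's batch writer rather than on version-specific streaming sink support; the target Cassandra table is assumed to already exist with matching `id` and `name` columns.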