Stars
Data validation library for PySpark 3.0.0
Spark-Radiant is an Apache Spark Performance and Cost Optimizer
Using Scala to create a Spark UDF designed to be callable from PySpark.
List of projects that provide terminal user interfaces
A whitespace formatter for different query languages
Compare tables within or across databases
Automated data quality suggestions and analysis with Deequ on AWS Glue
SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
Example code for running Spark and Hive jobs on EMR Serverless.
Serverless app to track USCIS case status
Open Source Development Platform for building robust type-safe distributed systems with declarative infrastructure
(educational) build your own disk-based KV store
The kubectl plugin which allows us to test IRSA configuration AWS sa
⚡ Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
12 weeks, 26 lessons, 52 quizzes, classic Machine Learning for all
Hey, this is the repo that has all the queries and data for my video game training series!