Stars
Accessing Data with JPA :: Learn how to work with JPA data persistence using Spring Data JPA.
FastStream is a powerful and easy-to-use Python framework for building asynchronous services interacting with event streams such as Apache Kafka, RabbitMQ, NATS and Redis.
❄ A flake8 plugin that helps you to simplify code
Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and Datahub
A proof of concept for how to set up a codebase for an analytics org.
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
A sample project that exists for PyPUG's "Tutorial on Packaging and Distributing Projects"
Python Socket.IO server and client
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
lakeFS - Data version control for your data lake | Git for data
A lightweight opinionated ETL framework, halfway between plain scripts and Apache Airflow
Flenser is a simple, minimal, automated exploratory data analysis tool.
Example code for the book Fluent Python, 1st Edition (O'Reilly, 2015)
Text and supporting code for Think Stats, 2nd Edition
Notebooks produced throughout Udacity's Data Engineering Nanodegree course
Pandas, Polars, and Spark DataFrame comparison for humans and more!
A curated list of awesome big data frameworks, resources and other awesomeness.
Educational notes and hands-on problems with solutions for the Hadoop ecosystem
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (…
Data Cleaning Libraries with Python
Big Data for Data Engineers Coursera Specialization from Yandex
Udacity Data Engineer Nanodegree - Project 5 (Data Pipelines)
Learn Apache Spark in Scala, Python (PySpark) and R (SparkR) by building your own cluster with a JupyterLab interface on Docker. ⚡
Apache Spark - A unified analytics engine for large-scale data processing