Stars of josephmachado (sorted by recently starred)
This project demonstrates an end-to-end solution for processing and analyzing real-time conversation data from a JSON file using GCP services and infrastructure automation, showcasing data storage…
jless is a command-line JSON viewer designed for reading, exploring, and searching through JSON data.
Simple repo to demonstrate how to submit a Spark job to EMR from Airflow
A template repository to create a data project with IaC, CI/CD, data migrations, and testing
Example repo to create end-to-end tests for a data pipeline.
Repo for the CDC with Debezium blog post
Sample repo for startdataengineering DE 101 free course
Code for blog at https://www.startdataengineering.com/post/python-for-de/
Cost Efficient Data Pipelines with DuckDB
Project for "Data pipeline design patterns" blog.
Near-real-time ETL to populate a dashboard.
Simple stream processing pipeline
Code to help generate SQL for stakeholders. Code at https://www.startdataengineering.com/post/data-democratize-llm/
Beginner data engineering project - batch edition
Code for "Efficient Data Processing in Spark" Course
Code for blog at: https://www.startdataengineering.com/post/docker-for-de/
Sample project to demonstrate data engineering best practices
Code to demonstrate data engineering metadata & logging best practices
Possibly the fastest DataFrame-agnostic quality check library in town.
A suite of utilities for converting to and working with CSV, the king of tabular file formats.
A custom end-to-end data engineering pipeline for customer churn
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)