Starred repositories
Simple repo to demonstrate how to submit a Spark job to EMR from Airflow
An attempt to answer the age old interview question "What happens when you type google.com into your browser and press enter?"
The best place to learn data engineering. Built and maintained by the data engineering community.
Python or SQL for data transformation
This project demonstrates an end-to-end solution for processing and analyzing real-time conversation data from a JSON file using GCP services and infrastructure automation, showcasing data storage…
Sample repo for startdataengineering DE 101 free course
Code for blog at https://www.startdataengineering.com/post/python-for-de/
Cost Efficient Data Pipelines with DuckDB
Free full version of the vumingo exam testing engine
All Algorithms implemented in Python
This repo contains all the code used in the Python for Data Engineering Course
Data on Malaysian parliamentary election results + dataviz with the consolidated datasets
Data which, to the best of my knowledge, I am the first / only to collate and make freely available in a machine-readable way. I will delete files for which I discover a better previous source.
open data for blog content at https://www.startdataengineering.com/
josephmachado / soho
Forked from alexandrevicenzi/soho
Minimalist Hugo theme based on Hyde
Simple repo to demonstrate how to submit a Spark job
Simple example showing how to trigger a Spark job with AWS Lambda
Making data pipelines idempotent
Example repo showing how to create end-to-end tests for a data pipeline.
Repository showing how to automate data testing as part of CI
Multi-node Presto cluster in Docker containers
Repo explaining the development and CI/CD cycle in dbt
Near real time ETL to populate a dashboard.