josephmachado
43 results for source starred repositories

Event data simulator. Generates a stream of pseudo-random events from a set of users, designed to simulate web traffic.

Scala 502 130 Updated Mar 16, 2022

Jupyter Notebook 3 1 Updated Oct 11, 2024

Dockerfile 9 1 Updated Dec 11, 2023

Step-by-step instructions to create a production-ready data pipeline

Jupyter Notebook 24 7 Updated Sep 25, 2024

PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster

Python 419 214 Updated Oct 15, 2024

🐍 Quick reference guide to common patterns & functions in PySpark.

443 145 Updated Feb 21, 2023

JupyterLab computational environment.

TypeScript 14,164 3,383 Updated Nov 1, 2024

Simple ETL demonstrated with literate programming

Python 7 Updated Aug 20, 2024

The fastest way to create an HTML app

Jupyter Notebook 5,413 230 Updated Nov 2, 2024

Repository for Data Engineering Interview Series

Jupyter Notebook 21 Updated Oct 17, 2024

Personal how-tos for common DE tasks

Python 4 Updated Aug 23, 2024

Primary repository for NYC DCP's Data Engineering team

Python 22 Updated Nov 2, 2024

A systematic approach to creating better documentation.

HTML 876 162 Updated Aug 30, 2024

Code for my "Efficient Data Processing in SQL" book.

Python 47 17 Updated Aug 6, 2024

Code for the "data quality with Great Expectations" blog post

Python 11 1 Updated Jul 30, 2024

Code for "Advanced data transformations in SQL" free live workshop

Jupyter Notebook 64 19 Updated Oct 22, 2024

This project demonstrates an end-to-end solution for processing and analyzing real-time conversations data from a JSON file using GCP services and infrastructure automation, showcasing data storage…

Python 8 1 Updated Apr 29, 2024

Python or SQL for data transformation

Python 8 Updated Jul 4, 2024

jless is a command-line JSON viewer designed for reading, exploring, and searching through JSON data.

Rust 4,769 91 Updated Sep 7, 2024

Simple repo to demonstrate how to submit a Spark job to EMR from Airflow

Python 32 23 Updated Oct 18, 2020

A template repository to create a data project with IaC, CI/CD, data migrations, and testing

HTML 235 101 Updated Jul 11, 2024

Example repo to create end-to-end tests for a data pipeline.

Python 21 4 Updated Jun 14, 2024

Repo for the "CDC with Debezium" blog post

Python 26 12 Updated Sep 15, 2024

End-to-end data engineering project

Python 49 17 Updated Oct 27, 2022

Sample repo for startdataengineering DE 101 free course

35 23 Updated Jun 24, 2024

Code for blog at https://www.startdataengineering.com/post/python-for-de/

Python 54 60 Updated Jun 7, 2024

Cost Efficient Data Pipelines with DuckDB

C 44 65 Updated Jul 31, 2024

Code for dbt tutorial

142 73 Updated May 31, 2024

Project for "Data pipeline design patterns" blog.

Python 41 6 Updated Aug 6, 2024