
This project demonstrates an end-to-end solution for processing and analyzing real-time conversations data from a JSON file using GCP services and infrastructure automation, showcasing data storage…

Python 5 1 Updated Apr 29, 2024

jless is a command-line JSON viewer designed for reading, exploring, and searching through JSON data.

Rust 4,602 87 Updated Jun 1, 2024

Simple repo to demonstrate how to submit a Spark job to EMR from Airflow

Python 30 23 Updated Oct 18, 2020

A template repository to create a data project with IAC, CI/CD, Data migrations, & testing

HTML 202 88 Updated Jun 14, 2024

Example repo to create end-to-end tests for a data pipeline.

Python 20 3 Updated Jun 14, 2024

Repo for the CDC with Debezium blog post

Python 18 9 Updated Jun 14, 2024

End-to-end data engineering project

Python 48 16 Updated Oct 27, 2022

Sample repo for startdataengineering DE 101 free course

13 5 Updated Jun 24, 2024

Code for blog at https://www.startdataengineering.com/post/python-for-de/

Python 27 20 Updated Jun 7, 2024

Cost Efficient Data Pipelines with DuckDB

C 32 22 Updated Jun 14, 2024

Code for dbt tutorial

127 65 Updated May 31, 2024

Project for "Data pipeline design patterns" blog.

Python 40 5 Updated Feb 18, 2023

Near-real-time ETL to populate a dashboard.

Python 63 27 Updated Jun 17, 2024

Simple stream processing pipeline

Python 80 23 Updated Jun 17, 2024

Code to help generate SQL for stakeholders. Code at https://www.startdataengineering.com/post/data-democratize-llm/

Python 9 1 Updated May 24, 2024

Beginner data engineering project - batch edition

HTML 420 102 Updated Jun 25, 2024

Code for "Efficient Data Processing in Spark" Course

Python 146 31 Updated May 29, 2024

Code for blog at: https://www.startdataengineering.com/post/docker-for-de/

C 23 9 Updated Apr 29, 2024
Python 15 Updated Apr 26, 2024

Sample project to demonstrate data engineering best practices

Python 142 21 Updated Feb 24, 2024

Code to demonstrate data engineering metadata & logging best practices

Python 14 3 Updated Mar 12, 2024

Possibly the fastest DataFrame-agnostic quality check library in town.

Python 127 15 Updated Jul 1, 2024

A suite of utilities for converting to and working with CSV, the king of tabular file formats.

Python 5,896 605 Updated May 22, 2024

A custom end-to-end data engineering pipeline for customer churn

HTML 7 Updated Sep 6, 2023

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Java 9,841 2,853 Updated Jul 2, 2024