Stars

Data Engineering Arsenal

State-of-the-art tools for ingesting, transforming and querying analytical data
33 repositories

Streaming ETL job cases in AWS Glue to integrate Iceberg and creating an in-place updatable data lake on Amazon S3

Python 16 2 Updated Feb 15, 2023

FUSE filesystem over Google Drive

OCaml 5,477 349 Updated Apr 14, 2024

Free Data Engineering course!

Jupyter Notebook 23,832 5,099 Updated Jul 17, 2024

Exercise files for Microsoft Data Engineer curriculum

PowerShell 340 372 Updated Jul 8, 2024

Scrape Facebook public pages without an API key

Python 2,283 616 Updated Jun 22, 2024

Advanced Python library to scrape Twitter (tweets, users) from an unofficial API

Python 578 67 Updated Jul 25, 2023

Twitter for Python!

Python 10,380 4,606 Updated May 14, 2024

The Metadata Platform for your Data Stack

Java 9,483 2,811 Updated Jul 20, 2024

🎭 Playwright integration for Scrapy

Python 929 101 Updated Jul 18, 2024

This is a repo with links to everything you'd ever want to learn about data engineering

9,877 1,330 Updated Jul 10, 2024

Workflow Engine for Kubernetes

Go 14,616 3,129 Updated Jul 19, 2024

A guide for technical professionals looking to start consulting

1,218 140 Updated Jun 11, 2024

Apache Flink

Java 23,550 13,120 Updated Jul 19, 2024

Apache Spark - A unified analytics engine for large-scale data processing

Scala 38,973 28,107 Updated Jul 20, 2024

🐍 Quick reference guide to common patterns & functions in PySpark.

380 127 Updated Feb 21, 2023

Simple drawings illustrating the main concepts of Microsoft Fabric to empower anyone to build stuff on Fabric.

HTML 79 17 Updated Jun 27, 2024

This repo has all the resources you need to become an amazing analytics engineer!

46 6 Updated Mar 23, 2024

Convert PDF to markdown quickly with high accuracy

Python 14,546 755 Updated Jul 18, 2024

Transform datasets at scale. Optimize datasets for fast AI model training.

Python 261 25 Updated Jul 19, 2024

Real-time monitor and web admin for Celery distributed task queue

Python 6,296 1,075 Updated Jul 9, 2024

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

Python 14,908 3,829 Updated Jul 20, 2024

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing

C++ 13,958 3,406 Updated Jul 19, 2024

Always know what to expect from your data.

Python 9,691 1,501 Updated Jul 19, 2024

Apache Superset is a Data Visualization and Data Exploration Platform

TypeScript 60,766 13,149 Updated Jul 19, 2024

Open, Multi-modal Catalog for Data & AI

Java 1,975 281 Updated Jul 19, 2024

A Python library to support running data quality rules while the Spark job is running ⚡

Python 155 33 Updated Jul 16, 2024

DuckDB is an analytical in-process SQL database management system

C++ 20,993 1,694 Updated Jul 19, 2024

Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testing.

Dockerfile 50 8 Updated Sep 2, 2023

This repository contains all content and code for Astro and Astronomer Software documentation.

Python 53 64 Updated Jul 19, 2024

Open-source vector similarity search for Postgres

C 10,648 482 Updated Jul 19, 2024