- Mungana AI
- Pretoria
- https://www.linkedin.com/in/ndamulelonemakhavhani/
- @NdamuleloNemakh
- https://credly.com/users/ndamulelo-nemakhavhani
Data Engineering Arsenal
Streaming ETL job cases in AWS Glue for integrating Iceberg and creating an in-place updatable data lake on Amazon S3
FUSE filesystem over Google Drive
Free Data Engineering course!
Exercise files for the Microsoft Data Engineer curriculum
Scrape Facebook public pages without an API key
Advanced Python library to scrape Twitter (tweets, users) from an unofficial API
The Metadata Platform for your Data Stack
🎭 Playwright integration for Scrapy
This is a repo with links to everything you'd ever want to learn about data engineering
A guide for technical professionals looking to start consulting
Apache Spark - A unified analytics engine for large-scale data processing
🐍 Quick reference guide to common patterns & functions in PySpark.
Simple drawings illustrating the main concepts of Microsoft Fabric to empower anyone to build stuff on Fabric.
This repo has all the resources you need to become an amazing analytics engineer!
Convert PDF to markdown quickly with high accuracy
Transform datasets at scale. Optimize datasets for fast AI model training.
Real-time monitor and web admin for Celery distributed task queue
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and cloud-hosted.
Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
Always know what to expect from your data.
Apache Superset is a Data Visualization and Data Exploration Platform
Open, Multi-modal Catalog for Data & AI
A Python library to support running data quality rules while the Spark job is running ⚡
DuckDB is an analytical in-process SQL database management system
Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testing.
This repository contains all content and code for Astro and Astronomer Software documentation.
Open-source vector similarity search for Postgres