Ndamulelo Nemakhavhani ndamulelonemakh

Data technologist in ML & NLP | Azure Cloud Engineer 🏆 | Indigenous language advocate | Knows a thing or two about LLMs🚀

Mungana AI
Pretoria
https://www.linkedin.com/in/ndamulelonemakhavhani/
@NdamuleloNemakh
https://credly.com/users/ndamulelo-nemakhavhani

Data Engineering Arsenal

State-of-the-art tools for ingesting, transforming, and querying analytical data

33 repositories

aws-samples / aws-glue-streaming-etl-with-apache-iceberg

Streaming ETL job cases in AWS Glue that integrate Iceberg and create an in-place updatable data lake on Amazon S3

Python · 16 stars · 2 forks · Updated Feb 15, 2023

astrada / google-drive-ocamlfuse

FUSE filesystem over Google Drive

OCaml · 5,477 stars · 349 forks · Updated Apr 14, 2024

DataTalksClub / data-engineering-zoomcamp

Free Data Engineering course!

Jupyter Notebook · 23,832 stars · 5,099 forks · Updated Jul 17, 2024

MicrosoftLearning / dp-203-azure-data-engineer

Exercise files for Microsoft Data Engineer curriculum

PowerShell · 340 stars · 372 forks · Updated Jul 8, 2024

kevinzg / facebook-scraper

Scrape Facebook public pages without an API key

Python · 2,283 stars · 616 forks · Updated Jun 22, 2024

markowanga / stweet

Advanced Python library to scrape Twitter (tweets, users) from the unofficial API

Python · 578 stars · 67 forks · Updated Jul 25, 2023

tweepy / tweepy

Twitter for Python!

Python · 10,380 stars · 4,606 forks · Updated May 14, 2024

datahub-project / datahub

The Metadata Platform for your Data Stack

Java · 9,483 stars · 2,811 forks · Updated Jul 20, 2024

scrapy-plugins / scrapy-playwright

🎭 Playwright integration for Scrapy

Python · 929 stars · 101 forks · Updated Jul 18, 2024

DataExpert-io / data-engineer-handbook

This is a repo with links to everything you'd ever want to learn about data engineering

9,877 stars · 1,330 forks · Updated Jul 10, 2024

argoproj / argo-workflows

Workflow Engine for Kubernetes

Go · 14,616 stars · 3,129 forks · Updated Jul 19, 2024

sdg-1 / consulting-handbook

A guide for technical professionals looking to start consulting

1,218 stars · 140 forks · Updated Jun 11, 2024

apache / flink

Apache Flink

Java · 23,550 stars · 13,120 forks · Updated Jul 19, 2024

apache / spark

Apache Spark - A unified analytics engine for large-scale data processing

Scala · 38,973 stars · 28,107 forks · Updated Jul 20, 2024

kevinschaich / pyspark-cheatsheet

🐍 Quick reference guide to common patterns & functions in PySpark.

380 stars · 127 forks · Updated Feb 21, 2023

microsoft / fabricnotes

Simple drawings illustrating the main concepts of Microsoft Fabric to empower anyone to build stuff on Fabric.

HTML · 79 stars · 17 forks · Updated Jun 27, 2024

DataExpert-io / analytics-engineer-handbook

This repo has all the resources you need to become an amazing analytics engineer!

46 stars · 6 forks · Updated Mar 23, 2024

VikParuchuri / marker

Convert PDF to markdown quickly with high accuracy

Python · 14,546 stars · 755 forks · Updated Jul 18, 2024

Lightning-AI / litdata

Transform datasets at scale. Optimize datasets for fast AI model training.

Python · 261 stars · 25 forks · Updated Jul 19, 2024

mher / flower

Real-time monitor and web admin for Celery distributed task queue

Python · 6,296 stars · 1,075 forks · Updated Jul 9, 2024

airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

Python · 14,908 stars · 3,829 forks · Updated Jul 20, 2024

apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing

C++ · 13,958 stars · 3,406 forks · Updated Jul 19, 2024

great-expectations / great_expectations

Always know what to expect from your data.

Python · 9,691 stars · 1,501 forks · Updated Jul 19, 2024

apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform

TypeScript · 60,766 stars · 13,149 forks · Updated Jul 19, 2024

unitycatalog / unitycatalog

Open, Multi-modal Catalog for Data & AI

Java · 1,975 stars · 281 forks · Updated Jul 19, 2024

Nike-Inc / spark-expectations

A Python library to support running data quality rules while the Spark job is running ⚡

Python · 155 stars · 33 forks · Updated Jul 16, 2024

duckdb / duckdb

DuckDB is an analytical in-process SQL database management system

C++ · 20,993 stars · 1,694 forks · Updated Jul 19, 2024

dominikhei / Local-Data-LakeHouse

Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testing.

Dockerfile · 50 stars · 8 forks · Updated Sep 2, 2023

astronomer / docs

This repository contains all content and code for Astro and Astronomer Software documentation.

Python · 53 stars · 64 forks · Updated Jul 19, 2024

pgvector / pgvector

Open-source vector similarity search for Postgres

C · 10,648 stars · 482 forks · Updated Jul 19, 2024