Lead Data Engineer
- New York
- https://soumilshah.com/
- in/shah-soumil
- channel/UC_eOodxvwS_H7x2uLQa-svw
- https://soumilshah1995.blogspot.com
Stars
The Metadata Platform for your Data Stack
Open, Multi-modal Catalog for Data & AI
Build Analytical Applications on Data Lakehouse with Apache Hudi, Daft & Streamlit
Distributed DataFrame for Python designed for the cloud, powered by Rust
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Hands-on demo for querying Kafka streams using SQL with Trino and data integration with PostgreSQL.
Open Control Plane for Tables in Data Lakehouse
An extremely fast Python linter and code formatter, written in Rust.
A minimal Python package for storing and retrieving text using chunking, embeddings, and vector search.
Sling is a CLI tool that extracts data from a source storage/database and loads it in a target storage/database.
Apache Hudi DeltaStreamer labs
Apache Hudi and AWS Glue Docker Compose demo
Local AWS EMR - A local service that imitates AWS EMR
Multi-container environment with Hadoop, Spark and Hive
An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR
Learn how to ingest data from S3 into a transactional data lake with an event-driven approach using Glue, an SQS queue, and a dead-letter queue (DLQ)
Send weekly or daily CSV reports from a Hudi data lake to customers via email using Glue and SNS or SES
Sample CI/CD pipeline for using GitHub Actions with Amazon EMR Serverless Spark.
The official home of the Presto distributed SQL query engine for big data
Chat with your documents on your local device using GPT models. No data leaves your device and 100% private.
An easy-to-use Python utility class for accessing incremental data from Hudi Data Lakes
A scalable nearest neighbor search library in Apache Spark