-
Lead Data Engineer
- New York
- https://soumilshah.com/
- in/shah-soumil
- channel/UC_eOodxvwS_H7x2uLQa-svw
- https://soumilshah1995.blogspot.com
Highlights
- Pro
event-driven-dms-failure-alerts
-
A Simple Config-Driven Python Template for Rapid DMS to S3 Integration | Single Task per Table Strategy
Python GNU General Public License v3.0 Updated Mar 26, 2024 -
emr-serverless-airflow-deltastreamer-jobs
-
DeltaStreamer-Airflow-EMR-Xtable
BSD 3-Clause "New" or "Revised" License Updated Mar 23, 2024 -
-
Simplified Delta Streamer Job Management: A Structured Approach for Efficient Data Processing
Apache License 2.0 Updated Mar 16, 2024 -
xtable-with-emr-serverless Public
xtable-with-emr-serverless
GNU General Public License v3.0 Updated Mar 16, 2024 -
onetable-deltastreamer-glue Public
onetable-deltastreamer-glue
Creative Commons Zero v1.0 Universal Updated Mar 9, 2024 -
glue-dot-interactive-session-template
BSD 2-Clause "Simplified" License Updated Mar 9, 2024 -
openhouse Public
Forked from linkedin/openhouse
Open Control Plane for Tables in Data Lakehouse
Java BSD 2-Clause "Simplified" License Updated Mar 8, 2024 -
onetable-delta-multimodal-index-builder
Python MIT License Updated Mar 6, 2024 -
one-table-with-deltastreamer Public
one-table-with-deltastreamer
Python BSD 2-Clause "Simplified" License Updated Mar 5, 2024 -
aws-hudi-delta-iceberg-interoperability
Jupyter Notebook MIT License Updated Mar 2, 2024 -
ruff Public
Forked from astral-sh/ruff
An extremely fast Python linter and code formatter, written in Rust.
Rust MIT License Updated Feb 28, 2024 -
-
Learn How to Integrate Hudi Spark Job with Airflow and MinIO
-
sling-to-starrocks-demo Public
sling-to-starrocks-demo
Python GNU General Public License v3.0 Updated Feb 16, 2024 -
sqlglot Public
Forked from tobymao/sqlglot
Python SQL Parser and Transpiler
Python MIT License Updated Feb 13, 2024 -
sling-etl-cli-demo Public
sling-etl-cli-demo
-
sling-cli Public
Forked from slingdata-io/sling-cli
Sling is a CLI tool that extracts data from a source storage/database and loads it in a target storage/database.
Go GNU General Public License v3.0 Updated Feb 11, 2024 -
hudi-minio-starrpcks-superset
-
StarRocks-Hudi-Minio Public
StarRocks+Hudi+Minio
-
Apache Hudi Table Services | Hands-on Labs
Python GNU General Public License v3.0 Updated Feb 3, 2024 -
-
hudi-and-glue-locally Public
Forked from Wuerike/hudi-and-glue-locally
Apache Hudi and AWS Glue docker compose demo
Jupyter Notebook Updated Jan 12, 2024 -
vectordb Public
Forked from kagisearch/vectordb
A minimal Python package for storing and retrieving text using chunking, embeddings, and vector search.
Python MIT License Updated Jan 11, 2024 -
Dynamic Hudi Delta Streamer Jobs with JDBC Puller for PostgreSQL Tables, Bringing All Tables into Hudi and Running Jobs in Parallel
-
From Datalake to Microservices: Unleashing the Power of Apache Hudi's Record Level Index with FastAPI and Spark Connect
-
Get Started with Hudi CLI Locally Using Docker in Minutes and Connect to Your S3 Data
Apache License 2.0 Updated Dec 30, 2023 -
HUDI + Spark + DBT + Glue Hive Metastore Run Locally