The Metadata Platform for your Data Stack

Java 9,457 2,806 Updated Jul 12, 2024

Open, Multi-modal Catalog for Data & AI

Java 1,927 272 Updated Jul 11, 2024

DeltaHudiTransformations

2 Updated Mar 20, 2024

Build Analytical Applications on Data Lakehouse with Apache Hudi, Daft & Streamlit

Python 5 1 Updated May 10, 2024

Distributed DataFrame for Python designed for the cloud, powered by Rust

Rust 1,878 119 Updated Jul 12, 2024

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.

Scala 2,012 882 Updated Jul 12, 2024

Hands-on demo for querying Kafka streams using SQL with Trino and data integration with PostgreSQL.

Python 6 1 Updated Dec 3, 2023

Open Control Plane for Tables in Data Lakehouse

Java 272 43 Updated Jul 11, 2024

An extremely fast Python linter and code formatter, written in Rust.

Rust 28,955 940 Updated Jul 12, 2024

A minimal Python package for storing and retrieving text using chunking, embeddings, and vector search.

Python 593 26 Updated Jun 28, 2024

Python SQL Parser and Transpiler

Python 6,031 605 Updated Jul 11, 2024

Sling is a CLI tool that extracts data from a source storage/database and loads it in a target storage/database.

Go 307 16 Updated Jul 2, 2024

Apache Hudi DeltaStreamer labs

Python 3 3 Updated Jun 27, 2024

Apache Hudi and AWS Glue docker compose demo

Jupyter Notebook 3 4 Updated Jan 12, 2024

Local AWS EMR - A local service that imitates AWS EMR

Python 24 10 Updated Jul 5, 2023

Multi-container environment with Hadoop, Spark and Hive

Shell 182 135 Updated Jan 6, 2024

An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR

Python 171 60 Updated Nov 15, 2023

docker_compose_glue3.0

Python 1 3 Updated Aug 27, 2023

Learn how to ingest data from S3 into a transactional data lake with an event-driven approach, using Glue, an SQS queue, and a DLQ

4 3 Updated Aug 20, 2023

Sending weekly/daily CSV reports from a Hudi data lake to customers via email, using Glue and SNS or SES

Python 1 Updated Aug 13, 2023

Sample CI/CD pipeline for using GitHub Actions with Amazon EMR Serverless Spark.

Python 5 1 Updated Apr 7, 2023

The official home of the Presto distributed SQL query engine for big data

Java 15,763 5,287 Updated Jul 12, 2024

Chat with your documents on your local device using GPT models. No data leaves your device and 100% private.

Python 19,625 2,196 Updated Jun 28, 2024

An easy-to-use Python utility class for accessing incremental data from Hudi Data Lakes

Python 2 Updated Jun 18, 2023

A scalable nearest neighbor search library in Apache Spark

Scala 257 65 Updated Mar 29, 2019