Starred repositories
The Rule-based Retrieval package is a Python package that enables you to create and manage Retrieval Augmented Generation (RAG) applications with advanced filtering capabilities. It seamlessly inte…
This repository contains two Python scripts that demonstrate how to create a chatbot using Streamlit, OpenAI GPT-3.5-turbo, and Activeloop's Deep Lake.
A highly efficient daemon for streaming data from Kafka into Delta Lake
A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.
Spark Streaming application with enhanced Kafka Streaming consumer metrics exposed using Spark 3 PrometheusServlet
Amplify your team's potential with customizable and secure AI assistants.
Diffusion Bee is the easiest way to run Stable Diffusion locally on your M1 Mac. Comes with a one-click installer. No dependencies or technical knowledge needed.
Go/gRPC service designed to enable generic rate limit scenarios from different types of applications.
Spark RAPIDS plugin - accelerate Apache Spark with GPUs
Distributed database specialized in exporting key/value data from Hadoop
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
kuhnen / spark-glue
Forked from bbenzikry/spark-glue
Spark releases with AWS Glue support
The AWS Glue Data Catalog is a fully managed, Apache Hive Metastore compatible, metadata repository. Customers can use the Data Catalog as a central repository to store structural and operational m…
metrics-datadog
TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Base classes to use when writing tests with Spark
Docker image for Spark history server on Kubernetes
jahstreet / incubator-livy
Forked from apache/incubator-livy
Mirror of Apache Livy (Incubating)
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
Livy is an open source REST interface for interacting with Apache Spark from anywhere
Redshift Auto Schema is a Python library that takes a delimited flat file or parquet file as input, parses it, and provides a variety of functions that allow for the creation and validation of tabl…
Spark on Kubernetes infrastructure Helm charts repo
Spark on Kubernetes infrastructure Docker images repo
Create and modify Tableau workbook and datasource files
A production-grade HBase ORM library that makes accessing HBase clean, fast and fun (can also be used as a Bigtable ORM)