Stars
A fast, simple, and cost-effective tool to replicate data from Postgres to data warehouses, queues, and storage.
A demo for the Ventura Analytics meetup: scheduling dbt jobs with Airflow.
The most popular ClickHouse plugin for Airflow. 🔝 Top 1% of downloads on PyPI: https://pypi.org/project/airflow-clickhouse-plugin. Based on mymarilyn/clickhouse-driver.
A generalist and lightweight model for named entity recognition that extracts any entity type from text (NAACL 2024).
An extremely fast Python linter and code formatter, written in Rust.
⚡ Workflow automation platform. Orchestrate and schedule code in any language, run it anywhere, and choose from 500+ plugins. An alternative to Zapier, Rundeck, Camunda, Airflow...
A high-performance observability data pipeline.
Venice: a derived data platform for planet-scale workloads.
An asyncio ClickHouse Python driver with native (TCP) interface support.
ClickHouse dialect for SQLAlchemy
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
Resources for my talk at Data Con LA 2023: "Predicting Purchases, Rare Diseases, and More: Using Ordinal Regression to Estimate Rare Event Probabilities"
Apache Airflow: OpenAPI client for Python.
Python library providing function decorators for configurable backoff and retry
Who Are You? Bayesian Prediction of Racial Category Using Surname and Geolocation
Demo code illustrating how to run pytest unit test cases for AWS Glue jobs in AWS CodePipeline using AWS CodeBuild projects.
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, etc. It also comes with built-in Hadoop support.
Jupyter Notebook tutorials using astronomical databases and virtual observatory tools
Vertica dialect for SQLAlchemy using the vertica-python client
Free Data Engineering course!
Podman: A tool for managing OCI containers and pods.
Project demonstrating how to automate Prefect 2.0 deployments to AWS ECS Fargate
The open-source alert management and AIOps platform
Official native Python client for the Vertica Analytics Database.
OpenTelemetry Python API and SDK
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
PyPI module for Graphlet AI Knowledge Graph Factory