frankcash
🗻 Moving tons of data.

Organizations

@MarquezProject

Open source platform for the machine learning lifecycle

Python 18,023 4,071 Updated Jul 29, 2024

1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.

Python 12,278 1,661 Updated Jul 21, 2024

Useful macros when performing data audits

308 38 Updated Jul 25, 2024

The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.

Python 9,169 713 Updated Jul 25, 2024

Lab assignments for Introduction to Data-Centric AI, MIT IAP 2024 👩🏽‍💻

Jupyter Notebook 409 143 Updated Dec 28, 2023

(Legacy) Command Line Interface for Databricks

Python 381 236 Updated Oct 5, 2023

Developer-friendly, serverless vector database for AI applications. Easily add long-term memory to your LLM apps!

Python 3,742 251 Updated Jul 27, 2024

A highly efficient daemon for streaming data from Kafka into Delta Lake

Rust 343 74 Updated Jul 27, 2024

NeuralProphet: A simple forecasting package

Python 3,753 468 Updated Jul 26, 2024

A cluster computing framework for processing large-scale geospatial data

Java 1,827 654 Updated Jul 25, 2024

A curated list of awesome Apache Spark packages and resources.

Shell 1,664 325 Updated Apr 8, 2024

Materials for a 2-day instructor led course on applying machine learning

Jupyter Notebook 198 195 Updated Apr 21, 2021

A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports comp…

Python 7,933 1,170 Updated Jul 28, 2024

dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks

Python 382 212 Updated Jul 28, 2024

dbt-redshift contains all of the code enabling dbt to work with Amazon Redshift

Python 95 52 Updated Jul 28, 2024

The resources of the preparation course for Databricks Data Engineer Associate certification exam

Python 225 388 Updated Jul 27, 2024

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

Scala 7,314 1,644 Updated Jul 28, 2024

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow

C++ 25,891 8,688 Updated Jul 27, 2024

CLI tool which enables you to login and retrieve AWS temporary credentials using a SAML IDP

Go 2,050 557 Updated Jul 29, 2024

Snowflake SQLAlchemy

Python 230 149 Updated Jul 22, 2024

Snowflake Connector for Python

Python 571 459 Updated Jul 26, 2024

Astronomer Starship can send your Airflow workloads to new places!

Python 27 5 Updated Jul 15, 2024

An orchestration platform for the development, production, and observation of data assets.

Python 10,860 1,354 Updated Jul 29, 2024

Custom GitHub Actions

18 19 Updated Jul 16, 2024

An Open Standard for lineage metadata collection

Java 1,661 284 Updated Jul 29, 2024
Python 369 35 Updated Jul 26, 2024

Reads key-value pairs from a .env file and can set them as environment variables. It helps in developing applications following the 12-factor principles.

Python 7,367 419 Updated Jul 23, 2024

Apache Airflow - OpenApi Client for Python

Python 335 49 Updated Jul 25, 2024

Work with remote image registries: retrieving information and images, signing content

Go 7,853 757 Updated Jul 28, 2024

A modular SQL linter and auto-formatter with support for multiple dialects and templated code.

Python 7,456 675 Updated Jul 29, 2024