kafkaproxy is a reverse proxy for the wire protocol of Apache Kafka.

Java 71 10 Updated Jun 13, 2023
Scala 35 3 Updated Aug 24, 2022

Self-contained demo using Flink SQL and Debezium to build a CDC-based analytics pipeline. All you need is Docker! 🐳

Dockerfile 24 32 Updated May 11, 2021

Yahoo! Cloud Serving Benchmark

Java 4,955 2,252 Updated Nov 14, 2024

Kafka Connect Examples

Shell 42 19 Updated Sep 27, 2022

Paragraph-by-paragraph close readings of classic and new deep learning papers

27,118 2,449 Updated Aug 8, 2024

Testbench for experimenting with Apache Hive at any data scale.

Java 65 195 Updated Jul 10, 2017

TPC-DS benchmark kit with some modifications/fixes

C 88 65 Updated Aug 13, 2024

A topic-centric list of HQ open datasets.

61,001 9,935 Updated Nov 13, 2024

A set of notebooks that explore and explain core concepts of Apache Hudi, such as file layouts, file sizing, compaction and clustering.

Jupyter Notebook 10 1 Updated Aug 22, 2023

A powerful CLI tool for automated installation of Apache Ranger and AWS EMR and their integration with OpenLDAP and Windows AD. It supports both open-source Ranger and EMR-native Ranger, supports OpenLD…

Shell 8 15 Updated Jan 30, 2023

A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.

Jupyter Notebook 7,844 3,136 Updated Oct 8, 2024

A serverless data lake project and framework based on AWS S3, Glue, Athena, MWAA and QuickSight. Through a series of best practices, it guides you in building a serverless data lake.

Shell 16 5 Updated Nov 22, 2022

Backup for NYC TLC data for the DE Zoomcamp course

150 45 Updated Jul 19, 2022

The Metadata Platform for your Data and AI Stack

Java 9,911 2,940 Updated Nov 16, 2024

This command-line tool is a useful complement to aws-cli. It offers a suite of utilities that manage and operate EC2, EMR and other AWS services.

Shell 1 Updated Jul 4, 2023

Star Schema Benchmark Tool for Apache Kylin

C 96 47 Updated Aug 26, 2021

Star Schema Benchmark dbgen

C 120 82 Updated Mar 11, 2024

A prototype big data platform project; the source code for the book Big Data Platform Architecture and Prototype

Java 196 144 Updated Aug 12, 2020

Apache Hadoop docker image

Shell 2,210 1,304 Updated Feb 1, 2024

Exercises for the book "Advanced Analytics with Spark" (《Spark高级数据分析》)

Scala 22 42 Updated Jun 9, 2018

New Last.fm Dataset 2020 for music auto-tagging purposes.

Python 28 Updated Jul 6, 2023

A Hadoop cluster based on Docker, including Hive and Spark.

Shell 77 29 Updated Nov 13, 2022

Multi-container environment with Hadoop, Spark and Hive

Shell 203 148 Updated Jan 6, 2024

Docker build for Apache Spark

Dockerfile 675 370 Updated Dec 30, 2021