- Shanghai, China
- https://laurence.blog.csdn.net/
Stars
kafkaproxy is a reverse proxy for the wire protocol of Apache Kafka.
Self-contained demo using Flink SQL and Debezium to build a CDC-based analytics pipeline. All you need is Docker! 🐳
Testbench for experimenting with Apache Hive at any data scale.
databricks / tpcds-kit
Forked from gregrahn/tpcds-kit
TPC-DS benchmark kit with some modifications/fixes
A topic-centric list of HQ open datasets.
A set of notebooks that explore and explain core concepts of Apache Hudi, such as file layouts, file sizing, compaction, and clustering.
A powerful CLI tool for automated installation of Apache Ranger on AWS EMR and its integration with OpenLDAP and Windows AD. It supports both Open-Source Ranger and EMR-Native Ranger, supports OpenLD…
A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
A serverless data lake project and framework based on AWS S3, Glue, Athena, MWAA and QuickSight. Through a series of best practices, it shows you how to build a serverless data lake.
Backup for NYC TLC data for the DE Zoomcamp course
The Metadata Platform for your Data and AI Stack
This command-line tool is a useful complement to aws-cli. It offers a suite of utilities for managing and operating EC2, EMR and other AWS services.
A prototype big data platform project, containing the source code for the book Big Data Platform Architecture and Prototype.
New Last.fm Dataset 2020 for music auto-tagging purposes.
A Hadoop cluster based on Docker, including Hive and Spark.
Multi-container environment with Hadoop, Spark and Hive