Stars
Set of Hadoop input/output formats for use in combination with Hadoop streaming
A java agent to generate method mappings to use with the linux `perf` tool
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
YAML for Java 8 and above. A user-friendly OOP library. Previously known as "Camel".
Eclipse Jetty® - Web Container & Clients - supports HTTP/2, HTTP/1.1, HTTP/1.0, websocket, servlets, and more
An open-source and enterprise-level monitoring system.
Confluent's Apache Kafka Golang client
apache-spark-on-k8s / spark
Forked from apache/spark
Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apa…
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
An open-source columnar data format designed for fast, real-time analytics on big data.
Apache Ranger - To enable, monitor and manage comprehensive data security across the Hadoop platform and beyond
Some information about Apache Kylin interaction with Pentaho Mondrian
Saiku Analytics - The World's Greatest Open Source OLAP Browser
Mondrian is an Online Analytical Processing (OLAP) server that enables business users to analyze large quantities of data in real-time.
📈 Capturing JVM- and application-level metrics. So you know what's going on.
Alluxio, data orchestration for analytics and machine learning in the cloud
Refactored version of code.google.com/hadoop-gpl-compression for hadoop 0.20