
Rust port of simdjson

Rust 1,082 84 Updated Aug 13, 2024

Performance-portable, length-agnostic SIMD with runtime dispatch

C++ 4,042 308 Updated Aug 15, 2024

Open, Multi-modal Catalog for Data & AI

Java 2,100 308 Updated Aug 15, 2024

Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.

Java 815 136 Updated Aug 15, 2024

BtrBlocks: Efficient Columnar Compression for Data Lakes (SIGMOD 2023 Paper)

C++ 208 16 Updated May 7, 2024
C++ 676 66 Updated Aug 15, 2024

Delta reader for the Ray open-source toolkit for building ML applications

Python 40 11 Updated Jan 27, 2024

Snowflake dataset containing statistics for 70 million queries over a 14-day period

Jupyter Notebook 100 21 Updated Sep 27, 2021
Python 40 12 Updated Jul 25, 2024

Enables synchronizing metadata changes (Create/Drop table/partition) from Hive Metastore to AWS Glue Data Catalog

Java 32 13 Updated Dec 5, 2023

Lakehouse storage system benchmark

Scala 62 9 Updated Feb 22, 2023
Python 49 2 Updated Jul 4, 2024

weggli is a fast and robust semantic search tool for C and C++ codebases. It is designed to help security researchers identify interesting functionality in large codebases.

Rust 2,313 129 Updated Jul 12, 2024

If you are looking to become a Google Cloud Engineer, then you are at the right place. GCPSketchnote is a series where I share Google Cloud concepts in a quick and easy-to-learn format.

4,692 767 Updated Jun 9, 2023

A cross-platform way to express data transformation, relational algebra, standardized record expression and plans.

Python 1,131 148 Updated Aug 15, 2024

ClickHouse® is a real-time analytics DBMS

C++ 36,220 6,719 Updated Aug 15, 2024

Supersonic is an ultra-fast, column oriented query engine library written in C++

C++ 204 43 Updated Oct 2, 2020

Abseil Common Libraries (C++)

C++ 14,606 2,566 Updated Aug 15, 2024

Personally compiled solutions to Facebook internship interview problems, covering August 2016 to March 2017

Java 53 188 Updated Oct 13, 2019

Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.

Go 2,724 1,357 Updated Aug 15, 2024

Mirror of Apache crail (Incubating)

Java 147 47 Updated Jul 3, 2022

Upserts, Deletes And Incremental Processing on Big Data.

Java 5,284 2,400 Updated Aug 15, 2024

Implementations of open source Apache Hadoop/Hive interfaces which allow for ingesting data from Amazon DynamoDB

Java 215 133 Updated Aug 8, 2024

Notes talking about the design and implementation of Apache Spark

5,251 1,840 Updated Apr 2, 2024

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

Scala 7,361 1,658 Updated Aug 15, 2024

Secure and fast microVMs for serverless computing.

Rust 24,879 1,744 Updated Aug 15, 2024

A distributed approximate nearest neighborhood search (ANN) library which provides a high quality vector index build, search and distributed online serving toolkits for large scale vector search sc…

C++ 4,760 579 Updated Aug 10, 2024

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Java 10,044 2,901 Updated Aug 15, 2024

Open source platform for the machine learning lifecycle

Python 18,152 4,102 Updated Aug 15, 2024

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.

C 8,716 1,045 Updated Aug 15, 2024