- Google
- Bay Area, CA
- https://anoopjohnson.com
- @anoopjohnson
Stars
Performance-portable, length-agnostic SIMD with runtime dispatch
Open, Multi-modal Catalog for Data & AI
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
BtrBlocks: Efficient Columnar Compression for Data Lakes (SIGMOD 2023 Paper)
Delta reader for the Ray open-source toolkit for building ML applications
Snowflake dataset containing statistics for 70 million queries over a 14-day period
Enables synchronizing metadata changes (Create/Drop table/partition) from Hive Metastore to AWS Glue Data Catalog
weggli is a fast and robust semantic search tool for C and C++ codebases. It is designed to help security researchers identify interesting functionality in large codebases.
If you are looking to become a Google Cloud Engineer, then you are at the right place. GCPSketchnote is a series where I share Google Cloud concepts in a quick and easy-to-learn format.
A cross-platform way to express data transformation, relational algebra, standardized record expression and plans.
ClickHouse® is a real-time analytics DBMS
Supersonic is an ultra-fast, column oriented query engine library written in C++
Personally compiled solutions to Facebook internship interview questions, covering August 2016 to March 2017
Kubernetes operator for managing the lifecycle of Apache Spark applications on Kubernetes.
Upserts, Deletes And Incremental Processing on Big Data.
Implementations of open source Apache Hadoop/Hive interfaces which allow for ingesting data from Amazon DynamoDB
Notes talking about the design and implementation of Apache Spark
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Secure and fast microVMs for serverless computing.
A distributed approximate nearest neighbor search (ANN) library which provides a high quality vector index build, search and distributed online serving toolkits for large scale vector search sc…
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Open source platform for the machine learning lifecycle
YugabyteDB - the cloud native distributed SQL database for mission-critical applications.