A Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture, written in Rust

Yisaer/datafuse

Datafuse

Modern Real-Time Data Processing & Analytics DBMS with Cloud-Native Architecture

Datafuse is a real-time data processing and analytics DBMS with a cloud-native architecture, written in Rust. It is inspired by ClickHouse, powered by arrow-rs, and built to make it easy to power the Data Cloud.

Principles

  • Fearless

    • No data races, no unsafe code, minimal unhandled errors
  • High Performance

    • Everything is parallel
  • High Scalability

    • Everything is Distributed
  • High Reliability

    • Datafuse's primary design goal is reliability

Architecture

Datafuse Architecture

Performance

  • In-memory SIMD vector processing performance only
  • Dataset: 100,000,000,000 (100 Billion)
  • Hardware: AMD Ryzen 7 PRO 4750U, 8 CPU Cores, 16 Threads
  • Rust: rustc 1.53.0-nightly (673d0db5e 2021-03-23)
  • Built with link-time optimization and CPU-specific instructions
  • ClickHouse server version 21.4.6 revision 54447

| Query | FuseQuery (v0.4.1) | ClickHouse (v21.4.6) |
|-------|--------------------|----------------------|
| SELECT avg(number) FROM numbers_mt(100000000000) | 3.87 s. (25.83 billion rows/s., 206.79 GB/s.) | ×1.6 slower, 6.04 s. (16.57 billion rows/s., 132.52 GB/s.) |
| SELECT sum(number) FROM numbers_mt(100000000000) | 4.86 s. (20.57 billion rows/s., 164.70 GB/s.) | ×1.2 slower, 5.90 s. (16.95 billion rows/s., 135.62 GB/s.) |
| SELECT min(number) FROM numbers_mt(100000000000) | 5.61 s. (17.82 billion rows/s., 142.65 GB/s.) | ×2.3 slower, 13.05 s. (7.66 billion rows/s., 61.26 GB/s.) |
| SELECT max(number) FROM numbers_mt(100000000000) | 5.61 s. (17.82 billion rows/s., 142.67 GB/s.) | ×2.5 slower, 14.07 s. (7.11 billion rows/s., 56.86 GB/s.) |
| SELECT count(number) FROM numbers_mt(100000000000) | 3.12 s. (32.03 billion rows/s., 256.48 GB/s.) | ×1.2 slower, 3.71 s. (26.93 billion rows/s., 215.43 GB/s.) |
| SELECT sum(number+number+number) FROM numbers_mt(100000000000) | 17.85 s. (5.60 billion rows/s., 44.85 GB/s.) | ×13.1 slower, 233.71 s. (427.87 million rows/s., 3.42 GB/s.) |
| SELECT sum(number) / count(number) FROM numbers_mt(100000000000) | 4.02 s. (24.86 billion rows/s., 199.10 GB/s.) | ×2.4 slower, 9.70 s. (10.31 billion rows/s., 82.52 GB/s.) |
| SELECT sum(number) / count(number), max(number), min(number) FROM numbers_mt(100000000000) | 9.60 s. (10.41 billion rows/s., 83.38 GB/s.) | ×3.4 slower, 32.87 s. (3.04 billion rows/s., 24.34 GB/s.) |
| SELECT number FROM numbers_mt(10000000000) ORDER BY number DESC LIMIT 1000 | 5.34 s. (1.87 billion rows/s., 14.99 GB/s.) | ×2.6 slower, 13.95 s. (716.62 million rows/s., 5.73 GB/s.) |
| SELECT max(number),sum(number) FROM numbers_mt(1000000000) GROUP BY number % 3, number % 4, number % 5 | 9.03 s. (110.71 million rows/s., 886.50 MB/s.) | ×3.5 faster, 2.60 s. (385.28 million rows/s., 3.08 GB/s.) |
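
The throughput and ratio figures in the benchmark results can be approximately reproduced from the row counts and elapsed times. The sketch below is only illustrative: the function names are hypothetical, and it assumes 8 bytes per row (one 64-bit `number`) and 1 GB = 10^9 bytes. Because the published timings are rounded, recomputed values may differ from the reported ones in the last digit or two.

```rust
/// Rows processed per second for `rows` rows in `seconds` of wall-clock time.
pub fn rows_per_sec(rows: u64, seconds: f64) -> f64 {
    rows as f64 / seconds
}

/// Approximate bandwidth in GB/s.
/// Assumes 8 bytes per row and 1 GB = 10^9 bytes (assumptions of this sketch).
pub fn gb_per_sec(rows: u64, seconds: f64) -> f64 {
    rows as f64 * 8.0 / seconds / 1e9
}

/// How many times slower `other_secs` is than `base_secs`.
pub fn slowdown(base_secs: f64, other_secs: f64) -> f64 {
    other_secs / base_secs
}
```

For the `avg(number)` query, `rows_per_sec(100_000_000_000, 3.87)` comes to about 25.84 billion rows/s, and `slowdown(3.87, 6.04)` to about 1.56.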

Note:

  • ClickHouse system.numbers_mt uses 16-way parallel processing (gist)
  • FuseQuery system.numbers_mt uses 16-way parallel processing (gist)
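
To illustrate what N-way parallel processing of a number range can look like, here is a minimal sketch of `sum(number)` over `0..n`, split across worker threads. It is not FuseQuery's implementation: `parallel_sum` is a hypothetical name, it uses plain `std::thread::scope` (stable Rust 1.63+), and it does no SIMD or block-wise processing.

```rust
use std::thread;

/// Sum the integers in `0..n` by splitting the range into `workers`
/// contiguous chunks, summing each chunk on its own thread, and then
/// combining the partial sums.
pub fn parallel_sum(n: u64, workers: u64) -> u128 {
    assert!(workers > 0, "need at least one worker");
    // Ceiling division so that every value falls into some chunk.
    let chunk = (n + workers - 1) / workers;
    thread::scope(|s| {
        let handles: Vec<_> = (0..workers)
            .map(|w| {
                let start = (w * chunk).min(n);
                let end = (start + chunk).min(n);
                // Each worker produces a partial aggregate for its chunk.
                s.spawn(move || (start..end).map(|x| x as u128).sum::<u128>())
            })
            .collect();
        // Merge step: combine the partial sums from all workers.
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}
```

With 16 workers, `parallel_sum(n, 16)` mirrors the 16-way split mentioned in the note. The same partial-aggregate-then-merge shape applies to `min`, `max`, and `count`.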

Status

General

  • SQL Parser
  • Query Planner
  • Query Optimizer
  • Predicate Push Down
  • Limit Push Down
  • Projection Push Down
  • Type coercion
  • Parallel Query Execution
  • Distributed Query Execution
  • Hash GroupBy
  • Merge-Sort OrderBy
  • Joins (WIP)
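
As an illustration of the hash GroupBy item, the sketch below evaluates the benchmark query `SELECT max(number), sum(number) ... GROUP BY number % 3, number % 4, number % 5` with a plain `HashMap`. The names `MaxSum` and `group_by_mods` are hypothetical, and this single-threaded, row-at-a-time loop is only meant to show the idea, not how Datafuse implements it.

```rust
use std::collections::HashMap;

/// Per-group aggregate state for `max(number)` and `sum(number)`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MaxSum {
    pub max: u64,
    pub sum: u64,
}

/// Hash aggregation over `0..n`, grouping by
/// `(number % 3, number % 4, number % 5)`.
pub fn group_by_mods(n: u64) -> HashMap<(u64, u64, u64), MaxSum> {
    let mut groups: HashMap<(u64, u64, u64), MaxSum> = HashMap::new();
    for number in 0..n {
        let key = (number % 3, number % 4, number % 5);
        // Look up the group's state, creating it on first sight of the key.
        let state = groups.entry(key).or_insert(MaxSum { max: number, sum: 0 });
        state.max = state.max.max(number);
        state.sum += number;
    }
    groups
}
```

Since 3, 4, and 5 are pairwise coprime, the query produces at most 60 groups regardless of `n`, so the hash table stays tiny while the input can be very large.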

SQL Support

  • Projection
  • Filter (WHERE)
  • Limit
  • Aggregate Functions
  • Scalar Functions
  • UDF Functions
  • SubQueries
  • Sorting
  • Joins (WIP)
  • Window (TODO)
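
Sorting combined with Limit, as in the benchmark query `ORDER BY number DESC LIMIT 1000`, does not require sorting the whole input. The sketch below shows one common approach, a bounded min-heap that keeps only the `k` largest values seen so far. This is a swapped-in technique for illustration, not Datafuse's method (the status list above names merge-sort for OrderBy), and `top_k_desc` is a hypothetical name.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Return the `k` largest values in descending order,
/// holding at most `k` values in memory at any time.
pub fn top_k_desc<I: IntoIterator<Item = u64>>(values: I, k: usize) -> Vec<u64> {
    if k == 0 {
        return Vec::new();
    }
    // `Reverse` turns the max-heap into a min-heap, so the smallest
    // of the retained values is always at the top.
    let mut heap: BinaryHeap<Reverse<u64>> = BinaryHeap::with_capacity(k);
    for v in values {
        if heap.len() < k {
            heap.push(Reverse(v));
        } else if let Some(&Reverse(smallest)) = heap.peek() {
            if v > smallest {
                heap.pop();
                heap.push(Reverse(v));
            }
        }
    }
    // Ascending order of `Reverse<u64>` is descending order of the values.
    heap.into_sorted_vec()
        .into_iter()
        .map(|Reverse(v)| v)
        .collect()
}
```

For example, `top_k_desc(0..10_000, 3)` yields the three largest numbers in descending order while never storing more than three values.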

Getting Started

Contributing

Roadmap

  • 0.1 Support aggregation select (2021.02)
  • 0.2 Support distributed query (2021.03)
  • 0.3 Support group by (2021.04)
  • 0.4 Support order by (2021.04)
  • 0.5 Support join
  • 1.0 Support TPC-H benchmark

Release Status

Datafuse is currently in Alpha and is not ready to be used in production.

We are doing our best to release R1.

License

Datafuse is licensed under Apache 2.0.
