
FuseQuery is a Distributed SQL Query Engine at scale

dantengsky/fuse-query

 
 



FuseQuery

FuseQuery is a Distributed SQL Query Engine at scale.

A new implementation of ClickHouse, written from scratch in Rust (work in progress).

Thanks to ClickHouse and Arrow.

Features

  • High Performance
  • High Scalability
  • High Reliability

Status

SQL Support

  • Projection
  • Filter (WHERE)
  • Limit
  • Aggregate
  • Common math functions
  • Sorting
  • Subqueries
  • Joins

Architecture

| Crate       | Description                                                                                                                   | Status |
|-------------|-------------------------------------------------------------------------------------------------------------------------------|--------|
| distributed | Distributed scheduler and executor for the planner                                                                            | TODO   |
| optimizers  | Optimizer for the distributed plan                                                                                            | WIP    |
| datablocks  | Vectorized data processing unit                                                                                               | WIP    |
| datastreams | Async streaming iterators                                                                                                     | WIP    |
| datasources | Interface to data sources (system.numbers for performance testing; remote storage such as S3 or other table storage engines) | WIP    |
| executors   | Executors (EXPLAIN/SELECT) for the pipeline                                                                                   | WIP    |
| functions   | Scalar (Arithmetic/Comparison) and Aggregation (Aggregator) functions                                                         | WIP    |
| processors  | Dataflow streaming processor (Pipeline)                                                                                       | WIP    |
| planners    | Distributed plan for queries and DML statements (SELECT/EXPLAIN)                                                              | WIP    |
| servers     | Server handlers (MySQL/HTTP)                                                                                                  | MySQL  |
| transforms  | Query execution transforms (Source/Filter/Projection/AggregatorPartial/AggregatorFinal/Limit)                                 | WIP    |
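The transforms crate chains Source, Filter, Projection and Limit stages over blocks of data. The single-threaded Rust sketch below illustrates that idea, assuming a block is just a `Vec<u64>` column; `Transform`, `Filter`, `Projection`, `Limit`, `numbers_source` and `run_pipeline` are hypothetical names, not FuseQuery's actual API.

```rust
// Hypothetical sketch: a "data block" is modeled as a plain column of u64 values.
type DataBlock = Vec<u64>;

/// A stage that consumes one block and produces one block (hypothetical trait).
trait Transform {
    fn apply(&mut self, block: DataBlock) -> DataBlock;
}

/// Keeps only the rows for which the predicate holds (cf. FilterTransform).
struct Filter<F: Fn(u64) -> bool>(F);

impl<F: Fn(u64) -> bool> Transform for Filter<F> {
    fn apply(&mut self, block: DataBlock) -> DataBlock {
        block.into_iter().filter(|&v| (self.0)(v)).collect()
    }
}

/// Maps every row through an expression (cf. ProjectionTransform).
struct Projection<F: Fn(u64) -> u64>(F);

impl<F: Fn(u64) -> u64> Transform for Projection<F> {
    fn apply(&mut self, block: DataBlock) -> DataBlock {
        block.into_iter().map(|v| (self.0)(v)).collect()
    }
}

/// Lets at most `remaining` rows through in total, across all blocks (cf. LimitTransform).
struct Limit {
    remaining: usize,
}

impl Transform for Limit {
    fn apply(&mut self, mut block: DataBlock) -> DataBlock {
        let take = block.len().min(self.remaining);
        block.truncate(take);
        self.remaining -= take;
        block
    }
}

/// Source: emits the range [start, end) as blocks of `block_size` rows
/// (cf. SourceTransform over system.numbers). `block_size` must be > 0.
fn numbers_source(start: u64, end: u64, block_size: usize) -> impl Iterator<Item = DataBlock> {
    (start..end)
        .step_by(block_size)
        .map(move |lo| (lo..(lo + block_size as u64).min(end)).collect())
}

/// Pushes every source block through all stages in order and gathers the output rows.
fn run_pipeline(source: impl Iterator<Item = DataBlock>, transforms: &mut [Box<dyn Transform>]) -> Vec<u64> {
    let mut out = Vec::new();
    for mut block in source {
        for t in transforms.iter_mut() {
            block = t.apply(block);
        }
        out.extend(block);
    }
    out
}

fn main() {
    // Roughly: SELECT number + 1 FROM numbers(100) WHERE number % 2 = 0 LIMIT 3
    let mut transforms: Vec<Box<dyn Transform>> = Vec::new();
    transforms.push(Box::new(Filter(|n: u64| n % 2 == 0)));
    transforms.push(Box::new(Projection(|n: u64| n + 1)));
    transforms.push(Box::new(Limit { remaining: 3 }));
    let rows = run_pipeline(numbers_source(0, 100, 16), &mut transforms);
    println!("{:?}", rows);
}
```

This prints `[1, 3, 5]`. The `Limit` stage keeps a running budget across blocks, so once three rows have been emitted later blocks are truncated to empty; unlike a real engine, this sketch does not stop the source early.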

Performance

  • Dataset: 10,000,000,000 (10 billion) rows, system.numbers_mt
  • Hardware: KVM cloud instance with 8 vCPUs and 16 GB RAM
  • Rust: rustc 1.50.0-nightly (f76ecd066 2020-12-15)
| Query                                                        | FuseQuery Cost | ClickHouse Cost                           |
|--------------------------------------------------------------|----------------|-------------------------------------------|
| SELECT sum(number)                                           | 1.77s          | 1.34s (7.48 billion rows/s, 59.80 GB/s)   |
| SELECT max(number)                                           | 2.83s          | 2.33s (4.34 billion rows/s, 34.74 GB/s)   |
| SELECT max(number+1)                                         | 6.13s          | 3.29s (3.04 billion rows/s, 24.31 GB/s)   |
| SELECT count(number)                                         | 1.55s          | 0.67s (15.00 billion rows/s, 119.99 GB/s) |
| SELECT sum(number) / count(number)                           | 2.04s          | 1.28s (7.84 billion rows/s, 62.73 GB/s)   |
| SELECT sum(number) / count(number), max(number), min(number) | 6.40s          | 4.30s (2.33 billion rows/s, 18.61 GB/s)   |
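The ClickHouse throughput figures relate row count, elapsed time and data volume: in every row of the table, GB/s is 8 × (rows/s), i.e. each `number` is an 8-byte UInt64. A minimal sketch of that arithmetic follows; the function names are hypothetical.

```rust
/// Rows scanned per second, in billions.
fn billion_rows_per_sec(rows: u64, secs: f64) -> f64 {
    rows as f64 / secs / 1e9
}

/// Data scanned per second in GB (10^9 bytes), assuming 8 bytes (UInt64) per row.
fn gb_per_sec(rows: u64, secs: f64) -> f64 {
    rows as f64 * 8.0 / secs / 1e9
}

fn main() {
    let rows = 10_000_000_000u64;
    // ClickHouse cost for SELECT sum(number), as listed in the table.
    let secs = 1.34;
    println!(
        "{:.2} billion rows/s, {:.2} GB/s",
        billion_rows_per_sec(rows, secs),
        gb_per_sec(rows, secs)
    );
}
```

This prints `7.46 billion rows/s, 59.70 GB/s`. The small gap to the reported 7.48 billion rows/s is consistent with the displayed cost being rounded (10 billion / 7.48 billion ≈ 1.337s).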

Note:

  • ClickHouse processes system.numbers_mt with 8-way parallelism
  • FuseQuery processes system.numbers_mt with 8-way parallelism
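With 8-way parallelism, an aggregate such as `sum(number) / count(number)` can be computed in two phases, matching the AggregatorPartial and AggregatorFinal transforms named in the architecture table: each worker builds a partial state over its own slice of the numbers, and a final step merges the states. The sketch below assumes contiguous range splitting and plain `std::thread` workers (Rust 1.63+ for `thread::scope`); `SumCount` and `parallel_sum_count` are hypothetical names, and this is not FuseQuery's actual scheduler.

```rust
use std::thread;

/// Partial aggregate state for sum/count (hypothetical; cf. AggregatorPartial).
#[derive(Default, Clone, Copy, Debug, PartialEq)]
struct SumCount {
    sum: u128, // u128: the sum of 0..10 billion (~5e19) would overflow u64 (~1.8e19)
    count: u64,
}

impl SumCount {
    /// Merges two partial states (cf. AggregatorFinal).
    fn merge(self, other: SumCount) -> SumCount {
        SumCount { sum: self.sum + other.sum, count: self.count + other.count }
    }
}

/// Splits [0, total) into `parts` contiguous ranges (`parts` must be > 0),
/// aggregates each range on its own thread, then merges the partial states.
fn parallel_sum_count(total: u64, parts: u64) -> SumCount {
    let chunk = (total + parts - 1) / parts; // ceiling division
    thread::scope(|s| {
        let handles: Vec<_> = (0..parts)
            .map(|i| {
                let lo = (i * chunk).min(total);
                let hi = ((i + 1) * chunk).min(total);
                s.spawn(move || {
                    // Partial phase: each thread scans only its own range.
                    let mut st = SumCount::default();
                    for n in lo..hi {
                        st.sum += n as u128;
                        st.count += 1;
                    }
                    st
                })
            })
            .collect();
        // Final phase: fold all partial states into one result.
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .fold(SumCount::default(), SumCount::merge)
    })
}

fn main() {
    let st = parallel_sum_count(10_000_000, 8);
    println!("sum={} count={} avg={}", st.sum, st.count, st.sum as f64 / st.count as f64);
}
```

Because sum and count states merge associatively, the final result does not depend on how the range is split or in which order the workers finish.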

How to Install Rust (nightly)?

$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
$ rustup toolchain install nightly

How to Run?

Fuse-Query Server

$ make run

12:46:15 [ INFO] Options { log_level: "debug", num_cpus: 8, mysql_handler_port: 3307 }
12:46:15 [ INFO] Fuse-Query Cloud Compute Starts...
12:46:15 [ INFO] Usage: mysql -h127.0.0.1 -P3307

Query with MySQL client

Connect
$ mysql -h127.0.0.1 -P3307
Explain
mysql> explain select (number+1) as c1, number/2 as c2 from system.numbers_mt(10000000) where (c1+c2+1) < 100 limit 3;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| explain                                                                                                                                                                                                                                                                                                               |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| └─ Limit: 3
  └─ Projection: (number + 1) as c1, (number / 2) as c2
    └─ Filter: ((((number + 1) + (number / 2)) + 1) < 100)
      └─ ReadDataSource: scan parts [8](Read from system.numbers_mt table)                                                                                                             |
| 
  └─ LimitTransform × 1 processor
    └─ Merge (LimitTransform × 8 processors) to (MergeProcessor × 1)
      └─ LimitTransform × 8 processors
        └─ ProjectionTransform × 8 processors
          └─ FilterTransform × 8 processors
            └─ SourceTransform × 8 processors                                |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)
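The EXPLAIN output applies the LIMIT twice: once in each of the 8 parallel pipelines, and once more after their outputs are merged into a single stream. The sketch below reproduces that shape for the same query. It assumes `/` is integer division on UInt64 (consistent with the c2 values in the Select result) and that partition outputs are merged in partition order; `partition_pipeline` and `run_query` are hypothetical names.

```rust
use std::thread;

/// One output row: (c1, c2).
type Row = (u64, u64);

/// One partition's pipeline: Source -> Filter -> Projection -> Limit (hypothetical).
fn partition_pipeline(lo: u64, hi: u64, limit: usize) -> Vec<Row> {
    (lo..hi)
        // Filter: ((number + 1) + (number / 2)) + 1 < 100, integer division assumed.
        .filter(|&n| (n + 1) + (n / 2) + 1 < 100)
        // Projection: c1 = number + 1, c2 = number / 2.
        .map(|n| (n + 1, n / 2))
        // Per-partition Limit: no partition needs to emit more than `limit` rows.
        .take(limit)
        .collect()
}

/// Runs `parts` partition pipelines in parallel, merges them, then applies the final Limit.
fn run_query(total: u64, parts: u64, limit: usize) -> Vec<Row> {
    let chunk = (total + parts - 1) / parts; // ceiling division; `parts` must be > 0
    let partials: Vec<Vec<Row>> = thread::scope(|s| {
        let handles: Vec<_> = (0..parts)
            .map(|i| {
                let lo = (i * chunk).min(total);
                let hi = ((i + 1) * chunk).min(total);
                s.spawn(move || partition_pipeline(lo, hi, limit))
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect::<Vec<Vec<Row>>>()
    });
    // Merge (here: concatenate in partition order), then the final Limit.
    partials.into_iter().flatten().take(limit).collect()
}

fn main() {
    let rows = run_query(10_000_000, 8, 3);
    println!("{:?}", rows);
}
```

This prints `[(1, 0), (2, 0), (3, 1)]`. The per-partition limit bounds the merge step to at most 8 × 3 rows before the final LIMIT. Without ORDER BY, SQL does not guarantee which rows a LIMIT returns; merging in partition order only makes this sketch deterministic.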
Select
mysql> select (number+1) as c1, number/2 as c2 from system.numbers_mt(10000000) where (c1+c2+1) < 100 limit 3;
+------+------+
| c1   | c2   |
+------+------+
|    1 |    0 |
|    2 |    0 |
|    3 |    1 |
+------+------+
3 rows in set (0.06 sec)

How to Test?

$ make test


Languages

  • Rust 99.8%
  • Other 0.2%