DataFusion: Modern Distributed Compute Platform implemented in Rust

DataFusion is a modern distributed compute platform implemented in Rust. It is heavily inspired by Apache Spark and offers a similar programming style through DataFrames and SQL.

DataFusion can also be used as a crate dependency in your project if you want to run SQL queries and DataFrame-style data manipulation in-process against your own data sources. In that respect, DataFusion is inspired by Apache Calcite in the Java world.

Project Home Page

The project home page is now at https://datafusion.rs and contains the roadmap as well as documentation for using this crate or running DataFusion as a distributed cluster. I am using GitHub issues to track development tasks and feedback.

Prerequisites

  • Rust nightly
  • Thrift (required by the parquet-rs crate) - instructions here

Building DataFusion

See BUILDING.md.

Gitter

There is a Gitter channel where you can ask questions about the project or suggest features.

Contributing

Contributors are welcome! Please see CONTRIBUTING.md for details.
