A Java binding to Apache Arrow DataFusion


This project is still a work in progress; it currently works with Arrow 9.0 and DataFusion 7.0. It is built and verified in CI against Java 11 and 17. You can also follow the Docker run instructions below, where Java 17 jshell is used to run it interactively.

How to use in your code

The artifacts are published to Maven Central, so you can use them like any other Java library:

dependencies {
    implementation(
        group = "io.github.datafusion-contrib",
        name = "datafusion-java",
        version = "0.12.0" // or the latest released version
    )
}

To test it out, you can use this piece of demo code:

import org.apache.arrow.datafusion.DataFrame;
import org.apache.arrow.datafusion.SessionContext;
import org.apache.arrow.datafusion.SessionContexts;

public class DataFusionDemo {

    public static void main(String[] args) throws Exception {
        // the context is AutoCloseable, so try-with-resources releases it
        try (SessionContext sessionContext = SessionContexts.create()) {
            sessionContext.sql("select sqrt(65536)").thenCompose(DataFrame::show).join();
        }
    }
}
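
Gradle build script (Kotlin DSL), with the bodies of the other blocks omitted:

plugins { /* ... */ }

repositories { /* ... */ }

tasks {
    application { /* ... */ }
}

dependencies {
    implementation(
        group = "io.github.datafusion-contrib",
        name = "datafusion-java",
        version = "0.12.0"
    )
}
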
Run result

$ ./gradlew run
> Task :compileKotlin UP-TO-DATE
> Task :compileJava UP-TO-DATE
> Task :processResources NO-SOURCE
> Task :classes UP-TO-DATE

> Task :run
successfully created tokio runtime
| sqrt(Int64(65536)) |
| 256                |
successfully shutdown tokio runtime

3 actionable tasks: 1 executed, 2 up-to-date
16:43:34: Execution finished 'run'.
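
The demo above chains its calls with thenCompose and join, which suggests that sql and show each return a CompletableFuture. The following is a minimal JDK-only sketch of that chaining pattern, not the library's implementation: FakeFrame, its show method, and the sql helper are hypothetical stand-ins invented for illustration.

```java
import java.util.concurrent.CompletableFuture;

public class AsyncChainSketch {

    // Hypothetical stand-in for a query result; NOT the library's DataFrame.
    static final class FakeFrame {
        final String query;

        FakeFrame(String query) {
            this.query = query;
        }

        // Hypothetical async action, playing the role DataFrame::show plays in the demo.
        CompletableFuture<String> show() {
            return CompletableFuture.supplyAsync(() -> "shown: " + query);
        }
    }

    // Hypothetical stand-in for sql(...): completes asynchronously with a frame.
    static CompletableFuture<FakeFrame> sql(String query) {
        return CompletableFuture.supplyAsync(() -> new FakeFrame(query));
    }

    // Chains the two asynchronous steps and blocks until both have finished.
    static String run(String query) {
        return sql(query)
            .thenCompose(FakeFrame::show) // flattens the nested future into one
            .join();                      // waits; failures surface as CompletionException
    }

    public static void main(String[] args) {
        System.out.println(run("select sqrt(65536)"));
    }
}
```

thenCompose is used rather than thenApply because the second step is itself asynchronous; thenApply would produce a future of a future. join blocks the calling thread, which is convenient in a main method but is usually avoided inside other asynchronous callbacks.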

How to run the interactive demo

1. Run using Docker (with jshell)

First build the docker image:

docker build -t datafusion-example .

Then run using Docker:

docker run --rm -it datafusion-example
Dec 27, 2021 2:52:22 AM java.util.prefs.FileSystemPreferences$1 run
INFO: Created user preferences directory.
|  Welcome to JShell -- Version 11.0.13
|  For an introduction type: /help intro

jshell> import org.apache.arrow.datafusion.*

jshell> var context = SessionContexts.create()
context ==> org.apache.arrow.datafusion.DefaultSessionContext@4229bb3f

jshell> var df = context.sql("select 1.1 + cos(2.0)").join()
df ==> org.apache.arrow.datafusion.DefaultDataFrame@1a18644

jshell> import org.apache.arrow.memory.*

jshell> var allocator = new RootAllocator()
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http: for further details.
allocator ==> Allocator(ROOT) 0/0/0/9223372036854775807 (res/actual/peak/limit)

jshell> var r = df.collect(allocator).join()
02:52:46.882 [main] INFO  org.apache.arrow.datafusion.DefaultDataFrame - successfully completed with arr length=538
r ==> org.apache.arrow.vector.ipc.ArrowFileReader@5167f57d

jshell> var root = r.getVectorSchemaRoot()
root ==> org.apache.arrow.vector.VectorSchemaRoot@4264b240

jshell> r.loadNextBatch()
$8 ==> true

jshell> var v = root.getVector(0)
v ==> [0.6838531634528577]

2. Build from source

Note that you must have local Rust and Java environments set up.

Run the example in one line:

./gradlew run

Or roll your own test example:

// public class ExampleMain {
// assumes a static SLF4J-style field named logger is declared in the class
public static void main(String[] args) throws Exception {
  try (SessionContext context = SessionContexts.create();
      BufferAllocator allocator = new RootAllocator()) {
    // sql() is asynchronous; join() waits for the DataFrame
    DataFrame dataFrame = context.sql("select 1.5 + sqrt(2.0)").join();
    // collect the result into an Arrow reader and pass it to the callback
    dataFrame.collect(allocator).thenAccept(ExampleMain::onReaderResult).join();
  }
}

private static void onReaderResult(ArrowReader reader) {
  try {
    VectorSchemaRoot root = reader.getVectorSchemaRoot();
    Schema schema = root.getSchema();
    while (reader.loadNextBatch()) {
      Float8Vector vector = (Float8Vector) root.getVector(0);
      for (int i = 0; i < root.getRowCount(); i += 1) {
        logger.info("value {}={}", i, vector.getValueAsDouble(i));
      }
    }
    // close to release resource
    reader.close();
  } catch (IOException e) {
    logger.warn("got IO Exception", e);
  }
}
// } /* end of ExampleMain */

To build the library:

./gradlew build