Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stream Kernel #311

Open
Max-Meldrum opened this issue Feb 20, 2022 · 3 comments
Open

Stream Kernel #311

Max-Meldrum opened this issue Feb 20, 2022 · 3 comments
Labels
domain: performance Anything related to Arcon performance epic proposal Design proposal for Arcon

Comments

@Max-Meldrum
Copy link
Member

Moving forward we will need a stream kernel for the data processing layer that is specifically designed for Arcon. This issue serves as a direction (I.e not prioritised in short term).

Related issues:

#277
#246
#214

Kernel

The Kernel represents an application-level OS that manages Task scheduling, memory management, and I/O. The idea is to cooperatively schedule a set of tasks on a single core in order to get better CPU utilisation + locality between tasks + avoid context switches. As noted here, storage and networking are no longer the bottlenecks in a modern data center, but CPUs are.

Running a Thread-Per-Core model with cooperative scheduling is not something unique itself. Down below are some data-parallel systems that execute it with success.

  1. ScyllaDB
  2. Hazelcast Jet
  3. Redpanda

Rough Overview Sketch

kernel_arch

The Kernel has the following context that is shared between tasks that it executes:

  1. Time (Watermark)
  2. Epoch
  3. Logger
  4. State Backend(s)

Task

A cooperative async long-running future that drives the execution of a dataflow node. A Task must always check if it should yield back control to other tasks in order to not block progress.

// source example
let source = ...;
let task: Task = async move {
   loop {
     while let Some(elem) = source.poll_next().await {
       // output elem to next task.
       // yield back control to scheduler at times..
     }
  }
};

Tasks may send their output in 3 different ways:

  1. Intra-Kernel with Rc<RefCell<Vec<T>>>
  2. Local Inter-Kernel with explicit queues (e.g., spsc)
  3. Remote Inter-Kernel over the wire

Application-level Task Scheduling

API levels

Suggested by @segeljakt

High-level API: builtin operators (map, filter, window, keyby)
Mid-level API: operator constructors + event handlers
Low-level API: tasks/async functions + channels

Async-friendly Runtime

Currently, it is hard to support async interfaces/crates. Two prime examples are source implementations and supporting state backends that are async.

// Rough sketching of source, operator, and state interfaces.

#[async_trait]
pub trait Source {
    type Item: ArconType;
    async fn poll_next(&mut self) -> Result<Poll<Self::Item>>;
}

#[async_trait]
pub trait Operator {
    async fn handle_elem(&mut self, elem: ArconElement<Self::IN>);
}

#[async_trait]
pub trait ValueIndex<V: Value> {
   async fn get(&mut self) -> Result<Option<V>>;
   async fn put(&mut self, value: V) -> Result<()>;
}

Glommio

Glommio is a Seastar inspired TPC cooperative threading framework built in Rust. It relies on Linux and its io_uring functionality. This is the only notable downside of adopting Gloomio for Arcon. That is, making it a Linux only system. But then again, data-intensive systems such as Arcon are supposed to run on Linux anyway.

Another downside is that Glommio runs on the assumption that the machine has NVMe storage. Specific OS + Hardware requirements will make it harder to run or to contribute to Arcon.

Pros:

  1. Configurable scheduling priority (latency matters vs. not)
  2. First-class io_uring and Direct I/O support (future backend)
  3. Designed to be used in data and I/O intensive systems (Arcon)
  4. Offers placement strategies (NUMA etc..)

Cons:

  1. Linux + Hardware Requirements
  2. Not as many users/contributors as for example tokio

Article about Gloomio may be found here.

Other Async runtime candidates

  1. monoio
  2. tokio
@Max-Meldrum Max-Meldrum added domain: performance Anything related to Arcon performance epic proposal Design proposal for Arcon labels Feb 20, 2022
@Max-Meldrum Max-Meldrum pinned this issue Feb 20, 2022
@Max-Meldrum Max-Meldrum added this to Near-term in Arcon Roadmap Feb 21, 2022
@Max-Meldrum
Copy link
Member Author

Tokio

The other approach is to adopt tokio and use a similar approach to the actix runtime where they spawn a runtime per thread and combine it with a LocalSet to support !Send futures.

Going with tokio as the default kernel runtime is a bit safer and makes it easier to contribute but also play around with Arcon. I believe it should be possible to support glommio later on with a glommio_rt feature flag.

@segeljakt
Copy link
Member

segeljakt commented Mar 4, 2022

I wonder if we could combine kompact and tokio to get "the best of both worlds". Tokio integrates us with the async ecosystem which I think is crucial for streaming applications. Kompact on the other hand gives us networking (while tokio only gives us sockets).

If it's possible, I think a nice solution could be to implement some type of network channel using kompact which tokio tasks on different machines can use for communication.

@Max-Meldrum although distributed Arcon is not currently planned, what do you think?

@Max-Meldrum
Copy link
Member Author

I wonder if we could combine kompact and tokio to get "the best of both worlds". Tokio integrates us with the async ecosystem which I think is crucial for streaming applications. Kompact on the other hand gives us networking (while tokio only gives us sockets).

If it's possible, I think a nice solution could be to implement some type of network channel using kompact which tokio tasks on different machines can use for communication.

@Max-Meldrum although distributed Arcon is not currently planned, what do you think?

yeah, worth seeing if it's possible to combine or perhaps reuse implementation parts of the networking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: performance Anything related to Arcon performance epic proposal Design proposal for Arcon
Projects
Arcon Roadmap
Near-term
Development

No branches or pull requests

2 participants