feat: Partitioned execution #409

bjchambers · 2023-06-01T19:36:44Z

Summary

The current "execution plan" is close to, but not quite suited for, describing the steps necessary to perform a computation.
Ideally, we would introduce a physical plan more similar to those used by relational query engines, allowing us to leverage existing techniques and creating the option to run on existing systems.

For now, the plan is to introduce physical plans and move execution towards running them directly, and then (separately) work towards compiling queries directly to physical plans.

bjchambers · 2023-06-01T19:38:07Z

#407 was a refactoring to move the ScalarValue into a more accessible location for the physical plans. For now, the intention is to use the ScalarValue within the physical plan to represent literal values. In the long term we may want to switch to a better encoding that is more aligned with logical plans, but we can revisit that once the basic plumbing is laid out a bit better.

This is part of #409.

This is part of #409. Introduces `Pipeline` information to the physical plan. This indicates which steps are part of a linear sequence and should (ideally) be executed together. Also implements a pipeline "scheduler", in a new `sparrow-backend` crate, to determine the pipeline for each step. As the physical plan is built up, the code should go in this "compiler backend" package, which can own optimization and conversion of logical plans to physical plans.
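To make the idea of grouping steps into linear sequences concrete, here is a minimal sketch of one way a pipeline could be assigned to each step. It is not the code from `sparrow-backend`: the `Step` type, the `assign_pipelines` function, and the rule used (a step joins its input's pipeline only when it has exactly one input and that input has exactly one consumer) are assumptions made for illustration.

```rust
/// Hypothetical, simplified plan step: a name plus the indices of its inputs.
#[derive(Debug, Clone)]
pub struct Step {
    pub name: &'static str,
    pub inputs: Vec<usize>,
}

/// Assign a pipeline id to every step.
///
/// Assumes `steps` is topologically ordered, so every input index is smaller
/// than the index of the step that consumes it (otherwise this panics).
pub fn assign_pipelines(steps: &[Step]) -> Vec<usize> {
    // Count how many steps consume the output of each step.
    let mut consumers = vec![0usize; steps.len()];
    for step in steps {
        for &input in &step.inputs {
            consumers[input] += 1;
        }
    }

    let mut pipelines: Vec<usize> = Vec::with_capacity(steps.len());
    let mut next_pipeline = 0;
    for step in steps {
        let pipeline = match step.inputs.as_slice() {
            // Exactly one input, consumed only by this step: the two steps
            // form a linear sequence, so stay in the input's pipeline.
            [input] if consumers[*input] == 1 => pipelines[*input],
            // Sources, merges, and steps reading a shared input start a new pipeline.
            _ => {
                let id = next_pipeline;
                next_pipeline += 1;
                id
            }
        };
        pipelines.push(pipeline);
    }
    pipelines
}
```

Under this rule, a scan followed by a project and a select would share one pipeline, while a merge of two inputs would start a new one.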

This introduces the key components of partitioned execution.

- `sparrow-scheduler` provides functionality for managing the separate pipelines within the query plan and morsel-driven parallelism. It manages a thread pool of workers pinned to specific CPUs, pulling tasks from local queues.
- `sparrow-transforms` will provide implementations of the "transforms" (project, select, etc.) and a pipeline for executing the transforms.
- `sparrow-execution` will pull everything together to provide partitioned execution.

This is part of #409.
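As a rough illustration of workers pulling morsels from local queues, the sketch below uses only the Rust standard library. It is not the `sparrow-scheduler` implementation: `Morsel`, `run_partitioned`, and the routing rule (partition modulo worker count) are hypothetical, and CPU pinning is left out because the standard library has no API for it.

```rust
use std::collections::VecDeque;
use std::sync::mpsc;
use std::thread;

/// Hypothetical morsel: a small batch of rows tagged with its partition.
pub struct Morsel {
    pub partition: usize,
    pub rows: Vec<i64>,
}

/// Apply `transform` to every morsel on `workers` threads and add up the
/// results per partition.
pub fn run_partitioned(
    morsels: Vec<Morsel>,
    partitions: usize,
    workers: usize,
    transform: fn(&[i64]) -> i64,
) -> Vec<i64> {
    assert!(workers > 0, "need at least one worker");

    // One local queue per worker. Routing by partition keeps all morsels of
    // a partition on the same worker, in their original order.
    let mut queues: Vec<VecDeque<Morsel>> = (0..workers).map(|_| VecDeque::new()).collect();
    for morsel in morsels {
        assert!(morsel.partition < partitions, "partition out of range");
        queues[morsel.partition % workers].push_back(morsel);
    }

    let (tx, rx) = mpsc::channel();
    let handles: Vec<_> = queues
        .into_iter()
        .map(|mut queue| {
            let tx = tx.clone();
            // Each worker owns its queue, so no locking is needed here.
            thread::spawn(move || {
                while let Some(morsel) = queue.pop_front() {
                    let value = transform(&morsel.rows);
                    tx.send((morsel.partition, value)).expect("receiver alive");
                }
            })
        })
        .collect();
    // Drop the original sender so the receive loop ends when workers finish.
    drop(tx);

    let mut totals = vec![0i64; partitions];
    for (partition, value) in rx {
        totals[partition] += value;
    }
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    totals
}
```

Because the queues are filled up front and never shared, this sketch has no work stealing or dynamic scheduling; a real scheduler would need some way to hand new tasks to workers while they run.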

bjchambers added the enhancement New feature or request label Jun 1, 2023

bjchambers self-assigned this Jun 1, 2023

bjchambers added a commit that referenced this issue Jun 1, 2023

feat: introducing physical plans

28bf38c

This is part of #409.

bjchambers added a commit that referenced this issue Jun 1, 2023

feat: introducing physical plans

de8721f

This is part of #409.

bjchambers mentioned this issue Jun 1, 2023

feat: introducing physical plans #410

Merged

bjchambers added a commit that referenced this issue Jun 2, 2023

feat: introducing physical plans (#410)

165a426

This is part of #409.

bjchambers mentioned this issue Jun 2, 2023

feat: introduce pipeline scheduler #413

Merged

bjchambers mentioned this issue Jun 30, 2023

feat: Use object store and async, byte-range reads #465

Open

18 tasks

bjchambers changed the title ~~feat: Introduce physical plans suitable for partitioned & distributed execution~~ feat: Partitioned execution Jul 21, 2023

This was referenced Jul 21, 2023

feat: Initial partitioned execution #528

Merged

Testing: Write loom tests for scheduling and add to CI #544

Open
