# orc

A multiprocessing data pipeline orchestrator.
> **Experimental.** Work in progress.
This package helps you orchestrate compute-heavy data transformations using multiprocessing, offloading the work to separate Python processes.
### .map(map_fn)

Applies `map_fn` to each value in the pipeline.
```python
import itertools

from orc import Pipeline

# initialize a pipeline with two chained map stages
pipeline = (Pipeline()
            .map(lambda x: x * 2)
            .map(lambda x: x + 1))

# create an infinite stream of numbers: 0, 1, 2...
stream = itertools.count()

# process the stream across 16 worker processes
for item in pipeline.run(input_stream=stream, num_processes=16):
    print(item)
```
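As a rough mental model (an assumption about behavior, not orc's actual implementation), `pipeline.run` can be thought of as streaming the composed map stages through a standard `multiprocessing.Pool`:

```python
import itertools
import multiprocessing

# The two map stages from above, fused into one module-level
# function (standard multiprocessing cannot pickle lambdas).
def composed(x: int) -> int:
    return (x * 2) + 1

if __name__ == "__main__":
    stream = itertools.count()
    # islice keeps the demo finite; Pool.imap yields results in
    # input order as the 16 workers finish them
    with multiprocessing.Pool(processes=16) as pool:
        for item in pool.imap(composed, itertools.islice(stream, 100)):
            print(item)
```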
### .reduce(reduce_fn, init_fn)

Collects all items in the pipeline and aggregates them with `reduce_fn`, starting from the initial value produced by `init_fn`.
```python
from orc import Pipeline

def accumulate(x: int, agg: int):
    # returns the updated aggregate, plus a function that folds
    # the same item into another (e.g. per-process) aggregate
    return x + agg, lambda prev: prev + x

pipeline = (Pipeline()
            .map(lambda x: x * 2)
            .map(lambda x: x + 1)
            .reduce(accumulate, lambda: 0))  # aggregation starts from 0
```
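For intuition, here is the same aggregation over a finite input written with plain `functools.reduce`, without orc (a sketch; the per-process merge step that orc's tuple return presumably enables is omitted):

```python
import functools

items = [1, 2, 3, 4]
transformed = [(x * 2) + 1 for x in items]  # the two map stages: 3, 5, 7, 9
total = functools.reduce(lambda agg, x: agg + x, transformed, 0)
print(total)  # 24
```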
## License

Copyright (c) 2024 Mice Pápai

All Rights Reserved