Skip to content

qwhex/orc

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

orc

multiprocessing data pipeline orchestrator

Experimental! Work in progress.

This package helps you to orchestrate compute-heavy data transformations using multiprocessing, offloading the work to separate Python processes.

Usage

Map

.map(map_fn) applies map_fn on each value in the pipeline.

import itertools
from orc import Pipeline

# initialize pipeline
pipeline = (Pipeline()
            .map(lambda x: x * 2)
            .map(lambda x: x + 1))

# create an infinite stream of numbers: 0, 1, 2...
stream = itertools.count()

# process results, 16 at a time
for item in pipeline.run(input_stream=stream, num_processes=16):
    print(item)

Reduce

.reduce(reduce_fn, init_fn) collects all items and applies reduce_fn to all.

from orc import Pipeline

def accumulate(x: int, agg: int):
    return x + agg, lambda prev: prev + x

pipeline = (Pipeline()
            .map(lambda x: x * 2)
            .map(lambda x: x + 1)
            .reduce(accumulate, lambda: 0))

Authors

All Rights Reserved
Copyright (c) 2024 Mice Pápai

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages