Better parallelization using Dask? #15
Labels
enhancement
New feature or request
Comments
Yup, can you post an example? I also started looking at shared memory arrays for something else, but that only works on Linux.
We've been using it to do operations in chunks, like this:

```python
import numpy as np
from dask import delayed, compute

# This decorator tells dask to delay computation of this function
@delayed
def sumnums(arr, num):
    arr = arr + num
    return arr

arr = np.zeros([20, 20])
res = []
for chunk in np.split(arr, 2):
    # This loop creates a list of 'delayed' functions
    res.append(sumnums(chunk, 1))

temp = compute(res)  # Here we tell dask to actually do the calc, in parallel
new_arr = np.vstack(temp[0])  # Now we massage the result back into an array
assert np.all(new_arr.shape == arr.shape)
```
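As a side note, the manual split/compute/vstack pattern above can also be expressed with `dask.array`, which handles the chunking and re-assembly itself. A minimal sketch (assuming `dask[array]` is installed; the chunk size of `(10, 20)` is an arbitrary choice mirroring the two-way split above):

```python
import numpy as np
import dask.array as da

arr = np.zeros([20, 20])
# Wrap the NumPy array as a dask array with two chunks along axis 0
darr = da.from_array(arr, chunks=(10, 20))
# Element-wise ops are applied per chunk; compute() runs them in parallel
new_arr = (darr + 1).compute()

assert new_arr.shape == arr.shape
assert np.all(new_arr == 1)
```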
We've had some great success with dask for simple single-machine parallelization. I think the current version uses an approach that's a bit problematic, requiring the script to be wrapped in a main section, right? Dask does not require any of this.
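To illustrate the point about the main section (assuming it refers to the `if __name__ == '__main__':` guard that spawn-based multiprocessing needs on some platforms), here is a minimal sketch showing dask's default threaded scheduler running delayed tasks at module top level, with no guard; `square` is just a hypothetical toy task:

```python
from dask import delayed, compute

@delayed
def square(x):
    # Any pure function works as a delayed task
    return x * x

# No `if __name__ == '__main__':` guard needed: the default threaded
# scheduler executes these tasks in parallel at module top level.
results = compute(*[square(i) for i in range(5)])
print(results)  # (0, 1, 4, 9, 16)
```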