Better parallelization using Dask? #15
Labels
enhancement
New feature or request
Comments
Yup, can you post an example? I also started looking at shared memory arrays for something else, but that only works on Linux.
We've been using it to do operations in chunks, like this:

```python
import numpy as np
from dask import delayed, compute

# This decorator tells dask to delay computation of this function
@delayed
def sumnums(arr, num):
    arr = arr + num
    return arr

arr = np.zeros([20, 20])
res = []
for chunk in np.split(arr, 2):
    # This loop creates a list of 'delayed' functions
    res.append(sumnums(chunk, 1))

temp = compute(res)  # Here we tell dask to actually do the calc, in parallel
new_arr = np.vstack(temp[0])  # Now we massage the result back into an array
assert np.all(new_arr.shape == arr.shape)
```
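As a side note, the manual split/compute/vstack pattern above can also be expressed with `dask.array`, which handles the chunking and re-assembly itself. A minimal sketch (assuming `dask[array]` is installed; the chunk size of `(10, 20)` is an arbitrary choice mirroring the two-way split above):

```python
import numpy as np
import dask.array as da

arr = np.zeros([20, 20])
# Wrap the NumPy array as a dask array with two chunks along axis 0
darr = da.from_array(arr, chunks=(10, 20))
# Element-wise ops are applied per chunk; compute() runs them in parallel
new_arr = (darr + 1).compute()

assert new_arr.shape == arr.shape
assert np.all(new_arr == 1)
```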
We've had some great success with dask for simple single-machine parallelization. I think the current version uses an approach that's a bit problematic, requiring the script to be wrapped in a main section, right? Dask does not require any of this.
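To illustrate the point about the main section (assuming it refers to the `if __name__ == '__main__':` guard that spawn-based multiprocessing needs on some platforms), here is a minimal sketch showing dask's default threaded scheduler running delayed tasks at module top level, with no guard; `square` is just a hypothetical toy task:

```python
from dask import delayed, compute

@delayed
def square(x):
    # Any pure function works as a delayed task
    return x * x

# No `if __name__ == '__main__':` guard needed: the default threaded
# scheduler executes these tasks in parallel at module top level.
results = compute(*[square(i) for i in range(5)])
print(results)  # (0, 1, 4, 9, 16)
```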