Support multiprocessing of operations on cubes through decomposition and recombination #4752
Comments
It seems most of the goals here would best be achieved by better supporting the
I think that is not really the point of this ticket. We'd like a generic operation that can take a cube function and parallelise it as described. Simple element-wise cube calculations can already be achieved using `map_blocks`, and plenty of users have done that themselves. As it's not very tricky, we haven't yet moved to provide any generic facility to convert a cube function in that way. (IMHO: the possible saving of intermediate results to disk, and even the 'splitting' (like a rechunk), are maybe separable operations, and not really core to the main task here.)
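For context, the element-wise `map_blocks` pattern mentioned above looks roughly like the following sketch. It operates on a plain Dask array standing in for a cube's lazy data (the iris-specific extraction, e.g. via `cube.core_data()`, is assumed and not shown; `scale_and_offset` is a hypothetical user function):

```python
import dask.array as da
import numpy as np

# Stand-in for a cube's lazy data; in iris this might come from
# cube.core_data() (that step is assumed here, not shown).
lazy = da.from_array(np.arange(12.0).reshape(3, 4), chunks=(1, 4))

def scale_and_offset(block):
    # Purely element-wise, so each chunk can be processed independently.
    return block * 2.0 + 1.0

result = da.map_blocks(scale_and_offset, lazy, dtype=lazy.dtype)

# Nothing runs until the array is realised, at which point the chunks
# may be handled by separate workers/processes via a Dask scheduler.
print(result.compute())
```

Because the operation is element-wise, no chunk needs data from its neighbours, which is exactly why users have been able to do this themselves without any generic iris facility.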
Perhaps I was thrown off by the use of multiprocessing. So to be clear, you don't mean
I don't think so.
I meant "multiprocessing" in the sense of "having work done by multiple processes", with no intention of implying a specific library.
Another point: it seems logical to me that this is best provided as a wrapper around `dask.array.map_overlap`. That approach could be very capable, but it is totally Dask-specific. From offline discussion with @bjlittle, rather than just cave in and say "Iris lazy is really just Dask", we might instead choose to have this facility live somewhere else, i.e. not in "core Iris". (Actually there is also a special-case benefit, since the expected partners here would rather run this with a release version of Iris.)
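To illustrate why `map_overlap` is the natural fit for non-element-wise operations, here is a hedged sketch on a plain Dask array (the iris wrapping discussed above is assumed, not shown; the three-point mean is just a stand-in neighbourhood operation):

```python
import dask.array as da
import numpy as np

data = da.from_array(np.arange(20.0), chunks=5)

def three_point_mean(block):
    # A simple neighbourhood (stencil) operation: the mean of each point
    # and its two neighbours, via same-mode convolution.
    return np.convolve(block, np.ones(3) / 3.0, mode="same")

# depth=1 gives each chunk one extra element from its neighbours, so the
# stencil is correct at chunk edges; boundary="reflect" handles the ends
# of the whole array by repeating the edge values.
result = da.map_overlap(three_point_mean, data, depth=1, boundary="reflect")
print(result.compute())
```

The point of the hypothetical iris wrapper would be to choose `depth`/`boundary` and reattach cube metadata on the user's behalf, so they never touch the Dask layer directly.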
I'll be interested to see what you come up with. From my personal experience, it seems that almost any non-trivial application will require direct Dask programming for the core array parts, while the metadata part would benefit from not being split up, since the individual parts will usually have the same units etc.
In order to maintain a backlog of relevant issues, we automatically label them as stale after 500 days of inactivity. If this issue is still important to you, then please comment on this issue and the stale label will be removed. Otherwise this issue will be automatically closed in 28 days' time.
Just re-reviewing this in the context of discussion with @hdyson et al. Personally, I think this idea only flies if we can provide a generalised operation that is easier for the user than just "pull out the arrays and use Dask yourself". So I don't think we can realistically hope to just "solve" that problem...
Discussed offline with @trexfeathers @pp-mo @hdyson. Something to make clear publicly - our prioritisation is currently driven by:
With the resource we have available, there are inevitably plenty of perfectly valid issues that just can't be addressed within a reasonable time frame, hence the
✨ Feature Request
Iris could offer a clear, tested, robust method for multiprocessing cubes: splitting them into pieces, processing the pieces in parallel, and then stitching them back together.
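The split/process/stitch idea can be sketched on a bare NumPy array with a standard-library process pool. This is a minimal sketch only: all the names (`process_piece`, `decompose_apply_recombine`) are hypothetical, and the cube-specific parts (slicing sub-cubes, reattaching coordinates and metadata) are assumed and omitted:

```python
import multiprocessing

import numpy as np

def process_piece(piece):
    # Hypothetical per-piece operation; in the cube case this would be
    # the user-supplied function applied to each sub-cube's data.
    return piece * 2.0

def decompose_apply_recombine(data, n_pieces, func):
    # Split along the leading axis, farm the pieces out to worker
    # processes, then stitch the results back together in order.
    pieces = np.array_split(data, n_pieces, axis=0)
    with multiprocessing.Pool(processes=2) as pool:
        results = pool.map(func, pieces)
    return np.concatenate(results, axis=0)

# Note: on platforms that spawn rather than fork worker processes
# (e.g. Windows), this call should live under an
# `if __name__ == "__main__":` guard.
data = np.arange(12.0).reshape(6, 2)
out = decompose_apply_recombine(data, 3, process_piece)
```

A real iris version would also have to decide how to split coordinates along the partitioned dimension and how to merge the resulting sub-cubes (e.g. via a `CubeList` merge), which is where most of the complexity discussed in this issue lives.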
Motivation
This came up while working with a team whose code uses Iris to do exactly this; it didn't feel like a use case specific to their problems, but something more general.
Additional context
Requirements:
Ideas: