Loading in large file sets with multi-core processing and 'iris.load' #4406
-
Hi everyone, I have a question regarding splitting the work of loading large file sets with `iris.load` across multiple processors.

My question: I'd like to reduce the loading time by using multiple processors (currently using 1, but I have access to more). I've looked at parallelisation, but I don't think that's entirely appropriate, as I'm trying to create a single large object rather than run multiple smaller processes (please correct me if I've understood that wrong!). Can anyone recommend a way of increasing the processor count used by a single command?

Happy to provide more context if needed!
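For illustration, here's a minimal sketch of the kind of single-command load I mean (the paths and constraint are made up):

```python
import iris

# Load many files into one CubeList with a single call;
# the wildcard path and name constraint are hypothetical.
cubes = iris.load("/data/model_runs/*.nc", "air_temperature")

# Combine the pieces into one large cube.
cube = cubes.concatenate_cube()
```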
-
Hi @dannymcculloch, are you certain this isn't already happening? Iris is built on top of the Dask package, which in most (🤞) cases takes care of multi-processing automatically. I've just tested on some UM files, merging them together in Iris, and all my processors are engaged while this is happening.

Given this use of Dask, I'd advise extreme caution if you're thinking of adding your own layer of parallelism, as the two can interact unpredictably (with both layers attempting to engage all processors, giving you `n-processes` squared!).

If you need your script(s) to run faster, you should therefore hopefully be able to just throw more processors at the problem without any further work. There are also opportunities for optimisation with Python, depending on your specific operations.
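If you want to check or cap how many workers Dask engages, its configuration can be set around the computation; a minimal sketch (the worker count of 4 and the file path are purely illustrative):

```python
import dask
import iris

# Cap the number of workers Dask's scheduler may engage
# (4 and the wildcard path are illustrative, not prescriptive).
with dask.config.set(scheduler="threads", num_workers=4):
    cubes = iris.load("/data/um_output/*.pp")  # loading is lazy
    cube = cubes.merge_cube()                  # still lazy
    data = cube.data                           # realising the data runs Dask
```

Note that Iris loads lazily, so the `num_workers` setting only matters where the data is actually realised (the `cube.data` access above).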
-
Archiving "answered" Q+As
-
Sorry @pp-mo, but would you be able to post a link to the relevant Q&A here please?