How to efficiently run in parallel #17

Open
ma-sadeghi opened this issue Apr 28, 2020 · 11 comments

Comments

@ma-sadeghi
Member

ma-sadeghi commented Apr 28, 2020

Hey @TomTranter,

I'm trying to run pytrax in parallel. I already put my script body inside a block:

if __name__ == "__main__":
    # Body

However, I don't get a significant speed-up when changing the num_proc argument. The image I'm running the simulation on is roughly 200^3 voxels, and I use 100,000 walkers and 1,000 time steps. Here are the run times for num_proc = [1, 2, 4, 8] (the machine has 8 physical cores):

Elapsed time in seconds: 33.01
Elapsed time in seconds: 33.29
Elapsed time in seconds: 27.83
Elapsed time in seconds: 25.13
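
For reference, a minimal sketch of the kind of script I'm timing (the image here is just a random placeholder and the values are illustrative; the class and run() arguments follow the usual pytrax examples):

import time
import numpy as np
import pytrax as pt

if __name__ == "__main__":
    # Placeholder binary image; in practice this is the ~200^3 voxel image,
    # with 1 marking the phase the walkers move in.
    im = (np.random.rand(200, 200, 200) < 0.7).astype(int)

    rw = pt.RandomWalk(im)
    for num_proc in [1, 2, 4, 8]:
        start = time.time()
        rw.run(nt=1000, nw=100000, same_start=False, stride=1, num_proc=num_proc)
        print(f"Elapsed time in seconds: {time.time() - start:.2f}")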
@TomTranter
Collaborator

1000 time steps isn't very many, so the overhead of spinning up extra processes dominates at that size; you should see better gains for longer simulations.

@jgostick
Member

Indeed, 1000 time steps won't even give you valid results... it should be 100,000 or more, right?

@TomTranter
Collaborator

TomTranter commented Apr 28, 2020

I guess that's a bit of trial and error, but 1000 certainly isn't enough even for a relatively small image. Each step is along one axis only, so right away you're down to roughly 333 steps per direction, and your image is around that size. You can plot the MSD and increase the number of steps until it straightens out. Also be careful of walkers getting stuck at the edges as well as in blind pores: when they leave the image they travel in a reflected copy of it. Really you want to make sure you are only probing the largest fully connected cluster of voxels - bit of a limitation.
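
For example, a quick convergence check might look like the sketch below (assuming the plot_msd() helper shown in the pytrax examples; the image is a random placeholder):

import numpy as np
import pytrax as pt

if __name__ == "__main__":
    im = (np.random.rand(200, 200, 200) < 0.7).astype(int)  # placeholder image
    rw = pt.RandomWalk(im)
    for nt in [1000, 10000, 100000]:
        rw.run(nt=nt, nw=10000, num_proc=1)
        rw.plot_msd()  # keep increasing nt until the MSD vs. time curve is straight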

@ma-sadeghi
Member Author

It doesn't let me go that far. Here's the output for nw=10,000 and nt=20,000:

concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/opt/anaconda3/envs/pmeal/lib/python3.7/concurrent/futures/process.py", line 205, in _sendback_result
    exception=exception))
  File "/opt/anaconda3/envs/pmeal/lib/python3.7/multiprocessing/queues.py", line 364, in put
    self._writer.send_bytes(obj)
  File "/opt/anaconda3/envs/pmeal/lib/python3.7/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/opt/anaconda3/envs/pmeal/lib/python3.7/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "effective_prop.py", line 103, in <module>
    tau_rw = calc_tau_rw(im=crop(void, frac=0.1), nt=20000, nw=10000, ax=1, num_proc=nproc)
  File "effective_prop.py", line 20, in calc_tau_rw
    rw.run(nt=nt, nw=nw, same_start=False, stride=1, num_proc=num_proc)
  File "/opt/anaconda3/envs/pmeal/lib/python3.7/site-packages/pytrax/__RandomWalk__.py", line 294, in run
    mapped_coords = list(pool.map(self._run_walk, batches))
  File "/opt/anaconda3/envs/pmeal/lib/python3.7/concurrent/futures/process.py", line 483, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/opt/anaconda3/envs/pmeal/lib/python3.7/concurrent/futures/_base.py", line 598, in result_iterator
    yield fs.pop().result()
  File "/opt/anaconda3/envs/pmeal/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/opt/anaconda3/envs/pmeal/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
struct.error: 'i' format requires -2147483648 <= number <= 2147483647

@jgostick
Member

Seems like an 'int' overflow issue. That limit is 2^31 - 1, the largest signed 32-bit integer.
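
A rough back-of-the-envelope check of that (assuming each walker's full coordinate history is sent back as 3 axes of 64-bit integers; exactly how pytrax batches this across processes isn't shown here, but the order of magnitude is the point):

# nw=10,000 walkers, nt=20,000 steps, 3 coordinates per step, 8 bytes each
nw, nt, ndim, itemsize = 10000, 20000, 3, 8
payload_bytes = nw * nt * ndim * itemsize
print(payload_bytes)               # 4,800,000,000 bytes (~4.8 GB)
print(payload_bytes > 2**31 - 1)   # True: too big for one pipe message

Python 3.7's multiprocessing packs the message length with struct.pack("!i", n), i.e. a signed 32-bit int, which is exactly where the struct.error in the traceback comes from.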

@ma-sadeghi
Member Author

This is the best that I could get: nw=10,000 and nt=10,000 (num_proc = [1, 2, 4, 8])

Elapsed time in seconds: 19.82
Elapsed time in seconds: 22.27
Elapsed time in seconds: 21.29
Elapsed time in seconds: 18.77

@TomTranter
Collaborator

TomTranter commented Apr 29, 2020 via email

@ma-sadeghi
Member Author

Thanks @TomTranter. It's much better with stride=10 or even stride=100. Here's the result for nt=100,000, nw=10,000, stride=100:

Elapsed time in seconds: 112.59
Elapsed time in seconds: 66.02
Elapsed time in seconds: 48.41
Elapsed time in seconds: 45.39

@TomTranter
Collaborator

It's hard to profile multiprocessed code, but the fact that stride makes a big difference would suggest that the data transfer is slowing it down, not the computation. I have experimented with shared-memory arrays in some other code, which may solve this problem. Alternatively, it may be time to overhaul the multiprocessing backend and look at dask, as @jgostick suggests.
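
For what it's worth, a minimal sketch of that shared-memory idea (not what pytrax currently does; multiprocessing.shared_memory needs Python 3.8+). The image goes into shared memory once and each worker attaches a zero-copy view, so only the small per-batch results travel back through the pipe:

import numpy as np
from multiprocessing import Pool, shared_memory

def _walk_batch(args):
    # Hypothetical worker: attach to the shared image and walk one batch of walkers.
    shm_name, shape, dtype, batch_id = args
    shm = shared_memory.SharedMemory(name=shm_name)
    im = np.ndarray(shape, dtype=dtype, buffer=shm.buf)  # zero-copy view of the image
    result = int(im.sum())  # placeholder for the actual random walk over im
    shm.close()
    return batch_id, result

if __name__ == "__main__":
    im = np.ones((200, 200, 200), dtype=np.uint8)  # stand-in for the voxel image
    shm = shared_memory.SharedMemory(create=True, size=im.nbytes)
    np.ndarray(im.shape, dtype=im.dtype, buffer=shm.buf)[:] = im  # copy the image in once
    jobs = [(shm.name, im.shape, im.dtype, i) for i in range(8)]
    with Pool(8) as pool:
        results = pool.map(_walk_batch, jobs)
    shm.close()
    shm.unlink()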

@pppppink

Does the walker stride affect the accuracy of the calculated tortuosity, and if so, to what extent? Also, does running in parallel require importing the multiprocessing library and adding its corresponding code, or is setting the num_proc parameter enough? I have also noticed that increasing num_proc does not speed up the computation. My image of particles has dimensions (31500, 28000), and I would like to use an even larger one if possible.

@TomTranter
Collaborator

I haven't looked at this code for a while, but I think stride is just for reporting, so it shouldn't affect accuracy. multiprocessing is a standard Python library, so you should have it already. There's some setup overhead involved, though, so it doesn't speed up small simulations, and it parallelizes by walkers, not by time steps, so if you are running long simulations with few walkers it will make no difference.
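
In other words (illustrative numbers only, not pytrax internals):

# Parallelism is over walkers, so every process still performs all nt steps.
nt, nw, num_proc = 100000, 10000, 8
walkers_per_proc = nw // num_proc   # ~1250 walkers per process
# Wall time scales roughly with nt * walkers_per_proc, so adding processes
# only helps when there are many walkers to split up.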
