How to efficiently run in parallel #17

Open
ma-sadeghi opened this issue Apr 28, 2020 · 11 comments

Comments

@ma-sadeghi
Member

ma-sadeghi commented Apr 28, 2020

Hey @TomTranter,

I'm trying to run pytrax in parallel. I already put my script body inside a block:

if __name__ == "__main__":
    # Body

However, I don't get a significant speed-up when changing the num_proc argument. The image I'm running the simulation on is roughly 200^3 voxels, and I use 100,000 walkers and 1,000 time steps. Here are the run times for num_proc = [1, 2, 4, 8] (the machine has 8 physical cores):

Elapsed time in seconds: 33.01
Elapsed time in seconds: 33.29
Elapsed time in seconds: 27.83
Elapsed time in seconds: 25.13
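
For reference, a minimal sketch of the kind of script I'm timing (the image here is just a random placeholder and the values are illustrative; the class and run() arguments follow the usual pytrax examples):

import time
import numpy as np
import pytrax as pt

if __name__ == "__main__":
    # Placeholder binary image; in practice this is the ~200^3 voxel image,
    # with 1 marking the phase the walkers move in.
    im = (np.random.rand(200, 200, 200) < 0.7).astype(int)

    rw = pt.RandomWalk(im)
    for num_proc in [1, 2, 4, 8]:
        start = time.time()
        rw.run(nt=1000, nw=100000, same_start=False, stride=1, num_proc=num_proc)
        print(f"Elapsed time in seconds: {time.time() - start:.2f}")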
@TomTranter
Collaborator

1000 time steps isn't very many, so the overhead of spinning up extra processes dominates at that size; you should see better gains for longer simulations.

@jgostick
Member

Indeed, 1000 time steps won't even give you valid results... it should be 100,000 or more, right?

@TomTranter
Collaborator

TomTranter commented Apr 28, 2020

I guess that's a bit of trial and error, but 1000 certainly isn't enough even for a relatively small image. Each step is along one axis only, so right away you're down to roughly 333 steps per direction, and your image is around that size. You can plot the MSD and increase the number of steps until it straightens out. Also be careful of walkers getting stuck at the edges as well as in blind pores: when they leave the image they travel in a reflected copy of it. Really you want to make sure you are only probing the largest fully connected cluster of voxels - bit of a limitation.
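
For example, a quick convergence check might look like the sketch below (assuming the plot_msd() helper shown in the pytrax examples; the image is a random placeholder):

import numpy as np
import pytrax as pt

if __name__ == "__main__":
    im = (np.random.rand(200, 200, 200) < 0.7).astype(int)  # placeholder image
    rw = pt.RandomWalk(im)
    for nt in [1000, 10000, 100000]:
        rw.run(nt=nt, nw=10000, num_proc=1)
        rw.plot_msd()  # keep increasing nt until the MSD vs. time curve is straight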

@ma-sadeghi
Member Author

It doesn't let me go that far. Here's the output for nw=10,000 and nt=20,000:

concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/opt/anaconda3/envs/pmeal/lib/python3.7/concurrent/futures/process.py", line 205, in _sendback_result
    exception=exception))
  File "/opt/anaconda3/envs/pmeal/lib/python3.7/multiprocessing/queues.py", line 364, in put
    self._writer.send_bytes(obj)
  File "/opt/anaconda3/envs/pmeal/lib/python3.7/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/opt/anaconda3/envs/pmeal/lib/python3.7/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "effective_prop.py", line 103, in <module>
    tau_rw = calc_tau_rw(im=crop(void, frac=0.1), nt=20000, nw=10000, ax=1, num_proc=nproc)
  File "effective_prop.py", line 20, in calc_tau_rw
    rw.run(nt=nt, nw=nw, same_start=False, stride=1, num_proc=num_proc)
  File "/opt/anaconda3/envs/pmeal/lib/python3.7/site-packages/pytrax/__RandomWalk__.py", line 294, in run
    mapped_coords = list(pool.map(self._run_walk, batches))
  File "/opt/anaconda3/envs/pmeal/lib/python3.7/concurrent/futures/process.py", line 483, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/opt/anaconda3/envs/pmeal/lib/python3.7/concurrent/futures/_base.py", line 598, in result_iterator
    yield fs.pop().result()
  File "/opt/anaconda3/envs/pmeal/lib/python3.7/concurrent/futures/_base.py", line 435, in result
    return self.__get_result()
  File "/opt/anaconda3/envs/pmeal/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
struct.error: 'i' format requires -2147483648 <= number <= 2147483647

@jgostick
Member

Seems like an 'int' overflow issue. That limit is 2^31 - 1, the largest signed 32-bit integer.
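
A rough back-of-the-envelope check of that (assuming each walker's full coordinate history is sent back as 3 axes of 64-bit integers; exactly how pytrax batches this across processes isn't shown here, but the order of magnitude is the point):

# nw=10,000 walkers, nt=20,000 steps, 3 coordinates per step, 8 bytes each
nw, nt, ndim, itemsize = 10000, 20000, 3, 8
payload_bytes = nw * nt * ndim * itemsize
print(payload_bytes)               # 4,800,000,000 bytes (~4.8 GB)
print(payload_bytes > 2**31 - 1)   # True: too big for one pipe message

Python 3.7's multiprocessing packs the message length with struct.pack("!i", n), i.e. a signed 32-bit int, which is exactly where the struct.error in the traceback comes from.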

@ma-sadeghi
Member Author

This is the best that I could get: nw=10,000 and nt=10,000 (num_proc = [1, 2, 4, 8])

Elapsed time in seconds: 19.82
Elapsed time in seconds: 22.27
Elapsed time in seconds: 21.29
Elapsed time in seconds: 18.77

@TomTranter
Collaborator

TomTranter commented Apr 29, 2020 via email

@ma-sadeghi
Member Author

Thanks @TomTranter. It's much better with stride=10 or even stride=100. Here's the result for nt=100,000, nw=10,000, stride=100:

Elapsed time in seconds: 112.59
Elapsed time in seconds: 66.02
Elapsed time in seconds: 48.41
Elapsed time in seconds: 45.39

@TomTranter
Collaborator

It's hard to profile multiprocessed code, but the fact that stride makes a big difference would suggest that the data transfer is slowing it down, not the computation. I have experimented with shared-memory arrays in some other code, which may solve this problem. Alternatively, it may be time to overhaul the multiprocessing backend and look at dask, as @jgostick suggests.
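
For what it's worth, a minimal sketch of that shared-memory idea (not what pytrax currently does; multiprocessing.shared_memory needs Python 3.8+). The image goes into shared memory once and each worker attaches a zero-copy view, so only the small per-batch results travel back through the pipe:

import numpy as np
from multiprocessing import Pool, shared_memory

def _walk_batch(args):
    # Hypothetical worker: attach to the shared image and walk one batch of walkers.
    shm_name, shape, dtype, batch_id = args
    shm = shared_memory.SharedMemory(name=shm_name)
    im = np.ndarray(shape, dtype=dtype, buffer=shm.buf)  # zero-copy view of the image
    result = int(im.sum())  # placeholder for the actual random walk over im
    shm.close()
    return batch_id, result

if __name__ == "__main__":
    im = np.ones((200, 200, 200), dtype=np.uint8)  # stand-in for the voxel image
    shm = shared_memory.SharedMemory(create=True, size=im.nbytes)
    np.ndarray(im.shape, dtype=im.dtype, buffer=shm.buf)[:] = im  # copy the image in once
    jobs = [(shm.name, im.shape, im.dtype, i) for i in range(8)]
    with Pool(8) as pool:
        results = pool.map(_walk_batch, jobs)
    shm.close()
    shm.unlink()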

@pppppink

Does the walker stride affect the accuracy of the calculated tortuosity, and if so, to what extent? Also, does running in parallel require importing the multiprocessing library and adding its corresponding code, or is setting the num_proc parameter enough? I have also noticed that increasing num_proc does not speed up the computation. My image of particles has dimensions (31500, 28000), and I would like to use an even larger one if possible.

@TomTranter
Collaborator

I haven't looked at this code for a while, but I think stride is just for reporting, so it shouldn't affect accuracy. multiprocessing is a standard Python library, so you should have it already. There's some setup overhead involved, though, so it doesn't speed up small simulations, and it parallelizes by walkers, not by time steps, so if you are running long simulations with few walkers it will make no difference.
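
In other words (illustrative numbers only, not pytrax internals):

# Parallelism is over walkers, so every process still performs all nt steps.
nt, nw, num_proc = 100000, 10000, 8
walkers_per_proc = nw // num_proc   # ~1250 walkers per process
# Wall time scales roughly with nt * walkers_per_proc, so adding processes
# only helps when there are many walkers to split up.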
