rmcdhf_mpi performance issue #98

mkkarlsen · 2023-01-12T10:03:14Z

Hi.

I am experiencing some poor performance when running rmcdhf_mpi.

I run on a single node with multiple tasks. Performance seems to get worse the more tasks I reserve (this is not the case for rangular and rci). For example, I recorded 61 min/iteration with 10 tasks, but <18 min/iteration with 4 tasks in an otherwise identical run.

I have set

export MPI_TMP="/cluster/home/username/Grasp/workdir/tmp_mpi"

and the directory "tmp_mpi" is created within the working directory with the subdirectories "000", "001", etc.

The poorer performance with more processing power leads me to believe that the read speed is the limiting factor.

Any ideas what I can do to fix this?

Thanks,
Martin

jongrumer · 2023-01-12T11:32:11Z

Hi Martin,

Yes, the read speed is indeed one of the most limiting factors, and the self-consistent process is in general tricky to parallelize, with many bottlenecks. By contrast, parallelizing e.g. rangular or rci is trivial, since the matrix is set up once (and then diagonalized in the case of CI). You can try the new rmcdhf_mem_mpi if you have enough RAM to store the MCP files.
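
A rough way to check whether the MCP files would fit in RAM is sketched below. It assumes a Linux node (it reads `/proc/meminfo`) and assumes the MCP files make up most of what sits under `$MPI_TMP`. The helper name `fits_in_ram` is hypothetical and not part of GRASP, and the result is only a guide, since the actual memory use of rmcdhf_mem_mpi may differ from the on-disk size.

```shell
# Hypothetical helper (not part of GRASP): compare the size of a directory
# tree with the memory currently available on the node.
fits_in_ram() {
    dir=$1
    # du -sk: total size of the directory tree in 1 KiB blocks
    used_kb=$(du -sk "$dir" | awk '{print $1}')
    # MemAvailable in /proc/meminfo is reported in kB (Linux only)
    avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
    echo "files: ${used_kb} kB, available RAM: ${avail_kb} kB"
    # Succeed only if the files are smaller than the available memory
    [ "$used_kb" -lt "$avail_kb" ]
}

# Usage, assuming MPI_TMP is set as in the original post:
# fits_in_ram "$MPI_TMP" && echo "MCP files should fit in memory"
```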

Cheers,
Jon

AnjaApp · 2023-01-12T12:09:20Z

Hi, I would like to add a tip: try running your calculations on a machine that uses an SSD for storage. HDDs suffer a major drop in read speed for larger calculations. I was able to carry out GRASP calculations that required up to 20 TB of storage on an SSD and still maintained 100% CPU usage thanks to the fast read speed.
You could also converge the input wave function as well as possible before you start your calculation, to reduce the number of iterations needed.
Edit: You could also try the ZF method to reduce the workload (see manual p. 299).
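
To see what kind of storage the `MPI_TMP` directory sits on and roughly how fast it reads, something like the following may help. This is a sketch assuming a Linux system with GNU coreutils; `read_speed` is a hypothetical helper, not a GRASP tool. The file passed to it should be large and not recently written or read, otherwise the number reflects the page cache rather than the disk.

```shell
# Hypothetical helper: read one file sequentially and show dd's summary line.
read_speed() {
    # dd reports bytes copied, elapsed time, and rate on stderr;
    # redirect it to stdout and keep only the final summary line
    dd if="$1" of=/dev/null bs=1M 2>&1 | tail -n 1
}

# Show the filesystem and mount point behind the temporary directory:
# df -h "$MPI_TMP"
# Time a read of one large file from the first task's subdirectory
# (replace the placeholder with a real file name):
# read_speed "$MPI_TMP/000/<some large file>"
```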

jongrumer · 2023-01-12T14:46:49Z

Thanks @AnjaApp, great advice!
