rmcdhf_mpi performance issue #98

Open
mkkarlsen opened this issue Jan 12, 2023 · 3 comments

@mkkarlsen

Hi.

I am experiencing some poor performance when running rmcdhf_mpi.

I run on a single node with multiple tasks per node. Performance gets worse the more tasks I reserve (this is not the case for rangular and rci). For example, I recorded 61 min/iteration with 10 tasks, but under 18 min/iteration with 4 tasks in an otherwise identical run.
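To put these timings side by side, write $T_n$ for the wall-clock time per iteration with $n$ tasks (notation introduced here, not from GRASP). Since the 4-task run took less than 18 min, both ratios below are lower bounds:

```math
\frac{T_{10}}{T_{4}} > \frac{61~\text{min}}{18~\text{min}} \approx 3.4,
\qquad
\frac{10\,T_{10}}{4\,T_{4}} > \frac{10 \times 61}{4 \times 18} = \frac{610}{72} \approx 8.5
```

So going from 4 to 10 tasks made each iteration at least about 3.4 times slower, and the total task-time spent per iteration grew by a factor of more than 8, which is consistent with the tasks contending for a shared resource such as disk I/O.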

I have set

export MPI_TMP="/cluster/home/username/Grasp/workdir/tmp_mpi"

and the directory "tmp_mpi" is created within the working directory, with the subdirectories "000", "001", etc.

Since performance gets worse as I add more processes, I suspect that the read speed is the limiting factor.

Any ideas what I can do to fix this?

Thanks,
Martin

@jongrumer

Hi Martin,

Yes, the read speed is indeed one of the most limiting factors, and the self-consistent process is in general tricky to parallelize, with many bottlenecks. By contrast, rangular and rci are trivial to parallelize, since the matrix is set up only once (and then diagonalized, in the case of rci). You can try the new rmcdhf_mem_mpi if you have enough RAM to store the MCP files.
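A rough first check of whether the MCP files could fit in memory is to compare the on-disk size of the MPI scratch directory with the RAM currently available on the node. The sketch below assumes Linux (`/proc/meminfo`), GNU `du`, and that the per-rank subdirectories under `MPI_TMP` (000, 001, ...) hold the MCP files; `fits_in_ram` is a hypothetical helper, not part of GRASP. The memory rmcdhf_mem_mpi actually needs may differ from the on-disk size, so treat the result as an estimate only.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GRASP): does everything under a directory
# fit in the RAM currently available on this node? Rough estimate only.
set -euo pipefail

fits_in_ram() {
  local dir="$1"
  local disk_bytes avail_kb
  [[ -d "$dir" ]] || return 2
  # Total apparent size in bytes of everything under the directory (GNU du).
  disk_bytes=$(du -sb "$dir" | cut -f1)
  # MemAvailable is reported in kB by the Linux kernel.
  avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
  echo "on disk: $((disk_bytes / 1048576)) MiB, available RAM: $((avail_kb / 1024)) MiB" >&2
  # Succeed (exit status 0) only if the files are smaller than available memory.
  (( disk_bytes < avail_kb * 1024 ))
}

# Usage: check the per-rank directories under MPI_TMP, if it is set.
if [[ -n "${MPI_TMP:-}" ]]; then
  if fits_in_ram "$MPI_TMP"; then
    echo "MPI_TMP contents fit in available RAM"
  else
    echo "MPI_TMP contents exceed available RAM (or the directory is missing)"
  fi
fi
```

On a single node all ranks share the same RAM, which is why the whole directory is summed rather than one subdirectory.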

Cheers,
Jon

@AnjaApp

AnjaApp commented Jan 12, 2023

Hi, I would like to add a tip: try running your calculations on a machine that uses an SSD for storage. With HDDs, the read speed drops sharply for larger calculations. I was able to carry out GRASP calculations that required up to 20 TB of storage on an SSD and still maintained 100% CPU usage thanks to the fast read speed.
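To see what kind of storage the MPI scratch directory actually sits on, a quick diagnostic sketch is shown here. It assumes Linux with GNU coreutils `df` and util-linux `lsblk`, and falls back to the current directory if `MPI_TMP` is unset (an assumption for illustration). It only reports the filesystem of the directory and the rotational flag of each disk; matching the `df` source to an `lsblk` device name is left to you.

```shell
#!/usr/bin/env bash
# Diagnostic sketch (not part of GRASP): what storage holds the MPI scratch files?
set -uo pipefail

dir="${MPI_TMP:-$PWD}"   # fall back to the current directory if MPI_TMP is unset

# Filesystem source, type and free space for the directory. A network filesystem
# (e.g. nfs, lustre, gpfs) here means every rank's reads go over the network.
df -h --output=source,fstype,avail "$dir"

# Rotational flag per block device: ROTA=1 usually means a spinning HDD,
# ROTA=0 an SSD/NVMe. lsblk may be missing on some systems.
if command -v lsblk >/dev/null 2>&1; then
  lsblk -d -o NAME,ROTA,SIZE,MODEL
fi
```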
You could also try to converge the input wave function as well as possible before you start your calculation, to reduce the number of iterations needed.
Edit: You could also try the ZF method to reduce the workload (see manual p. 299).

@jongrumer

Thanks @AnjaApp, great advice!
