
Support for SLURM? #33

Open · flatstik opened this issue May 7, 2022 · 10 comments
Labels: enhancement (New feature or request)

Comments

@flatstik commented May 7, 2022

Is anybody working on a SLURM implementation of openmopac for use on clusters (Boost + OpenMPI support)?

@godotalgorithm (Collaborator)
I don't think SLURM itself is relevant to this question. This seems to be more a question of how MOPAC can be used effectively in multi-core and multi-node environments, which are often accessed through workload management systems such as SLURM.

The historical development of MOPAC has not focused much on adapting to computing hardware beyond personal computers. As a result, MOPAC's single-core performance is very good, and the simplest way to maximize computational throughput is to run each job on one core and fill your computing resources with such single-core jobs. This simple strategy can run into problems if memory limitations prevent the desired number of simultaneous MOPAC jobs from running. In that case, the dense linear algebra operations that go through BLAS/LAPACK are threaded, so assigning multiple local cores to a job improves performance at the dense linear algebra bottleneck (the thread count is controlled by OMP_NUM_THREADS and/or MKL_NUM_THREADS, as with other OpenMP-based and/or MKL-based multithreading).
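
For concreteness, a minimal SLURM sketch of the single-core strategy might look like the job-array script below. It assumes the MOPAC executable is on the PATH and takes its input file as a command-line argument, and the input names job_1.mop through job_50.mop are hypothetical placeholders:

```bash
#!/bin/bash
#SBATCH --job-name=mopac-1core
#SBATCH --array=1-50        # one array task per independent MOPAC input
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1   # one core per job, per the strategy described above
#SBATCH --mem=4G            # adjust to the memory footprint of a single job
#SBATCH --time=24:00:00

# Pin the threaded BLAS/LAPACK to one thread so packed jobs do not oversubscribe cores
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1

# Hypothetical naming scheme: job_1.mop, job_2.mop, ...
mopac "job_${SLURM_ARRAY_TASK_ID}.mop"
```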

MPI-based distributed memory is not presently supported by MOPAC. We are tentatively planning to support partial memory distribution in the future if the eventual dense linear algebra refactoring of MOPAC is successful. Specifically, we plan to isolate and wrap all dense matrix operations and memory locations, which would make it easier to switch between different hardware implementations and memory footprints of the dense linear algebra. Such an abstraction would make it possible to distribute the dense matrices in MOPAC, which are usually the leading-order memory bottleneck, using MPI & BLACS, and also to support other memory models such as pinned GPU memory, which is a now-standard feature of modern GPGPU programming. There are presently no plans for distributing memory in MOPAC beyond its dense linear algebra.

@flatstik (Author) commented May 18, 2022

Support for multi-core and multi-node environments would be great. The other programs are way too expensive, and I'd like to see openmopac as an alternative to those pricey closed-source programs. Also, GPU support for this is kind of mandatory to have.

@susilehtola (Collaborator)
xtb is open source and is OpenMP parallelized.

@flatstik (Author)
Sure, but it's not nearly as good as openmopac for various reasons.

@godotalgorithm (Collaborator)
I appreciate this ongoing discussion, and I think it might lead to other interesting follow-on discussions (e.g. the relative merits/capabilities of MOPAC and xTB), but a GitHub Issue might not be the best place for such open-ended discussions. I recently activated the GitHub Discussions board for MOPAC, and I would encourage conversations to branch there as appropriate.

To follow up on topic: what are specific examples of multi-core or multi-node functionality in commercial, closed-source software that MOPAC should seek to replicate and/or consider as a point of reference? Do you have EMPIRE in mind, or are there other distributed semiempirical codes, or are you referring more generally to some other non-semiempirical quantum chemistry code?

@flatstik (Author) commented May 20, 2022

Not specific examples as such, but for example DP5 uses Tinker and Gaussian for optimizing ligand geometries. Most of my MOZYME runs take over a month to reach SCF convergence, and a week for large PM7 optimizations -- and I do not want to use PM6 or MNDO or whatever oldies are present in most of the programs available. If a cluster or "supercomputer" is available, wouldn't you be interested in doing your computations in days rather than months?

As a side note: I asked Jimmy (in vain) to implement a function for NMR prediction in a specific solvent (which can be done by first optimizing the compound with EPS). He didn't respond to that at all.

@davidoskky
Is this linear algebra parallelization actually useful, or should I simply discard it and run single-threaded?
I have compiled MOPAC with OpenMP and Intel MKL support.
I ran a hydrogen-position optimization using MOZYME on a large protein with different parameters to see what would happen.
I separately set OMP_NUM_THREADS, MKL_NUM_THREADS, and the THREADS keyword to 1, 8, and 16.

In my testing, running single-threaded was faster than running multithreaded every time. I'll point out that the number of processes started was not equal to the number of threads (I guess maybe 16 threads were just too many), and that out of all of those, only one was actually using the CPU during most of the computation. There was plenty of memory available, so that should not be the bottleneck.

@godotalgorithm (Collaborator)
The multi-threading only affects the performance of conventional MOPAC calculations, not MOZYME. You should see performance differences in conventional calculations when you change the number of threads on a multi-core processor, and please report it if you do not. The implementation of MOZYME is purely serial and based on nonstandard sparse linear algebra operations - it does not use any dense linear algebra operations that would benefit from multi-threaded math libraries. The underlying algorithm of MOZYME is conceptually amenable to parallelization because of its spatial localization, but introducing threaded programming into MOPAC itself would be a major undertaking and there is presently no development support that would justify it.
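
As a sketch of how this threading would be used for a conventional (non-MOZYME) calculation under SLURM, assuming a `mopac` executable on the PATH and a placeholder input file `conventional.mop`:

```bash
#!/bin/bash
#SBATCH --job-name=mopac-threaded
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8   # cores made available to the threaded BLAS/LAPACK
#SBATCH --time=24:00:00

# Thread counts for the dense linear algebra; only conventional calculations benefit
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export MKL_NUM_THREADS=$SLURM_CPUS_PER_TASK

mopac conventional.mop
```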

@davidoskky
Thank you, I will conduct further testing to see whether it is feasible to handle such large systems with parallelized conventional calculations instead of MOZYME.
If that is not possible, I'm afraid I will have to fall back to Gaussian for my calculations.

@godotalgorithm (Collaborator)
What range of problem sizes are you working with? Despite being single-core calculations, MOZYME calculations are usually faster than conventional, multi-threaded MOPAC calculations beyond a few hundred atoms. You can still make productive use of multiple cores with MOZYME by running multiple single-core jobs simultaneously.
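
One way to pack several single-core MOZYME jobs into a single SLURM allocation could look like the sketch below, assuming placeholder inputs protein_1.mop through protein_8.mop and a `mopac` executable on the PATH:

```bash
#!/bin/bash
#SBATCH --job-name=mozyme-pack
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8   # eight cores for eight concurrent single-core MOZYME jobs
#SBATCH --time=48:00:00

# MOZYME is serial, so keep the math libraries single-threaded
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1

# Launch the independent jobs in the background and wait for all of them to finish
for i in $(seq 1 8); do
    mopac "protein_${i}.mop" &
done
wait
```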

In reverting to using Gaussian, are you switching from full-protein simulations to extracting small QM regions? You may want to consider a middle ground where you use MOZYME for large QM regions extracted from the full protein, with the artificially terminated residues clamped by geometric constraints. This would be faster than calculating the full protein, and give you better control over finite-size effects than a much smaller QM region, although sufficiently small QM regions enable the use of higher levels of theory than semiempirical models.
