Properly support the Intel compiler with the CMake build #68

Open · wants to merge 6 commits into master
Conversation

@mortenpi (Member) commented Jun 2, 2021

(This only affects the CMake build)

Currently, we always pass -fno-automatic as a compiler flag, even if the user adds their own flags (by setting CMAKE_Fortran_FLAGS). This is a problem for e.g. ifort, which uses a different name for that flag.

With this change, if the user decides to customize the flags by passing their own CMAKE_Fortran_FLAGS, we no longer set -fno-automatic automatically, which solves that problem. The only thing to note is that the user then needs to pass -fno-automatic explicitly.

Question to anyone who might know this: do we actually need -fno-automatic for GRASP? It changes the way SAVE attributes are handled, but is there any part of GRASP that actually requires this flag?

Fix #68
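
A minimal sketch of the initial idea (later superseded by the per-compiler approach discussed below), assuming the flag was previously appended unconditionally in the top-level CMakeLists.txt; the exact variable handling in the real file may differ:

    # Hypothetical sketch: only fall back to -fno-automatic when the user
    # has not supplied their own CMAKE_Fortran_FLAGS on the command line.
    if(NOT CMAKE_Fortran_FLAGS)
        set(CMAKE_Fortran_FLAGS "-fno-automatic")
    endif()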

@mortenpi requested a review from jongrumer (June 2, 2021 05:28)
@mortenpi marked this pull request as draft (June 2, 2021 05:41)
@jongrumer (Member)
I'm pretty sure that flag is (or at least was) needed, but I don't remember off the top of my head why.

@cffischer (Member)
The -fno-automatic flag was needed because early FORTRAN codes always saved values when a routine was exited, whereas F90 does not. I suspect the need is reduced, but I am not sure it has been tested. Flags always depend on the compiler.

@mortenpi (Member, Author) commented Jun 6, 2021

Alright, new approach (since checking whether the user has modified CMAKE_Fortran_FLAGS wasn't reliable):

  • We still automatically append -fno-automatic to CMAKE_Fortran_FLAGS if we detect that it's gfortran.
  • With ifort we append -save instead.
  • Other compilers will print a warning and won't append anything automatically.
  • You can disable the automatic append completely by passing -DGRASP_DEFAULT_FLAGS=FALSE to cmake.

@jongrumer could you check that this does the right thing in a live ifort environment?
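
For reference, a rough sketch of what this logic might look like in CMakeLists.txt; the option name GRASP_DEFAULT_FLAGS is taken from the description above, while the exact warning text and placement are illustrative assumptions:

    option(GRASP_DEFAULT_FLAGS "Append compiler-specific default Fortran flags" TRUE)
    if(GRASP_DEFAULT_FLAGS)
        if(CMAKE_Fortran_COMPILER_ID STREQUAL "GNU")
            # gfortran: keep the old implicit-SAVE behaviour
            set(CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS} -fno-automatic")
        elseif(CMAKE_Fortran_COMPILER_ID STREQUAL "Intel")
            # ifort: -save is the equivalent of gfortran's -fno-automatic
            set(CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS} -save")
        else()
            message(WARNING "Unknown Fortran compiler: no default flags appended")
        endif()
    endif()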

@mortenpi marked this pull request as ready for review (June 6, 2021 02:36)
@jongrumer (Member) left a review comment

Seems fine to me!

@jongrumer (Member) commented Jun 8, 2021

Ok, I know this should be in a separate PR, but to speed things up a bit I added the mkdir fix we found in mpi90/sys_mkdir and also included the -mkl flag in the default ifort flags in CMakeLists.txt to turn on MKL. With the new freely available ifort, which now also includes MPI and MKL (!), this is of course the way to do it if one is using ifort. Just make sure you install both the Base kit and the HPC kit (the former contains MKL, the latter the compiler and MPI). Just remove these two commits if you (@mortenpi) think this is completely out of line. It will be interesting to see if there are any speedups when running with just Intel all the way. A quick test is given further below.
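
As a minimal sketch of the -mkl addition (assuming the compiler-specific default-flag logic from the earlier comment; the exact placement in CMakeLists.txt may differ):

    if(CMAKE_Fortran_COMPILER_ID STREQUAL "Intel")
        # -save mirrors gfortran's -fno-automatic; -mkl links against Intel MKL
        set(CMAKE_Fortran_FLAGS "${CMAKE_Fortran_FLAGS} -save -mkl")
    endif()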

Intel Ifort + MPI/MKL (HPC) instructions for Linux: https://software.intel.com/content/www/us/en/develop/documentation/installation-guide-for-intel-oneapi-toolkits-linux/top/installation/install-using-package-managers/apt.html#apt_PACKAGES

On Mac and Windows you have to download the installers (note that the Mac version does not seem to ship with MPI).

Compiling with CMake, using the new Intel ifort oneAPI kits (Base + HPC) and including the -mkl flag above via the addition to CMakeLists.txt, I get the following linked libraries for e.g. rmcdhf_mpi. Seems sort of fine, but I'm not entirely sure why e.g. libgfortran.so.4 and openblas are still in there; this needs further investigation.

ldd rmcdhf_mpi
	linux-vdso.so.1 (0x00007ffd2f969000)
	libmkl_intel_lp64.so.1 => /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_intel_lp64.so.1 (0x0000151c141f3000)
	libmkl_intel_thread.so.1 => /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_intel_thread.so.1 (0x0000151c108fe000)
	libmkl_core.so.1 => /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_core.so.1 (0x0000151c07363000)
	libiomp5.so => /opt/intel/oneapi/compiler/2021.2.0/linux/compiler/lib/intel64_lin/libiomp5.so (0x0000151c06f4c000)
	libblas.so.3 => /usr/lib/x86_64-linux-gnu/libblas.so.3 (0x0000151c06cf1000)
	liblapack.so.3 => /usr/lib/x86_64-linux-gnu/liblapack.so.3 (0x0000151c0646b000)
	libmpifort.so.12 => /opt/intel/oneapi/mpi/2021.2.0//lib/libmpifort.so.12 (0x0000151c060ad000)
	libmpi.so.12 => /opt/intel/oneapi/mpi/2021.2.0//lib/release/libmpi.so.12 (0x0000151c04de7000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x0000151c04be3000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x0000151c049db000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x0000151c047bc000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x0000151c0441e000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000151c0402d000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x0000151c03e15000)
	/lib64/ld-linux-x86-64.so.2 (0x0000151c14f58000)
	libopenblas.so.0 => /usr/lib/x86_64-linux-gnu/libopenblas.so.0 (0x0000151c01b6f000)
	libgfortran.so.4 => /usr/lib/x86_64-linux-gnu/libgfortran.so.4 (0x0000151c01790000)
	libfabric.so.1 => /opt/intel/oneapi/mpi/2021.2.0//libfabric/lib/libfabric.so.1 (0x0000151c0154a000)
	libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x0000151c01303000)

And this is what it looks like for a gfortran/OpenMPI build (no surprises, GNU all the way):

ldd rmcdhf_mpi
	linux-vdso.so.1 (0x00007ffe37bd4000)
	libblas.so.3 => /usr/lib/x86_64-linux-gnu/libblas.so.3 (0x0000149863537000)
	liblapack.so.3 => /usr/lib/x86_64-linux-gnu/liblapack.so.3 (0x0000149862cb1000)
	libmpi_mpifh.so.20 => /usr/lib/x86_64-linux-gnu/libmpi_mpifh.so.20 (0x0000149862a5a000)
	libgfortran.so.4 => /usr/lib/x86_64-linux-gnu/libgfortran.so.4 (0x000014986267b000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00001498622dd000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00001498620c5000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000149861cd4000)
	libopenblas.so.0 => /usr/lib/x86_64-linux-gnu/libopenblas.so.0 (0x000014985fa2e000)
	libmpi.so.20 => /usr/lib/x86_64-linux-gnu/libmpi.so.20 (0x000014985f73c000)
	libopen-pal.so.20 => /usr/lib/x86_64-linux-gnu/libopen-pal.so.20 (0x000014985f48a000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x000014985f26b000)
	libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x000014985f024000)
	/lib64/ld-linux-x86-64.so.2 (0x0000149863a71000)
	libopen-rte.so.20 => /usr/lib/x86_64-linux-gnu/libopen-rte.so.20 (0x000014985ed9c000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x000014985eb94000)
	libhwloc.so.5 => /usr/lib/x86_64-linux-gnu/libhwloc.so.5 (0x000014985e957000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x000014985e753000)
	libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x000014985e550000)
	libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x000014985e345000)
	libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x000014985e13b000)

Quick test case -- IN PROGRESS!
A simple RMCDHF_MPI + RCI_MPI + TRANSITIONS_MPI (8 processes) run on O I 2p4, with SDT excitations from 2p4 only, first layer (3s,3p,3d,4f,5g,6h), with a reduced list in the RMCDHF run and the full list in RCI, gives the following timings for the two setups above (timestamps mark when each individual program starts, with the total execution time at the end).

ifort (-O3 -save -mkl) + MPI + MKL and using Intel mpirun
---------------
      LAYER: as1
 NEW SHELLS: 3s,3p,3d,4f,5g,6h
  OPTIMIZED: 3s* 3p* 3d* 4f* 5g* 6h*

 == Tue Jun  8 15:55:32 CEST 2021 == rcsfgenerate
 == Tue Jun  8 15:55:33 CEST 2021 == rangular
 == Tue Jun  8 15:55:33 CEST 2021 == rwfnestimate
 == Tue Jun  8 15:55:33 CEST 2021 == rmcdhf (Iteration number  11)
 == Tue Jun  8 15:55:35 CEST 2021 == rci
 == Tue Jun  8 15:55:46 CEST 2021 == jj2lsj
 == Tue Jun  8 15:55:47 CEST 2021 == rtransition
 == Tue Jun  8 15:55:56 CEST 2021 == done
 
Total Execution time - 0 hours 0 min 25 sec
 
gfortran-9 (-O3 -fno-automatic) + OpenMPI and using GNU mpirun
---------------------
      LAYER: as1
 NEW SHELLS: 3s,3p,3d,4f,5g,6h
  OPTIMIZED: 3s* 3p* 3d* 4f* 5g* 6h*

 == Tue Jun  8 15:50:11 CEST 2021 == rcsfgenerate + rcsfinteract
 == Tue Jun  8 15:50:12 CEST 2021 == rangular
 == Tue Jun  8 15:50:12 CEST 2021 == rwfnestimate
 == Tue Jun  8 15:50:12 CEST 2021 == rmcdhf (Iteration number  11)
 == Tue Jun  8 15:50:27 CEST 2021 == rci
 == Tue Jun  8 15:51:10 CEST 2021 == jj2lsj
 == Tue Jun  8 15:51:10 CEST 2021 == rtransition
 == Tue Jun  8 15:51:18 CEST 2021 == done
 
Total Execution time - 0 hours 1 min 8 sec

@mortenpi (Member, Author) commented Jun 9, 2021

Ok, this is actually cool. With a proper Intel ifort+MKL+MPI installation, just doing

FC=ifort BLA_VENDOR=Intel10_64lp_seq ./configure.sh

seems to automatically configure a CMake build that uses MKL (via FindBLAS) and also links against the Intel MPI.

I am not quite sure that adding -mkl is the right way to go. If you don't specify BLA_VENDOR=Intel10_64lp_seq, FindBLAS will still try to link against the system OpenBLAS (if available). Maybe the more correct thing would be to set BLA_VENDOR if we detect the Intel compiler?
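
A hedged sketch of that idea (untested; it assumes BLA_VENDOR is set as a CMake variable before find_package(BLAS) is called, and the real CMakeLists.txt may locate BLAS/LAPACK differently):

    # Prefer sequential MKL when building with the Intel compiler, unless the
    # user has already chosen a BLAS vendor explicitly.
    if(CMAKE_Fortran_COMPILER_ID STREQUAL "Intel" AND NOT DEFINED BLA_VENDOR)
        set(BLA_VENDOR "Intel10_64lp_seq")
    endif()
    find_package(BLAS REQUIRED)
    find_package(LAPACK REQUIRED)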

@mortenpi changed the title from "Allow overriding default compiler flag in CMake" to "Properly support the Intel compiler with the CMake build" (Jun 9, 2021)
@jongrumer (Member) commented Jun 9, 2021

Ok, great! I'll try setting BLA_VENDOR then, though it seems unreasonably complicated. But don't we still need to set -mkl to make sure MKL is used for everything else as well, or what are your thoughts there? I just remembered that there might be an mklvars.sh that should be sourced; at least there used to be something like that.

EDIT: Seems like -mkl should be enough, at least if you have properly sourced /opt/intel/oneapi/setvars.sh: https://software.intel.com/content/www/us/en/develop/articles/using-mkl-in-intel-compiler-mkl-qmkl-options.html
