POT3D v4.0.0:
 - Added ability to compile with the NVIDIA cuSparse library for ~2x performance boost when using NVIDIA GPUs.  See the new build.sh for details.
 - Converted most OpenACC `do` loops to `do concurrent`.  Requires adding the `-stdpar=gpu` flag for the nvfortran compiler.  
 - Added/updated some python scripts and updated README and build instructions.
sumseq committed Mar 3, 2022
1 parent 5e8ee69 commit 911ce25
Showing 7 changed files with 702 additions and 236 deletions.
100 changes: 53 additions & 47 deletions README.md
@@ -1,95 +1,101 @@
![POT3D](pot3d_logo.png)

# POT3D: High Performance Potential Field Solver #
Predictive Science Inc.
www.predsci.com

## OVERVIEW ##

`POT3D` is a Fortran code that computes potential field solutions to approximate the solar coronal magnetic field using observed photospheric magnetic fields as a boundary condition. It can be used to generate potential field source surface (PFSS), potential field current sheet (PFCS), and open field (OF) models. It has been (and continues to be) used for numerous studies of coronal structure and dynamics. The code is highly parallelized using [MPI](https://www.mpi-forum.org) and is GPU-accelerated using MPI+[OpenACC](https://www.openacc.org/). The [HDF5](https://www.hdfgroup.org/solutions/hdf5) file format is used for input/output.

`POT3D` is a Fortran code that computes potential field solutions to approximate the solar coronal magnetic field using observed photospheric magnetic fields as a boundary condition. It can be used to generate potential field source surface (PFSS), potential field current sheet (PFCS), and open field (OF) models. It has been (and continues to be) used for numerous studies of coronal structure and dynamics. The code is highly parallelized using [MPI](https://www.mpi-forum.org) and is GPU-accelerated using MPI+[OpenACC](https://www.openacc.org/), along with an option to use the [NVIDIA cuSparse library](https://developer.nvidia.com/cusparse). The [HDF5](https://www.hdfgroup.org/solutions/hdf5) file format is used for input/output.
`POT3D` is the potential field solver for the WSA model in the CORHEL software suite publicly hosted at the [Community Coordinated Modeling Center (CCMC)](https://ccmc.gsfc.nasa.gov/models/modelinfo.php?model=CORHEL/MAS/WSA/ENLIL).
A version of `POT3D` that includes GPU-acceleration with both MPI+OpenACC and MPI+[OpenMP](https://www.openmp.org//) was released as part of the Standard Performance Evaluation Corporation's (SPEC) beta version of the [SPEChpc(TM) 2021 benchmark suites](https://www.spec.org/hpc2021).

Details of the `POT3D` code can be found in the following publications:

- *Variations in Finite Difference Potential Fields*.
Caplan, R.M., Downs, C., Linker, J.A., and Mikic, Z. Submitted to Ap.J. (2021)
Caplan, R.M., Downs, C., Linker, J.A., and Mikic, Z. [Ap.J. 915,1 44 (2021)](https://iopscience.iop.org/article/10.3847/1538-4357/abfd2f)
- *From MPI to MPI+OpenACC: Conversion of a legacy FORTRAN PCG solver for the spherical Laplace equation*.
Caplan, R.M., Mikic, Z., and Linker, J.L. [arXiv:1709.01126](https://arxiv.org/abs/1709.01126) (2017)

--------------------------------

## HOW TO BUILD POT3D ##

Modify the file `build.sh` to set the `HDF5` library paths/flags and compiler flags compatible with your system environment.
Then, run `./build.sh`.


Copy the file `build.sh` to `my_build.sh`.
Modify `my_build.sh` to set the `HDF5` library paths/flags and compiler flags compatible with your system environment.
Then, run `./my_build.sh`.

See comments in `build.sh` for more details.
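As a rough sketch of that workflow (the HDF5 paths shown are simply the Ubuntu 20.x defaults already listed in `build.sh`; your system will likely need different values):

```bash
# Sketch of the build workflow described above. The example values are
# the Ubuntu 20.x defaults from build.sh and are only assumptions for
# other systems.
cp build.sh my_build.sh

# Edit my_build.sh to match your environment, e.g.:
#   HDF5_INCLUDE_DIR="/usr/include/hdf5/serial"
#   HDF5_LIB_DIR="/usr/lib/x86_64-linux-gnu"
#   HDF5_LIB_FLAGS="-lhdf5_serial_fortran -lhdf5_serialhl_fortran -lhdf5_serial -lhdf5_serial_hl"
#   FFLAGS="-O3"

./my_build.sh
```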

### Validate Installation ###

After building the code, you can test that it is working by running `./validate.sh`.
This will perform two runs of a small case, using 1 and 2 MPI ranks respectively.

The runs are performed in `testsuite/validation/run/` and the second run overwrites the first.

Each result will be checked against a reference solution (in `/runs/validation/validation`) and a PASS/FAIL message will be displayed.

--------------------------------

## HOW TO USE POT3D ##

### Setting Input Options

POT3D uses a namelist in an input text file called `pot3d.dat` to set all parameters of a run. See the provided `pot3d_input_documentation.txt` file for details on the various parameter options. For any run, an input 2D data set in HDF5 format is required for the lower radial magnetic field (`Br`) boundary condition. Examples of this file are contained in the `examples` and `testsuite` folders.

### Launching the Code ###

To run `POT3D`, set the desired run parameters in a `pot3d.dat` text file, then copy or link the `pot3d` executable into the same directory as `pot3d.dat`
and run the command:
`<MPI_LAUNCHER> -np <N> ./pot3d`
where `<N>` is the total number of MPI ranks to use (typically equal to the number of CPU cores) and `<MPI_LAUNCHER>` is your MPI run command (e.g. `mpiexec`, `mpirun`, `ibrun`, `srun`, etc.).
For example: `mpiexec -np 1024 ./pot3d`
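Putting the steps above together, a minimal run setup might look like the following (the directory and file locations are illustrative):

```bash
# Illustrative run setup: pot3d.dat and the pot3d executable must sit in
# the same directory, and the code is then launched with an MPI launcher.
mkdir run_dir
cp pot3d.dat run_dir/
ln -s "$(pwd)/pot3d" run_dir/pot3d
cd run_dir
mpiexec -np 1024 ./pot3d
```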

**Important!**
For CPU runs, make sure `ifprec=2` is set in the `pot3d.dat` input file.
For GPU runs, make sure `ifprec=1` is set in the `pot3d.dat` input file.

For CPU runs, set `ifprec=2` in the `pot3d.dat` input file.
For GPU runs, set `ifprec=1` in the `pot3d.dat` input file, unless you build with the `cuSparse` library option, in which case you should set `ifprec=2`.
### Running POT3D on GPUs ###

For standard cases, one should launch the code such that the number of MPI ranks per node is equal to the number of GPUs per node, e.g.
`mpiexec -np <N> --ntasks-per-node 4 ./pot3d`
or
`mpiexec -np <N> --npersocket 2 ./pot3d`

Note! To run efficiently, it is critical that `ifprec=1` is set in `pot3d.dat`.

If the `cuSparse` library option was used to build the code, then set `ifprec=2` in `pot3d.dat`.
If the `cuSparse` library option was NOT used to build the code, it is critical to set `ifprec=1` for efficient performance.
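A quick way to confirm the setting before launching (this assumes `ifprec` appears literally as `ifprec=<n>` in the namelist file, which may not match your formatting):

```bash
# Check which preconditioner is selected in pot3d.dat before a GPU run:
grep -i "ifprec" pot3d.dat
# Expected (assumed namelist formatting):
#   ifprec=1   -> standard GPU build
#   ifprec=2   -> build with the cuSparse option (or CPU runs)
```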

### Memory Requirements ###

To estimate how much memory (RAM) is needed for a run, compute:

`memory-needed = nr*nt*np*8*15/1024/1000/1000 GB`

where `nr`, `nt`, and `np` are the chosen problem sizes in the `r`, `theta`, and `phi` dimensions.
Note that this estimate is when using `ifprec=1`. If using `ifprec=2`, the required memory is ~2x higher.

Note that this estimate is when using `ifprec=1`. If using `ifprec=2`, the required memory is ~2x higher on the CPU, and even higher when using `cuSparse` on the GPU.
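For convenience, the estimate can be computed with a small script; a sketch is shown below (the script name and argument order are arbitrary choices):

```bash
#!/bin/bash
# Memory estimate for ifprec=1 from the formula above.
# Usage example: ./estimate_memory.sh 267 721 1801   ->  ~41 GB ("medium" test)
nr=$1; nt=$2; np=$3
mem=$(echo "$nr * $nt * $np * 8 * 15 / 1024 / 1000 / 1000" | bc -l)
printf "Estimated memory needed (ifprec=1): %.1f GB\n" "$mem"
```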
### Solution Output ###

Depending on the input parameters, `POT3D` can have various outputs. Typically, the three components of the potential magnetic field are output as `HDF5` files. In every run, the following two text files are output:

- `pot3d.out` An output log showing grid information and magnetic energy diagnostics.
- `timing.out` Time profile information of the run.


### Helpful Scripts ###

Some useful Python scripts for reading and plotting the POT3D input data, and for reading the output data, can be found in the `scripts` folder.
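For example, the 3D reader added in this commit can be invoked as follows (the file name is illustrative, and the script expects the `psihdf` helper module to be importable):

```bash
# Print grid sizes and basic statistics of a POT3D 3D HDF5 file:
python scripts/psi_data_reader_3d.py br.h5
```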

-----------------------------

## EXAMPLES and TESTSUITE ##

### Examples ###

In the `examples` folder, we provide ready-to-run examples of three use cases of `POT3D` in the following folders:

1. **`/potential_field_source_surface`**
A standard PFSS run with a source surface radius of 2.5 Rsun.
2. **`/potential_field_current_sheet`**
@@ -100,7 +106,7 @@ An example of computing the "open field" model from the solar surface out to 30
### Testsuite ###

In the `testsuite` folder, we provide test cases of various sizes that can be used to validate and test the performance of `POT3D`.
Each test case contains an `input` folder with the run input files, a `run` folder used to run the test, and a `validation` folder containing the output diagnostics used to validate the test, as well as a text file named `validation_run_information.txt` containing information on how the validation run was computed (system, compiler, number of ranks, etc.) with performance details. Note that all tests use `ifprec=1` so that they can validate GPU runs, therefore CPU performance will not be optimal.
Each test case contains an `input` folder with the run input files, a `run` folder used to run the test, and a `validation` folder containing the output diagnostics used to validate the test, as well as a text file named `validation_run_information.txt` containing information on how the validation run was computed (system, compiler, number of ranks, etc.) with performance details. Note that all tests are set to use `ifprec=1` only. An option to use `ifprec=2` will be added later.

To run a test, use the included script `run_test.sh` as:
`run_test.sh <TEST> <NP>`
@@ -110,16 +116,16 @@ The following is a list of the included tests, and their problem size and memory

1. **`validation`**
Grid size: 63x91x225 = 1.28 million cells
Memory (RAM) needed: ~1 GB
Memory (RAM) needed (using `ifprec=1`): ~1 GB
2. **`small`**
Grid size: 133x361x901 = 43.26 million cells
Memory (RAM) needed: ~6 GB
Memory (RAM) needed (using `ifprec=1`): ~6 GB
3. **`medium`**
Grid size: 267x721x1801 = 346.7 million cells
Memory (RAM) needed: ~41 GB
Memory (RAM) needed (using `ifprec=1`): ~41 GB
4. **`large`**
Grid size: 535x1441x3601 = 2.78 billion cells
Memory (RAM) needed: ~330 GB
Memory (RAM) needed (using `ifprec=1`): ~330 GB

Note that these tests will *not* output the 3D magnetic field results of the run, so no extra disk space is needed.
Instead, the validation is done with the magnetic energy diagnostics in the `pot3d.out` file.
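For instance, test invocations using the names listed above might look like the following (the rank counts are arbitrary examples):

```bash
# Example test invocations; the second argument is the number of MPI ranks.
./run_test.sh validation 2
./run_test.sh small 16
```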
58 changes: 36 additions & 22 deletions build.sh
@@ -1,45 +1,58 @@
#!/bin/bash

###################################################################
# This build assumes that you have an "mpif90" in your PATH that is
# set up to use your chosen MPI library and compiler.
# This build assumes that you have an "mpif90" in your PATH
# set up to use your chosen MPI library and compiler.
##################################################################

#################################################################
# Please set the location of your HDF5 include and library files.
# Make sure the HDF5 library is compiled with
# the same compiler currently being used and that the
# library is in your run-time environment (e.g. LD_LIBRARY_PATH).
# Please set the location of HDF5 include/library files and
# the linker flags to match your installed version.
#
# Note! The HDF5 library needs to have been compiled with
# the same compiler being used here and must be loaded in the run-time
# environment (e.g. LD_LIBRARY_PATH).
#################################################################

# Ubuntu 20.x
# Ubuntu 20.x:
HDF5_INCLUDE_DIR="/usr/include/hdf5/serial"
HDF5_LIB_DIR="/usr/lib/x86_64-linux-gnu"
HDF5_LIB_FLAGS="-lhdf5_serial_fortran -lhdf5_serialhl_fortran -lhdf5_serial -lhdf5_serial_hl"

# Locally installed
# Locally installed older version example:
#HDF5_INCLUDE_DIR="/opt/psi/nv/ext_deps/deps/hdf5/include"
#HDF5_LIB_DIR="/opt/psi/nv/ext_deps/deps/hdf5/lib"
#HDF5_LIB_FLAGS="-lhdf5_fortran -lhdf5hl_fortran -lhdf5 -lhdf5_hl"

###########################################################################
# Please set the HDF5 linker flags to match your installed version of hdf5.
# Please set the compile flags based on your compiler and hardware setup.
###########################################################################

# Ubuntu 20.x
HDF5_LIB_FLAGS="-lhdf5_serial_fortran -lhdf5_serialhl_fortran -lhdf5_serial -lhdf5_serial_hl"

# Locally installed
#HDF5_LIB_FLAGS="-lhdf5_fortran -lhdf5hl_fortran -lhdf5 -lhdf5_hl"
FFLAGS="-O3"

###########################################################################
# Please set the compile flags based on your compiler and hardware setup.
# Examples:
# GNU (CPU): FFLAGS="-O3 -mtune=native "
# NVFORTRAN (CPU): FFLAGS="-O3"
# NVFORTRAN (GPU): FFLAGS="-O3 -acc=gpu -gpu=cc61,cc75,cuda11.4 -Minfo=accel"
# IFORT (CPU): FFLAGS="-O3 -fp-model precise -assume byterecl -heap-arrays -xCORE-AVX2 -axCORE-AVX512"
# GCC (CPU MPI only): FFLAGS="-O3 -march=native"
# GCC (CPU MPI+threads): FFLAGS="-O3 -march=native
# -ftree-parallelize-loops=${OMP_NUM_THREADS}"
# NVIDIA HPC SDK (CPU MPI only): FFLAGS="-O3 -march=native"
# NVIDIA HPC SDK (CPU MPI+threads): FFLAGS="-O3 -march=native
# -stdpar=multicore -acc=multicore"
# NVIDIA HPC SDK (GPU MPI+GPU): FFLAGS="-O3 -march=native
# -stdpar=gpu -acc=gpu -Minfo=accel
# -gpu=cc80,cuda11.6,nomanaged"
# INTEL HPC SDK (CPU MPI only): FFLAGS="-O3 -xHost -assume byterecl
# -heap-arrays"
# INTEL HPC SDK (CPU MPI+threads): FFLAGS="-O3 -xHost -assume byterecl
# -heap-arrays -mp"
###########################################################################

FFLAGS="-O3"
###########################################################################
# If using NV HPC SDK for GPUs, with CUDA version >= 11.3, you can set
# the following to "1" to link the cuSparse library, allowing you to set
# 'ifprec=2' in 'pot3d.dat' to yield ~2x speed improvement!
# Warning! Using ifprec=2 takes much more GPU memory than ifprec=1.
###########################################################################

POT3D_CUSPARSE=0

###########################################################################
###########################################################################
@@ -50,6 +63,7 @@ POT3D_HOME=$PWD
cd ${POT3D_HOME}/src
cp Makefile.template Makefile
sed -i "s#<FFLAGS>#${FFLAGS}#g" Makefile
sed -i "s#<POT3D_CUSPARSE>#${POT3D_CUSPARSE}#g" Makefile
sed -i "s#<HDF5_INCLUDE_DIR>#${HDF5_INCLUDE_DIR}#g" Makefile
sed -i "s#<HDF5_LIB_DIR>#${HDF5_LIB_DIR}#g" Makefile
sed -i "s#<HDF5_LIB_FLAGS>#${HDF5_LIB_FLAGS}#g" Makefile
56 changes: 56 additions & 0 deletions scripts/psi_data_reader_3d.py
@@ -0,0 +1,56 @@
#!/usr/bin/env python
import h5py as h5
import numpy as np
import argparse
import psihdf as ps

def argParsing():

    parser = argparse.ArgumentParser(description='Read PSI POT3D 3D hdf5 data.')

    parser.add_argument("psi_3D_hdf5_file_name",
                        help='Name of 3D PSI POT3D HDF5 file (e.g. br.h5).')

    return parser.parse_args()

def main():

    ## Get input arguments:
    args = argParsing()

    rvec, tvec, pvec, data = ps.rdhdf_3d(args.psi_3D_hdf5_file_name)

    rmin = np.min(rvec)
    rmax = np.max(rvec)
    tmin = np.min(tvec)
    tmax = np.max(tvec)
    pmin = np.min(pvec)
    pmax = np.max(pvec)

    NR = rvec.size
    NT = tvec.size
    NP = pvec.size

    print('Opened file: '+args.psi_3D_hdf5_file_name)
    print('NR: '+str(NR))
    print('NT: '+str(NT))
    print('NP: '+str(NP))
    print('min(r): '+str(rmin))
    print('max(r): '+str(rmax))
    print('min(theta): '+str(tmin))
    print('max(theta): '+str(tmax))
    print('min(phi): '+str(pmin))
    print('max(phi): '+str(pmax))
    print('min(data): '+str(np.min(data)))
    print('max(data): '+str(np.max(data)))
    print('mean(data): '+str(np.mean(data)))
    print('Example data point:')
    print('{}\t{}\t{}\t{}'.format('r[3]', 'theta[4]', 'phi[5]', 'data[5,4,3]'))
    print('{}\t{}\t{}\t{}'.format(rvec[3], tvec[4], pvec[5], data[5,4,3]))

if __name__ == '__main__':
    main()


39 changes: 31 additions & 8 deletions src/Makefile.template
@@ -1,23 +1,43 @@
FC = mpif90

FFLAGS = <FFLAGS> -I<HDF5_INCLUDE_DIR>
POT3D_CUSPARSE=<POT3D_CUSPARSE>

OBJS = number_types.o \
ifeq ($(POT3D_CUSPARSE),1)
IF_DEF = -DCUSPARSE
CC = nvc
CCFLAGS = -O3 -acc=gpu
FPARADD = -cudalib=cusparse
else
IF_DEF =
FPARADD =
endif

FFLAGS = <FFLAGS> $(FPARADD) -I<HDF5_INCLUDE_DIR>

OBJS0 = number_types.o \
zm_parse_modules.o \
zm_parse.o \
zm_sds_modules.o \
zm_sds.o \
pot3d.o
zm_sds.o

ifeq ($(POT3D_CUSPARSE),1)
OBJS = $(OBJS0) lusol_cusparse.o pot3d_cpp.o
else
OBJS = $(OBJS0) pot3d_cpp.o
endif

LDFLAGS = -L<HDF5_LIB_DIR> <HDF5_LIB_FLAGS>

all: $(OBJS)
all: $(OBJS)
$(FC) $(FFLAGS) $(OBJS) $(LDFLAGS) -o pot3d
rm *.mod *.o 2>/dev/null
rm -f *.mod *.o 2>/dev/null

clean:
rm pot3d 2>/dev/null
rm -f *.mod *.o 2>/dev/null
rm -f *.mod *.o pot3d.f 2>/dev/null

pot3d_cpp.f: pot3d.F
$(FC) -E -cpp $(IF_DEF) > pot3d_cpp.f $<

number_types.o: number_types.f
$(FC) -c $(FFLAGS) $<
@@ -34,6 +54,9 @@ zm_sds_modules.o: zm_sds_modules.f
zm_sds.o: zm_sds.f zm_sds_modules.f number_types.f
$(FC) -c $(FFLAGS) $<

pot3d.o: pot3d.f
lusol_cusparse.o: lusol_cusparse.c
$(CC) -c $(CCFLAGS) ${FPARADD} $<

pot3d_cpp.o: pot3d_cpp.f
$(FC) -c $(FFLAGS) $<

