Update README

acbbullock · Apr 28, 2023 · 9abeb89 · 9abeb89
1 parent 37d5de8
commit 9abeb89
Showing 1 changed file with 7 additions and 7 deletions.
diff --git a/README.md b/README.md
@@ -193,18 +193,18 @@ Note: with `init`, the biases are initialized to zero prior to training, and the
 
 The only dependency of this project is the Intel MKL distribution of LAPACK. With a system installation of [Intel oneAPI](https://www.intel.com/content/www/us/en/developer/tools/oneapi/toolkits.html) Base and HPC toolkits (including MKL), the project can be built and run on Windows 10/11 and Linux with [fpm](https://github.com/fortran-lang/fpm) from the project root using a single command, assuming the shell environment has sourced the oneAPI environment variables beforehand.
 
-On Windows, the project can be built and run using the command
+To target a multi-core CPU with the AVX2 instruction set for best performance, the project may be built and run on Windows 10/11 using the command
 
 ```powershell
-fpm run --compiler ifort --flag "/O3 /arch:CORE-AVX2 /Qcoarray /Qcoarray-num-images:n /heap-arrays:0" --link-flag "mkl_lapack95_lp64.lib mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib"
+fpm run --compiler ifort --flag "/O3 /arch:CORE-AVX2 /Qcoarray /Qcoarray-num-images:n /heap-arrays:0 /Qparallel /Qmkl:parallel /Qopenmp /Qopenmp-simd /fp:fast" --link-flag "mkl_lapack95_lp64.lib mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib"
 ```
 
-for a CPU with the AVX2 instruction set extension for best performance, where `n` is the number of images to use. The `O3` flag enables the highest optimization level, the `arch` flag specifies which instruction sets to target, the `Qcoarray` flag enables the coarray feature of Fortran 2008 with `Qcoarray-num-images:n` specifying the number of images to use, and the `heap-arrays:0` flag puts all automatic arrays on the heap, which may be necessary to avoid stack overflows for larger systems but can be omitted for smaller systems. The link flag specifies the MKL and OpenMP runtime libraries for static linking.
-
-Similarly, the project may be built and run on Linux using the command
+and on Linux using the command
 
 ```bash
-fpm run --compiler ifort --flag "-O3 -arch CORE-AVX2 -coarray -coarray-num-images=n -heap-arrays 0" --link-flag "-Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_lapack95_lp64.a ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a -liomp5"
+fpm run --compiler ifort --flag "-O3 -march=core-avx2 -coarray -coarray-num-images=n -heap-arrays 0 -parallel -qmkl=parallel -qopenmp -qopenmp-simd -fp-model=fast" --link-flag "-Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_lapack95_lp64.a ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a -liomp5 -lpthread -lm -ldl"
 ```
 
-with identical features. The link lines are provided by the [Intel Link Line Advisor](https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html).
+with equivalent features.
+
+Here, the AVX2 instructions may be replaced with `-xHost` (`/QxHost`) or another instruction set, and `n` is the number of images to execute, which generally should equal the number of CPU cores available. The `heap-arrays` option may be omitted for smaller systems, but is necessary to avoid stack overflows for larger systems (unless `ulimit` is sufficiently raised on Linux). Finally, the link flag specifies the MKL and OpenMP runtime libraries for static linking, provided by the [Intel Link Line Advisor](https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html).