Skip to content

Commit

Permalink
Update README
Browse files Browse the repository at this point in the history
  • Loading branch information
acbbullock committed May 5, 2023
1 parent 691aa47 commit 8ff56b5
Showing 1 changed file with 18 additions and 4 deletions.
22 changes: 18 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -193,18 +193,32 @@ Note: with `init`, the biases are initialized to zero prior to training, and the

The only dependency of this project is the Intel MKL distribution of LAPACK. With a system installation of [Intel oneAPI](https://www.intel.com/content/www/us/en/developer/tools/oneapi/toolkits.html) Base and HPC toolkits (including MKL), the project can be built and run on Windows 10/11 and Linux with [fpm](https://github.com/fortran-lang/fpm) from the project root using a single command, assuming the shell environment has sourced the oneAPI environment variables beforehand.

To target a multi-core CPU with the AVX2 instruction set for best performance, the project may be built and run on Windows 10/11 using the command
To target an $n$ core CPU with SIMD instructions, the project can be built and run on Windows 10/11 using the command

```powershell
fpm run --compiler ifort --flag "/O3 /arch:CORE-AVX2 /Qcoarray /Qcoarray-num-images:n /heap-arrays:0 /Qmkl:parallel /Qopenmp /Qopenmp-simd /fp:precise" --link-flag "mkl_lapack95_lp64.lib mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib"
fpm run --compiler ifort --flag "/Qcoarray /Qcoarray-num-images:n /Qopenmp /Qopenmp-simd" --link-flag "mkl_lapack95_lp64.lib mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib"
```

and on Linux using the command

```bash
fpm run --compiler ifort --flag "-O3 -march=core-avx2 -coarray -coarray-num-images=n -heap-arrays 0 -qmkl=parallel -qopenmp -qopenmp-simd -fp-model=precise" --link-flag "-Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_lapack95_lp64.a ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a -liomp5 -lpthread -lm -ldl"
fpm run --compiler ifort --flag "-coarray -coarray-num-images=n -qopenmp -qopenmp-simd" --link-flag "-Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_lapack95_lp64.a ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a -liomp5 -lpthread -lm -ldl"
```

with equivalent features.

Here, the AVX2 instructions may be replaced with `-xHost` (`/QxHost`) or another instruction set, and `n` is the number of images to execute, which generally should equal the number of CPU cores available. The `heap-arrays` option may be omitted for smaller systems, but is necessary to avoid stack overflows for larger systems (unless `ulimit` is sufficiently raised on Linux). We then enable the generation of multi-threaded code with OpenMP and SIMD compilation. Finally, the link flag specifies the MKL and OpenMP runtime libraries for static linking, provided by the [Intel Link Line Advisor](https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html).
Here, `n` is the number of images to execute, which generally should equal the number of CPU cores available. We then enable the generation of multi-threaded code with OpenMP and SIMD compilation. Finally, the link flag specifies the MKL and OpenMP runtime libraries for static linking, provided by the [Intel Link Line Advisor](https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html).

To target an $n$ core CPU and an Intel GPU for acceleration, the project can be built and run on Windows 10/11 using the command

```powershell
fpm run --compiler ifx --flag "/Qcoarray /Qcoarray-num-images:n /Qiopenmp /Qopenmp-targets:spir64 /Qopenmp-target-do-concurrent" --link-flag "mkl_lapack95_lp64.lib mkl_intel_lp64.lib mkl_intel_thread.lib mkl_core.lib libiomp5md.lib OpenCL.lib"
```

and on Linux using the command

```bash
fpm run --compiler ifx --flag "-coarray -coarray-num-images=n -fiopenmp -fopenmp-targets=spir64 -fopenmp-target-do-concurrent" --link-flag "-Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_lapack95_lp64.a ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a -liomp5 -lOpenCL -lpthread -lm -ldl"
```

with equivalent features.

0 comments on commit 8ff56b5

Please sign in to comment.