Ginkgo is a high-performance linear algebra library for manycore systems, with a focus on the solution of sparse linear systems. It is implemented in modern C++ (you will need at least a C++11-compliant compiler to build it), with GPU kernels implemented in CUDA.
An extensive database of up-to-date benchmark results is available in the performance data repository. Visualizations of the database can be interactively generated using the Ginkgo Performance Explorer web application. The benchmark results are automatically updated using the CI system to always reflect the current state of the library.
For the Ginkgo core library:
- cmake 3.9+
- C++11 compliant compiler, one of:
  - gcc 5.3+, 6.3+, 7.3+, 8.1+
  - clang 3.9+
  - Apple LLVM 8.0+ (TODO: verify)
The Ginkgo CUDA module has the following additional requirements:
- CUDA 9.0+
- Any host compiler restrictions your version of CUDA may impose also apply here. For the newest CUDA version, this information can be found in the CUDA installation guide for Linux or the CUDA installation guide for Mac OS X.
In addition, if you want to contribute code to Ginkgo, you will also need the following:
- clang-format 5.0.1+ (ships as part of clang)
Windows is currently not supported, but we are working on porting the library there. If you are interested in helping us with this effort, feel free to contact one of the developers. (The library itself doesn't use any non-standard C++ features, so most of the effort here is in modifying the build system.)
TODO: Some restrictions will also apply on the version of C and C++ standard libraries installed on the system. We need to investigate this further.
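The version requirements above can be checked before configuring. The following is a minimal sketch, not part of Ginkgo: the helper name version_ge is made up for illustration, and it assumes a sort that supports the -V (version sort) option, as GNU coreutils does.

```shell
# Hypothetical helper (not shipped with Ginkgo): succeeds if version $1 >= version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Compare the installed cmake against the 3.9 minimum listed above.
if command -v cmake >/dev/null 2>&1; then
    # The first line of `cmake --version` has the form "cmake version X.Y.Z".
    found="$(cmake --version | head -n 1 | awk '{print $3}')"
    if version_ge "$found" 3.9; then
        echo "cmake $found is new enough"
    else
        echo "cmake $found is older than 3.9"
    fi
else
    echo "cmake not found"
fi
```

Version sort is used instead of a plain string comparison because a string comparison would rank 3.10 below 3.9.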
Use the standard cmake build procedure:
mkdir build; cd build
cmake -G "Unix Makefiles" [OPTIONS] .. && make
Replace [OPTIONS] with the desired cmake options for your build.
Ginkgo adds the following additional switches to control what is being built:
- -DGINKGO_DEVEL_TOOLS={ON, OFF} sets up the build system for development (requires clang-format, will also download git-cmake-format), default is ON
- -DGINKGO_BUILD_TESTS={ON, OFF} builds Ginkgo's tests (will download googletest), default is ON
- -DGINKGO_BUILD_BENCHMARKS={ON, OFF} builds Ginkgo's benchmarks (will download gflags and rapidjson), default is ON
- -DGINKGO_BUILD_EXAMPLES={ON, OFF} builds Ginkgo's examples, default is ON
- -DGINKGO_BUILD_REFERENCE={ON, OFF} builds reference implementations of the kernels, useful for testing, default is OFF
- -DGINKGO_BUILD_OMP={ON, OFF} builds optimized OpenMP versions of the kernels, default is OFF
- -DGINKGO_BUILD_CUDA={ON, OFF} builds optimized CUDA versions of the kernels (requires CUDA), default is OFF
- -DGINKGO_BUILD_DOC={ON, OFF} creates an HTML version of Ginkgo's documentation from inline comments in the code. The default is OFF.
- -DGINKGO_DOC_GENERATE_PDF={ON, OFF} generates a PDF version of Ginkgo's documentation from inline comments in the code. The default is OFF.
- -DGINKGO_DOC_GENERATE_DEV={ON, OFF} generates the developer version of Ginkgo's documentation. The default is OFF.
- -DGINKGO_SET_CUDA_HOST_COMPILER={ON, OFF} instructs the build system to explicitly set CUDA's host compiler to match the compiler used to build the rest of the library (otherwise the nvcc toolchain uses its default host compiler). Setting this option may help if you're experiencing linking errors due to ABI incompatibilities. The default is OFF.
- -DGINKGO_EXPORT_BUILD_DIR={ON, OFF} adds the Ginkgo build directory to the CMake package registry. The default is OFF.
- -DCMAKE_INSTALL_PREFIX=path sets the installation path for make install. The default value is usually something like /usr/local
- -DGINKGO_VERBOSE_LEVEL=integer sets the verbosity of Ginkgo. 0 disables all output in the main libraries, 1 enables a few important messages related to unexpected behavior (default).
- -DBUILD_SHARED_LIBS={ON, OFF} builds Ginkgo as static libraries (OFF) or as shared (dynamic) libraries (ON), default is ON
- -DGINKGO_CUDA_ARCHITECTURES=<list> where <list> is a semicolon (;) separated list of architectures. Supported values are:
  - Auto
  - Kepler, Maxwell, Pascal, Volta
  - CODE, CODE(COMPUTE), (COMPUTE)

  Auto will automatically detect the present CUDA-enabled GPU architectures in the system. Kepler, Maxwell, Pascal and Volta will add flags for all architectures of that particular NVIDIA GPU generation. COMPUTE and CODE are placeholders that should be replaced with compute and code numbers (e.g. for compute_70 and sm_70, COMPUTE and CODE should be replaced with 70). Default is Auto. For a more detailed explanation of this option see the ARCHITECTURES specification list section in the documentation of the CudaArchitectureSelector CMake module.
For example, to build everything (in debug mode), use:
mkdir build; cd build
cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Debug -DGINKGO_DEVEL_TOOLS=ON \
-DGINKGO_BUILD_TESTS=ON -DGINKGO_BUILD_REFERENCE=ON -DGINKGO_BUILD_OMP=ON -DGINKGO_BUILD_CUDA=ON ..
make
NOTE: Currently, the only verified CMake generator is Unix Makefiles. Other generators may work, but are not officially supported.
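When passing more than one value to -DGINKGO_CUDA_ARCHITECTURES, keep in mind that the semicolon is a command separator in the shell, so the list has to be quoted. The sketch below only prints the configure command instead of running it; the function name ginkgo_cuda_configure_cmd is hypothetical, and the chosen architectures are just an illustration.

```shell
# Hypothetical helper: prints (does not run) a cmake configure command for a
# CUDA-enabled build restricted to the given architecture list.
ginkgo_cuda_configure_cmd() {
    archs="$1"   # semicolon-separated list, e.g. "Pascal;Volta"
    # The list is wrapped in double quotes so that the shell does not treat
    # the semicolon as the end of the cmake command.
    printf 'cmake -G "Unix Makefiles" -DGINKGO_BUILD_CUDA=ON -DGINKGO_CUDA_ARCHITECTURES="%s" ..\n' "$archs"
}

ginkgo_cuda_configure_cmd "Pascal;Volta"
```

The printed line can be pasted into the build folder once the architecture list looks right.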
You need to compile Ginkgo with the -DGINKGO_BUILD_TESTS=ON option to be able to run the tests. Use the following command inside the build folder to run all tests:
make test
The output should contain several lines of the form:
Start 1: path/to/test
1/13 Test #1: path/to/test ............................. Passed 0.01 sec
To run only a specific test and see more detailed results (e.g. if a test failed), run the following from the build folder:
./path/to/test
where path/to/test is the path returned by make test.
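With many tests, it can help to extract just the ones that need a closer look. The following sketch filters output of the form shown above; it assumes that unsuccessful tests are marked with a "***" prefix (such as ***Failed) in place of Passed, and the function name failed_tests and the sample lines are made up for illustration.

```shell
# Hypothetical helper: reads `make test` output on stdin and prints the path
# of every test whose result line carries a "***" marker (e.g. ***Failed).
failed_tests() {
    # Result lines look like: "2/13 Test #2: path/to/test .....***Failed 0.02 sec",
    # so the second field is "Test" and the fourth field is the test path.
    awk '$2 == "Test" && /\*\*\*/ { print $4 }'
}

# Illustrative input only; real usage would be: make test | failed_tests
failed_tests <<'EOF'
    Start 1: path/to/test_a
1/2 Test #1: path/to/test_a ............................. Passed 0.01 sec
    Start 2: path/to/test_b
2/2 Test #2: path/to/test_b .............................***Failed 0.02 sec
EOF
```

Each printed path can then be run directly as ./path/to/test from the build folder.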
In addition to the unit tests designed to verify correctness, Ginkgo also includes a benchmark suite for checking its performance on the system. To compile the benchmarks, the flag -DGINKGO_BUILD_BENCHMARKS=ON has to be set during the cmake step. In addition, the ssget command-line utility has to be installed on the system.
The benchmark suite tests Ginkgo's performance using the SuiteSparse matrix collection and artificially generated matrices. The SuiteSparse collection will be downloaded automatically when the benchmarks are run. Please note that the entire collection requires roughly 100GB of disk storage in its compressed format, and roughly 25GB of additional disk space for intermediate data (such as uncompressing the archive). Additionally, the benchmark runs usually take a long time (SpMV benchmarks on the complete collection take roughly 24h using the K20 GPU), and will stress the system.
The benchmark suite is invoked using the make benchmark command in the build directory. The behavior of the suite can be modified using environment variables. Assuming the bash shell is used, these can either be specified via the export command to persist between multiple runs:
export VARIABLE="value"
...
make benchmark
or specified on the fly, on the same line as the make benchmark command:
env VARIABLE="value" ... make benchmark
Since make sets any variables passed to it as temporary environment variables, the following shorthand can also be used:
make benchmark VARIABLE="value" ...
A combination of the above approaches is also possible (e.g. it may be useful to export the SYSTEM_NAME variable, and specify the others at every benchmark run).
Supported environment variables are described in the following list:
- BENCHMARK={spmv, solver, preconditioner} - The benchmark set to run. Default is spmv.
  - spmv - Runs the sparse matrix-vector product benchmarks on the SuiteSparse collection.
  - solver - Runs the solver benchmarks on the SuiteSparse collection. The matrix format is determined by running the spmv benchmarks first, and using the fastest format determined by that benchmark. The maximum number of iterations for the iterative solvers is set to 10,000 and the requested residual reduction factor to 1e-6.
  - preconditioner - Runs the preconditioner benchmarks on artificially generated block-diagonal matrices.
- DRY_RUN={true, false} - If set to true, prepares the system for the benchmark runs (downloads the collections, creates the result structure, etc.) and outputs the list of commands that would normally be run, but does not run the benchmarks themselves. Default is false.
- EXECUTOR={reference,cuda,omp} - The executor used for running the benchmarks. Default is cuda.
- SEGMENTS=<N> - Splits the benchmark suite into <N> segments. This option is useful for running the benchmarks on an HPC system with a batch scheduler, as it enables partitioning of the benchmark suite and running it concurrently on multiple nodes of the system. If specified, SEGMENT_ID also has to be set. Default is 1.
- SEGMENT_ID=<I> - Used in combination with the SEGMENTS variable. <I> should be an integer between 1 and <N>. If specified, only the <I>-th segment of the benchmark suite will be run. Default is 1.
- SYSTEM_NAME=<name> - The name of the system where the benchmarks are being run. This option only changes the directory where the benchmark results are stored. It can be used to avoid overwriting the benchmarks if multiple systems share the same filesystem, or when copying the results between systems. Default is unknown.
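The SEGMENTS and SEGMENT_ID variables are meant to be used together, one invocation per segment. As a rough sketch, the loop below prints the command for each segment so that every line can be handed to a separate batch job; the function name print_segment_commands and the system name mycluster are placeholders, and how the lines are submitted depends on the scheduler in use.

```shell
# Hypothetical helper: prints one `make benchmark` command per segment.
# $1 is the number of segments (<N>), $2 is the value for SYSTEM_NAME.
print_segment_commands() {
    segments="$1"
    system_name="$2"
    i=1
    # SEGMENT_ID runs from 1 to <N>, inclusive.
    while [ "$i" -le "$segments" ]; do
        printf 'make benchmark SYSTEM_NAME="%s" SEGMENTS=%s SEGMENT_ID=%s\n' \
            "$system_name" "$segments" "$i"
        i=$((i + 1))
    done
}

print_segment_commands 4 mycluster
```

Using the same SYSTEM_NAME for all segments keeps the results of one system under a single results directory.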
Once make benchmark completes, the results can be found in <Ginkgo build directory>/benchmark/results/${SYSTEM_NAME}/. The files are written in the JSON format, and can be analyzed using any of the data analysis tools that support JSON. Alternatively, they can be uploaded to an online repository, and analyzed using Ginkgo's free web tool Ginkgo Performance Explorer (GPE). (Make sure to change the "Performance data URL" to your repository if using GPE.)
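To get an overview of what a run produced, the result files can be listed from the build directory. This is a small sketch under one assumption that the text above does not confirm: that the result files carry a .json extension. The function name list_benchmark_results is made up for illustration.

```shell
# Hypothetical helper: lists result files for one system.
# $1 is the Ginkgo build directory, $2 is the SYSTEM_NAME used for the run.
# Assumes the JSON result files end in ".json".
list_benchmark_results() {
    results_dir="$1/benchmark/results/$2"
    # Print nothing if no benchmarks have been run for this system yet.
    if [ -d "$results_dir" ]; then
        find "$results_dir" -type f -name '*.json' | sort
    fi
}

# Run from the build directory; "unknown" is the default SYSTEM_NAME.
list_benchmark_results . "${SYSTEM_NAME:-unknown}"
```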
To install Ginkgo into the specified folder, execute the following command in the build folder:
make install
If the installation prefix (see CMAKE_INSTALL_PREFIX) is not writable for your user, e.g. when installing Ginkgo system-wide, it might be necessary to prefix the call with sudo.
After the installation, CMake can find Ginkgo with find_package(Ginkgo). An example can be found in the install_test.
Note: If the installed Ginkgo was built statically and with CUDA, CUDA needs to be specified as a language in order for CMake to work properly.
Refer to ABOUT-LICENSING.md for details.