This project focuses on implementing and testing mixed-language programming across Fortran, C, C++, CUDA, and Python.
For details, see the CUDA Toolkit documentation.
The CUDA APIs and libraries it covers are listed here for reference:
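- CUDA Runtime API: the higher-level C API for managing devices, memory, and kernel launches (the cuda* functions).
- CUDA Driver API: the lower-level API (the cu* functions) that gives explicit control over contexts and modules.
- CUDA Math API: the standard mathematical functions callable from device code.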
- cuBLAS: an implementation of BLAS (Basic Linear Algebra Subprograms) on top of the NVIDIA CUDA runtime. It gives access to the computational resources of NVIDIA Graphics Processing Units (GPUs), but does not automatically parallelize across multiple GPUs.
- NVBLAS: a multi-GPU accelerated drop-in BLAS (Basic Linear Algebra Subprograms) library built on top of the NVIDIA cuBLAS library.
- nvJPEG: high-performance, GPU-accelerated JPEG decoding for image formats commonly used in deep learning and hyperscale multimedia applications.
- cuFFT: fast Fourier transforms on the GPU.
- nvGRAPH: GPU graph analytics (removed from the toolkit as of CUDA 11).
- cuRAND: random number generation, with both a host API and a device API.
- cuSPARSE: sparse matrix and vector operations.
- NPP: NVIDIA NPP is a library of functions for CUDA-accelerated processing. Its initial functionality focuses on image and video processing and is widely applicable to developers in those areas; NPP will evolve over time to cover more compute-heavy tasks in other problem domains. The library is written to maximize flexibility while maintaining high performance.
- NVRTC (Runtime Compilation): a runtime compilation library for CUDA C++. It accepts CUDA C++ source code as a character string and creates handles that can be used to obtain the PTX. The generated PTX string can be loaded with cuModuleLoadData or cuModuleLoadDataEx, and linked with other modules using cuLinkAddData from the CUDA Driver API. Compiling at runtime can often enable optimizations and performance that purely offline static compilation cannot.
- Thrust: a C++ template library of parallel algorithms and containers for CUDA, modeled on the STL.
- cuSOLVER: dense and sparse direct solvers (LAPACK-like functionality) on the GPU.
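In the Fortran world, the best-known libraries are netlib's BLAS/LAPACK and OpenBLAS, along with the MKL and IMSL numerical libraries. For parallel, MPI-based linear algebra, PETSc/TAO is worth learning. C/C++ has far more numerical libraries by comparison (although many of them call MKL, OpenBLAS, SuiteSparse, and so on underneath). The classic, widely used C numerical library is GSL; its C++ counterpart is Eigen3, and OpenCV offers interfaces for both languages. In the end, we recommend combining Boost with Eigen, plus ATen (PyTorch's C++ tensor library) for GPU support.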
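BLAS and LAPACK (and cuBLAS, which follows them) store matrices in column-major order and take a leading dimension (`lda`, `ldb`, `ldc`) for each matrix, so this is the first convention to get right when calling them from C. The sketch below is a naive, hypothetical `matmul_colmajor` that computes C = A·B using those conventions. It is not the BLAS `dgemm` interface (which also takes transpose flags and `alpha`/`beta` scaling); it only illustrates the indexing.

```c
#include <stddef.h>

/* Element (i, j) of a column-major matrix with leading dimension ld:
 * column j starts at offset j * ld, so elements of one column are contiguous. */
#define IDX(i, j, ld) ((size_t)(i) + (size_t)(j) * (size_t)(ld))

/* Hypothetical helper (not a BLAS routine): C = A * B, where
 * A is m x k (lda >= m), B is k x n (ldb >= k), C is m x n (ldc >= m),
 * all stored column-major as BLAS/LAPACK/cuBLAS expect. */
void matmul_colmajor(int m, int n, int k,
                     const double *A, int lda,
                     const double *B, int ldb,
                     double *C, int ldc)
{
    for (int j = 0; j < n; ++j) {          /* one column of C at a time */
        for (int i = 0; i < m; ++i)
            C[IDX(i, j, ldc)] = 0.0;
        for (int p = 0; p < k; ++p) {      /* add A(:, p) * B(p, j) */
            double b = B[IDX(p, j, ldb)];
            for (int i = 0; i < m; ++i)    /* innermost loop walks down a column */
                C[IDX(i, j, ldc)] += A[IDX(i, p, lda)] * b;
        }
    }
}
```

For example, with A = [[1, 2, 3], [4, 5, 6]] stored as `{1, 4, 2, 5, 3, 6}` and B = [[7, 8], [9, 10], [11, 12]] stored as `{7, 9, 11, 8, 10, 12}`, `matmul_colmajor(2, 2, 3, A, 2, B, 3, C, 2)` fills C with `{58, 139, 64, 154}`, i.e. [[58, 64], [139, 154]]. Passing a leading dimension larger than the row count lets the same routine operate on a submatrix of a bigger array, which is why BLAS takes it separately.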
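You can ask about this in the Fortran Coder group. If you are comfortable with both Fortran and C, mixed programming is not hard as long as the data types agree and the interfaces match, especially when you use C and Fortran compilers from the same suite, such as PGI (pgcc/pgfortran), GCC (gcc/g++/gfortran), or Intel (icc/ifort).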
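As a minimal sketch of what "matching data types and interfaces" means in practice, the C function below is meant to be called from Fortran through the standard `iso_c_binding` module (Fortran 2003); the Fortran side appears only as a comment, and the name `scale_vec` is hypothetical. `bind(C)` fixes the linker symbol, `integer(c_int)`/`real(c_double)` match C's `int`/`double`, `value` passes scalars by value, and arrays arrive as pointers. The second function shows the older, compiler-specific convention (gfortran, and ifort on Linux, by default look for a lowercase name with a trailing underscore and pass every argument by reference), included only for comparison.

```c
/* Fortran side (assumed, Fortran 2003 iso_c_binding):
 *
 *   interface
 *     subroutine scale_vec(n, alpha, x) bind(C, name="scale_vec")
 *       use iso_c_binding, only: c_int, c_double
 *       integer(c_int), value :: n
 *       real(c_double), value :: alpha
 *       real(c_double), intent(inout) :: x(*)
 *     end subroutine scale_vec
 *   end interface
 *   ...
 *   ! v is real(c_double), dimension(:)
 *   call scale_vec(int(size(v), c_int), 2.0_c_double, v)
 */

/* C implementation: Fortran's x(1..n) is x[0..n-1] here. */
void scale_vec(int n, double alpha, double *x)
{
    for (int i = 0; i < n; ++i)
        x[i] *= alpha;
}

/* Legacy alternative without bind(C): a plain Fortran
 * "call scale_vec(n, alpha, x)" resolves to the symbol scale_vec_
 * under gfortran's default name mangling, with all arguments by reference. */
void scale_vec_(const int *n, const double *alpha, double *x)
{
    scale_vec(*n, *alpha, x);
}
```

The `bind(C)` route is the portable one: it does not depend on a particular compiler's mangling rules, which is exactly the interface mismatch that bites when the C and Fortran compilers come from different suites.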
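Although ctypes (in the standard library) and Cython are available for Python, Boost.Python/Boost.NumPy together with DataFrame are more convenient for developers. Exposing interfaces explicitly gives flexible control over how C/C++ objects interact with Python. If you have ever developed COM components, which let many languages on Windows call C++ objects conveniently, you will welcome the way C/C++ interacts with Python here, because it is so concise.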
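The "expose an interface" idea can also be sketched without Boost: give the object an opaque handle and a few C functions that create, use, and destroy it. The example below uses a hypothetical running-mean object (`Accumulator`, `acc_new`, `acc_add`, `acc_mean`, and `acc_free` are illustrative names) and shows in a comment how Python's standard `ctypes` module could drive it from a shared library. This is the lower-level ctypes route the paragraph contrasts with; Boost.Python would instead wrap a C++ class directly.

```c
#include <stdlib.h>

/* Opaque object: Python only ever holds a pointer to it. */
typedef struct Accumulator {
    double sum;
    long count;
} Accumulator;

Accumulator *acc_new(void)
{
    Accumulator *acc = malloc(sizeof *acc);
    if (acc) {                     /* NULL is returned if allocation fails */
        acc->sum = 0.0;
        acc->count = 0;
    }
    return acc;
}

void acc_add(Accumulator *acc, double value)
{
    acc->sum += value;
    acc->count += 1;
}

/* Returns 0.0 for an empty accumulator instead of dividing by zero. */
double acc_mean(const Accumulator *acc)
{
    return acc->count ? acc->sum / (double)acc->count : 0.0;
}

void acc_free(Accumulator *acc)
{
    free(acc);
}

/* Python side (assumed build: cc -shared -fPIC acc.c -o libacc.so):
 *
 *   import ctypes
 *   lib = ctypes.CDLL("./libacc.so")
 *   lib.acc_new.restype = ctypes.c_void_p   # keep the full 64-bit pointer
 *   lib.acc_add.argtypes = [ctypes.c_void_p, ctypes.c_double]
 *   lib.acc_add.restype = None
 *   lib.acc_mean.argtypes = [ctypes.c_void_p]
 *   lib.acc_mean.restype = ctypes.c_double
 *   lib.acc_free.argtypes = [ctypes.c_void_p]
 *   h = lib.acc_new()
 *   for v in (1.0, 2.0, 6.0):
 *       lib.acc_add(h, v)
 *   print(lib.acc_mean(h))   # 3.0
 *   lib.acc_free(h)
 */
```

Declaring `restype` and `argtypes` is the step most often missed with ctypes: without `restype = c_void_p`, the returned handle is treated as a C `int` and can be truncated on 64-bit platforms.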