Skip to content
Denis Alevi edited this page Aug 24, 2020 · 1 revision

cpp_standalone codeobject templates

In cpp_standalone, codeobjects define a single function that runs that codeobject. In that function, there exists a {{pointer_lines}} template variables in the templates in order to add keywords to pointer definitions (e.g. __restrict__ in order to get some compiler optimizations). This looks like this:

In the template file,

{{pointer_lines}}

is replaced by multiple lines of the form

double __restrict__ _ptr_some_variable = some_variable;
double ptr_other_not_restrict_var = other_not_restrict_var;

cuda_standalone codeobject templates

For cuda_standalone, the codeobjects are a bit more complicated. There is a host function wihtout parameters (just as in the cpp_standalone files). But this host function calls a kernel function that runs the code on the GPU.

For cuda_standalone, we don't use the {{pointer_lines}} template variable (it is just ignored I believe). Instead we define the following variables in the templates, which are replaced with code lines in brian2cuda/device.py.

  • %CONSTANTS%

    This is also defined in cpp_standalone codeobjects and is used in the same way: Constant variables needed in the host code of the cuda_standalone kernel are defined here. In the statemonitor.cu kernel, this is for example used to extract the raw pointer variable from a thrust device vector object:

    ///// CONSTANTS /////
    double* const _array_statemonitor_t = thrust::raw_pointer_cast(&dev_dynamic_array_statemonitor_t[0]);
  • %HOST_PARAMETERS%

    These are function arguments passed to the kernel called in the host code. For the statemonitor example, this would be replaced by:

    kernel_statemonitor_codeobject<<<..., ...>>>(
        ...,  // parameters that are hard coded in the template
        ///// HOST_PARAMETERS /////
        _array_statemonitor_t,
        );
  • %DEVICE_PARAMETERS%

    These are the paramters in the device kernel definition. For the statemonitor, it looks like this:

    __global__ void
    kernel_statemonitor_codeobject(
        ...,  // parameters that are hard coded in the tempalte
        ///// DEVICE_PARAMETERS /////
          double* par__array_statemonitor_t,
        )
    { // kernel implementation }
  • %KERNEL_PARAMETERS%

    These are the cuda_standalone equivalent to cpp_standalones {{pointer_lines}}. Here, we can add keywords such as __restrict (the cuda version of __restrict__). Currently, we are not doing that yet (see #53). Instead we are just renaming the pointer such that they are compatible with the pointer names used in {{scalar_code}} and {{vector_code}}, which is the same as in cpp_standalone. For the statemonitor, it looks like this:

    ///// KERNEL_PARAMETERS /////
      double* _ptr_array_statemonitor_t = par__array_statemonitor_t;

If t is used in the {{scalar_code}} or {{vector_code}} in the template, it will be called _ptr_array_statemonitor_t. Currently we do copy a bunch of variable like this even when they are not used. This is actually the case f _ptr_array_statemonitor_t. I once tested performance when removing the unnecessary KERNEL_PARAMETERS and didn't see any improvement. So they might just get optimized out by the compiler. But there might be some effect on register usage in the kernel that could be investigated, see #69.