
Add cuda implementation for TimedArray #181

Merged
denisalevi merged 23 commits into master on Sep 3, 2020

Conversation

denisalevi
Member

PR for #47

We copied the CPPCodeGenerator class instead of inheriting from it because
of function implementation issues (see PR #39). This commit just ports over
some recent changes from CPPCodeGenerator.
See brian-team/brian2@ffde29e
Function namespaces are dealt with in CPPStandaloneDevice instead of
CPPCodeGenerator. We inherit that change from CPPStandaloneDevice, but
didn't port it to CUDACodeGenerator (which does not inherit from
CPPCodeGenerator, see PR #39).
- Only works for TimedArray in device code
- Bug: timedarray seems to not always use the codeobject owner's clock,
  needs fix
@denisalevi denisalevi changed the base branch from KERNEL_VARIABLES_cleanup to master August 26, 2020 18:19
@denisalevi denisalevi changed the base branch from master to KERNEL_VARIABLES_cleanup August 26, 2020 18:19
Clocks only run on the host. It seems my previous attempt at copying time
variables to the host when needed did not work in all cases.
Base automatically changed from KERNEL_VARIABLES_cleanup to master August 26, 2020 19:02
- Original commit that removed clock sync: 658925b
- With the commit before this one, I implemented a better (and working)
  way to pass time variables by value to kernels
With 51b467c, the rand/n function names changed and were not caught
anymore in the CUDAStandaloneDevice.
Comment on lines 589 to 591
#support_code.append('__device__ {0} _namespace{1};'.format(type_str,
#                                                           ns_key))
#pointers.append('_namespace{0} = {1};'.format(ns_key, ns_key))
Member Author

@denisalevi denisalevi Aug 26, 2020


While implementing TimedArray, I came across some code that was updated in Brian some 5 years ago but never made it into Brian2CUDA 😅. So I've been updating some things and came across this section here on function namespaces (I guess this changed in brian-team/brian2#380). But I don't think I actually understand the use case where I need these namespaces. @mstimberg Could you shed some light here? :)

This function is copied from CPPCodeGenerator to CUDACodeGenerator. I have deleted the support_code line here because our variables are defined globally in objects.cu anyway, and since we have both host and device pointers, it depends on the template which variable should be used (our synapse generator template runs on the host, all other codeobjects run on the device). But I didn't quite get why cpp needs this file-global variable in the first place; aren't the variables in objects.cpp global as well?
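
For context, here is a minimal self-contained CUDA sketch of what I mean by host/device pointer pairs (the array names are made up for illustration, not the actual generated ones):

#include <cuda_runtime.h>

// Sketch with hypothetical names: mirroring the idea in objects.cu, each
// array exists twice -- a plain host pointer for host-side templates (e.g.
// the synapse generator) and a __device__ global pointer for kernel code.
double* _array_neurongroup_v;               // host data
__device__ double* d_array_neurongroup_v;   // device data, visible in kernels

int main()
{
    const int n = 100;
    _array_neurongroup_v = new double[n];    // host allocation

    double* dev_ptr;
    cudaMalloc(&dev_ptr, n * sizeof(double));
    // Point the device-global symbol at the device allocation.
    cudaMemcpyToSymbol(d_array_neurongroup_v, &dev_ptr, sizeof(double*));

    // ... kernels can now read/write d_array_neurongroup_v[i] directly ...

    cudaFree(dev_ptr);
    delete[] _array_neurongroup_v;
    return 0;
}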

And we don't use the pointers list; we do all that in brian2cuda/device.py.

The question is: what situations am I breaking here? For the TimedArray implementation, I need the device pointer d_timedarray_values, which is global anyway, so there is no need for the _namespace... version.

Final tests are running right now, but I don't think I saw any test failing because of a namespace issue.

Member Author

@denisalevi denisalevi Aug 27, 2020


Just came across a documentation section on user-defined namespaces; I guess that is it. I also found the Brian2 tests, which are not marked as standalone-compatible and are therefore not run. I'll adapt them. OK, I have a starting point now; I'll look into it.

Member


Yes, if I remember correctly this is mostly so that the values from the namespace are accessible under a known name in user-written code. In particular, the name is the same for runtime and standalone mode, so in principle you could share the C++ implementation, which was relevant when we still supported weave.

Member Author


I played around with it and I still don't understand it. Here is an example I used with C++ standalone:

from brian2 import *
import numpy as np
from numpy.testing import assert_allclose

set_device('cpp_standalone')

my_var = 0.1 * np.arange(5)

@implementation('cpp', '''
float foo(int i)
{
    using namespace brian;
    return my_var[i];
}''', namespace={'my_var': my_var})
@check_units(i=1, result=1)
def foo(i):
    return my_var[i]

G = NeuronGroup(5, 'v : 1')
G.run_regularly('v = foo(i)')
net = Network(G)
net.run(defaultclock.dt)

assert_allclose(G.v_[:], my_var)

This generates the following code in neurongroup_run_regularly_codeobject.cpp:

// support code
static double* _namespacemy_var;
float foo(int i)
{
    using namespace brian;
    return my_var[i];
}

// pointer_lines (uses namespace brian)
_namespacemy_var = my_var;

The file-global _namespacemy_var variable is not used at all; instead, the global my_var variable is used. And the user still has to write using namespace brian; in their function implementation, otherwise it doesn't work.

If the goal was to make my_var accessible without having to add the brian:: namespace, then the cpp function should use _namespacemy_var, I guess. But it doesn't.

And the only place in the brian2 code that I found where this _namespace{var} mechanism is actually used is in the TimedArray implementation, where the return value is hardcoded to be called _namespace{name} (see here).

I think I'm still missing something here?

Member


Your function code gets included verbatim and does not get rewritten automatically (maybe it should?). The correct way to write your example if you follow the instructions in the documentation is to write:

@implementation('cpp', '''
float foo(int i)
{
    return _namespacemy_var[i];
}''', namespace={'my_var': my_var})

It's not really relevant anymore, but back in the day with weave this would have worked for both runtime and standalone mode, whereas your variant hardcodes knowledge about standalone mode.

Member Author


Ahh, I see. Got it now. I didn't see that part in the documentation. Alright, if it is in the docs, it should work with _namespace{var}. I implemented this now for Brian2CUDA as well. It turns out the mechanism is even useful for Brian2CUDA, since this way the same function can be used verbatim in both host and device code, and I can just let the _namespace{var} pointer point to either host or device data, depending on the codeobject, without the user having to care about it. Perfect.
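
To make this concrete, here is a small self-contained CUDA sketch of the device-side half of the mechanism (all names are made up for illustration; in the host-side compilation, the same foo body would simply be paired with a plain host pointer instead):

#include <cstdio>
#include <cuda_runtime.h>

// Device-side namespace pointer; set to device data before kernel launch.
__device__ double* _namespacemy_var;

// The user-supplied function body only refers to the _namespace name, so
// the same text can be pasted verbatim into host or device support code.
__device__ float foo(int i)
{
    return (float)_namespacemy_var[i];
}

__global__ void kernel(float* out, int n)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = foo(i);
}

int main()
{
    const int n = 5;
    double h_vals[n] = {0.0, 0.1, 0.2, 0.3, 0.4};
    double* d_vals;
    cudaMalloc(&d_vals, n * sizeof(double));
    cudaMemcpy(d_vals, h_vals, n * sizeof(double), cudaMemcpyHostToDevice);
    // Redirect the namespace pointer to the device copy of the data.
    cudaMemcpyToSymbol(_namespacemy_var, &d_vals, sizeof(double*));

    float* d_out;
    cudaMalloc(&d_out, n * sizeof(float));
    kernel<<<1, n>>>(d_out, n);

    float h_out[n];
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; i++)
        printf("foo(%d) = %f\n", i, h_out[i]);

    cudaFree(d_vals);
    cudaFree(d_out);
    return 0;
}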

Thanks again for your help!

This is an attempt at a cleaner solution that removes the unnecessary
value -> pointer -> value copying of Clock variables we had before. But
it breaks the USES_VARIABLES mechanism. I needed to redefine _clock_t/dt
to not use the [0] index.
@denisalevi
Member Author

Hi @mstimberg, I have another implementation choice question here.

All our Clocks are running on the host. We used to copy time variables to the device every time step, but that created too much overhead (could be implemented efficiently I guess, but we didn't). Instead, we are now passing Clock variables as kernel parameters (by value) only when they are needed. But when Clock variables are used in scalar_code or vector_code, they are expected to be pointers.

My solution here was to get the address of the Clock variables in %KERNEL_VARIABLES% (which is the cuda_standalone version of {{pointer_lines}}). Works fine and looks something like this:

__global__ void kernel_codeobject(
    double _value_neurongroup_clock_t  // Clock variable passed by value
)
{
    ...
    // KERNEL_VARIABLES lines, getting the pointer
    double* _ptr_array_neurongroup_clock_t = &_value_neurongroup_clock_t;
    ...
    // scalar or vector code (unchanged from what cpp_standalone produces)
    double __source_value_neurongroup_t = _ptr_array_neurongroup_clock_t[0];
    // some array that uses t, e.g. timedarray
    const double _lio_1 = _timedarray(__source_value_neurongroup_t);
    ...
}
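
As a sanity check, here is a runnable, self-contained version of this pattern (with made-up names and a dummy _timedarray). Taking the address of a by-value kernel parameter is valid, since the parameter lives in the thread's local scope:

#include <cstdio>
#include <cuda_runtime.h>

// Dummy stand-in for a timed array lookup.
__device__ double _timedarray(double t) { return 2.0 * t; }

__global__ void kernel_codeobject(double _value_neurongroup_clock_t,
                                  double* out)
{
    // KERNEL_VARIABLES line: recover a pointer so that unmodified
    // cpp_standalone code can keep indexing the clock variable with [0].
    double* _ptr_array_neurongroup_clock_t = &_value_neurongroup_clock_t;

    const double _source_t = _ptr_array_neurongroup_clock_t[0];
    *out = _timedarray(_source_t);
}

int main()
{
    double* d_out;
    cudaMalloc(&d_out, sizeof(double));
    kernel_codeobject<<<1, 1>>>(0.1, d_out);  // pass t = 0.1 by value

    double h_out;
    cudaMemcpy(&h_out, d_out, sizeof(double), cudaMemcpyDeviceToHost);
    printf("%f\n", h_out);  // expected: 0.200000

    cudaFree(d_out);
    return 0;
}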

But I just realized that this is another hacky solution because I didn't have all the Brian internals in mind well enough ... So instead of extracting the pointer in KERNEL_VARIABLES, I should maybe just change how time variables are read in CUDACodeGenerator? I did that just now in 02c29d8. Now the code from above would look like this:

__global__ void kernel_codeobject(
    double _value_neurongroup_clock_t  // Clock variable passed by value
)
{
    ...
    // nothing in KERNEL_VARIABLES lines
    ...
    // scalar or vector code (removed the [0] in the t variable read)
    double __source_value_neurongroup_t = _value_neurongroup_clock_t;
    // some array that uses t, e.g. timedarray
    const double _lio_1 = _timedarray(__source_value_neurongroup_t);
    ...
}

This seems to be the cleaner solution, and it removes the unnecessary value -> pointer -> value copies. The only problem I have now is that the USES_VARIABLES mechanism does not work anymore. I did change the names of Clock variables to _value{name} (instead of _ptr{name}) in CUDACodeGenerator.get_array_name() (see 02c29d8), so the variable names in the templates are correct when using USES_VARIABLES. But the indexing is wrong: it uses _value{name}[0] instead of _value{name}.

For the scalar_code and vector_code, I changed CUDACodeGenerator.translate_to_read_arrays() (see 02c29d8), but for the USES_VARIABLES mechanism, this seems to happen here in Device.code_object() and I didn't quite know how to change that. So for now I just redefine the _clock_t and _clock_dt variables in the templates (see e.g. ratemonitor.cu in 02c29d8). This works, but again: not such a clean solution.

And since this becomes a bit more complicated, I thought maybe the easiest way would be to just change the Clock variables themselves?

So, sorry for the endless post; I hope it is not too unclear. What is your take: how would you implement this? I realize I am adding some hacky, not-Brian-intended solutions everywhere, so I wanted to start asking for the intended approach when possible :P

@mstimberg
Member

I'm afraid I did not completely understand. The USES_VARIABLES mechanism only decides what is passed in the variables dictionary to the code generation process, so I'm not quite sure how this affects your code. Or is this about generated code that you are re-using from C++ standalone?

@denisalevi
Member Author

denisalevi commented Aug 27, 2020

EDIT: I found some more issues that I would need to take care of even if I fixed the problem described below. So I will just stick with the solution I had that worked. Thanks for answering and sorry for the mess 😆


Sorry for all the details. The question evolved while I was writing it...

I'm afraid I did not completely understand. The USES_VARIABLES mechanism only decides what is passed in the variables dictionary to the code generation process, so I'm not quite sure how this affects your code. Or is this about generated code that you are re-using from C++ standalone?

Yes, this is almost exclusively copied C++ standalone code (with minor modifications).

The USES_VARIABLES mechanism adds items to the variables dictionary, but it also adds the correct array name to template_kwds, such that templates can use e.g. {{t}} (this happens here in Device.code_object()). The array name is determined through device.get_array_name(), which I define in CUDACodeGenerator (just as CPPCodeGenerator does). The only thing I can't control is that little [0] index added in line 304 🙃.

Ok, this seems to be quite a special case. Maybe just not worth the effort. But since I already sank so much time into understanding this, I'll suggest something ^^. We could:

  1. Change that line 304 in Device.code_object() to

     pointer_name += generator.get_scalar_var_suffix(var)

    And define a new class method for CodeObject that all generators inherit but that I could override:

    def get_scalar_var_suffix(self, var):
        return '[0]'

    EDIT: If I ever want to implement this, I will also have to take care of the template that the variables are used in, since host and device code would differ. So it would need another argument, e.g. get_scalar_var_suffix(var, template_name), to decide what to do depending on the template / codeobject. Or it would just need Clock variables as non-pointers on the host too.

  2. Or just forget about it and I'll use the other solution that works for me :)

@denisalevi denisalevi merged commit 7c79915 into master Sep 3, 2020
@denisalevi
Member Author

Tests are passing. Merging.

denisalevi added a commit that referenced this pull request Jun 10, 2021
…ARIABLES"

This reverts commit 02c29d8.
Decided against this implementation, see my comments in PR #181.