Add cuda implementation for TimedArray #181
Conversation
We copied the `CPPCodeGenerator` class instead of inheriting from it because of function implementation issues (see PR #39). This just ports some changes from `CPPCodeGenerator`.
See brian-team/brian2@ffde29e: function namespaces are dealt with in `CPPStandaloneDevice` instead of `CPPCodeGenerator`. We inherit the change from `CPPStandaloneDevice`, but didn't port it to `CUDACodeGenerator` (which does not inherit from `CPPCodeGenerator`, see PR #39).
- Only works for `TimedArray` in device code
- Bug: `TimedArray` seems to not always use the code object owner's clock; needs a fix
Clocks only run on the host. It seems my previous attempt at copying time variables to the host when needed did not work in all cases.
brian2cuda/cuda_generator.py
```python
# support_code.append('__device__ {0} _namespace{1};'.format(type_str,
#                                                            ns_key))
# pointers.append('_namespace{0} = {1};'.format(ns_key, ns_key))
```
While implementing TimedArray, I came across some code that has been updated in Brian like 5 years ago but never made it into Brian2CUDA 😅. So I've been updating some things and came across this section here on function namespaces (guess this changed in brian-team/brian2#380). But I don't think I actually understand what the use case is where I need these namespaces. @mstimberg Could you shed some light here? :)
This function is copied from `CPPCodeGenerator` to `CUDACodeGenerator`. I have deleted the `support_code` line here because our variables are defined globally in `objects.cu` anyway, and since we have both host and device pointers, the template determines which variable should be used (our synapse generator template runs on the host, all other code objects run on the device). But I didn't quite get why C++ needs this file-global variable in the first place: aren't the variables in `objects.cpp` global as well?

And we don't use the `pointers` list; we do all that in `brian2cuda/device.py`.

The question is: what situations am I breaking here? For the `TimedArray` implementation, I need the device pointer `d_timedarray_values`, an array that is global anyway, so there is no need for the `_namespace...` version.
Final tests are running right now, but I don't think I saw any test failing because of a namespace issue.
Just came across a documentation section on user-defined namespaces. I guess that is it. And I found the Brian2 tests, which are not marked as standalone-compatible and therefore not run. I'll adapt them. Ok, I have a starting point now, I'll look into it.
Yes, if I remember correctly, this is mostly so that the values from the namespace are accessible under a known name for user-written code. In particular, the name is the same for runtime and standalone mode, so in principle you could share the C++ implementation, which was relevant when we still supported `weave`.
I played around with it and I still don't understand it. Here is an example I used with C++ standalone:
```python
set_device('cpp_standalone')

my_var = 0.1 * np.array(np.arange(5))

@implementation('cpp', '''
float foo(int i)
{
    using namespace brian;
    return my_var[i];
}''', namespace={'my_var': my_var})
@check_units(i=1, result=1)
def foo(i):
    return my_var[i]

G = NeuronGroup(5, 'v : 1')
G.run_regularly('v = foo(i)')
net = Network(G)
net.run(defaultclock.dt)
assert_allclose(G.v_[:], my_var)
```
This generates the following code in `neurongroup_run_regularly_codeobject.cpp`:
```cpp
// support code
static double* _namespacemy_var;
float foo(int i)
{
    using namespace brian;
    return my_var[i];
}

// pointer_lines (uses namespace brian)
_namespacemy_var = my_var;
```
The file-global `_namespacemy_var` variable is not used at all; instead, the global `my_var` variable is used. And the user still has to write `using namespace brian;` in their function implementation, otherwise it doesn't work.

If the goal was to make `my_var` accessible without having to add the `brian::` namespace, then the C++ function should use `_namespacemy_var`, I guess. But it doesn't.
And the only place in the brian2 code where I found this `_namespace{var}` mechanism actually being used is the `TimedArray` implementation, where the return value is hardcoded to be called `_namespace{name}` (see here).

I think I'm still missing something here?
Your function code gets included verbatim and does not get rewritten automatically (maybe it should?). The correct way to write your example if you follow the instructions in the documentation is to write:
```python
@implementation('cpp', '''
float foo(int i)
{
    return _namespacemy_var[i];
}''', namespace={'my_var': my_var})
```
It's not really relevant anymore, but back in the day with `weave` this would have worked for both runtime and standalone mode, whereas your variant hardcodes knowledge about standalone mode.
Ahh, I see. Got it now, I didn't see that part in the documentation. Alright, if it is in the docs, it should work with `_namespace{var}`. I implemented this now for brian2CUDA as well. Turns out the mechanism is even useful for brian2CUDA, since this way the same function can be used verbatim in both host and device code, and I can just let the `_namespace{var}` pointer point to either host or device data, depending on the code object, without the user having to care about it. Perfect.

Thanks again for your help!
This is an attempt at a cleaner solution that removes the unnecessary value -> pointer -> value copying of Clock variables we had before. But it breaks the `USES_VARIABLES` mechanism. I needed to redefine `_clock_t`/`dt` to not use the `[0]` index.
Hi @mstimberg, I have another implementation choice question here. All our clock variables are passed to the kernels by value. My solution here was to get the address of the kernel parameter, so that the scalar/vector code generated for `cpp_standalone` can stay unchanged:

```cpp
__global__ void kernel_codeobject(
    double _value_neurongroup_clock_t  // Clock variable passed by value
)
{
    ...
    // KERNEL_VARIABLES lines, getting the pointer
    double* _ptr_array_neurongroup_clock_t = &_value_neurongroup_clock_t;
    ...
    // scalar or vector code (unchanged from what cpp_standalone produces)
    double __source_value_neurongroup_t = _ptr_array_neurongroup_clock_t[0];
    // some array that uses t, e.g. timedarray
    const double _lio_1 = _timedarray(__source_value_neurongroup_t);
    ...
}
```

But I just realized that this is another hacky solution, because I didn't have all the Brian internals in mind well enough... So instead of extracting the pointer in the `KERNEL_VARIABLES` lines, the kernel can read the value directly:

```cpp
__global__ void kernel_codeobject(
    double _ptr_array_neurongroup_clock_t  // Clock variable passed by value
)
{
    ...
    // nothing in KERNEL_VARIABLES lines
    ...
    // scalar or vector code (removed the [0] in the t variable read)
    double __source_value_neurongroup_t = _ptr_array_neurongroup_clock_t;
    // some array that uses t, e.g. timedarray
    const double _lio_1 = _timedarray(__source_value_neurongroup_t);
    ...
}
```

This seems to be the cleaner solution, and it removes the unnecessary value -> pointer -> value copying. And since this becomes a bit more complicated, I thought maybe the easiest way would have been to just change the …

So, sorry for the endless post. Hope it is not too unclear. What is your take, how would you implement this? I realize I am adding some hacky, not-Brian-intended solutions everywhere, so I wanted to start asking for the intended approach when possible :P
I'm afraid I did not understand completely. The …
EDIT: I found some more issues that I would need to take care of even if I fixed the problem described below. So I will just take the solution I had, which worked. Thanks for answering, and sorry for the mess 😆

Sorry for all the details. The question evolved while writing it...
Yes, this is almost exclusively copied C++ standalone code (with minor modifications). The …

Ok, this seems to be quite a special case. Maybe just not worth the effort. But since I already sank so much time into understanding this, I'll suggest something ^^. We could:
Keep until brian2 is updated. Brian2 merged the change in PR brian-team/brian2#1215.

This introduces `{{kernel_lines}}` as a template parameter that can be filled in by `CUDACodeGenerator` to define code-object-specific extra lines in kernel functions.
Tests are passing. Merging.
Add cuda implementation for TimedArray
PR for #47