Fix and improve summed variables implementation (including adding synapses to synapses support and start fixing spikegenerator) #186

Merged: 27 commits merged into master on Apr 8, 2021

Conversation

denisalevi (Member) commented on Sep 3, 2020

PR for #49 [WIP]

We have three versions in the commits below that should be benchmarked against each other (see #49 discussion):

  1. The original algorithm, launching synapses * targets threads in total (d2ff3ff)
  2. One thread per synapse, with each thread using atomicAdd on its synapse's target in global memory to compute the summed variables (c54e64c); a minimal kernel sketch follows this list
  3. Same as 2., but using cudaMemset instead of thrust::fill to reset the summed variables (653d487)
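As a point of reference for option 2, here is a minimal, hypothetical CUDA sketch of the one-thread-per-synapse approach; the names and signatures are illustrative and not taken from brian2cuda's actual templates. Note that `atomicAdd` on `double` requires compute capability 6.0 or newer.

```cuda
#include <cuda_runtime.h>

// Sketch of option 2: one thread per synapse, each thread atomically adds its
// synapse's value onto the summed variable of its postsynaptic target.
__global__ void sum_variable_kernel(const double* syn_value,  // per-synapse values
                                    const int* post_index,    // postsynaptic target per synapse
                                    double* summed_variable,  // per-target accumulator
                                    int num_synapses)
{
    int s = blockIdx.x * blockDim.x + threadIdx.x;
    if (s < num_synapses)
        atomicAdd(&summed_variable[post_index[s]], syn_value[s]);
}

void run_summed_variable(const double* d_syn_value, const int* d_post_index,
                         double* d_summed_variable, int num_synapses)
{
    // The accumulator must be reset to zero before each call
    // (options 2 and 3 differ only in how that reset is done).
    const int threads = 1024;
    const int blocks = (num_synapses + threads - 1) / threads;
    sum_variable_kernel<<<blocks, threads>>>(d_syn_value, d_post_index,
                                             d_summed_variable, num_synapses);
}
```

Compared with option 1 (synapses * targets threads), this launches only one thread per synapse and relies on hardware atomics to resolve collisions on shared targets.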

EDIT
This PR turned out to touch a couple of issues at the same time.

I did not do the benchmarking that I suggested above. This could be done in #197 if it turns out to be relevant.

@denisalevi denisalevi changed the base branch from master to timedarray_implementation September 3, 2020 11:34
Commit messages from the PR (excerpts):

- This is needed when the index is used when setting group variables with boolean indexing (e.g. `syn.w['k == 0'] = ...`, where `k` is the multisynaptic index).
- This reverts commit 7b764bd. It was on the wrong Brian2 branch (future release); reapply this commit once Brian2 is updated.
- Parallelize threads over synapses; each thread does an atomicAdd of the synapse variable onto the postsynaptic neuron of its synapse.
- Better performance (eyeballed and assumed). Not 100% sure this works on all CUDA GPUs; asked on the NVIDIA forum: https://forums.developer.nvidia.com/t/can-i-set-a-floats-to-zero-with-cudamemset/153706 (a sketch of this cudaMemset reset follows this list).
- This also sets CONSTANTS correctly in host templates, where previously device pointers were used.
- Fixes a bug in periodicity (see #48) and makes sure the kernel works with multiple spikespaces due to homogeneous delays. See the comments in the template for potential optimizations.
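The cudaMemset question above comes down to the fact that `cudaMemset` writes a byte pattern: zeroing every byte yields the IEEE-754 value 0.0 for `float` and `double`, so it can replace a `thrust::fill` with zero (and only zero). A minimal, hypothetical sketch with illustrative names, not the actual template code:

```cuda
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/fill.h>

// d_summed stands in for a summed-variable array on the device.
void reset_summed_variable(thrust::device_vector<double>& d_summed)
{
    // Byte-wise zeroing (the idea behind 653d487): all-zero bytes are the
    // IEEE-754 representation of 0.0, so this is equivalent to filling with 0.0.
    cudaMemset(thrust::raw_pointer_cast(d_summed.data()), 0,
               d_summed.size() * sizeof(double));

    // Previous approach: thrust::fill, which works for any fill value.
    // thrust::fill(d_summed.begin(), d_summed.end(), 0.0);
}
```

For any non-zero reset value the byte-pattern trick does not apply, and `thrust::fill` (or a small kernel) would still be needed.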
@denisalevi denisalevi changed the base branch from timedarray_implementation to master March 24, 2021 12:08
@denisalevi denisalevi changed the title Fix and improve summed variables implementation Fix and improve summed variables implementation (including adding synapses to synapses support and start fixing spikegenerator) Apr 8, 2021
@denisalevi denisalevi merged commit 61e4a76 into master Apr 8, 2021
denisalevi added a commit that referenced this pull request Jun 10, 2021
Fix and improve summed variables implementation (including adding synapses to synapses support and start fixing spikegenerator)