Tutorial manually specifies number of threads #10

ArnoStrouwen · 2020-04-11T09:51:21Z

In the tutorial the number of threads is always specified manually. For some users it might be difficult to find a suitable number for their GPU. Could an example be added to the documentation on how to automate this?

maleadt · 2020-04-13T07:01:16Z

Thanks!

> Could an example be added to the documentation on how to automate this?

Sure, but I think it needs more explanation of why (the number of threads is not unbounded, and it also affects performance) and how (pass a closure instead, and use the occupancy API). You can also just pass config=configurator; see e.g. its uses in CuArrays.
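A minimal sketch of the "how", assuming the goal is one work item per thread: the occupancy API suggests a maximum block size, and the launch configuration follows from it by capping the thread count and rounding the block count up. The helper launch_dims is hypothetical (it is not part of CUDA.jl) and runs without a GPU; the commented lines show, as an assumption about current CUDA.jl usage, where its input would come from. gpu_add!, x and y are placeholders for one of the tutorial's kernels and its arrays.

```julia
# Hypothetical helper (not part of CUDA.jl): turn the maximum block size
# suggested by the occupancy API into a launch configuration that covers
# `total_threads` work items, one item per thread.
function launch_dims(total_threads::Integer, max_threads::Integer)
    total_threads > 0 || throw(ArgumentError("total_threads must be positive"))
    max_threads > 0 || throw(ArgumentError("max_threads must be positive"))
    threads = min(total_threads, max_threads)  # never exceed the kernel's limit
    blocks = cld(total_threads, threads)       # round up so every item is covered
    return (threads=threads, blocks=blocks)
end

# On a machine with a GPU, `max_threads` would come from the occupancy API
# instead of being hard-coded (gpu_add!, x and y are placeholders):
#
#   using CUDA
#   kernel = @cuda launch=false gpu_add!(y, x)  # compile without launching
#   config = launch_configuration(kernel.fun)   # occupancy-based suggestion
#   dims = launch_dims(length(y), config.threads)
#   kernel(y, x; threads=dims.threads, blocks=dims.blocks)
```

For an input of 2^20 elements and a suggested maximum of 1024 threads, this yields 1024 blocks of 1024 threads, the same numbers as the configurator discussed further down.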

ArnoStrouwen · 2020-05-22T03:32:54Z

Why not directly use the number of blocks suggested by launch_configuration? For the example in the tutorial, that gives me blocks = 16, which is about 10% faster for me than the blocks = 1024 computed by:

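    # Closure: `total_threads` (e.g. length(y)) must be defined in the enclosing scope.
    function configurator(kernel)
        # Ask the occupancy API for a suitable block size for this compiled kernel.
        config = launch_configuration(kernel.fun)

        # Use at most the suggested number of threads per block...
        threads = min(total_threads, config.threads)
        # ...and enough blocks (rounded up) to cover every element.
        blocks = cld(total_threads, threads)

        return (threads=threads, blocks=blocks)
    end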

maleadt · 2021-10-04T08:17:50Z

Sorry, forgot about this. I've included a reworked version of your suggestion in 179498a.

And to answer your question: the block count returned by the occupancy API is the number of blocks required to fully saturate the GPU. Whether you should actually launch that many depends on the input size. It can be useful to adapt the launch configuration based on this value (e.g. to decide whether or not to use a loop inside your kernel), but it's generally not possible to use it directly as the number of blocks to launch (unless your kernel is very generic).
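The decision described above can be sketched as follows, under stated assumptions: sat_blocks stands for config.blocks from launch_configuration, and choose_blocks and grid_stride_indices are hypothetical helpers, not CUDA.jl API. If one element per thread needs no more blocks than saturate the GPU, launch exactly that many; otherwise launch only the saturating count and make the kernel loop over the input with a grid-stride loop. grid_stride_indices models that loop on the CPU, so the coverage can be checked without a GPU. With such a loop, fewer blocks than cld(n, threads) still cover every element, which is presumably why a block count of 16 could work for the tutorial example.

```julia
# Hypothetical helper: pick a block count from the input size `n`, the block
# size `threads`, and the saturating block count `sat_blocks` (config.blocks).
function choose_blocks(n::Integer, threads::Integer, sat_blocks::Integer)
    needed = cld(n, threads)                       # one element per thread
    if needed <= sat_blocks
        return (blocks=needed, strided=false)      # small input: no loop needed
    else
        return (blocks=sat_blocks, strided=true)   # large input: kernel must loop
    end
end

# CPU model of a grid-stride loop: each "thread" starts at its global index
# and advances by the total number of launched threads until it passes `n`.
function grid_stride_indices(n::Integer, threads::Integer, blocks::Integer)
    visited = Int[]
    stride = threads * blocks
    for block in 1:blocks, thread in 1:threads
        i = (block - 1) * threads + thread         # 1-based global index
        while i <= n
            push!(visited, i)
            i += stride
        end
    end
    return visited
end

# The corresponding CUDA.jl kernel shape (illustrative, needs a GPU):
#
#   function gpu_add!(y, x)
#       index = (blockIdx().x - 1) * blockDim().x + threadIdx().x
#       stride = gridDim().x * blockDim().x
#       for i = index:stride:length(y)
#           @inbounds y[i] += x[i]
#       end
#       return
#   end
```

The trade-off is that the strided kernel works for any block count, whereas a kernel that handles exactly one element per thread must be launched with cld(n, threads) blocks.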

Tutorial manually specifies number of threads

48e2120

explanation for automating launch configuration

be2aa0b

maleadt added the documentation label May 25, 2020

maleadt force-pushed the master branch 11 times, most recently from d3147a2 to cf97309, June 12, 2020 15:49

maleadt closed this Oct 4, 2021

glwagner mentioned this pull request Feb 16, 2022

Cross-device copy of wrapped arrays fails #1377 (closed)
