Add CUDA.peakflops() #203

maleadt · 2020-06-04T05:42:05Z

No description provided.

carstenbauer · 2022-02-23T13:27:36Z

I'm currently attempting something like this in a package (we could add this feature to CUDA.jl afterwards). However, I wonder how you would implement this to reliably achieve peak performance? (In particular, given that you marked this as "good first issue".)

Currently I'm taking a very straightforward approach:

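using CUDA, LinearAlgebra  # mul! comes from LinearAlgebra

function peakflops_gpu(; device=CUDA.device(), dtype=Float32, size=20_000, nmatmuls=5, nbench=5, verbose=true)
    device!(device) do
        # Allocate the operands and the output directly on the GPU.
        C = CUDA.zeros(dtype, size, size)
        A = CUDA.rand(dtype, size, size)
        B = CUDA.rand(dtype, size, size)

        # Keep the fastest of nbench timed batches; this also discards
        # the first batch, which includes one-time warm-up costs.
        t = Inf
        for i in 1:nbench
            Δt = CUDA.@elapsed for _ in 1:nmatmuls
                mul!(C, A, B)
            end
            t = min(t, Δt)
        end

        # NOTE: this counts size^3 operations per matmul; the usual
        # convention (one multiply and one add each) is 2 * size^3.
        flops = (size^3 * nmatmuls) / t
        if verbose
            printstyled("Peakflops (TFLOPS):\n"; bold=true)
            print(" └ max: ")
            printstyled(round(flops * 1e-12; digits=2), "\n"; color=:green, bold=true)
        end
        return flops
    end
end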

But this "only" gives

Peakflops (TFLOPS):
 └ max: 62.44

on an A100 SXM4 40GB, which (I think) has to be compared to the 156 TFLOPS that NVIDIA seems to report as peak performance. (Note that this is with CUDA.math_mode!(CUDA.FAST_MATH).)
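
One thing worth checking in a benchmark like this is the FLOP count itself. A dense n×n matrix multiplication is conventionally counted as 2n³ floating-point operations (n³ multiplications plus roughly n³ additions), whereas the snippet above counts n³. The sketch below is plain Julia and needs no GPU. The helper names `matmul_flops` and `tflops` are hypothetical, and the elapsed time is back-computed from the reported 62.44 figure rather than measured, so treat it as an illustration of the arithmetic only.

```julia
# Conventional FLOP count for a dense n×n matrix product:
# n^3 multiplications plus about n^3 additions.
matmul_flops(n) = 2 * float(n)^3

# Convert a timed batch of matmuls into TFLOPS (hypothetical helper).
tflops(n, nmatmuls, seconds) = matmul_flops(n) * nmatmuls / seconds * 1e-12

n, nmatmuls = 20_000, 5

# Elapsed time implied by the reported 62.44 "TFLOPS" under n^3 counting
# (back-computed for illustration, not a new measurement).
t = float(n)^3 * nmatmuls / 62.44e12

rate = tflops(n, nmatmuls, t)
println(round(rate; digits=2), " TFLOPS")
println(round(100 * rate / 156; digits=1), " % of 156 TFLOPS")
```

Under the 2n³ convention the same run corresponds to roughly 125 TFLOPS, or about 80% of the quoted 156 TFLOPS, so a large part of the apparent gap may come from the counting convention rather than from the benchmark.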

maleadt · 2022-02-23T14:02:22Z

There's also https://github.com/JuliaGPU/CUDA.jl/blob/master/examples/peakflops.jl, which I tried a couple of years ago. I deemed it a good first issue because it's not terribly complicated from the CUDA.jl/Julia side, but it does need some figuring out to come up with a good implementation, I guess.

maleadt added the "enhancement" (New feature or request) label on Jun 4, 2020

maleadt added the "good first issue" (Good for newcomers) label on Sep 28, 2021
