
Add CUDA.peakflops() #203

Open
maleadt opened this issue Jun 4, 2020 · 2 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@maleadt
Member

maleadt commented Jun 4, 2020

No description provided.

@maleadt maleadt added the enhancement New feature or request label Jun 4, 2020
@maleadt maleadt added the good first issue Good for newcomers label Sep 28, 2021
@carstenbauer
Member

I'm currently attempting something like this in a package (we could add this feature to CUDA.jl afterwards). However, I wonder how you would implement this to reliably achieve peak performance? (In particular, given that you marked this as "good first issue".)

Currently I'm taking a very straightforward approach:

using CUDA
using LinearAlgebra  # for mul!

function peakflops_gpu(; device=CUDA.device(), dtype=Float32, size=20_000, nmatmuls=5, nbench=5, verbose=true)
    device!(device) do
        C = CUDA.zeros(dtype, size, size)
        A = CUDA.rand(dtype, size, size)
        B = CUDA.rand(dtype, size, size)

        t = Inf
        for i in 1:nbench
            Δt = CUDA.@elapsed for _ in 1:nmatmuls
                mul!(C, A, B)
            end
            t = min(t, Δt)
        end

        flops = (size^3 * nmatmuls) / t
        if verbose
            printstyled("Peakflops (TFLOPS):\n"; bold=true)
            print(" └ max: ")
            printstyled(round(flops * 1e-12; digits=2), "\n"; color=:green, bold=true)
        end
        return flops
    end
end

But this "only" gives

Peakflops (TFLOPS):
 └ max: 62.44

on an A100 SXM4 40GB, which (I think) has to be compared against the 156 TFLOPS that NVIDIA seems to report as peak performance. (Note that this is with CUDA.math_mode!(CUDA.FAST_MATH).)
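One thing worth noting about the counting itself: the snippet above tallies size^3 flops per matmul, while the usual GEMM convention counts 2·n³ (one multiply and one add per inner-product step), which would roughly double the reported figure. A minimal sketch of that convention, with a made-up elapsed time purely for illustration:

    # Conventional FLOP count for C = A*B with n×n matrices:
    # each of the n^2 output elements needs n multiplies and n adds.
    gemm_flops(n) = 2 * n^3

    # Hypothetical example: n = 20_000 and an illustrative (not measured) time.
    n = 20_000
    t = 0.25                                 # seconds, made up for the sketch
    tflops = gemm_flops(n) / t * 1e-12       # flops/s → TFLOPS
    println(round(tflops; digits=2))

Whether peakflops should use the n³ or 2n³ convention is exactly the kind of detail a CUDA.peakflops() implementation would have to pin down to be comparable with vendor numbers.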

@maleadt
Copy link
Member Author

maleadt commented Feb 23, 2022

There's also https://github.com/JuliaGPU/CUDA.jl/blob/master/examples/peakflops.jl, which I tried a couple of years ago. I deemed it a good first issue because it's not terribly complicated from the CUDA.jl/Julia side, but it does need some figuring out to come up with a good implementation, I guess.
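For interface inspiration, base Julia already ships a CPU analogue: LinearAlgebra.peakflops times a double-precision BLAS gemm and returns an estimated flops/s. A CUDA.peakflops() could plausibly mirror this signature (the keyword names any GPU version would use are an open design question, not something decided in this thread):

    using LinearAlgebra

    # CPU counterpart: times a Float64 gemm via BLAS and returns flops/s.
    # A smaller n keeps the timing quick; the default is 4096.
    flops = LinearAlgebra.peakflops(2048)
    println(round(flops * 1e-9; digits=2), " GFLOPS")

Matching this interface would let users compare CPU and GPU peak estimates with the same call shape.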
