BUG: sortperm! seems to perform much slower than it should #2293

Closed
AhmedSalih3d opened this issue Mar 17, 2024 · 1 comment

@AhmedSalih3d

For full thread see:

https://discourse.julialang.org/t/how-to-sort-an-array-based-on-another-on-gpu-cuda-efficiently/111693/2

Code to reproduce:

using CUDA
using BenchmarkTools

function reorder_vectors!(sorted_indices, vec1, vec2, vec3)
    sortperm!(sorted_indices, vec1)
    vec1 .= vec1[sorted_indices]
    vec2 .= vec2[sorted_indices]
    vec3 .= vec3[sorted_indices]
end

### GPU TEST
# Initialize vectors
n = 10_000_000
vec1 = CUDA.rand(n)
vec2 = CUDA.rand(n)
vec3 = CUDA.rand(n)
sorted_indices = CUDA.zeros(Int, n)

# Benchmark memory allocation and execution time
mem_allocated = CUDA.@allocated reorder_vectors!(sorted_indices, vec1, vec2, vec3)
execution_time = @benchmark CUDA.@sync reorder_vectors!($sorted_indices, $vec1, $vec2, $vec3)

println("GPU Memory allocated: $mem_allocated bytes")
display(execution_time)

### CPU Test
vec1 = Array(vec1)
vec2 = Array(vec2)
vec3 = Array(vec3)
sorted_indices = zeros(Int, n)

# Benchmark memory allocation and execution time
mem_allocated  = @allocated reorder_vectors!(sorted_indices, vec1, vec2, vec3)
execution_time = @benchmark reorder_vectors!($sorted_indices, $vec1, $vec2, $vec3)

println("CPU Memory allocated: $mem_allocated bytes")
display(execution_time)
###

Results:

[image: screenshot of the benchmark results]

Hope I haven't done something silly :)

Kind regards

@AhmedSalih3d AhmedSalih3d added the bug Something isn't working label Mar 17, 2024
@maleadt
Member

maleadt commented Mar 18, 2024

This is not the kind of issue I was suggesting in https://discourse.julialang.org/t/how-to-sort-an-array-based-on-another-on-gpu-cuda-efficiently/111693/2?u=maleadt; if anything, it shows that GPU sort in isolation performs better than CPU sort. At the very least, the benchmarking problem needs to be cleared up, as it doesn't make sense that adding additional operations speeds up the sort (which is non-lazy).

Right now, this issue doesn't offer any additional information over #937, so closing in favor of that issue.
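
One thing that may contribute to the benchmarking problem mentioned above (a possibility, not something confirmed in this thread): `reorder_vectors!` overwrites its inputs, so every call after the first receives a `vec1` that is already sorted. The following is a minimal CPU-only sketch, using only Base and the `Random` standard library, that times each call on fresh copies of the inputs. The helper `min_time` and the smaller `n` are assumptions made for illustration and are not part of the original report.

```julia
using Random

# Same function as in the report, with an explicit `return nothing`.
function reorder_vectors!(sorted_indices, vec1, vec2, vec3)
    sortperm!(sorted_indices, vec1)
    vec1 .= vec1[sorted_indices]
    vec2 .= vec2[sorted_indices]
    vec3 .= vec3[sorted_indices]
    return nothing
end

# Hypothetical helper: minimum wall-clock time of `f` over `samples` runs.
# Each run gets fresh copies of the inputs, made outside the timed region,
# so no run ever sees data that a previous run already sorted.
function min_time(f, inputs...; samples = 5)
    best = Inf
    for _ in 1:samples
        fresh = map(copy, inputs)
        t = @elapsed f(fresh...)
        best = min(best, t)
    end
    return best
end

n = 1_000_000            # smaller than the report's 10_000_000, for a quick run
Random.seed!(1)
v1, v2, v3 = rand(Float32, n), rand(Float32, n), rand(Float32, n)
idx = zeros(Int, n)

sort_only!(i, a) = sortperm!(i, a)

# Warm-up runs so that compilation time is not measured below.
min_time(sort_only!, idx, v1; samples = 1)
min_time(reorder_vectors!, idx, v1, v2, v3; samples = 1)

t_sort = min_time(sort_only!, idx, v1)
t_full = min_time(reorder_vectors!, idx, v1, v2, v3)

println("sortperm! only: ", t_sort, " s")
println("full reorder:   ", t_full, " s")
```

On the GPU the timed call would also need to be wrapped in `CUDA.@sync`, as in the original script. With BenchmarkTools, the same idea can be expressed by copying the inputs in a `setup` expression and passing `evals=1`, so that each sample runs exactly once on unsorted data.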

@maleadt maleadt closed this as not planned Won't fix, can't repro, duplicate, stale Mar 18, 2024
@maleadt maleadt added performance How fast can we go? and removed bug Something isn't working labels Mar 18, 2024