copy!(dst, src) and copyto!(dst, src) are significantly slower and allocate more memory than copyto!(dest, do, src, so[, N]) #126

Closed
colinxs opened this issue Jul 2, 2019 · 1 comment
Labels
cuda array Stuff about CuArray. performance How fast can we go?

Comments

colinxs commented Jul 2, 2019

I ran into some odd behavior today. Not quite sure what's going on so I thought I'd leave this here.

Cheers!

Describe the bug
copy!(dst, src) and copyto!(dst, src) are significantly slower and allocate more memory than copyto!(dest, do, src, so[, N])

To Reproduce

julia> using CuArrays, BenchmarkTools
julia> CuArrays.allowscalar(false);
julia> const x=rand(Float32, 1000,1000);
julia> const xc=cu(x);
julia> @btime @inbounds copyto!($xc, 1, $x, 1, length($x));
  353.422 μs (3 allocations: 64 bytes)
julia> @btime @inbounds copyto!($xc, 1, $x, 1);
  353.290 μs (3 allocations: 64 bytes)
julia> @btime @inbounds copyto!($xc, $x);
  584.373 μs (5 allocations: 3.81 MiB)
julia> @btime @inbounds copy!($xc, $x);
  581.387 μs (5 allocations: 3.81 MiB)

Expected behavior
For Array, there is virtually no difference in speed or allocations between any of the above variations. I would expect the same for CuArray.
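For reference, the CPU baseline described here can be checked without a GPU. The following is a minimal sketch using only Base; `alloc_bytes` is a hypothetical helper introduced for illustration (it is not part of CuArrays or BenchmarkTools), and it measures allocations only, not timing.

```julia
# CPU-only sketch: compare allocations of the copy variants on plain Arrays.
x = rand(Float32, 1000, 1000)
y = similar(x)

# Hypothetical helper: call once as a warm-up, then measure bytes allocated.
function alloc_bytes(f, dst, src)
    f(dst, src)                      # warm-up so compilation is not counted
    return @allocated f(dst, src)
end

a_offsets = alloc_bytes((d, s) -> copyto!(d, 1, s, 1, length(s)), y, x)
a_copyto  = alloc_bytes((d, s) -> copyto!(d, s), y, x)
a_copy    = alloc_bytes((d, s) -> copy!(d, s), y, x)

println("copyto! with offsets: ", a_offsets, " bytes")
println("copyto!(dst, src):    ", a_copyto, " bytes")
println("copy!(dst, src):      ", a_copy, " bytes")
```

If the three variants behave alike on Array, none of them should allocate anything close to `sizeof(x)` (about 3.81 MiB here), which is the size of the extra allocation seen in the two slow CuArray cases above.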

Build log
(don't know why it failed, I'll take a look at some point)

Could not find library 'cudnn'.

CuArrays.jl has been built successfully, but there were warnings.
Some functionality may be unavailable.

Environment details (please complete this section)
Details on Julia:

Julia Version 1.1.1
Commit 55e36cc308 (2019-05-16 04:10 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 6

Julia packages:

  • CuArrays.jl: 1.0.2
  • CUDAnative.jl: 2.2.0
  • CUDAdrv.jl: 3.0.1
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Fri_Feb__8_19:08:17_PST_2019
Cuda compilation tools, release 10.1, V10.1.105

Additional context

@maleadt maleadt transferred this issue from JuliaGPU/CuArrays.jl May 27, 2020
@maleadt maleadt added cuda array Stuff about CuArray. performance How fast can we go? labels May 27, 2020
maleadt commented Apr 27, 2024

I don't see a difference anymore on current CUDA.jl:

julia> @btime @inbounds copyto!($xc, 1, $x, 1, length($x));
  187.309 μs (5 allocations: 80 bytes)

julia> @btime @inbounds copyto!($xc, 1, $x, 1);
  193.229 μs (5 allocations: 80 bytes)

julia> @btime @inbounds copyto!($xc, $x);
  195.369 μs (5 allocations: 80 bytes)

julia> @btime @inbounds copy!($xc, $x);
  200.839 μs (5 allocations: 80 bytes)

@maleadt maleadt closed this as completed Apr 27, 2024