This repository has been archived by the owner on Jul 31, 2024. It is now read-only.

compilation error in gpu example gpu_device_timer #244

Closed
pkestene opened this issue Feb 6, 2022 · 4 comments

Comments

pkestene commented Feb 6, 2022

Hello,

I'm new to timemory.
I was just trying to build with CUDA/GPU support, and I get a compilation error when building the GPU examples.
It seems a bit weird to me. The compiler doesn't seem to be able to find the right overload of data_tracker::store; I don't see anything wrong in the code.

Here is the full compilation command and the error:

[ 93%] Building CUDA object examples/ex-gpu/v3/CMakeFiles/ex_kernel_instrument_v3.dir/gpu_device_timer.cpp.o
cd /home/pkestene/install/timemory/git/timemory/build/cuda/examples/ex-gpu/v3 && /usr/local/cuda-11.6/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/usr/bin/c++ -DTIMEMORY_CMAKE -DTIMEMORY_USE_BACKENDS_EXTERN -DTIMEMORY_USE_COMMON_EXTERN -DTIMEMORY_USE_COMPONENT_EXTERN -DTIMEMORY_USE_CONFIG_EXTERN -DTIMEMORY_USE_CONTAINERS_EXTERN -DTIMEMORY_USE_CORE_EXTERN -DTIMEMORY_USE_CUDA -DTIMEMORY_USE_CUDA_EXTERN -DTIMEMORY_USE_DATA_TRACKER_EXTERN -DTIMEMORY_USE_ERT_EXTERN -DTIMEMORY_USE_EXTERN -DTIMEMORY_USE_GPU -DTIMEMORY_USE_IO_EXTERN -DTIMEMORY_USE_LIBUNWIND -DTIMEMORY_USE_MANAGER_EXTERN -DTIMEMORY_USE_NETWORK_EXTERN -DTIMEMORY_USE_NVTX -DTIMEMORY_USE_OPERATIONS_EXTERN -DTIMEMORY_USE_PRINTER_EXTERN -DTIMEMORY_USE_RUNTIME_EXTERN -DTIMEMORY_USE_RUSAGE_EXTERN -DTIMEMORY_USE_STATISTICS -DTIMEMORY_USE_STORAGE_EXTERN -DTIMEMORY_USE_TIMESTAMP_EXTERN -DTIMEMORY_USE_TIMING_EXTERN -DTIMEMORY_USE_TRIP_COUNT_EXTERN -DTIMEMORY_USE_USER_BUNDLE_EXTERN -DTIMEMORY_USE_VARIADIC_EXTERN -DTIMEMORY_USE_XML -DTIMEMORY_VEC=256 -DUNW_LOCAL_ONLY -Dex_kernel_instrument_v3_EXPORTS -I/home/pkestene/install/timemory/git/timemory/build/cuda/source -I/home/pkestene/install/timemory/git/timemory/source -I/usr/local/cuda-11.6/include -isystem=/usr/local/cuda-11.6/targets/x86_64-linux/include -arch=sm_75 -O3 -DNDEBUG --generate-code=arch=compute_75,code=[compute_75,sm_75] -arch=sm_75 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_75,code=compute_75 --extended-lambda -Xcompiler=-W -Xcompiler=-Wall -Xcompiler=-Wno-unknown-pragmas -Xcompiler=-Wno-ignored-attributes -Xcompiler=-Wno-attributes -Xcompiler=-Wno-missing-field-initializers -Xcompiler=-Wno-class-memaccess -Xcompiler=-fno-signaling-nans -Xcompiler=-fno-trapping-math -Xcompiler=-fno-signed-zeros -Xcompiler=-ffinite-math-only -Xcompiler=-fno-math-errno -Xcompiler=-fpredictive-commoning -Xcompiler=-fvariable-expansion-in-unroller -Xcompiler=-faligned-new -Xcompiler=-ftls-model=initial-exec -Xcompiler=-rdynamic 
-Xcompiler=-finline-functions -Xcompiler=-funroll-loops -Xcompiler=-ftree-vectorize -Xcompiler=-ftree-loop-optimize -Xcompiler=-ftree-loop-vectorize -lineinfo -std=c++14 -x cu -c /home/pkestene/install/timemory/git/timemory/examples/ex-gpu/v3/gpu_device_timer.cpp -o CMakeFiles/ex_kernel_instrument_v3.dir/gpu_device_timer.cpp.o
/home/pkestene/install/timemory/git/timemory/examples/ex-gpu/v3/gpu_device_timer.hpp(134): warning #177-D: variable "_data" was declared but never referenced

/home/pkestene/install/timemory/git/timemory/source/timemory/components/data_tracker/components.hpp(677): error: no instance of overloaded function "tim::component::data_tracker<InpT, Tag>::store [with InpT=double, Tag=gpu_data_tag]" matches the argument list
            argument types are: (std::plus<double>, double)
            object type is: tim::component::data_tracker<double, gpu_data_tag>
          detected during instantiation of "tim::component::data_tracker<InpT, Tag>::this_type *tim::component::data_tracker<InpT, Tag>::add_secondary(const std::string &, FuncT &&, T &&, tim::component::data_tracker<InpT, Tag>::enable_if_acceptable_t<T, int>) [with InpT=double, Tag=gpu_data_tag, FuncT=std::plus<double>, T=double &]" 
/home/pkestene/install/timemory/git/timemory/examples/ex-gpu/v3/gpu_device_timer.cpp(90): here

The host compiler is g++-11, but I also tried g++-10 and the error is still there.

Any help appreciated.

Collaborator

jrmadsen commented Feb 8, 2022

Interesting... that overload is used quite often. Could you try replacing std::plus<double>{} with a lambda, e.g. [](double lhs, double rhs) { return lhs + rhs; }?
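
To make the suggested swap concrete, here is a minimal sketch that does not use timemory at all. mock_tracker, accumulate_with_functor, and accumulate_with_lambda are hypothetical names introduced only for illustration, and the SFINAE constraint is an assumption about how such an overload could be written, not a copy of timemory's data_tracker. It shows a two-argument store overload that accepts either std::plus<double>{} or the equivalent lambda; a conforming host compiler should resolve both calls to the same overload.

```cpp
#include <functional>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a tracker component; NOT timemory's data_tracker.
template <typename InpT>
struct mock_tracker
{
    // Overload 1: overwrite the stored value.
    void store(const InpT& val) { m_value = val; }

    // Overload 2: combine the stored value with a new one via a binary functor.
    // Participates in overload resolution only if f(InpT, InpT) is callable
    // and its result is convertible to InpT.
    template <typename FuncT,
              typename = std::enable_if_t<std::is_convertible<
                  decltype(std::declval<FuncT&>()(std::declval<InpT>(),
                                                  std::declval<InpT>())),
                  InpT>::value>>
    void store(FuncT&& f, const InpT& val)
    {
        m_value = std::forward<FuncT>(f)(m_value, val);
    }

    InpT get() const { return m_value; }

private:
    InpT m_value = InpT{};
};

// Uses the standard functor, as in the original example code.
double accumulate_with_functor(double a, double b)
{
    mock_tracker<double> t;
    t.store(a);
    t.store(std::plus<double>{}, b);
    return t.get();
}

// Uses the lambda suggested in the comment above.
double accumulate_with_lambda(double a, double b)
{
    mock_tracker<double> t;
    t.store(a);
    t.store([](double lhs, double rhs) { return lhs + rhs; }, b);
    return t.get();
}
```

Both helpers perform the same addition, so if one form compiles and the other does not, the difference lies in how the compiler handles the functor type rather than in the arithmetic.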

Collaborator

jrmadsen commented Feb 8, 2022

Ah, based on this line: [ 93%] Building CUDA object examples/ex-gpu/v3/CMakeFiles/ex_kernel_instrument_v3.dir/gpu_device_timer.cpp.o, I think this might be an NVCC bug. Unfortunately, NVCC is quite unreliable when it comes to templates. If the above fails, could you try another CUDA version, rather than a different GCC version, to verify whether it is a CUDA 11.6 bug?

Author

pkestene commented Feb 8, 2022

Thanks for your answer. Unfortunately:

  • I get the same error with CUDA toolkit 11.5.2.
  • If I change std::plus<double>{} to [](double lhs, double rhs) { return lhs + rhs; }, the error is similar:
/data/pkestene/install/timemory/git/timemory/source/timemory/components/data_tracker/components.hpp(677): error: no instance of overloaded function "tim::component::data_tracker<InpT, Tag>::store [with InpT=double, Tag=gpu_data_tag]" matches the argument list
            argument types are: (lambda [](double, double)->double, double)
            object type is: tim::component::data_tracker<double, gpu_data_tag>
          detected during instantiation of "tim::component::data_tracker<InpT, Tag>::this_type *tim::component::data_tracker<InpT, Tag>::add_secondary(const std::string &, FuncT &&, T &&, tim::component::data_tracker<InpT, Tag>::enable_if_acceptable_t<T, int>) [with InpT=double, Tag=gpu_data_tag, FuncT=lambda [](double, double)->double, T=double &]" 
/data/pkestene/install/timemory/git/timemory/examples/ex-gpu/v3/gpu_device_timer.cpp(92): here

Collaborator

jrmadsen commented Feb 8, 2022

Yeah, I was able to reproduce it. It is definitely an NVCC bug: if I make the necessary changes to compile gpu_device_timer.cpp and gpu_op_tracker.cpp with the host compiler (basically guarding the kernel launches and device functions with #if defined(TIMEMORY_GPUCC) and tweaking the CMakeLists.txt to only set ex_kernel_instrument.cpp as a CUDA source), then it compiles and runs fine. Let me think a bit more about how this should be handled and get back to you, because I am getting tired of having to create workarounds for templates with NVCC, e.g. #237.
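
The guarding approach described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual change made to the examples: scale_kernel and scale are hypothetical names, the host fallback loop is added only so the sketch does something useful without a GPU, and CUDA error checking is omitted for brevity. Only the TIMEMORY_GPUCC macro name comes from the comment; timemory's headers are not included here, so the guard is true only if the macro is defined some other way (for example on the command line).

```cpp
#include <cstddef>
#include <vector>

#if defined(TIMEMORY_GPUCC)
#include <cuda_runtime.h>

// Device code: only seen when the guard macro is defined, so a plain host
// compiler never has to parse __global__ or the <<<...>>> launch syntax.
__global__ void scale_kernel(double* data, std::size_t n, double factor)
{
    std::size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if(i < n)
        data[i] *= factor;
}
#endif

// Hypothetical helper: multiplies every element by factor.
void scale(std::vector<double>& values, double factor)
{
    if(values.empty())
        return;
#if defined(TIMEMORY_GPUCC)
    // GPU path (error checking omitted in this sketch).
    double*           dev   = nullptr;
    const std::size_t bytes = values.size() * sizeof(double);
    cudaMalloc(&dev, bytes);
    cudaMemcpy(dev, values.data(), bytes, cudaMemcpyHostToDevice);
    const unsigned block = 256;
    const unsigned grid  = (values.size() + block - 1) / block;
    scale_kernel<<<grid, block>>>(dev, values.size(), factor);
    cudaMemcpy(values.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
#else
    // Host path: an illustrative fallback, used when the guard is undefined.
    for(auto& v : values)
        v *= factor;
#endif
}
```

With this layout, template-heavy code in the same translation unit can be built by the host compiler, while only the files that genuinely need device code are handed to NVCC.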

3 participants