-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix build issue with device_run_length_encode #31
Fix build issue with device_run_length_encode #31
Conversation
This will fix the build errors on HIP-Clang, however, there is still failing 3 of 26 tests in rocprim.device_run_length_encode.
This will fix the build issues with missing copy operator with Running RocprimDeviceRunLengthEncode still has some failures. Currently results are:
Please refer to Issue #22 for more details. |
Full test output:
|
@@ -44,6 +44,16 @@ namespace detail | |||
template<class Key, class Value> | |||
struct carry_out | |||
{ | |||
ROCPRIM_HOST_DEVICE inline | |||
carry_out& operator=(carry_out rhs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is that necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems that without this copy operator, the compiler will complain about using a default copy operator which has attribute __host__
but is being called by __device__
function. so I added this to add ROCPRIM_HOST_DEVICE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it be generated by the compiler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's might be a CUDA spec limitation where __device__
functions can only call explicitly defined __device__
functions. I will discuss it with my colleague
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The spec suggests without the attributes, it should be compiled for host code only :
The __host__ qualifier declares a function that is:
� Executed on the host,
� Callable from the host only.
It is equivalent to declare a function with only the __host__ qualifier or to declare
it without any of the __host__, __device__, or __global__ qualifier; in either
case the function is compiled for the host only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, ok, I guess I worked with hcc
too much and assumed implicitly generated special member functions like copy assignment operator get required attributes as needed.
Anyway, instead of implementing them it's better to write:
ROCPRIM_HOST_DEVICE inline
scan_by_key_pair& operator=(const scan_by_key_pair& rhs) = default;
That way there's no need for modification if member variables are changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to do that at first, but it seems that the default
is only defined with __host__
:(
In file included from /root/rocPRIM/test/rocprim/test_hip_device_run_length_encode.cpp:36:
In file included from /root/rocPRIM/rocprim/include/rocprim/rocprim.hpp:74:
In file included from /root/rocPRIM/rocprim/include/rocprim/device/device_reduce_by_key_hip.hpp:34:
/root/rocPRIM/rocprim/include/rocprim/device/detail/device_reduce_by_key.hpp:67:70: error: reference to
__host__ function 'operator=' in __host__ __device__ function
scan_by_key_pair& operator=(const scan_by_key_pair& rhs) = default;
^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you try defaulting it separately for host and device? or better: only for device?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that works better, thanks Jakub
@@ -53,6 +63,14 @@ struct carry_out | |||
template<class Value> | |||
struct scan_by_key_pair | |||
{ | |||
ROCPRIM_HOST_DEVICE inline | |||
scan_by_key_pair& operator=(scan_by_key_pair rhs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And this?
With failing |
This will fix the build errors on HIP-Clang, however, there is still failing 3 of 26 tests in rocprim.device_run_length_encode.