Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix build issue with device_run_length_encode #31

Merged
merged 2 commits into from
Aug 28, 2018

Conversation

aaronenyeshi
Copy link
Contributor

This will fix the build errors on HIP-Clang, however, there is still failing 3 of 26 tests in rocprim.device_run_length_encode.

This will fix the build errors on HIP-Clang, however, there is still failing 3 of 26 tests in rocprim.device_run_length_encode.
@aaronenyeshi
Copy link
Contributor Author

aaronenyeshi commented Aug 24, 2018

This will fix the build issues with missing copy operator with __device__ attribute, and initialization is not support for __shared__ variables on CUDA.

Running RocprimDeviceRunLengthEncode still has some failures. Currently results are:

[----------] Global test environment tear-down
[==========] 26 tests from 13 test cases ran. (2628 ms total)
[  PASSED  ] 23 tests.
[  FAILED  ] 3 tests, listed below:
[  FAILED  ] RocprimDeviceRunLengthEncode/1.NonTrivialRuns, where TypeParam = params<double, int, 3u, 5u, false>
[  FAILED  ] RocprimDeviceRunLengthEncode/2.NonTrivialRuns, where TypeParam = params<float, int, 1u, 10u, false>
[  FAILED  ] RocprimDeviceRunLengthEncode/3.NonTrivialRuns, where TypeParam = params<unsigned long long, unsigned long, 1u, 30u, false>

 3 FAILED TESTS

Please refer to Issue #22 for more details.

@aaronenyeshi aaronenyeshi requested review from jszuppe and removed request for ex-rzr August 24, 2018 21:05
@aaronenyeshi
Copy link
Contributor Author

Full test output:

Running main() from /root/rocPRIM/build/googletest-src/googletest/src/gtest_main.cc
[==========] Running 26 tests from 13 test cases.
[----------] Global test environment set-up.
[----------] 2 tests from RocprimDeviceRunLengthEncode/0, where TypeParam = params<int, int, 1u, 1u, true>
[ RUN      ] RocprimDeviceRunLengthEncode/0.Encode
[       OK ] RocprimDeviceRunLengthEncode/0.Encode (223 ms)
[ RUN      ] RocprimDeviceRunLengthEncode/0.NonTrivialRuns
[       OK ] RocprimDeviceRunLengthEncode/0.NonTrivialRuns (396 ms)
[----------] 2 tests from RocprimDeviceRunLengthEncode/0 (619 ms total)

[----------] 2 tests from RocprimDeviceRunLengthEncode/1, where TypeParam = params<double, int, 3u, 5u, false>
[ RUN      ] RocprimDeviceRunLengthEncode/1.Encode
[       OK ] RocprimDeviceRunLengthEncode/1.Encode (150 ms)
[ RUN      ] RocprimDeviceRunLengthEncode/1.NonTrivialRuns
/root/rocPRIM/test/rocprim/test_hip_device_run_length_encode.cpp:405: Failure
Expected equality of these values:
  offsets_output[i]
    Which is: 893988769
  offsets_expected[i]
    Which is: 18518
Google Test trace:
/root/rocPRIM/test/rocprim/test_hip_device_run_length_encode.cpp:267: with size = 64316
[  FAILED  ] RocprimDeviceRunLengthEncode/1.NonTrivialRuns, where TypeParam = params<double, int, 3u, 5u, false> (15 ms)
[----------] 2 tests from RocprimDeviceRunLengthEncode/1 (165 ms total)

[----------] 2 tests from RocprimDeviceRunLengthEncode/2, where TypeParam = params<float, int, 1u, 10u, false>
[ RUN      ] RocprimDeviceRunLengthEncode/2.Encode
[       OK ] RocprimDeviceRunLengthEncode/2.Encode (118 ms)
[ RUN      ] RocprimDeviceRunLengthEncode/2.NonTrivialRuns
/root/rocPRIM/test/rocprim/test_hip_device_run_length_encode.cpp:405: Failure
Expected equality of these values:
  offsets_output[i]
    Which is: 380994
  offsets_expected[i]
    Which is: 84640
Google Test trace:
/root/rocPRIM/test/rocprim/test_hip_device_run_length_encode.cpp:267: with size = 2020609
[  FAILED  ] RocprimDeviceRunLengthEncode/2.NonTrivialRuns, where TypeParam = params<float, int, 1u, 10u, false> (146 ms)
[----------] 2 tests from RocprimDeviceRunLengthEncode/2 (264 ms total)

[----------] 2 tests from RocprimDeviceRunLengthEncode/3, where TypeParam = params<unsigned long long, unsigned long, 1u, 30u, false>
[ RUN      ] RocprimDeviceRunLengthEncode/3.Encode
[       OK ] RocprimDeviceRunLengthEncode/3.Encode (82 ms)
[ RUN      ] RocprimDeviceRunLengthEncode/3.NonTrivialRuns
/root/rocPRIM/test/rocprim/test_hip_device_run_length_encode.cpp:405: Failure
Expected equality of these values:
  offsets_output[i]
    Which is: 42517
  offsets_expected[i]
    Which is: 21344
Google Test trace:
/root/rocPRIM/test/rocprim/test_hip_device_run_length_encode.cpp:267: with size = 64316
[  FAILED  ] RocprimDeviceRunLengthEncode/3.NonTrivialRuns, where TypeParam = params<unsigned long long, unsigned long, 1u, 30u, false> (11 ms)
[----------] 2 tests from RocprimDeviceRunLengthEncode/3 (93 ms total)

[----------] 2 tests from RocprimDeviceRunLengthEncode/4, where TypeParam = params<test_utils::custom_test_type<int>, unsigned int, 20u, 100u, false>
[ RUN      ] RocprimDeviceRunLengthEncode/4.Encode
[       OK ] RocprimDeviceRunLengthEncode/4.Encode (60 ms)
[ RUN      ] RocprimDeviceRunLengthEncode/4.NonTrivialRuns
[       OK ] RocprimDeviceRunLengthEncode/4.NonTrivialRuns (64 ms)
[----------] 2 tests from RocprimDeviceRunLengthEncode/4 (124 ms total)

[----------] 2 tests from RocprimDeviceRunLengthEncode/5, where TypeParam = params<float, unsigned long long, 100u, 400u, false>
[ RUN      ] RocprimDeviceRunLengthEncode/5.Encode
[       OK ] RocprimDeviceRunLengthEncode/5.Encode (62 ms)
[ RUN      ] RocprimDeviceRunLengthEncode/5.NonTrivialRuns
[       OK ] RocprimDeviceRunLengthEncode/5.NonTrivialRuns (62 ms)
[----------] 2 tests from RocprimDeviceRunLengthEncode/5 (124 ms total)

[----------] 2 tests from RocprimDeviceRunLengthEncode/6, where TypeParam = params<unsigned int, unsigned int, 200u, 600u, false>
[ RUN      ] RocprimDeviceRunLengthEncode/6.Encode
[       OK ] RocprimDeviceRunLengthEncode/6.Encode (58 ms)
[ RUN      ] RocprimDeviceRunLengthEncode/6.NonTrivialRuns
[       OK ] RocprimDeviceRunLengthEncode/6.NonTrivialRuns (57 ms)
[----------] 2 tests from RocprimDeviceRunLengthEncode/6 (115 ms total)

[----------] 2 tests from RocprimDeviceRunLengthEncode/7, where TypeParam = params<double, int, 100u, 2000u, false>
[ RUN      ] RocprimDeviceRunLengthEncode/7.Encode
[       OK ] RocprimDeviceRunLengthEncode/7.Encode (59 ms)
[ RUN      ] RocprimDeviceRunLengthEncode/7.NonTrivialRuns
[       OK ] RocprimDeviceRunLengthEncode/7.NonTrivialRuns (60 ms)
[----------] 2 tests from RocprimDeviceRunLengthEncode/7 (119 ms total)

[----------] 2 tests from RocprimDeviceRunLengthEncode/8, where TypeParam = params<test_utils::custom_test_type<double>, test_utils::custom_test_type<int>, 10u, 30000u, true>
[ RUN      ] RocprimDeviceRunLengthEncode/8.Encode
[       OK ] RocprimDeviceRunLengthEncode/8.Encode (76 ms)
[ RUN      ] RocprimDeviceRunLengthEncode/8.NonTrivialRuns
[       OK ] RocprimDeviceRunLengthEncode/8.NonTrivialRuns (74 ms)
[----------] 2 tests from RocprimDeviceRunLengthEncode/8 (150 ms total)

[----------] 2 tests from RocprimDeviceRunLengthEncode/9, where TypeParam = params<int, unsigned int, 1000u, 5000u, false>
[ RUN      ] RocprimDeviceRunLengthEncode/9.Encode
[       OK ] RocprimDeviceRunLengthEncode/9.Encode (56 ms)
[ RUN      ] RocprimDeviceRunLengthEncode/9.NonTrivialRuns
[       OK ] RocprimDeviceRunLengthEncode/9.NonTrivialRuns (59 ms)
[----------] 2 tests from RocprimDeviceRunLengthEncode/9 (115 ms total)

[----------] 2 tests from RocprimDeviceRunLengthEncode/10, where TypeParam = params<unsigned int, unsigned long, 2048u, 2048u, false>
[ RUN      ] RocprimDeviceRunLengthEncode/10.Encode
[       OK ] RocprimDeviceRunLengthEncode/10.Encode (62 ms)
[ RUN      ] RocprimDeviceRunLengthEncode/10.NonTrivialRuns
[       OK ] RocprimDeviceRunLengthEncode/10.NonTrivialRuns (57 ms)
[----------] 2 tests from RocprimDeviceRunLengthEncode/10 (120 ms total)

[----------] 2 tests from RocprimDeviceRunLengthEncode/11, where TypeParam = params<unsigned int, unsigned int, 1000u, 50000u, false>
[ RUN      ] RocprimDeviceRunLengthEncode/11.Encode
[       OK ] RocprimDeviceRunLengthEncode/11.Encode (57 ms)
[ RUN      ] RocprimDeviceRunLengthEncode/11.NonTrivialRuns
[       OK ] RocprimDeviceRunLengthEncode/11.NonTrivialRuns (60 ms)
[----------] 2 tests from RocprimDeviceRunLengthEncode/11 (117 ms total)

[----------] 2 tests from RocprimDeviceRunLengthEncode/12, where TypeParam = params<unsigned long long, test_utils::custom_test_type<double>, 100000u, 100000u, false>
[ RUN      ] RocprimDeviceRunLengthEncode/12.Encode
[       OK ] RocprimDeviceRunLengthEncode/12.Encode (252 ms)
[ RUN      ] RocprimDeviceRunLengthEncode/12.NonTrivialRuns
[       OK ] RocprimDeviceRunLengthEncode/12.NonTrivialRuns (251 ms)
[----------] 2 tests from RocprimDeviceRunLengthEncode/12 (503 ms total)

[----------] Global test environment tear-down
[==========] 26 tests from 13 test cases ran. (2628 ms total)
[  PASSED  ] 23 tests.
[  FAILED  ] 3 tests, listed below:
[  FAILED  ] RocprimDeviceRunLengthEncode/1.NonTrivialRuns, where TypeParam = params<double, int, 3u, 5u, false>
[  FAILED  ] RocprimDeviceRunLengthEncode/2.NonTrivialRuns, where TypeParam = params<float, int, 1u, 10u, false>
[  FAILED  ] RocprimDeviceRunLengthEncode/3.NonTrivialRuns, where TypeParam = params<unsigned long long, unsigned long, 1u, 30u, false>

 3 FAILED TESTS

@@ -44,6 +44,16 @@ namespace detail
template<class Key, class Value>
struct carry_out
{
ROCPRIM_HOST_DEVICE inline
carry_out& operator=(carry_out rhs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is that necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems that without this copy operator, the compiler will complain about using a default copy operator which has attribute __host__ but is being called by __device__ function. so I added this to add ROCPRIM_HOST_DEVICE.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be generated by the compiler?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's might be a CUDA spec limitation where __device__ functions can only call explicitly defined __device__ functions. I will discuss it with my colleague

Copy link
Contributor Author

@aaronenyeshi aaronenyeshi Aug 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec suggests without the attributes, it should be compiled for host code only :

The __host__ qualifier declares a function that is:
� Executed on the host,
� Callable from the host only.
It is equivalent to declare a function with only the __host__ qualifier or to declare
it without any of the __host__, __device__, or __global__ qualifier; in either
case the function is compiled for the host only. 

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, ok, I guess I worked with hcc too much and assumed implicitly generated special member functions like copy assignment operator get required attributes as needed.

Anyway, instead of implementing them it's better to write:

ROCPRIM_HOST_DEVICE inline
scan_by_key_pair& operator=(const scan_by_key_pair& rhs) = default;

That way there's no need for modification if member variables are changed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to do that at first, but it seems that the default is only defined with __host__ :(

In file included from /root/rocPRIM/test/rocprim/test_hip_device_run_length_encode.cpp:36:
In file included from /root/rocPRIM/rocprim/include/rocprim/rocprim.hpp:74:
In file included from /root/rocPRIM/rocprim/include/rocprim/device/device_reduce_by_key_hip.hpp:34:
/root/rocPRIM/rocprim/include/rocprim/device/detail/device_reduce_by_key.hpp:67:70: error: reference to
      __host__ function 'operator=' in __host__ __device__ function
    scan_by_key_pair& operator=(const scan_by_key_pair& rhs) = default;
                                                                     ^

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you try defaulting it separately for host and device? or better: only for device?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that works better, thanks Jakub

@@ -53,6 +63,14 @@ struct carry_out
template<class Value>
struct scan_by_key_pair
{
ROCPRIM_HOST_DEVICE inline
scan_by_key_pair& operator=(scan_by_key_pair rhs)
Copy link
Contributor

@jszuppe jszuppe Aug 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this?

@jszuppe
Copy link
Contributor

jszuppe commented Aug 27, 2018

This will fix the build errors on HIP-Clang, however, there is still failing 3 of 26 tests in rocprim.device_run_length_encode.

With failing rocprim.hip.device_select which is used in tests that fail for rocprim.device_run_length_encode we can't say if there's anything wrong in run_length_encode_non_trivial_runs() function. You can notice that test that run run_length_encode() function pass.

@jszuppe jszuppe requested a review from ex-rzr August 27, 2018 21:05
@jszuppe jszuppe merged commit 4704c84 into ROCm:develop Aug 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants