Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixes radix sort error #812

Merged
merged 1 commit into from
Jan 21, 2019
Merged

Conversation

rosenrodt
Copy link
Contributor

insert memory fence to avoid AMD OpenCL compiler reordering the read/writes; fixes issue #811

CTest log for Vega56

99% tests passed, 2 tests failed out of 169
 
Total Test time (real) =  85.07 sec

The following tests FAILED:
 	143 - misc.amd_cpp_kernel_language (Failed)
 	148 - example.amd_cpp_kernel (Exit code 0xc0000409)
Errors while running CTest

@rosenrodt
Copy link
Contributor Author

I am closing this PR because some issues still spill over after a clean rebuild on my test machine. Will reopen when it's done

@coveralls
Copy link

coveralls commented Jan 15, 2019

Coverage Status

Coverage remained the same at 84.02% when pulling 81dc8c7 on rosenrodt:sort-failure-on-amd-vega into 2135633 on boostorg:develop.

@jszuppe
Copy link
Contributor

jszuppe commented Jan 17, 2019

You reopened the PR, but the code is the same. How is mem_fence related to what you wrote in #811 (comment)?

@rosenrodt
Copy link
Contributor Author

I closed and reopened the PR because I was unsure of the cause of failed assertions when invoking is_sorted() on sorted arrays, but later figured it was totally unrelated to radix sort. Details were described in my comment here

The PR is to make sure the memory access is in correct order and not mangled up by the compiler, therefore the mem_fence in line 179. But I rarely encounter such a case where compiler is not respecting obvious variable dependencies. So this is just an ad hoc fix until maybe we can get AMD to look into this, especially by the fact that it only happens on AMD Vega/RX5xx and not the older chips nor chips from other vendors.

Another mem_fence is inserted for no apparent reason because it has to be like this to work on Vegas.

@jszuppe
Copy link
Contributor

jszuppe commented Jan 18, 2019

Hmm.. At least a fence for global memory would make more sense, does that option work too?

@rosenrodt
Copy link
Contributor Author

Fences for global, local or both also work. So it won't make a difference. But I think you're right on that: global fencing is more appropriate in literal sense

@jszuppe
Copy link
Contributor

jszuppe commented Jan 18, 2019

If I'm correct this kernel is run with one thread (via enqueue_task), so those fences should not have any performance cost. Because of that it shouldn't be a problem to include that workaround after:

@rosenrodt
Copy link
Contributor Author

bump 😃
@jszuppe anything else that needs to be done?

@jszuppe
Copy link
Contributor

jszuppe commented Jan 21, 2019

No, it's ok. I just didn't have time to look at it.

@jszuppe jszuppe merged commit 5f283bf into boostorg:develop Jan 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants