Unsigned array bounds break in CUDA #808
Comments
An easy fix is to have the code generator cast the limits to signed ...
Index arithmetic in loopy is always assumed to be signed, but we're not doing a good job checking that. I agree that casting to signed in integer expressions is probably the cleanest way out of this.
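To illustrate the cast-to-signed idea, here is a minimal host-side C sketch. It is not loopy's generated code: the helper name `in_bounds_cast`, the choice of `int64_t` as the signed type, and the hard-coded tile size of 32 (borrowed from the expression quoted in the issue) are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical helper, not part of loopy: evaluates a tiled bounds check
   after casting the unsigned limit and indices to a signed type. */
static int in_bounds_cast(uint32_t nyc, uint32_t block, uint32_t thread)
{
    /* int64_t can represent every uint32_t value, so these casts are lossless. */
    int64_t n = (int64_t)nyc;
    int64_t b = (int64_t)block;
    int64_t t = (int64_t)thread;

    /* Same shape as the generated check: -1 - 32*block - thread + nyc >= 0.
       In signed arithmetic the left-hand side can go negative. */
    return -1 - 32 * b - t + n >= 0;
}
```

With `nyc = 40`, a thread at `block = 1, thread = 16` (row 48) is rejected, while `block = 1, thread = 7` (row 39) is accepted. Widening to 64 bits is only one option; casting to a 32-bit signed type would also work as long as the limit fits.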
TLDR: defining array sizes with unsigned integers and then tiling the resulting kernel will break the per-thread check to see if it is inside the loop domain. This is because the check is of the form `if (N - something >= 0) {`, which is always true if `N` is unsigned (at least in my testing).

Hefty mvp to reproduce (including PyCuda boilerplate)
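Independent of the full reproducer, the wraparound itself can be shown in a few lines of plain C. This is only a sketch: the function names are hypothetical, and the expression copies the shape of the generated check with `bIdx(y)` and `tIdx(y)` replaced by ordinary parameters.

```c
/* Hypothetical stand-in for the generated check with an unsigned limit.
   The usual arithmetic conversions turn every operand into unsigned int,
   so the left-hand side wraps around instead of going negative. */
static int check_unsigned(unsigned int nyc, unsigned int block, unsigned int thread)
{
    return -1 + -32 * block + -1 * thread + nyc >= 0; /* always true */
}

/* Same expression with signed operands: negative results are preserved. */
static int check_signed(int nyc, int block, int thread)
{
    return -1 + -32 * block + -1 * thread + nyc >= 0;
}
```

For `nyc = 40`, `block = 1`, `thread = 16`, the signed version evaluates `-9 >= 0` and rejects the thread, while the unsigned version accepts it. Many compilers flag the unsigned comparison as always true when extra warnings are enabled.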
For the unsigned case, I get the following device code:
which specifically breaks in the conditions of the `if` statements. Expressions like `-1 + -32 * bIdx(y) + -1 * tIdx(y) + nyc >= 0` are always true in unsigned arithmetic, so the above kernel will go past the correct array bounds.

Output I get with the unsigned kernel:
versus what I (correctly) get with a signed version, which is a ring of 0's around the exterior: