Implement AtomicFAddEXT for the CUDA BE #2853

Closed
AGindinson opened this issue Dec 2, 2020 · 5 comments
Labels: cuda (CUDA back-end), enhancement (New feature or request), performance (Performance related issues)

Comments

@AGindinson
Contributor

After 4fdbfae, preparations are in place to switch the floating-point atomic fetch_add/fetch_sub implementations over to the new SPIR-V instruction (AtomicFAddEXT). Providing a "native" implementation in the CUDA BE would let us use the same function for NVPTX targets as well (the #if !defined(__NVPTX__) macros would have to be removed to achieve this).
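For readers unfamiliar with the distinction, here is a minimal host-side C++ sketch of the kind of fallback that a native floating-point atomic add would replace: a compare-and-swap loop that emulates fetch_add on a float. The function name `fetch_add_cas` is hypothetical and is not taken from the SYCL headers or libclc. The code only illustrates the general technique, not the actual implementation guarded by the `__NVPTX__` macros.

```cpp
#include <atomic>

// Hypothetical helper (not from the SYCL headers): emulates a floating-point
// atomic fetch_add with a compare-and-swap loop. A native atomic FP add
// instruction would make this retry loop unnecessary.
inline float fetch_add_cas(std::atomic<float>& obj, float operand) {
    // Read the current value as the first guess.
    float expected = obj.load(std::memory_order_relaxed);
    // Retry until no other thread changed the value between our read and our
    // write. On failure, compare_exchange_weak reloads `expected`, so the
    // desired value `expected + operand` is recomputed on every iteration.
    while (!obj.compare_exchange_weak(expected, expected + operand,
                                      std::memory_order_relaxed)) {
    }
    // Like fetch_add, return the value observed before the addition.
    return expected;
}
```

Under contention the loop may retry several times, which is one reason a single native instruction tends to be preferable where the target provides it.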

@ldrumm
Contributor

ldrumm commented Jan 9, 2023

I think this is now implemented. It looks like @AGindinson did the bulk of this work in 37a9a2a.

Additionally, the relevant libclc support landed in the following PRs:
#4820
#4853
#5025
#5191

@AGindinson is there anything missing? Perhaps we can close this?

@AGindinson
Contributor Author

@AlexeySachkov, could you please help with evaluating this one?

@npmiller
Contributor

npmiller commented May 9, 2023

@AlexeySachkov @AGindinson any updates on this?

@AlexeySachkov
Contributor

> @AlexeySachkov @AGindinson any updates on this?

Not really. Neither of us is working directly on CUDA, so this item is a lower priority for both of us. Feel free to pick it up. I'm also fine with closing it if we believe that everything is already implemented.

@npmiller
Contributor

npmiller commented May 9, 2023

From a quick look at the headers, I don't see any #if !defined(__NVPTX__) usage for fetch_add/fetch_sub, so I believe this is implemented. I'm fine with closing it. What do you think, @ldrumm?

@ldrumm ldrumm closed this as completed May 9, 2023
jsji pushed a commit that referenced this issue Nov 22, 2024
The `OpSizeOf` instruction was added in SPIR-V 1.1, but not supported
yet.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@9aeb7eb92d7c0cb