-
Notifications
You must be signed in to change notification settings - Fork 214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Default to per-thread stream semantics #395
Conversation
The current implementation isn't configurable, and defaults to these semantics. Thoughts (especially @vchuravy)? With KA using explicit streams and not relying on the default stream to synchronize the world (note that device synchronization still works like that) I don't think there should be any issues. |
Codecov Report
@@ Coverage Diff @@
## master #395 +/- ##
==========================================
- Coverage 78.99% 78.86% -0.13%
==========================================
Files 165 165
Lines 8731 8749 +18
==========================================
+ Hits 6897 6900 +3
- Misses 1834 1849 +15
Continue to review full report at Codecov.
|
I think from a KA perspective this fine, but we do currently synchronize against the
where How do normal memory copies behave? Do they still act as a global sync? |
I don't think so, as
No, they only sync the thread-local default stream (which is a major part of the benefit here, since we can load data on a thread/task without disturbing others). |
Seems to have been forgotten in #243
940fd3a
to
fbedaaa
Compare
@jonathan-laurent This is probably also an interesting change for AlphaZero.jl (as a heavy user of GPU multi-threading/tasking), maybe you could give it a go to see if it improves performance and/or otherwise affects the programming model? |
@ali-ramadhan and @glwagner, if you have time it would be good to try out Oceanigans on this, since last time we messed up synchronization the tests showed that relatively quickly. |
@maleadt This looks nice. |
No, this only has an impact when multiple threads perform GPU computations. |
This PR changes two things threading and stream-related:
As a result, we get the new default stream semantics where operations on streams don't get synchronized by the global default stream anymore:
Before:
After:
Another, maybe more important consequence is that we automatically get concurrent execution on different threads:
Before:
After:
Thanks to @marius311 for making me read https://developer.nvidia.com/blog/gpu-pro-tip-cuda-7-streams-simplify-concurrency/ again (this time implementing its features) :-)
cc @vchuravy @kshyatt