Optimize compilation time for the common case #400
gpuCI: NVIDIA/thrust#1557
Thanks @senior-zero, this is a huge improvement -- I'm seeing compilation time improve on both nvcc and nvc++. It isn't quite back to before, but it's a significant reduction!
Compile times for the `thrust::sort` test program:
Compiler | Old merge sort | Current merge sort | This PR |
---|---|---|---|
nvcc | 18.79s | 29.10s (+55%) | 22.44s (+19%) |
nvc++ | 61.81s | 75.61s (+22%) | 65.80s (+6%) |
LGTM -- Let's get this merged once the tests are passing and see how this impacts the total build time.
Related to NVBugs 3418930 and 3419768.
Could you explain why these changes reduce compilation time?
Hello, @dongxiao92! Two specializations of merge sort kernels exist:
Since the check for available shared memory is performed at runtime, we had to compile both specializations in the generic case. This patch relies on the Thrust approach, which compares the kernel's shared memory requirements with the default available shared memory size (48 KB). This check can be done at compile time. If we know at compile time that virtual shared memory is not required, there's no need to compile the merge sort kernels twice.
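To illustrate the idea, here is a minimal sketch of a compile-time size check. It is not the code from this PR: `hypothetical_agent`, `needs_fallback`, `kernel_selector`, and `storage_kind` are names made up for the example, and the only fact taken from the discussion above is the 48 KB default limit.

```cpp
#include <cstddef>
#include <type_traits>

// Default shared memory size mentioned above (48 KB).
constexpr std::size_t default_shared_memory_bytes = 48 * 1024;

// Hypothetical agent: its temporary storage size is a compile-time constant.
template <typename KeyT, int BlockThreads, int ItemsPerThread>
struct hypothetical_agent
{
  struct temp_storage
  {
    KeyT keys[BlockThreads * ItemsPerThread];
  };
};

// True when the agent's storage does not fit into the default shared memory,
// i.e. when the fallback ("virtual shared memory") path would be required.
template <typename AgentT>
struct needs_fallback
    : std::integral_constant<bool,
                             (sizeof(typename AgentT::temp_storage) >
                              default_shared_memory_bytes)>
{};

enum class storage_kind
{
  shared_memory,
  fallback
};

// Only the specialization selected by the compile-time check is instantiated,
// so the compiler never has to generate the other variant.
template <typename AgentT, bool UseFallback = needs_fallback<AgentT>::value>
struct kernel_selector;

template <typename AgentT>
struct kernel_selector<AgentT, false>
{
  static constexpr storage_kind kind = storage_kind::shared_memory;
};

template <typename AgentT>
struct kernel_selector<AgentT, true>
{
  static constexpr storage_kind kind = storage_kind::fallback;
};

// Hypothetical 64-byte key used to exceed the 48 KB limit.
struct large_key
{
  char bytes[64];
};

using small_agent = hypothetical_agent<int, 256, 17>;       // 17408 bytes
using large_agent = hypothetical_agent<large_key, 256, 11>; // 180224 bytes

static_assert(kernel_selector<small_agent>::kind == storage_kind::shared_memory,
              "small agent fits into default shared memory");
static_assert(kernel_selector<large_agent>::kind == storage_kind::fallback,
              "large agent needs the fallback path");
```

In the common case (`small_agent` here) the decision is made entirely by the type system, so only one kernel variant has to be compiled for it.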
This PR contains a compilation-time improvement for the common case, in which agents fit into the default shared memory size (48 KB).