
add nccl test example #3217

Merged: 4 commits merged into skypilot-org:master from the nccl branch on Jul 4, 2024

Conversation

asaiacai
Contributor

Adds an NCCL test example. Closes #3164. I've pasted a dump of my run on GCP below. I noticed that the network interface names change depending on the cloud provider, so to make the example more portable it might make sense to add something that greps the interface name (a sketch of that idea follows the log dump). The CUDA path would also be more consistent if we ran inside a container.

(worker1, rank=1, pid=20075, ip=10.164.15.193) 10.164.15.192,10.164.15.193
(head, rank=0, pid=21170) 10.164.15.192,10.164.15.193
(head, rank=0, pid=21170) Warning: Permanently added '10.164.15.193' (ECDSA) to the list of known hosts.
(head, rank=0, pid=21170) # nThread 1 nGpus 8 minBytes 8 maxBytes 134217728 step: 2(factor) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0
(head, rank=0, pid=21170) #
(head, rank=0, pid=21170) # Using devices
(head, rank=0, pid=21170) #  Rank  0 Group  0 Pid  21431 on nccl-ebd1-head-5vwmjam2-compute device  0 [0x00] NVIDIA A100-SXM4-40GB
(head, rank=0, pid=21170) #  Rank  1 Group  0 Pid  21431 on nccl-ebd1-head-5vwmjam2-compute device  1 [0x00] NVIDIA A100-SXM4-40GB
(head, rank=0, pid=21170) #  Rank  2 Group  0 Pid  21431 on nccl-ebd1-head-5vwmjam2-compute device  2 [0x00] NVIDIA A100-SXM4-40GB
(head, rank=0, pid=21170) #  Rank  3 Group  0 Pid  21431 on nccl-ebd1-head-5vwmjam2-compute device  3 [0x00] NVIDIA A100-SXM4-40GB
(head, rank=0, pid=21170) #  Rank  4 Group  0 Pid  21431 on nccl-ebd1-head-5vwmjam2-compute device  4 [0x80] NVIDIA A100-SXM4-40GB
(head, rank=0, pid=21170) #  Rank  5 Group  0 Pid  21431 on nccl-ebd1-head-5vwmjam2-compute device  5 [0x80] NVIDIA A100-SXM4-40GB
(head, rank=0, pid=21170) #  Rank  6 Group  0 Pid  21431 on nccl-ebd1-head-5vwmjam2-compute device  6 [0x80] NVIDIA A100-SXM4-40GB
(head, rank=0, pid=21170) #  Rank  7 Group  0 Pid  21431 on nccl-ebd1-head-5vwmjam2-compute device  7 [0x80] NVIDIA A100-SXM4-40GB
(head, rank=0, pid=21170) #  Rank  8 Group  0 Pid  20335 on nccl-ebd1-worker-4mntuso6-compute device  0 [0x00] NVIDIA A100-SXM4-40GB
(head, rank=0, pid=21170) #  Rank  9 Group  0 Pid  20335 on nccl-ebd1-worker-4mntuso6-compute device  1 [0x00] NVIDIA A100-SXM4-40GB
(head, rank=0, pid=21170) #  Rank 10 Group  0 Pid  20335 on nccl-ebd1-worker-4mntuso6-compute device  2 [0x00] NVIDIA A100-SXM4-40GB
(head, rank=0, pid=21170) #  Rank 11 Group  0 Pid  20335 on nccl-ebd1-worker-4mntuso6-compute device  3 [0x00] NVIDIA A100-SXM4-40GB
(head, rank=0, pid=21170) #  Rank 12 Group  0 Pid  20335 on nccl-ebd1-worker-4mntuso6-compute device  4 [0x80] NVIDIA A100-SXM4-40GB
(head, rank=0, pid=21170) #  Rank 13 Group  0 Pid  20335 on nccl-ebd1-worker-4mntuso6-compute device  5 [0x80] NVIDIA A100-SXM4-40GB
(head, rank=0, pid=21170) #  Rank 14 Group  0 Pid  20335 on nccl-ebd1-worker-4mntuso6-compute device  6 [0x80] NVIDIA A100-SXM4-40GB
(head, rank=0, pid=21170) #  Rank 15 Group  0 Pid  20335 on nccl-ebd1-worker-4mntuso6-compute device  7 [0x80] NVIDIA A100-SXM4-40GB
(head, rank=0, pid=21170) #
(head, rank=0, pid=21170) #                                                              out-of-place                       in-place          
(head, rank=0, pid=21170) #       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
(head, rank=0, pid=21170) #        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)       
(head, rank=0, pid=21170)            8             2     float     sum      -1    225.7    0.00    0.00      0    308.3    0.00    0.00      0
(head, rank=0, pid=21170)           16             4     float     sum      -1    310.2    0.00    0.00      0    310.9    0.00    0.00      0
(head, rank=0, pid=21170)           32             8     float     sum      -1    310.3    0.00    0.00      0    310.2    0.00    0.00      0
(head, rank=0, pid=21170)           64            16     float     sum      -1    312.0    0.00    0.00      0    310.1    0.00    0.00      0
(head, rank=0, pid=21170)          128            32     float     sum      -1    311.5    0.00    0.00      0    228.5    0.00    0.00      0
(head, rank=0, pid=21170)          256            64     float     sum      -1    310.6    0.00    0.00      0    227.7    0.00    0.00      0
(head, rank=0, pid=21170)          512           128     float     sum      -1    313.5    0.00    0.00      0    222.9    0.00    0.00      0
(head, rank=0, pid=21170)         1024           256     float     sum      -1    312.1    0.00    0.01      0    227.9    0.00    0.01      0
(head, rank=0, pid=21170)         2048           512     float     sum      -1    318.0    0.01    0.01      0    142.7    0.01    0.03      0
(head, rank=0, pid=21170)         4096          1024     float     sum      -1    145.1    0.03    0.05      0    145.9    0.03    0.05      0
(head, rank=0, pid=21170)         8192          2048     float     sum      -1    271.5    0.03    0.06      0    185.7    0.04    0.08      0
(head, rank=0, pid=21170)        16384          4096     float     sum      -1    264.9    0.06    0.12      0    257.7    0.06    0.12      0
(head, rank=0, pid=21170)        32768          8192     float     sum      -1    316.1    0.10    0.19      0    312.9    0.10    0.20      0
(head, rank=0, pid=21170)        65536         16384     float     sum      -1    344.3    0.19    0.36      0    349.1    0.19    0.35      0
(head, rank=0, pid=21170)       131072         32768     float     sum      -1    458.2    0.29    0.54      0    461.0    0.28    0.53      0
(head, rank=0, pid=21170)       262144         65536     float     sum      -1    665.3    0.39    0.74      0    621.0    0.42    0.79      0
(head, rank=0, pid=21170)       524288        131072     float     sum      -1   2422.5    0.22    0.41      0   3018.5    0.17    0.33      0
(head, rank=0, pid=21170)      1048576        262144     float     sum      -1   3908.3    0.27    0.50      0   3923.2    0.27    0.50      0
(head, rank=0, pid=21170)      2097152        524288     float     sum      -1   2499.8    0.84    1.57      0   2329.5    0.90    1.69      0
(head, rank=0, pid=21170)      4194304       1048576     float     sum      -1   4423.3    0.95    1.78      0   4280.6    0.98    1.84      0
(head, rank=0, pid=21170)      8388608       2097152     float     sum      -1   6487.8    1.29    2.42      0   6333.2    1.32    2.48      0
(head, rank=0, pid=21170)     16777216       4194304     float     sum      -1    12280    1.37    2.56      0    12318    1.36    2.55      0
(head, rank=0, pid=21170)     33554432       8388608     float     sum      -1    23864    1.41    2.64      0    25575    1.31    2.46      0
(head, rank=0, pid=21170)     67108864      16777216     float     sum      -1    50313    1.33    2.50      0    51542    1.30    2.44      0
(head, rank=0, pid=21170)    134217728      33554432     float     sum      -1   100121    1.34    2.51      0   101764    1.32    2.47      0
(head, rank=0, pid=21170) # Out of bounds values : 0 OK
(head, rank=0, pid=21170) # Avg bus bandwidth    : 0.758155 
(head, rank=0, pid=21170) #
(head, rank=0, pid=21170) 
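The description above suggests grepping the network interface name so the example stays portable across clouds. Below is a minimal sketch of that idea, assuming `ip route` is available on the node; the helper function and the standalone script are illustrative only and are not part of the merged example.

```python
# Hypothetical helper: detect the default network interface and point NCCL at it.
# Not part of the merged example; it only illustrates the "grep the interface name"
# suggestion from the PR description.
import os
import subprocess


def default_interface() -> str:
    # `ip route show default` prints something like:
    #   default via 10.164.15.1 dev ens4 proto dhcp metric 100
    # The token after "dev" is the interface name (e.g. ens4 on GCP, eth0 elsewhere).
    route = subprocess.check_output(["ip", "route", "show", "default"], text=True)
    tokens = route.split()
    return tokens[tokens.index("dev") + 1]


if __name__ == "__main__":
    ifname = default_interface()
    # NCCL reads this variable to pick the socket interface for inter-node traffic.
    os.environ["NCCL_SOCKET_IFNAME"] = ifname
    print(f"Using NCCL_SOCKET_IFNAME={ifname}")
```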

@asaiacai
Contributor Author

I changed this to just use torch and the NCCL that ships with it, so this is cloud-agnostic now.
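For reference, here is a minimal sketch of a PyTorch-based all_reduce bandwidth check of the kind described here. It is an illustration, not the contents of examples/nccl_test.yaml, and it assumes it is launched with torchrun (so RANK, WORLD_SIZE, and LOCAL_RANK are set); the file name nccl_check.py is hypothetical.

```python
# Minimal all_reduce bandwidth sketch using the NCCL backend that ships with torch.
# Example launch (per node): torchrun --nproc_per_node=8 --nnodes=2 ... nccl_check.py
import os
import time

import torch
import torch.distributed as dist


def main():
    dist.init_process_group(backend="nccl")  # reads RANK/WORLD_SIZE/MASTER_ADDR from env
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    nbytes = 128 * 1024 * 1024  # 128 MiB, comparable to the largest size in the dump above
    x = torch.ones(nbytes // 4, dtype=torch.float32, device="cuda")

    # Warm up, then time a few iterations.
    for _ in range(5):
        dist.all_reduce(x)
    torch.cuda.synchronize()

    iters = 20
    start = time.time()
    for _ in range(iters):
        dist.all_reduce(x)
    torch.cuda.synchronize()
    elapsed = (time.time() - start) / iters

    if dist.get_rank() == 0:
        # Bus bandwidth for all_reduce is algbw * 2 * (n - 1) / n, as in nccl-tests.
        n = dist.get_world_size()
        algbw = nbytes / elapsed / 1e9
        busbw = algbw * 2 * (n - 1) / n
        print(f"all_reduce {nbytes} B: {elapsed * 1e6:.1f} us, "
              f"algbw {algbw:.2f} GB/s, busbw {busbw:.2f} GB/s")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```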

@romilbhardwaj (Collaborator) left a comment:

Thanks @asaiacai. We should still keep #3164 open until we support NVIDIA's own official nccl-tests repo.

examples/nccl_test.yaml: review comment (outdated, resolved)
@Michaelvll merged commit c445755 into skypilot-org:master on Jul 4, 2024
20 checks passed
@asaiacai deleted the nccl branch on July 10, 2024 at 23:16
Michaelvll pushed a commit that referenced this pull request Aug 23, 2024
* add nccl test example

* use pytorch nccl test instead

* fix docstring

* nit newline
Successfully merging this pull request may close these issues: [examples] Add nccl-test example (#3164)

3 participants