Open UCX 1.15.0 with Open MPI 4.1.1 - osu_iallgather/osu_iallgatherv hangs when the message size reaches 65536 #9731
Comments
Hi, I noticed that when you set
Does the program hang if the command line contains
Does the program hang if the command line doesn't contain
Does the program hang if the command line contains
Describe the bug
We use Open UCX 1.15.0 with Open MPI 4.1.1 to run osu_iallgather/osu_iallgatherv. When the message size reaches 65536, the program hangs: we waited at least 30 minutes and nothing more was printed.
Things we have tried
Steps to Reproduce
Command line
mpirun -x UCX_TLS=sm,rc_x -x UCX_NET_DEVICES=mlx5_1:1 -np 1024 -N 128 --hostfile hostfile_path -mca pml ucx -mca btl ^vader,tcp,openib,uct osu_iallgather -i 2
UCX version used :
1.15.0
UCX configure flags (can be checked by ucx_info -v)
Setup and versions
Linux 6426-node125 4.19.90-2112.8.0.0131.oe1.aarch64 #1 SMP Fri Dec 31 19:53:20 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux
rdma-core-54mlnx1-1.54303.aarch64
MLNX_OFED_LINUX-5.4-3.0.3.0
ibstat or ibv_devinfo -vv command
Additional information (depending on the issue)
Open MPI 4.1.1
osu-micro-benchmarks-7.1-1
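For scale, the sizes implied by the command line above can be estimated with a short calculation. This is only a rough sketch, not a diagnosis of the hang: it assumes the OSU message size is the number of bytes each rank contributes to the allgather, and that -N 128 means 128 ranks per node. The function name allgather_footprint is made up for illustration.

```python
# Back-of-the-envelope sizing for the reported run (a sketch, not a diagnosis).
# Assumption: the OSU "message size" is the number of bytes each rank contributes.


def allgather_footprint(num_ranks: int, ranks_per_node: int, msg_bytes: int) -> dict:
    """Return the node count and receive-buffer sizes for one allgather call."""
    if num_ranks % ranks_per_node != 0:
        raise ValueError("num_ranks must be a multiple of ranks_per_node")
    # Every rank gathers one block of msg_bytes from each rank, itself included.
    recv_per_rank = num_ranks * msg_bytes
    # Sum of the receive buffers of all ranks placed on a single node.
    recv_per_node = recv_per_rank * ranks_per_node
    return {
        "nodes": num_ranks // ranks_per_node,
        "recv_bytes_per_rank": recv_per_rank,
        "recv_bytes_per_node": recv_per_node,
    }


if __name__ == "__main__":
    # Values taken from the command line: -np 1024 -N 128, failing size 65536.
    info = allgather_footprint(1024, 128, 65536)
    print("nodes:", info["nodes"])
    print("receive buffer per rank (MiB):", info["recv_bytes_per_rank"] / 2**20)
    print("receive buffers per node (GiB):", info["recv_bytes_per_node"] / 2**30)
```

Under these assumptions the failing step gathers 64 MiB into each rank, or 8 GiB of receive buffers per node across 8 nodes, before counting any internal staging buffers.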