-
Notifications
You must be signed in to change notification settings - Fork 412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance regression in collectives due to UCX_PROTO_ENABLE #9914
Comments
Hi, could you please completely fill the bug template that is described during issue creation? |
@ivankochin Of course! here comes. Let me know if you need more info. Steps to Reproduce
Setup and versions
|
@ivankochin FYI, I have tried the same test on our older system (ConnectX-4), Rocky Linux release 9.2, MLNX_OFED_LINUX-5.8-3.0.7.0. It seems the performance is similar with both It seems in our case only the |
I attach the output of |
@angainor could you please also specify |
@ivankochin This turned out to be a rather large file (70MB), too large to attach here. I've made it available for download here: https://filesender.uio.no/?s=download&token=226c039d-abf0-4e8d-93ab-df04a8fee91e Please let me know if you have problems accessing it. |
@angainor So it would be great if you are able to do some extra experiments. I want to ask you to download the latest UCX master, build it in release mode (using |
The regression is still there, looks similar +- some glitches. I will recompile with debug and collect the logs.
|
Here are logs from one process. I hope this is enough, otherwise the entire file is rather large. Note that this time I only ran the benchmark with message sizes up to 8 bytes due to the amount of logs. Please let me know if this is ok, or if you need more. |
Thanks for all your efforts. Do I understand correctly that all the logs mentioned here were collected with
Are you sure? I see messages up to 2048 bytes in that log. Root cause isn't still defined since according to logs both protov1 and protov2 uses same protocols. Could you please also re-measure
|
Yes, but as I mentioned before, this happens also if I specify DC, or if I don't set
log file:
log file: Note that in the logged runs
At least this is what I asked OSU benchmark to test. But I guess it might send other messages during the run time, I don't know.
These are for
For reference, here are results with
|
I noticed a performance regression in OSU benchmark (OpenMPI with UCX and HCOLL) when using HPCX 2.17.1 compared to 2.14. It is due to the fact that now the
UCX_PROTO_ENABLE=y
by default. Setting it ton
improves performance. Here are some results ofosu_alltoall
, but I have also tested and see problems forallreduce
,allgather
,bcast
.Results with HPCX 2.14 (
UCX_PROTO_ENABLE=n
by default)Results with HPCX 2.17.1 (
UCX_PROTO_ENABLE=y
by default)The text was updated successfully, but these errors were encountered: