mpiexec runs much slower than OpenMPI in Github Actions #6037

Closed
wkliao opened this issue May 31, 2022 · 28 comments

@wkliao
Contributor

wkliao commented May 31, 2022

I run two GitHub Actions workflows for PnetCDF: one uses MPICH and the other OpenMPI. Both run many small 4-process jobs (command: make ptest), and all MPI jobs run on the same local host node. The MPICH version is always much slower than OpenMPI. The times of the latest runs show:

I have also tried building MPICH 4.0.2 from source as part of the workflow, but got similar timing results.

Is there a way to speed up mpiexec?

@hzhou
Contributor

hzhou commented May 31, 2022

I have two tips:

  • export HWLOC_XMLFILE=path/to/hwloc.xml
    MPICH probes the hardware from every process, which can be very slow at init time. Supplying a pre-generated hwloc config XML file should bypass it. https://www.open-mpi.org/projects/hwloc/doc/v2.3.0/a00353.php

  • If GPU awareness is not required, disable it, e.g. with --without-cuda.
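A minimal sketch of how both tips might be applied in the workflow; the install prefix, hwloc.xml location, and test binary are assumptions, not taken from this issue:

  # Build MPICH without GPU awareness (assumed prefix path):
  ./configure --prefix=$HOME/mpich-install --without-cuda
  make -j 2 && make install
  # At run time, point hwloc at a pre-generated topology file (created e.g. with lstopo, see below):
  export HWLOC_XMLFILE=$HOME/hwloc.xml
  mpiexec -n 4 ./some_test    # hypothetical test binary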

@wkliao
Contributor Author

wkliao commented May 31, 2022

Could you please help me with an example for creating the xml file?
I wonder how come OpenMPI does not have such a problem.

@hzhou
Contributor

hzhou commented May 31, 2022

Could you please help me with an example for creating the xml file?
I believe if you build and install hwloc inside the GitHub Actions runner, you can run

lstopo --of xml hwloc.xml

to create the xml file. For reference: https://linux.die.net/man/1/lstopo

I wonder how come OpenMPI does not have such a problem

I think OpenMPI only discovers hardware in mpiexec and sends the hardware topology to the processes via PMIx. Ken implemented a similar workaround in #5929. Do you run the main branch of MPICH or an older release?
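A rough sketch of a workflow step that follows this suggestion, assuming the Ubuntu hwloc package (which ships lstopo) and the standard GITHUB_WORKSPACE/GITHUB_ENV variables:

  sudo apt-get update && sudo apt-get install -y hwloc
  lstopo --of xml "$GITHUB_WORKSPACE/hwloc.xml"                       # dump the runner's topology once
  echo "HWLOC_XMLFILE=$GITHUB_WORKSPACE/hwloc.xml" >> "$GITHUB_ENV"   # make it visible to later steps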

@wkliao
Contributor Author

wkliao commented May 31, 2022

The HWLOC_XMLFILE approach does not improve things. I am still getting a similarly long time.
I am building MPICH 4.0.2.

@hzhou
Contributor

hzhou commented May 31, 2022

Could you try both ch3 and ch4 devices and see if there is any difference in timing?
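For context, the two builds being compared would look roughly like this (the install prefixes are assumptions):

  ./configure --prefix=$HOME/mpich-ch3 --with-device=ch3    # legacy ch3 device
  ./configure --prefix=$HOME/mpich-ch4 --with-device=ch4    # ch4 device (ch4:ofi with embedded libfabric, per the output below)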

@wkliao
Contributor Author

wkliao commented May 31, 2022

I used --with-device=ch4 and make hangs at Making install in modules/yaksa.

*****************************************************
***
*** device      : ch4:ofi (embedded libfabric)
*** shm feature : auto
*** gpu support : disabled
***
  MPICH is configured with device ch4:ofi, which should work
  for TCP networks and any high-bandwidth interconnect
  supported by libfabric. MPICH can also be configured with
  "--with-device=ch4:ucx", which should work for TCP networks
  and any high-bandwidth interconnect supported by the UCX
  library. In addition, the legacy device ch3 (--with-device=ch3)
  is also available.
*****************************************************
Configuration completed.
Making install in src/mpl
ar: `u' modifier ignored since `D' is the default (see `U')
Making install in /home/runner/work/PnetCDF/PnetCDF/MPICH/mpich-4.0.2/modules/hwloc
Making install in include
Making install in hwloc
ar: `u' modifier ignored since `D' is the default (see `U')
Making install in /home/runner/work/PnetCDF/PnetCDF/MPICH/mpich-4.0.2/modules/json-c
Making install in .
ar: `u' modifier ignored since `D' is the default (see `U')
Making install in tests
Making install in modules/yaksa

@hzhou
Contributor

hzhou commented May 31, 2022

Did you somehow hide the lines such as CC xxx.lo? make in yaksa is going to take a while; are you sure it is hanging?

@wkliao
Contributor Author

wkliao commented Jun 1, 2022

I re-ran the workflow to build MPICH with ch4, which took 31 minutes.
PnetCDF's make ptest is still slow, taking 1 hour 9 minutes.

@hzhou
Contributor

hzhou commented Jun 1, 2022

I just tried building PnetCDF on my local Linux machine, and make ptest only took 38 seconds. @wkliao Could you check the attached make log and see if I am missing anything?
t.log

@wkliao
Contributor Author

wkliao commented Jun 1, 2022

It also runs very fast on my local Red Hat machine.
The issue is that it runs very slowly in the GitHub Actions workflow.

@hzhou
Contributor

hzhou commented Jun 1, 2022

Do you have a link to a recent GitHub Actions test log?

@wkliao
Contributor Author

wkliao commented Jun 1, 2022

See below. You can also check their yml files for the configure options I used.

@hzhou
Contributor

hzhou commented Jun 1, 2022

Somehow every test seems to run twice:

 make[2]: Entering directory '/home/runner/work/PnetCDF/PnetCDF/test/C'
===========================================================
    test/C: Parallel testing on 4 MPI processes
===========================================================
*** TESTING C   pres_temp_4D_wr for writing classic file           ------ pass
*** TESTING C   pres_temp_4D_rd for reading classic file           ------ pass
*** TESTING C   pres_temp_4D_wr for writing classic file           ------ pass
*** TESTING C   pres_temp_4D_rd for reading classic file           ------ pass
make[2]: Leaving directory '/home/runner/work/PnetCDF/PnetCDF/test/C'

@wkliao
Contributor Author

wkliao commented Jun 1, 2022

This is because the configure option --enable-burst_buffering is used. It runs each test twice: once using the burst-buffering feature and once without it.

What puzzles me is that the same configure settings are used for both OpenMPI and MPICH, yet MPICH is much slower than OpenMPI.
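For reference, a sketch of the PnetCDF configuration and test invocation being discussed, assuming PnetCDF's configure accepts the MPICC variable and using an assumed compiler-wrapper path:

  ./configure --enable-burst_buffering MPICC=$HOME/mpich-install/bin/mpicc
  make -j 2
  make ptest    # runs each parallel test twice: with and without burst buffering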

@hzhou
Contributor

hzhou commented Jun 1, 2022

Even OpenMPI's 15 minutes is a big puzzle if the same tests run on a local computer in less than a minute. That's 15x, a much bigger puzzle. Any ideas?

@wkliao
Contributor Author

wkliao commented Jun 1, 2022

I believe it is because GitHub Actions runs in a virtual environment.

@hzhou
Contributor

hzhou commented Jun 1, 2022

I believe it is because GitHub Actions runs in a virtual environment.

I believe virtualization nowadays is pretty good, i.e. I would not expect it to slow things down by more than 2x. Looking at the compilation time, it seems reasonably fast.

@wkliao
Contributor Author

wkliao commented Jun 1, 2022

If that is the case, then I have no idea why GitHub Actions is slower.
The issue remains: OpenMPI's mpiexec runs faster than MPICH's.

@hzhou
Contributor

hzhou commented Jun 1, 2022

Is there a way to show the timestamp of each test?

@wkliao
Contributor Author

wkliao commented Jun 1, 2022

Yes. In the top-right corner of the log view, click the gear icon and enable show timestamps and full screen.

@raffenet
Contributor

Just catching up here. GitHub Actions runners only have 2 virtual cores. The oversubscription might be really hurting. Could you test using the --with-device=ch3:sock configuration and see how it performs?

Also, I guess I should try to clarify: why do we think that mpiexec is the slow part? Is there some additional breakdown of time spent showing that it's mpiexec that is slow? Or are we just comparing MPICH vs. OpenMPI and their associated launchers?
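A sketch of the suggested build (the install prefix is an assumption); the point of ch3:sock is that it does not busy-poll while waiting for messages, so an oversubscribed 2-core runner is not starved:

  ./configure --prefix=$HOME/mpich-sock --with-device=ch3:sock
  make -j 2 && make install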

@wkliao
Contributor Author

wkliao commented Jun 29, 2022

The trick of --with-device=ch3:sock seems to work.
The time has been significantly reduced from 1 hour 30 minutes to 19 minutes.

@raffenet
Contributor

raffenet commented Jun 30, 2022

The trick of --with-device=ch3:sock seems to work. The time has been significantly reduced from 1 hour 30 minutes to 19 minutes.

Thanks for confirming. This is another piece of evidence supporting adding a configuration in ch4 that can run without busy-polling, whether that is ch4:sock or something else.

@wkliao
Contributor Author

wkliao commented Jun 30, 2022

I will be happy to do some profiling. Let me know.
Otherwise, I can close this issue. Thanks!

@scottwittenburg

This is another piece of evidence supporting adding a configuration in ch4 that can run without busy-polling.

Is there now another combination with ch4 that runs without busy-polling? After we saw the suggestion here, we tried an MPICH build with ch3:sock:tcp, and it cut our testing time in CI by more than half.

cc: @vicentebolea

@hzhou
Contributor

hzhou commented Nov 8, 2023

Is there now another combination with ch4 that runs without busy-polling?

No. It is still sitting on our TODO list.

@scottwittenburg

Is there now another combination with ch4 that runs without busy-polling?

No. It is still sitting on our TODO list.

Ok, thanks, just checking.
