Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

detect MPICH via HYDRA #3984

Merged
merged 1 commit into from
Sep 14, 2023
Merged

detect MPICH via HYDRA #3984

merged 1 commit into from
Sep 14, 2023

Conversation

njzjz
Copy link
Contributor

@njzjz njzjz commented Sep 13, 2023

Checklist before submitting

  • Did you read the contributor guide?
  • Did you update the docs?
  • Did you write any tests to validate this change?
  • Did you update the CHANGELOG, if this change affects users?

Description

Fixes #2761.

Horovod still uses MPICH in mpirun --version to detect MPICH. In the unittest, the test case has MPICHLIB_CFLAGS flag. However, not all MPICH builds has this flag. For example, on Ubuntu 22.04, the MPICH installed with apt-get has no MPICH.

Here is the way to reproduce it (using docker):

FROM ubuntu:22.04
RUN apt-get update && apt-install -y mpich && mpirun --version

The output is:

HYDRA build details:
    Version:                                 4.0
    Release Date:                            Fri Jan 21 10:42:29 CST 2022
    CC:                              gcc -Wdate-time -D_FORTIFY_SOURCE=2 -g -O2 -ffile-prefix-map=/build/mpich-0xgrG5/mpich-4.0=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security  -Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro 
    Configure options:                       '--with-hwloc-prefix=/usr' '--with-device=ch4:ofi' 'FFLAGS=-O2 -ffile-prefix-map=/build/mpich-0xgrG5/mpich-4.0=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -fallow-invalid-boz -fallow-argument-mismatch' '--prefix=/usr' 'CFLAGS=-g -O2 -ffile-prefix-map=/build/mpich-0xgrG5/mpich-4.0=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security' 'LDFLAGS=-Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2'
    Process Manager:                         pmi
    Launchers available:                     ssh rsh fork slurm ll lsf sge manual persist
    Topology libraries available:            hwloc
    Resource management kernels available:   user slurm ll lsf sge pbs cobalt
    Demux engines available:                 poll select

Using HYDRA is safe, as it is hard-coded in the MPICH source code: https://github.com/pmodels/mpich/blob/673b348edf97753502438d6aa9030dcf76074177/src/pm/hydra/mpiexec/options.c#L1154

Review process to land

  1. All tests and other checks must succeed.
  2. At least one member of the technical steering committee must review and approve.
  3. If any member of the technical steering committee requests changes, they must be addressed.

Fix horovod#2761.

Horovod still uses `MPICH` in `mpirun --version` to detect MPICH. In the unittest, the test case has `MPICHLIB_CFLAGS` flag. However, not all MPICH builds has this flag. For example, on Ubuntu 22.04, the MPICH installed with `apt-get` has no `MPICH`.

Here is the way to reproduce it:
```docker
FROM ubuntu:22.04
RUN apt-get update && apt-install -y mpich && mpirun --version
```
The output is:
```
HYDRA build details:
    Version:                                 4.0
    Release Date:                            Fri Jan 21 10:42:29 CST 2022
    CC:                              gcc -Wdate-time -D_FORTIFY_SOURCE=2 -g -O2 -ffile-prefix-map=/build/mpich-0xgrG5/mpich-4.0=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security  -Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro
    Configure options:                       '--with-hwloc-prefix=/usr' '--with-device=ch4:ofi' 'FFLAGS=-O2 -ffile-prefix-map=/build/mpich-0xgrG5/mpich-4.0=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -fallow-invalid-boz -fallow-argument-mismatch' '--prefix=/usr' 'CFLAGS=-g -O2 -ffile-prefix-map=/build/mpich-0xgrG5/mpich-4.0=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security' 'LDFLAGS=-Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro' 'CPPFLAGS=-Wdate-time -D_FORTIFY_SOURCE=2'
    Process Manager:                         pmi
    Launchers available:                     ssh rsh fork slurm ll lsf sge manual persist
    Topology libraries available:            hwloc
    Resource management kernels available:   user slurm ll lsf sge pbs cobalt
    Demux engines available:                 poll select
```

Using `HYDRA` is safe, as it is hard-coded in the MPICH source code: https://github.com/pmodels/mpich/blob/673b348edf97753502438d6aa9030dcf76074177/src/pm/hydra/mpiexec/options.c#L1154

Signed-off-by: Jinzhe Zeng <[email protected]>
@github-actions
Copy link

Unit Test Results

     977 files  +   409       977 suites  +409   12h 1m 57s ⏱️ + 3h 2m 52s
     887 tests ±       0       771 ✔️ +     80     116 💤  -      79  0  - 1 
21 895 runs  +9 157  15 539 ✔️ +6 899  6 356 💤 +2 259  0  - 1 

Results for commit bf1b09a. ± Comparison against base commit ca21d8d.

@github-actions
Copy link

Unit Test Results (with flaky tests)

  1 122 files  +     492    1 122 suites  +492   12h 45m 52s ⏱️ + 3h 19m 57s
     887 tests ±         0       771 ✔️ +     80     116 💤  -      79  0  - 1 
25 502 runs  +11 692  17 624 ✔️ +8 298  7 878 💤 +3 397  0  - 3 

Results for commit bf1b09a. ± Comparison against base commit ca21d8d.

Copy link
Collaborator

@maxhgerlach maxhgerlach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the fix!

@maxhgerlach maxhgerlach linked an issue Sep 14, 2023 that may be closed by this pull request
@maxhgerlach maxhgerlach merged commit 8724eff into horovod:master Sep 14, 2023
38 of 41 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Add support for Hydra MPI
2 participants