This course is aimed at programmers seeking to deepen their understanding of MPI and explore some of its more recent and advanced features. We cover topics including exploiting shared-memory access from MPI programs, communicator management and advanced use of collectives. We also look at performance aspects such as which MPI routines to use for scalability, MPI internal implementation issues and overlapping communication and calculation.
Intended learning outcomes
- Understanding of how internal MPI implementation details affect performance
- Techniques for overlapping communications and calculation
- Familiarity with neighbourhood collective operations in MPI
- Understanding of best practice for MPI+OpenMP programming
- Knowledge of MPI memory models for RMA operations
Attendees should be familiar with MPI programming in C, C++ or Fortran, for example by having attended the ARCHER2 MPI course.
Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on.
They are also required to abide by the ARCHER2 Code of Conduct.
The start and end times will be as indicated below, but the detailed timetable is a draft based on a previous run of the course and may change for this run.
Unless otherwise indicated all material is Copyright © EPCC, The University of Edinburgh, and is only made available for private study.
- 09:30 - 09:40 ARCHER2 training
- 09:40 - 10:15 MPI Quiz ("Room Name" is: HPCQUIZ)
- 10:15 - 11:00 MPI History
- 11:00 - 11:30 Coffee
- 11:30 - 13:00 Point-to-point Performance
- 13:00 - 14:00 Lunch
- 14:00 - 15:30 MPI Optimisations
- 15:30 - 16:00 Coffee
- 16:00 - 17:00 Neighbourhood Collectives
- 17:00 CLOSE
- 09:30 - 11:00 MPI + OpenMP (i)
- 11:00 - 11:30 Coffee
- 11:30 - 13:00 MPI + OpenMP (ii) - same slide deck as above
- 13:00 - 14:00 Lunch
- 14:00 - 14:30 RMA Access in MPI
- 14:30 - 15:30 New MPI shared-memory model
- 15:30 - 16:00 Coffee
- 16:00 - 17:00 Finish Exercises
- 17:00 CLOSE
SLURM batch scripts are set to run in the short queue and should work any time. However, on days when the course is running, we have special reserved queues to guarantee fast turnaround.
The reserved queue for today is called ta161_1261863. To use this queue, change the --qos and --reservation lines to:
#SBATCH --qos=reservation
#SBATCH --reservation=ta161_1261863
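For context, those two lines sit alongside the other #SBATCH directives in an ordinary batch script. The sketch below is a minimal example only, assuming a two-node run of the halobench executable; the account code, partition, node count and time limit are placeholder values, so take the real settings from the supplied archer2.job script rather than from here.

#!/bin/bash
# Minimal sketch of a job script using the reserved queue.
# Account, partition, node count and executable are placeholders -
# copy the real values from the supplied archer2.job script.
#SBATCH --job-name=halobench
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=128
#SBATCH --time=00:10:00
#SBATCH --account=ta161
#SBATCH --partition=standard
#SBATCH --qos=reservation
#SBATCH --reservation=ta161_1261863

# Launch one MPI process per task slot
srun --hint=nomultithread ./halobench

Submit it with sbatch as usual; squeue -u $USER will show whether the job has picked up the reservation.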
- A description of the 3D halo-swapping benchmark is in this README
- Download the code directly to ARCHER2 using: git clone https://github.com/davidhenty/halobench
- Compile with: make -f Makefile-archer2
- Submit with: sbatch archer2.job
Other things you could do with the halo swapping benchmark:
- change the buffer size to be very small (a few tens of bytes) or very large (bigger than the eager limit) to see if that affects the results;
- run on different numbers of nodes.
- Note that you will need to change the number of repetitions to get reasonable runtimes: many more for smaller messages, many fewer for larger messages. Each test needs to run for at least a few seconds to give reliable results.
- The halobench program contains an example of using MPI_Neighbor_alltoall() to do pairwise swaps of data between neighbouring processes in a regular 3D grid (see the sketch after this list for the general pattern).
- Tomorrow's traffic modelling problem sheet also contains a final MPI exercise in Section 3 to replace point-to-point boundary swapping with neighbourhood collectives.
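As a rough illustration of that pattern (not the halobench source itself), the C sketch below creates a periodic 3D Cartesian communicator and then exchanges one value with each of the six neighbours in a single MPI_Neighbor_alltoall() call:

#include <mpi.h>
#include <stdio.h>

/* Minimal sketch (not the halobench code): swap one double with each
   of the six neighbours on a periodic 3D process grid. */
int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int size, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Let MPI choose a 3D decomposition of the available processes */
    int dims[3] = {0, 0, 0};
    int periods[3] = {1, 1, 1};   /* periodic in all three directions */
    MPI_Dims_create(size, 3, dims);

    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 0, &cart);

    /* One element per neighbour: 2*ndims = 6 neighbours, ordered
       dimension by dimension, negative direction before positive */
    double sendbuf[6], recvbuf[6];
    for (int i = 0; i < 6; i++) sendbuf[i] = rank;

    MPI_Neighbor_alltoall(sendbuf, 1, MPI_DOUBLE,
                          recvbuf, 1, MPI_DOUBLE, cart);

    if (rank == 0)
        printf("Rank 0 exchanged halo values with 6 neighbours\n");

    MPI_Finalize();
    return 0;
}

On a Cartesian communicator the neighbour ordering is fixed by the topology (for each dimension, the negative-direction neighbour comes before the positive-direction one), which is what determines the layout of the send and receive buffers.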
The reserved queue for today is called ta161_1261866. To use this queue, change the --qos and --reservation lines to:
#SBATCH --qos=reservation
#SBATCH --reservation=ta161_1261866
- Traffic modelling exercise sheet
- Traffic model source code and solutions (MPI / OpenMP)
- Traffic model source code and solutions (MPI RMA)
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.