
Advanced Topics

Denise Worthen edited this page Oct 22, 2020 · 4 revisions

Debug mode

To compile the coupled model in debug mode, compile with DEBUG=Y. See tests/rt.conf for the command syntax. For example,

CCPP=Y DEBUG=Y SUITES=FV3_GFS_2017_coupled,FV3_GFS_2017_satmedmf_coupled,FV3_GFS_v15p2_coupled S2S=Y

The DEBUG=Y flag sets both the ESMF debug library and the component-level debug flags. The debug versions of the Intel MPI library are included by adding -link_mpi=dbg (or -link_mpi=dbg_mt for multi-threaded applications) to the debug settings for the coupled model. This allows the wrappers to use the debug versions of the MPI library.

When running in debug mode, the wall clock time will need to be increased. For example, to run the C96mx100 (C96 UFSAtm, 1-deg MOM6-CICE6) 6-hr debug case on Hera, the wall clock time should be set to 1 hour.

Changing the number of PEs for FV3

Changes are required in both model_configure:

TASKS: total number of all tasks for all components
quilting: true/false variable to use writer cores for FV3GFS
write_groups: the number of write groups for FV3GFS
write_tasks_per_group: the number of tasks per FV3GFS write group, a multiple of ntiles

and in input.nml

layout: INPES, JNPES, the layout of PEs on each tile in the x & y directions
ntiles: the number of tiles, typically 6

The number of FV3 tasks is then given by:

(INPES x JNPES x ntiles) + (write_groups x write_tasks_per_group)

The PET layout for each component then needs to be adjusted to be consistent with TASKS. For the coupled model, the mediator is given the number of FV3 tasks, but without including the write tasks. For example, if INPES x JNPES x ntiles = 3x8x6 = 144, then the mediator is given 144 tasks and FV3 is given 144 plus the number of write tasks.
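As a sketch, the task arithmetic above can be checked with a few lines of shell. The layout values here are illustrative only (they are not taken from any particular configuration):

```shell
# Illustrative layout values; substitute your own.
INPES=3
JNPES=8
NTILES=6
WRITE_GROUPS=1
WRITE_TASKS_PER_GROUP=6

# Mediator gets the compute tasks only; FV3 also gets the write tasks.
MED_TASKS=$((INPES * JNPES * NTILES))                            # 3*8*6 = 144
FV3_TASKS=$((MED_TASKS + WRITE_GROUPS * WRITE_TASKS_PER_GROUP))  # 144+6 = 150

echo "mediator=$MED_TASKS fv3=$FV3_TASKS"
```

TASKS in model_configure must then be at least the sum of the task counts assigned to all components.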

Changing the number of PEs for CICE

In CICE6, model run-time resources can be set at run time. Dave Bailey at NCAR has provided the following useful information.

The main settings in the domain_nml used to set the run-time resources are:

 block_size_x (number of grid cells per block in the x-direction)
 block_size_y (number of grid cells per block in the y-direction)
 max_blocks (number of blocks per processor maximum)
 distribution_type (how the blocks are laid out on the processors)
 processor_shape (what the approximate shape of each block looks like)

While CICE does not necessarily need to have the same number of blocks as processors, this is usually a good rule of thumb. When you get up
into higher processor counts, the 'spacecurve' or 'sectrobin' distribution_type with smaller square shaped blocks can be used. Also, as you
go up in processors it might also be useful to have OpenMP threading. In this case you would have more than one block per processor and
max_blocks would need to be increased.

More information about decomposition choices and performance can be found in the CICE documentation.

For the regression tests, the following settings in domain_nml are used:

processor_shape   = 'slenderX2'
nprocs            = NPROC_ICE
nx_global         = NX_GLB
ny_global         = NY_GLB
block_size_x      = BLCKX
block_size_y      = BLCKY
max_blocks        = -1

where, for NPROC_ICE (the number of tasks assigned to the ice component),

BLCKX=NX_GLB/(NPROC_ICE/2)
BLCKY=NY_GLB/2

and NX_GLB and NY_GLB are the domain size in the x and y directions.
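For example, the block sizes for the 'slenderX2' decomposition above can be computed with a short shell sketch. The grid dimensions and task count below are illustrative and not taken from a specific regression test:

```shell
# Illustrative values: a 360x320 global grid with 20 ice tasks.
NX_GLB=360
NY_GLB=320
NPROC_ICE=20

# slenderX2: two rows of blocks, NPROC_ICE/2 blocks per row.
BLCKX=$((NX_GLB / (NPROC_ICE / 2)))  # 360 / 10 = 36
BLCKY=$((NY_GLB / 2))                # 320 / 2  = 160

echo "block_size_x=$BLCKX block_size_y=$BLCKY"
```

Note that NX_GLB must divide evenly by NPROC_ICE/2 (and NY_GLB by 2) for this decomposition to be valid.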

Profiling Timing Across Components

To check run times of different components for load balancing, the following two environment variables must be set:

export ESMF_RUNTIME_PROFILE=ON
export ESMF_RUNTIME_PROFILE_OUTPUT=SUMMARY

For the coupled model, the environment variables should be added to the file tests/fv3_conf/fv3_slurm.IN_<platform>, where platform is Hera, Orion, etc. This will produce the ESMF_Profile.summary file in the run directory, which provides timing information for the run. See the ESMF Reference Manual for more details.

The ESMF_Profile.summary can also include MPI functions to indicate how much time is spent inside communication calls. To use this feature, modify the file tests/fv3_conf/fv3_slurm.IN_<platform> to set the environment variable for the location of the MPI profiling preload script, and include the script in the srun command as shown below.

# set location of mpi profiling preload script
export ESMF_PRELOAD=${ESMFMKFILE/esmf.mk/preload.sh}

# include preload script before forecast executable in srun command
srun --label -n @[TASKS] $ESMF_PRELOAD ./fcst.exe


Note: your job must complete for the summary table to be written, so make sure to adjust the wall clock time or runtime accordingly.

Porting to a new Machine

Note: NCEPLIBS and third-party libraries need to be installed on the new platform.

Coupled model

An example is available for the machine stampede.intel.

The following files need to be added for each machine_name and compiler option

modulefiles/<machine_name>.<compiler>/fv3
conf/configure.fv3.<machine_name>.<compiler>

To compile, cd into the tests directory and run:

./compile.sh ../FV3 stampede.intel 'MAKE_OPT'

See this page for details.

TODO: Add information about how this works with CMake
