Use non-login shell for launch_FV3LAM_wflow.sh; remove support for `WCOSS_CRAY`; fix cron capability for `tcsh` users on Cheyenne (ufs-community#675)

* Remove unneeded sourcing of source_util_funcs.sh; add "%s" to printf calls since that's the proper calling method; edit comments.

* Generalize machine files.  Details:

* Add a wrapper (source_machine_file.sh) for sourcing the machine file that allows other commands common to all machines to be called.
* Change the scalar variable MODULE_INIT_PATH in the machine files to the array variable ENV_INIT_SCRIPTS_FPS that specifies the list of system scripts that need to be sourced (e.g. to make the "module" command available in a given script).  This is needed because on Cheyenne, at least two system scripts need to be sourced (to enable "module" and "qsub").
* Move the "ulimit" commands at the ends of the machine files into the new variable PRE_TASK_CMDS so that they are not called every time the machine file is sourced.  They will be called only if a given script issues an "eval ${PRE_TASK_CMDS}" (which all the ex-scripts will do).
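
The deferred-execution pattern described above can be sketched as follows. This is only an illustration: the temporary file stands in for a real machine file, and `task_env_ready` is a hypothetical marker variable, not something defined in the SRW App. The sketch quotes the argument to `eval`, which is a choice made here and not necessarily what the ex-scripts do.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for a machine file. It only *defines* PRE_TASK_CMDS;
# nothing inside the braces runs when the file is sourced.
machine_file=$(mktemp)
cat > "${machine_file}" <<'EOF'
RUN_CMD_SERIAL="time"
PRE_TASK_CMDS='{ ulimit -c 0; task_env_ready="yes"; }'
EOF

source "${machine_file}"
echo "after sourcing: task_env_ready=${task_env_ready:-unset}"

# A task script opts in to the per-task setup explicitly.
eval "${PRE_TASK_CMDS}"
echo "after eval:     task_env_ready=${task_env_ready:-unset}"

rm -f "${machine_file}"
```

Scripts that source the machine file only for its variables (for example, the workflow generation script) never run the `ulimit` calls under this scheme.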

* In the relevant ex-scripts: (1) Change sourcing of machine files to use the wrapper source_machine_file.sh; (2) Use "eval" to evaluate the contents of PRE_TASK_CMDS.

* In the WE2E script, change sourcing of the machine file to use the wrapper source_machine_file.sh.

* Add new variable valid_vals_BOOLEAN to constants.sh so that this file can be sourced and the valid values for a boolean can be made available to any other script.
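
As an illustration of how a script might use this array after sourcing constants.sh, here is a minimal membership check. The array contents are copied from this commit; the helper name `is_valid_boolean` is hypothetical and is not a function in the SRW App.

```bash
#!/usr/bin/env bash
# Same values as the array added to ush/constants.sh in this commit.
valid_vals_BOOLEAN=("TRUE" "true" "YES" "yes" "FALSE" "false" "NO" "no")

# Hypothetical helper: succeeds if $1 is one of the accepted boolean spellings.
is_valid_boolean() {
  local val="$1"
  local candidate
  for candidate in "${valid_vals_BOOLEAN[@]}"; do
    if [ "${val}" = "${candidate}" ]; then
      return 0
    fi
  done
  return 1
}

for test_val in "TRUE" "no" "maybe"; do
  if is_valid_boolean "${test_val}"; then
    echo "${test_val}: valid"
  else
    echo "${test_val}: not valid"
  fi
done
```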

* Bug fix.

* Remove file that was accidentally added in previous commit.

* Change the way crontab is called so that it also works on Cheyenne (for tcsh users).  Details:

* Introduce new function get_crontab_contents() that takes as input whether or not the calling script is itself being called from a cron job and returns (1) the path to the appropriate crontab command and (2) the contents of the user's cron table.
  * Such a function is needed because on Cheyenne, the location of the crontab command is different depending on whether or not the script that's calling crontab is itself called from a cron job (because on Cheyenne, "crontab" is containerized, and that complicates things).
* Use get_crontab_contents() in generate_FV3LAM_wflow.sh and launch_FV3LAM_wflow.sh (instead of simply calling "crontab" because the latter approach doesn't work on Cheyenne, at least not with users whose login shell is tcsh).
* Add "called_from_cron" as an optional argument to launch_FV3LAM_wflow.sh [so that it can then be passed on to get_crontab_contents()].  This argument is only used in the cron job that relaunches the workflow (which is created only if USE_CRON_TO_RELAUNCH is set to "TRUE").
  * Having an optional argument like this seems to be the best way to tell launch_FV3LAM_wflow.sh whether or not it is running from a cron job.
  * launch_FV3LAM_wflow.sh can still be called from the command line without any arguments (since the default value of "called_from_cron" is "FALSE").
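
The idea behind get_crontab_contents() can be sketched roughly as below. This is not the function from the commit: the name `get_crontab_contents_sketch`, the positional arguments, and the variables `CRONTAB_CMD_IN_CRON` and `CRONTAB_CMD_INTERACTIVE` are assumptions made for illustration (the real function takes `name=value` arguments, as the diff of generate_FV3LAM_wflow.sh shows). A stub named `fake_crontab` replaces the real `crontab` command so the example does not touch any actual cron table.

```bash
#!/usr/bin/env bash
# Sketch: pick a crontab command based on whether we are running under cron,
# read the cron table once, and hand both back via caller-named variables.
get_crontab_contents_sketch() {
  local __called_from_cron__="$1"
  local __outvar_cmd__="$2"
  local __outvar_contents__="$3"
  local __cmd__ __contents__

  if [ "${__called_from_cron__}" = "TRUE" ]; then
    __cmd__="${CRONTAB_CMD_IN_CRON:-crontab}"        # hypothetical override
  else
    __cmd__="${CRONTAB_CMD_INTERACTIVE:-crontab}"    # hypothetical override
  fi

  # An empty or missing cron table is treated as empty contents.
  __contents__=$( ${__cmd__} -l 2>/dev/null ) || __contents__=""

  printf -v "${__outvar_cmd__}" "%s" "${__cmd__}"
  printf -v "${__outvar_contents__}" "%s" "${__contents__}"
}

# Stub standing in for the real crontab command.
fake_crontab() {
  if [ "$1" = "-l" ]; then
    echo "*/3 * * * * cd /path/to/expt && ./launch_FV3LAM_wflow.sh"
  fi
}
CRONTAB_CMD_INTERACTIVE="fake_crontab"

get_crontab_contents_sketch "FALSE" crontab_cmd crontab_contents
echo "crontab_cmd      = ${crontab_cmd}"
echo "crontab_contents = ${crontab_contents}"
```

Reading the table once and passing its contents around is what later allows repeated `crontab -l` calls to be replaced by `echo "${crontab_contents}"`.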

* Generalize the way commands are initialized so that any number of system scripts can be sourced in a given script (currently, only "module" is initialized).  Details:

* Introduce the new function init_env() that initializes the environment of a script by sourcing necessary system scripts.  The full paths to these system scripts are specified in the array ENV_INIT_SCRIPTS_FPS in the machine files.
  * This function is needed because (1) this sourcing needs to be done in a couple of different scripts in the SRW App and (2) on some machines (e.g. Cheyenne), more than one system script may need to be sourced.
* Use the new init_env() function in launch_FV3LAM_wflow.sh and load_modules_run_task.sh.
  * In load_modules_run_task.sh, init_env() replaces sourcing of only the system script that defines the "module" command.  That is because on Cheyenne, in addition to the "module" command, the "qsub" command needs to be defined/initialized (by sourcing a second system script named pbs.sh).
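
A minimal sketch of what an init_env()-style function might look like, assuming ENV_INIT_SCRIPTS_FPS is a bash array of full paths. The function name `init_env_sketch`, the temporary files, and the `demo_module` and `demo_qsub` functions are hypothetical stand-ins for the real system scripts and commands; the actual init_env() may differ.

```bash
#!/usr/bin/env bash
# Sketch: source every system script passed in, stopping at the first
# one that does not exist.
init_env_sketch() {
  local script_fp
  for script_fp in "$@"; do
    if [ ! -f "${script_fp}" ]; then
      printf "%s\n" "System script not found: ${script_fp}" >&2
      return 1
    fi
    source "${script_fp}"
  done
}

# Two throwaway "system scripts", each defining one command.
tmpdir=$(mktemp -d)
printf "%s\n" 'demo_module() { echo "module $*"; }' > "${tmpdir}/modules.sh"
printf "%s\n" 'demo_qsub() { echo "qsub $*"; }' > "${tmpdir}/pbs.sh"

# In the SRW App this array would come from the machine file.
ENV_INIT_SCRIPTS_FPS=( "${tmpdir}/modules.sh" "${tmpdir}/pbs.sh" )
init_env_sketch "${ENV_INIT_SCRIPTS_FPS[@]}"

demo_module list
demo_qsub job.sh

rm -rf "${tmpdir}"
```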

* Replace calls to "crontab -l" with echoing of the already-obtained cron table contents.  Fix comments and informational messages.

* For Cheyenne, don't need to source two separate system scripts.  Just sourcing "/etc/profile" is enough to make both the "module" and "qsub" commands (and probably all other system-supported commands) available in non-login scripts.

* Make script exit with an error message if rocoto commands fail.

* Fix the system script that needs to be sourced on Hera to get "module" (and other commands) to work.

* In init_env.sh, declare "local" variables and change the index of the for-loop so it's different than the variable i used (and unset) by the system script on Hera.
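
The name clash mentioned above can be reproduced in a few lines. The temporary file below is a hypothetical stand-in for a system script that loops over `i` and then unsets it; `script_idx` is simply an illustrative name that such a script is unlikely to touch.

```bash
#!/usr/bin/env bash
# Stand-in for a system script that uses "i" internally and cleans it up.
sys_script=$(mktemp)
cat > "${sys_script}" <<'EOF'
for i in one two three; do :; done
unset i
EOF

i=5            # a caller variable that happens to share the name
script_idx=5   # a caller variable with a distinct name

source "${sys_script}"

echo "i          = ${i:-<unset>}"
echo "script_idx = ${script_idx:-<unset>}"

rm -f "${sys_script}"
```

Because a sourced script runs in the caller's shell, its `unset i` removes the caller's `i` as well, which is why a distinct loop variable name is the safer choice.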

* Fix the system script that needs to be sourced on Orion to enable the "module" and other commands in a non-login bash shell.

* Fix the system script on Jet that needs to be sourced to enable the "module" and other commands in a non-login bash shell.

* Update comments.

* Bug fix:  Make sure the variable __crontab_cmd__ is defined for WCOSS_DELL_P3.

* Try changing the system script to source on WCOSS_DELL_P3 to "/etc/profile" (since it works on the other machines to enable the "module" and other commands).  This needs to be tested by someone who has access to WCOSS_DELL_P3.

* Changes to try to make the machine file work for WCOSS_DELL_P3.  Not yet tested.

* Fix modulepath issue on wcoss

* Fix issues on wcoss cray

* Fix crontab issue on wcoss cray

* Remove support for WCOSS_CRAY.

* Place double quotes around ${RUN_CMD_...} in if-statements that check whether the RUN_CMD_... variable is empty, i.e. -z "${RUN_CMD_...}".  This is needed because on Cheyenne, not having the double quotes generates an error when RUN_CMD_... consists of a command that contains spaces (e.g. "mpirun -np ...").
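
The quoting problem can be reproduced outside the workflow with a short test. The value assigned to `RUN_CMD_FCST` below is only an example of a run command containing spaces; the status variables are names introduced for this sketch.

```bash
#!/usr/bin/env bash
RUN_CMD_FCST="mpirun -np 4"   # example value containing spaces

# Unquoted: the value is split into several words, so "[" receives
# "-z mpirun -np 4" and reports a usage error instead of a true/false answer.
unquoted_status=0
[ -z ${RUN_CMD_FCST:-} ] 2>/dev/null || unquoted_status=$?

# Quoted: "[" receives a single operand and answers the intended question.
quoted_status=0
[ -z "${RUN_CMD_FCST:-}" ] || quoted_status=$?

echo "unquoted test exit status: ${unquoted_status}"
echo "quoted test exit status:   ${quoted_status}"
```

An exit status of 1 from the quoted form means "the string is not empty", whereas a status greater than 1 from the unquoted form signals that the test itself failed to parse.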

Co-authored-by: chan-hoo <[email protected]>
gsketefian and chan-hoo authored Mar 1, 2022
1 parent 9fd0b19 commit 2684204
Showing 26 changed files with 364 additions and 430 deletions.
32 changes: 3 additions & 29 deletions scripts/exregional_make_grid.sh
@@ -83,36 +83,10 @@ print_input_args valid_args
#
#-----------------------------------------------------------------------
#
case "$MACHINE" in
source $USHDIR/source_machine_file.sh
eval ${PRE_TASK_CMDS}

"WCOSS_CRAY")
{ save_shell_opts; set +x; } > /dev/null 2>&1
. $MODULESHOME/init/sh
module load PrgEnv-intel cfp-intel-sandybridge/1.1.0
module list
{ restore_shell_opts; } > /dev/null 2>&1
export NODES=1
export RUN_CMD_SERIAL="aprun -n 1 -N 1 -j 1 -d 1 -cc depth"
export KMP_AFFINITY=disabled
ulimit -s unlimited
ulimit -a
;;

"WCOSS_DELL_P3")
{ save_shell_opts; set +x; } > /dev/null 2>&1
module list
{ restore_shell_opts; } > /dev/null 2>&1
export RUN_CMD_SERIAL="mpirun"
ulimit -s unlimited
;;

*)
source ${MACHINE_FILE}
;;

esac

if [ -z ${RUN_CMD_SERIAL:-} ] ; then
if [ -z "${RUN_CMD_SERIAL:-}" ] ; then
print_err_msg_exit " \
Run command was not set in machine file. \
Please set RUN_CMD_SERIAL for your platform"
21 changes: 3 additions & 18 deletions scripts/exregional_make_ics.sh
@@ -86,27 +86,12 @@ export OMP_STACKSIZE=${OMP_STACKSIZE_MAKE_ICS}
#
#-----------------------------------------------------------------------
#
case "$MACHINE" in

"WCOSS_CRAY")
ulimit -s unlimited
RUN_CMD_UTILS="aprun -b -j1 -n48 -N12 -d1 -cc depth"
;;

"WCOSS_DELL_P3")
ulimit -s unlimited
RUN_CMD_UTILS="mpirun"
;;

*)
source ${MACHINE_FILE}
;;

esac
source $USHDIR/source_machine_file.sh
eval ${PRE_TASK_CMDS}

nprocs=$(( NNODES_MAKE_ICS*PPN_MAKE_ICS ))

if [ -z ${RUN_CMD_UTILS:-} ] ; then
if [ -z "${RUN_CMD_UTILS:-}" ] ; then
print_err_msg_exit "\
Run command was not set in machine file. \
Please set RUN_CMD_UTILS for your platform"
21 changes: 3 additions & 18 deletions scripts/exregional_make_lbcs.sh
@@ -86,27 +86,12 @@ export OMP_STACKSIZE=${OMP_STACKSIZE_MAKE_LBCS}
#
#-----------------------------------------------------------------------
#
case "$MACHINE" in

"WCOSS_CRAY")
ulimit -s unlimited
RUN_CMD_UTILS="aprun -b -j1 -n48 -N12 -d1 -cc depth"
;;

"WCOSS_DELL_P3")
ulimit -s unlimited
RUN_CMD_UTILS="mpirun"
;;

*)
source ${MACHINE_FILE}
;;

esac
source $USHDIR/source_machine_file.sh
eval ${PRE_TASK_CMDS}

nprocs=$(( NNODES_MAKE_LBCS*PPN_MAKE_LBCS ))

if [ -z ${RUN_CMD_UTILS:-} ] ; then
if [ -z "${RUN_CMD_UTILS:-}" ] ; then
print_err_msg_exit "\
Run command was not set in machine file. \
Please set RUN_CMD_UTILS for your platform"
29 changes: 3 additions & 26 deletions scripts/exregional_make_orog.sh
@@ -95,33 +95,10 @@ export OMP_STACKSIZE=${OMP_STACKSIZE_MAKE_OROG}
#
#-----------------------------------------------------------------------
#
case "$MACHINE" in
source $USHDIR/source_machine_file.sh
eval ${PRE_TASK_CMDS}

"WCOSS_CRAY")
{ save_shell_opts; set +x; } > /dev/null 2>&1
. $MODULESHOME/init/sh
module load PrgEnv-intel cfp-intel-sandybridge/1.1.0
module list
{ restore_shell_opts; } > /dev/null 2>&1
NODES=1
RUN_CMD_SERIAL="aprun -n 1 -N 1 -j 1 -d 1 -cc depth"
ulimit -s unlimited
ulimit -a
;;

"WCOSS_DELL_P3")
ulimit -s unlimited
ulimit -a
RUN_CMD_SERIAL="mpirun"
;;

*)
source ${MACHINE_FILE}
;;

esac

if [ -z ${RUN_CMD_SERIAL:-} ] ; then
if [ -z "${RUN_CMD_SERIAL:-}" ] ; then
print_err_msg_exit "\
Run command was not set in machine file. \
Please set RUN_CMD_SERIAL for your platform"
26 changes: 3 additions & 23 deletions scripts/exregional_make_sfc_climo.sh
@@ -136,32 +136,12 @@ EOF
#
#-----------------------------------------------------------------------
#
case "$MACHINE" in

"WCOSS_CRAY")
RUN_CMD_UTILS=${APRUN:-"aprun -j 1 -n 6 -N 6"}
;;

"WCOSS_DELL_P3")
# Specify computational resources.
export NODES=2
export ntasks=48
export ptile=24
export threads=1
export MP_LABELIO=yes
export OMP_NUM_THREADS=$threads
RUN_CMD_UTILS="mpirun"
;;

*)
source ${MACHINE_FILE}
;;

esac
source $USHDIR/source_machine_file.sh
eval ${PRE_TASK_CMDS}

nprocs=$(( NNODES_MAKE_SFC_CLIMO*PPN_MAKE_SFC_CLIMO ))

if [ -z ${RUN_CMD_UTILS:-} ] ; then
if [ -z "${RUN_CMD_UTILS:-}" ] ; then
print_err_msg_exit "\
Run command was not set in machine file. \
Please set RUN_CMD_UTILS for your platform"
28 changes: 3 additions & 25 deletions scripts/exregional_run_fcst.sh
@@ -99,34 +99,12 @@ export OMP_STACKSIZE=${OMP_STACKSIZE_RUN_FCST}
#
#-----------------------------------------------------------------------
#
case "$MACHINE" in

"WCOSS_CRAY")
ulimit -s unlimited
ulimit -a

if [ ${PE_MEMBER01} -gt 24 ];then
RUN_CMD_FCST="aprun -b -j1 -n${PE_MEMBER01} -N24 -d1 -cc depth"
else
RUN_CMD_FCST="aprun -b -j1 -n${PE_MEMBER01} -N${PE_MEMBER01} -d1 -cc depth"
fi
;;

"WCOSS_DELL_P3")
ulimit -s unlimited
ulimit -a
RUN_CMD_FCST="mpirun -l -np ${PE_MEMBER01}"
;;

*)
source ${MACHINE_FILE}
;;

esac
source $USHDIR/source_machine_file.sh
eval ${PRE_TASK_CMDS}

nprocs=$(( NNODES_RUN_FCST*PPN_RUN_FCST ))

if [ -z ${RUN_CMD_FCST:-} ] ; then
if [ -z "${RUN_CMD_FCST:-}" ] ; then
print_err_msg_exit "\
Run command was not set in machine file. \
Please set RUN_CMD_FCST for your platform"
37 changes: 3 additions & 34 deletions scripts/exregional_run_post.sh
@@ -92,42 +92,11 @@ export OMP_STACKSIZE=${OMP_STACKSIZE_RUN_POST}
#
#-----------------------------------------------------------------------
#
case "$MACHINE" in

"WCOSS_CRAY")

# Specify computational resources.
export NODES=2
export ntasks=48
export ptile=24
export threads=1
export MP_LABELIO=yes
export OMP_NUM_THREADS=$threads

RUN_CMD_POST="aprun -j 1 -n${ntasks} -N${ptile} -d${threads} -cc depth"
;;

"WCOSS_DELL_P3")

# Specify computational resources.
export NODES=2
export ntasks=48
export ptile=24
export threads=1
export MP_LABELIO=yes
export OMP_NUM_THREADS=$threads

RUN_CMD_POST="mpirun"
;;

*)
source ${MACHINE_FILE}
;;

esac
source $USHDIR/source_machine_file.sh
eval ${PRE_TASK_CMDS}

nprocs=$(( NNODES_RUN_POST*PPN_RUN_POST ))
if [ -z ${RUN_CMD_POST:-} ] ; then
if [ -z "${RUN_CMD_POST:-}" ] ; then
print_err_msg_exit "\
Run command was not set in machine file. \
Please set RUN_CMD_POST for your platform"
2 changes: 1 addition & 1 deletion tests/WE2E/run_WE2E_tests.sh
@@ -682,7 +682,7 @@ Please correct and rerun."
# Set the machine-specific configuration settings by sourcing the
# machine file in the ush directory

source $MACHINE_FILE
source $ushdir/source_machine_file.sh

expt_config_str=${expt_config_str}"\
#
8 changes: 8 additions & 0 deletions ush/constants.sh
@@ -15,3 +15,11 @@ degs_per_radian=$( bc -l <<< "360.0/(2.0*$pi_geom)" )
# Radius of the Earth in meters.
radius_Earth="6371200.0"

#
#-----------------------------------------------------------------------
#
# Other.
#
#-----------------------------------------------------------------------
#
valid_vals_BOOLEAN=("TRUE" "true" "YES" "yes" "FALSE" "false" "NO" "no")
56 changes: 29 additions & 27 deletions ush/generate_FV3LAM_wflow.sh
@@ -59,6 +59,7 @@ ushdir="${scrfunc_dir}"
#-----------------------------------------------------------------------
#
. $ushdir/source_util_funcs.sh
. $ushdir/get_crontab_contents.sh
. $ushdir/set_FV3nml_sfc_climo_filenames.sh
#
#-----------------------------------------------------------------------
@@ -500,11 +501,12 @@ if [ "${USE_CRON_TO_RELAUNCH}" = "TRUE" ]; then
print_info_msg "$VERBOSE" "
Copying contents of user cron table to backup file:
crontab_backup_fp = \"${crontab_backup_fp}\""
if [ "$MACHINE" = "WCOSS_DELL_P3" ]; then
cp_vrfy "/u/$USER/cron/mycrontab" "${crontab_backup_fp}"
else
crontab -l > ${crontab_backup_fp}
fi

called_from_cron=${called_from_cron:-"FALSE"}
get_crontab_contents called_from_cron=${called_from_cron} \
outvarname_crontab_cmd="crontab_cmd" \
outvarname_crontab_contents="crontab_contents"
echo "${crontab_contents}" > "${crontab_backup_fp}"
#
# Below, we use "grep" to determine whether the crontab line that the
# variable CRONTAB_LINE contains is already present in the cron table.
@@ -514,23 +516,23 @@ Copying contents of user cron table to backup file:
crontab_line_esc_astr=$( printf "%s" "${CRONTAB_LINE}" | \
$SED -r -e "s%[*]%\\\\*%g" )
#
# In the grep command below, the "^" at the beginning of the string be-
# ing passed to grep is a start-of-line anchor while the "$" at the end
# of the string is an end-of-line anchor. Thus, in order for grep to
# find a match on any given line of the output of "crontab -l", that
# line must contain exactly the string in the variable crontab_line_-
# esc_astr without any leading or trailing characters. This is to eli-
# minate situations in which a line in the output of "crontab -l" con-
# tains the string in crontab_line_esc_astr but is precedeeded, for ex-
# ample, by the comment character "#" (in which case cron ignores that
# line) and/or is followed by further commands that are not part of the
# string in crontab_line_esc_astr (in which case it does something more
# than the command portion of the string in crontab_line_esc_astr does).
#
if [ "$MACHINE" = "WCOSS_DELL_P3" ];then
# In the grep command below, the "^" at the beginning of the string
# passed to grep is a start-of-line anchor, and the "$" at the end is
# an end-of-line anchor. Thus, in order for grep to find a match on
# any given line of the cron table's contents, that line must contain
# exactly the string in the variable crontab_line_esc_astr without any
# leading or trailing characters. This is to eliminate situations in
# which a line in the cron table contains the string in crontab_line_esc_astr
# but is precedeeded, for example, by the comment character "#" (in which
# case cron ignores that line) and/or is followed by further commands
# that are not part of the string in crontab_line_esc_astr (in which
# case it does something more than the command portion of the string in
# crontab_line_esc_astr does).
#
if [ "$MACHINE" = "WCOSS_DELL_P3" ]; then
grep_output=$( grep "^${crontab_line_esc_astr}$" "/u/$USER/cron/mycrontab" )
else
grep_output=$( crontab -l | grep "^${crontab_line_esc_astr}$" )
grep_output=$( echo "${crontab_contents}" | grep "^${crontab_line_esc_astr}$" )
fi
exit_status=$?

@@ -548,10 +550,10 @@ Adding the following line to the user's cron table in order to automatically
resubmit SRW workflow:
CRONTAB_LINE = \"${CRONTAB_LINE}\""

if [ "$MACHINE" = "WCOSS_DELL_P3" ];then
if [ "$MACHINE" = "WCOSS_DELL_P3" ]; then
echo "${CRONTAB_LINE}" >> "/u/$USER/cron/mycrontab"
else
( crontab -l; echo "${CRONTAB_LINE}" ) | crontab -
( echo "${crontab_contents}"; echo "${CRONTAB_LINE}" ) | ${crontab_cmd}
fi

fi
@@ -912,8 +914,8 @@ cp_vrfy $USHDIR/${EXPT_CONFIG_FN} $EXPTDIR
#
# For convenience, print out the commands that need to be issued on the
# command line in order to launch the workflow and to check its status.
# Also, print out the command that should be placed in the user's cron-
# tab in order for the workflow to be continually resubmitted.
# Also, print out the line that should be placed in the user's cron table
# in order for the workflow to be continually resubmitted.
#
#-----------------------------------------------------------------------
#
@@ -978,14 +980,14 @@ Note that:
task(s) to the queue.
2) In order for the output of the rocotostat command to be up-to-date,
the rocotorun command must be issued immediately before the rocoto-
stat command.
the rocotorun command must be issued immediately before issuing the
rocotostat command.
For automatic resubmission of the workflow (say every 3 minutes), the
following line can be added to the user's crontab (use \"crontab -e\" to
edit the cron table):
*/3 * * * * cd $EXPTDIR && ./launch_FV3LAM_wflow.sh
*/3 * * * * cd $EXPTDIR && ./launch_FV3LAM_wflow.sh called_from_cron=\"TRUE\"
"

fi