-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[develop] Added an option for RRFS external model files used as ICS and LBCS #1089
base: develop
Are you sure you want to change the base?
[develop] Added an option for RRFS external model files used as ICS and LBCS #1089
Conversation
I'll move this work to On-Hold until a WE2E test has been added so that we can properly test this new functionality. |
NB: @MichaelLueken - this PR requires an option "RRFS" to be allowed in UFS_UTILS. Current develop branch of ufs-community/UFS_UTILS does have the "RRFS" option enabled, but the version checked out by the SRW does not (a correction needs to be to allow it). How should we proceed with this requirement? |
An updated tag could be used for the UFS_UTILS that had this option implemented |
I'll check if updating the version of I'll go ahead and try updating the |
What version of |
It looks like |
Yes - thank you!! I was having troubles finding exact time/version when this change was implemented!. |
Using the version of UFS_UTILS associated with PR #902 is causing a failure in the fundamental tests:
The
I'll try backing my way through the commits in the UFS_UTILS repository to see which entry is causing issues with |
@MichaelLueken - thank you for testing!! Let me look into these errors - look like a data problem. I might need to stage an additional directory in the EPIC space with data that I though was not needed... will get back to you! |
|
||
CHGRES_CUBE=${SRW_DIR}/sorc/UFS_UTILS/sorc/chgres_cube.fd | ||
${SED} -i 's/"RAP","HRRR"/"RAP","HRRR","RRFS"/g' ${CHGRES_CUBE}/program_setup.F90 | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not an appropriate fix to fortran codes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a temporary workaround to allow chgres_cube to build before a later tag is used that officially introduces RRFS capability. The current UFS_UTILS tag simply flag the "RRFS" option as not valid, and reports an error. Newer UFS_UTILS tags do not yet work with the SRW, and require more adaptation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@christinaholtNOAA -
please see comments:
#1089 (comment)
#1089 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed an extra empty line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The appropriate workaround to needing code not yet committed to an authoritative repo is to keep the necessary code in a branch and point to that branch in the interim. Editing external code in the build process is a bad practice that sets a horrible precedent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updating the UFS_UTILS hash to 1dac855 would bring in the necessary change to chgres_cube's
program_setup.F90
. However, all WE2E tests that use RAP and HRRR physics will fail while moving to this version of UFS_UTILS, as it is after the fractional grid update. It doesn't look like there has been any work on issue #961 in UFS_UTILS, so this is still keeping the SRW App from moving forward with updated versions of UFS_UTILS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I totally understand that there is no way to get the UFS_UTILS hash updated right now.
There are two ways to handle it -- a temporary branch in a fork that EPIC controls to hold this source code change, or holding off on this PR until that update can be made.
Editing source code in the build script is unacceptable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@christinaholtNOAA @MichaelLueken
Another possible approach could be to solve the issue of chgres_cube flagging the RRFS option to be done on SRW scripts level, instead of changing the error condition on the UFS_UTILS level.
My suggestion is to change the variable "external_model" from "RRFS" to "HRRR" in namelist fort.41 files before chgres_cube is called. These namelist files are stored in $EXPTDIR/[TEST_NAME]/[YYYYMMDDHH]/tmp_MAKE_ICS/ and ./tmp_MAKE_LBCS/ .
The changes are to be done in exregional_make_ics.sh and exregional_make_lbcs.sh, or in exregional_get_extrn_mdl_files.sh
That way the RRFS files interpolated to a regular grid still could be retrieved from the RRFS AWS location as needed, according to the external model type set for the experiment, but will be processed by chgres_cube similar to those for HRRR. This should be functional as long as HRRR and RAP remain as options for the SRW initial and lateral boundary condition.
Please let me know if this sounds a viable alternative to error flag removal in chgres_cube's program_setup.F90!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Labeling RRFS ICs/LBCs as "HRRR" to get it through is fine with me. Modifying source code is not.
@@ -3,6 +3,7 @@ grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot | |||
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16 | |||
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_HRRR | |||
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta | |||
grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_v1beta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point of the coverage tests is that by running a portion of the suite on each platform, all the comprehensive tests will have run. Should this only be done on one platform?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other tests seem to exist for all the machines; e.g,
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the test should only be added to a single platform's coverage test suite. I'd recommend either Hera GNU, Derecho, Orion, or Hercules (machines with only 11 tests).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MichaelLueken thanks for confirming!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we have a coverage.cheyenne file now that Cheyenne has been decommissioned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am adamant that this more expensive ensemble test not be included as a test on every machine. I appreciate @MichaelLueken's suggestion above about choosing one of the platforms with a lighter load.
...uites_community/config.grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_v1beta.yaml
Outdated
Show resolved
Hide resolved
...uites_community/config.grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_v1beta.yaml
Outdated
Show resolved
Hide resolved
...uites_community/config.grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_v1beta.yaml
Outdated
Show resolved
Hide resolved
...uites_community/config.grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_v1beta.yaml
Outdated
Show resolved
Hide resolved
updated /WE2E/test_configs/grids_extrn_mdls_suites_community/config.grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_v1beta.yaml Co-authored-by: Christina Holt <[email protected]>
Placing variable in alphabetical order as requested
removed debugging diagnostics
@gspetro-NOAA - |
...uites_community/config.grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_v1beta.yaml
Show resolved
Hide resolved
A current |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During one of the latest pushes, it appears as though changes to the documentation had been undone. Please reapply these updates to the documentation.
Co-authored-by: Michael Lueken <[email protected]>
Co-authored-by: Michael Lueken <[email protected]>
@MichaelLueken - thank you for noticing documentation changes! I'll compare with the changes done and documentation built at a local system, to verify all the necessary changes are in place |
@MichaelLueken - |
Thanks for reapplying the documentation modifications again! The documentation updates look good to me. It caught me off guard when I noticed that they had been changed to the way they were before you originally addressed my concerns with the documentation, but everything is once again good to go. |
My apologies - some merges did not go well right away when I attempted to implement changes, address comments, and run a new test before yesterday's demo. Using different platforms to test the changes (AWS, Hera), build and change documentation (local system), and addressing comments + commits (GitHub) were likely not recorded properly. I'm glad that it's back to the expected. |
I think I have finally figured out the issue with chgres_cube v2 surface files (fractional grid) not being read in by the weather model. RAP and HRRR use RUC LSM, which requires setting I'll need to do some work tomorrow to fully figure this out, but I should be able to update the UFS_UTILS hash to 1dac855, which will include the necessary changes to Thank you, Michael, fingers crossed!! |
@@ -61,6 +59,45 @@ usage_error () { | |||
exit 1 | |||
} | |||
|
|||
# remove_directory function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for being difficult, but I don't like this solution either. Why do we need to give the users a choice for what directories to delete? If users want to specify some custom directory to delete, they can just delete it! Get rid of these options entirely, simply hard-code the directories and files that need to be deleted. No need to bother with input arguments and checking for wildcards: all of this complexity just leads to more potential failure modes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to get a better understanding what are the features that are desired vs. what was coded earlier. Let me list the old/existing behavior and a new behavior below; please let me know what is to be added to the existing features, and what are excessive. I did refer to the issues mentioned in #1073, such as removing directories outside the SRW tree, or those mentioned in the links: this Quora link , ValveSoftware/steam-for-linux#3671 , which included raising issues of having empty variables or wildcards.
The old behavior:
- User requests what type of cleaning is to be done using arguments to a script, which could be to do the following
- remove build directory only;
- remove all build artifacts (except for built conda directory + conda_loc file);
- remove all checked-out external repositories/submodules;
- remove conda directory and conda_loc;
- When type of cleaning is determined based on the arguments, the script checks if a directory or directories exists and then removes them. This is done for all but conda + conda_loc (check for directory existence not done).
New behavior, type of cleaning to be done using similar arguments:
- Checks whether a directory exists;
- Check whether a directory is immediately under the expected SRW tree (ask to confirm a remove request if it is not)
- Check whether a directory name contains wildcards, such as " [ ] | ? + *" or a space, do not remove if it does and ask the user to do it from a command line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me clarify some of the background of my comments: I did not know about the existence of this script for some time, and when I found out about it, I immediately noticed some unsafe behavior as I've mentioned in my open issue. On thinking about it further as I typed up that issue, it struck me that this script is completely unnecessary: all it does is remove directories in known locations. But since you insisted it is still necessary to support users with no advanced knowledge, then I can be happy with a simple script that removes the default directory locations, even if I personally think it's unnecessary.
This functionality that you have added in this PR has greatly increased the complexity of the script, and offers countless more potential failure modes due to introducing complex/advanced regex that purports to remove certain potentially dangerous characters. Furthermore, the functionality you are attempting to preserve (deleting custom build, install, etc. directories) only applies to advanced users anyway, who should be able to follow very simple instructions to remove the directories if they want a clean build. These are directories that they will already have had to specify explicitly at build time (assuming that they are different than the default locations), and it is functionality that is not even documented in the users guide!
Simple is better here, especially if you are talking about a tool for supporting novice users. I propose that this script, if it must continue to exist, simply deletes the default locations of these directories (relative to the script itself), and does not attempt to allow the user to input custom locations for these directories.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mkavulich - thank you for your comments. This devclean.sh script was added in the late 2022: #415
This devclean.sh was created to accompany a devbuild.sh script, which accept custom locations for the build artifacts that could be specified, and have variables INSTALL_DIR, CONDA_DIR, and BIN_DIR available. It is also included in the SRW Documentation
https://ufs-srweather-app.readthedocs.io/en/develop/UsersGuide/Reference/FAQ.html#how-can-i-clean-up-the-srw-app-code-if-something-went-wrong-during-the-build
If these variables are not set by the user, they accept default locations, such as under main SRW tree.
So if the original code looks good enough, I could only proposed to add a check for the existence of a CONDA_DIR before removing it, when conda cleaning is requested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is also included in the SRW Documentation
I am talking specifically about the customization options for INSTALL_DIR, etc. This is advanced behavior that is only documented in the script itself, so I do not expect anyone to be using it outside of advanced users.
if the original code looks good enough
Do you mean the code before your changes here? I have already expressed my concerns about that in Issue #1073 that started this discussion. Otherwise I'm not sure what you mean.
I don't think you are addressing my primary concern here: There needs to be a good reason for including all this complexity and potential failure modes in a script, especially one explicitly designed for novice users, and I just do not see that reason at all. Why include all of this for a use case that I can't imagine anyone ever needing? You are implying that there are users out there who are advanced enough to have a very specific use case which requires customizing their build, install, or conda directories, but these same hypothetical users are not advanced enough to follow simple instructions that they should remove those same directories to clean up an existing build? More code is not always better, and I don't understand the argument against replacing this now-200+ line script with a single sentence in the users guide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR has grown to accumulate too much in general. Please limit PRs to one feature.
The title should definitely be changed since it is not only adding support for RRFS ICs/LBCS, but also introducing graphics for ensembles, and addressing issues with the clean script.
All the features must pass PR before you can merge any of them and that's going to be an extremely tall order given the roadblocks of these particular features.
|
||
CHGRES_CUBE=${SRW_DIR}/sorc/UFS_UTILS/sorc/chgres_cube.fd | ||
${SED} -i 's/"RAP","HRRR"/"RAP","HRRR","RRFS"/g' ${CHGRES_CUBE}/program_setup.F90 | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Labeling RRFS ICs/LBCs as "HRRR" to get it through is fine with me. Modifying source code is not.
--bin-dir=BIN_DIR | ||
binary directory name ("exec" by default); full path is \${INSTALL_DIR}/\${BIN_DIR}) | ||
--build-dir=BUILD_DIR | ||
main build directory, absolute path (\${SRW_DIR}/build/ by default) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This option should not exist bc it's dangerous. If a user chooses to go against the default, then they need to clean their own directories. It's perfectly acceptable to provide support for ppl who go the supported path and leave the others to their own devices.
This applies to any of the flags that allow users to specify something other than the default build directories created by the build script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This options is created to mimic the options in ./devbuild.sh, as the script ./devclean.sh is stated to be cleaning any issues with the build:
https://ufs-srweather-app.readthedocs.io/en/develop/UsersGuide/Reference/FAQ.html#how-can-i-clean-up-the-srw-app-code-if-something-went-wrong-during-the-build
native: '{{ platform.SCHED_NATIVE_CMD }}' | ||
nnodes: 1 | ||
nnodes: 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the plotting script MPI capable? If not, this is wasting resources that will never be used.
# | ||
# Path to the RRFS/HRRRX geogrid file. | ||
# | ||
geogrid_file_input_grid="${FIXgsm}/geo_em.d01.nc_HRRRX" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the geogrid file used to produce RRFS or HRRR or both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is used for both.
@@ -3,6 +3,7 @@ grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot | |||
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16 | |||
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_HRRR | |||
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta | |||
grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_v1beta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am adamant that this more expensive ensemble test not be included as a test on every machine. I appreciate @MichaelLueken's suggestion above about choosing one of the platforms with a lighter load.
@christinaholtNOAA @MichaelLueken @mkavulich |
DESCRIPTION OF CHANGES:
An option to use RRFS model output (control) files are added as initial and lateral boundary conditions, ICS and LBCS.
RRFS_a data for the test was retrieved from the NODD website ((https://registry.opendata.aws/noaa-rrfs/)), pressure-level grib2 files from the control directory, RRFS forecasts interpolated into 3-km regular grid.
A new test has been added
grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_v1beta
with RRFS input files for the event on 06/05/2024 with the tornadoes reported in MarylandWorkflow for the plotting tasks has been updated to allow graphic output for individual ensemble members.
Python plotting script is updated to have geographical data visible over the plotted fields.
Updated devclean.sh script with safety checks, addressing
devclean.sh
demonstrates very unsafe directory removal behavior #1073 (devclean.sh
demonstrates very unsafe directory removal behavior #1073)UPDATE (6/13/2024):
RRFS data location:
https://noaa-rrfs-pds.s3.amazonaws.com/rrfs_a/rrfs_a.{yyyymmdd}/{hh}/control/
files are in the format
rrfs.t{hh}z.prslev.f{fcst_hr:03d}.conus.grib2
where {yyyymmdd} are 4-digit year, 2-digit month, and 2-digit day of the forecast cycle, and {hh} is a 2-digit hour of the forecast cycle (forecast start), and {fcst_hr:03d} is a 3-digit forecast hour.
Browsing the bucket could be done at the site: browse the bucket:
https://noaa-rrfs-pds.s3.amazonaws.com/index.html#rrfs_a/
For this PR, RRFS input data uses are interpolated into a regular 3-km grid, these files need older sfs_data v1.
The sfc_data v2 that contains rotated u,v fields or fractional grids will be needed to use a newer UFS_UTILS version and tag. This would allow use of full RRFS input files, i.e. on a native grid with no remapping into regular grids; these files are ~6GB per file, and also require higher-version of packages (g2) that are not present in a spack-stack v1.5.1 or 1.6.0.
The following needs to added to config.yaml file to use RRFS ICS/LBCS option: (an example)
An example of a config.yaml file is attached. It accessed the data from a pre-stage standard location. Variables such as EXTRN_MDL_SOURCE_BASEDIR_ICS, EXTRN_MDL_FILES_ICS,
EXTRN_MDL_SOURCE_BASEDIR_LBCS, EXTRN_MDL_FILES_LBCS
need to be added for another date/forecast cycle.
If data are not found on disk, it is retrieved from the AWS.
config.yaml.txt
Type of change
TESTS CONDUCTED:
Conducted a test for RRFS_CONUScompact_25km grid, setting ICS and LBCS to "RRFS" option, running on NOAA AWS cloud. One-, two-, and 3-ensemble member experiments.
A new test configured, config.grid_RRFS_CONUScompact_25km_ics_RRFS_lbcs_RRFS_suite_RRFS_v1beta , which could be launched on all the platforms, with data staged in standard location for EPIC project.
Fundamental tests (not including a newly developed one) pass successfully on AWS.
staged data for the test using RRFS ICS/LBCS for the SRW:
NOAA Cloud: /contrib/EPIC/UFS_SRW_data/develop/input_model_data/RRFS/
Derecho: /glade/work/epicufsrt/contrib/UFS_SRW_data/develop/input_model_data/RRFS/
Hera: /scratch1/NCEPDEV/nems/role.epic/UFS_SRW_data/develop/input_model_data
Gaea: /gpfs/f5/epic/world-shared/UFS_SRW_data/develop/input_model_data/RRFS/
Jet: /mnt/lfs4/HFIP/hfv3gfs/role.epic/UFS_SRW_data/develop/input_model_data/RRFS/
Orion/Hercules: /work/noaa/epic/role-epic/contrib/UFS_SRW_data/develop/input_model_data/RRFS/
A directory that uses forecast cycle date stamp for the test, ./2024060517, has 10 files:
rrfs.t17z.prslev.f000.conus.grib2
rrfs.t17z.prslev.f001.conus.grib2
rrfs.t17z.prslev.f002.conus.grib2
rrfs.t17z.prslev.f003.conus.grib2
rrfs.t17z.prslev.f004.conus.grib2
rrfs.t17z.prslev.f005.conus.grib2
rrfs.t17z.prslev.f006.conus.grib2
rrfs.t17z.prslev.f007.conus.grib2
rrfs.t17z.prslev.f008.conus.grib2
rrfs.t17z.prslev.f009.conus.grib2
DEPENDENCIES:
DOCUMENTATION:
A new option for "RRFS" used as ICS and LBCS may need to be documented.
ISSUE:
In preparation for RRFS integration tasks, option to use "RRFS" model file as ICS and LBCS was added.
CHECKLIST
LABELS (optional):
A Code Manager needs to add the following labels to this PR:
CONTRIBUTORS (optional):
@christinaholtNOAA