[develop] Introduce test cases from ufs-case-studies platform thru WE2E #822

clouden90 · 2023-06-05T14:37:56Z

DESCRIPTION OF CHANGES:

The UFS Case Studies Platform provides a set of cases that reveal the forecast challenges of NOAA's operational Global Forecast System (GFS). Here we introduce one of these cases, 2020 Cold Air Damming, into the UFS SRW App through the WE2E testing framework. A YAML config file is added, and moderate modifications are made to exregional_get_extrn_mdl_files.sh. This new capability allows users to run any test case from the UFS Case Studies Platform directly through the WE2E framework without additional steps (e.g., downloading ICS/LBCS data from the platform first). Users can still modify the YAML file to suit their needs (e.g., increase the forecast length or try a different grid resolution or CCPP suite).

Additionally, we added the CCPP-SCM User and Technical Guide as a reference in Section 8.2 for users who are interested in running the single column model.

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

TESTS CONDUCTED:

On Level 1 systems, git clone the feature branch, navigate to the ufs-srweather-app folder, and check out the external repositories. Then navigate to the tests folder and follow the instructions below:

./build.sh orion intel
module use ../modulefiles
module load wflow_orion
conda activate regional_workflow
cd WE2E/
./run_WE2E_tests.py -t 2020_CAD -m orion -a epic-ps --expt_basedir "ufs_case_studies" --exec_subdir=install_intel/exec -q

The modeled T2M was compared with the RAP analysis, and the conclusions are consistent with the results shown here.

CHECKLIST

My code follows the style guidelines in the Contributor's Guide
I have performed a self-review of my own code using the Code Reviewer's Guide
I have commented my code, particularly in hard-to-understand areas
My changes need updates to the documentation. I have made corresponding changes to the documentation
My changes do not require updates to the documentation (explain).
My changes generate no new warnings
New and existing tests pass with my changes
Any dependent changes have been merged and published

LABELS (optional):

A Code Manager needs to add the following labels to this PR:

EdwardSnyder-NOAA · 2023-06-12T14:21:42Z

The 2020_CAD experiment passed on Cheyenne intel for me, but I had to manually get the initial conditions as the get_extrn_* steps failed. Cheyenne's compute node had trouble connecting to S3 to download the initial condition tar file, so I had to download and stage it locally. I had the same problem on Jet, so I'm curious how you were able to run the get_extrn_* tasks on the other tier one platforms. From my understanding, the compute nodes don't have internet access.

clouden90 · 2023-06-12T16:27:25Z

Thanks for testing, Ed. Good catch. Normally, compute nodes do not have internet access. I have tested the 2020_CAD experiment on Hera, Orion, and Gaea. Hera and Orion have a service partition, so you can submit jobs with internet access. Gaea has specific nodes that allow data transfer, and the associated changes have been included in this PR. Unfortunately, I do not have access to Jet or Cheyenne, but I guess they may have similar partitions? @MichaelLueken do you have any input?
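To illustrate the idea of sending data-fetch tasks to a partition with internet access, here is a minimal Python sketch. It is not the SRW App's actual scheduling logic: the helper `pick_partition`, the `get_extrn_` prefix rule, and the partition names used in the example are all assumptions made for illustration.

```python
from typing import Optional


def pick_partition(task: str, compute: str, service: Optional[str] = None) -> str:
    """Route data-fetch tasks to a service partition when one exists.

    The "get_extrn_" prefix rule and the partition names are illustrative
    assumptions, not the workflow's real configuration.
    """
    if task.startswith("get_extrn_") and service is not None:
        return service
    return compute


# A machine with a service partition sends fetch tasks there.
print(pick_partition("get_extrn_ics", "compute", "service"))
# A machine without one falls back to its only partition,
# so the data would have to be staged locally instead.
print(pick_partition("get_extrn_ics", "regular"))
```

On a machine with no service partition, the fallback branch is where pre-staged data becomes necessary.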

MichaelLueken · 2023-06-12T16:45:57Z

@clouden90 I can't speak on Cheyenne, but looking through the coverage and functional tests on Jet, there shouldn't be an issue pulling data from HPSS or AWS. We will likely need to stage data on Cheyenne if we want to run these tests on that machine, but there are get_from_HPSS and get_from_AWS tests run on Jet (and Hera), so that shouldn't be an issue.

I'm currently working on testing this PR on Jet and will let you know if I encounter this issue as well.

MichaelLueken · 2023-06-12T17:47:36Z

@clouden90 In the ush/machine/*.yaml files, the necessary partitions to pull data from the internet should already be defined. While this isn't the case for Cheyenne (it only has access to the regular partition), the rest of the machines appear to be split between the compute node partition and the service partition, which should allow access to the internet. When I run the 2020_CAD test on Jet, I see the following:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
2020_CAD                                                           COMPLETE              34.66
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE              34.66

----------------------------------------------------------------------------------------------------
Detailed summary of experiment 2020_CAD
in directory /mnt/lfs1/NAGAPE/epic/Michael.Lueken/expt_dirs/2020_CAD
                                        | Status    | Walltime   | Core hours used
----------------------------------------------------------------------------------------------------
make_grid_202002031200                    SUCCEEDED          19.0           0.13
get_extrn_ics_202002031200                SUCCEEDED         457.0           0.13
get_extrn_lbcs_202002031200               SUCCEEDED         840.0           0.23
make_orog_202002031200                    SUCCEEDED          42.0           0.28
make_sfc_climo_202002031200               SUCCEEDED          41.0           0.55
make_ics_mem000_202002031200              SUCCEEDED         212.0           2.83
make_lbcs_mem000_202002031200             SUCCEEDED         224.0           2.99
run_fcst_mem000_202002031200              SUCCEEDED         444.0          23.68
run_post_mem000_f000_202002031200         SUCCEEDED          69.0           0.92
run_post_mem000_f001_202002031200         SUCCEEDED          49.0           0.65
run_post_mem000_f002_202002031200         SUCCEEDED          16.0           0.21
run_post_mem000_f003_202002031200         SUCCEEDED          46.0           0.61
run_post_mem000_f004_202002031200         SUCCEEDED          45.0           0.60
run_post_mem000_f005_202002031200         SUCCEEDED          48.0           0.64
run_post_mem000_f006_202002031200         SUCCEEDED          16.0           0.21
----------------------------------------------------------------------------------------------------
Total                                     COMPLETE                         34.66

The test is running without issue for me on Jet.
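As a quick consistency check on the summary above, the per-task core hours can be summed and compared with the reported total of 34.66. The Python sketch below is illustrative only: `total_core_hours` is a hypothetical helper, not part of the WE2E tooling, and the four-column row layout is assumed from the pasted output.

```python
def total_core_hours(summary: str) -> float:
    """Sum the last column of every task row whose status is SUCCEEDED."""
    total = 0.0
    for line in summary.splitlines():
        fields = line.split()
        # Expected row layout: task name, status, walltime, core hours.
        if len(fields) >= 4 and fields[1] == "SUCCEEDED":
            total += float(fields[-1])
    return round(total, 2)


rows = """\
make_grid_202002031200                    SUCCEEDED          19.0           0.13
get_extrn_ics_202002031200                SUCCEEDED         457.0           0.13
get_extrn_lbcs_202002031200               SUCCEEDED         840.0           0.23
make_orog_202002031200                    SUCCEEDED          42.0           0.28
make_sfc_climo_202002031200               SUCCEEDED          41.0           0.55
make_ics_mem000_202002031200              SUCCEEDED         212.0           2.83
make_lbcs_mem000_202002031200             SUCCEEDED         224.0           2.99
run_fcst_mem000_202002031200              SUCCEEDED         444.0          23.68
run_post_mem000_f000_202002031200         SUCCEEDED          69.0           0.92
run_post_mem000_f001_202002031200         SUCCEEDED          49.0           0.65
run_post_mem000_f002_202002031200         SUCCEEDED          16.0           0.21
run_post_mem000_f003_202002031200         SUCCEEDED          46.0           0.61
run_post_mem000_f004_202002031200         SUCCEEDED          45.0           0.60
run_post_mem000_f005_202002031200         SUCCEEDED          48.0           0.64
run_post_mem000_f006_202002031200         SUCCEEDED          16.0           0.21
"""

print(total_core_hours(rows))  # 34.66, matching the reported total
```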

MichaelLueken

@clouden90 These changes look good to me!

I was able to successfully run the new 2020_CAD test on Jet. I wouldn't expect the new test to work on Cheyenne, since data must be pre-staged on that machine, but the rest of the machines should be able to pull the necessary data from AWS. So, while I will give my approval, I'd like to see if @EdwardSnyder-NOAA is still encountering issues with running the test on Hera, Jet, or Orion.

EdwardSnyder-NOAA · 2023-06-14T22:13:18Z

I was able to get data on Jet now, so not sure what went wrong the first time. Are we wanting this test to run on Cheyenne? If so, we should pre-stage the data there, which is something I can help with. I'll approve the PR once the data is staged or if the decision is not to run this on Cheyenne.

clouden90 · 2023-06-15T14:02:24Z

@EdwardSnyder-NOAA: Thanks for testing, and I'm glad to hear that you can now pass the test on Jet. Ideally, it would be great if we could make this test work on all the Tier 1 NOAA machines, including Cheyenne. Please note that the end date for this specific deliverable is 6/23. Do you think it's possible to have the pre-staged data ready on Cheyenne before that? In the meantime, I can add a note in the test config YAML file to notify users that this test will require pre-staged data on Cheyenne. Does this sound good?

EdwardSnyder-NOAA · 2023-06-15T14:44:19Z

@clouden90 - Yeah, we can pre-stage the data by then. It looks like this data is FV3GFS, so I'll place it with the other case/test input model data here: /glade/work/epicufsrt/contrib/UFS_SRW_data/develop/input_model_data/FV3GFS/nemsio/2020020312

MichaelLueken · 2023-06-15T14:47:24Z

@clouden90 Before this PR is merged, since @EdwardSnyder-NOAA is working on staging the data on Cheyenne for the new test, the new 2020_CAD test should be added to the tests/WE2E/machine_suites/comprehensive* files, to ensure that the test is run on every machine as part of the comprehensive testing. I'd also recommend that the new test get added to one of the tests/WE2E/machine_suites/coverage.* files, so that it is run regularly as part of the Jenkins automated testing.
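Registering a test in a suite file can be sketched as follows. This assumes the suite files are plain text lists with one test name per line, which the thread does not confirm; `add_test_to_suite` is a hypothetical helper, not an existing WE2E function.

```python
from pathlib import Path


def add_test_to_suite(suite_file: Path, test_name: str) -> bool:
    """Append test_name to suite_file unless it is already listed.

    Assumes one test name per line (an assumption about the file format).
    Returns True if the file was modified, False otherwise.
    """
    existing = suite_file.read_text().splitlines() if suite_file.exists() else []
    if test_name in (line.strip() for line in existing):
        return False
    existing.append(test_name)
    suite_file.write_text("\n".join(existing) + "\n")
    return True
```

Because the helper is idempotent, it could be run over every comprehensive and coverage file without creating duplicate entries.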

clouden90 · 2023-06-15T15:00:20Z

Thanks @EdwardSnyder-NOAA for the support! Since I do not have an account on Cheyenne, would you mind re-running the test on Cheyenne once the pre-staged data is ready? Thanks

EdwardSnyder-NOAA · 2023-06-15T17:05:43Z

The data has been staged on Cheyenne and the test passed successfully. These are the changes (highlighted by **) I made to the config.2020_CAD.yaml in order for the get_extrn_* tasks to fetch the data locally:

task_get_extrn_ics:
  EXTRN_MDL_NAME_ICS: **FV3GFS**
  FV3GFS_FILE_FMT_ICS: nemsio
  **USE_USER_STAGED_EXTRN_FILES: true**
task_get_extrn_lbcs:
  EXTRN_MDL_NAME_LBCS: **FV3GFS**
  LBC_SPEC_INTVL_HRS: 3
  FV3GFS_FILE_FMT_LBCS: nemsio
  **USE_USER_STAGED_EXTRN_FILES: true**
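The `**` markers above are emphasis only and are not part of the YAML; copied verbatim, they would likely not parse, since a leading `*` denotes an alias in YAML. A small Python sketch that recovers the plain settings, using a hypothetical helper named `strip_highlight`:

```python
def strip_highlight(text: str) -> str:
    """Remove the ** emphasis markers that were added only for highlighting."""
    return text.replace("**", "")


snippet = """\
task_get_extrn_ics:
  EXTRN_MDL_NAME_ICS: **FV3GFS**
  FV3GFS_FILE_FMT_ICS: nemsio
  **USE_USER_STAGED_EXTRN_FILES: true**
task_get_extrn_lbcs:
  EXTRN_MDL_NAME_LBCS: **FV3GFS**
  LBC_SPEC_INTVL_HRS: 3
  FV3GFS_FILE_FMT_LBCS: nemsio
  **USE_USER_STAGED_EXTRN_FILES: true**
"""

clean = strip_highlight(snippet)
print(clean)
```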

clouden90 · 2023-06-15T17:26:35Z

@EdwardSnyder-NOAA, thanks again for staging the data on Cheyenne and sharing the changes! I will add a note in the description section to include your modifications for users who are interested in running this test on Cheyenne.

MichaelLueken · 2023-06-15T18:04:02Z

@EdwardSnyder-NOAA Unfortunately, if the USE_USER_STAGED_EXTRN_FILES variable is set to true in the config.2020_CAD.yaml file, then the data needs to be staged on all machines that it will be run on. If pre-staged data isn't found, then the test will fail.
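One way to picture that constraint is to enable staged files only where a staging directory actually exists. The Python sketch below is illustrative and not how the workflow behaves today: `staged_overrides` and its directory argument are hypothetical, and only the `USE_USER_STAGED_EXTRN_FILES` key and the two task section names come from the config shown earlier in this thread.

```python
import os


def staged_overrides(staged_dir: str) -> dict:
    """Build per-task overrides, enabling staged files only if staged_dir exists.

    Hypothetical helper: the workflow itself does not perform this check.
    """
    use_staged = os.path.isdir(staged_dir)
    return {
        "task_get_extrn_ics": {"USE_USER_STAGED_EXTRN_FILES": use_staged},
        "task_get_extrn_lbcs": {"USE_USER_STAGED_EXTRN_FILES": use_staged},
    }
```

With a check like this, a machine without staged data would keep the default remote fetch rather than fail on a missing directory.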

@clouden90 Following the update to the develop branch this morning, there is now a conflict in Components.rst. Please merge the latest develop into your branch and correct the conflict, then we should be able to move forward. Thanks!

clouden90 · 2023-06-15T18:21:32Z

@MichaelLueken, thanks! I have merged the latest develop and added the 2020_CAD test to comprehensive.orion and coverage.orion. Also, @EdwardSnyder-NOAA's suggestions have been added as a note in the description section for users who are interested in running this test on Cheyenne.

clouden90 · 2023-06-20T19:22:23Z

@EdwardSnyder-NOAA, as @MichaelLueken mentioned, if the USE_USER_STAGED_EXTRN_FILES variable is set to true but pre-staged data isn't found, the test will fail. I have added your modification to the description section of config.2020_CAD.yaml. Feel free to let me know if you have any comments, suggestions, or concerns regarding this. If you find the changes acceptable, could you kindly consider approving the pull request at your convenience? Thanks

EdwardSnyder-NOAA

Thanks for adding that note! LGTM.

MichaelLueken · 2023-06-21T15:28:21Z

@clouden90 The Jenkins automated tests passed on Cheyenne, Hera, and Jet. The tests failed on Orion due to the inability to clone the ccpp-physics repository (a known issue that requires git/2.28.0 to be loaded in the .bashrc file on the machine before the tests can run). I am currently running the Jenkins tests manually on Orion. Once they are complete, I will move forward with merging this PR.

MichaelLueken · 2023-06-21T15:31:42Z

@clouden90 The manually submitted Jenkins tests on Orion have all passed. Moving forward with merging this PR now.

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
deactivate_tasks                                                   COMPLETE               1.07
get_from_AWS_ics_GEFS_lbcs_GEFS_fmt_grib2_2022040400_ensemble_2me  COMPLETE             760.15
grid_CONUS_3km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta   COMPLETE             265.20
grid_RRFS_AK_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot        COMPLETE             140.06
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_RRFS_v1beta            COMPLETE              15.93
grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_2017_gfdlmp  COMPLETE              14.14
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR              COMPLETE             384.49
grid_RRFS_CONUScompact_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16   COMPLETE              29.58
grid_RRFS_CONUScompact_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16    COMPLETE             281.31
grid_SUBCONUS_Ind_3km_ics_FV3GFS_lbcs_FV3GFS_suite_WoFS_v0         COMPLETE              14.95
nco                                                                COMPLETE               7.78
2020_CAD                                                           COMPLETE              31.48
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            1946.14

clouden90 and others added 5 commits May 19, 2023 12:12

Mod scripts

d03e147

change default ccpp suite

5436274

mod gaea machine config

190e5ce

Merge branch 'ufs-community:develop' into feature/ufs-case-studies

c69c7a4

Merge branch 'ufs-community:develop' into feature/ufs-case-studies

8315ab2

MichaelLueken changed the title ~~Introduce test cases from ufs-case-studies platform thru WE2E~~ [develop] Introduce test cases from ufs-case-studies platform thru WE2E Jun 5, 2023

MichaelLueken added the enhancement New feature or request label Jun 5, 2023

clouden90 added 2 commits June 5, 2023 16:08

Merge branch 'ufs-community:develop' into feature/ufs-case-studies

591b324

Update Components.rst

684a7fd

michelleharrold pushed a commit to michelleharrold/ufs-srweather-app that referenced this pull request Jun 7, 2023

fixed typo (ufs-community#822)

ea7c109

Merge branch 'ufs-community:develop' into feature/ufs-case-studies

95f2460

Merge branch 'ufs-community:develop' into feature/ufs-case-studies

c8ac7b3

Merge branch 'ufs-community:develop' into feature/ufs-case-studies

81e85e0

MichaelLueken approved these changes Jun 14, 2023

clouden90 and others added 3 commits June 15, 2023 13:15

Update yaml files

47634bc

Merge branch 'ufs-community:develop' into feature/ufs-case-studies

03765c7

add test to comprehensive.orion and coverage.orion

ec771bd

EdwardSnyder-NOAA approved these changes Jun 20, 2023

MichaelLueken added the run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests label Jun 21, 2023

MichaelLueken merged commit 4932f02 into ufs-community:develop Jun 21, 2023
3 of 5 checks passed

clouden90 mentioned this pull request Jun 23, 2023

[develop] Add ccpp-scm reference #842

Merged

22 tasks

MichaelLueken pushed a commit that referenced this pull request Jun 26, 2023

[develop] Add ccpp-scm reference (#842)

cb1864d

The reference of CCPP-SCM was missing in PR #822. Here, we add it back.
