
[SRW-AQM] Update AQM task scripts with those of production/aqm_dev branch #1060

Merged
29 commits merged on Mar 27, 2024


chan-hoo
Collaborator

DESCRIPTION OF CHANGES:

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

TESTS CONDUCTED:

AQM we2e test

  • hera.intel
  • orion.intel
  • hercules.intel
  • cheyenne.intel
  • cheyenne.gnu
  • derecho.intel
  • gaea.intel
  • gaeac5.intel
  • jet.intel
  • wcoss2.intel
  • NOAA Cloud (indicate which platform)
  • Jenkins
  • fundamental test suite
  • comprehensive tests (specify which if a subset was used)

ISSUE:

Fixes issue mentioned in #1020

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes do not require updates to the documentation (explain).
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

Collaborator

MichaelLueken left a comment


@chan-hoo -

I have completed an initial review of the changes in this PR. It looks like the path for upp_parm should include ${SRW_DIR}/parm in devbuild.sh. Also, all of the AQM task modulefiles include nco/5.1.6. Should this be nco/5.0.6?

Review threads (all resolved):
  • devbuild.sh (outdated)
  • modulefiles/tasks/hera/aqm_ics.local.lua
  • modulefiles/tasks/hera/aqm_lbcs.local.lua
@chan-hoo
Collaborator Author

chan-hoo commented Mar 21, 2024

@MichaelLueken, on Hera, the available versions are 4.9.3 and 5.1.6:

[Chan-hoo.Jeon@hfe06 ufs-srweather-app]$ module avail

---------------------------------------- /apps/lmod/lmod/modulefiles/Core -----------------------------------------
   lmod    settarg

-------------------------------------------- /apps/modules/modulefiles --------------------------------------------
   R/3.6.1                   cuda/12.2.1            idl/8.7                         ncl/6.6.2
   advisor/2019              cuda/12.3.1     (D)    idl/8.7.3                (D)    nco/4.9.3
   advisor/2020              ecflow/5.5.3           imagemagick/7.1.1-11            nco/5.1.6       (D)

Once your PR is merged, we can remove this call from modulefiles/tasks/hera/[task].
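Lmod flags the default version of a module with `(D)` in `module avail` output, so a plain `module load nco` on Hera picks up nco/5.1.6. As a rough illustration only, the hypothetical helper below is not part of the SRW App or Lmod. It extracts the available versions and the default for one package from listing text like the excerpt above:

```python
import re
from typing import List, Optional, Tuple

# "name/version", optionally followed by Lmod's "(D)" default marker.
MODULE_RE = re.compile(r"([\w.+-]+)/([\w.+-]+)(\s+\(D\))?")

def find_versions(avail_text: str, name: str) -> Tuple[List[str], Optional[str]]:
    """Return (all versions of `name`, default version or None) from `module avail` text."""
    versions: List[str] = []
    default: Optional[str] = None
    for line in avail_text.splitlines():
        if line.strip().startswith("-"):
            continue  # skip "---- /path/to/modulefiles ----" section headers
        for mod, ver, default_flag in MODULE_RE.findall(line):
            if mod == name:
                versions.append(ver)
                if default_flag:
                    default = ver
    return versions, default

# Excerpt of the Hera listing quoted in this thread.
HERA_AVAIL = """\
   R/3.6.1                   cuda/12.2.1            idl/8.7                         ncl/6.6.2
   advisor/2019              cuda/12.3.1     (D)    idl/8.7.3                (D)    nco/4.9.3
   advisor/2020              ecflow/5.5.3           imagemagick/7.1.1-11            nco/5.1.6       (D)
"""

print(find_versions(HERA_AVAIL, "nco"))  # (['4.9.3', '5.1.6'], '5.1.6')
```

The parser only reads the text it is given. It does not call Lmod, so it assumes the listing layout shown above.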

@MichaelLueken
Collaborator

@chan-hoo -

Fair enough. I was using spack-stack's nco/5.0.6, but using the system default 5.1.6 is fine.

@MichaelLueken
Collaborator

@chan-hoo -

Your changes look good! My testing of your work successfully passed:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
aqm_grid_AQM_NA13km_suite_GFS_v16_20240321204642                   COMPLETE            2692.02
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            2692.02

Approving now.

@chan-hoo
Collaborator Author

@MichaelLueken, thanks!!!

@chan-hoo
Collaborator Author

@RatkoVasic-NOAA, could you please (plz..plz..plz) review this PR when you have time? :) It used to be very hard to find a second reviewer for an AQM PR. I'd really appreciate it if you could review this PR as before, but no pressure at all!!! :) :)

@RatkoVasic-NOAA
Collaborator

@chan-hoo I was already working on it :-)


@RatkoVasic-NOAA
Collaborator

Test passed:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
aqm_grid_AQM_NA13km_suite_GFS_v16_20240322161126                   COMPLETE            2669.04
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            2669.04

Approving!

@chan-hoo
Collaborator Author

@RatkoVasic-NOAA, thanks!!!

MichaelLueken added the run_we2e_coverage_tests label (Run the coverage set of SRW end-to-end tests) on Mar 22, 2024
@MichaelLueken
Collaborator

@chan-hoo -

The Jenkins tests are failing with the following error (the example below is from Jet; Hercules and Orion fail the same way):

Traceback (most recent call last):
  File "/mnt/lfs1/NAGAPE/epic/role.epic/jenkins/workspace/s-srweather-app_pipeline_PR-1060/jet/tests/WE2E/./run_WE2E_tests.py", line 567, in <module>
    run_we2e_tests(homedir,args)
  File "/mnt/lfs1/NAGAPE/epic/role.epic/jenkins/workspace/s-srweather-app_pipeline_PR-1060/jet/tests/WE2E/./run_WE2E_tests.py", line 246, in run_we2e_tests
    expt_dir = generate_FV3LAM_wflow(ushdir,logfile=f"{ushdir}/log.generate_FV3LAM_wflow",
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/lfs1/NAGAPE/epic/role.epic/jenkins/workspace/s-srweather-app_pipeline_PR-1060/jet/tests/WE2E/../../ush/generate_FV3LAM_wflow.py", line 71, in generate_FV3LAM_wflow
    expt_config = setup(ushdir,debug=debug)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/lfs1/NAGAPE/epic/role.epic/jenkins/workspace/s-srweather-app_pipeline_PR-1060/jet/tests/WE2E/../../ush/setup.py", line 378, in setup
    expt_config = load_config_for_setup(USHdir, default_config_fp, user_config_fp)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/lfs1/NAGAPE/epic/role.epic/jenkins/workspace/s-srweather-app_pipeline_PR-1060/jet/tests/WE2E/../../ush/setup.py", line 105, in load_config_for_setup
    raise Exception(errmsg)
Exception: Invalid key(s) specified in /mnt/lfs1/NAGAPE/epic/role.epic/jenkins/workspace/s-srweather-app_pipeline_PR-1060/jet/ush/config.yaml:
OPSROOT_default = /lfs1/NAGAPE/epic/role.epic/jenkins/workspace/s-srweather-app_pipeline_PR-1060/jet/nco_dirs

OPSROOT_default has been removed in this PR. It appears as though tests/WE2E/run_WE2E_tests.py still includes traces of NCO mode testing.

Please go through this script and remove the last remaining NCO sections; then I can resubmit the tests.
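The traceback shows `load_config_for_setup` in ush/setup.py rejecting config.yaml because that file still contains `OPSROOT_default`, a key this PR removed. One way to implement this kind of check is a recursive comparison of the user config against the default config. The sketch below is illustrative and assumes that design. It is not the actual setup.py code, and the section and key layout in the example is hypothetical:

```python
from typing import Any, Dict, List

def find_invalid_keys(user_cfg: Dict[str, Any], default_cfg: Dict[str, Any],
                      prefix: str = "") -> List[str]:
    """List "key = value" entries in user_cfg whose keys are absent from default_cfg."""
    invalid: List[str] = []
    for key, value in user_cfg.items():
        path = f"{prefix}{key}"
        if key not in default_cfg:
            invalid.append(f"{path} = {value}")
        elif isinstance(value, dict) and isinstance(default_cfg[key], dict):
            # Recurse into sections that exist in both configs.
            invalid.extend(find_invalid_keys(value, default_cfg[key], prefix=f"{path}."))
    return invalid

def check_user_config(user_cfg: Dict[str, Any], default_cfg: Dict[str, Any]) -> None:
    """Raise if the user config contains keys the defaults no longer define."""
    invalid = find_invalid_keys(user_cfg, default_cfg)
    if invalid:
        # The real setup.py raises a bare Exception; ValueError is used in this sketch.
        raise ValueError("Invalid key(s) specified in config.yaml:\n" + "\n".join(invalid))

# Hypothetical layout: the defaults no longer define OPSROOT_default.
defaults = {"user": {"MACHINE": None, "ACCOUNT": None}}
user = {"user": {"MACHINE": "jet", "OPSROOT_default": "/path/to/nco_dirs"}}

try:
    check_user_config(user, defaults)
except ValueError as err:
    print(err)
```

In this design, any leftover key in a generated config fails immediately, before the workflow is built. That matches why the stale NCO entries in run_WE2E_tests.py broke every Jenkins run.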

@chan-hoo
Collaborator Author

@MichaelLueken, I've removed the NCO section from run_WE2E_tests.py.

@MichaelLueken
Collaborator

@chan-hoo -

Tests are once again failing because of lingering opsroot references. Both .cicd/scripts/srw_ftest.sh and .cicd/scripts/srw_test.sh still reference nco_dir, and srw_test.sh passes --opsroot=${nco_dir} in its call to tests/WE2E/run_WE2E_tests.py. Please remove these final entries from the Jenkins scripts; the tests should then run correctly.
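The opsroot/nco_dir leftovers surfaced one file at a time. A single scan of every script involved can catch them in one pass. Running `grep -rniE 'opsroot|nco_dir'` over the repository does the job. The Python sketch below is an equivalent, hypothetical helper that is not part of the SRW tooling. It reports each matching line with its file and line number:

```python
import re
import sys
from pathlib import Path
from typing import Iterable, List, Tuple

# Case-insensitive, so OPSROOT_default, --opsroot and ${nco_dir} all match.
LEFTOVER_RE = re.compile(r"opsroot|nco_dir", re.IGNORECASE)

def find_leftovers(paths: Iterable[Path]) -> List[Tuple[str, int, str]]:
    """Return (file, line number, line text) for every line mentioning opsroot or nco_dir."""
    hits: List[Tuple[str, int, str]] = []
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if LEFTOVER_RE.search(line):
                    hits.append((str(path), lineno, line.rstrip()))
    return hits

if __name__ == "__main__":
    # e.g. python find_leftovers.py .cicd/scripts/srw_ftest.sh .cicd/scripts/srw_test.sh
    for fname, lineno, text in find_leftovers(Path(arg) for arg in sys.argv[1:]):
        print(f"{fname}:{lineno}: {text}")
```

An empty result for the Jenkins scripts and tests/WE2E/run_WE2E_tests.py is a reasonable signal, though not a guarantee, that no NCO-mode options remain to trip the key check.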

@chan-hoo
Collaborator Author

@MichaelLueken, thanks for testing. I've removed them.

@MichaelLueken
Collaborator

@chan-hoo -

This run is much better. The Hera Intel, Hercules, and Orion automated tests have successfully completed. Once the Gaea, Hera GNU, and Jet tests finish, I will move forward with merging this work.

@MichaelLueken
Collaborator

The WE2E coverage tests were run manually on Derecho, and all tests passed:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
custom_ESGgrid_IndianOcean_6km_20240325074105                      COMPLETE              27.27
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20  COMPLETE              42.40
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024032507411  COMPLETE              50.64
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_HRRR_20240325  COMPLETE              34.28
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE              20.14
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024032507411  COMPLETE              45.42
pregen_grid_orog_sfc_climo_20240325074118                          COMPLETE              17.29
specify_template_filenames_20240325074121                          COMPLETE              18.26
2019_hurricane_barry_20240325074122                                COMPLETE              40.58
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             296.28

@gspetro-NOAA
Collaborator

@chan-hoo @MichaelLueken Is there a plan to document the changes in this PR? I'm happy to help, if need be, but if they go in separately from this PR, we'll need to create an issue.

@MichaelLueken
Collaborator


@gspetro-NOAA -

This is a very good point. With the removal of NCO-specific variables in config_defaults.yaml, as well as the removal of the NCO WE2E tests, changes will need to be made to the documentation.

@chan-hoo -

If you would like to apply the modifications to the documentation in this PR, then please feel free to proceed. If you would like to add it in a separate PR, please open an issue for this documentation update. Thanks!

@MichaelLueken
Collaborator

The automated Jenkins WE2E coverage tests successfully passed for Gaea:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
community_20240325132026                                           COMPLETE              57.25
custom_ESGgrid_NewZealand_3km_20240325132153                       COMPLETE              62.18
grid_RRFS_CONUScompact_13km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE              44.53
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP_20240325132  COMPLETE              52.84
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_2024032513  COMPLETE              47.51
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15_thompson  COMPLETE             353.83
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024032  COMPLETE              44.54
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_20  COMPLETE             306.40
grid_SUBCONUS_Ind_3km_ics_RAP_lbcs_RAP_suite_RRFS_v1beta_plot_202  COMPLETE              28.06
2020_CAPE_20240325132402                                           COMPLETE              45.99
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            1043.13

On Hera GNU, the get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS WE2E test is failing. The error message suggests a CFL violation, but your changes shouldn't affect this test (and the rest of the v15p2 tests are passing):

FATAL from PE 3: compute_qs: saturation vapor pressure table overflow, nbad= 1

Repeated attempts to recover the run with rocotorewind and rocotoboot have also failed, so I will rerun the Hera GNU tests separately:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
custom_ESGgrid_Central_Asia_3km_20240325145609                     COMPLETE              36.09
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2019061200_202403  COMPLETE              10.37
get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS_20240325145613              DEAD                  15.54
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_2024032514  COMPLETE              43.67
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_202  COMPLETE              24.09
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240325145  COMPLETE              19.11
long_fcst_20240325145618                                           COMPLETE              66.69
MET_verification_only_vx_20240325145619                            COMPLETE               0.26
MET_ensemble_verification_only_vx_time_lag_20240325145624          COMPLETE               9.27
2019_halloween_storm_20240325145626                                COMPLETE              51.44
----------------------------------------------------------------------------------------------------
Total                                                              DEAD                 276.53

Hera Intel:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
custom_ESGgrid_Peru_12km_20240325145603                            COMPLETE              17.84
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_grib2_2019061200_2024032  COMPLETE               6.33
get_from_HPSS_ics_GDAS_lbcs_GDAS_fmt_netcdf_2022040400_ensemble_2  COMPLETE             766.67
get_from_HPSS_ics_HRRR_lbcs_RAP_20240325145608                     COMPLETE              14.08
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE               6.36
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20  COMPLETE              12.69
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_RAP_suite_RAP_20240325145612  COMPLETE              10.19
grid_RRFS_CONUS_25km_ics_GSMGFS_lbcs_GSMGFS_suite_GFS_v15p2_20240  COMPLETE               6.28
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_202403  COMPLETE             232.30
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_20240325  COMPLETE             306.28
grid_RRFS_CONUScompact_3km_ics_HRRR_lbcs_RAP_suite_HRRR_202403251  COMPLETE             327.73
pregen_grid_orog_sfc_climo_20240325145621                          COMPLETE               6.41
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            1713.16

Hercules:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
custom_GFDLgrid__GFDLgrid_USE_NUM_CELLS_IN_FILENAMES_eq_FALSE_202  COMPLETE               7.11
grid_CONUS_25km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_202  COMPLETE              10.48
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_202  COMPLETE              28.27
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              17.39
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_2024032509  COMPLETE              25.16
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP_20240325092  COMPLETE              49.77
grid_RRFS_CONUScompact_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_  COMPLETE              12.80
grid_RRFS_NA_13km_ics_FV3GFS_lbcs_FV3GFS_suite_RAP_20240325092549  COMPLETE              66.81
grid_SUBCONUS_Ind_3km_ics_NAM_lbcs_NAM_suite_GFS_v16_202403250925  COMPLETE              28.66
MET_verification_only_vx_20240325092550                            COMPLETE               0.27
specify_EXTRN_MDL_SYSBASEDIR_ICS_LBCS_20240325092552               COMPLETE               7.66
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             254.38

Jet:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
community_20240325151610                                           COMPLETE              20.24
custom_ESGgrid_20240325151613                                      COMPLETE              21.65
custom_ESGgrid_Great_Lakes_snow_8km_20240325151614                 COMPLETE              15.03
custom_GFDLgrid_20240325151616                                     COMPLETE              11.33
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2021032018_202403  COMPLETE               9.41
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_netcdf_2022060112_48h_20  COMPLETE              55.52
get_from_HPSS_ics_RAP_lbcs_RAP_20240325151620                      COMPLETE              17.04
grid_RRFS_AK_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_20240325151621  COMPLETE             223.87
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20  COMPLETE              39.64
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE               7.54
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_2024  COMPLETE             497.13
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             918.40

Orion:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
custom_ESGgrid_SF_1p1km_20240325093819                             COMPLETE             161.46
deactivate_tasks_20240325093821                                    COMPLETE               1.00
get_from_AWS_ics_GEFS_lbcs_GEFS_fmt_grib2_2022040400_ensemble_2me  COMPLETE             737.74
grid_CONUS_3km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_  COMPLETE             258.03
grid_RRFS_AK_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20240  COMPLETE             138.63
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_RRFS_v1beta_202403250  COMPLETE              14.15
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_20240325093  COMPLETE             386.87
grid_RRFS_CONUScompact_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_  COMPLETE              28.92
grid_RRFS_CONUScompact_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_2  COMPLETE             274.43
grid_SUBCONUS_Ind_3km_ics_FV3GFS_lbcs_FV3GFS_suite_WoFS_v0_202403  COMPLETE              24.76
2020_CAD_20240325093832                                            COMPLETE              30.32
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            2056.31
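In the tables checked here, the Total row's core hours equal the sum of the per-experiment values. For example, the failed Hera GNU run adds up to 276.53. The Total status appears to be DEAD as soon as any experiment is DEAD. The minimal sketch below reproduces that aggregation from the pasted text and assumes the fixed-width layout shown above. It is not the WE2E reporting code, and the fallback status label is a placeholder:

```python
from typing import List, Tuple

Row = Tuple[str, str, float]

def parse_rows(table: str) -> List[Row]:
    """Parse "name  STATUS  hours" rows from a pasted WE2E summary table."""
    rows: List[Row] = []
    for raw in table.splitlines():
        line = raw.strip()
        # Skip blank lines, dashed rules, the "|"-separated header, and the Total row.
        if not line or line.startswith("-") or "|" in line or line.startswith("Total"):
            continue
        name, status, hours = line.rsplit(None, 2)
        rows.append((name, status, float(hours)))
    return rows

def summarize(rows: List[Row]) -> Tuple[str, float]:
    """Return (overall status, total core hours) for a list of experiment rows."""
    total = round(sum(hours for _, _, hours in rows), 2)
    statuses = {status for _, status, _ in rows}
    if "DEAD" in statuses:
        overall = "DEAD"
    elif statuses == {"COMPLETE"}:
        overall = "COMPLETE"
    else:
        overall = "INCOMPLETE"  # placeholder, not a status taken from the real script
    return overall, total

# First three rows of the failed Hera GNU run quoted above.
SAMPLE = """\
----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
custom_ESGgrid_Central_Asia_3km_20240325145609                     COMPLETE              36.09
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2019061200_202403  COMPLETE              10.37
get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS_20240325145613              DEAD                  15.54
"""

print(summarize(parse_rows(SAMPLE)))  # ('DEAD', 62.0)
```

Splitting from the right with `rsplit(None, 2)` keeps long experiment names intact even when they are truncated, as they are in these tables.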

@chan-hoo
Collaborator Author

@MichaelLueken @gspetro-NOAA, I'd like to update the documentation in a separate PR. I'll open an issue for this soon.

@MichaelLueken
Collaborator

The rerun of the Hera GNU Jenkins tests successfully passed:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
custom_ESGgrid_Central_Asia_3km_20240326193107                     COMPLETE              37.96
get_from_HPSS_ics_FV3GFS_lbcs_FV3GFS_fmt_nemsio_2019061200_202403  COMPLETE              12.54
get_from_NOMADS_ics_FV3GFS_lbcs_FV3GFS_20240326193112              COMPLETE              16.81
grid_RRFS_CONUS_13km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_2024032619  COMPLETE              44.77
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_202  COMPLETE              25.80
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240326193  COMPLETE              19.46
long_fcst_20240326193117                                           COMPLETE              67.72
MET_verification_only_vx_20240326193118                            COMPLETE               0.26
MET_ensemble_verification_only_vx_time_lag_20240326193120          COMPLETE               9.31
2019_halloween_storm_20240326193122                                COMPLETE              48.52
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             283.15

and issue #1064 has been opened to update the SRW User's Guide following the significant modifications that are coming in from this PR.

Since all testing has now successfully passed and an issue has been opened to address the need for documentation updates, I will now move forward with merging this PR.

Labels
run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests
Development

Successfully merging this pull request may close these issues.

[SRW-AQM] AQM scripts need to be updated with those in production or aqm_dev branch
4 participants