[develop] Upgrade SRW to spack-stack 1.6.0 from 1.5.1 #1093

Merged
MichaelLueken merged 17 commits into ufs-community:develop from the ss160 branch on Jun 21, 2024

RatkoVasic-NOAA
Collaborator

@RatkoVasic-NOAA RatkoVasic-NOAA commented Jun 10, 2024

DESCRIPTION OF CHANGES:

As ufs-weather-model was upgraded to spack-stack 1.6.0, we are upgrading SRW as well.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

TESTS CONDUCTED:

  • hera.intel
  • orion.intel
  • hercules.intel
  • derecho.intel
  • gaea.intel
  • jet.intel
  • wcoss2.intel
  • NOAA Cloud: all three platforms (AWS, Azure, and GCP)
  • Jenkins
  • fundamental test suite
  • comprehensive tests (specify which if a subset was used)

ISSUE:

Issue #1092

CHECKLIST

  • My code follows the style guidelines in the Contributor's Guide
  • I have performed a self-review of my own code using the Code Reviewer's Guide
  • I have commented my code, particularly in hard-to-understand areas
  • My changes need updates to the documentation. I have made corresponding changes to the documentation
  • My changes do not require updates to the documentation (explain).
  • My changes generate no new warnings
  • New and existing tests pass with my changes
  • Any dependent changes have been merged and published

LABELS (optional):

A Code Manager needs to add the following labels to this PR:

  • Work In Progress
  • bug
  • enhancement
  • documentation
  • release
  • high priority
  • run_ci
  • run_we2e_fundamental_tests
  • run_we2e_comprehensive_tests
  • Needs Jet test (lfs4 is down)

CONTRIBUTORS (optional):

@natalie-perlin

@RatkoVasic-NOAA
Collaborator Author

Fundamental tests.
HERA:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE               9.87
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE               8.14
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              15.01
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024061  COMPLETE              36.00
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240610202  COMPLETE              24.35
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024061020231  COMPLETE              21.66
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             115.03

GAEA:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE              18.83
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE              13.41
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              29.85
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024061  COMPLETE              38.00
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240610224  COMPLETE              35.28
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024061022410  COMPLETE              49.54
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             184.91

ORION:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE              13.03
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE               9.71
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              18.47
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024061  COMPLETE              45.96
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240610174  COMPLETE              31.41
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024061017454  COMPLETE              24.67
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             143.25

HERCULES:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE              13.30
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE              11.07
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              32.29
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024061  COMPLETE              36.33
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240610152  COMPLETE              69.53
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024061015212  COMPLETE              42.97
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             205.49

DERECHO:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE              19.34
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE              18.90
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              32.63
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024061  COMPLETE              34.49
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240610142  COMPLETE              33.22
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024061014204  COMPLETE              48.32
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             186.90
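The per-platform totals above can be cross-checked mechanically. The following is a minimal sketch, not part of the SRW tooling: it assumes the summary layout shown in these tables (name, status, core hours per row, plus a final "Total" row), and the names `parse_summary` and `check_summary` are hypothetical. The rows are copied from the HERA table, with experiment names truncated as in the original output.

```python
import re

# Rows copied from the HERA table above (names are truncated in the source output).
HERA_SUMMARY = """\
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE               9.87
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE               8.14
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              15.01
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024061  COMPLETE              36.00
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240610202  COMPLETE              24.35
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024061020231  COMPLETE              21.66
Total                                                              COMPLETE             115.03
"""

# One data row: experiment name, status, core hours (assumed layout).
ROW = re.compile(r"^(\S+)\s+(\S+)\s+([0-9]+\.[0-9]+)\s*$")


def parse_summary(text):
    """Return (rows, reported_total); rows is a list of (name, status, core_hours)."""
    rows, total = [], None
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # skip rulers, header lines, and blank lines
        name, status, hours = match.group(1), match.group(2), float(match.group(3))
        if name == "Total":
            total = hours
        else:
            rows.append((name, status, hours))
    return rows, total


def check_summary(text, tol=0.05):
    """Return (all rows COMPLETE, summed core hours, sum matches reported total)."""
    rows, total = parse_summary(text)
    all_complete = all(status == "COMPLETE" for _, status, _ in rows)
    summed = round(sum(hours for _, _, hours in rows), 2)
    # Rows are rounded to two decimals, so allow a small tolerance.
    matches = total is not None and abs(summed - total) <= tol
    return all_complete, summed, matches


print(check_summary(HERA_SUMMARY))
```

The same function can be pointed at any of the other tables in this thread by pasting their rows into a string.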

@MichaelLueken MichaelLueken linked an issue Jun 11, 2024 that may be closed by this pull request
@MichaelLueken MichaelLueken added the enhancement New feature or request label Jun 11, 2024
@chan-hoo
Collaborator

AQM test on Hera:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
aqm_grid_AQM_NA13km_suite_GFS_v16_20240611123548                   COMPLETE            5299.42
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            5299.42

Approving.

Collaborator

@MichaelLueken MichaelLueken left a comment

@RatkoVasic-NOAA -

These changes look good to me!

The fundamental tests were run on Hera GNU (given the update from GNU 9.2.0 to 13.3.0) and all tests successfully passed:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE              12.93
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE               9.43
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              20.35
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024061  COMPLETE              47.26
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240611133  COMPLETE              32.30
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024061113312  COMPLETE              37.38
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             159.65

The AQM WE2E test was also run and passed successfully:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
aqm_grid_AQM_NA13km_suite_GFS_v16_20240611144245                   COMPLETE            5352.87
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            5352.87

Approving now.

@MichaelLueken
Collaborator

@RatkoVasic-NOAA -

Given that Jet /lfs4 is still down, Derecho being down for maintenance today, and Orion undergoing the OS migration tomorrow and Thursday, I will hold off on automated testing until spack-stack 1.6.0 is ready on Orion Rocky 9. By then, Jet should hopefully be back, as well as Derecho.

@RatkoVasic-NOAA
Collaborator Author

> @RatkoVasic-NOAA -
>
> Given that Jet /lfs4 is still down, Derecho being down for maintenance today, and Orion undergoing the OS migration tomorrow and Thursday, I will hold off on automated testing until spack-stack 1.6.0 is ready on Orion Rocky 9. By then, Jet should hopefully be back, as well as Derecho.

OK, their guess is that they will upgrade Orion by 6/12-13, which means we can start building libraries on 6/14. Since there are plenty of versions and different environments, I will start with spack-stack 1.6.0.

@MichaelLueken
Collaborator

With the return of Jet /lfs4 yesterday afternoon, the fundamental tests were run and all tests successfully passed:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE               9.22
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE               7.44
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              13.92
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024061  COMPLETE              37.29
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240613144  COMPLETE              28.96
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024061314444  COMPLETE              19.29
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             116.12

@EdwardSnyder-NOAA
Collaborator

Successfully built and ran the fundamental test suite on AWS, Azure, and GCP using spack-stack v1.6.0. I had some issues running the tests on Azure: jobs would either sit in the queue for hours without running or fail with this MPI error message: OFI get address vector map failed. However, I believe it is an issue with the PW configuration, because after shutting down the instance and restarting it, jobs started submitting and passing without that MPI error message.

GCP:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE              12.93
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE               6.50
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              12.90
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024061  COMPLETE              32.34
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240614192  COMPLETE              19.71
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024061419272  COMPLETE              19.34
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             103.72

Detailed summary written to /contrib/Edward.Snyder/ss160/expt_dirs/WE2E_summary_20240614195958.txt

AWS:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE             137.10
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE              19.22
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              70.17
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024061  COMPLETE             283.56
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240614193  COMPLETE              80.22
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024061419305  COMPLETE             152.68
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             742.95

Detailed summary written to /contrib/Edward.Snyder/ss160/expt_dirs/WE2E_summary_20240614212234.txt

Azure:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2  COMPLETE              41.10
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20240  COMPLETE              10.31
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot  COMPLETE              21.96
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024061  COMPLETE              82.61
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20240617152  COMPLETE              38.43
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024061715211  COMPLETE              42.68
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE             237.09

Detailed summary written to /contrib/Edward.Snyder/ss160/expt_dirs/WE2E_summary_20240618190711.txt

@RatkoVasic-NOAA
Collaborator Author

@MichaelLueken with the last addition of the Orion modulefiles (although they cannot be tested for some time), I think this PR is ready for final testing.

@MichaelLueken
Collaborator

Thanks, @RatkoVasic-NOAA! I'll launch Jenkins tests for this PR now.

@MichaelLueken MichaelLueken added the run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests label Jun 20, 2024
@MichaelLueken MichaelLueken changed the title Upgrade SRW to spack-stack 1.6.0 from 1.5.1 [develop] Upgrade SRW to spack-stack 1.6.0 from 1.5.1 Jun 21, 2024
@MichaelLueken
Collaborator

There are issues with Jenkins on Orion following the OS migration and software stack update. Jenkins is attempting to use /apps/git-2.28.0/bin/git to clone repositories. However, /apps/git-2.28.0/bin/git doesn't exist, leading to failure (it should be pointing to /usr/bin/git).
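A stale tool path like this can be caught with a small pre-flight check before cloning. The sketch below is only an illustration and is not how Jenkins itself is configured: the function name `resolve_git` is hypothetical, and the two paths are the ones quoted in this comment.

```python
import os
import shutil


def resolve_git(configured_path, fallback="/usr/bin/git"):
    """Return a usable git executable path, or None if none can be found."""
    for candidate in (configured_path, fallback):
        # Accept a candidate only if it is an existing, executable file.
        if candidate and os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # Last resort: the first `git` on PATH, which may be None.
    return shutil.which("git")


# Path Jenkins was attempting to use on Orion, per the comment above.
print(resolve_git("/apps/git-2.28.0/bin/git"))
```

On a host where the configured path is missing, this falls through to the fallback and then to whatever `git` is on `PATH`.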

Manual runs of the Orion coverage tests have successfully passed:

----------------------------------------------------------------------------------------------------
Experiment name                                                  | Status    | Core hours used 
----------------------------------------------------------------------------------------------------
custom_ESGgrid_SF_1p1km_20240621082732                             COMPLETE             444.16
deactivate_tasks_20240621082733                                    COMPLETE               1.18
get_from_AWS_ics_GEFS_lbcs_GEFS_fmt_grib2_2022040400_ensemble_2me  COMPLETE            1984.40
grid_CONUS_3km_GFDLgrid_ics_FV3GFS_lbcs_FV3GFS_suite_RRFS_v1beta_  COMPLETE            1029.93
grid_RRFS_AK_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_plot_20240  COMPLETE             388.84
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_RRFS_v1beta_202406210  COMPLETE              22.77
grid_RRFS_CONUS_3km_ics_FV3GFS_lbcs_FV3GFS_suite_HRRR_20240621082  COMPLETE             993.22
grid_RRFS_CONUScompact_13km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_  COMPLETE              63.58
grid_RRFS_CONUScompact_3km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v16_2  COMPLETE             763.58
grid_SUBCONUS_Ind_3km_ics_FV3GFS_lbcs_FV3GFS_suite_WoFS_v0_202406  COMPLETE              65.77
2020_CAD_20240621082749                                            COMPLETE              72.16
----------------------------------------------------------------------------------------------------
Total                                                              COMPLETE            5829.59

and the SRW Metrics test also successfully passed:

Skill Score: 0.99807
+ [[ 0.99807 < 0.700 ]]
Congrats! You pass check!
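The trace above tests the score with `<` inside bash `[[ ]]`, where that operator compares strings lexicographically rather than numerically. A minimal sketch of an explicit numeric check follows. It assumes, based only on the trace, that 0.700 is the minimum passing skill score; the function name `passes_skill_check` is hypothetical and this is not the SRW metrics script.

```python
def passes_skill_check(score_text, threshold=0.700):
    """Return True when the score, parsed as a float, is at or above the threshold."""
    score = float(score_text)  # raises ValueError on non-numeric input
    return score >= threshold


# Score reported in the log above.
print(passes_skill_check("0.99807"))

# A case where string and numeric ordering disagree:
# as strings "0.7" sorts before "0.700", but numerically they are equal.
print("0.7" < "0.700", float("0.7") < float("0.700"))
```

For the score reported here, both kinds of comparison give the same answer. They differ only in edge cases such as the one shown.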

Will now move forward with merging this PR.

@MichaelLueken MichaelLueken merged commit 94dc192 into ufs-community:develop Jun 21, 2024
3 of 5 checks passed
@RatkoVasic-NOAA RatkoVasic-NOAA deleted the ss160 branch June 21, 2024 15:40
Labels
enhancement New feature or request run_we2e_coverage_tests Run the coverage set of SRW end-to-end tests
Development

Successfully merging this pull request may close these issues.

Upgrade SRW from spack-stack 1.5.1 to 1.6.0
4 participants