WCOSS2/Acorn update DT_ATMOS to 120 for run completion using "faster" compile, also re-enable 3 hafs tests for WCOSS2. #1907

Merged: 16 commits, Oct 27, 2023

Conversation

BrianCurtis-NOAA
Collaborator

@BrianCurtis-NOAA BrianCurtis-NOAA commented Sep 15, 2023

PR Author Checklist:

  • I have linked PR's from all sub-components involved in section below.
  • I am confirming reviews are completed in ALL sub-component PR's.
  • I have run the full RT suite on either Hera/Cheyenne AND have attached the log to this PR below this line:
    • LOG: N/A
  • I have added the list of all failed regression tests to "Anticipated changes" section.
  • I have filled out all sections of the template.

Description

The test "regional_atmaq_faster" did not run to completion on WCOSS2/Acorn: late in the model run it failed with a saturation vapor pressure instability. With DT_ATMOS changed to 120, it now runs to completion and compares successfully against its baselines.

This PR also re-enables three hafs tests on WCOSS2 that were previously disabled.

Linked Issues and Pull Requests

Associated UFSWM Issue to close

Closes #1742
Closes #1896

Subcomponent Pull Requests

N/A

Blocking Dependencies

None

Subcomponents involved:

  • AQM
  • CDEPS
  • CICE
  • CMEPS
  • CMakeModules
  • FV3
  • GOCART
  • HYCOM
  • MOM6
  • NOAHMP
  • WW3
  • stochastic_physics
  • none

Anticipated Changes

Input data

  • No changes are expected to input data.
  • Changes are expected to input data:
    • New input data.
    • Updated input data.

Regression Tests:

  • No changes are expected to any regression test.
  • Changes are expected to the following tests:
    Tests affected by changes in this PR:
    • regional_atmaq_faster will be added on both Acorn and WCOSS2
    • Bring back baselines for:
      • hafs_regional_docn
      • hafs_regional_docs_oisst
      • hafs_regional_datm_cdeps

Libraries

  • Not Needed
  • Needed
    • Create separate issue in JCSDA/spack-stack asking for update to library. Include library name, library version.
    • Add issue link from JCSDA/spack-stack following this item

Code Managers Log

  • This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR.
  • Move new/updated input data on RDHPCS Hera and propagate input data changes to all supported systems.
    • N/A

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Jet
    • Gaea
    • Cheyenne
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
    • Completed
  • opnReqTest
    • N/A
    • Log attached to comment

@DeniseWorthen
Collaborator

DeniseWorthen commented Sep 15, 2023

@BrianCurtis-NOAA I'm a little confused about how it was failing previously. Did it not run to completion because it is unstable at the longer timestep (the model blew up)? Or did it run to completion but fail the baseline comparison?

@BrianCurtis-NOAA
Collaborator Author

BrianCurtis-NOAA commented Sep 15, 2023

@BrianCurtis-NOAA I'm a little confused about how it was failing previously. Did it not run to completion because it is unstable at the longer timestep (the model blew up)? Or did it run to completion but fail the baseline comparison?

It would fail at some point later in the model run time with a saturation vapor pressure issue.

Instability

@DeniseWorthen
Collaborator

@BrianCurtis-NOAA In the issue description you wrote "it is not able to run to completion and compare with its baselines". It never gets to comparing with a baseline if it fails during the run, right?

@BrianCurtis-NOAA
Collaborator Author

@BrianCurtis-NOAA In the issue description you wrote "it is not able to run to completion and compare with its baselines". It never gets to comparing with a baseline if it fails during the run, right?

Typo, should say "it is now able to run to completion and compare with its baselines"

@BrianCurtis-NOAA BrianCurtis-NOAA changed the title from 'WCOSS2/Acorn update DT_ATMOS to 120 for run completion using "faster" compile' to 'WCOSS2/Acorn update DT_ATMOS to 120 for run completion using "faster" compile, also re-enable 3 hafs tests for WCOSS2.' on Sep 18, 2023
@BrianCurtis-NOAA BrianCurtis-NOAA added the "Baseline Updates" label (Current baselines will be updated.) on Sep 18, 2023
@zach1221
Collaborator

Hey @BrianCurtis-NOAA. I think we're ready to start here, if you can resolve conflicts.

@zach1221 zach1221 added the "Ready for Commit Queue" (The PR is ready for the Commit Queue. All checkboxes in PR template have been checked.) and "jenkins-ci" (Jenkins CI: ORT build/test on docker container) labels on Oct 26, 2023
@BrianCurtis-NOAA
Collaborator Author

@zach1221 OK, done. We don't need a new bl_date because I'm just adding "new" tests.

@BrianCurtis-NOAA
Collaborator Author

Baselines copied over; on to comparisons for WCOSS2/Acorn.

@BrianCurtis-NOAA
Collaborator Author

Acorn is slow, still 138 tasks. I'll check it later.

@FernandoAndrade-NOAA
Collaborator

On Hera, regional_atmaq_faster_intel failed most comparisons:

/scratch1/NCEPDEV/stmp2/Fernando.Andrade-maldonado/FV3_RT/rt_6253/regional_atmaq_faster_intel
Checking test 183 regional_atmaq_faster_intel results ....
 Comparing sfcf000.nc ............ALT CHECK......NOT OK
 Comparing sfcf003.nc ............ALT CHECK......NOT OK
 Comparing sfcf006.nc ............ALT CHECK......NOT OK
 Comparing atmf000.nc ............ALT CHECK......NOT OK
 Comparing atmf003.nc ............ALT CHECK......NOT OK
 Comparing atmf006.nc ............ALT CHECK......NOT OK
 Comparing RESTART/20190801.180000.coupler.res .........OK
 Comparing RESTART/20190801.180000.fv_core.res.nc .........OK
 Comparing RESTART/20190801.180000.fv_core.res.tile1.nc ............ALT CHECK......NOT OK
 Comparing RESTART/20190801.180000.fv_srf_wnd.res.tile1.nc ............ALT CHECK......NOT OK
 Comparing RESTART/20190801.180000.fv_tracer.res.tile1.nc ............ALT CHECK......NOT OK
 Comparing RESTART/20190801.180000.phy_data.nc ............ALT CHECK......NOT OK
 Comparing RESTART/20190801.180000.sfc_data.nc ............ALT CHECK......NOT OK
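
A log like the one above can be summarized with a short script. The following is a minimal sketch that assumes every comparison line has the shape shown here (`Comparing <file> ....OK` or `....ALT CHECK....NOT OK`); the function name `summarize` and the regular expression are illustrative and are not part of the regression-test tooling.

```python
import re

# Matches lines such as:
#   " Comparing sfcf000.nc ............ALT CHECK......NOT OK"
#   " Comparing RESTART/20190801.180000.coupler.res .........OK"
# The pattern is an assumption based only on the log excerpt above.
LINE_RE = re.compile(
    r"^\s*Comparing\s+(\S+)\s+\.+(?:ALT CHECK\.+)?(NOT OK|OK)\s*$"
)


def summarize(log_text):
    """Return (passed, failed) lists of file names from comparison lines."""
    passed, failed = [], []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip headers such as "Checking test 183 ..."
        name, status = match.groups()
        (passed if status == "OK" else failed).append(name)
    return passed, failed


# A few lines copied from the log above, used as sample input.
sample = """\
Checking test 183 regional_atmaq_faster_intel results ....
 Comparing sfcf000.nc ............ALT CHECK......NOT OK
 Comparing RESTART/20190801.180000.coupler.res .........OK
 Comparing RESTART/20190801.180000.fv_core.res.nc .........OK
 Comparing RESTART/20190801.180000.phy_data.nc ............ALT CHECK......NOT OK
"""

passed, failed = summarize(sample)
print(f"{len(failed)} failed, {len(passed)} passed")
for name in failed:
    print("  NOT OK:", name)
```

Run against the full log, this kind of summary makes it easy to see that only the two small restart files (`coupler.res` and `fv_core.res.nc`) matched while all history and tiled restart files differed.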

@jkbk2004
Collaborator

I am OK with updating the current baseline of regional_atmaq_faster_intel with a new one across RDHPCS. @BrianCurtis-NOAA what do you think?

@epic-cicd-jenkins
Collaborator

Jenkins-ci ORTs failed

@BrianCurtis-NOAA
Collaborator Author

Yes, the test will have new baselines across the board.

@epic-cicd-jenkins
Collaborator

Jenkins-ci ORTs failed

@zach1221
Collaborator

Jenkins-ci ORTs failed

The platform team is running tests on the jenkins-ci issue, so there may be some failure notifications.

@epic-cicd-jenkins
Collaborator

Jenkins-ci ORTs failed

@jkbk2004
Collaborator

All tests are done. We can merge this PR. @FernandoAndrade-NOAA thanks for pushing the Hera log.

@jkbk2004 jkbk2004 merged commit 020e783 into ufs-community:develop on Oct 27, 2023
@epic-cicd-jenkins
Collaborator

Jenkins-ci ORTs passed

Labels
  • Baseline Updates: Current baselines will be updated.
  • jenkins-ci: Jenkins CI: ORT build/test on docker container
  • Ready for Commit Queue: The PR is ready for the Commit Queue. All checkboxes in PR template have been checked.
Projects
None yet
7 participants