
Add support for Gaea C5 (includes PR #1977: new ccpp SDFs added to support RRFS multiphysics ensemble and add tob in ocean output)(Includes PR #1997) #1784

Merged 52 commits on Dec 4, 2023

Conversation

ulmononian
Collaborator

@ulmononian ulmononian commented Jun 5, 2023

Description

Adds the capability to run the WM on Gaea c5.

A spack-stack-based module file is added, the cluster name is changed for compile jobs, and some logic in the RT scripts is added/modified.

Update: this PR no longer removes C4 support; it only adds C5 as a new machine. Some logic is shared between the two machines to keep line additions to a minimum.

Includes PR #1997 (new CCPP SDFs added to support the RRFS multiphysics ensemble, and tob added to ocean output (#2019)).

Linked issue: #1755

Hera log :
RegressionTests_hera.log

Gaea-c5 log:
RegressionTests_gaea-c5.log

Input data additions/changes

  • No changes are expected to input data.
  • Changes are expected to input data:
    • New input data.
    • Updated input data.

Anticipated changes to regression tests:

  • No changes are expected to any regression test.
  • Changes are expected to the following tests:

New baseline required for "new" machine.

Subcomponents involved:

  • AQM
  • CDEPS
  • CICE
  • CMEPS
  • CMakeModules
  • FV3
  • GOCART
  • HYCOM
  • MOM6
  • NOAHMP
  • WW3
  • stochastic_physics
  • none

Combined with PR's (If Applicable):

ulmononian#13

Commit Queue Checklist:

  • Link PR's from all sub-components involved in section below
  • Confirm reviews completed in ALL sub-component PR's
  • Add all appropriate labels to this PR.
  • Run full RT suite on either Hera/Cheyenne AND attach log to a PR comment.
  • Add list of any failed regression tests to "Anticipated changes to regression tests" section.

Linked PR's and Issues:

Closes #1783
1997 fv3atm sub-pr #721

Testing Day Checklist:

  • This PR is up-to-date with the top of all sub-component repositories except for those sub-components which are the subject of this PR.
  • Move new/updated input data on RDHPCS Hera and propagate input data changes to all supported systems.

Testing Log (for CM's):

  • RDHPCS
    • Hera
    • Orion
    • Jet
    • Gaea
    • Cheyenne
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
    • Completed
  • opnReqTest
    • N/A
    • Log attached to comment

Contributors:

@natalie-perlin

@ulmononian
Collaborator Author

ulmononian commented Jul 18, 2023

cpld_control_p8 passes w/ intel compiler version upgrade to intel 2023.1.0 (/lustre/f2/scratch/Cameron.Book/FV3_RT/rt_134365).

@ulmononian ulmononian reopened this Aug 23, 2023
@zach1221
Collaborator

zach1221 commented Aug 23, 2023

@ulmononian thanks for updating. I'll try running through the rt.conf so we can document any issues.

@ulmononian
Collaborator Author

> @ulmononian thanks for updating. I'll try running through the rt.conf so we can document any issues.

i wouldn't bother running the full set of tests yet. i set up the wm to use rocoto on c5, but the partition specification (c5) that used to work for the run job_card now results in an error. i am not sure what the partition needs to be set to now.

@zach1221
Collaborator

> > @ulmononian thanks for updating. I'll try running through the rt.conf so we can document any issues.
>
> i wouldn't bother running the full set of tests yet. i set up the wm to use rocoto on c5, but the partition specification (c5) that used to work for the run job_card now results in an error. i am not sure what the partition needs to be set to now.

I used ecflow on gaea c5 and only three tests failed.
RegressionTests_gaea_c5.txt

@ulmononian
Collaborator Author

ulmononian commented Aug 24, 2023

@zach1221 regarding the three failed tests:

regional_atmaq_debug is currently set to not run on gaea. do we also want to turn this off for gaea c5?

regional_atmaq and regional_atmaq_faster each have a clause that sets TPN in their respective test configuration files:

elif [[ $MACHINE_ID = cheyenne || $MACHINE_ID = gaea ]]; then

and

elif [[ $MACHINE_ID = gaea ]]; then

respectively.

we can modify that logic so that the same TPN setting is done for gaea c5 and perhaps that will work.
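
A minimal sketch of the kind of change being proposed, assuming the new machine is identified as `gaea-c5` (inferred from the log and module file names in this PR, not confirmed here). The `select_tpn` helper and the TPN values are hypothetical placeholders; the real test configuration files set TPN inline with their own values.

```bash
#!/usr/bin/env bash
# Hypothetical helper for illustration only: the actual test configuration
# files set TPN inline inside an if/elif chain on MACHINE_ID.
select_tpn() {
  local machine_id="$1"
  local tpn
  if [[ $machine_id = cheyenne || $machine_id = gaea || $machine_id = gaea-c5 ]]; then
    # Assumption: gaea-c5 reuses the same TPN setting as gaea.
    # 18 is a placeholder value, not the real setting.
    tpn=18
  else
    # Placeholder default for all other machines.
    tpn=24
  fi
  echo "$tpn"
}

# Show which branch each machine falls into.
for m in gaea gaea-c5 hera; do
  echo "$m -> TPN=$(select_tpn "$m")"
done
```

Adding the new machine to the existing condition, rather than writing a separate `elif` branch, keeps the C5 change to a single edited line per test file.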

@zach1221
Collaborator

> @zach1221 regarding the three failed tests:
>
> regional_atmaq_debug is currently set to not run on gaea. do we also want to turn this off for gaea c5?
>
> regional_atmaq and regional_atmaq_faster each have a clause that sets TPN in their respective test configuration files:
>
> elif [[ $MACHINE_ID = cheyenne || $MACHINE_ID = gaea ]]; then
>
> and
>
> elif [[ $MACHINE_ID = gaea ]]; then
>
> respectively.
>
> we can modify that logic so that the same TPN setting is done for gaea c5 and perhaps that will work.

Thanks @ulmononian. I hadn't looked into it yet, but maybe we can add the same config for Gaea C5 to regional_atmaq and regional_atmaq_faster as you suggest, and also add it to regional_atmaq_debug to see whether that test passes on Gaea C5 as well. I'll try it and see how it works.

@ulmononian
Collaborator Author

@zach1221 im running those now and will let you know.

@ulmononian
Collaborator Author

@zach1221 these three tests now pass with the TPN changes. i will push them shortly.

@ulmononian
Collaborator Author

@zach1221 i pushed changes to fix the three tests that were failing. here's the path on c5 if you want to check: /lustre/f2/dev/Cameron.Book/c5_1707/tests. like we discussed, i did not use -c for these, so some of the result checks fail, but the model compiles and runs to completion.

@zach1221
Collaborator

> @zach1221 i pushed changes to fix the three tests that were failing. here's the path on c5 if you want to check: /lustre/f2/dev/Cameron.Book/c5_1707/tests. like we discussed, i did not use -c for these, so some of the result checks fail, but the model compiles and runs to completion.

Thanks @ulmononian ! Yes this is fine.

@ulmononian
Collaborator Author

> @ulmononian can you put the line to load nccmp in ufs_gaea-c5.intel.lua?

done.

@zach1221
Collaborator

zach1221 commented Dec 4, 2023

Ok, testing is done and we can begin the merging process. @ulmononian can you review the open conversations in this PR, and resolve them, if possible?

@jkbk2004
Collaborator

jkbk2004 commented Dec 4, 2023

@zach1221 hold a bit. I need to push c5 log.

@zach1221
Collaborator

zach1221 commented Dec 4, 2023

> @zach1221 hold a bit. I need to push c5 log.

Sure no problem!

@jkbk2004
Collaborator

jkbk2004 commented Dec 4, 2023

@ulmononian all set now. Please resolve all conversations.

@zach1221
Collaborator

zach1221 commented Dec 4, 2023

@jkbk2004 fv3atm hash: ba6e8ea442b2d0d5992a8550db6d0c720ff338d2

@ulmononian
Collaborator Author

@jkbk2004 @zach1221 just checked, and all convos look resolved.

@DeniseWorthen
Collaborator

@natalie-perlin needs to resolve her conversation.

@jkbk2004
Collaborator

jkbk2004 commented Dec 4, 2023

@natalie-perlin this PR is ready to be merged. Please approve your request.

Collaborator

@natalie-perlin natalie-perlin left a comment


Thank you for finally pulling this through!

Labels
No Baseline Change
Ready for Commit Queue: The PR is ready for the Commit Queue. All checkboxes in PR template have been checked.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Gaea C5 support