Update module_write_netcdf to avoid hangs in RRFS runs #2193

Merged

DusanJovic-NOAA
Collaborator

@DusanJovic-NOAA DusanJovic-NOAA commented Mar 15, 2024

Commit Queue Requirements:

  • Fill out all sections of this template.
  • All sub component pull requests have been reviewed by their code managers.
  • Run the full Intel+GNU RT suite (compared to current baselines) on either Hera/Derecho/Hercules
  • Commit 'test_changes.list' from previous step

Description:

Commit Message:

* UFSWM - Update module_write_netcdf to avoid hangs in RRFS runs
  * FV3 - Update module_write_netcdf to avoid hangs in RRFS runs

Priority:

  • Normal

Git Tracking

UFSWM:

Sub component Pull Requests:

UFSWM Blocking Dependencies:

  • None

Changes

Regression Test Changes (Please commit test_changes.list):

  • No Baseline Changes.

Input data Changes:

  • None.

Library Changes/Upgrades:

  • No Updates

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • Jet
    • Gaea
    • Derecho
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
  • opnReqTest (complete task if unnecessary)

@SamuelTrahanNOAA
Collaborator

I've asked the people who run RRFS parallels to try this fix so we can confirm it works for a large number of cases.

@MatthewPyle-NOAA
Collaborator

@SamuelTrahanNOAA Is this being tested in any RRFS parallels yet? Thanks!

@SamuelTrahanNOAA
Collaborator

Ming Hu is testing it in parallels. One cycle failed so far, but it looks like an unrelated failure. (Yet another problem to debug.) That cycle was scrubbed, so we can't reproduce the new failure yet.

I'm unable to make a regression test case for this configuration. I can almost do it with the conus13km cases, but the lack of one input field for the GF scheme prevents me from having all of the fields.

@hu5970

hu5970 commented Mar 27, 2024

We ran a few days of parallel tests with this PR, and it does fix the hang problem. Please move forward with this PR.
Could you also make a PR to the production/RRFS.v1 branch?

@DusanJovic-NOAA DusanJovic-NOAA marked this pull request as ready for review March 28, 2024 14:56
@DusanJovic-NOAA
Collaborator Author

Regression tests passed on Hera (see RegressionTests_hera.log).

@DusanJovic-NOAA DusanJovic-NOAA added the No Baseline Change and Ready for Commit Queue labels Mar 28, 2024
@MatthewPyle-NOAA
Collaborator

@DusanJovic-NOAA I'll echo what @hu5970 said: we'll need this in the production/RRFS.v1 release branch as well. It is needed more critically there, as that is what we're using for the RRFS.

@DusanJovic-NOAA
Collaborator Author

@MatthewPyle-NOAA @hu5970 I'll add the changes to the production/RRFS.v1 release branch and open a PR.

@MatthewPyle-NOAA
Collaborator

Thanks @DusanJovic-NOAA!

@DusanJovic-NOAA
Collaborator Author

@MatthewPyle-NOAA @hu5970 Please check the RRFS.v1 branch PRs and run a few RRFS parallel runs to confirm that everything works as expected:

#2212

NOAA-EMC/fv3atm#810

@zach1221
Collaborator

Hi, @DusanJovic-NOAA. Can you sync up your branch, please? We're planning to test against this PR next.

@DusanJovic-NOAA
Collaborator Author

Done.

@BrianCurtis-NOAA
Collaborator

Acorn issues have been resolved, hopefully. I am running current develop branch against baselines and will start testing with Acorn on the next PR. It can be skipped for this PR.

@zach1221
Collaborator

zach1221 commented Apr 1, 2024

@DusanJovic-NOAA The fv3atm PR is merged. Please update the hash/gitmodule URL.

@zach1221 zach1221 merged commit 1411b90 into ufs-community:develop Apr 1, 2024
@DusanJovic-NOAA DusanJovic-NOAA deleted the rrfs_write_netcdf_hangs branch April 2, 2024 16:23
zhanglikate added a commit to zhanglikate/ufs-weather-model that referenced this pull request May 3, 2024
commit f234a3e
Author: Ufuk Turunçoğlu <[email protected]>
Date:   Tue Apr 30 11:35:25 2024 -0600

    Fix for land component model (ufs-community#2191)

    * UFSWM - fix fully coupled land component configuration
      * NOAHMP - get fixed information from surface file

commit 04bbc15
Author: jiandewang <[email protected]>
Date:   Thu Apr 25 14:52:00 2024 -0400

    update MOM6 to its main repo. 20240401 commit (ufs-community#2241)

    * UFSWM -
      * MOM6 - update MOM6 to its main repo. 20240401 commit (NCAR-candidate-20240319)

commit b6c576d
Author: Daniel Sarmiento <[email protected]>
Date:   Tue Apr 23 12:24:22 2024 -0400

    Merged global namelist (ufs-community#2173)

    * UFSWM - global_control.nml_IN has been added as the new regression test namelist template for all global regression tests. The namelist now uses pointers (i.e. @[abc]) for variables and default values have been added to the default_vars.sh script. A new section in default_vars.sh has been added (export_tiled) to account for tiled RTs that pulls the correct parameter files using the ATMRES variable.
    Regression tests have been modified to account for these changes. Tests that were not compatible with the GFSv17_p8 core have been disabled for now. They will be turned on as they are updated from GFSv16 to GFSv17.

commit 5d2ca19
Author: WenMeng-NOAA <[email protected]>
Date:   Fri Apr 19 13:59:12 2024 -0400

    Update upp submodule (ufs-community#2213)

    * UFSWM - Update inline post
      * FV3 - Update upp submodule for inline post

commit 47c0099
Author: Brian Curtis <[email protected]>
Date:   Wed Apr 17 15:59:48 2024 -0400

    Add bash linting to CI. Cleanup .sh scripts a bit. Address .sh bugs. Adds -v Verbose option. (ufs-community#2218)  Remove nowarn Intel compiler flag (ufs-community#2225)

    * UFSWM
    - Add bash linting to CI:
      - uses superlinter to check for consistent bash code writing
    - Cleans up .sh scripts to comply with superlinter
    - Cleans up .sh scripts to be more consistent, easier to read.
    - Adds a -v verbose option for when debugging output is needed; otherwise simplifies rt.sh run echoes.
    - Addresses smaller bugs
      - quota/timeout search logic adjusted.
      - check for dirs existing (DISKNM, STMP, PTMP) before starting.
      - adjustments/cleanup to ecflow/rocoto sections
      - rt.sh will attempt to start ecflow, and only stop ecflow if it started from rt.sh.
      - fix for issue where run_dir will not delete properly.
    * FV3: Address compiler warnings
      * atmos_cubed_sphere: Address compiler warnings.

commit 4f32a4b
Author: Rick Grubin <[email protected]>
Date:   Mon Apr 15 07:21:08 2024 -0600

    Document ATMW / ATMAERO / HAFS WM configurations (ufs-community#2160)

    * UFSWM
      * doc/Userguide
        * source
          * conf.py
          * Configurations.rst
          * FAQ.rst
          * InputsOutputs.rst
          * Introduction.rst

commit ac4445d
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date:   Mon Apr 15 08:59:42 2024 -0400

    Bump idna from 3.6 to 3.7 in /doc/UsersGuide (ufs-community#2234)

    * doc/UsersGuide
      * requirements.txt - updates idna version from 3.6 to 3.7

commit 281b32f
Author: Samuel Trahan (NOAA contractor) <[email protected]>
Date:   Mon Apr 15 08:38:01 2024 -0400

    bug fixes: kchunk3d ignored, hailwat uninitialized in dycore, tile_num wrong for nests (ufs-community#2201)

    * UFSWM - None.
      * FV3 - Write component will use kchunk3d. Model init sends the right tile number to CCPP.
        * atmos_cubed_sphere - Initialize the hailwat variable. Pass global_tile index to model.

commit 8a5f711
Author: Denise Worthen <[email protected]>
Date:   Thu Apr 11 13:32:26 2024 -0400

    Add PIO namelist control for CICE (ufs-community#2145)

    Update to CICE-Consortium/CICE aca8357. Adds implementation of namelist PIO options for CICE

commit 45c8b2a
Author: JONG KIM <[email protected]>
Date:   Thu Apr 4 19:49:13 2024 -0400

    Hotfix/cubed sphere hash fix: HAILCAST diagnostic code (units issue) (ufs-community#2223)

    cubed_sphere hash update: f060e85 for a bug fix in the HAILCAST diagnostic code (units issue)

commit 26e6db6
Author: Denise Worthen <[email protected]>
Date:   Wed Apr 3 19:57:08 2024 -0400

    Enable cpl_scalars export from ATM and NoahMP for use by CMEPS (ufs-community#2175)

      * CMEPS - allow additional dimension in cpl_scalars for CSG and regional ATM domains for use in mediator history files
      * CMEPS - fix mapping mask for lnd->atm
      * FV3 - add export of cpl_scalars
      * NOAHMP - add export of cpl_scalars

commit 1411b90
Author: Dusan Jovic <[email protected]>
Date:   Mon Apr 1 18:04:44 2024 -0400

    Update module_write_netcdf to avoid hangs in RRFS runs (ufs-community#2193)

    * UFSWM - Update module_write_netcdf to avoid hangs in RRFS runs
      * FV3 - Update module_write_netcdf to avoid hangs in RRFS runs

commit 87c27b9
Author: Matthew Masarik <[email protected]>
Date:   Fri Mar 29 15:23:42 2024 -0400

    WW3 feature:  Langmuir turbulence parameterization (ufs-community#2195)

      * WW3 - Langmuir turbulence parameterization

commit c54e986
Author: Samuel Trahan (NOAA contractor) <[email protected]>
Date:   Wed Mar 27 16:11:03 2024 -0400

    regression test system bug fixes, eliminate MOM6 warnings (ufs-community#2197), add xr_cnvcld flag to FV3 (ufs-community#2185) (ufs-community#2202)

    * UFSWM - atparse.bash: correctly handle input that doesn't end with an end-of-line character. Fix some bugs in Rocoto support and clean up rt.sh.
      * FV3 - namelist flag xr_cnvcld to control if suspended grid-mean convective cloud condensate should be included in cloud fraction and optical depth calculation in radiation in the GFS suite
        * ccpp - physics-level changes to implement new namelist variable
      * MOM6 - update MOM6 code to eliminate all compiler warnings
Labels
No Baseline Change, Ready for Commit Queue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Write component hangs in nf90_enddef with planned operational RRFS
7 participants