
develop: merge HWRF version of saSAS with GFS version #94

Merged
merged 35 commits into ufs-community:develop
Jun 5, 2020

Conversation

@climbfuji (Collaborator) commented Apr 1, 2020

Changes in this PR:

  • add a new regression test that uses the HWRF versions of saSAS deep and shallow convection
  • update how the sutils module is loaded on hera and jet
  • contains @MinsukJi-NOAA's unit testing branch
  • contains @DusanJovic-NOAA's butterfly effect branch (changes in GFDL_atmos_cubed_sphere only)

Note: The changes in the ccpp-physics PR NCAR/ccpp-physics#423 lead to different results for two existing regression tests in PROD mode: fv3_ccpp_regional_c768 and fv3_ccpp_stretched_nest. In a separate comment below, I will describe in detail my investigation that allowed me to conclude that this change is acceptable.

Associated PRs:

NCAR/ccpp-physics#423
NOAA-EMC/fv3atm#93
#94

Tested together with NOAA-EMC/GFDL_atmos_cubed_sphere#21.

For regression testing, see below.

MinsukJi-NOAA and others added 11 commits February 20, 2020 13:14
*If TEST_NAME specifies WARM_START=.T., error and exit
*Restart file names are automatically determined based on SYEAR, SMONTH, SDAY, SHOUR, FHMAX
*RESTART_INTERVAL automatically determined as one half of FHMAX
*Restart comparison files are automatically determined based on OUTPUT_GRID and OUTPUT_FILE
*MPI testing sets WRITE_GROUP=2 and WRTTASK_PER_GROUP=12
*Name of the lock directory changed from lock to lock_ut to avoid conflict with RT
…conf)

*Add more test-names to utest.bld
*Further modifications to differentiate output file names from those of RT
*RT passed
* Time taken for full unit tests is 40% of that without using ecflow
* Numerous changes in utest for ecflow as well as better handling of baseline runs
* Modify run_test.sh to handle any failure before run_test
* Modify rt.sh to use ECFLOW_SUITE instead of regtest
* Modify rt_utils.sh to use ECFLOW_SUITE instead of regtest
* In rt_utils.sh, modify ecflow_create_run_task: make the call to run_test a foreground job so that 'set -e' error handling works properly by getting the correct exit status (another way would be to capture the job id and 'wait <job id>' following the call to run_test &)
* Modify some of fv3_conf/*_run.IN files to handle errors when restart file copies fail
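The restart-related defaults described in the commit messages above (RESTART_INTERVAL as one half of FHMAX, restart file names derived from SYEAR/SMONTH/SDAY/SHOUR) can be sketched as follows. This is an illustrative reconstruction, not the actual utest code; the function name and the timestamp format are assumptions (FV3 restart files are commonly prefixed with a YYYYMMDD.HHMMSS timestamp).

```python
from datetime import datetime, timedelta

def restart_settings(syear, smonth, sday, shour, fhmax):
    # RESTART_INTERVAL is automatically set to one half of FHMAX
    restart_interval = fhmax // 2
    # The restart file timestamp follows from the start date/time
    # plus the restart interval (format assumed: YYYYMMDD.HHMMSS)
    start = datetime(syear, smonth, sday, shour)
    restart_time = start + timedelta(hours=restart_interval)
    prefix = restart_time.strftime("%Y%m%d.%H%M%S")
    return restart_interval, prefix

# Example: a 24-hour forecast starting 2020-02-20 12Z restarts at hour 12
interval, prefix = restart_settings(2020, 2, 20, 12, 24)
print(interval, prefix)  # 12 20200221.000000
```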
@climbfuji (Collaborator, Author) commented Apr 1, 2020

Changes in the regression test results for two existing regression tests in PROD mode: fv3_ccpp_regional_c768 and fv3_ccpp_stretched_nest.

I investigated in detail why the results change for these two tests because of the changes in NCAR/ccpp-physics#423.

  • both regression tests are using non-uniform block sizes
  • for fv3_ccpp_stretched_nest:
48: WARNING from PE    48: atmos_model: define_blocks_packed: domain (  35  24) is not an even divisor with definition (  32) - blocks will not be uniform with a remainder of    8
  • for fv3_ccpp_regional_c768:
  0: WARNING from PE     0: atmos_model: define_blocks_packed: domain (  87  60) is not an even divisor with definition (  32) - blocks will not be uniform with a remainder of    4

(side note: I believe this is also why we have seen issues with these tests on some platforms in the past)

  • the REPRO versions of these tests are bit-for-bit identical with the existing baselines
  • the changes are coming from samfdeepcnv.f
    • I added debug print statements in the original GFS code and the merged HWRF-GFS code (starting from output variables that were different and tracing it backwards)
    • once I added print statements around the "offending" local variables (one or more of the following: dellal, xlamud, xlamue, frh), both versions of the code gave the same results
  • for fv3_ccpp_regional_c768: I changed the default layout from 20,24 to 24,20 so that all blocks are uniform, and I did this in both the original code and the code in this PR. After making the change, the two versions gave bit-for-bit identical results, but both were different from the existing baseline (when run in PROD mode)
  • after fixing a bug in the implementation of samfshal (see commit NCAR/ccpp-physics@71eace1), which is not used by test fv3_ccpp_regional_c768, I attempted the same for fv3_ccpp_stretched_nest. I changed the blocksize from 32 to 24 for the outer domain in both the original code and this PR, and the warnings about non-uniform block sizes went away. However, the results were still different. Further investigation showed that the difference first shows up in samfdeepcnv.f in the first time step. For the first time step, I get bit-for-bit identical results between the current version in develop and the combined version in this PR if I add the following debug print statements before the block of code shown below (around line 1000 in the combined version). If I comment out the print statements (which print the minimum/maximum/checksum of the arrays passed in), then the results for xlamue and/or xlamud differ. This happens only in PROD mode; in REPRO mode, the current version in develop and the combined version in this PR always give the same results. I also ran this in debug mode and did not spot anything obvious.
c
c  final entrainment and detrainment rates as the sum of turbulent part and
c    organized one depending on the environmental relative humidity
c    (Bechtold et al., 2008; Derbyshire et al., 2011)
c
      !call print_var(-999, 0, 1, '978 xlamud', xlamud)
      !call print_var(-999, 0, 1, '978 xlamue', xlamue)
      !call print_var(-999, 0, 1, '978 fent1', fent1)
      !call print_var(-999, 0, 1, '978 fent2', fent2)
      !call print_var(-999, 0, 1, '978 km1   ', km1)
      !call print_var(-999, 0, 1, '978 im    ', im)
      !call print_var(-999, 0, 1, '978 cnvflg', cnvflg)
      !call print_var(-999, 0, 1, '978 kmax  ', kmax)
      !call print_var(-999, 0, 1, '978 frh   ', frh)
      !call print_var(-999, 0, 1, '978 cxlamd', cxlamd)
      !call print_var(-999, 0, 1, '978 cxlame', cxlame)

      if (hwrf_samfdeep) then
        do k = 2, km1
        do i=1,im
          if(cnvflg(i) .and.
     &      (k > kbcon(i) .and. k < kmax(i))) then
              tem = cxlamu * frh(i,k) * fent2(i,k)
              xlamue(i,k) = xlamue(i,k)*fent1(i,k) + tem
          endif
        enddo
        enddo
      else
        do k = 2, km1
        do i=1,im
          if(cnvflg(i) .and.
     &      (k > kbcon(i) .and. k < kmax(i))) then
              tem = cxlame * frh(i,k) * fent2(i,k)
              xlamue(i,k) = xlamue(i,k)*fent1(i,k) + tem
              tem1 = cxlamd * frh(i,k)
              xlamud(i,k) = xlamud(i,k) + tem1
          endif
        enddo
        enddo
      endif
  • In another test, I turned the logical hwrf_samfdeep into a parameter set to .false. (instead of using an argument whose value is not known at compile time). This made the differences disappear, too.

This leads me to conclude that the code changes are correct and that what we are seeing is the result of different compiler optimizations. Note that changing the blocksize or the layout changes the results of the regression tests anyway, hence new baselines are needed.
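For reference, the non-uniform block warnings quoted above follow directly from the arithmetic: define_blocks_packed splits the nx*ny points of each MPI subdomain into blocks of the requested size, and a nonzero remainder means the last block is smaller than the rest. A minimal sketch of that arithmetic (illustrative only, not the actual FMS routine):

```python
def block_remainder(nx, ny, blocksize):
    """Points left over after packing an nx*ny subdomain into fixed-size blocks.

    A nonzero result means the last block is not uniform, matching the
    define_blocks_packed warnings quoted above.
    """
    return (nx * ny) % blocksize

# The two warnings quoted above:
print(block_remainder(35, 24, 32))  # 8, as reported for fv3_ccpp_stretched_nest
print(block_remainder(87, 60, 32))  # 4, as reported for fv3_ccpp_regional_c768
```

A zero remainder (uniform blocks) is what the layout change from 20,24 to 24,20 achieved for fv3_ccpp_regional_c768, and what the blocksize change from 32 to 24 achieved for the outer domain of fv3_ccpp_stretched_nest.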

@climbfuji (Collaborator, Author) commented:

Note: these PRs were fully merged into the NCAR dtc/hwrf-physics branch; see #94 and the associated PRs listed there. The current PR and the PRs listed here are proposed for EMC develop / NCAR master separately because of the change in the regression test baseline results caused by the change of the blocksize for the two tests described above. Once #94 et al. are merged into EMC develop / NCAR master, NCAR dtc/hwrf-physics can be formally updated from these branches (all commits are already included). The remaining commits in NCAR dtc/hwrf-physics can then easily be merged back to EMC develop / NCAR master, because they do not change the answer and relate to HWRF physics only.

@climbfuji (Collaborator, Author) commented:

Update 2020/06/03: I updated all PRs with the latest changes from the authoritative repositories and ran the regression tests on hera.intel using rt.conf. Compared to the authoritative EMC baseline, the following three tests fail as expected; all others pass:

fv3_ccpp_gfdlmp_hwrfsas # new test, missing baseline
fv3_ccpp_stretched_nest # change in blocksize / MPI layout
fv3_ccpp_regional_c768 # change in blocksize / MPI layout

I also tested this code against the HWRF baseline (from branch NCAR dtc/hwrf-physics); the following tests failed (as expected):

fv3_ccpp_ca # new test, missing baseline
fv3_ccpp_regional_control # do_sat_adj = .true. was added to develop, but not yet to dtc/hwrf-physics
fv3_ccpp_regional_quilt # do_sat_adj = .true. was added to develop, but not yet to dtc/hwrf-physics
fv3_ccpp_regional_restart # do_sat_adj = .true. was added to develop, but not yet to dtc/hwrf-physics
fv3_ccpp_regional_c768 # do_sat_adj = .true. was added to develop, but not yet to dtc/hwrf-physics

rt_hera_intel_against_existing_baseline.log
rt_hera_intel_against_existing_baseline_fail_test.log

rt_hera_intel_against_existing_hwrf_baseline.log
rt_hera_intel_against_existing_hwrf_baseline_fail_test.log

@climbfuji (Collaborator, Author) commented:

Manual testing of code on macOS: code compiles with clang+gfortran and gcc+gfortran, using compile.sh and compile_cmake.sh, for CCPP and IPD.

Manual testing on jet: code compiles with Intel (18) in DEBUG and PROD mode for CCPP, using both compile.sh and compile_cmake.sh.

@climbfuji (Collaborator, Author) commented:

After creating new baselines, the regression tests passed on hera.intel, wcoss_cray, wcoss_dell_p3. Logs updated in the PR.

@climbfuji (Collaborator, Author) commented Jun 4, 2020

Regression testing on orion against existing baseline (20200512). All tests pass except fv3_ccpp_stretched_nest and fv3_ccpp_regional_c768 as expected.

rt_orion_intel_verify_against_existing_baseline.log
rt_orion_intel_verify_against_existing_baseline_fail_test.log

I then created a new baseline for date tag 20200603, all tests pass in the create step.

rt_orion_intel_create.log

Finally, I ran the regression test against the newly created baseline and all tests pass. Logs updated in the PR.

rt_orion_intel_verify.log

@DusanJovic-NOAA DusanJovic-NOAA merged commit 1150bf5 into ufs-community:develop Jun 5, 2020
climbfuji pushed a commit to climbfuji/ufs-weather-model that referenced this pull request Jul 21, 2021
…_and_joe_changes_combined_20210712

Wrapper PR for ufs-community#94 (GF aerosol updates and tunings)
LarissaReames-NOAA pushed a commit to LarissaReames-NOAA/ufs-weather-model that referenced this pull request Oct 22, 2021
epic-cicd-jenkins pushed a commit that referenced this pull request Apr 17, 2023
epic-cicd-jenkins pushed a commit that referenced this pull request Apr 17, 2023
## DESCRIPTION OF CHANGES:
* Update hashes of regional_workflow and ufs-weather-model to work with PR #[416](https://github.com/NOAA-EMC/regional_workflow/pull/416) into regional_workflow.
* Fix devbuild.sh to work with new .env files (instead of old README files).

## TESTS CONDUCTED:
See PR #[416](https://github.com/NOAA-EMC/regional_workflow/pull/416) in regional_workflow.
mkavulich pushed a commit to mkavulich/ufs-weather-model that referenced this pull request Aug 24, 2023