Test/fix CI + update docs + use ssh for cloning #1

Merged: 15 commits into wrf-cmake from wrf-cmake-ci, Aug 29, 2018

Conversation

@dmey dmey commented Aug 29, 2018

No description provided.

@dmey dmey merged commit f59a33b into wrf-cmake Aug 29, 2018
@dmey dmey deleted the wrf-cmake-ci branch August 29, 2018 14:24
letmaik pushed a commit that referenced this pull request Jan 4, 2019
TYPE: bug fix

KEYWORDS: moist, analysis update

SOURCE: Internal (JJG)

DESCRIPTION OF CHANGES: 
xatowrf now requires a cloud_cv_options setting that matches the packaging defined in registry.var
for xa%qrn, xa%qcw, xa%qci, xa%qsn, and xa%qgr.

This bugfix is connected to PR wrf-model#283 from 11 AUG 2017 (wrf-model@c7405bb#diff-fe8b020143d32583d82b945e2bd66f50).

The array bounds error this avoids occurs when mp_physics is neither 0 nor 98 and cloud_cv_options == 0.
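
For illustration, a minimal sketch of the consistency check this implies (not the actual WRFDA code; the option values used here are hypothetical examples):

```fortran
program check_cloud_cv
   implicit none
   integer :: mp_physics, cloud_cv_options

   mp_physics       = 8  ! hypothetical: any active microphysics scheme
   cloud_cv_options = 0  ! no cloud control variables requested

   ! registry.var only packages xa%qcw, xa%qrn, xa%qci, xa%qsn, and xa%qgr
   ! with their full extent when cloud_cv_options selects the matching
   ! package; with cloud_cv_options == 0 the arrays keep a dummy upper
   ! bound of 1, so writing hydrometeor increments into them runs out of
   ! bounds, as in the traceback quoted below.
   if (mp_physics /= 0 .and. mp_physics /= 98 .and. cloud_cv_options == 0) then
      print *, 'ERROR: this mp_physics setting requires cloud_cv_options > 0'
      stop 1
   end if
end program check_cloud_cv
```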

LIST OF MODIFIED FILES: 
M       var/da/da_transfer_model/da_transfer_xatowrf.inc

TESTS CONDUCTED: 
The following error at the end of rsl.error.0000 is avoided by this fix when a "debug" build of WRFDA is used:

```
forrtl: severe (408): fort: (2): Subscript #1 of the array QCW has value 2 which is greater than the upper bound of 1

Image              PC                Routine            Line        Source
da_wrfvar.exe      00000000060CF996  Unknown               Unknown  Unknown
da_wrfvar.exe      00000000017EDB3B  da_transfer_model        2224  da_transfer_model.f
da_wrfvar.exe      00000000018A4263  da_transfer_model        3399  da_transfer_model.f
da_wrfvar.exe      00000000004C5D92  da_wrfvar_top_mp_        3675  da_wrfvar_top.f
da_wrfvar.exe      00000000004B0699  da_wrfvar_top_mp_        2779  da_wrfvar_top.f
da_wrfvar.exe      00000000004B0559  da_wrfvar_top_mp_        2749  da_wrfvar_top.f
da_wrfvar.exe      0000000000459863  MAIN__                     34  da_wrfvar_main.f
da_wrfvar.exe      0000000000405C1E  Unknown               Unknown  Unknown
libc-2.19.so       00002AAAAB7E5B25  __libc_start_main     Unknown  Unknown
da_wrfvar.exe      0000000000405B29  Unknown               Unknown  Unknown
```


The WRFDA regression test was not run; the changes are minor and fix a known bug.
letmaik pushed a commit that referenced this pull request Jan 4, 2019
TYPE: bug fix

KEYWORDS: obs nudging, max number of tasks

SOURCE: internal

DESCRIPTION OF CHANGES:
Problem:
The max number of processors, 1024, is hard-coded in module_dm.F for observation nudging.
If a user requests more MPI tasks than this maximum, the run ends in a segmentation fault.

Solution:
In the routine where these variables were dimensioned to the assumed maximum number of MPI
tasks, the two arrays are now declared ALLOCATABLE and allocated based on
the total number of MPI ranks (see the sketch below).
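
A minimal standalone sketch of the before/after pattern (idisplacement is named in the error message quoted below; icount is a hypothetical stand-in for the second array, whose name is not given here):

```fortran
program obs_nudge_alloc
   use mpi
   implicit none
   integer :: ierr, ntasks
   ! Before the fix the work arrays were fixed-size, e.g.
   !   integer :: idisplacement(1024)
   ! which overflows as soon as more than 1024 ranks are used.
   integer, allocatable :: idisplacement(:), icount(:)

   call MPI_Init(ierr)
   call MPI_Comm_size(MPI_COMM_WORLD, ntasks, ierr)

   ! After the fix: size the arrays from the actual communicator size.
   allocate(idisplacement(ntasks), icount(ntasks))

   ! ... gather per-rank observation counts/offsets, as module_dm.F does ...

   deallocate(idisplacement, icount)
   call MPI_Finalize(ierr)
end program obs_nudge_alloc
```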

LIST OF MODIFIED FILES:
M external/RSL_LITE/module_dm.F

TESTS CONDUCTED:

1. Applied the new code to a user's case, which shows the code works as expected.
2. No bit-wise diffs with a smaller test case, before vs. after the modifications: the code was built with the ./configure -d option and run with 1 processor and with 36 processors, respectively, with OBS nudging turned on. Both runs cover a 3-hour period. Results are identical.
3. Test case with > 1024 MPI tasks: a large case (derived from a user's case) was also tested, with the code built with the ./configure -D option. Without the change, the case crashed immediately with the error message:
```
OBS NUDGING is requested on a total of  2 domain(s).
++++++CALL ERROB AT KTAU =     0 AND INEST =  1:  NSTA =     0 ++++++
At line 5741 of file module_dm.f90
Fortran runtime error: Index '1025' of dimension 1 of array 'idisplacement' above upper bound of 1024
Error termination. Backtrace:
#0  0x782093 in __module_dm_MOD_get_full_obs_vector
	at /glade/scratch/chenming/WRFHELP/WRFV3.9.1.1_intel_dmpar_large-file/frame/module_dm.f90:5741
#1  0xffffffffffffffff in ???
```
With the code change, the case runs successfully for 6 hours.

RELEASE NOTE: After removing a hard-coded limit on the assumed maximum number of MPI tasks, the observation nudging code in WRF now supports more than 1024 MPI tasks. Previous runs with 1024 or fewer MPI tasks are unaffected; runs that requested more than 1024 tasks likely died with a segmentation fault while trying to access an array index beyond the declared bound.
letmaik pushed a commit that referenced this pull request Mar 8, 2020
… data (wrf-model#875)

TYPE: bug fix

KEYWORDS: LBC, valid time

SOURCE: identified by Michael Duda (NCAR/MMM), fixed internally

DESCRIPTION OF CHANGES:
Problem:
1. If a user tried to start a simulation _after_ the last LBC valid period, the
WRF model would get into a nearly infinite loop and print out repeated statements:
```
 THIS TIME 2000-01-24_18:00:00, NEXT TIME 2000-01-25_00:00:00
d01 2000-01-25_06:00:00  Input data is acceptable to use: wrfbdy_d01
           2  input_wrf: wrf_get_next_time current_date: 2000-01-24_18:00:00 Status =           -4
d01 2000-01-25_06:00:00  ---- ERROR: Ran out of valid boundary conditions in file wrfbdy_d01
```
2. If a user tried to extend the model simulation beyond the valid times of the LBC, the code
behavior was not controlled (nearly infinite loops on some machines, or runtime errors with a backtrace
on other machines).

Solution:
The lateral boundary conditions are read up to the correct time in another
routine, so once inside share/input_wrf.F we should already be at the
correct time and there is no need to try to get to the next time. In the
failing cases above, the attempt to get to the next time fails, but the
code tries again (and again and again). Removing that retry fixes both
problems identified above.
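
A minimal standalone sketch of the new control flow (schematic only; in WRF the actual calls are wrf_get_next_time and the fatal-error handler, as the logs below show):

```fortran
program lbc_eof_check
   implicit none
   ! Hypothetical named constant; matches the "Status = -4" in the logs below.
   integer, parameter :: WRF_NO_MORE_TIMES = -4
   integer :: status

   ! Stand-in for wrf_get_next_time on a wrfbdy_d01 file with no further
   ! valid times; the caller has already positioned the file at the
   ! correct time, so there is nothing left to advance to.
   status = WRF_NO_MORE_TIMES

   if (status /= 0) then
      ! Old behaviour: loop back and retry, giving the near-infinite loop
      ! or machine-dependent runtime error shown below.
      ! New behaviour: fail immediately with a clear message.
      print *, '---- ERROR: Ran out of valid boundary conditions in file wrfbdy_d01'
      stop 1
   end if
end program lbc_eof_check
```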

ISSUE:
Fixes wrf-model#769 "WRF doesn't halt when beginning LBC time is not in wrfbdy_d01 file"

LIST OF MODIFIED FILES:
M share/input_wrf.F

TESTS CONDUCTED:
1. Without fix, start the model after the last valid time of the LBC file => lots of repeated messages
```
 THIS TIME 2000-01-24_18:00:00, NEXT TIME 2000-01-25_00:00:00
d01 2000-01-25_06:00:00  Input data is acceptable to use: wrfbdy_d01
           2  input_wrf: wrf_get_next_time current_date: 2000-01-24_18:00:00 Status =           -4
d01 2000-01-25_06:00:00  ---- ERROR: Ran out of valid boundary conditions in file wrfbdy_d01
```
2. With this fix, when the LBC file stops at 2000-01-25_00:00:00 and WRF starts at 2000-01-25_06:00:00
```
d01 2000-01-25_06:00:00  Input data is acceptable to use: wrfbdy_d01
 THIS TIME 2000-01-24_12:00:00, NEXT TIME 2000-01-24_18:00:00
d01 2000-01-25_06:00:00  Input data is acceptable to use: wrfbdy_d01
 THIS TIME 2000-01-24_18:00:00, NEXT TIME 2000-01-25_00:00:00
d01 2000-01-25_06:00:00  Input data is acceptable to use: wrfbdy_d01
           2  input_wrf: wrf_get_next_time current_date: 2000-01-24_18:00:00 Status =           -4
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE:  <stdin>  LINE:    1134
 ---- ERROR: Ran out of valid boundary conditions in file wrfbdy_d01
-------------------------------------------
```
3. Without this fix, if we try to extend the model simulation beyond the valid lateral boundary times
```
Timing for main: time 2000-01-24_23:54:00 on domain   1:    0.53782 elapsed seconds
Timing for main: time 2000-01-24_23:57:00 on domain   1:    0.51111 elapsed seconds
Timing for main: time 2000-01-25_00:00:00 on domain   1:    0.54507 elapsed seconds
Timing for Writing wrfout_d01_2000-01-25_00:00:00 for domain        1:    0.03793 elapsed seconds
d01 2000-01-25_00:00:00  Input data is acceptable to use: wrfbdy_d01
           2  input_wrf: wrf_get_next_time current_date: 2000-01-25_00:00:00 Status =           -4
d01 2000-01-25_00:00:00  ---- ERROR: Ran out of valid boundary conditions in file wrfbdy_d01
At line 777 of file module_date_time.f90
Fortran runtime error: Bad value during integer read

Error termination. Backtrace:
#0  0x10e67c36c
#1  0x10e67d075
#2  0x10e67d7e9
```
4. With this fix, if we try to extend the model simulation beyond the valid lateral boundary times
```
Timing for main: time 2000-01-24_23:54:00 on domain   1:    0.60755 elapsed seconds
Timing for main: time 2000-01-24_23:57:00 on domain   1:    0.57641 elapsed seconds
Timing for main: time 2000-01-25_00:00:00 on domain   1:    0.60817 elapsed seconds
Timing for Writing wrfout_d01_2000-01-25_00:00:00 for domain        1:    0.04499 elapsed seconds
d01 2000-01-25_00:00:00  Input data is acceptable to use: wrfbdy_d01
           2  input_wrf: wrf_get_next_time current_date: 2000-01-25_00:00:00 Status =           -4
-------------- FATAL CALLED ---------------
FATAL CALLED FROM FILE:  <stdin>  LINE:    1134
 ---- ERROR: Ran out of valid boundary conditions in file wrfbdy_d01
-------------------------------------------
```

MMM Classroom regtest; em_real, nmm, em_chem; GNU only