-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add NLDAS-2 netCDF-4 LIS metforcing reader #1578
Add NLDAS-2 netCDF-4 LIS metforcing reader #1578
Conversation
This pull request adds the NLDAS-2 v020 netCDF-4 LIS metforcing reader into the LIS code. As the v002 GRIB-1 NLDAS-2 data and the v020 netCDF-4 NLDAS-2 data are nearly identical (within round-off error) - the two metforcing readers in LIS also produce nearly identical results. Note that as part of this pull request, the "Met forcing sources:" option for the NLDAS-2 v002 GRIB-1 data has changed from "NLDAS2" to "NLDAS2 grib". To use the new v020 netCDF-4 metforcing reader, please use "NLDAS2 netcdf" for "Met forcing sources:". Resolves: NASA-LIS#1577
Testcase directory forthcoming. v020 netCDF-4 FORA data: https://doi.org/10.5067/THUF4J1RLSYG v020 netCDF-4 FORB data: https://doi.org/10.5067/96S0R3LFOBTU |
Testcase available here: /discover/nobackup/projects/lis/tools/pull_requests/1578 |
I am reviewing it. |
Hello David. Would you please rework the "REVISION HISTORY" of the new files? For example, in timeinterp_nldas20.F90, from
to
or to
Thanks. |
Will do
Will do. |
These are not strictly speaking required, but it removes warning messages from ifort when strict checking is turned on.
This is to address warning messages from ifort when run with strict checks.
Hi @dmocko: I'm trying to run your code, but I'm confused by the multiple config files. (What is FORB?) Also, I notice the lis.config.nldas.netcdf* files point to /css/nldas/NLDAS2. Do you have a batch script with the SLURM settings to access that directory from the compute nodes? |
@dmocko Disregard the question about FORB, you answered in in the README file. (Note to all: RTFM.) |
@dmocko You can ignore my other question about /css access. LIS can see it just fine on the Milan nodes. (I was probably thinking about a special node to write to /css, but that is not needed for these test cases.) Notable finding: LIS will typically crash when launched with mpirun with a MPI error: Abort(1090063) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Unknown error class, error stack: But I don't see that when I use srun. Not sure what the problem is (the domain is 468x228). |
I get bit-wise identical results for the netCDF target output, both w/ and w/o FORB secondary forcing. |
Identical results are produced.
No changes in results.
Hi @emkemp I was traveling yesterday, and wasn't able to respond to your questions. I added a SLURM script to the pull request directory: It does include this line: #SBATCH --constraint="mil&cssro" https://www.nccs.nasa.gov/nccs-users/instructional/using-discover/file-system-storage/discover-css Based on the link above, the "cssro" constraint is listed as being needed, but it's possible by now that by default, all Milan SCU17 nodes have CSS read-only access by default. I had run this job initially without adding the "cssro" constraint. The "cssrw" constraint is only available for "datamove" nodes. Are you still getting that crash with mpirun? Can you try the above SLURM script as well? |
Minor point, but the NLDAS-2 domain is 464 by 224. |
What is the reason for changing the indentation in the code in commit "0993d3c" and the follow-on commit "303dfe1"? Is it just personal preference, or do these changes add something to the code? If it's just personal preference - I would prefer that these commits are backed-out, as I wrote this code, and will maintain it going forward. |
Hi @dmocko I admit "303dfe1" was probably me being too opinionated. I have no objection to reverting that commit (though I'll need Jim's help on doing that for both GitHub and my local repo on Discover). But I'm digging in on "0993d3c". Fixed format wastes space, and interferes with seeing how code is nested within the larger file. This is particularly true in nldas20_forcingMod.F90, where most of the code inside each subroutine has the same indentation as the subroutine declaration, which in turn have the same indentation as the parent module declaration. Also, member variables in the nldas20_type_dec type have the same indentation as the type declaration (this specific code confused me when I first looked at it). Fixed format dates back to the punch card era, and it's a compromise that is no longer needed. Certainly not with brand new code written in-house. |
@emkemp Nope - if you insist on "0993d3c", then I also like "303dfe1" as well. So keep them both. |
[ERR] usually means LIS cannot continue, so messages for less serious events have been changed to [WARN]. Also changed "Error:" to "[ERR]" for consistency.
Would you please give me the ldt.config file that you used to make your lis_input.nldas.noahmp401.nc file? |
Also placed in: /discover/nobackup/projects/lis/tools/pull_requests/1578 In the same directory, I also placed a "data" directory with a required mask file. Please compile LDT big_endian for the mask. |
Hi @emkemp - Are you still getting this error when using mpirun? Have you tried the nldas.job SLURM script I shared? |
Hi @dmocko It was still crashing with mpirun. I didn't try your job script (no modules were defined), but I did tweak the SBATCH settings. |
I tried your nldas.job (tweaking the lis.config filename, and commenting out the call to dmm_get_sbus.bash), and it again crashes with mpirun. Abort(1090063) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init_thread: Un |
@dmocko Thanks for the LDT config file. I am unable to reproduce your domain and parameter file. Do you still have the LDT log files? |
@emkemp I was able to run this pull request last week. I will pull the latest commits and try again. |
@emkemp I compiled and ran using these modules:
I also copied my configure.lis file into the pull_requests/1578 directory. |
Amazingly, I do! I also put them in: /discover/nobackup/projects/lis/tools/pull_requests/1578 Obviously, my lis_input and ldt.config files are 5+ years old. I will try to generate the lis_input file again, |
Actually, I just ran the ldt.config file again on Milan, and got the same exact ldtlog file. And the lis_input file is identical, other than some points in the "HYMAP_grid_elevation" field. But since I'm not running HYMAP for this testcase, it's not an issue. Do you see other fields that are different? I put the new lis_input file here: /discover/nobackup/projects/lis/tools/pull_requests/1578/newldt |
I just pulled the latest commits, and was able to run without any issue using "mpirun" and my SLURM script. |
Huh. When I recompile from scratch with my older SLES15 environment, nldas.job runs w/o issue. The module list is:
I'll experiment more with my newer SLES15 environment. |
Huh part 2. I recompiled in my new SLES15 environment:
And nldas.job ran successfully 3 times. Maybe I was getting a bad node a few days ago? |
...aaand I get a crash on test 4. But, the 3 successful tests with my new environment executable were launched (sbatched) in my old environment. This latest test was launched in the new environment. I wonder if there is a problem introduced by the ImageMagick/7.1.1-15 module. I will investigate. |
That is what I saw as well. Thanks. |
Grrr. Removing ImageMagick/7.1.1-15 doesn't solve the problem. Worse, the most recent failure doesn't have any MPI error message, it just stops with an error code. (I changed the execution line to "mpirun -np 252 ./LIS -f lis.config || exit 1".) There is no output or lislog, the SLURM sjobexitmod command clearly indicates a runtime failure. [emkemp@discover32 /discover/nobackup/projects/usaf_lis/emkemp/pr/1578/LISF/lis]$ sjobexitmod -l 36405759 36405759 s1189 2 borgl[103,105] FAILED 1:0 0:0 |
@emkemp - But is this a reason to not accept this pull request? It runs with a standard environment. We need this PR merged into master so it's available as soon as possible for operational users. Sometime in August, the NLDAS-2 GRIB-1 stream will be turned off at the GES DISC, and users need to transition to the netCDF-4 stream. |
@dmocko Jim and I are satisfied. Pull request APPROVED. |
This pull request adds the NLDAS-2 v020 netCDF-4 LIS metforcing reader into the LIS code.
As the v002 GRIB-1 NLDAS-2 data and the v020 netCDF-4 NLDAS-2 data are nearly identical (within round-off error) - the two metforcing readers in LIS also produce nearly identical results.
Note that as part of this pull request, the "Met forcing sources:" option for the NLDAS-2 v002 GRIB-1 data has changed from "NLDAS2" to "NLDAS2 grib". To use the new v020 netCDF-4 metforcing reader, please use "NLDAS2 netcdf" for "Met forcing sources:".
Resolves: #1577