
Fix DA EnVar unnecessary allocation to reduce memory requirement #899

Merged
merged 1 commit into wrf-model:release-v4.1.1 from the fix_EnVar_memory branch on May 15, 2019

Conversation

@jamiebresch (Contributor) commented May 7, 2019

TYPE: enhancement

KEYWORDS: WRFDA, EnVar, ep

SOURCE: Jamie Bresch (NCAR)

DESCRIPTION OF CHANGES:

Avoid the unused ep allocation in alloc_and_configure_domain for EnVar applications.
ep is instead allocated with the desired dimensions in subroutine da_solve_init
(var/da/da_main/da_solve_init.inc) or subroutine reallocate_analysis_grid
(var/da/da_main/da_solve_dual_res_init.inc).
The 4th dimension of ep should be ensdim_alpha*num_fgat_time rather than just ensdim_alpha
as defined in the registry.
For dual-resolution EnVar, ep is re-allocated with the coarse intermediate grid dimensions.
Depending on domain and ensemble sizes, the memory reduction from this fix can
allow a job to run on regular nodes instead of large-memory nodes on NCAR's Cheyenne.
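
As a rough, self-contained illustration of the sizing point above (placeholder names and bounds, not the literal WRFDA code):

```fortran
program ep_sizing_sketch
  ! Minimal sketch with placeholder names/bounds -- not the literal WRFDA code.
  ! The point: ep needs a 4th (ensemble) dimension of ensdim_alpha*num_fgat_time,
  ! which is what da_solve_init allocates; the registry-driven allocation sized it
  ! with ensdim_alpha only, and that copy was never used.
  implicit none
  integer, parameter :: ims=1, ime=100, jms=1, jme=100, kms=1, kme=50   ! memory bounds (placeholders)
  integer, parameter :: ensdim_alpha = 32, num_fgat_time = 3            ! placeholders
  real, pointer :: ep_field(:,:,:,:)

  allocate( ep_field(ims:ime, jms:jme, kms:kme, 1:ensdim_alpha*num_fgat_time) )
  print *, 'ep elements allocated:', size(ep_field)
end program ep_sizing_sketch
```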

LIST OF MODIFIED FILES:

modified: Registry/registry.var
modified: var/da/da_main/da_wrfvar_init2.inc

TESTS CONDUCTED:

A dual-resolution EnVar case, before the fix:
alloc_space_field: domain 1 , 2088567208 bytes allocated
alloc_space_field: domain 2 , 47306336 bytes allocated
alloc_space_field: domain 2 , 2601652648 bytes allocated

A dual-resolution EnVar case, after the fix:
alloc_space_field: domain 1 , 1281764008 bytes allocated
alloc_space_field: domain 2 , 47306336 bytes allocated
alloc_space_field: domain 2 , 1594559848 bytes allocated

An EnVar case, on proc 0, before the fix:
alloc_space_field: domain 1 , 1705984056 bytes allocated

An EnVar case, on proc 0, after the fix:
alloc_space_field: domain 1 , 689084856 bytes allocated

RELEASE NOTE: Fix DA EnVar unnecessary allocation to reduce memory requirement.

@davegill (Contributor) commented May 8, 2019

@jamiebresch
Jamie,
I do not see how alloc_ep ever gets set to 1.

@jamiebresch (Contributor, Author)

> I do not see how alloc_ep ever gets set to 1.

That is the intention.
ep is allocated either in da_solve_init.inc or da_solve_dual_res_init.inc.

@davegill (Contributor) commented May 8, 2019

@jamiebresch
Jamie,
The alloc_ep variable is new in the Registry, and it is set as derived. The derived part means that this namelist variable has to be explicitly set in the code. Since this is a new variable in the Registry, there needs to be new code to initialize it. I see where alloc_ep can get set to zero, but should there also be new source code to set it to one?
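
(For readers unfamiliar with the Registry: a derived rconfig variable is never read from namelist.input; the code has to assign it, typically through the Registry-generated nl_set_*/nl_get_* accessors or model_config_rec. The snippet below is a conceptual sketch only, with placeholder logic; it is not the code added by this PR.)

```fortran
program derived_rconfig_sketch
  ! Conceptual sketch only (placeholder names and logic): a "derived" rconfig variable is
  ! not read from the namelist; the code assigns it before it is used.  In WRF/WRFDA the
  ! real mechanism is the Registry-generated nl_set_*/nl_get_* accessors and model_config_rec.
  implicit none
  integer :: alloc_ep

  alloc_ep = 0          ! set in code (never from namelist.input); 0 = skip ep allocation
  if ( alloc_ep == 1 ) then
     print *, 'alloc_and_configure_domain would allocate ep here'
  else
     print *, 'ep allocation deferred to da_solve_init / reallocate_analysis_grid'
  end if
end program derived_rconfig_sketch
```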

@jamiebresch (Contributor, Author)

The intention is to disable the allocation of ep in alloc_and_configure_domain.

@liujake (Contributor) commented May 8, 2019

@davegill This is not a bug fix. Should it go into develop instead of 4.1.1?

@davegill (Contributor) commented May 8, 2019

> This is not a bug fix. Should it go into develop instead of 4.1.1?

@liujake @jamiebresch
Jake,
Good question, we will probably discuss this off-line.

@davegill (Contributor) commented May 8, 2019

@jamiebresch
Jamie,
Obviously there is something that I am missing. Is there another mechanism where the ep variables are allocated outside of the context of this PR?

@jamiebresch (Contributor, Author) commented May 8, 2019

ep is allocated with the desired dimensions in the existing subroutine da_solve_init (for EnVar) or subroutine reallocate_analysis_grid (for dual-resolution EnVar). The 4th dimension is ensdim_alpha*num_fgat_time rather than just ensdim_alpha as defined in the registry. For dual-resolution EnVar, ep is re-allocated with the coarse intermediate grid dimensions. The ep allocation in alloc_and_configure_domain is therefore not needed; this PR removes that unnecessary allocation.
To me, inadequate use of memory is a bug that deserves a fix as soon as possible, so that my jobs consume fewer core hours and spend less time waiting in the queue.

@jjguerrette (Contributor)

@jamiebresch for the pure EnVar case, the memory for grid%ep still gets allocated later, right? Otherwise we would not be able to use grid%ep. What is the total memory used in the PBS job with and without this change? You can find this in the job summary e-mail if you are using Cheyenne.

@jjguerrette (Contributor) commented May 8, 2019

Another question I have is whether da_solve_init needs to allocate the grid%ep variables on memory grid bounds, or whether it can use patch/tile bounds. Do we ever perform communication of the ep variables? Same question for alpha.
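
(As background for the bounds question, here is a schematic, self-contained sketch of the WRF index-bound hierarchy with placeholder sizes; it is not code from this PR.)

```fortran
program wrf_bounds_sketch
  ! Schematic illustration (placeholder sizes) of the WRF index-bound hierarchy:
  !   domain (ids:ide) >= memory (ims:ime = patch + halo) >= patch (ips:ipe) >= tile (its:ite).
  ! Registry-generated allocations use memory bounds; allocating ep/alpha on patch or tile
  ! bounds would drop the halo storage, which is only safe if those fields are never
  ! halo-exchanged -- the communication question raised above.
  implicit none
  integer, parameter :: ids=1, ide=400                ! full domain (placeholder)
  integer, parameter :: ips=101, ipe=200              ! this task's patch (placeholder)
  integer, parameter :: halo=3
  integer, parameter :: ims=ips-halo, ime=ipe+halo    ! memory = patch + halo

  print *, 'domain cells:', ide-ids+1, ' patch cells:', ipe-ips+1, &
           ' memory cells (patch+halo):', ime-ims+1
end program wrf_bounds_sketch
```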

@jjguerrette (Contributor)

Perhaps to Dave's point, there is more information initialized in alloc_and_configure_domain than simply the allocation of the data arrays. Do we ever need any of that other info?

@davegill (Contributor)

@jamiebresch
Jamie,
I am OK with this PR.

@jjguerrette (Contributor)

While these changes take care of some memory management for EnVar simulations, I think the savings reported for single-resolution EnVar are very misleading. The memory-usage message in the log file does not include hard-coded allocations that happen after alloc_and_configure_domain (i.e., within da_solve_init). That is why I asked about the total simulation memory footprint. If those more accurate numbers are included, I am happy to move forward.

I do not know enough about the dual resolution setup to comment on that.

@liujake (Contributor) commented May 13, 2019

I am also not familiar with this part of the code, but I discussed it with Craig, who developed the dual-resolution hybrid capability, and he thinks it is OK. So I am OK with JJ merging it, although it would be better to explain more clearly in the PR message what was wrong, for those not familiar with this part of the code.

@jamiebresch (Contributor, Author)

For single-resolution EnVar: in var/da/da_main/da_wrfvar_init2.inc,
head_grid is allocated by alloc_and_configure_domain, then input_grid points to head_grid.
It is input_grid that gets passed to var/da/da_main/da_solve.inc and da_solve_init.inc.
In var/da/da_main/da_solve_init.inc, grid%ep is not deallocated first, yet the allocation of grid%ep does not complain.
It seems to me that there is a redundant allocation of ep in head_grid/input_grid.
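
A minimal, self-contained sketch of why that second allocation passes silently; names and bounds are placeholders, and it assumes the field is a Fortran POINTER array, which is consistent with the behaviour described above:

```fortran
program ep_realloc_sketch
  ! Sketch with placeholder names; assumes ep-like fields are POINTER arrays, consistent
  ! with "grid%ep is not deallocated first, but the allocation does not complain".
  implicit none
  integer, parameter :: ims=1, ime=100, jms=1, jme=100, kms=1, kme=50
  integer, parameter :: ensdim_alpha = 32, num_fgat_time = 3
  real, pointer :: ep_field(:,:,:,:)

  ! 1) registry-driven allocation in alloc_and_configure_domain (the one this PR removes)
  allocate( ep_field(ims:ime, jms:jme, kms:kme, 1:ensdim_alpha) )

  ! 2) allocation in da_solve_init with the desired dimensions; for a POINTER this succeeds
  !    without error and simply re-points ep_field, orphaning the first target -- memory
  !    that is never used and never freed.
  allocate( ep_field(ims:ime, jms:jme, kms:kme, 1:ensdim_alpha*num_fgat_time) )

  print *, 'live ep elements:', size(ep_field)
end program ep_realloc_sketch
```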

This fix removes the unnecessary allocation as stated in the PR title and message.
With the fix, I am able to run experiments (large domain and ensemble sizes) using more reasonable computing resources.

A small single-resolution EnVar case running with 6 nodes x 24 procs:
Without the fix:
Used memory in task 0: 1505.50MiB (+1.82MiB overhead).
min: 1466.70MiB max: 1566.27MiB among all processors.
resources_used.mem=224709480kb

With the fix:
Used memory in task 0: 1220.16MiB (+1.82MiB overhead).
min: 1181.18MiB max: 1253.49MiB among all processors.
resources_used.mem=179608248kb

@jjguerrette (Contributor) left a review comment

Thank you for those updated memory benchmarks @jamiebresch. So this fix achieves roughly a 20% improvement for single-resolution EnVar (total job memory, resources_used.mem, drops from 224709480 kb to 179608248 kb) and avoids the re-allocation of the pointers in the ep variables. That's very nice.

@jjguerrette (Contributor)

@liujake, I will merge by end of day unless you object.

@jjguerrette (Contributor)

After rereading your message @liujake, I will merge now.

@jjguerrette merged commit 1a2af9b into wrf-model:release-v4.1.1 on May 15, 2019
@jamiebresch deleted the fix_EnVar_memory branch on May 15, 2019 15:33