Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Breakout deallocation calls into simpler smaller files #2070

Open
wants to merge 18 commits into
base: release-v4.6.1
Choose a base branch
from

Conversation

islas
Copy link
Collaborator

@islas islas commented Jun 14, 2024

TYPE: enhancement

KEYWORDS: intel, compilation, llvm, memory

SOURCE: internal

DESCRIPTION OF CHANGES:
Problem:
The Intel oneAPI compilers (and others like nvhpc) struggle with some of the larger (15k+ lines of code) files within WRF. This causes intense memory usage that is not often available to the average user not in a resource-rich environment. This often limits compilation to single threaded if even possible or to a dedicated environment with enough memory if available. If neither of those is available to a user, they will be unable to use these configurations entirely.

Solution:
This PR focuses on the deallocs.inc sections of code used in module_domain to reduce the include size to manageable levels. The include is instead broken out into many smaller files as external subroutines. The files are fully generated source code from the registry, with the calls to the subroutines also being generated as well. This also makes it relatively easy to change the number of files generated from a source code perspective. Build rules would need to be modified accordingly as seen in these changes.

TESTS CONDUCTED:
Attached to this PR are plots of the respective effects of theses changes. Changes were tested with intel and gcc compilers, but only intel memory usage is shown as it exacerbates the memory usage issue.

@islas islas requested review from a team as code owners June 14, 2024 22:46
@islas islas changed the title Breakup dealloc inc Breakout deallocation calls into simpler smaller files Jun 14, 2024
@islas
Copy link
Collaborator Author

islas commented Jun 14, 2024

Highlighted is the region during compilation which memory usage spikes that this PR addresses (module_domain) before these changes take place :
pre_module_domain

This usage is then dropped when using this PR's edits :
post_module_domain_dealloc_split

Zooming in we can now see that deallocs_ comprise a number of smaller compilation units :
post_module_domain_dealloc_split_zoom

@weiwangncar
Copy link
Collaborator

@islas This is so cool! Thanks for working on this! Does this affect compile time in any way?

@islas
Copy link
Collaborator Author

islas commented Jun 17, 2024

It only affects compile times as number of threads go up. For typical compilation with -j 4 things stay more or less the same. For -j 12 as an example there is a decent improvement, and I suspect we would see a similar trend across most compilers if you're able to use that many threads.

Using gfortran/gcc 34/MPI-enabled with ALL PR changes (#2070, #2069, #2068)

Command Category Before After
time ./compile em_real -j 4
real 7m41.110s 7m55.655s
user 15m22.460s 16m45.250s
sys 0m28.922s 0m31.553s
time ./compile_new -j 4
real 5m19.380s 5m21.952s
user 14m44.306s 16m16.393s
sys 0m26.598s 0m31.263s
time ./compile em_real -j 12
real 7m22.949s 6m36.738s
user 20m49.918s 19m31.420s
sys 0m42.890s 0m39.320s
time ./compile_new -j 12
real 4m25.744s 3m41.141s
user 20m33.427s 22m43.411s
sys 0m36.085s 0m41.981s

@weiwangncar
Copy link
Collaborator

I tested code before and after this PR, and model produces identical results in my test. Also with 4 processors, the compile time is about 12 minutes!

@weiwangncar
Copy link
Collaborator

The regression test results:

Test Type | Expected | Received | Failed
= = = = = = = = = = = = = = = = = = = = = = = = = = = =
Number of Tests : 23 24
Number of Builds : 60 57
Number of Simulations : 158 150 0
Number of Comparisons : 95 86 0

Failed Simulations are: 
None
Which comparisons are not bit-for-bit: 
None

weiwangncar
weiwangncar previously approved these changes Aug 8, 2024
@mgduda mgduda self-requested a review September 6, 2024 21:00
tools/gen_allocs.c Outdated Show resolved Hide resolved
tools/gen_allocs.c Outdated Show resolved Hide resolved
Copy link
Collaborator

@mgduda mgduda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor request to tidy up some whitespace; otherwise, this looks good to me.

int numFiles = 12;
int idx = 0;
FILE * fpSub;
char * filename_prefix = "deallocs_" ;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nice to use consistent whitespace here between the variable type, variable name, = character, and trailing semicolon.

weiwangncar
weiwangncar previously approved these changes Sep 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants