
Implementing CI #207

Merged 2 commits into NOAA-GFDL:main on Sep 14, 2022
Conversation

@laurenchilutti (Contributor) commented Sep 2, 2022

Description
Parallelworks is currently down, so I will keep this as a draft until it comes back online. (Update: this is resolved.)

This PR brings in a GitHub Actions YAML file that uses a self-hosted runner (Parallelworks RDHPCS cloud resources) to automatically compile solo SHiELD and run idealized tests on pull requests to the main branch of the GFDL_atmos_cubed_sphere repository. The actual scripts referenced in this YAML are stored on the Parallelworks resource. The following is a description of how the CI works:

  • The CI automatically triggers when anyone submits a PR to the main branch.
  • The CI consists of 3 stages:
    1. Build
    2. Test
    3. Shut down
  • The Build stage has 4 steps:
    1. Check out code
    2. Build the sw solo SHiELD configuration
    3. Build the nh solo SHiELD configuration
    4. Build the hydro solo SHiELD configuration
  • The Test stage contains only one test in this initial release. It runs the test and then compares restarts against a baseline; the test fails if the run fails or the restarts differ.
  • We are working to speed up these tests on the cloud (currently they are too slow to run all idealized tests). A sketch of what this workflow might look like follows.
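A minimal sketch of such a workflow, assuming hypothetical job, step, and test/shutdown script names (only the FV3*StartClusters.py scripts are referenced in the actual diff); the PR's real YAML will differ in detail:

```yaml
# Hedged sketch of the CI described above, not the PR's actual workflow file.
name: SOLO SHiELD CI

on:
  pull_request:
    branches: [main]          # trigger only on PRs that target main

jobs:
  build:
    runs-on: self-hosted      # Parallelworks RDHPCS cloud runner
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Build the sw configuration
        run: python FV3swStartClusters.py
      - name: Build the nh configuration
        run: python FV3nhStartClusters.py
      - name: Build the hydro configuration
        run: python FV3hydroStartClusters.py

  test:
    needs: build
    runs-on: self-hosted
    steps:
      - name: Run idealized test and compare restarts against baseline
        run: python run_idealized_test.py    # hypothetical script name

  shutdown:
    needs: test
    if: always()              # shut clusters down even if an earlier stage fails
    runs-on: self-hosted
    steps:
      - run: python shutdown_clusters.py     # hypothetical script name
```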

How Has This Been Tested?

This has been tested by generating test PRs in my fork:

laurenchilutti#14 shows that when a PR targets a branch not named "main", the tests do not run.

laurenchilutti#15 shows that when a PR targets the main branch, the tests run.

Checklist:

Please check all, whether they apply or not.

@laurenchilutti changed the title from "Draft: Implementing CI" to "DRAFT: Implementing CI" on Sep 2, 2022
@laurenchilutti marked this pull request as draft on September 2, 2022 13:50
Comment on lines 16 to 17:

    runscript: [FV3checkoutStartClusters.py, FV3swStartClusters.py, FV3nhStartClusters.py, FV3hydroStartClusters.py]
    steps:
@thomas-robinson (Member) commented Sep 2, 2022:
Do each of these run through the cluster spin-up part of the API script? Do they all build something?

@laurenchilutti (Author) replied:
They all run through the start-up script and just use a different testcmd.

@laurenchilutti (Author) replied:
The first only checks out code; the next 3 each build a different configuration.

    name: SOLO SHiELD build
    strategy:
      fail-fast: true
      max-parallel: 1
A member commented:
How many runners do you have? You have 4 jobs; do they all need to run one after the other? You could set up 4 runners to run all 4 at the same time.

@laurenchilutti (Author) replied:
I only have 1 right now, but I will set up multiple runners so that we can run multiple tests at the same time. For the build step I need them all to build in series (not in parallel).
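For reference, the max-parallel: 1 setting in the quoted hunk is what enforces this serial ordering. Combined with the runscript matrix from the earlier hunk, the build job plausibly looks something like the following sketch (keys beyond the two quoted hunks are assumptions):

```yaml
# Plausible reconstruction from the two quoted hunks; unquoted keys are assumptions.
build:
  name: SOLO SHiELD build
  runs-on: self-hosted
  strategy:
    fail-fast: true
    max-parallel: 1      # a single runner, so the four matrix jobs run in series
    matrix:
      runscript: [FV3checkoutStartClusters.py, FV3swStartClusters.py,
                  FV3nhStartClusters.py, FV3hydroStartClusters.py]
  steps:
    - run: python ${{ matrix.runscript }}    # same spin-up flow, different testcmd
```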

A contributor commented:

I believe the builds could be done in parallel if we have the CI system build the FMS library first.

@laurenchilutti (Author) replied:
Would there be an issue with the builds occurring simultaneously when they all use the same Build/exec folder? I could alter the build system so that they all build with different exec folders.

A contributor replied:
The current build system uses exec/${config}_${compiler}/. It could be altered to use a combination of config, hydro, comp, bit, and compiler.
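For illustration, per-configuration exec directories along these lines would let the matrix drop max-parallel: 1 and build simultaneously. Everything below except the exec/${config}_${compiler}/ pattern is a hypothetical sketch, including the COMPILE invocation:

```yaml
# Hypothetical sketch: a unique exec/ directory per configuration removes the
# collision, so the builds can overlap and max-parallel: 1 can be dropped.
strategy:
  fail-fast: true
  matrix:
    config: [sw, nh, hydro]
steps:
  - run: ./COMPILE solo ${{ matrix.config }}   # illustrative invocation
    # with the build writing to e.g. exec/${config}_${hydro}_${comp}_${bit}_${compiler}/
```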

@laurenchilutti (Author) replied:
I will start making those changes; building in parallel would make everything much faster.

A contributor replied:
Remember to run the FMS build as a standalone process. Once we bring in the updates to build the few NCEPlibs needed by SHiELD, that would become part of that first step.
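In GitHub Actions terms, a standalone FMS build maps naturally onto a separate job that the configuration builds depend on. A hedged sketch, with hypothetical job and script names:

```yaml
# Hypothetical job graph: build the shared libraries once, then fan out.
jobs:
  build-libs:
    runs-on: self-hosted
    steps:
      - run: ./build_fms.sh     # standalone FMS build; NCEPlibs would join this step later

  build-configs:
    needs: build-libs           # every configuration build waits for the libraries
    strategy:
      matrix:
        config: [sw, nh, hydro]
    runs-on: self-hosted
    steps:
      - run: ./COMPILE solo ${{ matrix.config }}   # illustrative invocation
```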

@laurenchilutti changed the title from "DRAFT: Implementing CI" to "Implementing CI" on Sep 2, 2022
@laurenchilutti marked this pull request as ready for review on September 2, 2022 15:00
@laurenchilutti (Author) commented:
I have created PR NOAA-GFDL/SHiELD_build#15, which adds scripts to build libFMS and NCEPlibs. There is also the ability to store these libraries external to SHiELD_build so that libFMS does not need to be rebuilt with every new CI instance. And I have updated the compile so that there are unique exec/ sub-directories for each configuration, letting us do all 3 builds in parallel.

I have also added more GitHub runners (3 in total) so we can run builds and tests in parallel. Once this PR from SHiELD_build is merged, I will rerun the CI on this PR so we can see how it works.

@thomas-robinson (Member) commented:
@laurenchilutti After this gets merged, submit a PR with code that should fail, to make sure that the tests fail when expected.

@laurenchilutti (Author) commented:
Tests passed and the builds ran in parallel. Future improvements I want to make:

  • Add more tests
  • Add more runners if needed (currently there are 3, so we can only run 3 jobs in parallel)
  • Make any nh configuration tests rely only on the completion of the nh compile, and the same for hydro and sw (see the sketch below)
  • Add tests that compare different layouts
  • Add tests that run from restarts to ensure restart functionality works as expected
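A sketch of the third item above, with hypothetical job and script names: each test declares needs: on its own configuration's build only.

```yaml
# Hypothetical sketch: an nh test gated only by the nh build, not by sw or hydro.
jobs:
  nhcompile:
    runs-on: self-hosted
    steps:
      - run: python FV3nhStartClusters.py   # builds the nh configuration
  nh-test:
    needs: nhcompile                        # waits for the nh build only
    runs-on: self-hosted
    steps:
      - run: python run_nh_test.py          # hypothetical test script
```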

@laurenchilutti laurenchilutti merged commit ebd5a46 into NOAA-GFDL:main Sep 14, 2022
laurenchilutti added a commit that referenced this pull request Oct 13, 2022
* updated fv_regional_bc.F90 to read levsp from GFS_ctl file (#152)

* Updating a continue statement in fv_nudge (#153)

* update in fv_nudge to fix 666 label logic

* fix logic errors (#138)

* update to external_ic::remap_scale to skip remapping non-existent IC tracers (#159)

* Fix nudge logic (#157)

* fix logic errors

* fix answer changes

* adding back a line that was mistakenly deleted in a previous commit (#166)

* Release 042022 (#184)

* Merge of updates from GFDL Weather and Climate Dynamics Division (202204):

    * add license header to missing files and fix typo in header
    * updates needed for fv3_gfsphysics to have access to bounded_domain
    * remove obsoleted driver/SHiELD files
    * updating to fix bug where long_name and units attributes were not being captured in the RESTARTS
    * remove unused function fv_diagnostics::max_vorticity_hy1
    * remove avec timer remnants
    * adding ability to specify prefix and directory when reading and writing restarts
    * remove old style namelist read in favor of read from internal character variable
    * Added option for a mean wind
    * The planetary radius and rotation rate are now re-scalable by a namelist parameter (small_earth_scale) instead of using exclusively the hard-coded FMS constant.
    * fv_mapz: Cleanup and added helpful annotations to avoid getting lost so easily
    * remove duplicate code and fix lstatus on all grids depending on gfs_data and gfs_data.tile1
    * New idealized tests
    * Makes the non-hydrostatic restart variables optional for reads to allow hydrostatic ICs
    * Fix the hydrostatic TE remapping; Add GMAO cubic for TE remapping, which is used if kord_tm=0 and remap_te=.true.
    * Add a TE remapping option (kord_tm=0)
    * Addressing GNU Warnings
    * Add the L75 vertical config from HAFS
    * clean up fms_mp_mod and remove mp_bcst

* cherry pick 5193c6b from dev/emc

* Attempt at integrating fixes on top of dev/emc branch. (#173)

* remove outdated GFDL_tools/fv_diag_column.F90 which exists as the result of an improper merge.  The correct file is tools/fv_diag_column.F90 (#191)

* various io fixes (#192)

* fixes io performance issues by making everyone a reader when io_layout=1,1

adds capability to use FMS feature to ignore data integrity checksums in restarts

* rename enforce_rst_cksum to ignore_rst_cksum and change the default value for compatibility

* fix index error (#196)

* New notebooks (#190)

* Moving source files into source directory

* Added advection notebook

* Fixed subplot spacing

* New 3D case notebooks

* New idealized tests

Updated mtn_wave_tests_1km to restore missing graphics.

* first try at held-suarez

* Updated H-S superrotation notebook

* New level placement tool in an interactive note

* Minor updates to notebooks; deleted fv_eta binary.

* Regional decomposition test fix (when nrows_blend > 0) (#194) (#200)

* Regional bc blend changes to extend into interior halos and overlap on corners. Still not working for u and v.

dyn_core.F90 : Fix from Jun Wang to correct sync of u,v
fv_regional_bc.F90 : add check for nrows_blend > tile size; fix error when nrows_blend=1

Conflicts (not taken during cherry-pick):
	driver/SHiELD/atmosphere.F90
	model/fv_sg.F90

Co-authored-by: Ted Mansell <[email protected]>

* Implementing CI (#207)

* CI Parallelworks update (#211)

* Update call to read_input_nml and remove unnecessary code. (#161)

* Removing use of INTERNAL_FILE_NML and cleaning up read_namelist_test_cases to remove unused argument

* deleting duplicate call to read_namelist_test_case_nml in fv_control

* removing commented code in fv_control

Co-authored-by: Rusty Benson <[email protected]>
Co-authored-by: menzel-gfdl <[email protected]>
Co-authored-by: Rusty Benson <[email protected]>
Co-authored-by: MatthewPyle-NOAA <[email protected]>
Co-authored-by: lharris4 <[email protected]>
Co-authored-by: Ted Mansell <[email protected]>