dumpfields=true fails in module_fcst_grid_comp.F90 for coupled model #1765

Closed
DeniseWorthen opened this issue May 22, 2023 · 18 comments · Fixed by NOAA-EMC/fv3atm#856 or #2355
Labels
bug Something isn't working

Comments

@DeniseWorthen
Collaborator

Description

The nems.configure variable dumpfields=true should allow the coupling fields to be written from the component itself. This feature was previously working in the fv3_cap, but now fails with the following error:

20230522 132033.780 ERROR            PET000 ESMCI_IO.C:542 ESMCI::IO::write() Operation not yet supported  - tile count of 6 != 1 - not supported yet
20230522 132033.791 ERROR            PET000 ESMCI_IO.C:942 ESMCI::IO::close() Unable to close file  - Internal subroutine call returned Error
20230522 132033.791 ERROR            PET000 ESMCI_IO.C:482 ESMCI::IO::write() Operation not yet supported  - Internal subroutine call returned Error
20230522 132033.791 ERROR            PET000 ESMCI_ArrayBundle.C:493 ESMCI::ArrayBundle::write() Unable to write to file  - Internal subroutine call returned Error
20230522 132033.791 ERROR            PET000 ESMCI_ArrayBundle_F.C:436 c_esmc_arraybundlewrite() Unable to write to file  - Internal subroutine call returned Error
20230522 132033.791 ERROR            PET000 ESMF_ArrayBundle.F90:3976 ESMF_ArrayBundleWrite() Unable to write to file  - Internal subroutine call returned Error
20230522 132033.791 ERROR            PET000 module_fcst_grid_comp.F90:1638 Unable to write to file  - Passing error in return code
20230522 132033.791 ERROR            PET000 module_fcst_grid_comp.F90:298 Unable to write to file  - Passing error in return code
20230522 132033.791 ERROR            PET000 module_fcst_grid_comp.F90:865 Unable to write to file  - Passing error in return code
20230522 132033.791 ERROR            PET000 fv3_cap.F90:396 Unable to write to file  - Passing error in return code
20230522 132033.791 ERROR            PET000 ATM:src/addon/NUOPC/src/NUOPC_ModelBase.F90:700 Unable to write to file  - Passing error in return code
20230522 132033.791 ERROR            PET000 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:2577 Unable to write to file  - Phase 'IPDvXp01' Initialize for modelComp 2: ATM did not return ESMF_SUCCESS
20230522 132033.791 ERROR            PET000 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:1286 Unable to write to file  - Passing error in return code
20230522 132033.792 ERROR            PET000 UFS Driver Grid Comp:src/addon/NUOPC/src/NUOPC_Driver.F90:457 Unable to write to file  - Passing error in return code
20230522 132033.792 ERROR            PET000 UFS.F90:386 Unable to write to file  - Aborting UFS
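
In a trace like the one above, the first ERROR line carries the root cause (here, the tile-count check in ESMCI_IO.C), and the later lines only pass the return code up the call stack. A minimal Python sketch for pulling that first line out of a PET log follows. The line layout (date, time, level, PET id, free text) is inferred from this excerpt only, and `first_error` is a hypothetical helper, not part of ESMF or the UFS tooling.

```python
import re

# Layout assumed from the excerpt above: date, time, level, PET id, then free text.
_LINE = re.compile(r"^(\d{8}) (\d{6}\.\d{3}) (\w+)\s+(PET\d+) (.*)$")


def first_error(lines):
    """Return (pet, text) for the first ERROR line, or None if there is none."""
    for line in lines:
        match = _LINE.match(line.strip())
        if match and match.group(3) == "ERROR":
            return match.group(4), match.group(5)
    return None


if __name__ == "__main__":
    sample = [
        "20230522 132033.780 ERROR            PET000 ESMCI_IO.C:542 "
        "ESMCI::IO::write() Operation not yet supported  - "
        "tile count of 6 != 1 - not supported yet",
        "20230522 132033.791 ERROR            PET000 UFS.F90:386 "
        "Unable to write to file  - Aborting UFS",
    ]
    print(first_error(sample))
```

The sketch relies on the log being in chronological order, which holds for the excerpt above.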

CMEPS can write the fields it receives to the mediator history files, but it is sometimes useful to confirm that the fields imported by CMEPS are identical to those exported by FV3. With this failure, that check is no longer possible.

To Reproduce:

Run any of the global coupled configurations using the RT system. Use the run directory and set DumpFields = true in the ATM configuration attributes in nems.configure.

Additional context

Output

@uturuncoglu
Collaborator

@DeniseWorthen Is this also failing with ESMF 8.5.0? @billsacks knows that part of the code better, since he implemented multi-tile I/O support, and may have some ideas.

@billsacks

This should work with recent versions of ESMF - or at least, recent versions of ESMF shouldn't give this particular error. Multi-tile I/O support was introduced in ESMF 8.4.0 with some limitations; some of these limitations were addressed in 8.5.0 and additional limitations were addressed in 8.6.0.

@uturuncoglu
Collaborator

@billsacks Thanks. That is really helpful. @DeniseWorthen It would be nice to test this again when the new spack-stack (1.6.0, #2036) is available with ESMF 8.6.0. If we still see the issue, we could try to fix it. Let me know what you think.

@DeniseWorthen
Collaborator Author

DeniseWorthen commented Jan 18, 2024

Thanks. My understanding was the way we were doing multi-tile output needed to be re-worked, so that the State_RWFields_tiles would either not be used, or would be re-factored now that the multi-tile output I/O was enabled.

@junwang-noaa
Collaborator

@DeniseWorthen May I ask if further code updates are required with ESMF 8.6.0?

@DeniseWorthen
Collaborator Author

@junwang-noaa No, I think we need updates on the FV3 side, in the code that tries to use the multi-tile I/O.

@DusanJovic-NOAA
Collaborator

I ran the cpld_control_p8 test with dumpfields set to true, and I see the following error:

20240131 183007.780 ERROR            PET000 ESMCI_IO_Handler.C:550 ESMCI::IO_Handler::getFilename() Wrong data value  - For multi-tile IO, the specified file name must have exactly one occurrence of '*', which will be replaced by the tile number. Filename <diagnostic_FV3_fcstGrid1.nc> has 0 occurrences.                                                                          
20240131 183007.781 ERROR            PET000 ESMCI_PIO_Handler.C:1323 ESMCI::PIO_Handler::openOneTileF Wrong data value  - Internal subroutine call returned Error
20240131 183007.781 ERROR            PET000 ESMCI_IO_Handler.C:744 ESMCI::IO_Handler::open() Wrong data value  - - Error opening file
20240131 183007.781 ERROR            PET000 ESMCI_IO.C:825 ESMCI::IO::open() Wrong data value  - Internal subroutine call returned Error
20240131 183007.781 ERROR            PET000 ESMCI_IO.C:469 ESMCI::IO::write() Wrong data value  - Internal subroutine call returned Error
20240131 183007.781 ERROR            PET000 ESMCI_ArrayBundle.C:496 ESMCI::ArrayBundle::write() Unable to write to file  - Internal subroutine call returned Error
20240131 183007.781 ERROR            PET000 ESMCI_ArrayBundle_F.C:445 c_esmc_arraybundlewrite() Unable to write to file  - Internal subroutine call returned Error
20240131 183007.781 ERROR            PET000 ESMF_ArrayBundle.F90:3957 ESMF_ArrayBundleWrite() Unable to write to file  - Internal subroutine call returned Error
20240131 183007.781 ERROR            PET000 module_fcst_grid_comp.F90:1596 Unable to write to file  - Passing error in return code
20240131 183007.781 ERROR            PET000 module_fcst_grid_comp.F90:298 Unable to write to file  - Passing error in return code
20240131 183007.781 ERROR            PET000 module_fcst_grid_comp.F90:865 Unable to write to file  - Passing error in return code
20240131 183007.781 ERROR            PET000 fv3_cap.F90:397 Unable to write to file  - Passing error in return code

I added a single '*' to the file name and the code passed that point in module_fcst_grid_comp.F90 but is now crashing with the following error in module_cap_cpl.F90:

20240131 184220.046 ERROR            PET000 ESMFIO.F90:515 ESMFIO_FieldAccess() Operation not yet supported  - Only 2D fields are supported.                                                 
20240131 184220.047 ERROR            PET000 ESMFIO.F90:369 ESMFIO_Write() Operation not yet supported  - Internal subroutine call returned Error
20240131 184220.047 ERROR            PET000 module_cap_cpl.F90:155 Operation not yet supported  - Passing error in return code
20240131 184220.047 ERROR            PET000 module_cap_cpl.F90:59 Operation not yet supported  - Passing error in return code

Where is ESMFIO_Write defined? Is it part of the ESMF API? I cannot find a description of it in the ESMF documentation.
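
The file-name rule quoted in the first error log of this comment (exactly one '*', replaced by the tile number) can be illustrated with a short Python sketch. This is not ESMF code: `tile_filenames` is a hypothetical helper, the 1-based tile numbering is an assumption, and the position of the '*' in the example name is made up, since the thread does not say where it was added.

```python
def tile_filenames(pattern, tile_count):
    """Expand a multi-tile file name pattern into one name per tile."""
    stars = pattern.count("*")
    if stars != 1:
        raise ValueError(
            f"file name must contain exactly one '*', "
            f"but <{pattern}> has {stars}"
        )
    # Assumption: tiles are numbered starting from 1.
    return [pattern.replace("*", str(tile)) for tile in range(1, tile_count + 1)]


if __name__ == "__main__":
    # Hypothetical placement of the '*' in the name reported by the error.
    for name in tile_filenames("diagnostic_FV3_fcstGrid1.tile*.nc", 6):
        print(name)
```

Passing the original name `diagnostic_FV3_fcstGrid1.nc`, which has no '*', raises the `ValueError`, mirroring the "has 0 occurrences" message in the log.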

@DeniseWorthen
Collaborator Author

@DusanJovic-NOAA I can see it in the ESMF code here: src/Superstructure/IOAPI/interface/ESMFIO.F90

@uturuncoglu
Collaborator

@DusanJovic-NOAA I don't have all the details, but as far as I know those calls are only used internally by ESMF, so they are not exposed to the user. When you call FieldWrite etc. (any user-facing call that writes a Field or FieldBundle), ESMF internally creates the ESMF I/O object to use the PIO capability. @billsacks could add more here, since he extended multi-tile I/O support on the ESMF side.

@DusanJovic-NOAA
Collaborator

Okay, thanks. Let me try to use FieldWrite instead.

@uturuncoglu
Collaborator

@DusanJovic-NOAA As far as I know, the Array write calls also use the same underlying I/O infrastructure. So if ArrayWrite etc. is failing, there could be a bug on the ESMF side, or there could be limitations in writing the fields on the ESMF side that I don't know about. Again, @billsacks might have more information. Please open a support ticket if you think this is a bug.

@billsacks

Interesting. This was a learning experience for me. It looks like ESMFIO.F90 is an entirely different, undocumented I/O interface that Raffaele Montuoro wrote in 2018 to enable I/O of multi-tile Fields. Unlike most ESMF I/O, this does not go through PIO, but instead calls into netcdf directly. I haven't read through it carefully, but my guess is that the functionality of this module may now be superseded by the multi-tile I/O work I did for FieldWrite, FieldBundleWrite, etc.

@DusanJovic-NOAA
Collaborator

DusanJovic-NOAA commented Feb 9, 2024

I tried to convert the diagnose_cplFields routine (actually State_RWFields_tiles that is called by diagnose_cplFields) to use FieldBundleWrite instead of ESMFIO_Write, and now I see the following error:

20240209 191247.823 ERROR            PET000 ESMCI_IO.C:1201 ESMCI::IO::redist_arraycreate1de Operation not yet supported  - Tile count != 1 is not supported
20240209 191247.823 ERROR            PET000 ESMCI_IO.C:591 ESMCI::IO::write() Operation not yet supported  - Internal subroutine call returned Error
20240209 191247.827 ERROR            PET000 ESMCI_IO.C:494 ESMCI::IO::write() Operation not yet supported  - Internal subroutine call returned Error
20240209 191247.827 ERROR            PET000 ESMCI_IO_F.C:171 c_esmc_iowrite() Unable to write to file  - Internal subroutine call returned Error
20240209 191247.827 ERROR            PET000 ESMF_IO.F90:523 ESMF_IOAddArray() Unable to write to file  - Internal subroutine call returned Error
20240209 191247.827 ERROR            PET000 ESMF_FieldBundle.F90:18015 ESMF_FieldBundleWrite() Unable to write to file  - Internal subroutine call returned Error
20240209 191247.827 ERROR            PET000 module_cap_cpl.F90:160 Unable to write to file  - Passing error in return code
20240209 191247.827 ERROR            PET000 module_cap_cpl.F90:62 Unable to write to file  - Passing error in return code                                                                    

@billsacks

It looks like you are running into a limitation with multi-tile I/O that was removed in the ESMF 8.6.0 release: prior to 8.6.0, multi-tile I/O only worked on Arrays / Fields with 1 DE per PET. Based on this error, it seems like you have Fields that use a decomposition with multiple DEs per PET (or possibly 0 DEs per PET). Solutions to this would be to either update to the 8.6.0 release (or the soon-upcoming 8.6.1 release that will have some other patches wanted by the UFS) or, if feasible, change the decomposition of these fields to always use the default of 1 DE per PET.
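
The version and decomposition conditions described in this thread can be summarized as a small pre-flight check. The Python sketch below encodes only what is stated here (multi-tile I/O introduced in 8.4.0, the 1-DE-per-PET restriction lifted in 8.6.0); `multitile_write_supported` is a hypothetical helper, not an ESMF API, and any other limitations of 8.4.0 or 8.5.0 are not modeled.

```python
def parse_version(version):
    """Turn a version string such as '8.6.0' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def multitile_write_supported(esmf_version, des_per_pet):
    """Rough check based only on the statements in this thread.

    des_per_pet lists the number of DEs held by each PET.
    """
    version = parse_version(esmf_version)
    if version < (8, 4, 0):
        # Multi-tile I/O was introduced in 8.4.0.
        return False
    if version >= (8, 6, 0):
        # The 1-DE-per-PET restriction was removed in 8.6.0.
        return True
    # 8.4.x and 8.5.x: every PET must hold exactly one DE.
    return all(count == 1 for count in des_per_pet)


if __name__ == "__main__":
    print(multitile_write_supported("8.5.0", [2, 0]))
    print(multitile_write_supported("8.6.0", [2, 0]))
```

A result of `False` for a pre-8.6.0 version with uneven DE counts corresponds to the two options given above: update ESMF, or change the decomposition to 1 DE per PET.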

@DusanJovic-NOAA
Collaborator

> It looks like you are running into a limitation with multi-tile I/O that was removed in the ESMF 8.6.0 release: prior to 8.6.0, multi-tile I/O only worked on Arrays / Fields with 1 DE per PET. Based on this error, it seems like you have Fields that use a decomposition with multiple DEs per PET (or possibly 0 DEs per PET). Solutions to this would be to either update to the 8.6.0 release (or the soon-upcoming 8.6.1 release that will have some other patches wanted by the UFS) or, if feasible, change the decomposition of these fields to always use the default of 1 DE per PET.

Thanks, @billsacks. I'll try to build the model with ESMF 8.6.0.

@DusanJovic-NOAA
Collaborator

I updated the diagnose_cplFields routine in FV3 to use ESMF_FieldBundleWrite. I can now write the coupling fields on 6 tiles with ESMF v8.6.0. The code is in this branch:

https://github.com/DusanJovic-NOAA/fv3atm/tree/dump_cpl_fields

It can be tested using the corresponding ufs-weather-model branch:

https://github.com/DusanJovic-NOAA/ufs-weather-model/tree/dump_cpl_fields

I temporarily updated ESMF to version 8.6.0 from spack-stack 1.6.0 on Hera.

@BrianCurtis-NOAA
Collaborator

When this makes it to PR form, make sure to add a dependency on ESMF 8.6.0

@junwang-noaa
Collaborator

@DusanJovic-NOAA Now that we have ESMF 8.6.0 on all platforms, would you please test again to see if this issue is resolved? Thanks.
