Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove CONTIGUOUS from dist_spec_control_argument #98

Merged
merged 12 commits into from
Jul 19, 2024

Conversation

DJDavies2
Copy link
Contributor

I am not entirely sure what is happening here, but this change does seem to fix the issue at least for the failing job in question. The change just removes the contiguous attribute from one of the arguments (PSPECG). The other argument with the contiguous attribute, PSPEC, seems to be fine, but I don't know why there would be a difference. The only obvious difference is the intent.

@samhatfield
Copy link
Collaborator

These CONTIGUOUSs were added in the same commit as the other ones. I'll see what Météo-France thinks about having these removed as well.

@DJDavies2
Copy link
Contributor Author

A bit more info:

The job concerned is a 2PE job where the failure occurs on one of the PEs but not the other (the code then stops in the mpl_wait in dist_spec_control). The PSPECG argument is present on one PE and not present on the other. The PE that crashes is the one where the argument isn't present. The origin of this difference is in trans_distspec (in transi_module.F90) where there are two calls to dist_spec, one which passes PSPECG and the other with doesn't.

@wdeconinck
Copy link
Collaborator

Is this for a specific compiler only, or is this generally a problem?

@DJDavies2
Copy link
Contributor Author

I am seeing failures in different downstream systems: saber (as noted above) and also lfric-lite-jedi. The saber failures are with gnu 9.2. For lfric-lite-jedi the failures occur with gnu 9.2 and gnu 12.2.

It seems to work with gnu 7.3.0. Other systems would take some time to test as they are not fully ported.

@DJDavies2
Copy link
Contributor Author

This is related to #95.

@DJDavies2
Copy link
Contributor Author

Would a possibility be to add a compile time option to have the CONTIGUOUS attribute or not? It would make the code a bit ugly, but would allow people to have the attribute or not as works best for them.

@wdeconinck
Copy link
Collaborator

@marsdeno please could you liaise with author(s) of the CONTIGUOUS change?

@samhatfield
Copy link
Collaborator

@DJDavies2 do you think this is related to #93?

@marsdeno
Copy link
Collaborator

Would a possibility be to add a compile time option to have the CONTIGUOUS attribute or not? It would make the code a bit ugly, but would allow people to have the attribute or not as works best for them.

I guess doing something like

#ifdef CONTIG_BUGGY_COMPILER
#define CONTIG_STATUS ,CONTIGUOUS
#else
#define CONTIG_STATUS 
#endif

and then

REAL(KIND=JPRB) ,OPTIONAL, INTENT(IN) CONTIG_STATUS :: PSPECG(:,:)

might be acceptable?
The downside is syntax would no longer be valid without preprocessing, but at least it would be be pretty clear what is being achieved.

@wdeconinck
Copy link
Collaborator

wdeconinck commented Jun 17, 2024

So a known working compiler is gfortran 7.3.
I guess we could disable CONTIGUOUS attribute here for gfortran >= 8 then.

@marsdeno
Copy link
Collaborator

@DJDavies2 do you have a simple reproducer I could try with latest GNU on our machine to see if the issue is still there?

@DJDavies2
Copy link
Contributor Author

@DJDavies2 do you have a simple reproducer I could try with latest GNU on our machine to see if the issue is still there?

I don't have a simple reproducer I'm afraid. I can try to produce one if you like, but I can't guarantee I will be able to do so.

@DJDavies2
Copy link
Contributor Author

Would a possibility be to add a compile time option to have the CONTIGUOUS attribute or not? It would make the code a bit ugly, but would allow people to have the attribute or not as works best for them.

I guess doing something like

#ifdef CONTIG_BUGGY_COMPILER
#define CONTIG_STATUS ,CONTIGUOUS
#else
#define CONTIG_STATUS 
#endif

and then

REAL(KIND=JPRB) ,OPTIONAL, INTENT(IN) CONTIG_STATUS :: PSPECG(:,:)

might be acceptable? The downside is syntax would no longer be valid without preprocessing, but at least it would be be pretty clear what is being achieved.

I was thinking more like:

#ifdef CONTIG_BUGGY_COMPILER
REAL(KIND=JPRB), OPTIONAL, INTENT(IN) :: PSPECG(:,:)
#else
REAL(KIND=JPRB), OPTIONAL, INTENT(IN), CONTIGUOUS :: PSPECG(:,:)
#endif

but either way would be fine. The file already ends in .F90 so I think it is already being "pre-processed" even if their aren't any cpp statements at the moment.

@DJDavies2
Copy link
Contributor Author

@DJDavies2 do you think this is related to #93?

It is a different part of the code and a different compiler so I don't know.how connected these two issues are. It would certainly be helpful if #93 could be handled.

@marsdeno
Copy link
Collaborator

Would a possibility be to add a compile time option to have the CONTIGUOUS attribute or not? It would make the code a bit ugly, but would allow people to have the attribute or not as works best for them.

I guess doing something like

#ifdef CONTIG_BUGGY_COMPILER
#define CONTIG_STATUS ,CONTIGUOUS
#else
#define CONTIG_STATUS 
#endif

and then
REAL(KIND=JPRB) ,OPTIONAL, INTENT(IN) CONTIG_STATUS :: PSPECG(:,:)
might be acceptable? The downside is syntax would no longer be valid without preprocessing, but at least it would be be pretty clear what is being achieved.

I was thinking more like:

#ifdef CONTIG_BUGGY_COMPILER REAL(KIND=JPRB), OPTIONAL, INTENT(IN) :: PSPECG(:,:) #else REAL(KIND=JPRB), OPTIONAL, INTENT(IN), CONTIGUOUS :: PSPECG(:,:) #endif

but either way would be fine. The file already ends in .F90 so I think it is already being "pre-processed" even if their aren't any cpp statements at the moment.

The advantage of defining something like CONTIG_STATUS is it could be done once, and used wherever needed. In fact it could be done in one of the CMake files.

@wdeconinck
Copy link
Collaborator

So what's missing now is a CMake modification to add the preprocessor definition CONTIG_BUGGY_COMPILER for particular compilers...

@DJDavies2
Copy link
Contributor Author

So what's missing now is a CMake modification to add the preprocessor definition CONTIG_BUGGY_COMPILER for particular compilers...

I wasn't quite sure how to approach this. I don't know exactly which compilers/versions would need this option to be set and it could get messy and error-prone to try and handle it in the cmake file. An alternative would be to just let users set the option themselves via CMAKE_Fortran_FLAGS if they need it. What do you think?

@marsdeno
Copy link
Collaborator

marsdeno commented Jul 12, 2024

Something roughly like

if( CMAKE_Fortran_COMPILER_ID MATCHES "Intel"  )
  if( CMAKE_Fortran_COMPILER_VERSION VERSION_LESS_EQUAL 19)
    set( COMPILER_HAS_CONTIG_ISSUES True)
  endif()
elseif( CMAKE_Fortran_COMPILER_ID MATCHES "GNU"  )
  if( CMAKE_Fortran_COMPILER_VERSION MATCHES "9.2|12.2")
    set( COMPILER_HAS_CONTIG_ISSUES True)
  endif()
endif()

in the main CMakeLists.txt

and adding

if( COMPILER_HAS_CONTIG_ISSUES ) 
  target_compile_definitions( trans_${prec} PRIVATE CONTIG_BUGGY_COMPILER) 
endif()

after the ecbuild_add_library() call in src/trans/CMakeLists.txt ?

@wdeconinck
Copy link
Collaborator

wdeconinck commented Jul 15, 2024

Good @marsdeno except I'd perhaps use the variable ECTRANS_HAVE_CONTIGUOUS_ISSUE instead.
Also let's make sure to insert a link to this Github issue in comments.
I'd also wrap the first snippet with:
if (NOT DEFINED ECTRANS_HAVE_CONTIGUOUS_ISSUE) so that it could be overwritten (turn OFF) via a command line argument.

@wdeconinck
Copy link
Collaborator

@DJDavies2 I applied my suggestion on top of your latest change in
2fdc7ff

@DJDavies2
Copy link
Contributor Author

Thanks, I have tested the latest version of this branch and it seems to work for us.

Copy link
Collaborator

@wdeconinck wdeconinck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! This looks like ready to merge then.

@wdeconinck wdeconinck merged commit d17f2b4 into ecmwf-ifs:develop Jul 19, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants