Merge redgreen-optimized into develop (Part 2) #120
Conversation
This was needed in an earlier version!
@@ -1071,6 +1084,10 @@ subroutine get_command_line_arguments(nsmax, cgrid, iters, nfld, nlev, lvordiv,
character(len=128) :: carg ! Storage variable for command line arguments
integer :: iarg = 1 ! Argument index

#ifdef USE_GPU
!$acc init
#endif
Why can this not be done within setup_trans0? OpenACC should be an implementation detail.
According to the commit log this needs to be done before mpl_init. @marsdeno is that really the case?
Yes, I was told that MPI_INIT benefits from knowing that OpenACC/GPUs will be used, in order to set up the right comms protocols in the UCX layer.
Bummer. If one is using CUDA or HIP directly without OpenACC, how would you do this then?
Where is USE_GPU defined anyway? There's only one reference to it:
ac6-102:/ec/res4/hpcperm/nash/ectrans $ ack USE_GPU
src/programs/ectrans-benchmark.F90
1090:#ifdef USE_GPU
src/trans/gpu/internal/trgtol_mod.F90
569:#ifdef USE_GPU_AWARE_MPI
691:#ifdef USE_GPU_AWARE_MPI
src/trans/gpu/internal/trmtol_mod.F90
168:#ifdef USE_GPU_AWARE_MPI
181:#ifdef USE_GPU_AWARE_MPI
src/trans/gpu/internal/trltog_mod.F90
695:#ifdef USE_GPU_AWARE_MPI
730:#ifdef USE_GPU_AWARE_MPI
src/trans/gpu/internal/trltom_mod.F90
174:#ifdef USE_GPU_AWARE_MPI
187:#ifdef USE_GPU_AWARE_MPI
src/trans/gpu/external/setup_trans0.F90
205:#ifdef USE_GPU_AWARE_MPI
src/trans/gpu/CMakeLists.txt
93: $<${HAVE_GPU_AWARE_MPI}:USE_GPU_AWARE_MPI>
Good catch, Sam. This (USE_GPU) is something I have in my testing on AC that got lost in the rgo-develop preparation.
Bummer. If one is using CUDA or HIP directly without OpenACC, how would you do this then?
Good question, I'll try and find out.
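For illustration only, since this question is left open in the thread: below is a minimal sketch, assuming CUDA Fortran with the NVIDIA compiler and the cudafor module, of forcing device selection and context creation before MPI_Init without OpenACC. The launcher environment variable, the modulo device mapping, and the program structure are all assumptions, not ectrans code; a HIP build would need the hipfort bindings instead.

! Hypothetical sketch: select and initialise a GPU before MPI_Init without OpenACC.
! Assumes CUDA Fortran (nvfortran, cudafor module); not part of the ectrans sources.
program init_device_before_mpi
  use cudafor
  use mpi
  implicit none
  integer :: istat, ierr, ndev, local_rank
  character(len=16) :: clrank

  ! The local-rank variable name depends on the launcher; SLURM_LOCALID is an assumption.
  call get_environment_variable('SLURM_LOCALID', clrank, status=istat)
  if (istat /= 0 .or. len_trim(clrank) == 0) clrank = '0'
  read(clrank, *) local_rank

  istat = cudaGetDeviceCount(ndev)
  istat = cudaSetDevice(mod(local_rank, max(ndev, 1)))
  ! Any runtime API call forces CUDA context creation, analogous to !$acc init.
  istat = cudaDeviceSynchronize()

  call MPI_Init(ierr)
  ! ... transforms, halo exchanges, etc. ...
  call MPI_Finalize(ierr)
end program init_device_before_mpi

The point is simply that the device runtime is touched before MPI_Init, which is what the !$acc init in this diff achieves for the OpenACC build.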
Is there any documentation on this acc init / MPI_Init interaction? Which MPI implementation? Is there perhaps an environment variable as well? It seems to have worked without it, too.
In general it should also work without, but the default behaviour is much better if you do acc init before MPI_Init, because then MPI knows which GPUs you are using, and so on. You can always tweak UCX variables to achieve the same, but I highly recommend doing acc init before MPI_Init.
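To make the recommended ordering concrete, here is a minimal sketch; the program structure is illustrative only (in ectrans the MPI side goes through mpl_init rather than a direct MPI_Init call, as discussed above):

! Illustrative sketch only, not the ectrans code: initialise the OpenACC runtime
! before MPI so that the MPI/UCX layer can see which GPUs are in use.
program acc_before_mpi
  use mpi
  implicit none
  integer :: ierr

#ifdef USE_GPU
  !$acc init
#endif

  call MPI_Init(ierr)
  ! ... set up transforms, do work ...
  call MPI_Finalize(ierr)
end program acc_before_mpi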
I think then we have to keep it for now.
I've combined the three device module files in the gpu algor directory: 23ee8e2. Any feedback on that?
Yes, I'm not sure that was actually used.
Just to give you a sense of how well this branch performs, here are some numbers comparing rgo-develop/CPU with rgo-develop/GPU:
CPU:
GPU:
I've removed all adjoint routines and FLT-related stuff from the gpu source tree. Neither of these was ported to GPU, and in fact the files just contained out-of-date versions of the CPU code.
Is it time to merge this?
* In order to use this from fortran, add the following to callsite routine,
* in the appropriate places
*   use device_mod, only : devicegetmeminfo
*   use iso_c_binding, only : c_int
*   integer(c_int) :: imemfree,imemtotal
*   DEVICEGETMEMINFO(imemfree,imemtotal)
*   write(nout,*) 'Current free memory: ',imemfree,' out of total ',imemtotal
Reinstate memgetinfo utility
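A hypothetical self-contained callsite following those instructions might look like the sketch below. It assumes devicegetmeminfo is a plain subroutine taking two integer(c_int) arguments (DEVICEGETMEMINFO in the commit text may well be a preprocessor macro), and it writes to the default output unit rather than nout.

! Hypothetical callsite for the reinstated memgetinfo utility; the names and the
! plain-subroutine assumption follow the commit message above, not verified code.
subroutine report_device_memory()
  use device_mod,    only : devicegetmeminfo
  use iso_c_binding, only : c_int
  implicit none
  integer(c_int) :: imemfree, imemtotal

  call devicegetmeminfo(imemfree, imemtotal)
  write(*,*) 'Current free memory: ', imemfree, ' out of total ', imemtotal
end subroutine report_device_memory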
Good to go for me now.
Good to me!
Thank you to everyone involved, especially @anmrde! This has been a huge milestone!!!
Celebrate good times!!! Now, let's try and at least get IFS to complete 24 hours of integration with this branch without crashing...
* develop: Add GPU capability (ecmwf-ifs#106, ecmwf-ifs#120)
Continuation of #106 because GitHub is too clever for its own good.