Missing CSV_INNER_OUTPUT columns for single models in parallel #1753

Closed
tandreasr opened this issue Apr 21, 2024 · 25 comments · Fixed by #1766
Comments

@tandreasr

tandreasr commented Apr 21, 2024

Hi again,
while testing the new parallel capabilities I realized that the CSV file(s) created by the CSV_INNER_OUTPUT option in parallel lack all columns related to single models ("MODELNAME"_inner_dvmax etc.).
Instead they only contain the convergence summary for all models processed by the corresponding MPI rank.
Do you see any chance to implement that output the very same way as IMS does within a "foreseeable" future?
(Meaning: add those columns, for all models processed by that rank, to the CSV belonging to that MPI rank.)
I think it is sometimes quite important to get an overview of which models have convergence issues
(not just the worst one listed in solution_inner_d?max_model).
And in any case, keeping the CSV format identical between IMS and PETSc seems a logical choice?
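For illustration, this is roughly how I pick the per-model columns out of a serial IMS inner CSV today (just a sketch; the file name is a placeholder and the column pattern follows the "MODELNAME"_inner_dvmax naming above):

```python
# Just a sketch, not part of MODFLOW 6: list the per-model convergence columns
# in an IMS-style inner CSV. The "<modelname>_inner_dvmax" pattern follows the
# naming mentioned above; "zzz.inner.csv" is only a placeholder file name.
import pandas as pd

df = pd.read_csv("zzz.inner.csv")
per_model_cols = [c for c in df.columns
                  if c.endswith("_inner_dvmax") and not c.startswith("solution_")]
model_names = sorted(c[: -len("_inner_dvmax")] for c in per_model_cols)
print("models with per-model convergence columns:", model_names)
```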
Regards
Andreas

@tandreasr tandreasr added the bug label Apr 21, 2024
@tandreasr
Author

One more addition to this issue - actually this is a different topic, also related to the CSV output in parallel.
Something is obviously still wrong with the columns "solution_inner_dvmax_model" and "solution_inner_drmax_model".
In the example CSV below they are uninitialized in the first couple of rows and always 1 afterwards.
Instead they should contain the model number associated with the MPI rank.

[Screenshot: example parallel CSV showing the uninitialized solution_inner_dvmax_model / solution_inner_drmax_model columns]

@mjr-deltares mjr-deltares self-assigned this Apr 23, 2024
@mjr-deltares mjr-deltares added the parallel Parallel capabilities label Apr 23, 2024
@mjr-deltares
Contributor

Hi @tandreasr , thanks for reporting this back. Will look into this to have it fixed.

@mjr-deltares
Contributor

mjr-deltares commented Apr 29, 2024

Hi @tandreasr , I am looking at your issue - maybe you can provide a minimal example? I was a bit confused by:

Instead they only contain the convergence summary for all those models processed by the corresponding MPI rank.
...
Meaning to add those columns to the CSV belonging to the MPI rank for all models processed by that rank)

As for your question about keeping the output the same: so far we have stayed clear of synchronizing over all processes to gather output in a single file where possible, hence the *.p0.csv format. Also, IMS additionally has a solver parameter (alpha/omega) that we do not print when using the parallel PETSc solver. Other than that, we could definitely look into improving the parallel CSV output. I did reproduce the 'uninitialized' issue you mentioned in your second comment. Will have a look into that.
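If you do need a single table, the per-rank files can of course be combined afterwards on the user side, for example (a rough sketch; the glob pattern simply follows the *.p&lt;rank&gt;.csv naming and the base name is a placeholder):

```python
# Sketch: combine the per-rank inner CSVs (one file per MPI rank) into one table.
# The glob pattern assumes the <base>.inner.p<rank>.csv naming of parallel runs;
# "zzz" is a placeholder base name.
import glob
import re
import pandas as pd

frames = []
for path in sorted(glob.glob("zzz.inner.p*.csv")):
    rank = int(re.search(r"\.p(\d+)\.csv$", path).group(1))
    df = pd.read_csv(path)
    df["mpi_rank"] = rank  # remember which process wrote these rows
    frames.append(df)

combined = pd.concat(frames, ignore_index=True, sort=False)
print(combined.groupby("mpi_rank").size())
```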

@mjr-deltares
Contributor

I do think I understand your point now: you are talking about these columns missing from the CSV:

[Screenshot: the per-model convergence columns referred to above]

Will look into that too.

@tandreasr
Author

You are right: those are exactly the columns I'm missing, maybe I was a little unclear :-)
The missing alpha/omega columns aren't a problem for me.
I just need to get a model-wise convergence output in order to better identify problems.

@mjr-deltares
Contributor

Hi @tandreasr , this should be fixed now. Please reopen if not.

@tandreasr
Author

tandreasr commented May 2, 2024

Me again :-)
your fix works as expected - Thank you very much!
Just one small thing still needs fixing:
you've got a small typo in the naming of all residual columns - they should be named *_drmax instead of *_rmax.
That would conform with the way they are named in IMS and with their *_dvmax counterparts.
Regards
Andreas

@tandreasr
Author

tandreasr commented May 3, 2024

I just realized another behaviour which should be corrected - at least in my opinion.
For all MPI ranks that contain ONE MODEL ONLY you don't write those additional model-specific columns.
I know they are not strictly necessary, because in that specific case they are equal to the solution_* columns,
but in order to be able to automate access to those CSV files it would be much easier if those columns were written anyway.
What do you think?

And yet another question, concerning the assignment of models to MPI ranks
(you may open a new thread for that one, if you feel it necessary):
Currently the assignment of models to MPI ranks is done automatically.
Is there any possibility (or do you already plan one) to assign models to specific MPI ranks?
This would, for example, come in quite handy when the number of models and the number of MPI ranks don't match
and / or there are huge imbalances in convergence times between models.
I even seem to remember that a very early prototype of parallel MODFLOW 6 offered this in the simulation NAM file (a rank number behind each model entry).
Regards,
Andreas

@mjr-deltares
Contributor

Me again :-) your fix works as expected - Thank you very much! Just one small thing still needs fixing: you've got a small typo in the naming of all residual columns - they should be named *_drmax instead of *_rmax. That would conform with the way they are named in IMS and with their *_dvmax counterparts. Regards Andreas

Hi @tandreasr , this was actually changed deliberately: these columns show the maximum residual; it is not a delta with respect to the previous iteration. So my guess is that if you rerun the serial version, you will have consistent naming again.

@mjr-deltares
Contributor

About your question with respect to assigning models to ranks: we have a PR coming up any day now that allows an HPC configuration file with that capability.

@mjr-deltares
Contributor

"I just realized another behaviour, which should be corrected - at least in my opinion.
For all MPI ranks which do contain ONE MODEL ONLY you don't write those additional model-specific columns.
I know, that they are not strictly necessary, because in that specific case they are equal to the solution_* columns,
but in order to be able to automate access to those CSV files it would be much easier to write those columns anyway.
What do you think?"

I see what you mean, I will have a look into that.

@tandreasr
Author

Thank you!

@tandreasr
Author

Hi @mjr-deltares,
there still seems to be a bug somewhere.
Please have a look at the first line of this CSV, created with release version 6.5.0.

[Screenshot: first line of the CSV in question]

Here solution_inner_dvmax, solution_inner_dvmax_model and solution_inner_dvmax_node are all 0, which is obviously wrong.
Please don't hesitate to contact me in case you need more info!
Regards
Andreas

@mjr-deltares
Contributor

Hi @tandreasr , thanks again for reporting and sorry for the delay. We had the MODFLOW and More conference the other week and I have been away from my desk for some time.

OK, I see. Would you mind pointing me to or sharing the model and steps to reproduce?

@tandreasr
Author

Hi @mjr-deltares,
I was aware of the MODFLOW and More conference but unfortunately didn't have any time left this year to attend :-)
Here is the model which produces the error:

_usgs Issue.zip

And here is the MPI call which reproduces the behaviour:

"E:\Modflow6\mf6.5.0_win64par\bin\mpiexec.exe" -np 8 "E:\Modflow6\mf6.5.0_win64par\bin\mf6.exe" -P

When called with just 8 MPI ranks, it occurs in both files zzz.inner.p4.csv and zzz.inner.p7.csv.

If you have any further questions, please don't hesitate to contact me!

@mjr-deltares
Contributor

mjr-deltares commented Jun 12, 2024

Hi @tandreasr , what is going on here is that dvmax equals zero for all cells in processes 4 and 7 on that first iteration. There is no change in the solution for those domains, so it is technically not possible to assign a specific cell and model to the change. I could modify this to not write the zeros but leave those fields blank instead. Would that make more sense to you?

@tandreasr
Author

Hi @mjr-deltares,

ok, I see.
Leaving those solution_inner_dvmax_* columns empty doesn't solve my problem.

In order to read the CSV files of ranks that were assigned just one model
(and which therefore do not contain any model-specific columns for dvmax or rmax,
as CSVs for more than one model per MPI rank would),
I need to know which model the CSV belongs to
(in order to be able to further process the CSV information).
For that I read both solution_inner_dvmax_model and solution_inner_rmax_model from the first line after the header,
and if they both point to the same model,
I take it for granted that this is such a case and that it is the only model for this CSV.

So what would work for me is if you leave all other solution_inner_dvmax_* columns 0 for that case,
but at least set the model number in the solution_inner_dvmax_model column correctly.
Could you do that?
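Roughly, the detection logic I use looks like this (just a sketch; the file name is an example):

```python
# Sketch of the heuristic described above: if the first data row reports the same
# model id in solution_inner_dvmax_model and solution_inner_rmax_model, treat the
# CSV as belonging to a rank with exactly one model, identified by that id.
# "zzz.inner.p4.csv" is only an example file name.
import pandas as pd

df = pd.read_csv("zzz.inner.p4.csv")
first = df.iloc[0]
dv_model = int(first["solution_inner_dvmax_model"])
dr_model = int(first["solution_inner_rmax_model"])

if dv_model == dr_model and dv_model != 0:
    model_id = dv_model   # single-model rank: the whole CSV belongs to this model
else:
    model_id = None       # ambiguous, e.g. the all-zero first row discussed above
```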

Best regards
Andreas

@mjr-deltares
Contributor

There are some complications for that case. And what if you had those extra columns in parallel mode that you asked about before? The ones for the single model, duplicating what is in the solution convergence data part.

@mjr-deltares
Contributor

Just curious, can you share what you are working on with parallel MODFLOW?

@tandreasr
Author

The extra columns would be fine as well, but would add a lot of overhead regarding file size.
That's why I thought you wouldn't be so eager to implement them :-)

Regarding your second question:
in our company there are a couple of engineers doing groundwater modelling, and I'm responsible for the software development part,
mostly pre- and postprocessing.
That's why I started evaluating MODFLOW 6 in general 2-3 years ago, and more specifically its parallel capabilities, due to our need to speed up some of our models :-)

@mjr-deltares
Contributor

Thanks for your explanation, still curious to learn what company that would be ;-)

About the CSV, the possibilities I see:

  1. I can put the model id in the table for the dvmax == 0 case anyway, but what if there are multiple models in the local solution that all have dvmax == 0 - which id would we put in?
  2. I include the additional columns when run in parallel, accepting the increase in disk usage and CPU time.
  3. Leave things as they are, and you can get the model id from the print statements in the mfsim.p*.lst file that goes with the particular process.

I don't like option 1, and option 3 would be easiest for me :-), but then the CSVs are not really an interpretable set of files on their own. That's why I am leaning towards option 2, at the cost of some redundancy (see the sketch below).
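To sketch what option 2 would mean on the reading side: every rank CSV would then carry the per-model columns, so a monitoring script could treat single-model and multi-model ranks the same way (the "&lt;modelname&gt;_inner_dvmax" column naming here is an assumption, following the pattern discussed above):

```python
# Sketch for option 2: with the per-model columns always written, every rank CSV
# can be parsed the same way. The "<modelname>_inner_dvmax" column naming is an
# assumption here, following the pattern discussed earlier in this thread.
import pandas as pd

def models_in_rank_csv(path):
    """Return the model names that have per-model convergence columns in this CSV."""
    columns = pd.read_csv(path, nrows=0).columns  # read the header only
    return sorted({c[: -len("_inner_dvmax")] for c in columns
                   if c.endswith("_inner_dvmax") and not c.startswith("solution_")})

print(models_in_rank_csv("zzz.inner.p4.csv"))  # works for one model or many
```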

@mjr-deltares
Contributor

We have also discussed the interpretation of the CSV files for the parallel case in general. I guess we are considering a restructuring to make things more transparent to the user. Just so you are aware of this.

@tandreasr
Author

The company is www.ihu-gmbh.com and it's a German engineering office of about 80 employees :-)

As for your suggestions:
My favorite would be 1.) - in case of multiple models you could just leave 0, because the model info is then already in the
(then additional) column names.
2.) would be okay as well - it was my first suggestion after all.
3.) isn't a good idea, because I'm using that info inside a monitoring tool for the simulation progress,
and accessing additional files during simulation time isn't a good idea for several reasons.

Anyway, just let me know what your decision is :-)

Regards
Andreas

@mjr-deltares
Contributor

Hi @tandreasr , I have implemented option 2. Hopefully this now works for you. Please open a new issue if there are more or other issues coming up with the parallel version.

Cheers, Martijn

@tandreasr
Author

Hi Martijn,
Thank you very much!
Regards
Andreas
