Missing CSV_INNER_OUTPUT columns for single models in parallel #1753

Closed
tandreasr opened this issue Apr 21, 2024 · 25 comments · Fixed by #1766
Comments

@tandreasr

tandreasr commented Apr 21, 2024

Hi again,
while testing the new parallel capabilities I realized that the CSV file(s) created by the CSV_INNER_OUTPUT option in parallel lack all columns related to single models ("MODELNAME"_inner_dvmax etc.).
Instead they only contain the convergence summary for all models processed by the corresponding MPI rank.
Do you see any chance to implement that output the very same way as IMS does within a "foreseeable" future?
(Meaning: add those columns, for all models processed by that rank, to the CSV belonging to that MPI rank.)
I think it is sometimes quite important to get an overview of which models have convergence issues
(not just the worst one listed in solution_inner_d?max_model).
And in any case, keeping the CSV format identical between IMS and PETSc seems a logical choice?
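For illustration, this is roughly how I pick the per-model columns out of a serial IMS inner CSV today (just a sketch; the file name is a placeholder and the column pattern follows the "MODELNAME"_inner_dvmax naming above):

```python
# Just a sketch, not part of MODFLOW 6: list the per-model convergence columns
# in an IMS-style inner CSV. The "<modelname>_inner_dvmax" pattern follows the
# naming mentioned above; "zzz.inner.csv" is only a placeholder file name.
import pandas as pd

df = pd.read_csv("zzz.inner.csv")
per_model_cols = [c for c in df.columns
                  if c.endswith("_inner_dvmax") and not c.startswith("solution_")]
model_names = sorted(c[: -len("_inner_dvmax")] for c in per_model_cols)
print("models with per-model convergence columns:", model_names)
```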
Regards
Andreas

@tandreasr tandreasr added the bug label Apr 21, 2024
@tandreasr
Author

One more addition to this issue - actually this is a different topic, also related to the CSV output in parallel.
Something is obviously still wrong with the columns "solution_inner_dvmax_model" and "solution_inner_drmax_model".
In the example CSV below they are uninitialized in the first couple of rows and always 1 afterwards.
Instead they should contain the model number associated with the MPI rank.

[Screenshot: example parallel CSV showing the uninitialized solution_inner_dvmax_model / solution_inner_drmax_model columns]

@mjr-deltares mjr-deltares self-assigned this Apr 23, 2024
@mjr-deltares mjr-deltares added the parallel Parallel capabilities label Apr 23, 2024
@mjr-deltares
Contributor

Hi @tandreasr , thanks for reporting this back. Will look into this to have it fixed.

@mjr-deltares
Contributor

mjr-deltares commented Apr 29, 2024

Hi @tandreasr , I am looking at your issue - maybe you can provide a minimal example? I was a bit confused by:

Instead they only contain the convergence summary for all those models processed by the corresponding MPI rank.
...
Meaning to add those columns to the CSV belonging to the MPI rank for all models processed by that rank)

As for your question about keeping the output the same: so far we have stayed clear of synchronizing over all processes to gather output in a single file where possible, hence the *.p0.csv format. Also, IMS additionally has a solver parameter (alpha/omega) that we do not print when using the parallel PETSc solver. Other than that, we could definitely look into improving the parallel CSV output. I did reproduce the 'uninitialized' issue you mentioned in your second comment. Will have a look into that.
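If you do need a single table, the per-rank files can of course be combined afterwards on the user side, for example (a rough sketch; the glob pattern simply follows the *.p&lt;rank&gt;.csv naming and the base name is a placeholder):

```python
# Sketch: combine the per-rank inner CSVs (one file per MPI rank) into one table.
# The glob pattern assumes the <base>.inner.p<rank>.csv naming of parallel runs;
# "zzz" is a placeholder base name.
import glob
import re
import pandas as pd

frames = []
for path in sorted(glob.glob("zzz.inner.p*.csv")):
    rank = int(re.search(r"\.p(\d+)\.csv$", path).group(1))
    df = pd.read_csv(path)
    df["mpi_rank"] = rank  # remember which process wrote these rows
    frames.append(df)

combined = pd.concat(frames, ignore_index=True, sort=False)
print(combined.groupby("mpi_rank").size())
```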

@mjr-deltares
Contributor

I do think I understand your point now: you are talking about these columns missing from the CSV:

[Screenshot: the per-model convergence columns referred to above]

Will look into that too.

@tandreasr
Author

You are right: those are exactly the columns I'm missing, maybe I was a little unclear :-)
The missing alpha/omega columns aren't a problem for me.
I just need to get a model-wise convergence output in order to better identify problems.

@mjr-deltares
Contributor

Hi @tandreasr , this should be fixed now. Please reopen if not.

@tandreasr
Author

tandreasr commented May 2, 2024

Me again :-)
your fix works as expected - Thank you very much!
Just one small thing still needs fixing:
you've got a small typo in the naming of all residual columns - they should be named *_drmax instead of *_rmax.
That would conform with the way they are named in IMS and with their *_dvmax counterparts.
Regards
Andreas

@tandreasr
Author

tandreasr commented May 3, 2024

I just realized another behaviour which should be corrected - at least in my opinion.
For all MPI ranks that contain ONE MODEL ONLY you don't write those additional model-specific columns.
I know they are not strictly necessary, because in that specific case they are equal to the solution_* columns,
but in order to be able to automate access to those CSV files it would be much easier if those columns were written anyway.
What do you think?

And yet another question, concerning the assignment of models to MPI ranks
(you may open a new thread for that one, if you feel it necessary):
Currently the assignment of models to MPI ranks is done automatically.
Is there any possibility (or do you already plan one) to assign models to specific MPI ranks?
This would, for example, come in quite handy when the number of models and the number of MPI ranks don't match
and / or there are huge imbalances in convergence times between models.
I even seem to remember that a very early prototype of parallel MODFLOW 6 offered this in the simulation NAM file (a rank number behind each model entry).
Regards,
Andreas

@mjr-deltares
Contributor

Me again :-) your fix works as expected - Thank you very much! Just one small thing still needs fixing: you've got a small typo in the naming of all residual columns - they should be named *_drmax instead of *_rmax. That would conform with the way they are named in IMS and with their *_dvmax counterparts. Regards Andreas

Hi @tandreasr , this was actually changed deliberately: these columns show the maximum residual; it is not a delta with respect to the previous iteration. So my guess is that if you rerun the serial version, you will have consistent naming again.

@mjr-deltares
Contributor

About your question with respect to assigning models to ranks: we have a PR coming up any day now that allows an HPC configuration file with that capability.

@mjr-deltares
Contributor

"I just realized another behaviour, which should be corrected - at least in my opinion.
For all MPI ranks which do contain ONE MODEL ONLY you don't write those additional model-specific columns.
I know, that they are not strictly necessary, because in that specific case they are equal to the solution_* columns,
but in order to be able to automate access to those CSV files it would be much easier to write those columns anyway.
What do you think?"

I see what you mean, I will have a look into that.

@tandreasr
Author

Thank you!

@tandreasr
Author

Hi @mjr-deltares,
there still seems to be a bug somewhere.
Please have a look at the first line of this CSV, created with release version 6.5.0.

[Screenshot: first line of the CSV in question]

Here solution_inner_dvmax, solution_inner_dvmax_model and solution_inner_dvmax_node are all 0, which is obviously wrong.
Please don't hesitate to contact me in case you need more info!
Regards
Andreas

@mjr-deltares
Contributor

Hi @tandreasr , thanks again for reporting and sorry for the delay. We had the MODFLOW and More conference the other week and I have been away from my desk for some time.

OK, I see. Would you mind pointing me to or sharing the model and steps to reproduce?

@tandreasr
Author

Hi @mjr-deltares,
I was aware of the MODFLOW and More conference but unfortunately didn't have any time left this year to attend :-)
Here is the model which produces the error:

_usgs Issue.zip

And here is the MPI call which reproduces the behaviour:

"E:\Modflow6\mf6.5.0_win64par\bin\mpiexec.exe" -np 8 "E:\Modflow6\mf6.5.0_win64par\bin\mf6.exe" -P

When called with just 8 MPI ranks, it occurs in both files zzz.inner.p4.csv and zzz.inner.p7.csv.

If you have any further questions, please don't hesitate to contact me!

@mjr-deltares
Contributor

mjr-deltares commented Jun 12, 2024

Hi @tandreasr , what is going on here is that dvmax equals zero for all cells in processes 4 and 7 on that first iteration. There is no change in the solution for those domains, so it is technically not possible to assign a specific cell and model to the change. I could modify this to not write the zeros but leave those fields blank instead. Would that make more sense to you?

@tandreasr
Author

Hi @mjr-deltares,

ok, I see.
Leaving those solution_inner_dvmax_* columns empty doesn't solve my problem.

In order to read the CSV files of ranks that were assigned just one model
(and which therefore do not contain any model-specific columns for dvmax or rmax,
as CSVs for more than one model per MPI rank would),
I need to know which model the CSV belongs to
(in order to be able to further process the CSV information).
For that I read both solution_inner_dvmax_model and solution_inner_rmax_model from the first line after the header,
and if they both point to the same model,
I take it for granted that this is such a case and that it is the only model for this CSV.

So what would work for me is if you leave all other solution_inner_dvmax_* columns 0 for that case,
but at least set the model number in the solution_inner_dvmax_model column correctly.
Could you do that?
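Roughly, the detection logic I use looks like this (just a sketch; the file name is an example):

```python
# Sketch of the heuristic described above: if the first data row reports the same
# model id in solution_inner_dvmax_model and solution_inner_rmax_model, treat the
# CSV as belonging to a rank with exactly one model, identified by that id.
# "zzz.inner.p4.csv" is only an example file name.
import pandas as pd

df = pd.read_csv("zzz.inner.p4.csv")
first = df.iloc[0]
dv_model = int(first["solution_inner_dvmax_model"])
dr_model = int(first["solution_inner_rmax_model"])

if dv_model == dr_model and dv_model != 0:
    model_id = dv_model   # single-model rank: the whole CSV belongs to this model
else:
    model_id = None       # ambiguous, e.g. the all-zero first row discussed above
```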

Best regards
Andreas

@mjr-deltares
Contributor

There are some complications for that case. And what if you had those extra columns in parallel mode that you asked about before? The ones for the single model, duplicating what is in the solution convergence data part.

@mjr-deltares
Contributor

Just curious, can you share what you are working on with parallel MODFLOW?

@tandreasr
Author

The extra columns would be fine as well, but would add a lot of overhead regarding file size.
That's why I thought you wouldn't be so eager to implement them :-)

Regarding your second question:
in our company there are a couple of engineers doing groundwater modelling, and I'm responsible for the software development part,
mostly pre- and postprocessing.
That's why I started evaluating MODFLOW 6 in general 2-3 years ago, and more specifically its parallel capabilities, due to our need to speed up some of our models :-)

@mjr-deltares
Contributor

Thanks for your explanation, still curious to learn what company that would be ;-)

About the CSV, the possibilities I see:

  1. I can put the model id in the table for the dvmax == 0 case anyway, but what if there are multiple models in the local solution that all have dvmax == 0 - which id would we put in?
  2. I include the additional columns when run in parallel, accepting the increase in disk usage and CPU time.
  3. Leave things as they are, and you can get the model id from the print statements in the mfsim.p*.lst file that goes with the particular process.

I don't like option 1, and option 3 would be easiest for me :-), but then the CSVs are not really an interpretable set of files on their own. That's why I am leaning towards option 2, at the cost of some redundancy (see the sketch below).
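To sketch what option 2 would mean on the reading side: every rank CSV would then carry the per-model columns, so a monitoring script could treat single-model and multi-model ranks the same way (the "&lt;modelname&gt;_inner_dvmax" column naming here is an assumption, following the pattern discussed above):

```python
# Sketch for option 2: with the per-model columns always written, every rank CSV
# can be parsed the same way. The "<modelname>_inner_dvmax" column naming is an
# assumption here, following the pattern discussed earlier in this thread.
import pandas as pd

def models_in_rank_csv(path):
    """Return the model names that have per-model convergence columns in this CSV."""
    columns = pd.read_csv(path, nrows=0).columns  # read the header only
    return sorted({c[: -len("_inner_dvmax")] for c in columns
                   if c.endswith("_inner_dvmax") and not c.startswith("solution_")})

print(models_in_rank_csv("zzz.inner.p4.csv"))  # works for one model or many
```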

@mjr-deltares
Contributor

We have also discussed the interpretation of the CSV files for the parallel case in general. I guess we are considering a restructuring to make things more transparent to the user. Just so you are aware of this.

@tandreasr
Author

The company is www.ihu-gmbh.com and it's a German engineering office of about 80 employees :-)

As for your suggestions:
My favorite would be 1.) - in case of multiple models you could just leave 0, because the model info is then already in the
(then additional) column names.
2.) would be okay as well - it was my first suggestion after all.
3.) isn't a good idea, because I'm using that info inside a monitoring tool for the simulation progress,
and accessing additional files during simulation time isn't a good idea for several reasons.

Anyway, just let me know what your decision is :-)

Regards
Andreas

@mjr-deltares
Contributor

Hi @tandreasr , I have implemented option 2. Hopefully this now works for you. Please open a new issue if there are more or other issues coming up with the parallel version.

Cheers, Martijn

@tandreasr
Author

Hi Martijn,
Thank you very much!
Regards
Andreas
