
Crash in parallel (floating divide by zero) #1734

Closed
tandreasr opened this issue Apr 16, 2024 · 3 comments

@tandreasr

Hi again,

here is another bug in parallel execution, related to the attached test case.

Every parallel execution of:

"...\mf6.5.0.dev2_win64par\bin\mpiexec.exe" -np 2 "...\mf6.5.0.dev2_win64par\bin\mf6.exe" --PARALLEL mfsim.nam

results in a "floating divide by zero" error in stress period 48.
It also crashes with -np 1.
Only the serial call runs flawlessly:

"...\mf6.5.0.dev2_win64par\bin\mf6.exe" mfsim.nam

Tested with the latest nightly build on Windows.
Attachment: Parallel Issue.zip

@tandreasr tandreasr added the bug label Apr 16, 2024
@jdhughes-usgs
Contributor

@tandreasr IMS includes protections against over-solving that are not currently available with PETSc. We are working on implementing similar protections for PETSc; the final PR will likely be merged in the next few days.

@mjr-deltares mjr-deltares self-assigned this Apr 17, 2024
@mjr-deltares
Contributor

PR #1688 went in today. I have tested the new build on your model and it no longer causes a 'divide by zero' floating point exception. However, there appear to be convergence issues (with IMS as well as PETSc) that will need some attention.
A possible strategy from here: first get the model running and converging properly with IMS and check the results; then add the '-p' flag to activate the PETSc solver while still running on a single process and verify that the results are reproduced; finally, scale up to multiple cores using mpiexec, as sketched below.
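A minimal sketch of that sequence, reusing the placeholder paths and the example process count (2) from the original report, and assuming '-p' is the short form of the '--PARALLEL' option used there:

1) Serial run with the IMS solver:
"...\mf6.5.0.dev2_win64par\bin\mf6.exe" mfsim.nam

2) Single process with the PETSc solver enabled:
"...\mf6.5.0.dev2_win64par\bin\mf6.exe" -p mfsim.nam

3) Multiple processes via mpiexec:
"...\mf6.5.0.dev2_win64par\bin\mpiexec.exe" -np 2 "...\mf6.5.0.dev2_win64par\bin\mf6.exe" -p mfsim.nam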

@mjr-deltares mjr-deltares added the parallel Parallel capabilities label Apr 18, 2024
@tandreasr
Author

Hi Martijn,
thank you very much. As for the convergence problems: the model was only meant as a test case for the parallel workflow.
