Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update defaults for distributed multigrid #1612

Merged
merged 7 commits into from
May 28, 2024
Merged

Update defaults for distributed multigrid #1612

merged 7 commits into from
May 28, 2024

Conversation

pratikvn
Copy link
Member

This PR updates the defaults for distributed multigrid for the smoother and the coarse solver. It also adds a test for the distributed pgm preconditioned solver

@pratikvn pratikvn added the 1:ST:WIP This PR is a work in progress. Not ready for review. label May 16, 2024
@pratikvn pratikvn added this to the Ginkgo 1.8.0 milestone May 16, 2024
@pratikvn pratikvn self-assigned this May 16, 2024
@ginkgo-bot ginkgo-bot added mod:core This is related to the core module. type:solver This is related to the solvers labels May 16, 2024
@MarcelKoch MarcelKoch self-requested a review May 17, 2024 09:11
@upsj upsj self-requested a review May 17, 2024 09:11
Copy link
Member

@MarcelKoch MarcelKoch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from the failing CI there still seems to be something missing. The existing changes look good though.

#if GINKGO_BUILD_MPI
if (gko::detail::is_distributed(matrix.get())) {
using absolute_value_type = remove_complex<value_type>;
return solver::Gmres<value_type>::build()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is quite similar/identical to the case for the dpcpp executor, maybe it could be extracted.

@MarcelKoch
Copy link
Member

@pratikvn I will fix the CI issues to push this for the release.

@MarcelKoch MarcelKoch force-pushed the dist-mg-fix branch 2 times, most recently from 15394ed to 1fefb41 Compare May 22, 2024 13:07
@MarcelKoch MarcelKoch self-assigned this May 22, 2024
@MarcelKoch MarcelKoch added 1:ST:ready-for-review This PR is ready for review and removed 1:ST:WIP This PR is a work in progress. Not ready for review. labels May 22, 2024
Comment on lines -183 to +206
MultigridState() : nrhs{0} {}
MultigridState() : nrhs{static_cast<size_type>(-1)} {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be zero?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, because then it will not initialize the cache, if the rhs has 0 columns.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would there be a downside to allow initializing the cache to a nx0 vector?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the distributed solver test uses cases with zero columns and the MG preconditioned CG passes those, so it seems to work as expected.

@MarcelKoch MarcelKoch requested review from MarcelKoch and removed request for MarcelKoch May 23, 2024 06:53
Comment on lines -183 to +206
MultigridState() : nrhs{0} {}
MultigridState() : nrhs{static_cast<size_type>(-1)} {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would there be a downside to allow initializing the cache to a nx0 vector?

@MarcelKoch MarcelKoch added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review labels May 23, 2024
@MarcelKoch
Copy link
Member

One thing to note is that the added tests are EXTREMELY slow. On nla-gpu the preconditioned test are 100x slower than the unpreconditioned tests. But I don't think this PR should fix performance issues, so I will just leave it as it is.

@upsj
Copy link
Member

upsj commented May 23, 2024

slow as in long runtime or as in slow convergence (or both)? In any case, I think we should reconsider whether to merge this PR in this state then

@MarcelKoch
Copy link
Member

slow as in long runtime or as in slow convergence (or both)? In any case, I think we should reconsider whether to merge this PR in this state then

definitely runtime, but I think the number of iterations is also quite bad.

IMO this PR should still continue, since otherwise the user will get segfaults and the like. Even sequentially our mg is not very good, so it wouldn't be much of a difference to that.

@yhmtsai
Copy link
Member

yhmtsai commented May 23, 2024

The slow convergence is expected in the 3pt stencil because the PGM does not handle the equal-weight matrix well currently.
The corasening matrix might almost like the original matrix (n -> n - 10 for example)

@upsj
Copy link
Member

upsj commented May 23, 2024

Alright, then I withdraw my objection ;) that's an issue we should definitely tackle soon though. I think with some deterministic or randomized tie-breaking we should be able to handle that, create at least a 25%-33% reduction per level (haven't thought the worst case through entirely)

@pratikvn
Copy link
Member Author

Given that we are testing that the solver + preconditioner work, I think we should just set the factory parameter with_deterministic(false) for the PGM factory. That should speedup the convergence.

@MarcelKoch
Copy link
Member

Given that we are testing that the solver + preconditioner work, I think we should just set the factory parameter with_deterministic(false) for the PGM factory. That should speedup the convergence.

Can't confirm that this has any effect (on nla-gpu).

@MarcelKoch MarcelKoch force-pushed the dist-mg-fix branch 2 times, most recently from 45a5e93 to 4b1dcfa Compare May 27, 2024 09:36
Copy link

sonarcloud bot commented May 28, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
B Maintainability Rating on New Code (required ≥ A)

See analysis details on SonarCloud

Catch issues before they fail your Quality Gate with our IDE extension SonarLint

Copy link

codecov bot commented May 28, 2024

Codecov Report

Attention: Patch coverage is 65.11628% with 15 lines in your changes are missing coverage. Please review.

Project coverage is 89.97%. Comparing base (5b18510) to head (1fc3d7d).
Report is 52 commits behind head on develop.

Files Patch % Lines
test/matrix/csr_kernels2.cpp 0.00% 11 Missing ⚠️
core/solver/multigrid.cpp 50.00% 3 Missing ⚠️
test/mpi/solver/solver.cpp 95.83% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1612      +/-   ##
===========================================
- Coverage    90.01%   89.97%   -0.04%     
===========================================
  Files          736      752      +16     
  Lines        59831    60445     +614     
===========================================
+ Hits         53857    54387     +530     
- Misses        5974     6058      +84     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@MarcelKoch MarcelKoch merged commit 6abd3f3 into develop May 28, 2024
16 of 18 checks passed
@MarcelKoch MarcelKoch deleted the dist-mg-fix branch May 28, 2024 06:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1:ST:ready-to-merge This PR is ready to merge. mod:core This is related to the core module. type:solver This is related to the solvers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants