
[MFEM example] Wrong result when changing the preconditioner in the MFEM example using Ginkgo #633

Closed
nghiakvnvsd opened this issue Sep 1, 2020 · 19 comments · Fixed by #659
Labels: is:bug · is:confirmed · mod:cuda · reg:external-lib-interfacing · type:preconditioner

Comments

nghiakvnvsd commented Sep 1, 2020

Hi!

When I run ex1 (https://github.com/mfem/mfem/blob/master/examples/ginkgo/ex1.cpp) with the reference executor and the cuda executor, it works fine. But when I change the preconditioner (to the Jacobi preconditioner), it still works with the reference executor, while with the cuda executor the result is wrong: it does not converge.

I think this is a bug. You can try it and see.

Thank you!

nghiakvnvsd (Author) commented Sep 1, 2020

It also occurs when I try to modify example 2 (https://github.com/mfem/mfem/blob/master/examples/ex2.cpp) to use the Ginkgo solver. There it happens with all preconditioners when using the cuda executor. With the reference executor the result is correct, but with the cuda executor it is wrong.

pratikvn (Member) commented Sep 1, 2020

The examples with MFEM need to be modified to use the cuda executor. Unfortunately, they do not work out of the box; we are working on integrating it completely. You can have a look at @nbeams' PR on the MFEM side with this branch.

But you should still be able to use the cuda executor from Ginkgo's side with the following modifications:

diff --git a/examples/ginkgo/ex1.cpp b/examples/ginkgo/ex1.cpp
index 2d9f565..ca8d9cb 100644
--- a/examples/ginkgo/ex1.cpp
+++ b/examples/ginkgo/ex1.cpp
@@ -195,8 +195,8 @@ int main(int argc, char *argv[])
       {
 #ifdef MFEM_USE_GINKGO
          // Solve the linear system with CG + ILU from Ginkgo.
-         std::string executor = "reference";
-         auto exec = gko::ReferenceExecutor::create();
+         std::string executor = "cuda";
+         auto exec = gko::CudaExecutor::create(0, gko::OmpExecutor::create());
          auto ilu_precond =
             gko::preconditioner::Ilu<gko::solver::LowerTrs<>,
             gko::solver::UpperTrs<>, false>::build()

But be aware that this copies the data from the host to the GPU, performs the solve, and copies the results back to the CPU, so it can be slow. For a version without the copies, have a look at the PR from @nbeams.
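To make explicit what that copy means in plain Ginkgo terms, the round trip amounts to roughly the following (a sketch only; `host_mtx`, `host_b`, `host_x`, and `dev_x` are placeholder names, not objects from the example):

    // Rough sketch: explicit host <-> device round trip in Ginkgo.
    auto cuda_exec = gko::CudaExecutor::create(0, gko::OmpExecutor::create());
    // Copy the assembled host-side data to the GPU ...
    auto dev_mtx = gko::clone(cuda_exec, host_mtx);
    auto dev_b   = gko::clone(cuda_exec, host_b);
    auto dev_x   = gko::clone(cuda_exec, host_x);
    // ... generate and apply the solver on cuda_exec (as in the diff above) ...
    // ... and copy the solution back to the host afterwards.
    auto host_result = gko::clone(cuda_exec->get_master(), dev_x);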

@nghiakvnvsd (Author)

> (quoting @pratikvn's reply above)

Thank you for your reply. I mean that I made the same modification as above, but the result is wrong. You can try switching to the Jacobi preconditioner and running ./ex1 -o 3, instead of the ILU preconditioner with ./ex1.

pratikvn (Member) commented Sep 1, 2020

I am sorry but I cannot reproduce it. This seems to work for me:

diff --git a/examples/ginkgo/ex1.cpp b/examples/ginkgo/ex1.cpp
index 2d9f565..5539530 100644
--- a/examples/ginkgo/ex1.cpp
+++ b/examples/ginkgo/ex1.cpp
@@ -195,14 +195,16 @@ int main(int argc, char *argv[])
       {
 #ifdef MFEM_USE_GINKGO
          // Solve the linear system with CG + ILU from Ginkgo.
-         std::string executor = "reference";
-         auto exec = gko::ReferenceExecutor::create();
-         auto ilu_precond =
-            gko::preconditioner::Ilu<gko::solver::LowerTrs<>,
-            gko::solver::UpperTrs<>, false>::build()
-            .on(exec);
-         GinkgoWrappers::CGSolver ginkgo_solver(executor, 1, 2000, 1e-12, 0.0,
-                                                ilu_precond.release() );
+         using bj = gko::preconditioner::Jacobi<double, int>;
+         std::string executor = "cuda";
+         auto exec = gko::CudaExecutor::create(0, gko::OmpExecutor::create());
+         auto bj_precond = bj::build()
+                               .with_max_block_size(16u)
+                               .with_storage_optimization(
+                                   gko::precision_reduction::autodetect())
+                               .on(exec);
+         GinkgoWrappers::CGSolver ginkgo_solver(executor, 1, 2000, 1e-12, 0.0,
+                                                bj_precond.release() );
          ginkgo_solver.solve(&((SparseMatrix&)(*A)), X, B);
 #endif
       }

pratikvn (Member) commented Sep 1, 2020

I see, you mean with --order 3. For that, it seems that the solution does stagnate for cuda, but not for reference. I would suggest using ILU for now, as that seems to converge.

@nghiakvnvsd (Author)

> (quoting @pratikvn's comment above)

Yes, thank you for your support.

pratikvn self-assigned this Sep 1, 2020
pratikvn added the is:bug, is:confirmed, mod:cuda, reg:external-lib-interfacing, and type:preconditioner labels Sep 1, 2020
nghiakvnvsd (Author) commented Sep 2, 2020

Hi @pratikvn. I'm sorry to bother you. I tried to follow the changes from the PR you mentioned, but the MFEM build fails. I use ginkgo-1.2.0. It returns:

linalg/../fem/../mesh/../fem/../linalg/ginkgo.hpp: In constructor ‘mfem::GinkgoWrappers::GinkgoPreconditionerBase::GinkgoPreconditionerBase(std::shared_ptr, mfem::SparseMatrix&, mfem::Array&, bool)’:
linalg/../fem/../mesh/../fem/../linalg/ginkgo.hpp:675:111: warning: narrowing conversion of ‘(& inv_permutation_indices)->mfem::Array::Size()’ from ‘int’ to ‘gko::dim<2>::dimension_type {aka long unsigned int}’ inside { } [-Wnarrowing]
vec_permute_ = gko::matrix::Permutation::create(
^
linalg/../fem/../mesh/../fem/../linalg/ginkgo.hpp:679:115: warning: narrowing conversion of ‘(& inv_permutation_indices)->mfem::Array::Size()’ from ‘int’ to ‘gko::dim<2>::dimension_type {aka long unsigned int}’ inside { } [-Wnarrowing]
vec_inv_permute_ = gko::matrix::Permutation::create(
^
nvcc -O3 -std=c++11 -x=cu --expt-extended-lambda -arch=sm_60 -ccbin g++ -isystem ./../ginkgo/install/include -c linalg/matrix.cpp -o linalg/matrix.o
nvcc -O3 -std=c++11 -x=cu --expt-extended-lambda -arch=sm_60 -ccbin g++ -isystem ./../ginkgo/install/include -c linalg/hypre.cpp -o linalg/hypre.o
nvcc -O3 -std=c++11 -x=cu --expt-extended-lambda -arch=sm_60 -ccbin g++ -isystem ./../ginkgo/install/include -c linalg/operator.cpp -o linalg/operator.o
nvcc -O3 -std=c++11 -x=cu --expt-extended-lambda -arch=sm_60 -ccbin g++ -isystem ./../ginkgo/install/include -c linalg/blockmatrix.cpp -o linalg/blockmatrix.o
nvcc -O3 -std=c++11 -x=cu --expt-extended-lambda -arch=sm_60 -ccbin g++ -isystem ./../ginkgo/install/include -c linalg/sparsemat.cpp -o linalg/sparsemat.o
linalg/ginkgo.hpp: In constructor ‘mfem::GinkgoWrappers::GinkgoPreconditionerBase::GinkgoPreconditionerBase(std::shared_ptr, mfem::SparseMatrix&, mfem::Array&, bool)’:
linalg/ginkgo.hpp:675:111: warning: narrowing conversion of ‘(& inv_permutation_indices)->mfem::Array::Size()’ from ‘int’ to ‘gko::dim<2>::dimension_type {aka long unsigned int}’ inside { } [-Wnarrowing]
vec_permute_ = gko::matrix::Permutation::create(
^
linalg/ginkgo.hpp:679:115: warning: narrowing conversion of ‘(& inv_permutation_indices)->mfem::Array::Size()’ from ‘int’ to ‘gko::dim<2>::dimension_type {aka long unsigned int}’ inside { } [-Wnarrowing]
vec_inv_permute_ = gko::matrix::Permutation::create(
^
nvcc -O3 -std=c++11 -x=cu --expt-extended-lambda -arch=sm_60 -ccbin g++ -isystem ./../ginkgo/install/include -c linalg/handle.cpp -o linalg/handle.o
nvcc -O3 -std=c++11 -x=cu --expt-extended-lambda -arch=sm_60 -ccbin g++ -isystem ./../ginkgo/install/include -c linalg/blockoperator.cpp -o linalg/blockoperator.o
nvcc -O3 -std=c++11 -x=cu --expt-extended-lambda -arch=sm_60 -ccbin g++ -isystem ./../ginkgo/install/include -c linalg/complex_operator.cpp -o linalg/complex_operator.o
nvcc -O3 -std=c++11 -x=cu --expt-extended-lambda -arch=sm_60 -ccbin g++ -isystem ./../ginkgo/install/include -c linalg/blockvector.cpp -o linalg/blockvector.o
nvcc -O3 -std=c++11 -x=cu --expt-extended-lambda -arch=sm_60 -ccbin g++ -isystem ./../ginkgo/install/include -c linalg/petsc.cpp -o linalg/petsc.o
nvcc -O3 -std=c++11 -x=cu --expt-extended-lambda -arch=sm_60 -ccbin g++ -isystem ./../ginkgo/install/include -c linalg/sparsesmoothers.cpp -o linalg/sparsesmoothers.o
nvcc -O3 -std=c++11 -x=cu --expt-extended-lambda -arch=sm_60 -ccbin g++ -isystem ./../ginkgo/install/include -c linalg/ginkgo.cpp -o linalg/ginkgo.o
At end of source: error: expected a "}"

1 error detected in the compilation of "/tmp/tmpxft_00002ad4_00000000-6_ginkgo.cpp1.ii".
makefile:422: recipe for target 'linalg/ginkgo.o' failed
make: *** [linalg/ginkgo.o] Error 1

nghiakvnvsd (Author) commented Sep 2, 2020

It also doesn't work when I build with ginkgo-1.3.0.

pratikvn (Member) commented Sep 2, 2020

@nbeams, maybe you can help out here?

nbeams (Collaborator) commented Sep 2, 2020

Hi @nghiakvnvsd,

Let me make sure I understand exactly what you are testing here.

The first error you reported was with the current master branches of MFEM and Ginkgo, correct? And you saw a problem with the block Jacobi preconditioner for ex1 with CUDA. Did you also try the OpenMP executor? Also, what happens if you set the max_block_size to 1, i.e., making it scalar Jacobi instead of block Jacobi? This should work.
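If it helps, the scalar Jacobi variant only differs in the block size (a sketch reusing the `bj` and `exec` names from the diff further up; this is not code from the examples):

    // Sketch only: max_block_size = 1 turns block Jacobi into scalar
    // (diagonal) Jacobi; `bj` and `exec` are the aliases from the diff above.
    auto scalar_jacobi = bj::build()
                             .with_max_block_size(1u)
                             .on(exec);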

For the issue with ex2, I think I might be of more use if I could see exactly what modifications you made to the existing code. Do you have the code available somewhere?

Finally, the build issue with my branch of MFEM. Did you completely check out the branch, or add its commits on top of the current MFEM master? Also, I should say, I have been using that branch with my own branch of Ginkgo, as well (https://github.com/ginkgo-project/ginkgo/tree/mfem-wrappers). I have not seen that particular build issue before, but it wouldn't necessarily surprise me that there are build issues when not using my branch of Ginkgo. There have been changes to both Ginkgo and MFEM since I last did updates on my branches.

So, I wouldn't actually recommend using my branch of MFEM as it currently stands. I think what @pratikvn meant was to look at the code in the PR for accessing an MFEM SparseMatrix on host or device, without copying, to wrap it as a Ginkgo matrix. E.g., with std::shared_ptr<const gko::Executor> exec and SparseMatrix a:

      // Check whether the executor lives on a device (its master is the host).
      bool on_device = false;
      if (exec->get_master() != exec)
      {
         on_device = true;
      }

      using mtx = gko::matrix::Csr<double, int>;
      // Number of stored nonzeros, taken from the capacity of the data array.
      const int nnz =  a.GetMemoryData().Capacity();
      // Wrap the MFEM CSR arrays as non-owning Ginkgo views (values, column
      // indices, row pointers), so no copy is made.
      auto gko_sparse = mtx::create(
                           exec, gko::dim<2>(a.Height(), a.Width()),
                           gko::Array<double>::view(exec,
                                                    nnz,
                                                    a.ReadWriteData(on_device)),
                           gko::Array<int>::view(exec,
                                                 nnz,
                                                 a.ReadWriteJ(on_device)),
                           gko::Array<int>::view(exec, a.Height() + 1,
                                                 a.ReadWriteI(on_device)));

nghiakvnvsd (Author) commented Sep 3, 2020

Hi @nbeams
The first error I reported was with the MFEM release downloaded from mfem.org and ginkgo-1.2.0.

The modified ex2 I mentioned is just a combination of ex1 (ginkgo) and ex2 (serial).

I have tried to use the mfem-wrappers branch and to merge ginkgo-precond with MFEM master, but it didn't work: when I run example 1 (ginkgo) it reports:

Options used:
   --mesh ../../data/star.mesh
   --order 1
   --no-static-condensation
   --no-partial-assembly
   --device cpu
   --visualization
   --use_gko_solver
Device configuration: cpu
Memory configuration: host-std
Number of finite element unknowns: 20801
Size of linear system: 20801
Segmentation fault (core dumped)

nbeams (Collaborator) commented Sep 3, 2020

Sorry, I should have changed "I wouldn't actually recommend using my branch of MFEM as it currently stands" to "do not use my branch of MFEM as it currently stands." I forgot I had commented out some lines related to building Ginkgo solvers. They were causing conflicts with an update in Ginkgo that was related to fixing a CUDA compiler issue. I wasn't using the Ginkgo solvers in the tests in my branch, just the preconditioners, so it didn't matter to me at the time.

(But thanks for the reminder that I need to check again with the latest version of Ginkgo, and fix the solver integration in MFEM if needed.)

It looks like ex2 isn't configured to work with CUDA on the MFEM side; you would also have to add the device configuration steps. Actually, I take that back: I think it currently builds the matrix on the CPU and copies it to the GPU, so it would expect MFEM to be using the CPU anyway.

@nghiakvnvsd (Author)

> (quoting @nbeams's comment above)

I think I will wait for the next version MFEM 4.2. Thank you very much for your support.

tcojean (Member) commented Sep 3, 2020

Could you post the code of the modified ex2 to use Ginkgo together with it so that we can try to look into it? Thanks!

@nghiakvnvsd (Author)

> (quoting @tcojean's comment above)

Yes. Thank you!!!

https://drive.google.com/drive/folders/1RNyy8UIcndXxzseSU4pO1mpcg_O83o3F?usp=sharing

@nghiakvnvsd (Author)

As you can see, ex2.cpp and ex2-cuda-executor.cpp differ only in the choice of the "cuda" vs. "reference" executor. But I don't understand why ex2 converges while ex2-cuda-executor doesn't.

nbeams (Collaborator) commented Sep 4, 2020

@nghiakvnvsd, I have built your examples ex2.cpp and ex2-cuda-executor.cpp locally with the latest MFEM master branch and Ginkgo 1.3. I will investigate the issue.

nbeams (Collaborator) commented Sep 14, 2020

@nghiakvnvsd: Unfortunately, I do not have an answer for you yet, but I haven't forgotten and I'm still looking into it.

thoasm self-assigned this Oct 2, 2020
thoasm (Member) commented Oct 8, 2020

@nghiakvnvsd Update:
The problem for this specific matrix is the underlying generation of the ILU preconditioner. By default, we use the ParILU factorization to generate the incomplete approximate factors L and U. Combined, L and U have the same sparsity pattern as the original matrix (ILU(0) sparsity); however, L and U are only approximated (they are not computed exactly by default).
The reference implementation of ParILU actually computes the exact ILU(0) rather than the approximation, which is why it works fine there (and OpenMP produces a close enough result).

Here, for CUDA, I would actually recommend using the proper ILU(0) factorization method, which you can do with:

    // Use the exact (non-iterative) ILU(0) factorization ...
    auto fact_factory = gko::share(
        gko::factorization::Ilu<value_type, index_type>::build().on(exec));
    // ... wrap it in an ILU preconditioner factory ...
    auto precond_fact = gko::share(
        gko::preconditioner::Ilu<>::build()
            .with_factorization_factory(fact_factory)
            .on(exec));
    // ... and hand that factory to the CG solver as its preconditioner.
    auto solver_gen =
        gko::solver::Cg<value_type>::build()
            .with_criteria(
                gko::stop::Iteration::build().with_max_iters(2000u).on(exec),
                gko::stop::ResidualNormReduction<value_type>::build()
                    .with_reduction_factor(reduction_factor)
                    .on(exec))
            .with_preconditioner(precond_fact)
            .on(exec);

For OpenMP and Reference, we did not implement this factorization method yet, but ParILU does work fine there.
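In other words, one could select the factorization factory based on the executor, roughly like this (only a sketch; exec, value_type, and index_type are assumed to be defined as above), and then plug the result into the ILU preconditioner factory exactly as in the snippet above:

    // Sketch: exact ILU(0) on CUDA, ParILU on Reference/OpenMP.
    std::shared_ptr<const gko::LinOpFactory> fact_factory;
    if (std::dynamic_pointer_cast<const gko::CudaExecutor>(exec)) {
        fact_factory = gko::share(
            gko::factorization::Ilu<value_type, index_type>::build().on(exec));
    } else {
        fact_factory = gko::share(
            gko::factorization::ParIlu<value_type, index_type>::build().on(exec));
    }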

I am now looking into the Jacobi preconditioner and why it sometimes prevents convergence.
I will post an update on that here as soon as I have something notable.

thoasm pushed a commit that referenced this issue Nov 17, 2020
This PR adds the option to sort the matrix before processing it inside
the Jacobi preconditioner and the ILU factorization.
By default, it is assumed that the input matrix is not sorted;
therefore, a sorting step is performed (unless the factory parameter
`skip_sorting` is set to `true`; see the usage sketch below).

Additionally, this PR adds functionality to unsort a given matrix for
tests (currently only CSR and COO are supported). This functionality
is then used in new tests that unsort the system matrix before
generating/running the CSR and COO matrices, the Jacobi preconditioner,
and the ILU factorization.


Related PR: #659
Closes Issue: #633
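The new `skip_sorting` parameter from that commit would be used roughly as follows (a sketch only; the setter name follows Ginkgo's usual `with_<parameter>` convention and is not taken from this thread):

    // Sketch: skip the sorting step when the CSR matrix is already sorted by
    // column index within each row.
    auto jacobi_factory = gko::preconditioner::Jacobi<double, int>::build()
                              .with_max_block_size(16u)
                              .with_skip_sorting(true)
                              .on(exec);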