
[MFEM example] Wrong result when changing the preconditioner in the MFEM example using Ginkgo #633

Closed
nghiakvnvsd opened this issue Sep 1, 2020 · 19 comments · Fixed by #659
Labels: is:bug · is:confirmed · mod:cuda · reg:external-lib-interfacing · type:preconditioner

Comments

nghiakvnvsd commented Sep 1, 2020

Hi!

When I run ex1 (https://github.com/mfem/mfem/blob/master/examples/ginkgo/ex1.cpp) with the reference executor and the cuda executor, it works fine. But when I change the preconditioner (to the Jacobi preconditioner), it still works with the reference executor, while with the cuda executor the result is wrong: it does not converge.

I think this is a bug. You can try it and see.

Thank you!

nghiakvnvsd (Author) commented Sep 1, 2020

It also occurs when I try to modify example 2 (https://github.com/mfem/mfem/blob/master/examples/ex2.cpp) to use the Ginkgo solver. There it happens with all preconditioners when using the cuda executor. With the reference executor the result is correct, but with the cuda executor it is wrong.

pratikvn (Member) commented Sep 1, 2020

The examples with MFEM need to be modified to use the cuda executor. Unfortunately, they do not work out of the box; we are working on integrating it completely. You can have a look at @nbeams' PR on the MFEM side with this branch.

But you should still be able to use the cuda executor from Ginkgo's side with the following modifications:

diff --git a/examples/ginkgo/ex1.cpp b/examples/ginkgo/ex1.cpp
index 2d9f565..ca8d9cb 100644
--- a/examples/ginkgo/ex1.cpp
+++ b/examples/ginkgo/ex1.cpp
@@ -195,8 +195,8 @@ int main(int argc, char *argv[])
       {
 #ifdef MFEM_USE_GINKGO
          // Solve the linear system with CG + ILU from Ginkgo.
-         std::string executor = "reference";
-         auto exec = gko::ReferenceExecutor::create();
+         std::string executor = "cuda";
+         auto exec = gko::CudaExecutor::create(0, gko::OmpExecutor::create());
          auto ilu_precond =
             gko::preconditioner::Ilu<gko::solver::LowerTrs<>,
             gko::solver::UpperTrs<>, false>::build()

But be aware that this copies the data from the host to the GPU, performs the solve, and copies the results back to the CPU, so it can be slow. For a version without the copies, have a look at the PR from @nbeams.
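To make explicit what that copy means in plain Ginkgo terms, the round trip amounts to roughly the following (a sketch only; `host_mtx`, `host_b`, `host_x`, and `dev_x` are placeholder names, not objects from the example):

    // Rough sketch: explicit host <-> device round trip in Ginkgo.
    auto cuda_exec = gko::CudaExecutor::create(0, gko::OmpExecutor::create());
    // Copy the assembled host-side data to the GPU ...
    auto dev_mtx = gko::clone(cuda_exec, host_mtx);
    auto dev_b   = gko::clone(cuda_exec, host_b);
    auto dev_x   = gko::clone(cuda_exec, host_x);
    // ... generate and apply the solver on cuda_exec (as in the diff above) ...
    // ... and copy the solution back to the host afterwards.
    auto host_result = gko::clone(cuda_exec->get_master(), dev_x);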

@nghiakvnvsd (Author)

> (quoting @pratikvn's reply above)

Thank you for your reply. I mean that I made the same modification as above, but the result is wrong. You can try switching to the Jacobi preconditioner and running ./ex1 -o 3, instead of the ILU preconditioner with ./ex1.

pratikvn (Member) commented Sep 1, 2020

I am sorry but I cannot reproduce it. This seems to work for me:

diff --git a/examples/ginkgo/ex1.cpp b/examples/ginkgo/ex1.cpp
index 2d9f565..5539530 100644
--- a/examples/ginkgo/ex1.cpp
+++ b/examples/ginkgo/ex1.cpp
@@ -195,14 +195,16 @@ int main(int argc, char *argv[])
       {
 #ifdef MFEM_USE_GINKGO
          // Solve the linear system with CG + ILU from Ginkgo.
-         std::string executor = "reference";
-         auto exec = gko::ReferenceExecutor::create();
-         auto ilu_precond =
-            gko::preconditioner::Ilu<gko::solver::LowerTrs<>,
-            gko::solver::UpperTrs<>, false>::build()
-            .on(exec);
-         GinkgoWrappers::CGSolver ginkgo_solver(executor, 1, 2000, 1e-12, 0.0,
-                                                ilu_precond.release() );
+         using bj = gko::preconditioner::Jacobi<double, int>;
+         std::string executor = "cuda";
+         auto exec = gko::CudaExecutor::create(0, gko::OmpExecutor::create());
+         auto bj_precond = bj::build()
+                               .with_max_block_size(16u)
+                               .with_storage_optimization(
+                                   gko::precision_reduction::autodetect())
+                               .on(exec);
+         GinkgoWrappers::CGSolver ginkgo_solver(executor, 1, 2000, 1e-12, 0.0,
+                                                bj_precond.release() );
          ginkgo_solver.solve(&((SparseMatrix&)(*A)), X, B);
 #endif
       }

pratikvn (Member) commented Sep 1, 2020

I see, you mean with --order 3. For that, it seems that the solution does stagnate for cuda, but not for reference. I would suggest using ILU for now, as that seems to converge.

@nghiakvnvsd (Author)

> (quoting @pratikvn's comment above)

Yes, thank you for your support.

pratikvn self-assigned this Sep 1, 2020
pratikvn added the is:bug, is:confirmed, mod:cuda, reg:external-lib-interfacing, and type:preconditioner labels Sep 1, 2020
nghiakvnvsd (Author) commented Sep 2, 2020

Hi @pratikvn. I'm sorry to bother you. I tried to follow the changes from the PR you mentioned, but the MFEM build fails. I use ginkgo-1.2.0. It returns:

linalg/../fem/../mesh/../fem/../linalg/ginkgo.hpp: In constructor ‘mfem::GinkgoWrappers::GinkgoPreconditionerBase::GinkgoPreconditionerBase(std::shared_ptr, mfem::SparseMatrix&, mfem::Array&, bool)’:
linalg/../fem/../mesh/../fem/../linalg/ginkgo.hpp:675:111: warning: narrowing conversion of ‘(& inv_permutation_indices)->mfem::Array::Size()’ from ‘int’ to ‘gko::dim<2>::dimension_type {aka long unsigned int}’ inside { } [-Wnarrowing]
vec_permute_ = gko::matrix::Permutation::create(
^
linalg/../fem/../mesh/../fem/../linalg/ginkgo.hpp:679:115: warning: narrowing conversion of ‘(& inv_permutation_indices)->mfem::Array::Size()’ from ‘int’ to ‘gko::dim<2>::dimension_type {aka long unsigned int}’ inside { } [-Wnarrowing]
vec_inv_permute_ = gko::matrix::Permutation::create(
^
nvcc -O3 -std=c++11 -x=cu --expt-extended-lambda -arch=sm_60 -ccbin g++ -isystem ./../ginkgo/install/include -c linalg/matrix.cpp -o linalg/matrix.o
nvcc -O3 -std=c++11 -x=cu --expt-extended-lambda -arch=sm_60 -ccbin g++ -isystem ./../ginkgo/install/include -c linalg/hypre.cpp -o linalg/hypre.o
nvcc -O3 -std=c++11 -x=cu --expt-extended-lambda -arch=sm_60 -ccbin g++ -isystem ./../ginkgo/install/include -c linalg/operator.cpp -o linalg/operator.o
nvcc -O3 -std=c++11 -x=cu --expt-extended-lambda -arch=sm_60 -ccbin g++ -isystem ./../ginkgo/install/include -c linalg/blockmatrix.cpp -o linalg/blockmatrix.o
nvcc -O3 -std=c++11 -x=cu --expt-extended-lambda -arch=sm_60 -ccbin g++ -isystem ./../ginkgo/install/include -c linalg/sparsemat.cpp -o linalg/sparsemat.o
linalg/ginkgo.hpp: In constructor ‘mfem::GinkgoWrappers::GinkgoPreconditionerBase::GinkgoPreconditionerBase(std::shared_ptr, mfem::SparseMatrix&, mfem::Array&, bool)’:
linalg/ginkgo.hpp:675:111: warning: narrowing conversion of ‘(& inv_permutation_indices)->mfem::Array::Size()’ from ‘int’ to ‘gko::dim<2>::dimension_type {aka long unsigned int}’ inside { } [-Wnarrowing]
vec_permute_ = gko::matrix::Permutation::create(
^
linalg/ginkgo.hpp:679:115: warning: narrowing conversion of ‘(& inv_permutation_indices)->mfem::Array::Size()’ from ‘int’ to ‘gko::dim<2>::dimension_type {aka long unsigned int}’ inside { } [-Wnarrowing]
vec_inv_permute_ = gko::matrix::Permutation::create(
^
nvcc -O3 -std=c++11 -x=cu --expt-extended-lambda -arch=sm_60 -ccbin g++ -isystem ./../ginkgo/install/include -c linalg/handle.cpp -o linalg/handle.o
nvcc -O3 -std=c++11 -x=cu --expt-extended-lambda -arch=sm_60 -ccbin g++ -isystem ./../ginkgo/install/include -c linalg/blockoperator.cpp -o linalg/blockoperator.o
nvcc -O3 -std=c++11 -x=cu --expt-extended-lambda -arch=sm_60 -ccbin g++ -isystem ./../ginkgo/install/include -c linalg/complex_operator.cpp -o linalg/complex_operator.o
nvcc -O3 -std=c++11 -x=cu --expt-extended-lambda -arch=sm_60 -ccbin g++ -isystem ./../ginkgo/install/include -c linalg/blockvector.cpp -o linalg/blockvector.o
nvcc -O3 -std=c++11 -x=cu --expt-extended-lambda -arch=sm_60 -ccbin g++ -isystem ./../ginkgo/install/include -c linalg/petsc.cpp -o linalg/petsc.o
nvcc -O3 -std=c++11 -x=cu --expt-extended-lambda -arch=sm_60 -ccbin g++ -isystem ./../ginkgo/install/include -c linalg/sparsesmoothers.cpp -o linalg/sparsesmoothers.o
nvcc -O3 -std=c++11 -x=cu --expt-extended-lambda -arch=sm_60 -ccbin g++ -isystem ./../ginkgo/install/include -c linalg/ginkgo.cpp -o linalg/ginkgo.o
At end of source: error: expected a "}"

1 error detected in the compilation of "/tmp/tmpxft_00002ad4_00000000-6_ginkgo.cpp1.ii".
makefile:422: recipe for target 'linalg/ginkgo.o' failed
make: *** [linalg/ginkgo.o] Error 1

nghiakvnvsd (Author) commented Sep 2, 2020

It also doesn't work when I build with ginkgo-1.3.0.

pratikvn (Member) commented Sep 2, 2020

@nbeams, maybe you can help out here?

nbeams (Collaborator) commented Sep 2, 2020

Hi @nghiakvnvsd,

Let me make sure I understand exactly what you are testing here.

The first error you reported was with the current master branches of MFEM and Ginkgo, correct? And you saw a problem with the block Jacobi preconditioner for ex1 with CUDA. Did you also try the OpenMP executor? Also, what happens if you set the max_block_size to 1, i.e., making it scalar Jacobi instead of block Jacobi? This should work.
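If it helps, the scalar Jacobi variant only differs in the block size (a sketch reusing the `bj` and `exec` names from the diff further up; this is not code from the examples):

    // Sketch only: max_block_size = 1 turns block Jacobi into scalar
    // (diagonal) Jacobi; `bj` and `exec` are the aliases from the diff above.
    auto scalar_jacobi = bj::build()
                             .with_max_block_size(1u)
                             .on(exec);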

For the issue with ex2, I think I might be of more use if I could see exactly what modifications you made to the existing code. Do you have the code available somewhere?

Finally, the build issue with my branch of MFEM. Did you completely check out the branch, or add its commits on top of the current MFEM master? Also, I should say, I have been using that branch with my own branch of Ginkgo, as well (https://github.com/ginkgo-project/ginkgo/tree/mfem-wrappers). I have not seen that particular build issue before, but it wouldn't necessarily surprise me that there are build issues when not using my branch of Ginkgo. There have been changes to both Ginkgo and MFEM since I last did updates on my branches.

So, I wouldn't actually recommend using my branch of MFEM as it currently stands. I think what @pratikvn meant was to look at the code in the PR for accessing an MFEM SparseMatrix on host or device, without copying, to wrap it as a Ginkgo matrix. E.g., with std::shared_ptr<const gko::Executor> exec and SparseMatrix a:

      // Check whether the executor lives on a device (its master is the host).
      bool on_device = false;
      if (exec->get_master() != exec)
      {
         on_device = true;
      }

      using mtx = gko::matrix::Csr<double, int>;
      // Number of stored nonzeros, taken from the capacity of the data array.
      const int nnz =  a.GetMemoryData().Capacity();
      // Wrap the MFEM CSR arrays as non-owning Ginkgo views (values, column
      // indices, row pointers), so no copy is made.
      auto gko_sparse = mtx::create(
                           exec, gko::dim<2>(a.Height(), a.Width()),
                           gko::Array<double>::view(exec,
                                                    nnz,
                                                    a.ReadWriteData(on_device)),
                           gko::Array<int>::view(exec,
                                                 nnz,
                                                 a.ReadWriteJ(on_device)),
                           gko::Array<int>::view(exec, a.Height() + 1,
                                                 a.ReadWriteI(on_device)));

nghiakvnvsd (Author) commented Sep 3, 2020

Hi @nbeams
The first error I reported was with the MFEM release downloaded from mfem.org and ginkgo-1.2.0.

The modified ex2 I mentioned is just a combination of ex1 (ginkgo) and ex2 (serial).

I have tried to use the mfem-wrappers branch and to merge ginkgo-precond with MFEM master, but it didn't work: when I run example 1 (ginkgo) it reports:

Options used:
   --mesh ../../data/star.mesh
   --order 1
   --no-static-condensation
   --no-partial-assembly
   --device cpu
   --visualization
   --use_gko_solver
Device configuration: cpu
Memory configuration: host-std
Number of finite element unknowns: 20801
Size of linear system: 20801
Segmentation fault (core dumped)

nbeams (Collaborator) commented Sep 3, 2020

Sorry, I should have changed "I wouldn't actually recommend using my branch of MFEM as it currently stands" to "do not use my branch of MFEM as it currently stands." I forgot I had commented out some lines related to building Ginkgo solvers. They were causing conflicts with an update in Ginkgo that was related to fixing a CUDA compiler issue. I wasn't using the Ginkgo solvers in the tests in my branch, just the preconditioners, so it didn't matter to me at the time.

(But thanks for the reminder that I need to check again with the latest version of Ginkgo, and fix the solver integration in MFEM if needed.)

It looks like ex2 isn't configured to work with CUDA on the MFEM side; you would also have to add the device configuration steps. Actually, I take that back: I think it currently builds the matrix on the CPU and copies it to the GPU, so it would expect MFEM to be using the CPU anyway.

@nghiakvnvsd (Author)

> (quoting @nbeams's comment above)

I think I will wait for the next version MFEM 4.2. Thank you very much for your support.

tcojean (Member) commented Sep 3, 2020

Could you post the code of the modified ex2 to use Ginkgo together with it so that we can try to look into it? Thanks!

@nghiakvnvsd (Author)

> (quoting @tcojean's comment above)

Yes. Thank you!!!

https://drive.google.com/drive/folders/1RNyy8UIcndXxzseSU4pO1mpcg_O83o3F?usp=sharing

@nghiakvnvsd (Author)

As you can see, ex2.cpp and ex2-cuda-executor.cpp differ only in the choice of the "cuda" vs. "reference" executor. But I don't understand why ex2 converges while ex2-cuda-executor doesn't.

nbeams (Collaborator) commented Sep 4, 2020

@nghiakvnvsd, I have built your examples ex2.cpp and ex2-cuda-executor.cpp locally with the latest MFEM master branch and Ginkgo 1.3. I will investigate the issue.

nbeams (Collaborator) commented Sep 14, 2020

@nghiakvnvsd: Unfortunately, I do not have an answer for you yet, but I haven't forgotten and I'm still looking into it.

thoasm self-assigned this Oct 2, 2020
thoasm (Member) commented Oct 8, 2020

@nghiakvnvsd Update:
The problem for this specific matrix is the underlying generation of the ILU preconditioner. By default, we use the ParILU factorization to generate the incomplete approximate factors L and U. Combined, L and U have the same sparsity pattern as the original matrix (ILU(0) sparsity); however, L and U are only approximated (they are not computed exactly by default).
The reference implementation of ParILU actually computes the exact ILU(0) rather than the approximation, which is why it works fine there (and OpenMP produces a close enough result).

Here, for CUDA, I would actually recommend using the proper ILU(0) factorization method, which you can do with:

    // Use the exact (non-iterative) ILU(0) factorization ...
    auto fact_factory = gko::share(
        gko::factorization::Ilu<value_type, index_type>::build().on(exec));
    // ... wrap it in an ILU preconditioner factory ...
    auto precond_fact = gko::share(
        gko::preconditioner::Ilu<>::build()
            .with_factorization_factory(fact_factory)
            .on(exec));
    // ... and hand that factory to the CG solver as its preconditioner.
    auto solver_gen =
        gko::solver::Cg<value_type>::build()
            .with_criteria(
                gko::stop::Iteration::build().with_max_iters(2000u).on(exec),
                gko::stop::ResidualNormReduction<value_type>::build()
                    .with_reduction_factor(reduction_factor)
                    .on(exec))
            .with_preconditioner(precond_fact)
            .on(exec);

For OpenMP and Reference, we did not implement this factorization method yet, but ParILU does work fine there.
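In other words, one could select the factorization factory based on the executor, roughly like this (only a sketch; exec, value_type, and index_type are assumed to be defined as above), and then plug the result into the ILU preconditioner factory exactly as in the snippet above:

    // Sketch: exact ILU(0) on CUDA, ParILU on Reference/OpenMP.
    std::shared_ptr<const gko::LinOpFactory> fact_factory;
    if (std::dynamic_pointer_cast<const gko::CudaExecutor>(exec)) {
        fact_factory = gko::share(
            gko::factorization::Ilu<value_type, index_type>::build().on(exec));
    } else {
        fact_factory = gko::share(
            gko::factorization::ParIlu<value_type, index_type>::build().on(exec));
    }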

I am now looking into the Jacobi preconditioner and why it sometimes prevents convergence.
I will post an update on that here as soon as I have something notable.

thoasm pushed a commit that referenced this issue Nov 17, 2020
This PR adds the option to sort the matrix before processing it inside
the Jacobi preconditioner and the ILU factorization.
By default, it is assumed that the input matrix is not sorted;
therefore, a sorting step is performed (unless the factory parameter
`skip_sorting` is set to `true`; see the usage sketch below).

Additionally, this PR adds functionality to unsort a given matrix for
tests (currently only CSR and COO are supported). This functionality
is then used in new tests that unsort the system matrix before
generating/running the CSR and COO matrices, the Jacobi preconditioner,
and the ILU factorization.


Related PR: #659
Closes Issue: #633
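The new `skip_sorting` parameter from that commit would be used roughly as follows (a sketch only; the setter name follows Ginkgo's usual `with_<parameter>` convention and is not taken from this thread):

    // Sketch: skip the sorting step when the CSR matrix is already sorted by
    // column index within each row.
    auto jacobi_factory = gko::preconditioner::Jacobi<double, int>::build()
                              .with_max_block_size(16u)
                              .with_skip_sorting(true)
                              .on(exec);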