
Extend BLAS interface #3173

Merged: 16 commits from feature/dkachuma/extend-blas-interface into develop on Jun 28, 2024
Conversation

@dkachuma dkachuma commented Jun 17, 2024

Extends the BLAS/LAPACK interface to include linear solves where the solution and rhs are matrices. The addition is a function to solve
$$AX=B$$
where $A$ is an $n{\times}n$ matrix and $X$ and $B$ are $n{\times}m$ matrices.

This also adds an "in-place" version of solveLinearSystem, which takes only the matrix $A$ and the rhs $B$. In using it, the caller allows LAPACK to overwrite both $A$ and $B$: on exit $A$ is replaced by an LU factorisation of $A$ and $B$ is replaced by the solution (in true LAPACK style). This has the advantage of not requiring any extra memory allocation within the call.

For the row-major version, I couldn't figure out how to call BLAS/LAPACK directly, so I was forced to allocate space anyway for the transposition of the solution matrix $X$. Two exceptions to this are: 1) $m=1$, the vector case, where row-major vs. col-major is meaningless, and 2) $m=n$, where $X$ can be transposed in place.
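For readers less familiar with the LAPACK convention described above, here is a minimal, self-contained sketch of the in-place semantics using the raw Fortran routine `dgesv_` (this is an illustration of the underlying behaviour only, not the GEOS wrapper added by the PR, which operates on LvArray views): on return, `A` holds the LU factors and `B` holds the solution $X$.

```cpp
// Sketch of the LAPACK "in-place" convention referred to above (illustration only,
// not the GEOS interface): dgesv overwrites A with its LU factors and B with X.
#include <cstdio>

// Trailing underscore follows the common Fortran name-mangling convention;
// the exact symbol name may vary by platform.
extern "C" void dgesv_( int const * n, int const * nrhs, double * A, int const * lda,
                        int * ipiv, double * B, int const * ldb, int * info );

int main()
{
  int const n = 2, nrhs = 2;        // A is n x n; B and X are n x nrhs
  double A[] = { 4.0, 1.0,          // column-major: first column ...
                 2.0, 3.0 };        // ... then second column
  double B[] = { 6.0, 4.0,          // first right-hand side
                 8.0, 7.0 };        // second right-hand side
  int ipiv[ 2 ];
  int info = 0;

  dgesv_( &n, &nrhs, A, &n, ipiv, B, &n, &info );  // on exit: A -> LU, B -> X

  if( info == 0 )
  {
    std::printf( "X = [ %g %g ; %g %g ]\n", B[ 0 ], B[ 2 ], B[ 1 ], B[ 3 ] );  // 1 1 ; 1 2
  }
  return info;
}
```

LAPACK expects column-major storage here, which is exactly why the row-major path discussed below needs the extra transposition.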

@dkachuma dkachuma self-assigned this Jun 18, 2024
@dkachuma dkachuma marked this pull request as ready for review June 18, 2024 20:13

codecov bot commented Jun 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 55.89%. Comparing base (f05008b) to head (376b61d).

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3173      +/-   ##
===========================================
+ Coverage    55.77%   55.89%   +0.11%     
===========================================
  Files         1035     1037       +2     
  Lines        87914    88148     +234     
===========================================
+ Hits         49035    49271     +236     
+ Misses       38879    38877       -2     


@victorapm victorapm (Contributor) left a comment


Looks good! Just some minor comments... Also: should we put a warning somewhere that this is not supported on GPUs (apparently)?

> For the row-major version, I couldn't figure out how to call BLAS/LAPACK directly, so I was forced to allocate space anyway for the transposition of the solution matrix $X$.

It seems that lapacke follows this same approach and there's nothing much we can do, see:

https://github.com/Reference-LAPACK/lapack/blob/8b468db25c0c5a25d8e0020c7e2134e14cfd33d0/LAPACKE/src/lapacke_dgetrf_work.c
https://github.com/Reference-LAPACK/lapack/blob/8b468db25c0c5a25d8e0020c7e2134e14cfd33d0/LAPACKE/utils/lapacke_dge_trans.c

// This might require an extra allocation
real64 * solutionData = X.dataIfContiguous();
array2d< real64, MatrixLayout::COL_MAJOR_PERM > X0;
if constexpr ( USD == MatrixLayout::ROW_MAJOR )
Contributor commented on the diff above:

Does it make sense to create a helper function for this operation (and also save some lines later when we transpose the data back)?

@dkachuma (Contributor Author) replied:

Great idea! Added some helpers.
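As an illustration only, a framework-agnostic sketch of what such transpose helpers could look like (plain pointers rather than LvArray views; the names are hypothetical, not necessarily those added in the PR):

```cpp
// Hypothetical helpers: pack an n x m row-major matrix into a column-major scratch
// buffer before the LAPACK call, and unpack the solution back afterwards.

// Row-major src (n rows, m cols) -> column-major dst.
void copyRowMajorToColMajor( double const * src, double * dst, int const n, int const m )
{
  for( int i = 0; i < n; ++i )
    for( int j = 0; j < m; ++j )
      dst[ j * n + i ] = src[ i * m + j ];
}

// Column-major src -> row-major dst.
void copyColMajorToRowMajor( double const * src, double * dst, int const n, int const m )
{
  for( int i = 0; i < n; ++i )
    for( int j = 0; j < m; ++j )
      dst[ i * m + j ] = src[ j * n + i ];
}
```

Around the solve, the first helper fills the column-major scratch buffer and the second copies the computed solution back into the caller's row-major $X$.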

@dkachuma (Contributor Author) commented:

> It seems that lapacke follows this same approach and there's nothing much we can do, see:

I actually looked at lapacke to get some inspiration and was gravely disappointed to see that they just copy. I also considered in-place transposes but they are so complicated and quite unreadable.
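For context on why only the square case is handled in place in the PR: when $m=n$ the transpose reduces to a simple swap loop (a minimal sketch below, assuming row-major contiguous storage), while a general rectangular in-place transpose needs cycle-following permutations, which is what makes it hard to read.

```cpp
// In-place transpose for the square (m = n) case only: swap each element
// with its mirror across the diagonal. Rectangular matrices cannot be
// transposed this simply without extra bookkeeping.
void transposeSquareInPlace( double * X, int const n )
{
  for( int i = 0; i < n; ++i )
  {
    for( int j = i + 1; j < n; ++j )
    {
      double const tmp = X[ i * n + j ];
      X[ i * n + j ]   = X[ j * n + i ];
      X[ j * n + i ]   = tmp;
    }
  }
}
```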

@victorapm (Contributor) replied:

Yes, avoiding the copy would be great... Hmm, if this is to be called inside a FEM kernel or, generally speaking, called multiple times, I guess we could allocate the extra workspace only once, outside the solveLinearSystem function. At least we avoid the constant alloc/dealloc this way. Just sharing the idea...

@dkachuma dkachuma added the ci: run CUDA builds label Jun 26, 2024
@dkachuma (Contributor Author) replied:

> Yes, avoiding the copy would be great... Hmm, if this is to be called inside a FEM kernel or, generally speaking, called multiple times, I guess we could allocate the extra workspace only once, outside the solveLinearSystem function. At least we avoid the constant alloc/dealloc this way. Just sharing the idea...

I wasn't sure what kind of threaded code this is called from and didn't want to introduce something that might complicate that, but I'm sure it can be done. Anyhow, the current uses of this are only 1D, so we shouldn't be hitting this issue yet.
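Purely to illustrate the idea being floated here (nothing like this exists in the PR), a caller-owned workspace could be sized once outside the hot loop and reused across solves; the struct and overload names below are hypothetical:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical caller-owned scratch space for repeated solves; sized once, reused.
struct LinearSolveWorkspace
{
  std::vector< double > colMajorScratch;   // transposition buffer, n * m entries
  std::vector< int >    pivots;            // LAPACK pivot indices, n entries

  void resize( int const n, int const m )
  {
    colMajorScratch.resize( static_cast< std::size_t >( n ) * m );
    pivots.resize( n );
  }
};

// Usage idea (hypothetical overload; allocation-free inside the loop):
//   LinearSolveWorkspace work;
//   work.resize( n, m );
//   for( ... each element/kernel iteration ... )
//     solveLinearSystem( A, B, X, work );
```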

@dkachuma dkachuma mentioned this pull request Jun 27, 2024
@dkachuma dkachuma added the flag: no rebaseline label Jun 27, 2024
@CusiniM CusiniM (Collaborator) left a comment

Unfortunately, I am afraid that doing that copy is the only solution.

@CusiniM CusiniM added the ci: run integrated tests label and removed the flag: ready for review label Jun 28, 2024
@CusiniM CusiniM merged commit 399ec9d into develop Jun 28, 2024
26 checks passed
@CusiniM CusiniM deleted the feature/dkachuma/extend-blas-interface branch June 28, 2024 16:45
Algiane pushed a commit that referenced this pull request Jul 30, 2024