Accelerating Iteratively Linear Detectors in Multi-User (ELAA-)MIMO Systems with UW-SVD

Jiuyu Liu, Yi Ma, Jinfei Wang, and Rahim Tafazolli

Jiuyu Liu, Yi Ma (corresponding author), Jinfei Wang, and Rahim Tafazolli are with the 5GIC and 6GIC, Institute for Communication Systems (ICS), University of Surrey, Guildford, United Kingdom, GU2 7XH, e-mails: (jiuyu.liu, y.ma, jinfei.wang, r.tafazolli)@surrey.ac.uk. This work was partially supported by the UK Department for Science, Innovation and Technology under the Future Open Networks Research Challenge project TUDOR (Towards Ubiquitous 3D Open Resilient Network). This work has been partially presented in SPAWC’2023, Shanghai [1].

Abstract

Current iterative multiple-input multiple-output (MIMO) detectors suffer from slow convergence when the wireless channel is ill-conditioned. The ill-conditioning is mainly caused by spatial correlation between channel columns corresponding to the same user equipment, known as intra-user interference. In addition, in the emerging MIMO systems using an extremely large aperture array (ELAA), spatial non-stationarity can make the channel even more ill-conditioned. In this paper, user-wise singular value decomposition (UW-SVD) is proposed to accelerate the convergence of iterative MIMO detectors. Its basic principle is to perform SVD on each user’s sub-channel matrix to eliminate intra-user interference. Then, the MIMO signal model is effectively transformed into an equivalent signal (e-signal) model, comprising an e-channel matrix and an e-signal vector. Existing iterative algorithms can be used to recover the e-signal vector, which undergoes post-processing to obtain the signal vector. It is proven that the e-channel matrix is better conditioned than the original MIMO channel for spatially correlated (ELAA-)MIMO channels. This implies that UW-SVD can accelerate current iterative algorithms, which is confirmed by our simulation results. Specifically, it can speed up convergence by up to 10 times in both uncoded and coded systems.

Index Terms:
Linear MIMO detectors, extremely large aperture array (ELAA), user-wise singular value decomposition (UW-SVD), channel ill-conditioning, fast convergence.

I Introduction

The primary focus of this paper is low-complexity signal detection for multi-user multiple-input multiple-output (MIMO) systems, particularly those deployed with extremely large aperture arrays (ELAA). ELAA-MIMO systems can increase spectral efficiency by more than tenfold over current massive-MIMO systems [2]. This is because users are typically located in the near field of the ELAA, and near-field channels can provide higher spatial resolution than far-field massive-MIMO channels [3, 4, 5]. For instance, under strong line-of-sight (LoS) conditions, ELAA-MIMO can support multiple data streams from a user equipment (UE) equipped with multiple antennas, while massive-MIMO channels can only support a single data stream per UE [6, 7]. In massive-MIMO systems, the wireless channel can become ill-conditioned due to high spatial correlation among channel columns [8]. ELAA channels, however, can be even more ill-conditioned due to both channel spatial correlation and non-stationarity [9]. This makes the design of low-complexity MIMO detectors challenging, particularly for iterative algorithms with square-order complexity [10].

The maximum-likelihood (ML) detector, while achieving the optimal detection performance, is computationally impractical due to its exponentially growing complexity [11]. Linear detectors, such as zero-forcing (ZF) and linear minimum mean square error (LMMSE), offer a more computationally efficient alternative, providing near-optimal detection performance [12]. However, they both require a Gram matrix inverse with cubic-order complexity, which limits their applications in large-scale MIMO systems [13]. Instead, iterative algorithms achieve ZF/LMMSE detection performance with square-order complexity, bypassing the matrix inverse. Conventional algorithms with simple structures, such as Richardson iteration (RI) [14] and Neumann series [15], can offer fast convergence for well-conditioned channel matrices. However, they may diverge when applied to ill-conditioned channel matrices [13]. This challenge motivates more advanced algorithms that aim to achieve fast convergence for such channel matrices.

I-A Relevant Prior Art

Current iterative algorithms can be classified into three main categories [16, 17, 18]: 1) gradient methods, 2) belief propagation, and 3) matrix-splitting (MS) based methods.

Gradient methods can achieve global convergence in solving the problem of linear MIMO detection [19]. For instance, the steepest descent (SD) method updates in the same direction as RI, but converges faster than RI because it optimizes the step size in each iteration [20]; the conjugate gradient (CG) method leverages the Hermitian nature of the Gram channel matrix to determine a more efficient update direction, which further accelerates convergence compared to SD [21, 22]. In addition, quasi-Newton (QN) methods represent an important branch of gradient methods, such as symmetric rank 1 (SR1) [23] and Broyden-Fletcher-Goldfarb-Shanno (BFGS) [19]. Due to the iterative approximation of the Gram matrix inversion, QN methods typically exhibit cubic-order complexity. Recently, the application of limited-memory BFGS (L-BFGS) to linear MIMO detection has demonstrated its ability to achieve convergence equivalent to that of BFGS while requiring only square-order complexity [24].

Belief propagation refers to iterative message passing (MP) algorithms. Among these algorithms, approximate MP (AMP) was initially proposed for compressive sensing and has been applied to MIMO detection in recent years [25, 26]. The complexity of AMP is comparable to that of L-BFGS, and both algorithms exhibit similar convergence rates in massive-MIMO systems. However, AMP diverges in ELAA-MIMO systems due to the channel non-stationarity [27]. AMP variants such as orthogonal AMP (OAMP) and vector AMP (VAMP) [28, 29] have been proposed to address this problem, and their detection performance is even slightly better than that of LMMSE [30]. However, they both introduce computational overhead due to the requirement for a matrix inverse or singular value decomposition (SVD), resulting in cubic-order complexity.

In MS-based methods, the Gram channel matrix is divided into the sum of several individually invertible sub-matrices, the inverses of which are used to accelerate the convergence [17]. Typically, they divide the Gram matrix into its diagonal part and its upper and lower triangular parts. MS-based methods include Jacobi iteration (JI) [31], the Gauss-Seidel (GS) method [32], and symmetric successive over-relaxation (SSOR) [33]. Specifically, the inverses of the diagonal and lower triangular matrices are used for the JI and GS methods, respectively. Furthermore, SSOR converges faster than the JI and GS methods because it uses the inverses of both the upper and lower triangular matrices. The triangular matrix inverse has square-order complexity, making it scalable for large-scale MIMO systems [17].

I-B Motivation of This Paper

Based on recent theoretical advancements [34, 35, 36, 37, 38] and empirical field measurements [39, 40, 41, 42], it has been observed that intra-user interference is much stronger than inter-user interference in ELAA-MIMO systems. This phenomenon also holds true for conventional massive-MIMO systems when spatial correlations are taken into account (see [8] and our discussion in Section IV-B for more details). However, current iterative algorithms typically use a generalized approach to tackle the problem of channel ill-conditioning, disregarding the distinctive features of multi-user MIMO channels. As a result, when spatial correlation is considered, current algorithms still require tens of iterations to converge in (ELAA-)MIMO systems [1]. This motivates the rest of this paper.

I-C Contributions of This Paper

In this paper, we propose to utilize user-wise SVD (UW-SVD) to accelerate the convergence of current iterative algorithms in multi-user (ELAA-)MIMO systems. The concept of UW-SVD is to perform SVD on the sub-channel matrix corresponding to each UE¹, thereby eliminating the intra-user interference. The MIMO signal model can then be transformed to an equivalent signal (e-signal) model containing an e-channel matrix and a corresponding e-signal vector. The major differences between current iterative algorithms and UW-SVD-assisted iterative algorithms are illustrated in Fig. 1. It can be observed that current algorithms applied to the MIMO signal model converge directly to the estimate of the transmitted signal. In contrast, UW-SVD-based algorithms first converge to an estimate of the e-signal vector and then convert it back to the transmitted signal through a post-processing step. An e-signal model-based ZF detector, termed e-ZF, was developed in our previous work [1]. It is proven that, after post-processing, the e-ZF detector provides an estimate equivalent to that of the ZF detector.

¹ The authors are aware that this option is used in some prior works, e.g., [43, 44, 45], but they all focus on optimizing power allocation strategies using the singular value matrices. In contrast, our study focuses on the left unitary matrices to accelerate the convergence of current iterative algorithms.

Figure 1: Illustration of the major differences between the current iterative algorithms and UW-SVD-assisted iterative algorithms.

In addition to [1], an LMMSE detector for the e-signal model, termed e-LMMSE, is developed in this paper. Also, it is proven to provide equivalent detection performance to the LMMSE detector. Furthermore, it is demonstrated that the e-channel matrix exhibits a lower condition number compared to the original channel matrix, particularly in ELAA-MIMO systems. Considering an ELAA-MIMO system under LoS conditions as an example, when the spatial correlation of small-scale fading is not accounted for, the condition number of the MIMO channel matrix is approximately 60, while the condition number of the e-channel matrix is significantly lower at approximately 5. Moreover, when the spatial correlation is considered, the condition number of the MIMO channel matrix increases substantially to approximately 700. However, even in this scenario, the condition number of the e-channel matrix remains significantly lower at approximately 7. A lower condition number indicates that a matrix is less sensitive to perturbations, which means that iterative algorithms can converge to the correct solution more quickly. Therefore, the proposed UW-SVD can significantly accelerate the convergence of current iterative algorithms to achieve ZF/LMMSE performance. This is evident in our computer simulations. For example, the UW-SVD-assisted SSOR converges ten times faster than SSOR in both uncoded and coded ELAA-MIMO systems. Finally, it is worth noting that UW-SVD can also speed up the convergence in conventional massive-MIMO channels when the spatial correlation is taken into account.

I-D Organization and Notations

The rest of this paper is organized as follows. Section II presents the system model, preliminaries, and problem statement. Section III describes the principle of UW-SVD and its application in accelerating the convergence of current iterative algorithms. Section IV presents the convergence analysis. Section V presents the numerical and simulation results. Finally, the conclusion is presented in Section VI.

Notations

Regular letters, lower-case bold letters, and capital bold letters represent scalars, vectors, and matrices, respectively. The notations $[\cdot]^{H}$, $[\cdot]^{-1}$, $\|\cdot\|$, $\mathbb{E}\{\cdot\}$, and $\mathrm{cond}(\cdot)$ represent the Hermitian transpose, inverse, Euclidean norm, expectation, and condition number of a matrix (a vector or a scalar if appropriate), respectively. $\mathbb{D}(\cdot)$ and $\mathbb{L}(\cdot)$ denote matrices formed by the diagonal and lower-triangular parts of a matrix, respectively. $\lambda_{\text{max}}(\cdot)$ and $\lambda_{\text{min}}(\cdot)$ denote the maximum and minimum eigenvalues of a matrix. $\mathrm{diag}(\cdot)$ arranges the input matrices in block-diagonal form. $\mathbf{I}$ and $\mathbf{0}$ denote identity and zero matrices with compatible dimensions.

II System Model, Preliminaries, and Problem Statement

This section begins by introducing the system model. Next, it presents linear MIMO detectors and low-complexity iterative algorithms. Finally, it discusses the challenges these algorithms face in achieving ZF/LMMSE detection performance when the channel matrix is ill-conditioned.

II-A System Model

Let $M$ and $N$ denote the number of service antennas and user antennas, respectively. For ELAA-MIMO and massive-MIMO systems, their signal models share the same mathematical form and can be expressed as follows

$$\mathbf{y}=\mathbf{H}\mathbf{x}+\mathbf{z}, \tag{1}$$

where $\mathbf{y}\in\mathbb{C}^{M\times 1}$ denotes the received signal vector, $\mathbf{x}\in\mathbb{C}^{N\times 1}$ the transmitted signal vector, $\mathbf{z}\sim\mathcal{CN}(0,\sigma_{z}^{2}\mathbf{I})$ the additive white Gaussian noise (AWGN), $\sigma_{z}^{2}$ the noise variance, and $\mathbf{I}$ represents an identity matrix with compatible dimensions. Each element of $\mathbf{x}$ is drawn from a finite alphabet-set with equal probability and fulfills: $\mathbb{E}\{\mathbf{x}\}=\mathbf{0}$ and $\mathbb{E}\{\mathbf{x}\mathbf{x}^{H}\}=\sigma_{x}^{2}\mathbf{I}$. Note that the random channel matrix $\mathbf{H}\in\mathbb{C}^{M\times N}$ has different distributions in massive-MIMO and ELAA-MIMO systems. In Section V-A, we consider four random distributions of $\mathbf{H}$ for computer simulations.

In ELAA-MIMO and conventional massive-MIMO systems, the performance can be significantly degraded by spatial correlation between user antennas. This spatial correlation leads to two types of interference: 1) intra-user interference, and 2) inter-user interference. Intra-user interference occurs when signals transmitted from different antenna elements to the same user interfere with each other due to spatial correlation. Conversely, inter-user interference is caused by signals intended for other users. Typically, intra-user interference is much stronger than inter-user interference. The reason for this is that the distance between antennas serving the same user is usually less than the distance between antennas serving different users. Section IV-B provides a mathematical justification for this phenomenon. Consequently, the primary objective of this work is to mitigate the intra-user interference for both ELAA-MIMO and massive-MIMO systems.

II-B Preliminaries

The two most classical linear MIMO detectors are ZF and LMMSE, which can be expressed as follows [17]

$$\widehat{\mathbf{x}}=\mathbf{A}^{-1}\mathbf{b}, \tag{2}$$

where $\mathbf{b}=\mathbf{H}^{H}\mathbf{y}$ represents the matched filter vector, $\widehat{\mathbf{x}}$ the estimate of $\mathbf{x}$, and $\mathbf{A}$ is a Gram filter matrix. For ZF and LMMSE detectors, $\mathbf{A}$ can be expressed as follows

$$\mathbf{A}=\left\{\begin{array}{l}\mathbf{A}_{\textsc{zf}}\triangleq\mathbf{H}^{H}\mathbf{H};\\ \mathbf{A}_{\textsc{lmmse}}\triangleq\mathbf{H}^{H}\mathbf{H}+\rho^{-1}\mathbf{I},\end{array}\right. \tag{3}$$

where $\rho=\sigma_{x}^{2}/\sigma_{z}^{2}$ denotes the signal-to-noise ratio (SNR). However, both ZF and LMMSE detectors require the inverse calculation of $\mathbf{A}$, which is computationally prohibitive for large MIMO sizes.
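As a minimal sketch of (1)-(3), the snippet below draws one channel realization and computes the ZF and LMMSE estimates directly. The i.i.d. Rayleigh channel, QPSK alphabet, array sizes, and noise level are illustrative assumptions and are not the simulation settings of this paper; the helper `crandn` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 64, 8                      # service and user antennas (illustrative sizes)
sigma_x2, sigma_z2 = 1.0, 0.1     # assumed signal power and noise variance
rho = sigma_x2 / sigma_z2         # SNR as defined after (3)

def crandn(*shape):
    """Unit-variance circularly symmetric complex Gaussian samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H = crandn(M, N)                  # assumed i.i.d. Rayleigh channel
# QPSK symbols with unit average power (assumed alphabet)
x = (rng.choice([-1.0, 1.0], N) + 1j * rng.choice([-1.0, 1.0], N)) / np.sqrt(2)
y = H @ x + np.sqrt(sigma_z2) * crandn(M)   # signal model (1)

b = H.conj().T @ y                # matched filter vector
A_zf = H.conj().T @ H             # Gram matrix for ZF in (3)
A_lmmse = A_zf + np.eye(N) / rho  # Gram matrix for LMMSE in (3)

# Solve A x_hat = b rather than forming the inverse in (2) explicitly
x_zf = np.linalg.solve(A_zf, b)
x_lmmse = np.linalg.solve(A_lmmse, b)
```

Even with a linear solver in place of the explicit inverse, the cost of this direct approach remains cubic in $N$, which is what the iterative algorithms below avoid.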

A number of iterative algorithms have been proposed to efficiently solve the problem in (2) without inverting $\mathbf{A}$ [16, 17, 18]. Their general form can be expressed as follows

$$\mathbf{x}_{t+1}=f(\mathbf{x}_{t};\mathbf{A},\mathbf{b}), \tag{4}$$

where $\mathbf{x}_{t}$ represents the $t^{th}$ estimate of $\mathbf{x}$ and $f(\cdot)$ is a linear function that varies depending on the specific algorithm employed. Taking RI as an example, $f_{\textsc{ri}}(\cdot)$ is given by [14]

$$f_{\textsc{ri}}(\mathbf{x}_{t};\mathbf{A},\mathbf{b})=\mathbf{x}_{t}+(\mathbf{b}-\mathbf{A}\mathbf{x}_{t}), \tag{5}$$

which has a simple structure, but it diverges when $\mathbf{A}$ is ill-conditioned [12].
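The behaviour of (5) can be illustrated numerically. The sketch below relies on the standard fact that the unscaled iteration converges only if the spectral radius of $\mathbf{I}-\mathbf{A}$ is below one. The two test matrices are synthetic and built from prescribed eigenvalues through the hypothetical helper `hermitian_with_eigs`; they are not channel Gram matrices from this paper.

```python
import numpy as np

def richardson(A, b, num_iters):
    """Richardson iteration (5): x_{t+1} = x_t + (b - A x_t), started from zero."""
    x = np.zeros_like(b)
    for _ in range(num_iters):
        x = x + (b - A @ x)
    return x

def hermitian_with_eigs(eigs, rng):
    """Hypothetical helper: Hermitian matrix with the prescribed eigenvalues."""
    n = len(eigs)
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    return Q @ np.diag(eigs) @ Q.conj().T

rng = np.random.default_rng(1)
n = 6
b = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Eigenvalues in [0.5, 1.5]: spectral radius of (I - A) is 0.5, so RI converges
A_good = hermitian_with_eigs(np.linspace(0.5, 1.5, n), rng)
# Eigenvalues in [0.5, 2.5]: spectral radius of (I - A) is 1.5, so RI diverges
A_bad = hermitian_with_eigs(np.linspace(0.5, 2.5, n), rng)

err_good = np.linalg.norm(richardson(A_good, b, 50) - np.linalg.solve(A_good, b))
err_bad = np.linalg.norm(richardson(A_bad, b, 50) - np.linalg.solve(A_bad, b))
```

Here `err_good` shrinks toward machine precision, while `err_bad` grows with the iteration count.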

To address this issue, more advanced algorithms have been proposed to achieve faster convergence. For instance, the iterative process of MS-based methods is given by [1]

$$f_{\textsc{ms}}(\mathbf{x}_{t};\mathbf{A},\mathbf{b})=\mathbf{x}_{t}+\mathbf{M}^{-1}(\mathbf{b}-\mathbf{A}\mathbf{x}_{t}), \tag{6}$$

where $\mathbf{M}$ represents the preconditioning matrix, and it is constructed based on the following matrix splitting

$$\mathbf{A}=\mathbb{L}(\mathbf{A})+\mathbb{L}(\mathbf{A})^{H}-\mathbb{D}(\mathbf{A}), \tag{7}$$

where $\mathbb{L}(\cdot)$ and $\mathbb{D}(\cdot)$ represent matrices formed by the lower triangular and diagonal parts of the input matrix, respectively. Since $\mathbf{A}_{\textsc{zf}}$ and $\mathbf{A}_{\textsc{lmmse}}$ are both Hermitian matrices, $\mathbb{L}(\mathbf{A})^{H}$ in (7) actually represents the upper triangular part of $\mathbf{A}$. For JI, GS and SSOR methods, the preconditioning matrices are defined as follows: $\mathbf{M}_{\textsc{ji}}=\mathbb{D}(\mathbf{A})$, $\mathbf{M}_{\textsc{gs}}=\mathbb{L}(\mathbf{A})$ [32], and $\mathbf{M}_{\textsc{ssor}}=\mathbb{L}(\mathbf{A})\mathbb{D}(\mathbf{A})^{-1}\mathbb{L}(\mathbf{A})^{H}$ [33], respectively. MS-based methods generally have faster convergence than RI due to the use of $\mathbf{M}$, which enables a more efficient update direction. In addition, the inversion of triangular matrices exhibits square-order complexity, making it computationally efficient.
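The three preconditioners can be compared with a short sketch of (6) and (7). The i.i.d. channel and its size are illustrative assumptions, `ms_solve` is a hypothetical function name, and `np.linalg.solve` stands in for the forward/backward substitution that a practical implementation would use to reach square-order cost.

```python
import numpy as np

def ms_solve(A, b, method, num_iters=40):
    """Iterate x_{t+1} = x_t + M^{-1}(b - A x_t) as in (6), with M chosen by `method`."""
    d = np.diag(A).real           # diagonal of D(A); real because A is Hermitian
    L = np.tril(A)                # L(A), diagonal included so that (7) holds
    x = np.zeros_like(b)
    for _ in range(num_iters):
        r = b - A @ x             # current residual
        if method == "ji":        # M = D(A)
            step = r / d
        elif method == "gs":      # M = L(A)
            step = np.linalg.solve(L, r)
        elif method == "ssor":    # M = L(A) D(A)^{-1} L(A)^H
            step = np.linalg.solve(L.conj().T, d * np.linalg.solve(L, r))
        else:
            raise ValueError(f"unknown method: {method}")
        x = x + step
    return x

rng = np.random.default_rng(2)
M_ant, N = 256, 8                 # assumed sizes; M_ant avoids a clash with M in (6)
H = (rng.standard_normal((M_ant, N)) + 1j * rng.standard_normal((M_ant, N))) / np.sqrt(2)
y = rng.standard_normal(M_ant) + 1j * rng.standard_normal(M_ant)
A, b = H.conj().T @ H, H.conj().T @ y

x_ref = np.linalg.solve(A, b)     # direct ZF solution used as the reference
errors = {m: np.linalg.norm(ms_solve(A, b, m) - x_ref) / np.linalg.norm(x_ref)
          for m in ("ji", "gs", "ssor")}
```

For this well-conditioned example all three methods reach the direct solution; the ill-conditioned case discussed next is where their convergence slows down.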

Furthermore, gradient methods can also provide accelerated convergence over RI by using adaptive optimization of the step size, update direction, or both. To avoid redundancy within this paper, a detailed discussion of gradient methods will be presented in Section III-D.

II-C Problem Statement

The convergence rate of the iterative algorithms described in (4) is significantly influenced by the condition number of $\mathbf{A}$ [17]. Specifically, a given iterative function converges faster when the condition number is smaller. In this paper, the condition number is defined as follows

$$\mathrm{cond}(\mathbf{A})\triangleq\dfrac{\lambda_{\text{max}}(\mathbf{A})}{\lambda_{\text{min}}(\mathbf{A})}, \tag{8}$$

where $\lambda_{\text{max}}(\cdot)$ and $\lambda_{\text{min}}(\cdot)$ represent the maximum and minimum eigenvalues of the input matrix, respectively. In massive-MIMO systems without spatial correlation, current algorithms can offer fast convergence, since $\mathbf{A}$ is well-conditioned. However, they all demonstrate slow convergence in ELAA-MIMO systems, particularly in scenarios dominated by LoS links [1]. The reason is that ELAA channel matrices could be very ill-conditioned, meaning $\mathrm{cond}(\mathbf{A})\gg 1$ [46]. As discussed in Section I-B, the main reason contributing to (ELAA-)MIMO channel ill-conditioning is the strong intra-user interference. Therefore, the objective of this paper is to efficiently eliminate the impact of intra-user interference on the iterative process, which motivates the following sections.
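The definition in (8) is straightforward to evaluate for a Hermitian Gram matrix. The following sketch, under an assumed i.i.d. channel and assumed SNR, also shows that the regularization term in $\mathbf{A}_{\textsc{lmmse}}$ can only lower the condition number relative to $\mathbf{A}_{\textsc{zf}}$; `cond_hermitian` is a hypothetical helper name.

```python
import numpy as np

def cond_hermitian(A):
    """Condition number as in (8) for a Hermitian positive-definite matrix."""
    eigs = np.linalg.eigvalsh(A)  # real eigenvalues in ascending order
    return eigs[-1] / eigs[0]

rng = np.random.default_rng(3)
M, N, rho = 64, 8, 10.0           # assumed sizes and SNR
H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)

A_zf = H.conj().T @ H
A_lmmse = A_zf + np.eye(N) / rho

cond_zf = cond_hermitian(A_zf)
cond_lmmse = cond_hermitian(A_lmmse)   # shifted eigenvalues give a smaller ratio
```

Since the eigenvalues of $\mathbf{H}^{H}\mathbf{H}$ are the squared singular values of $\mathbf{H}$, `cond_zf` equals the square of the 2-norm condition number of the channel matrix itself.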

III UW-SVD-Assisted Iterative Algorithms

This section introduces the concept of UW-SVD and its role in transforming the MIMO signal model into an e-signal model. Subsequently, the derivations of e-ZF and e-LMMSE detectors for the e-signal model are presented. Furthermore, existing iterative algorithms are employed to estimate the e-signal vector, which is then transformed back to the estimate of $\mathbf{x}$ through the post-processing step.

III-A The Concept of UW-SVD

Suppose there are $K$ UEs deployed in the MIMO system, and the $k^{th}$ UE is equipped with $N_{k}$ antennas. The system configuration satisfies: $\sum_{k=1}^{K}N_{k}=N$. Then, the complete channel matrix can be represented in a concatenated format as follows

$$\mathbf{H}=[\mathbf{H}_{1},...,\mathbf{H}_{K}], \tag{9}$$

where $\mathbf{H}_{k}\in\mathbb{C}^{M\times N_{k}}$ represents the sub-channel matrix corresponding to the $k^{th}$ UE. To eliminate intra-user interference, we apply the economy-size SVD² to each user’s sub-channel matrix as follows

$$\mathbf{H}_{k}=\mathbf{U}_{k}\mathbf{\Sigma}_{k}\mathbf{V}^{H}_{k}, \tag{10}$$

where $\mathbf{U}_{k}\in\mathbb{C}^{M\times N_{k}}$ represents the left unitary matrix, $\mathbf{\Sigma}_{k}\in\mathbb{R}^{N_{k}\times N_{k}}$ the diagonal matrix containing the singular values, and $\mathbf{V}_{k}\in\mathbb{C}^{N_{k}\times N_{k}}$ represents the right unitary matrix. This step is the so-called UW-SVD¹.

² A variant of SVD that computes only the necessary components for tall matrices, enhancing computational efficiency [47].

Substituting (10) into (9) with some tidy-up work, $\mathbf{H}$ can be decomposed into the product of the following three matrices

$$\mathbf{H}=\mathbf{\Psi}\mathbf{\Sigma}\mathbf{V}^{H}, \tag{11}$$

where $\mathbf{\Psi}\triangleq[\mathbf{U}_{1},...,\mathbf{U}_{K}]$ represents the concatenation of $\mathbf{U}_{k}$. $\mathbf{\Sigma}\triangleq\mathrm{diag}(\mathbf{\Sigma}_{1},\dots,\mathbf{\Sigma}_{K})$ and $\mathbf{V}\triangleq\mathrm{diag}(\mathbf{V}_{1},\dots,\mathbf{V}_{K})$ are block diagonal matrices containing the singular value and right-unitary matrices, respectively. The notation $\mathrm{diag}(\cdot)$ represents the construction of the input matrices in a block diagonal manner. It is obvious that $\mathbf{\Sigma}$ is a positive-real diagonal matrix and $\mathbf{V}$ is a unitary matrix, i.e.,

$$\mathbf{V}^{H}\mathbf{V}=\mathbf{V}\mathbf{V}^{H}=\mathbf{I}. \tag{12}$$

However, it is worth noting that $\mathbf{\Psi}$ is not a unitary matrix in practical MIMO systems. This is because the left-unitary matrices are tall matrices, and $\mathbf{U}_{k}$ for different UEs may not necessarily be orthogonal to each other. Next, we will explore the transformation from the MIMO signal model to the e-signal model using UW-SVD.
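The decomposition in (9)-(12) can be sketched in a few lines. The function name `uw_svd`, the antenna configuration, and the i.i.d. channel are assumptions made for illustration only; `numpy.linalg.svd` with `full_matrices=False` provides the economy-size SVD.

```python
import numpy as np

def uw_svd(H, antennas_per_ue):
    """User-wise economy-size SVD: returns Psi, the singular values, and block-diagonal V."""
    N = H.shape[1]
    U_blocks, s_blocks = [], []
    V = np.zeros((N, N), dtype=complex)
    col = 0
    for Nk in antennas_per_ue:
        # Economy-size SVD (10) of the k-th user's sub-channel matrix
        Uk, sk, Vkh = np.linalg.svd(H[:, col:col + Nk], full_matrices=False)
        U_blocks.append(Uk)
        s_blocks.append(sk)
        V[col:col + Nk, col:col + Nk] = Vkh.conj().T
        col += Nk
    return np.hstack(U_blocks), np.concatenate(s_blocks), V

rng = np.random.default_rng(4)
M, antennas_per_ue = 64, [2, 2, 4]        # assumed configuration, N = 8
N = sum(antennas_per_ue)
H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)

Psi, sigma, V = uw_svd(H, antennas_per_ue)
recon_err = np.linalg.norm(H - Psi @ np.diag(sigma) @ V.conj().T)   # checks (11)
gram = Psi.conj().T @ Psi                 # identity diagonal blocks, non-zero elsewhere
```

The reconstruction error is at the level of floating-point round-off, $\mathbf{V}$ satisfies (12), and the off-diagonal blocks of `gram` are non-zero, which is why $\mathbf{\Psi}$ is not unitary.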

III-B The e-Signal Model

According to the UW-SVD in (11), the MIMO signal model in (1) can be transformed into an e-signal model as follows

$$\mathbf{y}=\mathbf{\Psi}\mathbf{s}+\mathbf{z}, \tag{13}$$

where $\mathbf{s}\triangleq\mathbf{\Sigma}\mathbf{V}^{H}\mathbf{x}$ represents the e-signal vector, and $\mathbf{\Psi}$ represents the e-channel matrix. Therefore, $\mathbf{\Psi}$ and $\mathbf{s}$ are the linear representations of $\mathbf{H}$ and $\mathbf{x}$, respectively.

Remark 1

With this e-signal model, it is important to understand the properties of $\mathbf{s}$ and $\mathbf{\Psi}$. Taking $\mathbf{s}$ as an example, its expectation and covariance can be expressed as follows

$$\begin{aligned}\mathbb{E}\{\mathbf{s}\}&=\mathbf{0}; &\quad(14)\\ \mathbb{E}\{\mathbf{s}\mathbf{s}^{H}\}&=\sigma_{x}^{2}\mathbf{\Sigma}^{2}, &\quad(15)\end{aligned}$$

which can be easily obtained from $\mathbb{E}\{\mathbf{x}\}=\mathbf{0}$ and $\mathbb{E}\{\mathbf{x}\mathbf{x}^{H}\}=\sigma_{x}^{2}\mathbf{I}$. This indicates that distinct e-signal data streams are orthogonal to each other and they exhibit different transmission powers. Moreover, we have the following

$$\mathbb{D}\big(\mathbf{\Psi}^{H}\mathbf{\Psi}\big)=\mathbf{I}, \tag{16}$$

which indicates that the complexity of computing $\mathbb{D}\big(\mathbf{\Psi}^{H}\mathbf{\Psi}\big)$ can be ignored in certain iterative algorithms, such as JI and L-BFGS methods.

Note that the condition number of $\mathbf{\Psi}$ is crucial for this paper because it significantly affects the convergence of the UW-SVD-assisted algorithms. This property will be examined in Section IV, where we present a comprehensive convergence analysis. Before that, we focus on the development of linear detectors for the e-signal model in the next subsection.
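The properties in Remark 1 can be checked numerically. The following is a minimal NumPy sketch, assuming that the UW-SVD in (11) factors each user's sub-channel as $\mathbf{H}_{k}=\mathbf{U}_{k}\mathbf{\Sigma}_{k}\mathbf{V}_{k}^{H}$, so that $\mathbf{\Psi}$ stacks the $\mathbf{U}_{k}$ column-wise while $\mathbf{\Sigma}$ and $\mathbf{V}$ are block diagonal. The function name `uw_svd`, the dimensions, and the random channel are illustrative assumptions rather than part of the original text.

```python
import numpy as np

def uw_svd(H, n_ue):
    """User-wise SVD: one thin SVD per block of n_ue columns (one user)."""
    M, N = H.shape
    Psi = np.zeros((M, N), dtype=complex)   # e-channel: stacked left singular vectors
    sigma = np.zeros(N)                     # diagonal of Sigma
    V = np.zeros((N, N), dtype=complex)     # block-diagonal right singular vectors
    for k in range(0, N, n_ue):
        blk = slice(k, k + n_ue)
        U_k, s_k, Vh_k = np.linalg.svd(H[:, blk], full_matrices=False)
        Psi[:, blk], sigma[blk], V[blk, blk] = U_k, s_k, Vh_k.conj().T
    return Psi, sigma, V

rng = np.random.default_rng(0)
M, N, n_ue = 64, 8, 2                       # hypothetical sizes: K = 4 users
H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2 * M)
Psi, sigma, V = uw_svd(H, n_ue)

# H = Psi Sigma V^H, so y = H x + z equals y = Psi s + z with s = Sigma V^H x.
H_rebuilt = Psi @ np.diag(sigma) @ V.conj().T
# Property (16): the diagonal of Psi^H Psi is all ones.
gram_diag = np.real(np.diag(Psi.conj().T @ Psi))
# Property (15) with sigma_x^2 = 1: Sigma V^H V Sigma reduces to Sigma^2.
cov_s = np.diag(sigma) @ V.conj().T @ V @ np.diag(sigma)
```

Under these assumptions, `H_rebuilt` reproduces `H`, `gram_diag` is a vector of ones, and `cov_s` is the diagonal matrix $\mathbf{\Sigma}^{2}$.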

III-C The e-ZF and e-LMMSE Detectors

In this subsection, we develop the e-ZF and e-LMMSE detectors for the e-signal model. Additionally, it is demonstrated that they can achieve the same detection performance as the corresponding ZF or LMMSE detector after a low-complexity post-processing step.

Given that the e-signal model is a linear representation of the MIMO signal model, its two linear detectors (i.e., e-ZF and e-LMMSE) can be expressed in the following general form

$$\widehat{\mathbf{s}}=\mathbf{\Phi}^{-1}\boldsymbol{\delta}, \tag{17}$$

where $\boldsymbol{\delta}=\mathbf{\Psi}^{H}\mathbf{y}$ denotes the matched filter vector for the e-signal model. Similar to that in (3), $\mathbf{\Phi}$ for e-ZF and e-LMMSE can be expressed as follows

$$\mathbf{\Phi}=\left\{\begin{array}{l}\mathbf{\Phi}_{\textsc{zf}}\triangleq\mathbf{\Psi}^{H}\mathbf{\Psi};\\ \mathbf{\Phi}_{\textsc{lmmse}}\triangleq\mathbf{\Psi}^{H}\mathbf{\Psi}+\rho^{-1}\mathbf{\Sigma}^{-2}.\end{array}\right. \tag{18}$$

It is obvious that (17) and (2) share the same mathematical structure. Hence, any iterative algorithm designed to determine $\widehat{\mathbf{x}}_{\textsc{zf}}$ or $\widehat{\mathbf{x}}_{\textsc{lmmse}}$ can be directly applied to determine $\widehat{\mathbf{s}}_{\textsc{zf}}$ or $\widehat{\mathbf{s}}_{\textsc{lmmse}}$, respectively. Since the objective of MIMO signal detection is to reconstruct the transmitted signal vector $\mathbf{x}$, a post-processing step is required to convert $\widehat{\mathbf{s}}$ back to $\widehat{\mathbf{x}}$.

Post-Processing Step: Consistent with the definition of $\mathbf{s}$, we propose to recover $\widehat{\mathbf{x}}$ from $\widehat{\mathbf{s}}$ as follows

$$\widehat{\mathbf{x}}=\mathbf{V}\mathbf{\Sigma}^{-1}\widehat{\mathbf{s}}, \tag{19}$$

where $\mathbf{\Sigma}$ is a diagonal matrix, so that computing $\mathbf{\Sigma}^{-1}$ requires only linear computational complexity. Plugging (18) into (19) and simplifying, we obtain the following

$$\widehat{\mathbf{x}}_{\textsc{zf}}=\mathbf{V}\mathbf{\Sigma}^{-1}\widehat{\mathbf{s}}_{\textsc{zf}}; \tag{20}$$
$$\widehat{\mathbf{x}}_{\textsc{lmmse}}=\mathbf{V}\mathbf{\Sigma}^{-1}\widehat{\mathbf{s}}_{\textsc{lmmse}}. \tag{21}$$

With the post-processing step, the e-ZF and e-LMMSE detectors can provide the same detection performance as the ZF and LMMSE detectors, respectively. This implies that any iterative algorithm that converges to $\widehat{\mathbf{s}}$ can provide ZF or LMMSE detection performance. The specific steps of the UW-SVD-assisted algorithms are discussed in the next subsection.
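The equivalence can be verified in a few lines. The derivation below is a sketch that assumes the LMMSE filter matrix in (3) takes the standard form $\mathbf{A}_{\textsc{lmmse}}=\mathbf{H}^{H}\mathbf{H}+\rho^{-1}\mathbf{I}$ (equation (3) is not reproduced in this section, so this form is an assumption), that $\mathbf{H}=\mathbf{\Psi}\mathbf{\Sigma}\mathbf{V}^{H}$ with unitary $\mathbf{V}$, and that $\mathbf{\Sigma}$ is invertible.

```latex
\begin{aligned}
\mathbf{A}_{\textsc{lmmse}}
 &= \mathbf{V}\mathbf{\Sigma}\mathbf{\Psi}^{H}\mathbf{\Psi}\mathbf{\Sigma}\mathbf{V}^{H}
    + \rho^{-1}\mathbf{V}\mathbf{\Sigma}\mathbf{\Sigma}^{-2}\mathbf{\Sigma}\mathbf{V}^{H}
  = \mathbf{V}\mathbf{\Sigma}\,\mathbf{\Phi}_{\textsc{lmmse}}\,\mathbf{\Sigma}\mathbf{V}^{H}, \\
\widehat{\mathbf{x}}_{\textsc{lmmse}}
 &= \mathbf{A}_{\textsc{lmmse}}^{-1}\mathbf{H}^{H}\mathbf{y}
  = \mathbf{V}\mathbf{\Sigma}^{-1}\mathbf{\Phi}_{\textsc{lmmse}}^{-1}\mathbf{\Sigma}^{-1}\mathbf{V}^{H}
    \cdot \mathbf{V}\mathbf{\Sigma}\mathbf{\Psi}^{H}\mathbf{y}
  = \mathbf{V}\mathbf{\Sigma}^{-1}\mathbf{\Phi}_{\textsc{lmmse}}^{-1}\boldsymbol{\delta}
  = \mathbf{V}\mathbf{\Sigma}^{-1}\widehat{\mathbf{s}}_{\textsc{lmmse}}.
\end{aligned}
```

Dropping the $\rho^{-1}$ terms gives the ZF case in the same way.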

III-D UW-SVD Assisted Iterative Algorithms

As discussed in Section III-C, all the MS-based methods and gradient methods can be directly applied to estimate $\mathbf{s}$.$^{3}$ These methods share the same iterative structure as (4), except that the specific parameters are adjusted as follows

$$\mathbf{s}_{t+1}=f(\mathbf{s}_{t};\mathbf{\Phi},\boldsymbol{\delta}), \tag{22}$$

which defines the so-called UW-SVD-assisted iterative algorithms. In the case where $\mathbf{\Phi}=\mathbf{\Phi}_{\textsc{zf}}$, $\mathbf{s}_{t}$ will converge to the e-ZF solution. Conversely, when $\mathbf{\Phi}=\mathbf{\Phi}_{\textsc{lmmse}}$, $\mathbf{s}_{t}$ will converge to the e-LMMSE solution. It is worth noting that the convergence rate of UW-SVD-assisted algorithms is dominated by $\mathrm{cond}(\mathbf{\Phi})$, rather than $\mathrm{cond}(\mathbf{A})$. The comparison between $\mathrm{cond}(\mathbf{\Phi})$ and $\mathrm{cond}(\mathbf{A})$ will be comprehensively explored in Section IV.

Footnote 3: AMP and its variants require further modifications to determine $\mathbf{s}$ due to their structures; this exploration is beyond the scope of this paper. Additionally, using MS-based methods and L-BFGS is sufficient to demonstrate the advantage of the proposed UW-SVD method.

Similar to (6), MS-based methods assisted by UW-SVD can be expressed as follows

$$f_{\textsc{ms}}(\mathbf{s}_{t};\mathbf{\Phi},\boldsymbol{\delta})=\mathbf{s}_{t}+\mathbf{M}^{-1}(\boldsymbol{\delta}-\mathbf{\Phi}\mathbf{s}_{t}), \tag{23}$$

where the preconditioning matrix $\mathbf{M}$ should be constructed based on the matrix splitting of $\mathbf{\Phi}$. For the case $\mathbf{\Phi}=\mathbf{\Phi}_{\textsc{zf}}$, JI and SSOR methods can be further simplified. Specifically, JI and RI are equivalent to each other because $\mathbf{M}_{\textsc{ji}}=\mathbb{D}(\mathbf{\Phi}_{\textsc{zf}})$ is an identity matrix; and $\mathbf{M}$ for SSOR can be simplified to $\mathbf{M}_{\textsc{ssor}}=\mathbb{L}(\mathbf{\Phi}_{\textsc{zf}})\mathbb{L}(\mathbf{\Phi}_{\textsc{zf}})^{H}$.

Gradient methods, such as SD, CG, and L-BFGS, can also be employed to address the problem in (17). It has been demonstrated that L-BFGS converges faster than SD while maintaining a similar quadratic-order complexity [24]. Moreover, it has been proven that L-BFGS and CG are equivalent when solving the convex MIMO detection problem [48]. Therefore, we consider L-BFGS as an example of gradient methods in this paper. Its iterative process is given by [19]

$$f_{\textsc{lbfgs}}(\mathbf{s}_{t};\mathbf{\Phi},\boldsymbol{\delta})=\mathbf{s}_{t}+\xi_{t}\mathbf{d}_{t}, \tag{24}$$

where $\xi_{t}$ denotes the step size as follows

$$\xi_{t}=-\dfrac{\mathbf{g}_{t}^{H}\mathbf{d}_{t}}{\mathbf{d}_{t}^{H}\mathbf{\Phi}\mathbf{d}_{t}}, \tag{25}$$

and $\mathbf{d}_{t}$ denotes the update direction as follows

$$\mathbf{d}_{t}=\mathbf{\Theta}_{t}\mathbf{g}_{t}, \tag{26}$$

where $\mathbf{g}_{t}\triangleq\mathbf{\Phi}\mathbf{s}_{t}-\boldsymbol{\delta}$ denotes the gradient direction. $\mathbf{\Theta}_{t}$ represents the approximation of the Hessian matrix, and it can be expressed as follows

$$\mathbf{\Theta}_{t}=\bigg(\dfrac{(\mathbf{s}_{t}-\mathbf{s}_{t-1})(\mathbf{g}_{t}-\mathbf{g}_{t-1})^{H}}{(\mathbf{s}_{t}-\mathbf{s}_{t-1})^{H}(\mathbf{g}_{t}-\mathbf{g}_{t-1})}-\mathbf{I}\bigg)\mathbf{\Theta}_{0}, \tag{27}$$

where $\mathbf{\Theta}_{0}$ represents the initial approximation. Typically, $\mathbf{\Theta}_{0}$ is set as $\mathbb{D}(\mathbf{\Phi})^{-1}$. For the e-ZF detector, the term $\mathbf{\Theta}_{0}$ in (27) can be omitted, since $\mathbb{D}(\mathbf{\Phi}_{\textsc{zf}})^{-1}=\mathbf{I}$.

Algorithm: UW-SVD-assisted L-BFGS algorithm
Input: $\mathbf{y}$: received signal vector; $\mathbf{H}$: MIMO channel matrix; $\rho$: SNR; $T$: number of iterations; $\mathbf{s}_{0}=\mathbf{0}$: the initialization vector.
Output: $\widehat{\mathbf{x}}$: the estimate of $\mathbf{x}$.
1:  let $t=0$; call (11) to compute $\mathbf{\Psi}$, $\mathbf{\Sigma}$, and $\mathbf{V}$;
2:  let $\boldsymbol{\delta}=\mathbf{\Psi}^{H}\mathbf{y}$; let $\mathbf{\Phi}=\mathbf{\Psi}^{H}\mathbf{\Psi}+\rho^{-1}\mathbf{\Sigma}^{-2}$;
3:  call (24) to compute $\mathbf{s}_{t+1}$; then $t\leftarrow t+1$;
4:  repeat step 3 until $t=T$;
5:  call (19) to compute $\widehat{\mathbf{x}}$.

Pseudocode: The UW-SVD-assisted L-BFGS algorithm is presented in the Algorithm. It provides LMMSE detection performance, since the filter matrix $\mathbf{\Phi}$ is set to $\mathbf{\Phi}_{\textsc{lmmse}}$. By setting $\mathbf{\Phi}=\mathbf{\Psi}^{H}\mathbf{\Psi}$ in step 2, the algorithm instead provides ZF detection performance. In step 3, (24) is the iterative function of L-BFGS; if it is replaced by (23), the Algorithm becomes a UW-SVD-assisted MS-based method. Step 5 is the post-processing step, which recovers $\widehat{\mathbf{x}}$ from $\widehat{\mathbf{s}}$. Additionally, UW-SVD can accelerate the convergence of numerous other iterative algorithms, such as SD and CG. Its simple structure facilitates its application to a wide range of existing algorithms.
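The steps of the Algorithm can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the helper `uw_svd` is an assumed form of (11) (one thin SVD per user), the first direction is taken as $\mathbf{d}_{0}=-\mathbf{\Theta}_{0}\mathbf{g}_{0}$ because (27) is undefined at $t=0$, the real part is used in (25) for numerical robustness, and an early-stopping tolerance `tol` is added. The reference solution assumes the standard LMMSE form $(\mathbf{H}^{H}\mathbf{H}+\rho^{-1}\mathbf{I})^{-1}\mathbf{H}^{H}\mathbf{y}$.

```python
import numpy as np

def uw_svd(H, n_ue):
    """Assumed form of (11): H_k = U_k Sigma_k V_k^H for each user's columns."""
    M, N = H.shape
    Psi, V = np.zeros((M, N), complex), np.zeros((N, N), complex)
    sigma = np.zeros(N)
    for k in range(0, N, n_ue):
        b = slice(k, k + n_ue)
        U_k, s_k, Vh_k = np.linalg.svd(H[:, b], full_matrices=False)
        Psi[:, b], sigma[b], V[b, b] = U_k, s_k, Vh_k.conj().T
    return Psi, sigma, V

def uw_svd_lbfgs(y, H, rho, T, n_ue, tol=1e-12):
    Psi, sigma, V = uw_svd(H, n_ue)                          # step 1
    delta = Psi.conj().T @ y                                 # step 2
    Phi = Psi.conj().T @ Psi + np.diag(1.0 / (rho * sigma**2))
    theta0 = 1.0 / np.real(np.diag(Phi))                     # Theta_0 = D(Phi)^{-1}
    s = np.zeros(H.shape[1], complex)                        # s_0 = 0
    g = Phi @ s - delta                                      # gradient g_t
    s_prev = g_prev = None
    for _ in range(T):                                       # steps 3-4
        if np.linalg.norm(g) < tol:
            break                                            # already converged
        d = -theta0 * g                                      # "-I Theta_0 g_t" part of (26)-(27)
        if s_prev is not None:
            ds, dg = s - s_prev, g - g_prev
            # rank-one part of (27) applied to g_t
            d = d + ds * (np.vdot(dg, theta0 * g) / np.vdot(ds, dg))
        xi = -np.real(np.vdot(g, d)) / np.real(np.vdot(d, Phi @ d))   # (25)
        s_prev, g_prev = s, g
        s = s + xi * d                                       # (24)
        g = Phi @ s - delta
    return V @ (s / sigma)                                   # step 5, eq. (19)

rng = np.random.default_rng(0)
M, N, n_ue, rho = 32, 8, 2, 10.0                             # hypothetical sizes and SNR
H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2 * M)
y = rng.standard_normal(M) + 1j * rng.standard_normal(M)
x_hat = uw_svd_lbfgs(y, H, rho, T=50, n_ue=n_ue)
x_ref = np.linalg.solve(H.conj().T @ H + np.eye(N) / rho, H.conj().T @ y)
```

For clarity the sketch forms $\mathbf{\Phi}$ explicitly; the complexity analysis in Section III-E explains how the products with $\mathbf{\Phi}$ can instead be computed through $\mathbf{\Psi}$ directly.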

III-E Complexity Analysis

TABLE I: Complexity Analysis of UW-SVD and Various Iterative MIMO Detectors

| Algorithms | Calculation of $\mathbf{A}$ or $\mathbf{\Phi}$ | Matrix Inverse | Per Iteration |
|---|---|---|---|
| ZF/LMMSE | $MN^{2}$ | $N^{3}$ | $0$ |
| RI | $0$ | $0$ | $2MN$ |
| JI | $0$ | $0$ | $2MN+N$ |
| GS | $MN^{2}$ | $N^{2}$ | $1.5N^{2}$ |
| SSOR | $MN^{2}$ | $N^{2}$ | $2N^{2}+N$ |
| L-BFGS | $0$ | $0$ | $4MN+N^{2}+5N$ |

UW-SVD (including post-processing): $N_{\textsc{ue}}MN+N_{\textsc{ue}}N+2N$

The objective of this section is to demonstrate that the proposed UW-SVD method has low computational complexity. To simplify and clarify the complexity analysis, we adopt a common assumption in multi-user MIMO systems with $K$ users, where each user has $N_{\textsc{ue}}$ antennas, i.e., $KN_{\textsc{ue}}=N$. The computational complexity of UW-SVD-assisted iterative algorithms can be divided into three main parts: UW-SVD, post-processing, and the iterative process. We start the complexity analysis with the UW-SVD and post-processing steps.

Performing SVD on $\mathbf{H}_{k}$ has a complexity of $MN_{\textsc{ue}}^{2}$ [49], resulting in a total complexity of $KMN_{\textsc{ue}}^{2}$ for all the users. Since $KN_{\textsc{ue}}=N$, the complexity of UW-SVD can also be expressed as $N_{\textsc{ue}}MN$. In the post-processing step, computing $\mathbf{\Sigma}^{-1}$ has a complexity of $N$, and the computation of $[\mathbf{\Sigma}^{-1}\widehat{\mathbf{s}}]$ has the same complexity of $N$. Moreover, given that $\mathbf{V}$ is a block diagonal matrix, the complexity of calculating $\mathbf{V}[\mathbf{\Sigma}^{-1}\widehat{\mathbf{s}}]$ is $KN_{\textsc{ue}}^{2}=N_{\textsc{ue}}N$. Therefore, the overall complexity of the post-processing step is $N_{\textsc{ue}}N+2N$. Furthermore, the total complexity of UW-SVD together with the post-processing step is $N_{\textsc{ue}}MN+N_{\textsc{ue}}N+2N$. In MIMO systems, the number of antennas per UE is usually small, typically $N_{\textsc{ue}}=2$ or $4$. Hence, the complexity of the UW-SVD method stays at the quadratic order.

It is worth noting that some iterative algorithms, including the RI, JI, and L-BFGS methods, do not require the explicit computation of $\mathbf{A}$ or $\mathbf{\Phi}$. Taking RI as an example, if we substitute $\mathbf{\Phi}_{\textsc{zf}}=\mathbf{\Psi}^{H}\mathbf{\Psi}$ and $\mathbf{M}=\mathbf{I}$ into (23), its iterative process can be expressed as follows

$$f_{\textsc{ri}}(\mathbf{s}_{t};\mathbf{\Phi},\boldsymbol{\delta})=\mathbf{s}_{t}+(\boldsymbol{\delta}-\mathbf{\Psi}^{H}\mathbf{\Psi}\mathbf{s}_{t}), \tag{28}$$

where we can first compute $[\mathbf{\Psi}\mathbf{s}_{t}]$ with a complexity of $MN$, and then compute $\mathbf{\Psi}^{H}[\mathbf{\Psi}\mathbf{s}_{t}]$ with another complexity of $MN$. In this successive manner, the calculation of $\mathbf{\Phi}$ can be avoided. The same approach applies to other iterative methods, such as the calculation of $\xi_{t}$ in (25) in the L-BFGS method. In addition, a similar complexity can be obtained by replacing $\mathbf{\Phi}_{\textsc{zf}}$ with $\mathbf{\Phi}_{\textsc{lmmse}}$ in (28). The complexity of computing $\mathbf{\Sigma}^{-1}$ is only $N$, since it is a diagonal matrix. Furthermore, the calculation of $\mathbb{D}(\mathbf{\Psi}^{H}\mathbf{\Psi})^{-1}$ in JI and L-BFGS methods can be ignored because it is an identity matrix according to (16).
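The successive evaluation described above can be sketched as follows. The snippet is an illustration under assumptions: $\mathbf{\Psi}$ and the singular values are obtained from per-user thin SVDs (an assumed form of (11)), and the function name `apply_phi`, the sizes, and the SNR value are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, n_ue, rho = 64, 8, 2, 10.0            # hypothetical sizes and SNR

H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2 * M)
# Per-user thin SVDs: collect left singular vectors (Psi) and singular values.
svds = [np.linalg.svd(H[:, k:k + n_ue], full_matrices=False) for k in range(0, N, n_ue)]
Psi = np.hstack([U for U, _, _ in svds])
sigma = np.concatenate([s for _, s, _ in svds])

def apply_phi(s_t, lmmse=True):
    """Return Phi @ s_t without forming Phi: two M-by-N matrix-vector products."""
    out = Psi.conj().T @ (Psi @ s_t)         # Psi^H [Psi s_t], about 2MN multiplications
    if lmmse:
        out = out + s_t / (rho * sigma**2)   # diagonal term rho^{-1} Sigma^{-2} s_t
    return out

s_t = rng.standard_normal(N) + 1j * rng.standard_normal(N)
# Explicit construction for comparison only; forming Psi^H Psi costs about M N^2.
Phi_explicit = Psi.conj().T @ Psi + np.diag(1.0 / (rho * sigma**2))
```

Both routes give the same vector, but `apply_phi` never builds the $N\times N$ matrix.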

The authors are aware that certain iterative algorithms require the calculation of $\mathbf{A}$ or $\mathbf{\Phi}$, such as the GS and SSOR methods. This is because they need to compute $\mathbb{L}(\mathbf{A})^{-1}$ or $\mathbb{L}(\mathbf{\Phi})^{-1}$. Furthermore, the complexity of computing $\mathbb{L}(\mathbf{A})^{-1}$ or $\mathbb{L}(\mathbf{\Phi})^{-1}$ is $N^{2}$ due to the triangular structure. In short, the complexity of these iterative algorithms remains essentially the same whether UW-SVD is applied or not. Moreover, ZF and LMMSE detectors require the computation of $\mathbf{A}$ with a cubic-order complexity of $MN^{2}$. They also require the calculation of $\mathbf{A}^{-1}$ with a cubic-order complexity of $N^{3}$. TABLE I summarizes the complexity of UW-SVD and various MIMO detectors. The matrix inverse operation, with its inherently serial cubic-order complexity, is the primary reason why ZF/LMMSE is impractical for real-time signal processing [17]. This motivates traditional iterative methods to (partially) circumvent the need for matrix inversion.

Discussion of complexity reduction: Our simulation results demonstrate that the proposed UW-SVD method can reduce the computational complexity of SSOR and L-BFGS methods by up to $90\%$ and $67\%$, respectively (see Figs. 5(d) and 7). This substantial reduction is primarily achieved by decreasing the number of iterations required, despite the additional complexity introduced by UW-SVD. As shown in TABLE I, the extra computational burden imposed by UW-SVD is equivalent to $16$ SSOR iterations or a single L-BFGS iteration. However, the advantages of UW-SVD far outweigh its cost. For instance, the result in Fig. 5(d) shows that UW-SVD can save SSOR up to approximately $240$ iterations. Moreover, our simulation results in Fig. 7 demonstrate that UW-SVD can save the L-BFGS method up to $13$ iterations. This acceleration in convergence more than offsets the additional processing cost of UW-SVD, thus significantly reducing the overall computational complexity.
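The overhead comparison can be reproduced from the expressions in TABLE I. The short sketch below uses hypothetical dimensions ($M=128$, $N=16$, $N_{\textsc{ue}}=4$), chosen only for illustration since the simulation settings are not restated in this section; the function names are likewise assumptions.

```python
def uw_svd_overhead(M, N, n_ue):
    """UW-SVD plus post-processing cost: N_ue*M*N + N_ue*N + 2N."""
    return n_ue * M * N + n_ue * N + 2 * N

def ssor_per_iteration(N):
    """SSOR cost per iteration: 2N^2 + N."""
    return 2 * N**2 + N

def lbfgs_per_iteration(M, N):
    """L-BFGS cost per iteration: 4MN + N^2 + 5N."""
    return 4 * M * N + N**2 + 5 * N

M, N, n_ue = 128, 16, 4   # hypothetical sizes, not taken from the paper's simulations
overhead = uw_svd_overhead(M, N, n_ue)
ratio_ssor = overhead / ssor_per_iteration(N)       # overhead in units of SSOR iterations
ratio_lbfgs = overhead / lbfgs_per_iteration(M, N)  # overhead in units of L-BFGS iterations
```

With these illustrative sizes the overhead corresponds to roughly 16 SSOR iterations and roughly one L-BFGS iteration; other dimensions would give different ratios.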

IV Convergence Analysis

In this section, the objective is to compare $\mathrm{cond}(\mathbf{A})$ and $\mathrm{cond}(\mathbf{\Phi})$ in both massive-MIMO and ELAA-MIMO systems. The next two subsections provide detailed results of the comparison in each system, respectively.

IV-A Massive-MIMO with i.i.d. Rayleigh Fading Channels

To better understand the relationship between $\mathrm{cond}(\mathbf{A})$ and $\mathrm{cond}(\mathbf{\Phi})$, we first introduce the following concept of favorable propagation in massive-MIMO systems.

Property 1 (Favorable Propagation [50])

Suppose that the elements of $\mathbf{H}$ follow independent and identically distributed (i.i.d.) Rayleigh fading as in (40). Given $N_{k},\forall k$, as $M$ tends to infinity, we have the following

$$\lim_{M\rightarrow\infty}\mathbf{H}_{k}^{H}\mathbf{H}_{k}=\mathbf{I},\quad\forall k. \tag{29}$$
Theorem 1

Suppose that every element of $\mathbf{H}$ obeys the i.i.d. Rayleigh distribution in (40). Given $N_{k},\forall k$, as $M$ tends to infinity, we have the following

$$\lim_{M\rightarrow\infty}\mathrm{cond}(\mathbf{\Phi}_{\textsc{zf}})=\mathrm{cond}(\mathbf{A}_{\textsc{zf}}); \tag{30}$$
$$\lim_{M\rightarrow\infty}\mathrm{cond}(\mathbf{\Phi}_{\textsc{lmmse}})=\mathrm{cond}(\mathbf{A}_{\textsc{lmmse}}). \tag{31}$$
Proof:

See Appendix A. ∎

Theorem 1 implies that the UW-SVD-assisted algorithm has a convergence rate comparable to that of the existing algorithm. The reason is that the intra-user interference in i.i.d. Rayleigh fading channels is very weak, which limits the gain of UW-SVD. This theoretical finding is verified in the numerical results of Experiment 1 in Section V-C. On the contrary, intra-user interference is strong in spatially correlated (ELAA-)MIMO systems, especially in the presence of LoS links. In the next subsection, we will show that $\mathrm{cond}(\mathbf{\Phi})<\mathrm{cond}(\mathbf{A})$ in such channels.
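The trend stated in Theorem 1 can be illustrated numerically for the ZF case. The sketch below assumes $\mathbf{A}_{\textsc{zf}}=\mathbf{H}^{H}\mathbf{H}$ and that $\mathbf{\Psi}$ stacks the left singular vectors of each user's sub-channel; the function name `cond_pair`, the channel normalization (variance $1/M$ per entry), and the sizes are illustrative assumptions. It is a finite-$M$ check, not a proof.

```python
import numpy as np

def cond_pair(H, n_ue):
    """Return (cond(A_zf), cond(Phi_zf)) with A_zf = H^H H and Phi_zf = Psi^H Psi."""
    N = H.shape[1]
    Psi = np.hstack([np.linalg.svd(H[:, k:k + n_ue], full_matrices=False)[0]
                     for k in range(0, N, n_ue)])
    return np.linalg.cond(H.conj().T @ H), np.linalg.cond(Psi.conj().T @ Psi)

rng = np.random.default_rng(2)
N, n_ue = 8, 2                      # hypothetical: 4 users with 2 antennas each
results = {}
for M in (64, 4096):                # growing number of service antennas
    H = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2 * M)
    results[M] = cond_pair(H, n_ue)
```

For large $M$ the two condition numbers in `results[M]` become close to each other, which is consistent with UW-SVD offering little gain in i.i.d. Rayleigh fading.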

IV-B Spatially Correlated (ELAA-)MIMO Channels

Given that UW-SVD aims to address intra-user interference, our focus lies primarily on understanding the user-side spatial correlation$^{4}$, which is defined as follows [8]

$$\mathbf{R}_{\textsc{ue}}\triangleq\mathbb{E}\{\mathbf{H}^{H}\mathbf{H}\}, \tag{32}$$

where $\mathbf{R}_{\textsc{ue}}$ is usually described by an exponential correlation matrix in conventional massive-MIMO systems. For example, if two user antennas are situated at a distance of $d$, the correlation between these two antennas can be expressed as follows [8]

$$r(d)=\exp(-d/\mu), \tag{33}$$

where $\mu>0$ represents the scaling factor. In multi-user MIMO systems, the distance between different users is typically much greater than the distance between antennas belonging to the same UE. Therefore, we have the following assumption:

$$\textit{A1):}\quad\mathbf{R}_{\textsc{ue}}^{k,j}=\mathbf{0},\quad\forall k\neq j, \tag{34}$$

where $\mathbf{R}_{\textsc{ue}}^{k,j}\in\mathbb{R}^{N_{k}\times N_{j}}$ denotes a block of $\mathbf{R}_{\textsc{ue}}$ representing the correlation between user $k$ and user $j$.

Footnote 4: In this section, we focus on the user-side spatial correlation to facilitate the theoretical analysis. However, in our simulations, we adopt the Kronecker model in (44), which considers both user-side and BS-side spatial correlations.

Let us take an example to validate this assumption. Suppose we have a UE equipped with two antennas spaced apart by half the carrier wavelength. Assuming the carrier frequency is $3.5$ $\mathrm{GHz}$ and the parameter $\mu$ equals $0.2$, the correlation between the two intra-user antennas is $r(0.0429)\approx 0.8$. This suggests a significant spatial correlation between the two intra-user antennas. In contrast, when considering two distinct users separated by one meter, their correlation $r(1)=\exp(-5)\approx 6.7\times 10^{-3}$ implies nearly orthogonal behavior. In real-world scenarios, user distances typically exceed one meter, which makes the correlation even smaller. Therefore, the assumption A1) in (34) is reasonable for practical MIMO systems.
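The numbers in this example follow directly from (33). A minimal sketch of the arithmetic, assuming a propagation speed of $3\times 10^{8}$ m/s and distances measured in meters (the variable names are illustrative):

```python
import math

c = 3.0e8            # assumed propagation speed in m/s
f_c = 3.5e9          # carrier frequency in Hz
mu = 0.2             # scaling factor in (33)

def r(d):
    """Exponential correlation model (33)."""
    return math.exp(-d / mu)

d_intra = 0.5 * c / f_c   # half-wavelength antenna spacing in meters
r_intra = r(d_intra)      # correlation between two antennas of the same UE
r_inter = r(1.0)          # correlation between two users one meter apart
```

Here `d_intra` is about $0.0429$ m, `r_intra` is about $0.8$, and `r_inter` equals $\exp(-5)$, i.e., about $6.7\times 10^{-3}$.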

Lemma 1

Given A1, suppose the MIMO channel is $\mathbf{H}=\mathbf{\Omega}\sqrt{\mathbf{R}_{\textsc{ue}}}$, where each element of $\mathbf{\Omega}$ follows an i.i.d. Rayleigh distribution, we have the following

$$\lim_{M\rightarrow\infty}\mathbf{H}_{k}^{H}\mathbf{H}_{j}=\mathbf{0},\quad\forall k\neq j, \tag{35}$$

where $\mathbf{R}_{\textsc{ue}}^{k,k}\in\mathbb{R}^{N_{k}\times N_{k}}$ is a block of $\mathbf{R}_{\textsc{ue}}$ representing the correlation between the antenna elements of user $k$.

Proof:

See Appendix B. ∎
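The limit in Lemma 1 can also be observed numerically. The following is a minimal sketch, not the simulation code of this paper: it assumes each diagonal block of $\mathbf{R}_{\textsc{ue}}$ is an exponential correlation matrix with entries $\varrho^{|i-j|}$ (a common model adopted here for illustration), and the helper names `exp_corr`, `psd_sqrt`, and `cross_user_leakage` are hypothetical. It reports the largest Frobenius norm of $\mathbf{H}_k^H\mathbf{H}_j$ over all $k\neq j$ for a growing number of service antennas.

```python
import numpy as np

def exp_corr(n, rho):
    """Exponential correlation matrix with entries rho**|i-j| (assumed model)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def psd_sqrt(R):
    """Symmetric square root of a real positive semidefinite matrix."""
    w, V = np.linalg.eigh(R)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def cross_user_leakage(M, K=8, N_ue=4, rho=0.8, seed=0):
    """Largest Frobenius norm of H_k^H H_j over k != j for one channel draw."""
    rng = np.random.default_rng(seed)
    N = K * N_ue
    # Omega has i.i.d. CN(0, 1/M) entries.
    Omega = (rng.standard_normal((M, N))
             + 1j * rng.standard_normal((M, N))) / np.sqrt(2 * M)
    # A1: R_ue is block diagonal, with one N_ue x N_ue block per user.
    R_ue = np.kron(np.eye(K), exp_corr(N_ue, rho))
    H = Omega @ psd_sqrt(R_ue)
    G = H.conj().T @ H
    worst = 0.0
    for k in range(K):
        for j in range(K):
            if k != j:
                blk = G[k * N_ue:(k + 1) * N_ue, j * N_ue:(j + 1) * N_ue]
                worst = max(worst, np.linalg.norm(blk))
    return worst

for M in (64, 1024, 16384):
    print(M, cross_user_leakage(M))
```

In this sketch the cross-user blocks shrink as $M$ grows, while the diagonal blocks $\mathbf{H}_k^H\mathbf{H}_k$ remain close to $\mathbf{R}_{\textsc{ue}}^{k,k}$.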

In massive-MIMO systems, the number of service-antennas can be in the hundreds or even thousands. Consequently, the condition in (35) can be approximated as follows

$$\textit{A2):}\quad\mathbf{H}_{k}^{H}\mathbf{H}_{j}=\mathbf{0},\quad\forall k\neq j. \tag{36}$$

Moreover, as discussed in Section I-B, in ELAA-MIMO systems, intra-user interference is much stronger than inter-user interference. This is because the dominant power of different users can be received by different service antennas. This phenomenon is known as spatial orthogonality [51, 6]. Therefore, it can be assumed that A2 also holds in ELAA-MIMO systems. With assumption A2, our focus shifts to the comparison between $\mathrm{cond}(\mathbf{A})$ and $\mathrm{cond}(\mathbf{\Phi})$ for ZF and LMMSE detectors, respectively.

Theorem 2

Given A2, it can be obtained that the condition number of $\mathbf{A}_{\textsc{zf}}$ is larger than that of $\mathbf{\Phi}_{\textsc{zf}}$, i.e.,

$$\mathrm{cond}(\mathbf{\Phi}_{\textsc{zf}})<\mathrm{cond}(\mathbf{A}_{\textsc{zf}}). \tag{37}$$
Proof:

See Appendix C. ∎

Theorem 3

Given A2, suppose that the transmitted power is normalized to $1$, i.e., $\sigma_{x}^{2}=1$. Then we have the following

$$\mathrm{cond}(\mathbf{\Phi}_{\textsc{lmmse}})<\mathrm{cond}(\mathbf{A}_{\textsc{lmmse}}), \tag{38}$$

when

$$\sqrt{\mathrm{cond}(\mathbf{A}_{\textsc{zf}})}\,\lambda_{\text{min}}(\mathbf{A}_{\textsc{zf}})>\sigma_{z}^{2}. \tag{39}$$
Proof:

See Appendix D. ∎

Remark 2

Theorem 2 implies that UW-SVD-assisted algorithms can provide faster convergence to ZF detection performance compared to current algorithms. Theorem 3 suggests a similar conclusion for LMMSE detectors, but with the constraint that the SNR should be greater than a threshold. Note that in any MIMO system for multiplexing transmission, it is typically necessary for the minimum eigenvalue of the channel gain to exceed the noise power, i.e., $\lambda_{\text{min}}(\mathbf{A}_{\textsc{zf}})>\sigma_{z}^{2}$. Also, the condition number is no smaller than $1$ by definition. Therefore, the inequality in (39) will always be satisfied in practical MIMO systems if we aim for acceptable detection performance using multiplexing techniques.
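The argument in Remark 2 can be written as a single chain of inequalities. This is only a restatement under the two premises named in the remark (a condition number of at least one, and a minimum eigenvalue above the noise power), not an additional result:

```latex
\sqrt{\mathrm{cond}(\mathbf{A}_{\textsc{zf}})}\,\lambda_{\text{min}}(\mathbf{A}_{\textsc{zf}})
\;\geq\; \lambda_{\text{min}}(\mathbf{A}_{\textsc{zf}})
\;>\; \sigma_{z}^{2}.
```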

Remark 3

Theorem 2 and Theorem 3 rely on A2, which presumes that $M$ is much greater than $N$ or approaches infinity. However, deploying such a large number of service antennas is economically impractical in real-world scenarios. This necessitates the determination of specific, implementable ranges for $M$ and $N$. The stochastic nature of (ELAA-)MIMO channels poses a significant challenge here: it is impossible to mathematically derive an exact formula for the $M/N$ ratio at which UW-SVD outperforms conventional methods. To address this, we turn to experimental results for insights into practical $M/N$ ratios. Our experiments, detailed in Section V, reveal that UW-SVD achieves significant gains when the $M/N$ ratio is $4$ or $8$. These ratios are not only feasible but also commonly found in MIMO systems. This alignment between our findings and real-world parameters supports the applicability of our methods to practical implementations.

V Numerical and Simulation Results

In this section, the objectives are 1) to compare $\mathrm{cond}(\mathbf{\Phi})$ and $\mathrm{cond}(\mathbf{A})$ in various types of (ELAA-)MIMO channels; 2) to demonstrate that UW-SVD accelerates the convergence of current algorithms; and 3) to establish that the advantages observed in uncoded MIMO systems also apply to coded MIMO systems. This motivates the following three subsections.

V-A Channel Models

Model 1: In massive-MIMO systems, each element of $\mathbf{H}$ is usually assumed to obey i.i.d. Rayleigh fading as follows

$$H_{m,n}=\omega_{m,n}\sim\mathcal{CN}(0,1/M), \tag{40}$$

where $1/M$ denotes the normalized variance of each channel element. This indicates that the propagation environment is in the non-LoS (NLoS) state. (Footnote 5: In the LoS state, massive-MIMO channels can also be described by i.i.d. Rician fading. However, the far-field Rician channel cannot support multiple data streams per UE [37, 38] and is therefore outside the scope of this paper.)

Model 2: Spherical wavefront should be taken into account for ELAA channel modeling. In NLoS state, (40) should be extended to i.n.d. (n. for non-identical) Rayleigh fading as follows [52]

$$H_{m,n}=H_{m,n}^{(0)}\triangleq\Bigg(\dfrac{\beta^{(0)}}{d_{m,n}^{\gamma^{(0)}}}\Bigg)\omega_{m,n}, \tag{41}$$

where $d_{m,n}$ denotes the distance between the $m^{th}$ service-antenna and the $n^{th}$ user antenna; $\beta^{(0)}$ and $\gamma^{(0)}$ represent the NLoS path-loss coefficient and exponent, respectively.

Model 3: Similarly, the ELAA channel in LoS state is described to obey i.n.d. Rician fading as follows [35]

$$H_{m,n}=H_{m,n}^{(1)}\triangleq\dfrac{\beta^{(1)}}{d_{m,n}^{\gamma^{(1)}}}\Bigg(\sqrt{\dfrac{\kappa}{\kappa+1}}\varphi_{m,n}+\sqrt{\dfrac{1}{\kappa+1}}\omega_{m,n}\Bigg), \tag{42}$$

where $\beta^{(1)}$ and $\gamma^{(1)}$ represent the LoS path-loss coefficient and exponent, respectively, $\kappa$ denotes the Rician K-factor, $\varphi_{m,n}=\exp(-j\frac{2\pi}{\vartheta}d_{m,n})$ denotes the phase of the direct LoS link, and $\vartheta$ denotes the wavelength of the carrier wave.
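To make Model 2 and Model 3 concrete, the following minimal sketch draws one channel realization of each. It is an illustration under stated assumptions rather than the simulator used in this paper: the two-dimensional geometry (a ULA on the x-axis, users placed 15 m in front of it), the rounded speed of light, and the function names `distances`, `model2_nlos`, and `model3_los` are all hypothetical choices. The path-loss parameters and the K-factor are the values listed in Section V-C.

```python
import numpy as np

C = 3e8                 # speed of light in m/s (rounded, assumed)
FC = 3.5e9              # carrier frequency in Hz
WAVELEN = C / FC        # carrier wavelength in m

def distances(M, user_xy):
    """Distances d[m, n] between M ULA elements on the x-axis and user antennas."""
    x_bs = (np.arange(M) - (M - 1) / 2) * WAVELEN / 2   # half-wavelength spacing
    bs = np.stack([x_bs, np.zeros(M)], axis=1)          # shape (M, 2)
    return np.linalg.norm(bs[:, None, :] - user_xy[None, :, :], axis=2)

def small_scale(shape, M, rng):
    """i.i.d. CN(0, 1/M) samples, matching the definition of omega_{m,n}."""
    return (rng.standard_normal(shape)
            + 1j * rng.standard_normal(shape)) / np.sqrt(2 * M)

def model2_nlos(d, rng, beta0=0.020, gamma0=1.765):
    """i.n.d. Rayleigh fading: path loss times omega."""
    return beta0 / d ** gamma0 * small_scale(d.shape, d.shape[0], rng)

def model3_los(d, rng, beta1=0.007, gamma1=1.050, kappa_db=9.0):
    """i.n.d. Rician fading: path loss times (LoS phase term + scattered term)."""
    kappa = 10 ** (kappa_db / 10)                 # K-factor in linear scale
    phi = np.exp(-2j * np.pi * d / WAVELEN)       # phase of the direct LoS link
    omega = small_scale(d.shape, d.shape[0], rng)
    return beta1 / d ** gamma1 * (np.sqrt(kappa / (kappa + 1)) * phi
                                  + np.sqrt(1 / (kappa + 1)) * omega)

rng = np.random.default_rng(0)
# Two hypothetical single-antenna users, 15 m in front of the array.
users = np.array([[-5.0, 15.0], [5.0, 15.0]])
d = distances(256, users)
H2 = model2_nlos(d, rng)
H3 = model3_los(d, rng)
print(H2.shape, H3.shape)
```

Because $d_{m,n}$ varies along the aperture, the element variances differ from antenna to antenna, which is what makes the fading non-identically distributed.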

Model 4: An ELAA channel could allow a mix of LoS and NLoS links due to the large aperture [36]. Each element of $\mathbf{H}$ in this case can be expressed as follows

$$H_{m,n}=\epsilon_{m,n}^{(\eta_{m,n})}H_{m,n}^{(\eta_{m,n})}, \tag{43}$$

where $\eta_{m,n}\in\{0,1\}$ is a binary random variable: $\eta_{m,n}=0$ indicates the NLoS state, in which $H_{m,n}^{(\eta_{m,n})}$ becomes $H_{m,n}^{(0)}$ in (41), and $\eta_{m,n}=1$ indicates the LoS state, in which $H_{m,n}^{(\eta_{m,n})}$ becomes $H_{m,n}^{(1)}$ in (42); $\epsilon_{m,n}$ denotes the shadowing effects. The spatial correlations of LoS/NLoS states and shadowing effects are described by an exponentially decaying window [36]. This channel model can yield computer-simulated data that fit well with real-world measurement data, e.g., [39, 40]. Therefore, we employ this ELAA channel model to conduct computer simulations.

Kronecker Model: Let $\mathbf{\Omega}\in\mathbb{C}^{M\times N}$ be an i.i.d. complex Gaussian matrix, where its $(m,n)$-th element is denoted by $\omega_{m,n}$, as defined in (40). The four channel models above can be converted to their spatially correlated versions by replacing $\mathbf{\Omega}$ with $\mathbf{\Omega}_{\text{kron}}$, as follows [8]

$$\mathbf{\Omega}_{\text{kron}}=\sqrt{\mathbf{R}_{\textsc{bs}}}\,\mathbf{\Omega}\sqrt{\mathbf{R}_{\textsc{ue}}}, \tag{44}$$

where $\mathbf{R}_{\textsc{bs}}\in\mathbb{R}^{M\times M}$ and $\mathbf{R}_{\textsc{ue}}\in\mathbb{R}^{N\times N}$ are both exponential correlation matrices representing the BS and UE side correlations, respectively. In MIMO systems, the minimum distance between two antennas should be $\vartheta/2$. Therefore, we define the following

$$\varrho\triangleq r(\vartheta/2), \tag{45}$$

where $\varrho$ is the spatial correlation between the two closest antennas. Additionally, when $\varrho=0$, it means that the small-scale fading of distinct user-to-service antenna links is generated independently.
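The Kronecker construction in (44) can be sketched in a few lines. The sketch below is illustrative only and makes two labeled assumptions: the correlation matrices take the common exponential form with entries $\varrho^{|i-j|}$ on a half-wavelength grid, and $\mathbf{R}_{\textsc{ue}}$ is block diagonal across users in line with A1. The function names `exp_corr`, `psd_sqrt`, and `kronecker_fading` are hypothetical.

```python
import numpy as np

def exp_corr(n, rho):
    """Exponential correlation matrix with entries rho**|i-j| (assumed form)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def psd_sqrt(R):
    """Symmetric square root of a real positive semidefinite matrix."""
    w, V = np.linalg.eigh(R)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def kronecker_fading(M, K, N_ue, rho, rng):
    """Correlated small-scale fading sqrt(R_bs) @ Omega @ sqrt(R_ue)."""
    N = K * N_ue
    Omega = (rng.standard_normal((M, N))
             + 1j * rng.standard_normal((M, N))) / np.sqrt(2 * M)
    R_bs = exp_corr(M, rho)
    # Assumption (A1): users are mutually uncorrelated, so R_ue is block diagonal.
    R_ue = np.kron(np.eye(K), exp_corr(N_ue, rho))
    return psd_sqrt(R_bs) @ Omega @ psd_sqrt(R_ue)

rng = np.random.default_rng(0)
Omega_kron = kronecker_fading(M=256, K=8, N_ue=4, rho=0.8, rng=rng)
print(Omega_kron.shape)
```

Setting `rho=0.0` makes both correlation matrices equal to the identity, so the sketch falls back to the independent case described above.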

V-B Baselines

The following iterative algorithms are set as baselines for our simulations: GS, SSOR, and L-BFGS. AMP is not used as a baseline because it converges slower than the L-BFGS method and diverges in the ELAA channel. Additionally, it must be modified further to recover the e-signal vector. Due to page limitations, we cannot demonstrate all the iterative algorithms that have been proposed in the last sixty years [16]. However, the baselines in this section are sufficient to demonstrate the advantages of the proposed UW-SVD method.
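For readers less familiar with these baselines, the sketch below shows a textbook Gauss-Seidel (GS) iteration applied to the ZF normal equations $\mathbf{H}^H\mathbf{H}\mathbf{x}=\mathbf{H}^H\mathbf{y}$. It is a generic illustration, not the implementation evaluated in this paper; the function name `gauss_seidel`, the problem sizes, and the zero initial guess are assumptions made for the example.

```python
import numpy as np

def gauss_seidel(A, b, num_iter):
    """Run num_iter Gauss-Seidel sweeps on A x = b, starting from x = 0."""
    L = np.tril(A)            # lower triangle, including the diagonal
    U = A - L                 # strictly upper triangle
    x = np.zeros_like(b)
    for _ in range(num_iter):
        # A general solver is used for brevity; a triangular solver would be cheaper.
        x = np.linalg.solve(L, b - U @ x)
    return x

rng = np.random.default_rng(0)
M, N = 64, 8
H = (rng.standard_normal((M, N))
     + 1j * rng.standard_normal((M, N))) / np.sqrt(2 * M)
y = rng.standard_normal(M) + 1j * rng.standard_normal(M)
A = H.conj().T @ H            # Gram matrix of the channel
b = H.conj().T @ y            # matched-filter output
x_ref = np.linalg.solve(A, b)
for t in (1, 5, 20):
    err = np.linalg.norm(gauss_seidel(A, b, t) - x_ref) / np.linalg.norm(x_ref)
    print(t, err)
```

The number of sweeps needed to reach a given error grows as the matrix being inverted becomes more ill-conditioned, which is why the condition number is the quantity studied in Section IV.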

V-C System Setup and Experiments

The carrier frequency is set to be $3.5$ GHz. The service array is configured as a uniform linear array (ULA) with spacing at half the wavelength. (Footnote 6: An exception is Fig. 7, where the service antenna array is configured as a uniform planar array (UPA) with $M=16\times 16$ antennas.) The users are deployed parallel with the ULA at a perpendicular distance of $15$ meters. Each user is equipped with $N_{\textsc{ue}}$ antennas spaced at half the wavelength. The maximum distance between two users is set to be $30$ meters. To ensure a fair comparison of different types of channel models, we normalize the channel gain for each UE, i.e., $\|\mathbf{H}_{k}\|^{2}=N_{\textsc{ue}},\ \forall k$. This normalization does not change intra-user interference, which is the primary focus of this paper.
The wireless environment is assumed to be urban-micro street canyon, and the propagation parameters are determined according to the 3rd Generation Partnership Project (3GPP) technical report [53], as follows: $\beta^{(0)}=0.020$, $\gamma^{(0)}=1.765$, $\beta^{(1)}=0.007$, $\gamma^{(1)}=1.050$, $\kappa=9$ dB for Model 3, and $\kappa\sim\mathcal{LN}(9\ \mathrm{dB},10\ \mathrm{dB})$ for Model 4. The objectives of this section motivate the following three experiments.

Figure 2: The comparison of $\mathrm{cond}(\mathbf{A})$ and $\mathrm{cond}(\mathbf{\Phi})$ in Model 1 and Model 2; $M=256$; $K=8$; $N_{\textsc{ue}}=4$; $\rho=10$ dB. Panels: (a) $\varrho=0$; (b) $\varrho=0.5$; (c) $\varrho=0.8$. $\mathrm{cond}(\mathbf{\Phi})$ is smaller than $\mathrm{cond}(\mathbf{A})$ for both ZF/LMMSE detectors, especially in correlated MIMO channels. The matrices in Model 1 and Model 2 have almost the same condition numbers.

Figure 3: The comparison of $\mathrm{cond}(\mathbf{A})$ and $\mathrm{cond}(\mathbf{\Phi})$ in Model 3 and Model 4; $M=256$; $K=8$; $N_{\textsc{ue}}=4$. $\mathrm{cond}(\mathbf{\Phi})$ is much smaller than $\mathrm{cond}(\mathbf{A})$ for both ZF/LMMSE detectors in the presence of LoS links. The condition number of $\mathbf{\Phi}$ is similar to that of i.i.d. Rayleigh channel at different correlation levels.

Experiment 1: The objective is to demonstrate that the relationship between $\mathrm{cond}(\mathbf{A})$ and $\mathrm{cond}(\mathbf{\Phi})$ is consistent with the theoretical analysis presented in Section IV. The cumulative distribution functions (CDFs) of the condition numbers are shown in Figs. 2 and 3. In these two figures, there are $M=256$ service antennas and $K=8$ UEs, each equipped with $N_{\textsc{ue}}=4$ antennas. In Fig. 2(a), where the channel elements are generated independently, it can be seen that $\mathrm{cond}(\mathbf{\Phi})$ is only slightly smaller than $\mathrm{cond}(\mathbf{A})$ in both Model 1 and Model 2. Note that all the condition numbers in this figure are relatively small, which means that numerous iterative algorithms can achieve fast convergence. However, it is more practical to consider spatial correlations and such results are shown in Figs. 2(b) and 2(c). By comparing these two figures, it can be observed that $\mathbf{\Phi}$ is better conditioned than $\mathbf{A}$, especially in highly correlated MIMO channels. This implies that the advantage of the proposed UW-SVD method will be more evident when the correlation increases.
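A single-realization version of this comparison can be sketched as follows. The sketch is not the code behind Figs. 2 and 3, and it rests on assumptions that should be read as such: the e-channel is taken to be $\mathbf{\Psi}=[\mathbf{U}_1,\ldots,\mathbf{U}_K]$, the concatenation of the left singular vectors of each user's sub-channel (as suggested by the factorization used in Appendix A), with $\mathbf{\Phi}_{\textsc{zf}}=\mathbf{\Psi}^H\mathbf{\Psi}$; only UE-side correlation is modeled, with an assumed exponential form $\varrho^{|i-j|}$; and the function names are hypothetical.

```python
import numpy as np

def exp_corr(n, rho):
    """Exponential correlation matrix with entries rho**|i-j| (assumed form)."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def psd_sqrt(R):
    """Symmetric square root of a real positive semidefinite matrix."""
    w, V = np.linalg.eigh(R)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def cond_pair(M=256, K=8, N_ue=4, rho=0.8, seed=0):
    """Return (cond(A_zf), cond(Phi_zf)) for one correlated Rayleigh draw."""
    rng = np.random.default_rng(seed)
    N = K * N_ue
    Omega = (rng.standard_normal((M, N))
             + 1j * rng.standard_normal((M, N))) / np.sqrt(2 * M)
    H = Omega @ psd_sqrt(np.kron(np.eye(K), exp_corr(N_ue, rho)))
    # User-wise thin SVD H_k = U_k S_k V_k^H; keep the left singular vectors.
    Psi = np.hstack([
        np.linalg.svd(H[:, k * N_ue:(k + 1) * N_ue], full_matrices=False)[0]
        for k in range(K)
    ])
    A_zf = H.conj().T @ H
    Phi_zf = Psi.conj().T @ Psi          # assumed form of the e-channel Gram matrix
    return np.linalg.cond(A_zf), np.linalg.cond(Phi_zf)

for rho in (0.0, 0.5, 0.8):
    print(rho, cond_pair(rho=rho))
```

Since each $\mathbf{U}_k$ has orthonormal columns, the diagonal blocks of this $\mathbf{\Phi}_{\textsc{zf}}$ are identity matrices regardless of $\varrho$, so only the inter-user terms affect its conditioning.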

Figure 4: Convergence comparison between different iterative algorithms in Model 1; $M=256$; $K=8$; $N_{\textsc{ue}}=4$; $16$ QAM. Panels: (a) $\varrho=0$, $\rho=18$ dB; (b) $\varrho=0.5$, $\rho=20.5$ dB; (c) $\varrho=0.8$, $\rho=26.5$ dB. UW-SVD-assisted algorithms can provide faster convergence compared to the corresponding existing algorithms, especially for correlated MIMO channels.

Fig. 3 shows the results in the presence of LoS links, which can make the wireless channel more ill-conditioned. In Model 3 ($\varrho=0$), it can be observed that $\mathrm{cond}(\mathbf{A}_{\textsc{zf}})$ is approximately $60$, meaning that $\mathbf{A}_{\textsc{zf}}$ is ill-conditioned. Moreover, it will become even worse as the spatial correlation becomes higher, e.g., $\mathrm{cond}(\mathbf{A}_{\textsc{zf}})\approx 600$ when $\varrho=0.8$. In addition, $\mathrm{cond}(\mathbf{A}_{\textsc{lmmse}})$ is smaller than $\mathrm{cond}(\mathbf{A}_{\textsc{zf}})$ due to the regularization term. Similar observations can also be found in Model 4. Moreover, $\mathrm{cond}(\mathbf{A})$ in Model 4 can be well-conditioned with a probability of about $0.2$. This is because this model allows the mixture of LoS/NLoS links, and the randomly generated channel matrix could be in a fully NLoS state with a certain probability. This also leads to higher CDF fluctuations for $\mathrm{cond}(\mathbf{A})$ in Model 4 than in Model 3. In contrast, the fluctuations of $\mathrm{cond}(\mathbf{\Phi})$ are very small, and the value of $\mathrm{cond}(\mathbf{\Phi})$ is close to that of i.i.d. Rayleigh fading channels. This implies that UW-SVD-assisted iterative methods can maintain consistently fast convergence even in the presence of increased intra-user interference.

Experiment 2: The objective is to demonstrate that the proposed UW-SVD-assisted iterative algorithms converge faster than the corresponding existing algorithms in (ELAA-)MIMO systems. In this experiment, four figures (i.e., Figs. 4 to 7) are presented to highlight the advantages of the proposed UW-SVD method from different perspectives. In Fig. 4, the convergence comparison between different iterative algorithms at high SNRs is shown. It shows the average symbol error rate (SER) over the iterations in Model 1, considering three correlation levels. For the case $\varrho=0$, it can be seen that the proposed UW-SVD method can slightly accelerate the convergence of existing algorithms. However, it is worth noting that the advantage of the proposed UW-SVD method becomes more apparent as the correlation becomes larger. This is consistent with the numerical results in Experiment 1, and this figure indicates that the proposed UW-SVD method can accelerate current iterative algorithms in conventional massive MIMO channels.

Figure 5: Convergence comparison between UW-SVD-assisted SSOR (red lines) and SSOR (black lines) methods converging to LMMSE detection performance; $M=256$; $K=8$; $N_{\textsc{ue}}=4$. Panels: (a) Model 2, $16$ QAM, $\varrho=0.5$; (b) Model 2, $16$ QAM, $\varrho=0.8$; (c) Model 4, $64$ QAM, $\varrho=0.5$; (d) Model 4, $64$ QAM, $\varrho=0.8$. It is shown that SSOR (UW-SVD) converges faster than SSOR at all different SNR levels.

Figure 6: Convergence comparison between L-BFGS (UW-SVD) and L-BFGS method with channel estimation error; Model 2; $M=256$; $K=8$; $N_{\textsc{ue}}=4$; $16$ QAM; $\varrho=0.2$. Panels: (a) $\varpi=20$ dB; (b) $\varpi=15$ dB; (c) $\varpi=10$ dB. The proposed UW-SVD method can double the convergence speed of L-BFGS method for all three levels of channel estimation error.

In Fig. 5, we aim to demonstrate the advantages of UW-SVD at different SNRs, by using SSOR that converges to LMMSE detection performance as an example. Two ELAA channels (i.e., Model 2 and Model 4) are considered in the figure, each with two correlation factors (i.e., $\varrho=0.5$ and $\varrho=0.8$). As can be seen in each sub-figure, the advantage of UW-SVD diminishes with decreasing SNR. This is consistent with our theoretical analysis in Section IV. However, it is worth noting that UW-SVD-assisted SSOR still converges significantly faster than the original SSOR method even at lower SNRs. For instance, in Fig. 5(b), when the SNR is $16$ dB, the original SSOR method requires approximately $20$ iterations to converge in Model 2 using $16$ QAM. However, with the assistance of UW-SVD, convergence is achieved in just $4$ iterations under the same system configuration. As shown in Figs. 5(c) and 5(d), the original SSOR method requires tens or even hundreds of iterations to achieve the LMMSE detection performance, even in low SNR scenarios. In contrast, the UW-SVD-assisted SSOR method requires fewer than $10$ iterations to converge. This figure implies that the proposed UW-SVD method can accelerate the convergence of iterative algorithms at different SNR levels.

In Fig. 6, the objective is to demonstrate the robustness of the proposed UW-SVD method when channel estimation error is considered. Let us consider the conventional LS channel estimation approach, and the estimated channel matrix is given by [54]

$$\widehat{\mathbf{H}}=\mathbf{H}+\mathbf{Z}, \tag{46}$$

where $\widehat{\mathbf{H}}$ denotes the estimated channel matrix, and $\mathbf{Z}$ is the AWGN matrix. The ratio (denoted by $\varpi$) between the power of channel and noise elements is set to be $20$, $15$, and $10$ dB for the three sub-figures, respectively. The MIMO channel is set to be Model 2 with $\varrho=0.2$ and the modulation scheme is set to be $16$ QAM. It can be observed that the LMMSE detector with channel estimation error can only provide sub-optimal detection performance. Therefore, all the iterative algorithms will only converge to this sub-optimal detection performance. The proposed UW-SVD method consistently accelerates the convergence of the L-BFGS method by a factor of two, irrespective of the level of channel estimation error. More specifically, the UW-SVD-assisted L-BFGS method converges within $10$, $7$, and $5$ iterations for $\varpi=20$ dB, $15$ dB, and $10$ dB, respectively. In contrast, the original L-BFGS method requires $20$, $14$, and $10$ iterations to converge for the same respective levels of $\varpi$.
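The estimation-error model in (46) is straightforward to emulate. The sketch below is a hedged illustration: it interprets $\varpi$ as the ratio between the average per-element channel power and the per-element noise variance, which is an assumption about the exact normalization, and the function name `noisy_estimate` and the test channel are hypothetical.

```python
import numpy as np

def noisy_estimate(H, varpi_db, rng):
    """Return H + Z with AWGN Z scaled so that mean|H|^2 / var(Z) equals varpi_db."""
    p_h = np.mean(np.abs(H) ** 2)                 # average channel element power
    p_z = p_h / 10 ** (varpi_db / 10)             # noise variance per element
    Z = np.sqrt(p_z / 2) * (rng.standard_normal(H.shape)
                            + 1j * rng.standard_normal(H.shape))
    return H + Z

rng = np.random.default_rng(0)
M, N = 256, 32
H = (rng.standard_normal((M, N))
     + 1j * rng.standard_normal((M, N))) / np.sqrt(2 * M)
for varpi_db in (20, 15, 10):
    H_hat = noisy_estimate(H, varpi_db, rng)
    measured = 10 * np.log10(np.mean(np.abs(H) ** 2)
                             / np.mean(np.abs(H_hat - H) ** 2))
    print(varpi_db, round(float(measured), 2))
```

A detector would then be run with `H_hat` in place of `H`, so that both the reference LMMSE solution and the iterative algorithms see the same imperfect channel knowledge.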

Figure 7: Convergence comparison between SSOR (UW-SVD) and SSOR method for UPA configuration; Model 3; $64$ QAM; $\varrho=0.5$.

In Fig. 7, the objective is to show that UW-SVD can accelerate current iterative algorithms in another type of ELAA, i.e., UPA. The UPA is configured with $M=16\times 16$ antennas. The simulation results depicted in Fig. 7 suggest a performance degradation for the LMMSE detector in UPA compared to its performance in ULA. The reason for this is that UPA antennas are more tightly distributed, resulting in higher spatial correlations. In this figure, we utilize the SSOR method that converges to the LMMSE detection performance to demonstrate the advantages of the proposed UW-SVD method. It is noteworthy that the UW-SVD-assisted SSOR method converges faster than the original SSOR method across different SNR levels. For instance, the original SSOR method necessitates over $50$ iterations to converge, and it requires more than $20$ iterations even at relatively low SNR. Conversely, the SSOR method assisted by UW-SVD achieves convergence in only $4$ iterations at all SNR levels.

Figure 8: Convergence comparison of iterative algorithms in ELAA-MIMO systems considering channel coding; $M=256$; $N_{\textsc{ue}}=4$. Panels: (a) Model 3, convolutional code, $16$ QAM, $K=8$; (b) Model 4, polar code, $64$ QAM, $K=16$. The UW-SVD-assisted SSOR and L-BFGS methods achieve convergence ten and five times faster than the original SSOR and L-BFGS methods, respectively.

Experiment 3: The objective of this experiment is to demonstrate that, with channel coding, UW-SVD can still significantly accelerate the convergence of current algorithms. Two coding schemes are considered: a rate-$1/2$ convolutional code with a codeword length of $200$ bits, and a rate-$1/4$ polar code with a codeword length of $1{,}024$ bits. The decoding schemes are the Viterbi decoder and successive cancellation list decoding for the convolutional code and the polar code, respectively. The modulation schemes are $16$ QAM and $64$ QAM for the convolutional and polar codes, respectively. In addition, the performance metric is set to block error rate (BLER) versus Eb/No. As shown in Fig. 8, the performance gap between uncoded and coded systems is approximately $6$ dB for both channel models. In Fig. 8(a), the UW-SVD-assisted SSOR method converges to the LMMSE detection performance in only $2$ iterations, while the SSOR method requires over $15$ iterations to achieve the same level of convergence. In coded MIMO systems, the improvements achieved by UW-SVD for SSOR remain comparable to those observed in uncoded MIMO systems. Moreover, as shown in Fig. 8(b), the UW-SVD-assisted L-BFGS method can achieve ZF detection performance within three iterations, while the standard L-BFGS algorithm requires over $15$ iterations to achieve the same level of performance. Together with the results in Experiment 2, it can be claimed that UW-SVD can significantly accelerate the convergence of current algorithms by up to ten times, in both uncoded and coded MIMO systems.

VI Conclusion

In this paper, we propose the UW-SVD method to accelerate the convergence of current iterative algorithms for spatially correlated (ELAA-)MIMO channels. The results demonstrate that the UW-SVD-assisted algorithms converge up to ten times faster than the corresponding current algorithms in both coded and uncoded systems. The core principle is to perform SVD on each user’s sub-channel matrix, transforming the original MIMO signal model into an e-signal model. For this e-signal model, we develop e-ZF and e-LMMSE detectors whose detection performance is proven to be equivalent to that of the ZF and LMMSE detectors for the original model. Crucially, it is shown that the e-channel matrix has a significantly better condition number than the original MIMO channel matrix in the presence of channel spatial correlation, spatial non-stationarity, or both. By applying current iterative algorithms to iteratively invert the better-conditioned e-channel matrix, followed by a post-processing step to recover the transmitted signals, remarkable convergence acceleration is achieved.
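The principle summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper name `uw_svd`, the toy sizes, the exponential intra-user correlation with coefficient `r`, and the noiseless setting are all assumptions made for illustration. The decomposition $\mathbf{H}=\mathbf{\Psi}\mathbf{\Sigma}\mathbf{V}^{H}$ follows the form used in (49).

```python
import numpy as np

def uw_svd(H, n_ue):
    """SVD of each user's sub-channel, H_k = U_k Sigma_k V_k^H (n_ue columns per user)."""
    M, N = H.shape
    Psi = np.zeros((M, N), dtype=complex)    # e-channel [U_1, ..., U_K]
    sigma = np.zeros(N)                      # diagonal of Sigma
    V = np.zeros((N, N), dtype=complex)      # block-diagonal diag(V_1, ..., V_K)
    for start in range(0, N, n_ue):
        cols = slice(start, start + n_ue)
        U_k, s_k, Vh_k = np.linalg.svd(H[:, cols], full_matrices=False)
        Psi[:, cols], sigma[cols], V[cols, cols] = U_k, s_k, Vh_k.conj().T
    return Psi, sigma, V

rng = np.random.default_rng(0)
M, K, n_ue, r = 64, 4, 4, 0.9                # toy sizes and correlation (assumed)
N = K * n_ue

# Intra-user exponential correlation R[i, j] = r^|i-j| and its symmetric square root.
idx = np.arange(n_ue)
R = r ** np.abs(idx[:, None] - idx[None, :])
w, Q = np.linalg.eigh(R)
sqrt_R = (Q * np.sqrt(w)) @ Q.T

def iid_block():
    """M x n_ue block with i.i.d. CN(0, 1/M) entries (normalization assumed)."""
    G = rng.standard_normal((M, n_ue)) + 1j * rng.standard_normal((M, n_ue))
    return G / np.sqrt(2 * M)

# H = [H_1, ..., H_K] with H_k = Omega_k sqrt(R).
H = np.hstack([iid_block() @ sqrt_R for _ in range(K)])

Psi, sigma, V = uw_svd(H, n_ue)
cond_H = np.linalg.cond(H.conj().T @ H)        # Gram matrix of the original channel
cond_e = np.linalg.cond(Psi.conj().T @ Psi)    # Gram matrix of the e-channel

# Noiseless e-ZF: recover the e-signal s = Sigma V^H x, then post-process to x.
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
y = H @ x
s_hat = np.linalg.solve(Psi.conj().T @ Psi, Psi.conj().T @ y)
x_hat = V @ (s_hat / sigma)
print(f"cond(A_zf) = {cond_H:.1f}, cond(Phi_zf) = {cond_e:.1f}")
```

In this sketch a direct solve stands in for the iterative algorithm; an iterative solver would be applied to the same better-conditioned system before the post-processing step.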

Appendix A Proof of Theorem 1

According to Property 1, it is straightforward that

$$\lim\limits_{M\rightarrow\infty}\mathbf{\Sigma}_{k}=\mathbf{I},\quad\forall k. \tag{47}$$

Hence, we have

$$\lim\limits_{M\rightarrow\infty}\mathbf{\Sigma}=\mathrm{diag}(\mathbf{\Sigma}_{1},\ldots,\mathbf{\Sigma}_{K})=\mathbf{I}. \tag{48}$$

Plugging (11) into $\mathbf{A}_{\textsc{zf}}$ yields

$$\mathbf{A}_{\textsc{zf}}=\mathbf{V}\mathbf{\Sigma}\mathbf{\Psi}^{H}\mathbf{\Psi}\mathbf{\Sigma}\mathbf{V}^{H}. \tag{49}$$

Plugging (48) into (49) yields

$$\lim\limits_{M\rightarrow\infty}\mathbf{A}_{\textsc{zf}}=\mathbf{V}\mathbf{\Psi}^{H}\mathbf{\Psi}\mathbf{V}^{H}. \tag{50}$$

Plugging $\mathbf{\Phi}_{\textsc{zf}}=\mathbf{\Psi}^{H}\mathbf{\Psi}$ into (50) yields

$$\lim\limits_{M\rightarrow\infty}\mathbf{A}_{\textsc{zf}}=\mathbf{V}\mathbf{\Phi}_{\textsc{zf}}\mathbf{V}^{H}. \tag{51}$$

Given that $\mathbf{V}$ is a unitary matrix, it does not change the condition number of the matrix being multiplied. Hence, (30) in Theorem 1 is proved. Similarly, plugging (11) and (48) into $\mathbf{A}_{\textsc{lmmse}}$ yields

$$\begin{aligned}
\lim\limits_{M\rightarrow\infty}\mathbf{A}_{\textsc{lmmse}} &= \mathbf{V}\mathbf{\Psi}^{H}\mathbf{\Psi}\mathbf{V}^{H}+\rho^{-1}\mathbf{I}, \\
&= \mathbf{V}(\mathbf{\Psi}^{H}\mathbf{\Psi}+\rho^{-1}\mathbf{I})\mathbf{V}^{H}.
\end{aligned} \tag{52}$$

According to (48), $\mathbf{\Phi}_{\textsc{lmmse}}$ in (18) can be expressed as follows

$$\lim\limits_{M\rightarrow\infty}\mathbf{\Phi}_{\textsc{lmmse}}=\mathbf{\Psi}^{H}\mathbf{\Psi}+\rho^{-1}\mathbf{I}. \tag{53}$$

Plugging (53) into (52) yields

$$\lim\limits_{M\rightarrow\infty}\mathbf{A}_{\textsc{lmmse}}=\mathbf{V}\mathbf{\Phi}_{\textsc{lmmse}}\mathbf{V}^{H}. \tag{54}$$

Together with (51), Theorem 1 is proved.
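The step used in both parts of this proof, namely that a unitary $\mathbf{V}$ leaves the condition number of $\mathbf{V}\mathbf{\Phi}\mathbf{V}^{H}$ equal to that of $\mathbf{\Phi}$, can be checked numerically. The following is a minimal sketch with assumed toy dimensions; `random_unitary` is a hypothetical helper, and $\mathbf{\Phi}$ is simply a random Hermitian positive-definite matrix rather than an actual e-channel Gram matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n_ue = 8, 2                               # toy sizes (assumed)

def random_unitary(n, rng):
    """Unitary matrix obtained from the QR decomposition of a complex Gaussian matrix."""
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Q, _ = np.linalg.qr(Z)
    return Q

# Block-diagonal unitary V = diag(V_1, ..., V_K), mirroring the structure of V in (49).
V = np.zeros((N, N), dtype=complex)
for start in range(0, N, n_ue):
    blk = slice(start, start + n_ue)
    V[blk, blk] = random_unitary(n_ue, rng)

# Random Hermitian positive-definite stand-in for Phi.
B = rng.standard_normal((32, N)) + 1j * rng.standard_normal((32, N))
Phi = B.conj().T @ B

A = V @ Phi @ V.conj().T
cond_phi = np.linalg.cond(Phi)
cond_a = np.linalg.cond(A)
print(cond_phi, cond_a)                      # the two values agree up to rounding
```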

Appendix B Proof of Lemma 1

According to assumption A1, the correlation matrix $\mathbf{R}_{\textsc{ue}}$ is a block diagonal matrix. This indicates that $\sqrt{\mathbf{R}_{\textsc{ue}}}$ is also a block diagonal matrix, and it can be expressed as follows

$$\sqrt{\mathbf{R}_{\textsc{ue}}}=\mathrm{diag}\bigg(\sqrt{\mathbf{R}_{\textsc{ue}}^{1,1}},\dots,\sqrt{\mathbf{R}_{\textsc{ue}}^{K,K}}\bigg). \tag{55}$$

Hence, the sub-channel matrix of the $k^{th}$ user can be expressed as $\mathbf{H}_{k}=\mathbf{\Omega}_{k}\sqrt{\mathbf{R}_{\textsc{ue}}^{k,k}}$, resulting in

$$\mathbf{H}_{k}^{H}\mathbf{H}_{j}=\sqrt{\mathbf{R}_{\textsc{ue}}^{k,k}}\mathbf{\Omega}_{k}^{H}\mathbf{\Omega}_{j}\sqrt{\mathbf{R}_{\textsc{ue}}^{j,j}}, \tag{56}$$

where $\mathbf{\Omega}_{k}\in\mathbb{C}^{M\times N_{k}}$ is an i.i.d. Rayleigh distributed matrix. According to Property 1, we have the following

$$\lim\limits_{M\rightarrow\infty}\mathbf{\Omega}_{k}^{H}\mathbf{\Omega}_{j}=\mathbf{0},\quad\forall k\neq j. \tag{57}$$

Applying (57) to (56), Lemma 1 is therefore obtained.
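The asymptotic orthogonality in (57) can be observed numerically. The sketch below assumes that the entries of $\mathbf{\Omega}_{k}$ are i.i.d. $\mathcal{CN}(0,1/M)$, i.e., normalized so that $\mathbf{\Omega}_{k}^{H}\mathbf{\Omega}_{k}$ approaches $\mathbf{I}$; this normalization and the helper name `iid_rayleigh` are assumptions for illustration, since Property 1 is stated outside this appendix.

```python
import numpy as np

rng = np.random.default_rng(1)
n_ue = 4                                     # columns per user (assumed)

def iid_rayleigh(M, n, rng):
    """M x n matrix with i.i.d. CN(0, 1/M) entries (normalization is an assumption)."""
    G = rng.standard_normal((M, n)) + 1j * rng.standard_normal((M, n))
    return G / np.sqrt(2 * M)

cross_norms = {}
for M in (64, 1024, 16384):
    Omega_k = iid_rayleigh(M, n_ue, rng)
    Omega_j = iid_rayleigh(M, n_ue, rng)
    # Frobenius norm of the inter-user cross-term in (56)-(57).
    cross_norms[M] = np.linalg.norm(Omega_k.conj().T @ Omega_j)
    print(M, cross_norms[M])
```

Under this normalization the Frobenius norm of the cross-term shrinks roughly as $N_{k}/\sqrt{M}$, so the inter-user blocks vanish only in the large-$M$ limit and remain small but nonzero for finite arrays.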

Appendix C Proof of Theorem 2

Plugging (9) into $\mathbf{A}_{\textsc{zf}}$ yields

$$\mathbf{A}_{\textsc{zf}}=[\mathbf{H}_{1}^{H},\dots,\mathbf{H}_{K}^{H}]^{T}[\mathbf{H}_{1},\dots,\mathbf{H}_{K}]. \tag{58}$$

According to A2, all the off-diagonal blocks of $\mathbf{A}_{\textsc{zf}}$ are $\mathbf{0}$. Hence, we have the following

$$\mathbf{A}_{\textsc{zf}}=\mathrm{diag}(\mathbf{H}_{1}^{H}\mathbf{H}_{1},\dots,\mathbf{H}_{K}^{H}\mathbf{H}_{K}), \tag{59}$$

which indicates that $\mathbf{A}_{\textsc{zf}}$ is a block diagonal matrix. Therefore, $\mathrm{cond}(\mathbf{A}_{\textsc{zf}})$ cannot be smaller than the condition number of any of its blocks, i.e.,

$$\mathrm{cond}(\mathbf{A}_{\textsc{zf}})\geq\max\{\mathrm{cond}(\mathbf{H}_{k}^{H}\mathbf{H}_{k})\}. \tag{60}$$

Since the intra-user channel columns are correlated, it is clear that $\mathrm{cond}(\mathbf{H}_{k}^{H}\mathbf{H}_{k})>1$, and we have the following

$$\mathrm{cond}(\mathbf{A}_{\textsc{zf}})>1. \tag{61}$$

Also, given A1, performing SVD on $\mathbf{H}_{k}$ and $\mathbf{H}_{j}$ yields

$$\mathbf{H}_{k}^{H}\mathbf{H}_{j}=\mathbf{V}_{k}\mathbf{\Sigma}_{k}\mathbf{U}_{k}^{H}\mathbf{U}_{j}\mathbf{\Sigma}_{j}\mathbf{V}_{j}^{H}=\mathbf{0},\quad\forall k\neq j. \tag{62}$$

Left-multiplying (62) by $\mathbf{\Sigma}_{k}^{-1}\mathbf{V}_{k}^{H}$ and right-multiplying it by $\mathbf{V}_{j}\mathbf{\Sigma}_{j}^{-1}$ yields

$$\mathbf{U}_{k}^{H}\mathbf{U}_{j}=\mathbf{0},\quad\forall k\neq j. \tag{63}$$

Similar to that of $\mathbf{A}_{\textsc{zf}}$, i.e., (59), $\mathbf{\Phi}_{\textsc{zf}}$ can also be expressed as follows

$$\mathbf{\Phi}_{\textsc{zf}}=\mathrm{diag}(\mathbf{U}_{1}^{H}\mathbf{U}_{1},\dots,\mathbf{U}_{K}^{H}\mathbf{U}_{K}), \tag{64}$$

which indicates that $\mathbf{\Phi}_{\textsc{zf}}=\mathbf{I}$ with condition number $1$, since $\mathbf{U}_{k},\forall k$, is a unitary matrix. Together with (61), (37) in Theorem 2 is therefore obtained.
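The conclusion of this proof can be reproduced on a toy channel. The sketch below builds sub-channels with mutually orthogonal column spaces, so that the off-diagonal blocks of $\mathbf{A}_{\textsc{zf}}$ vanish, and with correlated intra-user columns. The sizes, the exponential correlation with coefficient `r`, and the QR-based construction are illustrative assumptions, not the channel models used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(2)
M, K, n_ue, r = 32, 3, 4, 0.9                # toy sizes and correlation (assumed)
N = K * n_ue

# Orthonormal M x N basis split into K blocks, so that H_k^H H_j = 0 for k != j.
Z = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
Q, _ = np.linalg.qr(Z)

# Intra-user exponential correlation and its symmetric square root.
idx = np.arange(n_ue)
R = r ** np.abs(idx[:, None] - idx[None, :])
w, E = np.linalg.eigh(R)
sqrt_R = (E * np.sqrt(w)) @ E.T

H_blocks = [Q[:, k * n_ue:(k + 1) * n_ue] @ sqrt_R for k in range(K)]
H = np.hstack(H_blocks)

# e-channel: left singular vectors of each user's sub-channel, stacked column-wise.
Psi = np.hstack([np.linalg.svd(H_k, full_matrices=False)[0] for H_k in H_blocks])

A_zf = H.conj().T @ H                        # block diagonal, as in (59)
Phi_zf = Psi.conj().T @ Psi                  # identity, as in (64)
cond_A, cond_Phi = np.linalg.cond(A_zf), np.linalg.cond(Phi_zf)
print(f"cond(A_zf) = {cond_A:.2f}, cond(Phi_zf) = {cond_Phi:.2f}")
```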

Appendix D Proof of Theorem 3

According to (3), $\mathbf{A}_{\textsc{lmmse}}$ can be expressed as follows

$$\mathbf{A}_{\textsc{lmmse}}=\mathbf{A}_{\textsc{zf}}+\rho^{-1}\mathbf{I}. \tag{65}$$

Therefore, $\mathrm{cond}(\mathbf{A}_{\textsc{lmmse}})$ can be expressed as follows

$$\mathrm{cond}(\mathbf{A}_{\textsc{lmmse}})=\Bigg(\dfrac{\lambda_{\text{max}}(\mathbf{A}_{\textsc{zf}})+\rho^{-1}}{\lambda_{\text{min}}(\mathbf{A}_{\textsc{zf}})+\rho^{-1}}\Bigg). \tag{66}$$

According to (18), $\mathbf{\Phi}_{\textsc{lmmse}}$ can be expressed as follows

$$\mathbf{\Phi}_{\textsc{lmmse}}=\mathbf{\Phi}_{\textsc{zf}}+\rho^{-1}\mathbf{\Sigma}^{-2}. \tag{67}$$

According to (64) in the proof of Theorem 2, $\mathbf{\Phi}_{\textsc{zf}}=\mathbf{I}$, and we have the following

$$\mathrm{cond}(\mathbf{\Phi}_{\textsc{lmmse}})=\Bigg(\dfrac{1+\rho^{-1}\lambda_{\text{max}}(\mathbf{\Sigma}^{-2})}{1+\rho^{-1}\lambda_{\text{min}}(\mathbf{\Sigma}^{-2})}\Bigg). \tag{68}$$

According to (59), $\mathbf{A}_{\textsc{zf}}$ is a block diagonal matrix. Moreover, $\mathbf{\Sigma}$ contains the singular values of $\mathbf{H}_{k},\forall k$, so that $\mathbf{\Sigma}^{2}$ contains the eigenvalues of every block in $\mathbf{A}_{\textsc{zf}}$. Hence, we have the following

$$\lambda_{\text{max}}(\mathbf{\Sigma}^{-2})=\lambda_{\text{min}}(\mathbf{A}_{\textsc{zf}})^{-1}; \tag{69}$$
$$\lambda_{\text{min}}(\mathbf{\Sigma}^{-2})=\lambda_{\text{max}}(\mathbf{A}_{\textsc{zf}})^{-1}. \tag{70}$$

Plugging (69) and (70) into (68) and simplifying yields

$$\mathrm{cond}(\mathbf{\Phi}_{\textsc{lmmse}})=\Bigg(\dfrac{\lambda_{\text{max}}(\mathbf{A}_{\textsc{zf}})}{\lambda_{\text{min}}(\mathbf{A}_{\textsc{zf}})}\Bigg)\Bigg(\dfrac{\lambda_{\text{min}}(\mathbf{A}_{\textsc{zf}})+\rho^{-1}}{\lambda_{\text{max}}(\mathbf{A}_{\textsc{zf}})+\rho^{-1}}\Bigg). \tag{71}$$
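For completeness, the simplification from (68) to (71) can be spelled out as follows. This is a worked restatement of the substitution of (69) and (70) into (68), using only quantities already defined in this appendix.

```latex
\begin{aligned}
\mathrm{cond}(\mathbf{\Phi}_{\textsc{lmmse}})
&= \frac{1+\rho^{-1}\lambda_{\text{min}}(\mathbf{A}_{\textsc{zf}})^{-1}}
        {1+\rho^{-1}\lambda_{\text{max}}(\mathbf{A}_{\textsc{zf}})^{-1}} \\
&= \frac{\big(\lambda_{\text{min}}(\mathbf{A}_{\textsc{zf}})+\rho^{-1}\big)/\lambda_{\text{min}}(\mathbf{A}_{\textsc{zf}})}
        {\big(\lambda_{\text{max}}(\mathbf{A}_{\textsc{zf}})+\rho^{-1}\big)/\lambda_{\text{max}}(\mathbf{A}_{\textsc{zf}})} \\
&= \frac{\lambda_{\text{max}}(\mathbf{A}_{\textsc{zf}})}{\lambda_{\text{min}}(\mathbf{A}_{\textsc{zf}})}
   \cdot
   \frac{\lambda_{\text{min}}(\mathbf{A}_{\textsc{zf}})+\rho^{-1}}{\lambda_{\text{max}}(\mathbf{A}_{\textsc{zf}})+\rho^{-1}}.
\end{aligned}
```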

It is obvious that the first factor in (71) is $\mathrm{cond}(\mathbf{A}_{\textsc{zf}})$. Moreover, according to (66), the second factor is the reciprocal of $\mathrm{cond}(\mathbf{A}_{\textsc{lmmse}})$, so that $\mathrm{cond}(\mathbf{\Phi}_{\textsc{lmmse}})$ in (71) can be expressed as follows

$$\mathrm{cond}(\mathbf{\Phi}_{\textsc{lmmse}})=\dfrac{\mathrm{cond}(\mathbf{A}_{\textsc{zf}})}{\mathrm{cond}(\mathbf{A}_{\textsc{lmmse}})}. \tag{72}$$

To obtain the condition under which (38) in Theorem 3 holds, plugging (72) into (38) yields

$$\mathrm{cond}(\mathbf{A}_{\textsc{zf}})<\mathrm{cond}^{2}(\mathbf{A}_{\textsc{lmmse}}). \tag{73}$$

Plugging (66) into (73) yields

$$\Bigg(\dfrac{\lambda_{\text{max}}(\mathbf{A}_{\textsc{zf}})}{\lambda_{\text{min}}(\mathbf{A}_{\textsc{zf}})}\Bigg)<\Bigg(\dfrac{\lambda_{\text{max}}(\mathbf{A}_{\textsc{zf}})+\rho^{-1}}{\lambda_{\text{min}}(\mathbf{A}_{\textsc{zf}})+\rho^{-1}}\Bigg)^{2}. \tag{74}$$

After simplification, this inequality holds if

$$\rho>\dfrac{1}{\sqrt{\lambda_{\text{max}}(\mathbf{A}_{\textsc{zf}})\lambda_{\text{min}}(\mathbf{A}_{\textsc{zf}})}}. \tag{75}$$
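The algebra between (74) and (75) can be written out explicitly. The shorthand $a$, $b$, and $c$ below is introduced here only for readability: $a=\lambda_{\text{max}}(\mathbf{A}_{\textsc{zf}})$, $b=\lambda_{\text{min}}(\mathbf{A}_{\textsc{zf}})$, and $c=\rho^{-1}$, with $a>b>0$ by (61) and $c>0$.

```latex
\begin{aligned}
\frac{a}{b} < \left(\frac{a+c}{b+c}\right)^{2}
&\iff a(b+c)^{2} < b(a+c)^{2} \\
&\iff ab^{2} + ac^{2} < a^{2}b + bc^{2} \\
&\iff c^{2}(a-b) < ab(a-b) \\
&\iff c < \sqrt{ab}
 \iff \rho > \frac{1}{\sqrt{ab}}.
\end{aligned}
```

The second line follows by expanding both squares and cancelling the common term $2abc$; the fourth line divides by $a-b>0$ and takes the positive square root.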

Given $\sigma_{x}^{2}=1$, we have $\rho=1/\sigma_{z}^{2}$; plugging this into (75), (39) in Theorem 3 is therefore obtained.
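The relation (72) and the threshold (75) can be verified numerically on a toy channel whose sub-channels have mutually orthogonal column spaces. The sketch below is illustrative only: the sizes, the exponential intra-user correlation with coefficient `r`, the QR-based construction, and the helper name `conds` are assumptions rather than part of the original derivation.

```python
import numpy as np

rng = np.random.default_rng(3)
M, K, n_ue, r = 32, 2, 4, 0.9                # toy sizes and correlation (assumed)
N = K * n_ue

# Sub-channels with orthogonal column spaces and correlated intra-user columns.
Z = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
Q, _ = np.linalg.qr(Z)
idx = np.arange(n_ue)
R = r ** np.abs(idx[:, None] - idx[None, :])
w, E = np.linalg.eigh(R)
sqrt_R = (E * np.sqrt(w)) @ E.T
H_blocks = [Q[:, k * n_ue:(k + 1) * n_ue] @ sqrt_R for k in range(K)]
H = np.hstack(H_blocks)

# User-wise SVD: e-channel Psi and singular values on the diagonal of Sigma.
svds = [np.linalg.svd(H_k, full_matrices=False) for H_k in H_blocks]
Psi = np.hstack([U for U, _, _ in svds])
sigma = np.concatenate([s for _, s, _ in svds])

A_zf = H.conj().T @ H
lam = np.linalg.eigvalsh(A_zf)
rho_th = 1.0 / np.sqrt(lam.max() * lam.min())      # threshold in (75)
cond_zf = np.linalg.cond(A_zf)

def conds(rho):
    """Condition numbers of A_lmmse in (65) and Phi_lmmse in (67) for a given rho."""
    A_lmmse = A_zf + np.eye(N) / rho
    Phi_lmmse = Psi.conj().T @ Psi + np.diag(sigma ** -2.0) / rho
    return np.linalg.cond(A_lmmse), np.linalg.cond(Phi_lmmse)

for rho in (0.1 * rho_th, 10.0 * rho_th):
    c_A, c_Phi = conds(rho)
    print(f"rho/rho_th = {rho / rho_th:4.1f}: cond(A_lmmse) = {c_A:.2f}, "
          f"cond(Phi_lmmse) = {c_Phi:.2f}, cond(A_zf)/cond(A_lmmse) = {cond_zf / c_A:.2f}")
```

In this setting, $\mathbf{\Phi}_{\textsc{lmmse}}$ is better conditioned than $\mathbf{A}_{\textsc{lmmse}}$ only when $\rho$ exceeds the threshold, which matches the role of condition (39) in Theorem 3.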

References

  • [1] J. Liu, Y. Ma, and R. Tafazolli, “Leveraging user-wise SVD for accelerated convergence in iterative ELAA-MIMO detections,” in Proc. IEEE 24th Int. Workshop Signal Process. Advances Wireless Commun. (SPAWC), 2023.
  • [2] E. Björnson, J. Hoydis, and L. Sanguinetti, “Massive MIMO networks: Spectral, energy, and hardware efficiency,” Foundations and Trends® Signal Process., vol. 11, no. 3-4, pp. 154–655, Nov. 2017.
  • [3] R. Ji, S. Chen, C. Huang, J. Yang, W. E. I. Sha, Z. Zhang, C. Yuen, and M. Debbah, “Extra DoF of near-field holographic MIMO communications leveraging evanescent waves,” IEEE Wireless Commun. Lett., vol. 12, no. 4, pp. 580–584, Jan. 2023.
  • [4] M. Cui, Z. Wu, Y. Lu, X. Wei, and L. Dai, “Near-field MIMO communications for 6G: Fundamentals, challenges, potentials, and future directions,” IEEE Commun. Mag., vol. 61, no. 1, pp. 40–46, Jan. 2023.
  • [5] C. Ouyang, Y. Liu, X. Zhang, and L. Hanzo, “Near-field communications: A degree-of-freedom perspective,” arXiv: 2308.00362, Aug. 2023.
  • [6] Z. Wu and L. Dai, “Multiple access for near-field communications: SDMA or LDMA?” IEEE J. Sel. Areas Commun., vol. 41, no. 6, pp. 1918–1935, Jun. 2023.
  • [7] D. Dardari, “Communicating with large intelligent surfaces: Fundamental limits and models,” IEEE J. Sel. Areas Commun., vol. 38, no. 11, pp. 2526–2537, Jul. 2020.
  • [8] S. Loyka, “Channel capacity of MIMO architecture using the exponential correlation matrix,” IEEE Commun. Lett., vol. 5, no. 9, pp. 369–371, Sep. 2001.
  • [9] A. Elzanaty, J. Liu, A. Guerra, F. Guidi, Y. Ma, and R. Tafazolli, “Near and far field model mismatch: Implications on 6G communications, localization, and sensing,” arXiv:2310.06604, Oct. 2023.
  • [10] E. Björnson, L. Sanguinetti, H. Wymeersch, J. Hoydis, and T. L. Marzetta, “Massive MIMO is a reality–What is next? Five promising research directions for antenna arrays,” Digit. Signal Process., vol. 94, pp. 3–20, Nov. 2019.
  • [11] M.-X. Chang and W.-Y. Chang, “Maximum-likelihood detection for MIMO systems based on differential metrics,” IEEE Trans. Signal Process., vol. 65, no. 14, pp. 3718–3732, Jul. 2017.
  • [12] Z. Wang, R. M. Gower, Y. Xia, L. He, and Y. Huang, “Randomized iterative methods for low-complexity large-scale MIMO detection,” IEEE Trans. Signal Process., vol. 70, pp. 2934–2949, Jun. 2022.
  • [13] Z. Wang, W. Xu, Y. Xia, Q. Shi, and Y. Huang, “A new randomized iterative detection algorithm for uplink large-scale MIMO systems,” IEEE Trans. Commun., vol. 71, no. 9, pp. 5093–5107, Apr. 2023.
  • [14] X. Gao, L. Dai, Y. Ma, and Z. Wang, “Low-complexity near-optimal signal detection for uplink large-scale MIMO systems,” Electron. Lett., vol. 50, no. 18, pp. 1326–1328, Aug. 2014.
  • [15] D. Zhu, B. Li, and P. Liang, “On the matrix inversion approximation based on Neumann series in massive MIMO systems,” in Proc. IEEE Int. Conf. Commun. (ICC), 2015, pp. 1763–1769.
  • [16] S. Yang and L. Hanzo, “Fifty years of MIMO detection: The road to large-scale MIMOs,” IEEE Commun. Surveys Tuts., vol. 17, no. 4, pp. 1941–1988, 4th Quart. 2015.
  • [17] M. A. Albreem, M. Juntti, and S. Shahabuddin, “Massive MIMO detection techniques: A survey,” IEEE Commun. Surveys Tuts., vol. 21, no. 4, pp. 3109–3132, 4th Quart. 2019.
  • [18] O. Axelsson, “A survey of preconditioned iterative methods for linear systems of algebraic equations,” BIT Numer. Math., vol. 25, pp. 165–187, Mar. 1985.
  • [19] L. Li and J. Hu, “An efficient linear detection scheme based on L-BFGS method for massive MIMO systems,” IEEE Commun. Lett., vol. 26, no. 1, pp. 138–142, Oct. 2022.
  • [20] X. Qin, Z. Yan, and G. He, “A near-optimal detection scheme based on joint steepest descent and Jacobi method for uplink massive MIMO systems,” IEEE Commun. Lett., vol. 20, no. 2, pp. 276–279, Feb. 2016.
  • [21] B. Yin, M. Wu, J. R. Cavallaro, and C. Studer, “Conjugate gradient-based soft-output detection and precoding in massive MIMO systems,” in Proc. IEEE Global Commun. Conf. (GLOBECOM), 2014, pp. 3696–3701.
  • [22] L. Liu, G. Peng, P. Wang, S. Zhou, Q. Wei, S. Yin, and S. Wei, “Energy- and area-efficient recursive-conjugate-gradient-based MMSE detector for massive MIMO systems,” IEEE Trans. Signal Process., vol. 68, pp. 573–588, Jan. 2020.
  • [23] J. Wang, Y. Ma, N. Yi, and R. Tafazolli, “Sherman-Morrison regularization for ELAA iterative linear precoding,” in Proc. IEEE Int. Conf. Commun. (ICC), 2023, pp. 3546–3552.
  • [24] L. Li and J. Hu, “Fast-converging and low-complexity linear massive MIMO detection with L-BFGS method,” IEEE Trans. Veh. Technol., vol. 71, no. 10, pp. 10 656–10 665, Oct. 2022.
  • [25] D. L. Donoho, A. Maleki, and A. Montanari, “Message-passing algorithms for compressed sensing,” Proc. Natl. Acad. Sci., vol. 106, no. 45, pp. 18 914–18 919, 2009.
  • [26] S. Lyu and C. Ling, “Hybrid vector perturbation precoding: The blessing of approximate message passing,” IEEE Trans. Signal Process., vol. 67, no. 1, pp. 178–193, Oct. 2019.
  • [27] J. Liu, Y. Ma, and R. Tafazolli, “Alternative normalized-preconditioning for scalable iterative large-MIMO detection,” in Proc. IEEE Global Commun. Conf. (GLOBECOM), 2023, pp. 2924–2929.
  • [28] J. Ma and L. Ping, “Orthogonal AMP,” IEEE Access, vol. 5, pp. 2020–2033, Mar. 2017.
  • [29] S. Rangan, P. Schniter, and A. K. Fletcher, “Vector approximate message passing,” IEEE Trans. Inf. Theory, vol. 65, no. 10, pp. 6664–6684, May 2019.
  • [30] H. He, C.-K. Wen, S. Jin, and G. Y. Li, “A model-driven deep learning network for MIMO detection,” in Proc. IEEE Global Conf. Signal Inf. Process. (GlobalSIP), 2018, pp. 584–588.
  • [31] B. Y. Kong and I.-C. Park, “Low-complexity symbol detection for massive MIMO uplink based on Jacobi method,” in Proc. IEEE 27th Annu. Int. Symp. Personal, Indoor, Mobile Radio Commun. (PIMRC), 2016, pp. 1–5.
  • [32] C. Zhang, Z. Wu, C. Studer, Z. Zhang, and X. You, “Efficient soft-output Gauss–Seidel data detector for massive MIMO systems,” IEEE Trans. Circuits Syst. I, vol. 68, no. 12, pp. 5049–5060, Dec. 2021.
  • [33] T. Xie, L. Dai, X. Gao, X. Dai, and Y. Zhao, “Low-complexity SSOR-based precoding for massive MIMO systems,” IEEE Commun. Lett., vol. 20, no. 4, pp. 744–747, Apr. 2016.
  • [34] J. Nam, G. Caire, and J. Ha, “On the role of transmit correlation diversity in multiuser MIMO systems,” IEEE Trans. Inf. Theory, vol. 63, no. 1, pp. 336–354, Oct. 2017.
  • [35] J. Liu, Y. Ma, J. Wang, N. Yi, R. Tafazolli, S. Xue, and F. Wang, “A non-stationary channel model with correlated NLoS/LoS states for ELAA-mMIMO,” in Proc. IEEE Global Commun. Conf. (GLOBECOM), 2021, pp. 1–6.
  • [36] J. Liu, Y. Ma, and R. Tafazolli, “A spatially non-stationary fading channel model for simulation and (semi-) analytical study of ELAA-MIMO,” IEEE Trans. Wireless Commun., vol. 23, no. 5, pp. 5203–5218, May 2024.
  • [37] Y. Zhang, C. You, L. Chen, and B. Zheng, “Mixed near- and far-field communications for extremely large-scale array: An interference perspective,” IEEE Commun. Lett., vol. 27, no. 9, pp. 2496–2500, Jul. 2023.
  • [38] Y. Lu and L. Dai, “Near-field channel estimation in mixed LoS/NLoS environments for extremely large-scale MIMO systems,” IEEE Trans. Commun., vol. 71, no. 6, pp. 3694–3707, Jun. 2023.
  • [39] Á. O. Martínez, E. De Carvalho, and J. Ø. Nielsen, “Towards very large aperture massive MIMO: A measurement based study,” in Proc. IEEE GLOBECOM Workshops (GC Wkshps), 2014, pp. 281–286.
  • [40] P. Harris, S. Zhang, M. A. Beach, E. Mellios, A. R. Nix, S. Armour, A. Doufexi, K. Nieman, and N. Kundargi, “LOS throughput measurements in real-time with a 128-antenna massive MIMO testbed,” in Proc. IEEE Global Commun. Conf. (GLOBECOM), 2016, pp. 1–7.
  • [41] Y. Yuan, C. Wang, C. Li, Z. Zhong, W. Han, and C.-X. Wang, “Spatial correlations of measured MIMO channels with an extremely large aperture array (ELAA),” in Proc. IEEE 95th Veh. Technol. Conf. (VTC), 2022, pp. 1–5.
  • [42] A. Karstensen, J. O. Nielsen, P. C. F. Eggers, E. De Carvalho, G. F. Pedersen, M. Alm, and G. Steinböck, “Multiuser spatial consistency analysis of outdoor massive-MIMO measurements,” IEEE Trans. Antennas Propag., vol. 70, no. 1, pp. 680–691, Jan. 2022.
  • [43] H. Lou, M. Ghosh, P. Xia, and R. Olesen, “A comparison of implicit and explicit channel feedback methods for MU-MIMO WLAN systems,” in Proc. IEEE 24th Annu. Int. Symp. Personal, Indoor, Mobile Radio Commun. (PIMRC), 2013, pp. 419–424.
  • [44] A. Li and C. Masouros, “Hybrid precoding and combining design for millimeter-wave multi-user MIMO based on SVD,” in Proc. IEEE Int. Conf. Commun. (ICC), 2017, pp. 1–6.
  • [45] L. Khamidullina, A. L. F. de Almeida, and M. Haardt, “Multilinear generalized singular value decomposition (ML-GSVD) and its application to multiuser MIMO systems,” IEEE Trans. Signal Process., vol. 70, pp. 2783–2797, May 2022.
  • [46] J. Liu, Y. Ma, A. Elzanaty, and R. Tafazolli, “Near-field fading channel modeling for ELAAs: From communication to ISAC,” arXiv:2401.17014, Jan. 2024.
  • [47] Z. H. Shaik and E. G. Larsson, “Distributed signal processing for out-of-system interference suppression in cell-free massive MIMO,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP), 2023, pp. 1–5.
  • [48] L. Nazareth, “A relationship between the BFGS and conjugate gradient algorithms and its implications for new algorithms,” SIAM J. Numer. Anal., vol. 16, no. 5, pp. 794–800, 1979.
  • [49] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed. The Johns Hopkins University Press, 1996.
  • [50] H. Q. Ngo, E. G. Larsson, and T. L. Marzetta, “Aspects of favorable propagation in massive MIMO,” in Proc. 22nd Eur. Signal Process. Conf. (EUSIPCO), 2014, pp. 76–80.
  • [51] L. Sanguinetti, E. Björnson, and J. Hoydis, “Toward massive MIMO 2.0: Understanding spatial correlation, interference suppression, and pilot contamination,” IEEE Trans. Commun., vol. 68, no. 1, pp. 232–257, Jan. 2020.
  • [52] A. Amiri, M. Angjelichinoski, E. De Carvalho, and R. W. Heath, “Extremely large aperture massive MIMO: Low complexity receiver architectures,” in Proc. IEEE GLOBECOM Workshops (GC Wkshps), 2018, pp. 1–6.
  • [53] 3GPP, “Study on channel model for frequencies from 0.5 to 100 GHz,” 3rd Generation Partnership Project (3GPP), Technical Report (TR) 38.901, Mar. 2022, version 17.0.0.
  • [54] D. Tse and P. Viswanath, Fundamentals of Wireless Communication. Cambridge University Press, 2015.
Jiuyu Liu (Graduate Student Member, IEEE) received the B.Eng. degrees in computer science and technology from Xi’an Jiaotong-Liverpool University, Suzhou, China, and in computer science and electronic engineering from the University of Liverpool, U.K., in 2019. He received the M.Sc. degree in electronic engineering from the University of Surrey, U.K., in 2020. He is currently pursuing the Ph.D. degree in electronic engineering with the 5GIC & 6GIC, Institute for Communication Systems, University of Surrey, U.K. His main research interests include multiple-input multiple-output, extremely large aperture arrays, spatially non-stationary channel modeling, stochastic processes, and digital signal processing.
Yi Ma (Senior Member, IEEE) is currently a Chair Professor with the Institute for Communication Systems (ICS), University of Surrey, Guildford, U.K. He has authored or coauthored more than 200 peer-reviewed IEEE journal articles and conference papers in the areas of deep learning, cooperative communications, cognitive radios, interference utilization, cooperative localization, radio resource allocation, multiple-input multiple-output, estimation, synchronization, and modulation and detection techniques. He holds ten international patents in the areas of spectrum sensing and signal modulation and detection. He served as the Tutorial Chair for EuroWireless 2013, PIMRC 2014, and CAMAD 2015, and as the Co-Chair of the Signal Processing for Communications Symposium at ICC 2019. He was the Founder of the Crowd-Net Workshop held in conjunction with ICC 2015, ICC 2016, and ICC 2017.
Jinfei Wang received his Ph.D. degree from the University of Surrey, U.K., in 2023. He is currently a Research Fellow at the 5GIC & 6GIC, Institute for Communication Systems (ICS), University of Surrey, Guildford, U.K. His main research interests include physical layer design of massive multiple-input multiple-output (MIMO) systems, ultra-reliable low-latency communication (URLLC), physical layer design of extremely large aperture array (ELAA) systems, and stochastic processes.
Rahim Tafazolli (Senior Member, IEEE) is Regius Professor of Electronic Engineering, Professor of Mobile and Satellite Communications, and Founder and Director of 5GIC, 6GIC, and ICS (Institute for Communication Systems) at the University of Surrey. He has over 30 years of experience in digital communications research and teaching. He has authored and co-authored more than 1000 research publications and is regularly invited to deliver keynote talks and distinguished lectures at international conferences and workshops. He led the study on “grand challenges in IoT” (Internet of Things) in the UK, 2011–2012, for RCUK (Research Councils UK) and the UK TSB (Technology Strategy Board). He is the Editor of two books on Technologies for Wireless Future (Wiley), vol. 1 in 2004 and vol. 2 in 2006. He is a Fellow of the Royal Academy of Engineering, the Institution of Engineering and Technology (IET), and the Wireless World Research Forum. He was also awarded the 28th KIA Laureate Award in 2015 for his contribution to communications technology.