Memory-Assisted Quantized LDPC Decoding

Philipp Mohr (ORCID: 0000-0003-4350-9969) and Gerhard Bauch

Philipp Mohr and Gerhard Bauch are with the Institute of Communications, Hamburg University of Technology, Hamburg, 21073, Germany. E-mail: {philipp.mohr; bauch}@tuhh.de.

Abstract

We enhance coarsely quantized LDPC decoding by reusing check node messages computed in previous iterations. Typically, variable and check nodes generate new messages in every iteration and overwrite the old ones. We show that, under coarse quantization, discarding old messages incurs a significant loss of mutual information. This loss is avoided with additional memory, improving performance by up to 0.36 dB. We propose a modified information bottleneck algorithm to design node operations that take messages from the previous iteration(s) into account as side information. Finally, we present a 2-bit row-layered decoder that can operate within 0.25 dB of 32-bit belief propagation.

Index Terms:

LDPC decoder, layered decoding, rate-compatible, coarse quantization, information bottleneck

I Introduction

Efficient and reliable decoding of low-density parity-check (LDPC) codes is vital for modern technologies with high data rate requirements, such as 5G[1]. In particular, the exchange of messages in iterative message passing algorithms such as belief propagation incurs significant complexity[2]. To overcome this bottleneck, many works reduce the bit width of the exchanged messages through quantization, see e.g. [3, 4, 5, 6, 7, 8, 9, 10, 11].

The quantized messages represent reliability levels that encode the reliability information exchanged between variable nodes (VNs) and check nodes (CNs). The choice of reliability levels is crucial for good decoding performance with low-resolution messages [11]. Information-optimum reliability levels can be found with the information bottleneck (IB) method, a clustering framework that enables the design of compression operations which maximize the preserved relevant mutual information[12, 13, 8, 9, 10, 11]. The relevant mutual information measures the average amount of information that the exchanged decoding messages carry about the transmitted code bits.

Typically, messages calculated in a previous iteration are replaced by updated messages from the current iteration [3, 8, 9, 4, 5, 10, 11, 6, 7]. One might ask whether discarding previously computed and exchanged messages wastes valuable information. Indeed, this work confirms that, under coarse quantization, preserving the old messages of the previous iteration can significantly improve decoding performance. To design a memory-assisted decoder, we modify the sequential IB algorithm from [8] to be aware of the messages retained in memory. This algorithm is specifically suited for designing deterministic compression mappings realized with symmetric thresholds, and its computational cost is significantly lower than that of more general solutions[13].

We combine the memory-assisted decoder structure with our recently proposed region-specific CN-aware quantizer design [11]. Region-specific quantization allows individual alphabets of reliability levels for subsets of exchanged messages, which particularly improves low-resolution decoding of highly irregular 5G-LDPC codes. A CN-aware quantizer design for the VN extends the optimization scope to maximize the relevant information preserved at the output of the subsequent CN update[10]. Combining this work with [11] yields a gain of up to 0.68 dB over 2-bit decoding without these techniques.

Figure 1: Tanner graph of a 5G LDPC code[11].

II Preliminaries on LDPC Decoding with Mutual Information Maximizing Quantization

Most standards define LDPC codes through a base matrix $\bm{\mathbf{H}}_{b}$ with entries $H^{b}_{ij}{\in}\{-1,\ldots,Z{-}1\}$. The base matrix $\bm{\mathbf{H}}_{b}$ can be represented by the Tanner graph illustrated in Fig. 1. Each column $j$ becomes a variable node (VN) and each row $i$ a check node (CN). The non-negative entries $H^{b}_{ij}$ correspond to edges between VNs and CNs, whereas an entry of $-1$ indicates that no edge exists. The node degree, i.e., the number of edges connected to a node, is $d_{v,j}$ for a VN and $d_{c,i}$ for a CN.

Lifting replaces every edge with $Z$ edges that are permuted by a cyclic shift of $H^{b}_{ij}$. The lifted graph can be equivalently represented by a lifted parity-check matrix $\bm{\mathbf{H}}$. The encoder maps the information bits $\bm{u}$ to code bits $\bm{b}$ such that $\bm{\mathbf{H}}\bm{b}{=}\bm{0}$[11]. The decoder in the receiver assumes a memoryless channel. For every $b{\in}\bm{b}$, a binary channel LLR $L^{ch}$ is quantized to a $w^{ch}$-bit message $t^{ch}$ that serves as decoder input. The quantization maximizes the mutual information between $b$ and $t^{ch}$ as in [11].
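To make the lifting step concrete, the following Python sketch expands a base matrix into a binary parity-check matrix. It assumes that an entry $-1$ yields an all-zero $Z{\times}Z$ block and that an entry $h\geq 0$ yields the identity matrix cyclically shifted to the right by $h$ columns (one common convention); the toy base matrix and the function name `lift` are hypothetical, and the matrix is not a 5G base graph.

```python
import numpy as np

def lift(Hb, Z):
    """Expand a base matrix Hb into a binary parity-check matrix H.

    Entry -1     -> Z x Z all-zero block (no edge)
    Entry h >= 0 -> Z x Z identity cyclically shifted right by h columns
    """
    mb, nb = Hb.shape
    H = np.zeros((mb * Z, nb * Z), dtype=np.uint8)
    I = np.eye(Z, dtype=np.uint8)
    for i in range(mb):
        for j in range(nb):
            if Hb[i, j] >= 0:
                # row r of the block gets its 1 in column (r + h) mod Z
                H[i * Z:(i + 1) * Z, j * Z:(j + 1) * Z] = np.roll(I, Hb[i, j] % Z, axis=1)
    return H

# Hypothetical toy base matrix with 2 check rows and 4 variable columns
Hb = np.array([[0, 1, -1, 2],
               [-1, 0, 3, 1]])
H = lift(Hb, Z=4)
```

Each non-negative entry contributes exactly one 1 per row and column of its block, so the row and column weights of $\bm{\mathbf{H}}$ equal the node degrees $d_{c,i}$ and $d_{v,j}$ of the base graph.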

II-A Decoding with Arbitrary Schedules

Message passing decoding computes and exchanges messages between VNs and CNs to aggregate soft information from the parity-check constraints for error correction. Each edge of the graph has a VN memory location and a CN memory location, enumerated with $m{\in}\mathcal{N}{=}\{1,\ldots,\sum_{j}d_{v,j}\}$ for the VN-to-CN messages and $n{\in}\mathcal{N}$ for the CN-to-VN messages (cf. Fig. 1). After lifting, a memory location stores $Z$ messages, as illustrated within the orange box in Fig. 1. The sets $\mathcal{U}^{v}{\subseteq}\mathcal{N}$ and $\mathcal{U}^{c}{\subseteq}\mathcal{N}$ specify the target memory locations of VN and CN updates, respectively. The decoding schedule defines the order in which memory locations are updated as

$$\bm{\mathcal{U}}=(\mathcal{U}_{0}^{v},\mathcal{U}_{1}^{c},\mathcal{U}_{2}^{v},\mathcal{U}_{3}^{c},\ldots) \tag{1}$$

followed by a final hard-decision update that uses the most recently updated CN messages.
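A schedule of the form (1) can be written down as a list of (node type, target locations) pairs. The sketch below, which relies on our own data-layout assumptions, builds a flooding schedule, where every iteration updates all locations in $\mathcal{N}$, and a row-layered schedule, where each layer first updates the VN locations and then the CN locations of its rows. The function names and the `layers` input format are hypothetical, and the final hard-decision update is omitted.

```python
def flooding_schedule(num_locations, iterations):
    """Flooding form of (1): all VN locations, then all CN locations, per iteration."""
    every = frozenset(range(num_locations))
    schedule = []
    for _ in range(iterations):
        schedule += [("v", every), ("c", every)]
    return schedule


def row_layered_schedule(layers, iterations):
    """Row-layered form of (1): per layer, update its VN and then its CN locations.

    `layers` lists the memory locations belonging to each layer
    (a hypothetical input format chosen for illustration).
    """
    schedule = []
    for _ in range(iterations):
        for locations in layers:
            schedule += [("v", frozenset(locations)), ("c", frozenset(locations))]
    return schedule


flood = flooding_schedule(num_locations=6, iterations=2)
layered = row_layered_schedule(layers=[[0, 1, 2], [3, 4, 5]], iterations=1)
```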

II-B Node Operations

We introduce the discrete random variables $T^{ch}_{j}$, $T_{m}^{v}$ and $T_{n}^{c}$ to model the channel, VN and CN messages. A realization $t^{\diamond}_{\triangleright}$ of $T^{\diamond}_{\triangleright}$ takes values from the LLR-sorted alphabet $\mathcal{T}^{\diamond}_{\triangleright}{=}\{\pm 1,\ldots,\pm 2^{w_{\diamond}-1}\}{=}\{-2^{w_{\diamond}-1},\ldots,-1,1,\ldots,2^{w_{\diamond}-1}\}$ where $\diamond{\in}\{ch,v,c\}$ and $\triangleright{\in}\{j,m,n\}$. We set $w_{ch}{=}5$ and $w_{v}{=}w_{c}{=}w$ in this paper. For the design of the decoder, we keep track of the joint distributions $p(x_{m},t^{v}_{m})$ and $p(x_{n},t^{c}_{n})$, which change with every VN and CN update, where $x_{m}$ and $x_{n}$ denote the code bit associated with the respective memory location. Updating a VN memory location $m\in\mathcal{U}^{v}$ yields[11]

$$y^{v}_{m}=L(x_{m}|t^{ch}_{\operatorname{col}(m)})+\sum_{n\in\xi^{v}_{m}}\bar{L}(t^{c}_{n}|x_{m}) \tag{2}$$

with extrinsic CN locations $\xi_{m}^{v}{=}(n{:}n{\neq}m,\operatorname{col}(n){=}\operatorname{col}(m))$. The LLR reconstruction of a message $t^{c}_{n}$ is denoted as $\bar{L}(t^{c}_{n}|x){=}\log p(\bar{T}^{c}_{n}{=}t^{c}_{n}|x{=}0)/p(\bar{T}^{c}_{n}{=}t^{c}_{n}|x{=}1)$ using the aligned variables $\bar{T}^{c}_{n}$ introduced in Section II-C. The hard decision is $\hat{x}_{m}{=}(1{-}\operatorname{sgn}(\hat{L}_{m}))/2$ with the a-posteriori probability (APP) LLR $\hat{L}_{m}{=}y^{v}_{m}{+}\bar{L}(t^{c}_{m}|x_{m})$. Row-layered decoder structures, such as the one in Fig. 8, typically store the APP LLR and compute (2) efficiently as $y^{v}_{m}{=}\hat{L}_{m}{-}\bar{L}(t^{c}_{m}|x_{m})$. These decoders initialize the APP LLR with $\hat{L}_{m}{=}L(x_{m}|t^{ch}_{\operatorname{col}(m)})$. Hence, the channel message needs to be reconstructed only once for all iterations. In an implementation, the reconstruction in (2) is typically carried out with integer-scaled LLRs of bit width $w^{\prime}{\approx}8$ to avoid performance loss[11].
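The following sketch evaluates (2) for the edges of a single column in two equivalent ways: directly as an extrinsic sum and via the APP LLR $\hat{L}_{m}$ as in row-layered structures. The reconstruction table `recon` is a hypothetical stand-in for $\bar{L}(t^{c}|x)$ with made-up values, and floating-point LLRs are used instead of $w^{\prime}$-bit integers.

```python
import numpy as np

def vn_update(L_ch, cn_msgs, recon):
    """VN update (2) for all edges of one column, plus the hard decision.

    L_ch    : reconstructed channel LLR L(x|t^ch) of the column
    cn_msgs : quantized CN messages t^c_n on the column's edges
    recon   : dict mapping a CN message to its LLR reconstruction
              (hypothetical table standing in for L̄(t^c|x))
    """
    rec = np.array([recon[t] for t in cn_msgs], dtype=float)
    # Direct form of (2): channel LLR plus all *extrinsic* reconstructions
    y_direct = np.array([L_ch + rec.sum() - rec[i] for i in range(len(rec))])
    # Layered form: compute the APP LLR once, then subtract the own reconstruction
    L_app = L_ch + rec.sum()
    y_app = L_app - rec
    x_hat = int((1 - np.sign(L_app)) / 2)   # hard decision from the APP LLR
    return y_direct, y_app, x_hat

recon = {-2: -3.0, -1: -0.8, 1: 0.8, 2: 3.0}   # hypothetical 2-bit reconstruction
y_direct, y_app, x_hat = vn_update(L_ch=0.5, cn_msgs=[2, -1, 1], recon=recon)
```

Both forms give the same extrinsic LLRs; the layered form only needs one subtraction per edge once $\hat{L}_{m}$ is available.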

Updating a CN memory location $n{\in}\mathcal{U}^{c}$ , exploiting the minimum-approximation[3], yields[11]

$$t^{c}_{n}=Q(y^{c}_{n})\quad\text{with}\quad y^{c}_{n}=\prod_{m\in\xi^{c}_{n}}\operatorname{sgn}(y^{v}_{m})\min_{m\in\xi^{c}_{n}}|y^{v}_{m}| \tag{3}$$

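with extrinsic VN locations $\xi^{c}_{n}{=}(m{:}m{\neq}n,\operatorname{row}(m){=}\operatorname{row}(n))$. Since $Q$ is a monotone threshold quantizer with sign-symmetric thresholds, it commutes with the sign product and the minimum in (3). Hence, $Q$ can be moved before the CN update without performance loss (cf. Fig. 2(a) and 2(b)), which lowers complexity.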
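The claim that $Q$ may be moved in front of the CN update can be checked numerically. The sketch below assumes a 2-bit quantizer with an illustrative magnitude threshold of 1.5 (not a designed value) and compares quantizing the output of the min-approximation in (3) with running (3) on already quantized inputs, where the magnitude of a message is its level in the sign-magnitude alphabet.

```python
import numpy as np

def make_quantizer(mag_thresholds):
    """Sign-symmetric threshold quantizer onto {±1, ..., ±2^(w-1)}.

    mag_thresholds: 2^(w-1) - 1 increasing positive magnitude thresholds
    (illustrative values, not designed ones).
    """
    th = np.asarray(mag_thresholds, dtype=float)
    def Q(y):
        y = np.asarray(y, dtype=float)
        mag = 1 + np.searchsorted(th, np.abs(y), side="right")  # level 1 ... 2^(w-1)
        return np.where(y >= 0, mag, -mag)
    return Q

def cn_min(msgs):
    """Min-approximation CN update (3) for all extrinsic outputs of one check."""
    msgs = np.asarray(msgs)
    out = []
    for n in range(len(msgs)):
        ext = np.delete(msgs, n)                      # extrinsic inputs for output n
        out.append(np.prod(np.sign(ext)) * np.min(np.abs(ext)))
    return np.array(out)

Q = make_quantizer([1.5])                  # hypothetical 2-bit quantizer (w = 2)
y_v = np.array([2.3, -0.7, 4.1, -1.9])
after = Q(cn_min(y_v))                     # quantize after the CN update, Fig. 2(a)
before = cn_min(Q(y_v))                    # quantize before the CN update, Fig. 2(b)
```

Both orders yield the messages $(+1,-2,+1,-1)$, while the second one lets the CN operate on $w$-bit values only.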

II-C Alignment Regions

The reconstruction $\bar{L}(t^{c}_{n}|x)$ in (2) and quantization $Q{:}\mathcal{Y}_{n}^{c}{\to}\mathcal{T}_{n}^{c}$ in (3) can be designed such that decoding messages from different memory locations $n$ can share the same functions. Using the same functions reduces the overall number of parameters to be designed and implemented. Common functions are realized through an alignment operation applied to the variables $T_{n}^{c}$ and $X_{n}$ before the node design as[11]

$$\bar{T}^{c}_{n}=\frac{1}{|\mathcal{A}_{n}|}\sum_{n^{\prime}\in\mathcal{A}_{n}}T^{c}_{n^{\prime}}\quad\text{and}\quad\bar{X}_{n}=\frac{1}{|\mathcal{A}_{n}|}\sum_{n^{\prime}\in\mathcal{A}_{n}}X_{n^{\prime}} \tag{4}$$

where $\mathcal{A}_{n}$ comprises all memory locations in the same region as $n$. This work considers either a row alignment $\mathcal{A}_{n}{=}\{n^{\prime}{:}\operatorname{row}(n^{\prime}){=}\operatorname{row}(n)\}$ or a matrix alignment $\mathcal{A}_{n}{=}\mathcal{N}$[11].
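One way to read (4) is as a mixture: the aligned pair $(\bar{X}_{n},\bar{T}^{c}_{n})$ is distributed according to the average of the per-location joint distributions over $\mathcal{A}_{n}$, which is also how the aligned distribution in Section II-D is formed. The sketch below computes this average for a row and a matrix alignment and derives the shared reconstruction $\bar{L}(t|x)$; the three joint distributions, the row assignment and the function names are hypothetical.

```python
import numpy as np

def aligned_joint(p_loc, region):
    """Aligned joint pmf p(x̄, t̄): average of p(x_n', t_n') over n' in A_n.

    p_loc : dict location -> array of shape (2, |T|) holding p(x, t)
    region: iterable of locations forming A_n
    """
    region = list(region)
    return sum(p_loc[n] for n in region) / len(region)

def reconstruction(p_xt):
    """Shared LLR table L̄(t|x) = log p(t|x=0) / p(t|x=1) from a joint pmf p(x, t)."""
    p_t_given_x = p_xt / p_xt.sum(axis=1, keepdims=True)
    return np.log(p_t_given_x[0] / p_t_given_x[1])

# Hypothetical joint pmfs for three CN locations; locations 0 and 1 share a row
p_loc = {0: np.array([[0.05, 0.45], [0.40, 0.10]]),
         1: np.array([[0.10, 0.40], [0.35, 0.15]]),
         2: np.array([[0.20, 0.30], [0.30, 0.20]])}
row = {0: 0, 1: 0, 2: 1}
row_aligned = aligned_joint(p_loc, [m for m in p_loc if row[m] == row[0]])  # row alignment
matrix_aligned = aligned_joint(p_loc, p_loc.keys())                          # A_n = N
```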

II-D Region-Specific Quantization with Check Node Awareness

A compact version of the CN messages can be obtained with threshold quantization $t^{c}_{n}{=}Q(y_{n}^{c})$ (cf. Fig. 2(a)). The objective is to maximize the mutual information $\max_{Q}I(\bar{X}_{n};\bar{T}^{c}_{n})$ preserved by any CN message in the alignment region $\mathcal{A}_{n}$. The optimization of $Q$ can be performed with the sequential IB algorithm[8], which is also described in Section IV. This algorithm requires the distribution $p(\bar{x}_{n},\bar{y}_{n}^{c}){=}\sum_{n^{\prime}\in\mathcal{A}_{n}}p(x_{n^{\prime}},Y^{c}_{n^{\prime}}{=}\bar{y}_{n}^{c})/|\mathcal{A}_{n}|$. In case of a layered schedule, each update targets a subset $\mathcal{U}{\subset}\mathcal{N}$ of all memory locations. A single quantizer design suffices for each region $\mathcal{N}_{a}{\in}\{\mathcal{A}_{n}{:}n{\in}\mathcal{U}\}$, where $a{\in}\{1,\ldots,|\mathcal{A}|\}$ enumerates the distinct regions. The notation $\{\mathcal{A}_{n}{:}n{\in}\mathcal{U}\}$ builds a set with unique elements, e.g., $\{\{1,2\},\{2,1\},\ldots\}$ reduces to $\{\{1,2\},\ldots\}$. The quantizer designed for region $\mathcal{N}_{a}$ is used only for updating the locations in $\mathcal{N}_{a}\cap\mathcal{U}$[11].
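The set construction $\{\mathcal{A}_{n}{:}n{\in}\mathcal{U}\}$ maps directly onto Python's `frozenset`, whose equality ignores element order. The sketch below, using a hypothetical row alignment and hypothetical function names, returns one entry per distinct region, i.e., one quantizer to design, together with the locations $\mathcal{N}_{a}\cap\mathcal{U}$ that this quantizer serves in the current update.

```python
def distinct_regions(update, region_of):
    """Map each distinct region N_a in {A_n : n in U} to the locations N_a ∩ U it serves.

    update   : memory locations U targeted by one (layered) update
    region_of: function n -> A_n (any iterable of locations)
    """
    update = frozenset(update)
    regions = {frozenset(region_of(n)) for n in update}   # {1,2} and {2,1} collapse
    return {region: region & update for region in regions}


# Hypothetical row alignment: locations 1, 2 share a row; 3, 4, 5 share another
rows = [frozenset({1, 2}), frozenset({3, 4, 5})]

def row_region(n):
    """Row region A_n containing location n."""
    return next(r for r in rows if n in r)

plan = distinct_regions(update=[1, 2, 3], region_of=row_region)
```

Here two quantizers are designed: one for the first row, serving locations 1 and 2, and one for the second row, used only for location 3 in this update.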

Figure 2: (a) CN-aware design of $Q$. (b) $Q$ is implemented before the CN. Structure (a) is used during the design phase and structure (b) is used in the implementation.

III Memory-Assisted Reconstruction

Figure 3: Quantizer design with memory-assisted reconstruction.

A small bit width $w$ of the exchanged messages significantly lowers the decoder complexity for several reasons:

• One quantization operation uses $w{-}1$ comparisons[10].
• The routing network size scales with $w$.
• The min-CN update can be carried out much faster. For example, using $w{=}2$ instead of $w{=}3$ bits potentially reduces the logic gate delay by a factor of 4[10].
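The comparison count in the first item can be reproduced with a sign-magnitude quantizer that reads the sign directly and finds the magnitude level by binary search over the $2^{w-1}{-}1$ magnitude thresholds. This is our own assumption about how such a count arises, not necessarily the circuit of [10]; the function name and thresholds are illustrative.

```python
def quantize_bsearch(y, mag_thresholds):
    """Map y to {±1, ..., ±2^(w-1)} via the sign of y and a binary search on |y|.

    mag_thresholds holds 2^(w-1) - 1 increasing magnitude thresholds, so the
    search performs exactly w - 1 comparisons. Returns (message, comparisons).
    """
    lo, hi = 0, len(mag_thresholds)   # candidate level indices lo ... hi
    a, comparisons = abs(y), 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if a >= mag_thresholds[mid]:
            lo = mid + 1              # |y| lies above threshold mid
        else:
            hi = mid                  # |y| lies below threshold mid
    level = lo + 1                    # 1 + number of thresholds <= |y|
    return (level if y >= 0 else -level), comparisons
```

For example, with $w{=}3$ and thresholds `[1, 2, 3]`, the call `quantize_bsearch(2.5, [1, 2, 3])` returns `(3, 2)`, i.e., two comparisons, while $w{=}2$ needs a single comparison.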

Unfortunately, with $w{=}2$ bits, these major complexity savings can lead to a noticeable performance degradation[11]. This section proposes a novel approach that recovers most of the degradation when using, e.g., $w{=}2$ instead of $w{=}3$ bits. Fig. 3 extends the setup by preserving the old CN message $t^{c}_{n}$ of the previous iteration as $s^{c}_{n}$. We remark that preserving $s^{c}_{n}$ for a further iteration would yield only marginal performance gains.

III-A Modification of Existing Decoder Design

Instead of aiming for $\max_{Q}I(\bar{X};\bar{T}^{c}_{n})$, we propose to take the statistics of the message $s^{c}_{n}$ into account. The message $s^{c}_{n}$ already provides a certain amount of mutual information $I(X^{c}_{n};S^{c}_{n})$. Since the chain rule gives $I(X^{c}_{n};T^{c}_{n},S^{c}_{n}){=}I(X^{c}_{n};S^{c}_{n}){+}I(X^{c}_{n};T^{c}_{n}|S^{c}_{n})$ and the first term is fixed by the previous iteration, the quantizer should maximize the additional mutual information $\max_{Q}I(X^{c}_{n};T^{c}_{n}|S^{c}_{n})$. The variable node update now takes $s^{c}_{n}$ into account as

$$y^{v}_{m}=L(x_{m}|t^{ch}_{\operatorname{col}(m)})+\sum_{n\in\xi_{m}^{v}}\bar{L}(t^{c}_{n},s^{c}_{n}|x_{m}) \tag{5}$$
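The reconstruction $\bar{L}(t^{c}_{n},s^{c}_{n}|x_{m})$ in (5) is a two-dimensional lookup table that can be derived from a joint distribution $p(x,t,s)$. The sketch below builds such a table and applies (5) to one column. For simplicity, it assumes a toy model in which $t$ and $s$ are conditionally independent given $x$ (real messages from successive iterations are correlated, which is why the joint distribution is measured); all probabilities and function names are made up for illustration.

```python
import numpy as np

def joint_reconstruction(p_xts):
    """LLR table L̄(t, s|x) = log p(t, s|x=0) / p(t, s|x=1) from p(x, t, s).

    p_xts has shape (2, |T|, |S|); zero-probability entries are not handled here.
    """
    p_ts_given_x = p_xts / p_xts.sum(axis=(1, 2), keepdims=True)
    return np.log(p_ts_given_x[0] / p_ts_given_x[1])


def vn_update_mem(L_ch, t_msgs, s_msgs, table, alphabet):
    """VN update (5) for one column: one extrinsic output per connected edge."""
    idx = {v: i for i, v in enumerate(alphabet)}
    rec = np.array([table[idx[t], idx[s]] for t, s in zip(t_msgs, s_msgs)])
    return L_ch + rec.sum() - rec


alphabet = [-2, -1, 1, 2]
a = np.array([0.05, 0.15, 0.30, 0.50])   # toy p(t|x=0) over the alphabet
b = a[::-1]                               # toy p(t|x=1), symmetric channel
# Toy model: t and s conditionally independent given x (an assumption)
p_xts = 0.5 * np.stack([np.outer(a, a), np.outer(b, b)])
table = joint_reconstruction(p_xts)
y_v = vn_update_mem(0.0, t_msgs=[2, 2], s_msgs=[2, -2], table=table, alphabet=alphabet)
```

Even in this toy model, agreeing messages yield a larger reconstructed LLR than disagreeing ones: the entry for $t{=}{+}2,s{=}{+}2$ exceeds the one for $t{=}{+}2,s{=}{-}2$.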

For the design of $Q{:}\mathcal{Y}^{c}_{n}{\to}\mathcal{T}^{c}_{n}$ in (3), we measure the joint distribution $p(x_{n},y^{c}_{n},s^{c}_{n})$, which captures the correlation between $t^{c}_{n}$ and $s^{c}_{n}$. To this end, we generate a large set of decoding messages $x_{n},y^{c}_{n},s^{c}_{n}$ at a specific design-$E_{b}/N_{0}$. In Section IV, we introduce a side-information-aware IB algorithm, which is used to optimize the quantization thresholds. We identify $X_{n}$, $Y^{c}_{n}$, $T^{c}_{n}$, and $S^{c}_{n}$ with the relevant, observed, compressed and side-information variables $X$, $Y$, $T$, and $S$, respectively. The optimization aims for $\max_{p(t|y)}I(X;T|S)$. We remark that, as a result of the CN minimum approximation, the alphabet $\mathcal{Y}$ is not strictly LLR-sorted in the sense of (7).
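The measurement of $p(x_{n},y^{c}_{n},s^{c}_{n})$ and the objective $I(X;T|S)$ can be sketched as follows. The samples are synthetic Gaussian stand-ins for decoder messages (an assumption for illustration only), the histogram is a plain empirical estimate, and $I(X;T|S)$ is evaluated from its definition so that the chain rule $I(X;T,S){=}I(X;S){+}I(X;T|S)$ can be checked against it. All function names and the example thresholds are hypothetical.

```python
import numpy as np

def estimate_joint(x, y, s, n_y, n_s):
    """Empirical p(x, y, s) from integer-coded training samples (0-based symbols)."""
    counts = np.zeros((2, n_y, n_s))
    np.add.at(counts, (x, y, s), 1)
    return counts / counts.sum()


def mi(p_xz):
    """I(X;Z) in bits from a 2-D joint pmf p(x, z)."""
    p_x = p_xz.sum(axis=1, keepdims=True)
    p_z = p_xz.sum(axis=0, keepdims=True)
    nz = p_xz > 0
    return float(np.sum(p_xz[nz] * np.log2(p_xz[nz] / (p_x @ p_z)[nz])))


def cond_mi(p_xts):
    """I(X;T|S) in bits, evaluated directly from its definition."""
    p_s = p_xts.sum(axis=(0, 1))
    p_xs = p_xts.sum(axis=1)
    p_ts = p_xts.sum(axis=0)
    total = 0.0
    for xi, ti, si in zip(*np.nonzero(p_xts)):
        p = p_xts[xi, ti, si]
        total += p * np.log2(p * p_s[si] / (p_xs[xi, si] * p_ts[ti, si]))
    return float(total)


# Synthetic stand-in for training data (not the paper's decoder messages)
rng = np.random.default_rng(1)
n = 20000
x = rng.integers(0, 2, n)
llr_prev = 2.0 * (1 - 2 * x) + rng.normal(0.0, 1.5, n)   # "previous iteration" LLR
llr_now = llr_prev + rng.normal(0.0, 0.5, n)              # current LLR, small difference
y = np.digitize(llr_now, np.linspace(-6, 6, 15))          # 16 observation bins 0..15
s = (llr_prev >= 0).astype(int)                           # 1-bit side information
p_xys = estimate_joint(x, y, s, n_y=16, n_s=2)
tau = [0, 5, 8, 11, 16]                                   # some threshold quantizer
p_xts = np.stack([p_xys[:, lo:hi, :].sum(axis=1) for lo, hi in zip(tau[:-1], tau[1:])], axis=1)
```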

IV Information Bottleneck Algorithm with Side-Information Awareness

Figure 4: Information bottleneck setup with side information.

Typically, an IB setup is defined by the relevant, observed and compressed discrete random variables $X,x{\in}\mathcal{X}{=}\{0,1\}$, $Y,y{\in}\mathcal{Y}{=}\{1,\ldots,|\mathcal{Y}|\}$ and $T,t{\in}\mathcal{T}{=}\{1,\ldots,|\mathcal{T}|\}$ that form a Markov chain $X{\to}Y{\to}T$. The IB method is a generic clustering framework for designing compression operations $p(t|y)$ with the optimization objective $\max_{p(t|y)}I(X;T)-\beta^{-1}I(Y;T)$. The choice of $\beta\geq 0$ allows trading the preservation of relevant information $I(X;T)$ for compression. Of particular practical interest is the case $\beta{\to}\infty$, because the resulting compression can be realized with a deterministic mapping through threshold quantization

$$t=Q(y)=\mathcal{T}[k]\quad\text{for}\quad\tau_{k}\leq y<\tau_{k+1},\quad 0\leq k<|\mathcal{T}| \tag{6}$$

with outer thresholds $\tau_{0}{=}0$ and $\tau_{|\mathcal{T}|}{=}|\mathcal{Y}|$, where $\mathcal{T}[k]$ denotes the $k$th element of the ordered set $\mathcal{T}$. Such a threshold mapping can be information-optimum if[14]

$$L(x|y{=}1)\leq L(x|y{=}2)\leq\ldots\leq L(x|y{=}|\mathcal{Y}|). \tag{7}$$
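The threshold mapping (6) and condition (7) translate into a few array operations on a joint distribution $p(x,y)$. The sketch below uses 0-based indices $y{=}0,\ldots,|\mathcal{Y}|{-}1$ (an indexing choice of ours, so that $\tau_{0}{=}0$ and $\tau_{|\mathcal{T}|}{=}|\mathcal{Y}|$ delimit half-open intervals), hypothetical function names, and a toy channel whose LLR grows linearly in $y$.

```python
import numpy as np

def is_llr_sorted(p_xy):
    """Condition (7): L(x|y) = log p(x=0, y) / p(x=1, y) is non-decreasing in y."""
    llr = np.log(p_xy[0] / p_xy[1])
    return bool(np.all(np.diff(llr) >= 0))


def threshold_compress(p_xy, tau):
    """Joint p(x, t) after the threshold mapping (6).

    Uses 0-based y = 0, ..., |Y|-1, so cluster k collects tau[k] <= y < tau[k+1]
    with tau[0] = 0 and tau[-1] = |Y| (an indexing choice of this sketch).
    """
    return np.stack([p_xy[:, tau[k]:tau[k + 1]].sum(axis=1)
                     for k in range(len(tau) - 1)], axis=1)


w = 2.0 ** np.arange(8)
p_xy = 0.5 * np.stack([w / w.sum(), w[::-1] / w.sum()])   # toy LLR-sorted channel
tau = [0, 2, 4, 6, 8]              # symmetric: tau_k = |Y| - tau_(|T|-k) with |T| = 4
p_xt = threshold_compress(p_xy, tau)
```

Because each cluster is a contiguous range of an LLR-sorted alphabet, the compressed alphabet $\mathcal{T}$ inherits the sorting property (7).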

This section extends the conventional setup with a fourth variable $S$ that provides side information about $X$[13], see Fig. 4. Prior works[12, 13] also consider a setup with side information; however, they do not explicitly provide a low-complexity solution for optimizing a threshold quantizer in the context of LDPC decoding. We propose Algorithm 1, a modified variant of the sequential IB algorithm from [8] that takes the side information $S$ into account. The algorithm exploits (7) to sequentially optimize initially random boundaries $\tau_{k}$ that define the target clusters $\mathcal{Y}_{t}{=}\{y{:}\tau_{t}{\leq}y{<}\tau_{t+1}\}{\subset}\mathcal{Y}$. To mitigate convergence to local optima, we run 500 different initializations in parallel. We enforce symmetric thresholds $\tau_{k}{=}|\mathcal{Y}|{-}\tau_{|\mathcal{T}|-k}$, which reduces the number of design and implementation parameters.

Algorithm 1: A sequential IB algorithm from [8] considering side information $s$ in the merger cost computation (line 18).

1: Input: $p(x,y,s)$, $|\mathcal{T}|$
2: Output: $p(t|y)$, $p(x,t,s)$
3: Create a random symmetric clustering $p(t|y)$;
4: Compute $p(x,t,s)=\sum_{y}p(t|y)p(x,y,s)$;
5: Extend the clusters $\mathcal{Y}_{1},\ldots,\mathcal{Y}_{|\mathcal{T}|}$ with the empty singleton clusters $\mathcal{Y}_{|\mathcal{T}|+1}$ and $\mathcal{Y}_{|\mathcal{T}|+2}$;
6: repeat
7:   Save $p(t|y)$ as $p^{\mathrm{old}}(t|y)$;
8:   for $t\in\{1,\ldots,|\mathcal{T}|/2-1\}$ do
9:     for $(t_{1},t_{2})\in\{(t,t+1),(t+1,t)\}$ do
10:      repeat
11:        if $t_{1}==t$ then
12:          $y$ is the rightmost element in cluster $\mathcal{Y}_{t_{1}}$;
13:        else if $t_{1}==t+1$ then
14:          $y$ is the leftmost element in cluster $\mathcal{Y}_{t_{1}}$;
15:        end if
16:        Move $y$ into $\mathcal{Y}_{|\mathcal{T}|+1}$, and $y^{\prime}=|\mathcal{Y}|+1-y$ into $\mathcal{Y}_{|\mathcal{T}|+2}$;
17:        Update $p(x,t,s)$;
18:        Compute the optimal cluster using (12): $k=\arg\min_{k^{*}\in\{1,2\}}C_{sym}(y,k^{*})$   (8)
19:        Merge $\mathcal{Y}_{|\mathcal{T}|+1}$ into $\mathcal{Y}_{t_{k}}$, and $\mathcal{Y}_{|\mathcal{T}|+2}$ into $\mathcal{Y}_{|\mathcal{T}|+1-t_{k}}$;
20:        Update $p(x,t,s)$ and $p(t|y)$;
21:      until $t_{k}==t_{1}$
22:    end for
23:  end for
24: until $p(t|y)==p^{\mathrm{old}}(t|y)$
25: return $p(t|y)$, $p(x,t,s)$
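The following Python sketch mimics the structure of Algorithm 1 under simplifying assumptions of ours: $y$ is 0-based, thresholds are kept symmetric via $\tau_{k}{=}|\mathcal{Y}|{-}\tau_{|\mathcal{T}|-k}$ with the middle threshold fixed at $|\mathcal{Y}|/2$, and the decision of line 18 is made by evaluating $I(X;T|S)$ for both candidate assignments directly instead of using the incremental merger costs (12). Since $C_{sym}(y,k)$ in (10) differs from $I(X;T|S)$ after the merge only by the constant $I(X;\ddot{T}|S)$, this selects the same $k$. Moving a border element as long as this increases $I(X;T|S)$ corresponds to the inner repeat loop (lines 10 to 21). The function names, the number of initializations and the toy distribution are illustrative.

```python
import numpy as np

def cond_mi(p_xts):
    """I(X;T|S) in bits for a joint pmf indexed [x, t, s]."""
    p_s, p_xs, p_ts = p_xts.sum(axis=(0, 1)), p_xts.sum(axis=1), p_xts.sum(axis=0)
    total = 0.0
    for x, t, s in zip(*np.nonzero(p_xts)):
        p = p_xts[x, t, s]
        total += p * np.log2(p * p_s[s] / (p_xs[x, s] * p_ts[t, s]))
    return float(total)

def symmetric(half, n_y):
    """Full thresholds from [tau_0, ..., tau_|T|/2] via tau_k = |Y| - tau_(|T|-k)."""
    return half + [n_y - v for v in reversed(half[:-1])]

def compress(p_xys, tau):
    """p(x, t, s) for 0-based y with cluster k = [tau[k], tau[k+1])."""
    return np.stack([p_xys[:, tau[k]:tau[k + 1], :].sum(axis=1)
                     for k in range(len(tau) - 1)], axis=1)

def side_info_ib(p_xys, n_t, n_init=10, seed=0):
    """Symmetric sequential boundary optimization maximizing I(X;T|S)."""
    n_y = p_xys.shape[1]
    rng = np.random.default_rng(seed)
    best_tau, best_mi = None, -np.inf
    for _ in range(n_init):                        # random symmetric initializations
        inner = sorted(rng.choice(np.arange(1, n_y // 2), n_t // 2 - 1,
                                  replace=False).tolist())
        half = [0] + inner + [n_y // 2]            # middle threshold (sign) stays fixed
        mi = cond_mi(compress(p_xys, symmetric(half, n_y)))
        changed = True
        while changed:                             # outer repeat (lines 6 to 24)
            changed = False
            for k in range(1, n_t // 2):           # boundary between clusters k-1 and k
                for step in (-1, +1):              # border element moves right / left
                    while half[k - 1] < half[k] + step < half[k + 1]:
                        cand = half.copy()
                        cand[k] += step
                        cand_mi = cond_mi(compress(p_xys, symmetric(cand, n_y)))
                        if cand_mi <= mi + 1e-12:  # element stays in its cluster
                            break
                        half, mi, changed = cand, cand_mi, True
        if mi > best_mi:
            best_tau, best_mi = symmetric(half, n_y), mi
    return best_tau, best_mi

# Toy LLR-sorted p(x, y, s) with 16 observations and binary side information
n_y = 16
w = 1.5 ** np.arange(n_y)
p_y_x = np.stack([w / w.sum(), w[::-1] / w.sum()])
p_s1_y = np.linspace(0.05, 0.95, n_y)              # toy p(s=1|y)
p_xys = 0.5 * p_y_x[:, :, None] * np.stack([1 - p_s1_y, p_s1_y], axis=1)[None]
tau, I_opt = side_info_ib(p_xys, n_t=4)
```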

IV-A Merger Costs with Side Information

In line 16, the element $y$ and its counterpart element $y^{\prime}$ are moved into the singleton clusters $\mathcal{Y}_{|\mathcal{T}|{+}1}$ and $\mathcal{Y}_{|\mathcal{T}|{+}2}$, respectively. This temporary decompression is modeled with a discrete random variable $\ddot{T},\ddot{t}{\in}\ddot{\mathcal{T}}{=}\mathcal{T}\cup\{|\mathcal{T}|{+}1,|\mathcal{T}|{+}2\}$. Line 18 optimizes the deterministic mapping $p(t|\ddot{t}){=}\delta(t{-}f_{k}(\ddot{t}))$ with $f_{k}:\ddot{\mathcal{T}}{\to}\mathcal{T}$. The algorithm only allows merging $y$ into one of the adjacent clusters $t_{1}$ or $t_{2}$. Thus, two mapping options $k{\in}\{1,2\}$ exist, with

$$f_{k}(\ddot{t})=\begin{cases}\ddot{t}&\ddot{t}\in\mathcal{T}\\ t_{k}&\ddot{t}=|\mathcal{T}|+1\\ t_{k}^{\prime}&\ddot{t}=|\mathcal{T}|+2\end{cases}\quad\text{with } t_{k}^{\prime}=|\mathcal{T}|+1-t_{k}. \tag{9}$$

The mutual information loss from merging is

\begin{align}
C_{sym}(y,k) &= I(X;\ddot{T}|S)-I(X;T|S) \tag{10}\\
&= \sum_{\ddot{t},s}p(\ddot{t},s)\operatorname{D_{KL}}\big(p(x|\ddot{t},s)\,\|\,p(x|f_{k}(\ddot{t}),s)\big) \tag{11}\\
&= \sum_{s}p(s)\big(C(y,t_{k}|s)+C(y^{\prime},t_{k}^{\prime}|s)\big) \tag{12}
\end{align}

where the individual merger costs are

$$\begin{split}C(y,t|s)={}&p(\ddot{T}{=}t|s)\operatorname{D_{KL}}\left\{p(x|\ddot{T}{=}t,s)\,\|\,p(x|T{=}t,s)\right\}\\ &+p(Y{=}y|s)\operatorname{D_{KL}}\left\{p(x|Y{=}y,s)\,\|\,p(x|T{=}t,s)\right\}\end{split} \tag{13}$$

with $\operatorname{D_{KL}}(p||q){=}\sum_{x}p(x)\log_{2}(p(x)/q(x))$ .
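The equality of (10) and (12) can be verified numerically. The sketch below uses a toy distribution $p(x,y,s)$ with 0-based $y$ and cluster labels, removes the rightmost element $y$ of the first cluster together with its mirrored counterpart as in line 16, and compares the direct difference (10) with the sum of the individual merger costs (13) for both options of (9). The helper names and the distribution are hypothetical.

```python
import numpy as np

def kl(p, q):
    """D_KL(p || q) in bits for 1-D pmfs."""
    m = p > 0
    return float(np.sum(p[m] * np.log2(p[m] / q[m])))

def cond_mi_labels(p_xys, labels, n_t):
    """I(X;T|S) in bits for the deterministic clustering t = labels[y]."""
    p_xts = np.zeros((2, n_t, p_xys.shape[2]))
    for y, t in enumerate(labels):
        p_xts[:, t, :] += p_xys[:, y, :]
    p_s, p_ts, p_xs = p_xts.sum(axis=(0, 1)), p_xts.sum(axis=0), p_xts.sum(axis=1)
    nz = p_xts > 0
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = p_xts * p_s[None, None, :] / (p_xs[:, None, :] * p_ts[None, :, :])
    return float(np.sum(p_xts[nz] * np.log2(ratio[nz])))

def merger_cost(p_xys, members, y):
    """sum_s p(s) C(y, t|s) from (13): merge singleton y into the cluster `members`."""
    cost = 0.0
    for s in range(p_xys.shape[2]):
        a = p_xys[:, members, s].sum(axis=1)   # p(x, T̈=t, s)
        b = p_xys[:, y, s]                      # p(x, Y=y, s)
        m = a + b                               # p(x, T=t, s) after merging
        cost += a.sum() * kl(a / a.sum(), m / m.sum()) + b.sum() * kl(b / b.sum(), m / m.sum())
    return cost

n_y, n_t = 8, 4
w = 2.0 ** np.arange(n_y)
p_y_x = np.stack([w / w.sum(), w[::-1] / w.sum()])                   # toy p(y|x)
p_s1_y = np.linspace(0.1, 0.9, n_y)                                   # toy p(s=1|y)
p_xys = 0.5 * p_y_x[:, :, None] * np.stack([1 - p_s1_y, p_s1_y], axis=1)[None]

y = 1                                   # rightmost element of cluster 0
y_mirror = n_y - 1 - y                  # its counterpart (leftmost of cluster 3)
ddot = np.array([0, 0, 1, 1, 2, 2, 3, 3])
ddot[y], ddot[y_mirror] = n_t, n_t + 1  # temporary singleton clusters (line 16)
I_ddot = cond_mi_labels(p_xys, ddot, n_t + 2)
results = {}
for t_k in (0, 1):                      # the two options of (9), 0-based labels
    t_k_mirror = n_t - 1 - t_k
    merged = ddot.copy()
    merged[y], merged[y_mirror] = t_k, t_k_mirror
    direct = I_ddot - cond_mi_labels(p_xys, merged, n_t)                       # (10)
    members = [v for v in range(n_y) if ddot[v] == t_k]
    members_mirror = [v for v in range(n_y) if ddot[v] == t_k_mirror]
    via_13 = merger_cost(p_xys, members, y) + merger_cost(p_xys, members_mirror, y_mirror)
    results[t_k] = (direct, via_13)
```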

V Evaluation with 5G Codes

This section investigates the performance of the proposed memory-assisted decoders. As in [11], we use a 5G-LDPC code with length $8448$, base graph 1 and various code rates[1]. We consider an AWGN channel with BPSK modulation. All decoders use the initialization schedule described in [11] to avoid useless CN updates resulting from processing punctured messages. Unless stated otherwise, the remaining schedule follows the flooding scheme with a maximum of 30 decoder iterations. Since analytical tracking of the joint probabilities required for the design of memory-assisted decoders seems infeasible, each decoder design uses a large set of training data with 10000 transmitted and received code words generated at a specific design-$E_{b}/N_{0}$. For a fair comparison, the conventional decoders are also designed with training data, which leads to slightly different results compared to our previous work[11].

V-A Evolution of Mutual Information

Fig. 5 depicts the evolution of the mutual information between a code bit $X$ and the corresponding hard decision $\hat{X}$ over the iterations. The quantized messages are matrix-aligned, such that all messages use the same alphabet of reliability levels in a given iteration. All decoders are designed for the same design-$E_{b}/N_{0}$. Particularly under 2-bit decoding, the proposed memory assistance significantly increases the mutual information gain per iteration. For 3-bit decoding, the gains appear smaller. Nevertheless, the proposed 3-bit decoder almost achieves the performance of the conventional 4-bit decoder.

V-B Boundary Placement for Memory-Assisted Reconstruction

This section analyzes the placement of the quantizer boundaries $\tau_{k}$ in every iteration to explain the performance gains achieved with the proposed structure. Now, each decoder uses an individual design-$E_{b}/N_{0}$, chosen such that the mutual information converges to $I(X;\hat{X})\approx 0.9999$ after 30 iterations. This careful choice is important to ensure a minimum frame error rate for a given budget of decoding iterations.

Fig. 6 shows that the boundary levels increase over the iterations as the reliability of the messages improves. A key observation is that the boundary magnitudes of the proposed 2-bit decoder exhibit an alternating rising and falling trend. Thus, memory assistance enhances the resolution by using different quantizers in successive iterations. This behavior illustrates the effectiveness of the side-information-aware IB algorithm: A CN message $t$ is a compressed version of the non-quantized LLR $y^{c}_{n}$ in (3) from the current iteration, whereas a CN message $s$ is a compressed version of $\underline{y}^{c}_{n}$ from the previous iteration. The difference $\Delta y^{c}_{n}{=}y^{c}_{n}{-}\underline{y}^{c}_{n}$ is sufficiently small on average that $s$ can approximately resolve $y^{c}_{n}$. Different boundaries for $s$ and $t$ improve their combined capability to resolve $y^{c}_{n}$ in the relevant ranges. The combined resolution approaches that of the 3-bit quantizer in Fig. 6.
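The resolution argument can be illustrated in an idealized toy setting that assumes $\Delta y^{c}_{n}{=}0$, i.e., $s$ and $t$ quantize the same value. The sketch counts how many distinct magnitude cells a set of quantizers induces jointly; the function name and all thresholds are hypothetical and chosen only to show the effect of staggering.

```python
import numpy as np

def magnitude_cells(y, thresholds_list):
    """Count distinct joint magnitude cells induced on |y| by several quantizers.

    Each entry of thresholds_list holds the increasing magnitude thresholds of one
    quantizer; all quantizers see the same y (idealized case Δy = 0).
    """
    mags = np.abs(np.asarray(y, dtype=float))
    codes = np.stack([np.searchsorted(np.asarray(th), mags, side="right")
                      for th in thresholds_list])
    return len({tuple(c) for c in codes.T})


y = np.linspace(-6.0, 6.0, 1201)                       # dense grid of LLR values
one_2bit = magnitude_cells(y, [[2.0]])                  # a single 2-bit quantizer
same_2bit = magnitude_cells(y, [[2.0], [2.0]])          # t and s with equal boundaries
staggered = magnitude_cells(y, [[2.0], [3.5]])          # t and s with different boundaries
three_bit = magnitude_cells(y, [[1.0, 2.0, 3.5]])       # a 3-bit quantizer
```

With equal boundaries the pair carries no more magnitude information than a single 2-bit quantizer (2 cells), whereas staggered boundaries split the magnitude axis into 3 cells, closer to the 4 magnitude cells of a 3-bit quantizer.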

We remark that the side information $s$ is only actively used during the reconstruction $L(t,s|x)$. For example, consider a sign-magnitude alphabet sorted by the underlying LLR with $t{\in}\mathcal{T}{=}\{-2,-1,+1,+2\}$. For $t{=}{+}2$, the reconstructed LLR $L(t{=}{+}2,s|x)$ is more reliable if the message in memory agrees, e.g., $s{=}{+}2$, and less reliable if $s$ disagrees, e.g., $s{=}{-}2$. Thus, $L(t{=}{+}2,s{=}{+}2|x){>}L(t{=}{+}2,s{=}{-}2|x)$. Joint knowledge of the messages $s$ and $t$ allows better inference of the non-quantized LLR $y^{c}_{n}$.

Figure 5: Evolution of mutual information with code rate 1/3. All results are obtained with the same design $E_{b}/N_{0}{=}1.0$ dB.

Figure 6: Evolution of boundary placement, each with code rate 1/3 and individual design $E_{b}/N_{0}$ of 0.55, 0.73 and 1.0 dB.

V-C Reduced Complexity with Merged Check Node Messages

In the previous sections, the side information $s^{c}$ equals $t^{c}$ from the previous iteration and is discarded after it has been used. Instead of discarding $s^{c}$, this section considers side information $s^{\Psi}{\in}\{\pm 1,\ldots,\pm 2^{w_{\Psi}-1}\}$, which is merged with $t^{c}$ into a compressed message $t^{\Upsilon}{=}\Upsilon(s^{\Psi},t^{c}){\in}\{\pm 1,\ldots,\pm 2^{w_{\Upsilon}-1}\}$ obtained through a compression table $\Upsilon$. The table is designed using the IB method with a measured joint distribution $p(x,s^{\Psi},t^{c})$. The reconstruction in (5) now uses $\bar{L}(t^{\Upsilon}|x)$ instead of $\bar{L}(s^{\Psi},t^{c}|x)$. The side information can thus propagate over many iterations as $s^{\Psi}{=}\Psi(\Upsilon(\underline{s}^{\Psi},\underline{t}^{c}))$, where $\Psi$ is another compression that reduces the table size of $\Upsilon$.
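A compression table $\Upsilon$ can be obtained from $p(x,s^{\Psi},t^{c})$ in several ways. The sketch below swaps in the dynamic program of Kurkoski and Yagi [14], which, for a binary $X$ and pairs $(s,t)$ sorted by LLR, finds the deterministic mapping onto $2^{w_{\Upsilon}}$ outputs that maximizes $I(X;T^{\Upsilon})$, i.e., the deterministic ($\beta{\to}\infty$) IB objective. It does not enforce symmetry and is not necessarily the design procedure used in this work; the joint distribution is a toy model with $w_{\Psi}{=}w{=}2$ and $w_{\Upsilon}{=}3$, and the function names are hypothetical.

```python
import numpy as np
from itertools import product

def table_mi(p_x_st, table, n_out):
    """I(X;T_ups) in bits for a deterministic table[s, t] -> t_ups."""
    p_xz = np.zeros((2, n_out))
    for (s, t), z in np.ndenumerate(table):
        p_xz[:, z] += p_x_st[:, s, t]
    p_x = p_xz.sum(axis=1, keepdims=True)
    p_z = p_xz.sum(axis=0, keepdims=True)
    m = p_xz > 0
    return float(np.sum(p_xz[m] * np.log2(p_xz[m] / (p_x @ p_z)[m])))


def design_table(p_x_st, n_out):
    """Deterministic Upsilon table maximizing I(X;T_ups) via the DP of [14]."""
    n_s, n_t = p_x_st.shape[1:]
    pairs = list(product(range(n_s), range(n_t)))
    p = np.array([p_x_st[:, s, t] for s, t in pairs])        # (K, 2) joint probabilities
    order = np.argsort(np.log(p[:, 0] / p[:, 1]))             # sort pairs by LLR
    p = p[order]
    K, p_x = len(p), p.sum(axis=0)
    cum = np.vstack([np.zeros(2), np.cumsum(p, axis=0)])

    def gain(i, j):                                           # group of sorted pairs i..j-1
        q = cum[j] - cum[i]
        m = q > 0
        return float(np.sum(q[m] * np.log2(q[m] / (q.sum() * p_x[m]))))

    best = np.full((n_out + 1, K + 1), -np.inf)
    arg = np.zeros((n_out + 1, K + 1), dtype=int)
    best[0, 0] = 0.0
    for z in range(1, n_out + 1):
        for j in range(z, K + 1):
            cands = [best[z - 1, i] + gain(i, j) for i in range(z - 1, j)]
            arg[z, j] = z - 1 + int(np.argmax(cands))
            best[z, j] = max(cands)
    table, j = np.zeros((n_s, n_t), dtype=int), K
    for z in range(n_out, 0, -1):                             # backtrack the group borders
        i = arg[z, j]
        for pos in order[i:j]:
            table[pairs[pos]] = z - 1
        j = i
    return table, float(best[n_out, K])


a_s = np.array([0.1, 0.2, 0.3, 0.4])      # toy p(s|x=0) over s = -2, -1, +1, +2
a_t = np.array([0.05, 0.15, 0.3, 0.5])    # toy p(t|x=0) over t = -2, -1, +1, +2
p_x_st = 0.5 * np.stack([np.outer(a_s, a_t), np.outer(a_s[::-1], a_t[::-1])])
table, I_ups = design_table(p_x_st, n_out=8)
```

The resulting table assigns output labels in increasing LLR order, so it can be read directly as an LLR-sorted 3-bit alphabet.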

The compression reduces the memory demand from $|\mathcal{N}|(w_{\Psi}{+}w_{c})$ to $|\mathcal{N}|w_{\Upsilon}$ bits in the row-layered decoder structure depicted in Fig. 8. The row-layered schedule sequentially updates orthogonal sets of rows of $\bm{\mathbf{H}}$[11]. One layer update consists of a partial VN update and a full CN update (see Section II-B). A partial VN update computes the VN-to-CN message as $t^{v}{=}Q(L^{v})$ with $L^{v}{=}\underline{\hat{L}}{-}\underline{L}^{c}$. The reconstruction $\underline{L}^{c}{=}\underline{\phi}(\underline{t}^{\Upsilon})$ translates $\underline{t}^{\Upsilon}$ from the previous iteration into a $w^{\prime}$-bit integer representation of $\bar{L}(\underline{t}^{\Upsilon}|x)$. Significant loss is avoided if $w^{\prime}{\geq}7$ bits[11]. A permutation performs a cyclic shift of $Z$ parallel messages. A full CN update computes the CN-to-VN messages for all connected VNs according to (3). After an inverse permutation, each CN-to-VN message $t^{c}$ is merged with the side information $s^{\Psi}$. Finally, the APP LLR is updated as $\hat{L}{=}L^{v}{+}\phi(t^{\Upsilon})$.
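The data flow of one layer can be sketched for a single check with $Z{=}1$, so that the permutation and its inverse vanish. All tables below ($\phi$, $\Psi$, $\Upsilon$ and the quantizer $Q$) are hypothetical placeholders with made-up values; a single table `phi` stands in for both $\underline{\phi}$ and $\phi$, and the placeholder $\Upsilon$ ignores $s^{\Psi}$, whereas a designed table would exploit it. The function `layer_update` reads only the stored $\underline{t}^{\Upsilon}$, derives $s^{\Psi}{=}\Psi(\underline{t}^{\Upsilon})$ from it, and writes back the new $t^{\Upsilon}$, so only $w_{\Upsilon}$ bits per location are stored.

```python
import numpy as np

def layer_update(L_app, cols, t_ups_old, phi, Q, Psi, Upsilon):
    """One row-layered update (Z = 1, no permutation) for a single check.

    L_app    : integer APP LLRs of all VNs, updated in place
    cols     : VN indices connected to this check
    t_ups_old: stored merged messages t^Ups of the check's edges (previous iteration)
    phi, Q, Psi, Upsilon: hypothetical stand-ins for designed tables/quantizer
    Returns the new t^Ups values that are written back to memory.
    """
    L_v = np.array([L_app[j] - phi[t] for j, t in zip(cols, t_ups_old)])  # partial VN
    t_v = [Q(l) for l in L_v]                                             # VN-to-CN messages
    t_ups_new = []
    for n in range(len(cols)):                                           # full CN update (3)
        ext = [t for i, t in enumerate(t_v) if i != n]
        t_c = int(np.prod(np.sign(ext))) * min(abs(t) for t in ext)
        s_psi = Psi[t_ups_old[n]]                  # side information from memory
        t_ups_new.append(Upsilon[(s_psi, t_c)])    # merge old and new information
    for n, j in enumerate(cols):
        L_app[j] = L_v[n] + phi[t_ups_new[n]]      # APP LLR update
    return t_ups_new


alphabet = [-2, -1, 1, 2]
phi = {t: 2 * t for t in alphabet}                          # hypothetical integer LLRs
Psi = {t: (1 if t > 0 else -1) for t in alphabet}           # hypothetical 1-bit Psi
Upsilon = {(s, t): t for s in (-1, 1) for t in alphabet}    # placeholder: ignores s

def Q(l):
    """Hypothetical 2-bit VN quantizer with magnitude threshold 3."""
    return (1 if l >= 0 else -1) * (2 if abs(l) >= 3 else 1)

L_app = np.array([5, -1, 3])
new = layer_update(L_app, cols=[0, 1, 2], t_ups_old=[1, -1, 2],
                   phi=phi, Q=Q, Psi=Psi, Upsilon=Upsilon)
```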

V-D Frame Error Rate Performance

The decoding performance for different resolutions, code rates and schedules is depicted in Fig. 7. Unless indicated otherwise, all quantized decoders apply CN-aware quantization with row alignment as in [11]. For rate $1/3$, the proposed $w{=}2$-bit ($w{=}3$-bit) memory-assisted decoder improves performance by 0.36 dB (0.1 dB) compared to the conventional decoders. Relative to a decoder with matrix alignment, no CN-aware quantization and no memory assistance, the performance gain is 0.68 dB. The $w{=}3$-bit memory-assisted decoder slightly outperforms a $w{=}4$-bit decoder. The labels (3 4 2) and (2 3 2) specify the bit widths ($w$ $w_{\Upsilon}$ $w_{\Psi}$) of the decoders with merged old and new CN messages (see Section V-C). They reduce the memory demand from 6 to 4 and from 4 to 3 bits, respectively, without losing performance compared to the non-merged decoders. The row-layered schedule with 15 iterations achieves almost the same performance as the flooding schedule with 30 iterations. The improvements also translate to the other code rates $2/3$ to $11/12$. Remarkably, at the high rate $11/12$, the (2 3 2)-bit decoder operates within $0.25$ dB of a belief propagation decoder with exact box-plus operation and 32-bit LLR messages[3].

Figure 7: Performance for rates 1/3, 2/3, 5/6 and 11/12 with and without memory assistance.

Figure 8: Efficient structure of a row-layered decoder with memory assistance and merged reconstruction messages $t^{\Upsilon}$.

VI Conclusions

This letter proposed coarsely quantized decoding assisted by messages retained in memory. We extended an existing IB algorithm for the design of threshold quantizers to be aware of the side information provided by the memory. Further, we proposed a new structure that merges the side information with a newly generated CN message to reduce the memory overhead. In summary, 2-bit (layered) decoding is improved by up to 0.36 dB by increasing the memory from 2 to 3 bits per memory location, while preserving high-speed 2-bit operations for quantization, the routing network and the CN update.

References

[1] 3GPP, “5G NR: Multiplexing and Channel Coding, TS 38.212,” 2018.
[2] R. Gallager, “Low-Density Parity-Check Codes,” IRE Transactions on Information Theory, vol. 8, no. 1, pp. 21–28, 1962.
[3] J. Chen, A. Dholakia et al., “Reduced-complexity decoding of LDPC codes,” IEEE Trans. on Communications, vol. 53, no. 8, pp. 1288–1299, 2005.
[4] P. Kang, K. Cai et al., “Generalized Mutual Information-Maximizing Quantized Decoding of LDPC Codes With Layered Scheduling,” IEEE Trans. on Vehicular Technology, vol. 71, no. 7, pp. 7258–7273, 2022.
[5] L. Wang, C. Terrill et al., “Reconstruction-Computation-Quantization (RCQ): A Paradigm for Low Bit Width LDPC Decoding,” IEEE Trans. on Communications, vol. 70, no. 4, pp. 2213–2226, 2022.
[6] M. Geiselhart, A. Elkelesh et al., “Learning quantization in LDPC decoders,” in 2022 IEEE Globecom Workshops, 2022, pp. 467–472.
[7] Y. Ren, H. Harb et al., “A Generalized Adjusted Min-Sum Decoder for 5G LDPC Codes: Algorithm and Implementation,” 2024.
[8] J. Lewandowsky and G. Bauch, “Information-Optimum LDPC Decoders Based on the Information Bottleneck Method,” IEEE Access, vol. 6, pp. 4054–4071, 2018.
[9] M. Stark, L. Wang et al., “Decoding Rate-Compatible 5G-LDPC Codes With Coarse Quantization Using the Information Bottleneck Method,” IEEE Open Journal of the Comm. Society, vol. 1, pp. 646–660, 2020.
[10] P. Mohr and G. Bauch, “A Variable Node Design with Check Node Aware Quantization Leveraging 2-Bit LDPC Decoding,” in GLOBECOM 2022 - 2022 IEEE Global Communications Conf., 2022, pp. 3484–3489.
[11] ——, “Region-Specific Coarse Quantization with Check Node Awareness in 5G-LDPC Decoding,” arXiv:2406.14233, 2024.
[12] G. Chechik and N. Tishby, “Extracting Relevant Structures with Side Information,” in Advances in Neural Inf. Proc. Sys. MIT Press, 2002.
[13] S. Steiner and V. Kuehn, “Distributed compression using the information bottleneck principle,” in ICC 2021 - IEEE International Conference on Communications, 2021, pp. 1–6.
[14] B. M. Kurkoski and H. Yagi, “Quantization of Binary-Input Discrete Memoryless Channels,” IEEE Transactions on Information Theory, vol. 60, no. 8, pp. 4544–4552, Aug. 2014.