
Information-Set Decoding for Convolutional Codes

Niklas Gassner, Julia Lieb, Abhinaba Mazumder, and Michael Schaller

Abstract

In this paper, we present a framework for generic decoding of convolutional codes, which allows us to do cryptanalysis of code-based systems that use convolutional codes. We then apply this framework to information set decoding, study success probabilities and give tools to choose variables. Finally, we use this to attack two cryptosystems based on convolutional codes. In the case of [bolkema2017variations], our code recovered about 74% of errors in less than 10 hours each, and in the case of [almeidaBNS21], we give experimental evidence that 80% of the errors can be recovered in times corresponding to about 60 bits of operational security, with some instances being significantly lower.

1 Introduction

Current cryptographic systems rely on the hardness of integer factorisation or the discrete logarithm problem. These problems can be solved efficiently with Shor’s algorithm on a quantum computer [shor]. This led to a new interest in post-quantum cryptography, a part of which is cryptography based on linear codes. The first code-based cryptosystem was proposed in 1978 by McEliece [mceliece] and uses binary Goppa codes. The scheme is still fundamentally intact, and the submission Classic McEliece [NISTMcEliece] is based on it. Round 4 of the currently ongoing post-quantum competition of the National Institute of Standards and Technology (NIST) features three code-based public-key submissions: the already mentioned Classic McEliece, BIKE [NISTBike], and HQC [NISTHQC].

There have been proposals not featured in the NIST post-quantum competition based on convolutional codes, such as [londahl2012new] (attacked in [londahlattack]), [almeidaBNS23], or [moufek2018new]. The rationale behind this is that the key size is linear in the memory of the convolutional code, while security levels are reliant on the size of the sliding generator or parity-check matrix, leading to an expected exponential increase in time for generic decoding methods and thus increased security. For example, in the Viterbi algorithm [viterbi] the decoding complexity is exponential in the degree. Additionally, for the receiver of the ciphertext, decoding a convolutional code can be done sequentially.

In this paper, we provide a framework for generic decoding of convolutional codes (which are not tail-biting) which is similar to sliding window decoding and apply it to information-set decoding. The framework resembles the idea of sequential decoding, which was introduced in [wozencraft]. We study success probabilities of several aspects of the algorithm and give tools to choose parameters in the framework. Finally, we use it to attack the system proposed in [bolkema2017variations] and a set of parameters of the system proposed in [almeidaBNS21]. Note that there exists an updated version [almeidaBNS23] with tail-biting convolutional codes, where our attack does not apply.

The paper is organised as follows.

In Section 2, we give an overview of linear codes and especially convolutional codes, where we introduce all the notions and notations necessary for understanding our work.

In Section 3, we cover the basics of information-set decoding, a class of generic decoding methods for linear codes, where we mainly talk about Prange’s algorithm.

The remaining sections cover our own contribution. In Section 4, we discuss information-set decoding for convolutional codes. We first give a general framework for generic decoding of convolutional codes that reduces the problem to decoding a smaller block code multiple times and then discuss how we apply it to information-set decoding. We justify our choice of a depth-first algorithm, then discuss the issues posed by low weight codewords and how we adapt the algorithm to address these issues. Many of the components of our algorithm are probabilistic, so we provide tools that allow us to compute the success probability for given parameters and thus help us to choose parameters.

Finally, in Section 5 we discuss our implementation of the attack and the results of attacking the cryptosystem proposed in [bolkema2017variations] and a parameter set of the cryptosystem proposed in [almeidaBNS21]. Since computation times were in most cases infeasible for the latter, we only provide an estimate of the computation time necessary for recovering errors for most random seeds. We also ran the algorithm in full for two cases where the estimated computation time was low enough, and managed to recover the correct error in both cases.

1.1 Notation and Conventions

For the convenience of the reader, we summarize some notation that we will use throughout the paper.

–

$\mathbb{F}_{q}$ the finite field with $q$ elements,
–

$\mathcal{C}\subset\mathbb{F}_{q}[z]^{n}$ a convolutional code, where $n$ is its length and $k$ its rank as an $\mathbb{F}_{q}[z]$ -submodule of $\mathbb{F}_{q}[z]^{n}$ ,
–

$C\subseteq\mathbb{F}_{q}^{N}$ a block code of length $N$ and dimension $K$ ,
–

matrices and vectors are denoted with bold capital respectively small letters (e.g. $\mathbf{G}$ and $\mathbf{v}$ ),
–

$\mathbf{G}(z)$ and $\mathbf{H}(z)$ for a generator matrix respectively parity-check matrix of a convolutional code $\mathcal{C}$ ,
–

$I$ an information set.

2 Basics of Linear Codes and Convolutional Codes

In this section we present definitions and results for linear block and convolutional codes that will be important in later sections of this paper.

Definition 1.

Let $K,N\in\mathbb{N}$ with $K\leq N$ . A linear $[N,K]$ -block code $C$ is a $K$ -dimensional subspace of $\mathbb{F}_{q}^{N}$ . Hence, there is a full rank matrix $\mathbf{G}\in\mathbb{F}_{q}^{K\times N}$ such that

C=\{\mathbf{c}\in\mathbb{F}_{q}^{N}\ |\ \mathbf{c}=\mathbf{m}\mathbf{G}\ \text{for some}\ \mathbf{m}\in\mathbb{F}_{q}^{K}\}.

$\mathbf{G}$ is called a generator matrix, $N$ the length and ${K}/{N}$ the rate of $C$.

While the generator matrix is used for the encoding of a message, for the decoding of a received word, one usually uses another matrix, called parity-check matrix, as defined in the following.

Definition 2.

Let $C$ be a linear $[N,K]$ -block code. A full rank matrix $\mathbf{H}\in\mathbb{F}_{q}^{(N-K)\times N}$ such that

C=\{\mathbf{c}\in\mathbb{F}_{q}^{N}\ |\ \mathbf{Hc}^{\top}=0\}

is called parity-check matrix of $C$ .

Lemma 3.

Let $\mathbf{G}\in\mathbb{F}_{q}^{K\times N}$ be a generator matrix of a linear block code $C$ . A matrix $\mathbf{H}\in\mathbb{F}_{q}^{(N-K)\times N}$ is a parity-check matrix for $C$ if and only if $\mathbf{HG}^{\top}=0$ and $\mathbf{H}$ is full rank.

Definition 4.

The support of $\mathbf{c}=(\mathbf{c}_{1},\ldots,\mathbf{c}_{n})\in\mathbb{F}_{q}^{n}$ is defined as

{\rm supp}(\mathbf{c}):=\{i\in\{1,\ldots,n\}\,:\,\mathbf{c}_{i}\neq 0\}.

The (Hamming) weight of $\mathbf{c}$ is defined as ${\rm wt}(\mathbf{c}):=|{\rm supp}(\mathbf{c})|$, i.e. as the number of nonzero components of the vector $\mathbf{c}$.

Definition 5.

The minimum distance of a linear block code $C$ is defined as

d:=\min\{\mathrm{wt}(\mathbf{c})\>|\>\mathbf{c}\in C\setminus\{\mathbf{0}\}\}.

In the following, we introduce the basics of convolutional codes, which can be understood as a generalization of linear block codes. More details about convolutional codes can e.g. be found in [bookchapter].

Definition 6.

A convolutional code $\mathcal{C}$ of length $n$ is defined as an $\mathbb{F}_{q}[z]$ -submodule of $\mathbb{F}_{q}[z]^{n}$ . Let $k$ be the rank of $\mathcal{C}$ . Then, $\frac{k}{n}$ is the rate of $\mathcal{C}$ and we call $\mathcal{C}$ an $(n,k)$ convolutional code. There exists a polynomial matrix $\mathbf{G}(z)\in\mathbb{F}_{q}[z]^{k\times n}$ such that

\displaystyle\mathcal{C}=\{\mathbf{c}(z)=\mathbf{m}(z)\mathbf{G}(z)\ |\ \mathbf{m}(z)\in\mathbb{F}_{q}[z]^{k}\}.

$\mathbf{G}(z)$ is called a generator matrix of $\mathcal{C}$. If we write $\mathbf{G}(z)=\sum_{i=0}^{\mu}\mathbf{G}_{i}z^{i}$ with $\mathbf{G}_{\mu}\neq 0$, then $\mu$ is called the memory of $\mathbf{G}(z)$. The maximal degree of the full size (i.e. $k\times k$) minors of $\mathbf{G}(z)$ is called the degree $\delta$ of $\mathcal{C}$.

Note that a generator matrix is not unique and the memory $\mu$ depends on $\mathbf{G}(z)$ , however the degree $\delta$ of $\mathcal{C}$ does not depend on the choice of the generator matrix.

Remark 7.

Two full rank matrices $\mathbf{G}(z),\widetilde{\mathbf{G}}(z)\in\mathbb{F}_{q}[z]^{k\times n}$ are generator matrices of the same code if and only if there exists a unimodular matrix $\mathbf{U}(z)\in\mathbb{F}_{q}[z]^{k\times k}$ such that

\displaystyle\widetilde{\mathbf{G}}(z)=\mathbf{U}(z)\mathbf{G}(z).

$\mathbf{G}(z)$ and $\widetilde{\mathbf{G}}(z)$ are then called equivalent.

In the next part of this section, we consider parity-check matrices for convolutional codes. In contrast to linear block codes, not every convolutional code admits a parity-check matrix.

Definition 8.

Let $\mathcal{C}$ be an $(n,k)$ convolutional code. A full row rank polynomial matrix $\mathbf{H}(z)\in\mathbb{F}_{q}[z]^{(n-k)\times n}$ is called parity-check matrix of $\mathcal{C}$ , if

\displaystyle\mathcal{C}=\{\mathbf{c}(z)\in\mathbb{F}_{q}[z]^{n}\ |\ \mathbf{H}(z)\mathbf{c}(z)^{\top}=0\}.

If such a parity-check matrix exists for the code $\mathcal{C}$ , then $\mathcal{C}$ is called non-catastrophic.

In the following, we will describe how to check whether a convolutional code possesses a parity-check matrix and how to calculate it.

Definition 9.

For $k\leq n$ , $\mathbf{G}(z)\in\mathbb{F}_{q}[z]^{k\times n}$ is called left prime if in all factorisations $\mathbf{G}(z)=\mathbf{L}(z)\widehat{\mathbf{G}}(z)$ , with $\mathbf{L}(z)\in\mathbb{F}_{q}[z]^{k\times k}$ and $\widehat{\mathbf{G}}(z)\in\mathbb{F}_{q}[z]^{k\times n}$ , the left factor $\mathbf{L}(z)$ is unimodular.

Definition 10.

Let $\mathbf{G}(z)\in\mathbb{F}_{q}[z]^{k\times n}$ with $k\leq n$ be full (row) rank. Then, there exists a unimodular matrix $\mathbf{U}(z)\in\mathbb{F}_{q}[z]^{n\times n}$ such that

\displaystyle\mathbf{G}_{cH}(z)=\mathbf{G}(z)\mathbf{U}(z)=\begin{pmatrix}h_{11}(z)&0&\ldots&0&0&\ldots&0\\ \vdots&\ddots&\ddots&\vdots&\vdots&&\vdots\\ \vdots&&\ddots&0&\vdots&&\vdots\\ h_{k1}(z)&\dots&\ldots&h_{kk}(z)&0&\ldots&0\end{pmatrix}

(1)

where $h_{ii}(z)$ are monic for $i=1,\dots,k$ and $\deg(h_{ii})>\deg(h_{ij})$ for each $j<i$ . The matrix $\mathbf{G}_{cH}(z)$ is called column Hermite form of $\mathbf{G}(z)$ .

Theorem 11.

[bookchapter] Consider $\mathbf{G}(z)\in\mathbb{F}_{q}[z]^{k\times n}$ with $k\leq n$ . The following statements are equivalent:

1.

$\mathbf{G}(z)$ is left prime.
2.

The column Hermite form of $\mathbf{G}(z)$ is $[I_{k}\;0]$ .
3.

There exists $\mathbf{M}(z)\in\mathbb{F}_{q}[z]^{n\times k}$ such that $\mathbf{G}(z)\mathbf{M}(z)=I_{k}$ .
4.

$\mathbf{G}(z)$ can be completed to a unimodular matrix, i.e. there exists $\mathbf{E}(z)\in\mathbb{F}_{q}[z]^{(n-k)\times n}$ such that $\begin{pmatrix}\mathbf{G}(z)\\ \mathbf{E}(z)\end{pmatrix}$ is unimodular.
5.

$\operatorname{rk}(\mathbf{G}(\lambda))=k$ for all $\lambda\in\bar{\mathbb{F}}_{q}$, where $\bar{\mathbb{F}}_{q}$ denotes the algebraic closure of the field $\mathbb{F}_{q}$.

Definition 12.

A polynomial matrix $\mathbf{G}(z)\in\mathbb{F}_{q}[z]^{k\times n}$ is said to be delay-free if $\mathbf{G}(0)$ is full row rank. A convolutional code $\mathcal{C}\subset\mathbb{F}_{q}[z]^{n}$ is called delay-free if its generator matrices are delay-free.

It is easy to see that if one generator matrix of a convolutional code is delay-free, then all its generator matrices are delay-free and hence it makes sense to speak of delay-free convolutional codes. Moreover, note that by Theorem 11 all non-catastrophic convolutional codes are delay-free.

Theorem 13.

[bookchapter] Let $\mathcal{C}$ be an $(n,k)$ convolutional code. Then, $\mathcal{C}$ admits a parity-check matrix $\mathbf{H}(z)\in\mathbb{F}_{q}[z]^{(n-k)\times n}$ if and only if any of the generator matrices of $\mathcal{C}$ is left prime.

Corollary 14.

Let $\mathbf{G}(z)\in\mathbb{F}_{q}[z]^{k\times n}$ be a left-prime generator matrix of a convolutional code $\mathcal{C}$ . A matrix $\mathbf{H}(z)\in\mathbb{F}_{q}[z]^{(n-k)\times n}$ is a parity-check matrix for $\mathcal{C}$ if and only if $\mathbf{H}(z)\mathbf{G}(z)^{\top}=0$ and $\mathbf{H}(z)$ is full rank.

Next, we describe how a parity-check matrix for a convolutional code can be calculated.

First the column Hermite form $\mathbf{G}_{cH}(z)$ of $\mathbf{G}(z)\in\mathbb{F}_{q}[z]^{k\times n}$ is computed. According to Theorem 11 there exists a parity-check matrix if and only if $\mathbf{G}_{cH}(z)=[I_{k}\ \ 0]$ . If this is true, calculate $\mathbf{V}(z)\in\mathbb{F}_{q}[z]^{n\times n}$ such that $\mathbf{G}(z)\mathbf{V}(z)=[I_{k}\ \ 0]$ . Then, the last $n-k$ rows of $\mathbf{V}(z)^{\top}$ form a parity-check matrix for the convolutional code with generator matrix $\mathbf{G}(z)$ .

If $\mathbf{G}(z)\in\mathbb{F}_{q}[z]^{k\times n}$ is not left prime, the corresponding convolutional code $\mathcal{C}$ does not possess a parity-check matrix. However, it is still possible to find a left prime polynomial matrix $\mathbf{H}(z)\in\mathbb{F}_{q}[z]^{(n-k)\times n}$ that plays the role of a parity-check matrix for $\mathcal{C}$ in the sense that

\mathcal{C}\subsetneqq\ker\mathbf{H}(z).

Since $\mathbf{G}(z)$ is not left prime, one has

\mathbf{G}(z)=\begin{bmatrix}\mathbf{L}(z)&\mathbf{0}\end{bmatrix}\mathbf{U}(z)

where $\begin{bmatrix}\mathbf{L}(z)&\mathbf{0}\end{bmatrix}$ is the column Hermite form of $\mathbf{G}(z)$, $\mathbf{U}(z)\in\mathbb{F}_{q}[z]^{n\times n}$ is unimodular and $\mathbf{L}(z)\in\mathbb{F}_{q}[z]^{k\times k}$ is not the identity matrix. It follows that $\mathbf{G}(z)=\mathbf{L}(z)\mathbf{U}_{1}(z)$, where $\mathbf{U}_{1}(z)$ consists of the first $k$ rows of $\mathbf{U}(z)$. Since $\mathbf{U}_{1}(z)$ can be completed to the unimodular matrix $\mathbf{U}(z)$, it is left prime by Theorem 11. Hence, the convolutional code $\widetilde{\mathcal{C}}$ with generator matrix $\mathbf{U}_{1}(z)\in\mathbb{F}_{q}[z]^{k\times n}$ is non-catastrophic and has a parity-check matrix $\mathbf{H}(z)$. Finally, one has

\mathcal{C}\subsetneqq\widetilde{\mathcal{C}}=\ker\mathbf{H}(z).

In the following we introduce distance measures for convolutional codes.

Definition 15.

The weight of a polynomial vector $\mathbf{c}(z)=\displaystyle\sum_{i\in{\mathbb{Z}_{\geq 0}}}\mathbf{c}_{i}z^{i}\in\mathbb{F}_{q}[z]^{n}$ is defined as

{\rm wt}(\mathbf{c}(z))=\sum_{i\in{\mathbb{Z}_{\geq 0}}}{\rm wt}(\mathbf{c}_{i}).

Definition 16.

Let $\cal C$ be an $(n,k)$ convolutional code. The free distance of $\cal C$ is defined as

d_{free}({\cal C})=\min\{{\rm wt}(\mathbf{c}(z))\,:\,\mathbf{c}(z)\in{\cal C}\backslash\{\mathbf{0}\}\}.

The free distance is a measure for the total number of errors a convolutional code can correct. However, we will carry out the decoding of a convolutional code by splitting the received word and codeword into parts, so-called windows, and decode one window after the other. To this end, we need to introduce so-called sliding matrices.

Let

\mathbf{G}(z)=\displaystyle\sum_{i\in{\mathbb{Z}_{\geq 0}}}\mathbf{G}_{i}z^{i}\in\mathbb{F}_{q}[z]^{k\times n}

be a generator matrix and

\mathbf{H}(z)=\displaystyle\sum_{i\in{\mathbb{Z}_{\geq 0}}}\mathbf{H}_{i}z^{i}\in\mathbb{F}_{q}[z]^{(n-k)\times n}

be a parity-check matrix of an $(n,k)$ convolutional code, where $\mathbf{G}_{i}=\mathbf{0}$ if $i>\deg(\mathbf{G}(z))$ and $\mathbf{H}_{i}=\mathbf{0}$ if $i>\deg(\mathbf{H}(z))$. For $\gamma\in{\mathbb{Z}_{\geq 0}}$, consider the sliding generator matrix

\tilde{\mathbf{G}}^{\gamma}_{0}:=\begin{pmatrix}\mathbf{G}_{0}&\mathbf{G}_{1}&\cdots&\mathbf{G}_{\gamma}\\ &\mathbf{G}_{0}&\cdots&\mathbf{G}_{\gamma-1}\\ &&\ddots&\vdots\\ &&&\mathbf{G}_{0}\end{pmatrix}

and the sliding parity-check matrix

\tilde{\mathbf{H}}^{\gamma}_{0}:=\begin{pmatrix}\mathbf{H}_{0}&&\\ \vdots&\ddots&\\ \mathbf{H}_{\gamma}&\cdots&\mathbf{H}_{0}\end{pmatrix}.
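As an illustration, the two sliding matrices can be assembled block by block from the coefficient matrices. The following is a minimal sketch in plain Python; the function names `sliding_generator` and `sliding_parity_check`, the representation of matrices as lists of rows, and the toy code over $\mathbb{F}_{2}$ with $\mathbf{G}(z)=(1+z,\ 1)$ and $\mathbf{H}(z)=(1,\ 1+z)$ are assumptions chosen for this example only.

```python
def sliding_generator(G_coeffs, gamma):
    """Block upper-triangular Toeplitz matrix with G_{c-r} in block (r, c)."""
    k, n = len(G_coeffs[0]), len(G_coeffs[0][0])
    out = [[0] * ((gamma + 1) * n) for _ in range((gamma + 1) * k)]
    for r in range(gamma + 1):
        for c in range(r, gamma + 1):
            if c - r < len(G_coeffs):  # G_i = 0 beyond the last coefficient
                for i in range(k):
                    for j in range(n):
                        out[r * k + i][c * n + j] = G_coeffs[c - r][i][j]
    return out


def sliding_parity_check(H_coeffs, gamma):
    """Block lower-triangular Toeplitz matrix with H_{r-c} in block (r, c)."""
    m, n = len(H_coeffs[0]), len(H_coeffs[0][0])
    out = [[0] * ((gamma + 1) * n) for _ in range((gamma + 1) * m)]
    for r in range(gamma + 1):
        for c in range(r + 1):
            if r - c < len(H_coeffs):  # H_i = 0 beyond the last coefficient
                for i in range(m):
                    for j in range(n):
                        out[r * m + i][c * n + j] = H_coeffs[r - c][i][j]
    return out


# Toy example over F_2: G_0 = (1 1), G_1 = (1 0), H_0 = (1 1), H_1 = (0 1).
G_tilde = sliding_generator([[[1, 1]], [[1, 0]]], gamma=2)
H_tilde = sliding_parity_check([[[1, 1]], [[0, 1]]], gamma=2)

# Inner products (mod 2) of every row of H_tilde with every row of G_tilde.
products = [sum(h * g for h, g in zip(hrow, grow)) % 2
            for hrow in H_tilde for grow in G_tilde]
```

In this toy case all entries of `products` vanish, which is the block-code analogue of $\mathbf{H}(z)\mathbf{G}(z)^{\top}=0$.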

Definition 17.

Let $\cal C$ be an $(n,k)$ convolutional code. For $\gamma\in{\mathbb{Z}_{\geq 0}}$ , the $\gamma$ -th column distance of $\mathcal{C}$ is defined as

\displaystyle d_{\gamma}^{c}(\mathcal{C}):=\min\left\{\sum_{t=0}^{\gamma}{\rm wt}(\mathbf{c}_{t})\ |\ \mathbf{c}(z)\in\mathcal{C}\ \text{and}\ \mathbf{c}_{0}\neq 0\right\}.

If $\mathcal{C}$ is delay-free with generator matrix $\mathbf{G}(z)=\displaystyle\sum_{i\in{\mathbb{Z}_{\geq 0}}}\mathbf{G}_{i}z^{i}\in\mathbb{F}_{q}[z]^{k\times n}$, it follows

d^{c}_{\gamma}(\mathcal{C})=\min\{{\rm wt}((\mathbf{m}_{0}\cdots\mathbf{m}_{\gamma})\cdot\tilde{\mathbf{G}}^{\gamma}_{0})\,:\;\mathbf{m}(z)\in{\mathbb{F}_{q}[z]^{k}}\mbox{ with }\mathbf{m}_{0}\neq 0\}.

If $\mathcal{C}$ is non-catastrophic with parity-check matrix

\mathbf{H}(z)=\displaystyle\sum_{i\in{\mathbb{Z}_{\geq 0}}}\mathbf{H}_{i}z^{i}\in\mathbb{F}_{q}[z]^{(n-k)\times n},

one obtains

d^{c}_{\gamma}(\mathcal{C})=\min\left\{\sum_{t=0}^{\gamma}{\rm wt}(\mathbf{c}_{t})\ |\ \tilde{\mathbf{H}}^{\gamma}_{0}\cdot\begin{pmatrix}\mathbf{c}_{0}\\ \vdots\\ \mathbf{c}_{\gamma}\end{pmatrix}=\mathbf{0}\ \text{and}\ \mathbf{c}_{0}\neq\mathbf{0}\right\}.

One has that

d^{c}_{0}(\mathcal{C})\leq d^{c}_{1}(\mathcal{C})\leq\cdots\leq\lim_{\gamma\rightarrow\infty}d^{c}_{\gamma}(\mathcal{C})\leq d_{free}({\cal C}),

reflecting the fact that the larger we choose the decoding window, the more errors can be corrected inside this window.

We now write the equation $\mathbf{m}(z)\mathbf{G}(z)=\mathbf{c}(z)$ with $\ell:=\deg(\mathbf{m}(z))$ as

\displaystyle(\mathbf{m}_{0}\cdots\mathbf{m}_{\ell})\left(\begin{array}{cccccc}\mathbf{G}_{0}&\dots&\mathbf{G}_{\mu}&&&\\ &\mathbf{G}_{0}&\cdots&\mathbf{G}_{\mu}&&\\ &&\ddots&&\ddots&\\ &&&\mathbf{G}_{0}&\cdots&\mathbf{G}_{\mu}\end{array}\right)=(\mathbf{c}_{0}\cdots\mathbf{c}_{\ell+\mu}).

(6)

Assume we have decoded the messages up to time instant $t-1$. Then we can use the equation

\displaystyle(\mathbf{m}_{0},\dots,\mathbf{m}_{t-1}\ |\ \mathbf{m}_{t},\dots,\mathbf{m}_{t+\gamma})\left(\begin{array}{cccccc}\mathbf{G}_{0}&\dots&\mathbf{G}_{t-1}&\mathbf{G}_{t}&\dots&\mathbf{G}_{t+\gamma}\\ &\ddots&\vdots&\vdots&&\vdots\\ &&\mathbf{G}_{0}&\mathbf{G}_{1}&\cdots&\mathbf{G}_{\gamma+1}\\ \hline&&&\mathbf{G}_{0}&\cdots&\mathbf{G}_{\gamma}\\ &&&&\ddots&\vdots\\ &&&&&\mathbf{G}_{0}\end{array}\right)=(\mathbf{c}_{0}\cdots\mathbf{c}_{t+\gamma})

(13)

and rewrite it as

\displaystyle(\mathbf{m}_{t},\dots,\mathbf{m}_{t+\gamma})\left(\begin{array}{ccc}\mathbf{G}_{0}&\cdots&\mathbf{G}_{\gamma}\\ &\ddots&\vdots\\ &&\mathbf{G}_{0}\end{array}\right)=(\mathbf{c}_{t}\cdots\mathbf{c}_{t+\gamma})-(\mathbf{m}_{0},\dots,\mathbf{m}_{t-1})\begin{pmatrix}\mathbf{G}_{t}&\dots&\mathbf{G}_{t+\gamma}\\ \vdots&&\vdots\\ \mathbf{G}_{1}&\cdots&\mathbf{G}_{\gamma+1}\end{pmatrix}.

Denote the received word after transmission by $\mathbf{r}(z)=\sum_{i\in{\mathbb{Z}_{\geq 0}}}\mathbf{r}_{i}z^{i}\in\mathbb{F}_{q}[z]^{n}$.

Then, $(\mathbf{m}_{t},\dots,\mathbf{m}_{t+\gamma})$ can be recovered by decoding

(\mathbf{r}_{t}\cdots\mathbf{r}_{t+\gamma})-(\mathbf{m}_{0},\dots,\mathbf{m}_{t-1})\begin{pmatrix}\mathbf{G}_{t}&\dots&\mathbf{G}_{t+\gamma}\\ \vdots&&\vdots\\ \mathbf{G}_{1}&\cdots&\mathbf{G}_{\gamma+1}\end{pmatrix}

in the block code with generator matrix $\tilde{\mathbf{G}}^{\gamma}_{0}$.

This process can be iterated by sliding the decoding window by $\gamma+1$ time steps to $(\mathbf{r}_{t+\gamma+1}\cdots\mathbf{r}_{t+2\gamma+1})$, and in this way the whole message $\mathbf{m}(z)$ can be recovered with several decoding steps in the block code with generator matrix $\tilde{\mathbf{G}}^{\gamma}_{0}$.

3 Basics of Information-Set Decoding

Information-set decoding (ISD) is a generic decoding technique for linear codes. In this section we introduce the simplest form of ISD, the Prange algorithm [Prange62]. Several improvements exist, such as [LeeB88], [Stern88], [dumer1991minimum], [Peters10], [BernsteinLP11], [MayMT11], and [BeckerJMM12]. The general idea is as follows. Given an erroneous codeword $\mathbf{c}+\mathbf{e}$ , we wish to find a set $I\subset\{1,\ldots,N\}$ such that $I\cap\operatorname{supp}(\mathbf{e})=\emptyset$ and $\mathbf{c}$ is uniquely determined by the positions indexed by $I$ . If we have done so, we can easily recover the codeword $\mathbf{c}$ with linear algebra, hence also the error. In order to find such a set we try random information sets, do the procedure described above and check if the weight of the recovered error is within specified bounds. We continue until we have found an error that is small enough.

Definition 18.

Let $C$ be an $[N,K]$ linear block code and $\mathbf{G}$ be a generator matrix of $C$ . For any subset $I\subseteq\{1,\ldots,N\}$ we define $\mathbf{G}_{I}$ to be the submatrix of $\mathbf{G}$ with columns indexed by $I$ . Then any subset $I\subseteq\{1,\ldots,N\}$ of size $K$ such that $\mathbf{G}_{I}$ is invertible is called an information set.

More formally, we do the following in the Prange algorithm. Let $t$ be an upper bound for the number of errors, $\mathbf{G}$ be a generator matrix of the code, $\mathbf{r}$ the received word, $\mathbf{G}_{I}$ the submatrix of $\mathbf{G}$ with columns indexed by $I$ and $\mathbf{r}_{I}$ the vector consisting of the entries of $\mathbf{r}$ indexed by $I$ . The algorithm is as follows.

(1)

Pick a random information set $I$ .
(2)

Solve the equation $\mathbf{m}\mathbf{G}_{I}=\mathbf{r}_{I}$ for $\mathbf{m}$ .
(3)

Set $\mathbf{e}=\mathbf{r}-\mathbf{m}\mathbf{G}$ .
(4)

If ${\rm wt}(\mathbf{e})\leq t$ , output $\mathbf{e}$ , else go back to step $1$ .

If $t$ is within the unique decoding radius, the algorithm succeeds if an information set $I$ is found such that $I$ has no intersection with the support of the error $\mathbf{e}$ .
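The four steps above can be sketched in Python for the binary case. This is a minimal illustration under stated assumptions, not the implementation used for the experiments: the restriction to $\mathbb{F}_{2}$, the helper names `solve_gf2` and `prange`, the iteration cap `max_iters`, and the $[7,4]$ Hamming code used as test data are all choices made here.

```python
import random


def solve_gf2(A, b):
    """Solve A x = b over F_2 for a square matrix A; return None if A is singular."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col]), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col]:
                M[r] = [x ^ y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def prange(G, r, t, max_iters=10_000, rng=random):
    """Return (m, e) with r = m G + e and wt(e) <= t, or None if the cap is hit."""
    K, N = len(G), len(G[0])
    for _ in range(max_iters):
        I = rng.sample(range(N), K)                        # step (1)
        # m G_I = r_I is solved as the transposed system G_I^T m^T = r_I^T.
        A = [[G[row][col] for row in range(K)] for col in I]
        m = solve_gf2(A, [r[col] for col in I])            # step (2)
        if m is None:
            continue                                       # I was not an information set
        c = [sum(m[i] & G[i][j] for i in range(K)) % 2 for j in range(N)]
        e = [rj ^ cj for rj, cj in zip(r, c)]              # step (3)
        if sum(e) <= t:                                    # step (4)
            return m, e
    return None


# Systematic generator matrix of a [7,4] Hamming code (minimum distance 3).
G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]
# Codeword of m = (1, 0, 1, 1) with a single error in position 5 (0-based).
r = [1, 0, 1, 1, 0, 0, 0]
result = prange(G, r, t=1, rng=random.Random(0))
```

Since one error lies within the unique decoding radius of this code, any pair returned with ${\rm wt}(\mathbf{e})\leq 1$ is the transmitted message together with the true error.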

Assuming that every subset of size $K$ is an information set (which is only true if the code is MDS, i.e., the minimum distance is $d=N-K+1$), that the error has weight exactly $t$, and that there is a unique solution, each execution of steps $(1)$, $(2)$ and $(3)$ has a probability of $\binom{N-K}{t}/{\binom{N}{t}}$ that $I\cap\operatorname{supp}(\mathbf{e})=\emptyset$. The expected number of iterations, which we call the workfactor, is the reciprocal of this probability, denoted by

\mathrm{WF}_{t}=\frac{\binom{N}{t}}{\binom{N-K}{t}}.
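For concrete parameters, the workfactor can be evaluated exactly with integer binomial coefficients. The sketch below is only an illustration; the function names `prange_workfactor` and `prange_workfactor_bits` are hypothetical, and the numbers in the example are arbitrary.

```python
from fractions import Fraction
from math import comb, log2


def prange_workfactor(N, K, t):
    """WF_t = C(N, t) / C(N - K, t) as an exact rational number."""
    if t > N - K:
        # An information set of size K cannot avoid t > N - K error positions.
        raise ValueError("t must not exceed N - K")
    return Fraction(comb(N, t), comb(N - K, t))


def prange_workfactor_bits(N, K, t):
    """Base-2 logarithm of the workfactor, convenient for large parameters."""
    if t > N - K:
        raise ValueError("t must not exceed N - K")
    return log2(comb(N, t)) - log2(comb(N - K, t))


wf_small = prange_workfactor(7, 4, 1)  # equals 7/3
```

Working with the logarithm avoids converting very large binomial coefficients to floating point before dividing.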

4 ISD for Delay-Free Convolutional Codes

In this section, we study ISD for delay-free convolutional codes. For this, we rewrite a sliding generator matrix of $\mathbf{G}(z)=\sum_{i=0}^{d_{1}}\mathbf{G}_{i}z^{i}$ as

\displaystyle\tilde{\mathbf{G}}=\begin{pmatrix}\tilde{\mathbf{G}}^{\gamma}_{0}&\tilde{\mathbf{G}}^{\gamma}_{1}&\cdots&\tilde{\mathbf{G}}^{\gamma}_{l}&&&\\ &\ddots&\ddots&\vdots&\ddots&&\\ &&\tilde{\mathbf{G}}^{\gamma}_{0}&\tilde{\mathbf{G}}^{\gamma}_{1}&\cdots&\tilde{\mathbf{G}}^{\gamma}_{l}\\ &&&\tilde{\mathbf{G}}^{\gamma}_{0}&\ddots&\vdots\\ &&&&\ddots&\tilde{\mathbf{G}}^{\gamma}_{1}\\ &&&&&\tilde{\mathbf{G}}^{\gamma}_{0}\end{pmatrix},

(14)

where $\tilde{\mathbf{G}}^{\gamma}_{i}$ has size $k(\gamma+1)\times n(\gamma+1)$. Note that when $\gamma=0$, we have $\tilde{\mathbf{G}}^{\gamma}_{i}=\mathbf{G}_{i}$, but in general $\tilde{\mathbf{G}}^{\gamma}_{i}$ consists of several of the $\mathbf{G}_{j}$’s. For example, for $\gamma=2$, we have

\tilde{\mathbf{G}}^{2}_{0}=\begin{pmatrix}\mathbf{G}_{0}&\mathbf{G}_{1}&\mathbf{G}_{2}\\ &\mathbf{G}_{0}&\mathbf{G}_{1}\\ &&\mathbf{G}_{0}\end{pmatrix},

and

\tilde{\mathbf{G}}^{2}_{1}=\begin{pmatrix}\mathbf{G}_{3}&\mathbf{G}_{4}&\mathbf{G}_{5}\\ \mathbf{G}_{2}&\mathbf{G}_{3}&\mathbf{G}_{4}\\ \mathbf{G}_{1}&\mathbf{G}_{2}&\mathbf{G}_{3}\end{pmatrix}.

We will also write the error vector as $\mathbf{e}=\begin{pmatrix}\tilde{\mathbf{e}}_{0}&\tilde{\mathbf{e}}_{1}&\ldots&\tilde{\mathbf{e}}_{s-1}\end{pmatrix}$, where $\tilde{\mathbf{e}}_{i}$ is of size $n(\gamma+1)$.

4.1 Generic Decoding of Convolutional Codes

Given the encrypted message $\mathbf{r}(z)=\mathbf{m}(z)\mathbf{G}(z)+\mathbf{e}(z)$ , we aim to find $\mathbf{m}(z)\mathbf{G}(z)$ and consequently $\mathbf{m}(z)$ .

As before, we turn a polynomial vector $\mathbf{v}(z)=\sum_{i=0}^{d}\mathbf{v}_{i}z^{i}\in\mathbb{F}_{q}[z]^{n}$ into $\mathbf{v}=\begin{pmatrix}\mathbf{v}_{0}&\mathbf{v}_{1}&\ldots&\mathbf{v}_{d}\end{pmatrix}\in\mathbb{F}_{q}^{n(d+1)}$. Let $\tilde{\mathbf{v}}_{i}$ be the concatenation of several consecutive $\mathbf{v}_{j}$’s. We get $\mathbf{v}=\begin{pmatrix}\tilde{\mathbf{v}}_{0}&\ldots&\tilde{\mathbf{v}}_{s-1}\end{pmatrix}.$

The idea is to iteratively decode the convolutional code with the matrix $\tilde{\mathbf{G}}$ as follows:

Step 0: Use generic decoding to recover $\tilde{\mathbf{e}}_{0}$ from $\tilde{\mathbf{r}}_{0}=\tilde{\mathbf{m}}_{0}\tilde{\mathbf{G}}^{\gamma}_{0}+\tilde{\mathbf{e}}_{0}$ and recover $\tilde{\mathbf{m}}_{0}$ from $\tilde{\mathbf{m}}_{0}\tilde{\mathbf{G}}^{\gamma}_{0}$ with linear algebra.

Step $j$ (for $j\geq 1$):

(a)

Compute $\tilde{\mathbf{m}}_{j}\tilde{\mathbf{G}}^{\gamma}_{0}+\tilde{\mathbf{e}}_{j}=\tilde{\mathbf{r}}_{j}-\sum_{i=0}^{j-1}\tilde{\mathbf{m}}_{i}\tilde{\mathbf{G}}^{\gamma}_{j-i}$.
(b)

Recover $\tilde{\mathbf{e}}_{j}$ and $\tilde{\mathbf{m}}_{j}$ with generic decoding for the code generated by $\tilde{\mathbf{G}}^{\gamma}_{0}$.
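The subtraction in Step $j$ (a) can be sketched as follows for the binary case, where subtraction is a bitwise XOR. The function names `vec_mat_gf2` and `window_target`, the list-based data layout, and the toy code with $\mathbf{G}(z)=(1+z,\ 1)$ and windows of a single time step are assumptions made for this illustration.

```python
def vec_mat_gf2(v, M):
    """Row vector times matrix over F_2."""
    return [sum(vi & row[j] for vi, row in zip(v, M)) % 2
            for j in range(len(M[0]))]


def window_target(r_blocks, m_blocks, G_blocks, j):
    """Return r~_j - sum_{i<j} m~_i G~_{j-i} over F_2.

    r_blocks[j] : received block r~_j
    m_blocks[i] : message blocks already recovered, for i < j
    G_blocks[d] : block matrix G~_d; blocks beyond the list are treated as zero
    """
    target = list(r_blocks[j])
    for i in range(j):
        d = j - i
        if d < len(G_blocks):
            contrib = vec_mat_gf2(m_blocks[i], G_blocks[d])
            target = [a ^ b for a, b in zip(target, contrib)]
    return target


# Toy data: each window is one time step, so the blocks are G_0 = (1 1), G_1 = (1 0).
G_blocks = [[[1, 1]], [[1, 0]]]
# Error-free transmission of m(z) = 1 + z, i.e. c(z) = (1 + z^2, 1 + z).
r_blocks = [[1, 1], [0, 1], [1, 0]]

target_1 = window_target(r_blocks, [[1]], G_blocks, 1)       # equals m_1 G_0
target_2 = window_target(r_blocks, [[1], [1]], G_blocks, 2)  # equals m_2 G_0
```

The returned vector is the word that is handed to the generic decoder for the code generated by $\tilde{\mathbf{G}}^{\gamma}_{0}$ in Step $j$ (b).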

For cryptographic purposes there exists only one error $\mathbf{e}(z)$ with at most a certain Hamming weight such that $\mathbf{r}(z)$ can be written as codeword plus error. However, we might not find a unique solution $\tilde{\mathbf{e}}_{j}$ in each block. So at each step, we produce a list of possible error vectors $[\tilde{\mathbf{e}}_{j,1},\ldots,\tilde{\mathbf{e}}_{j,n_{j}}]$ where $\tilde{\mathbf{e}}_{j,\rho}\in\mathbb{F}_{q}^{N}$ for $\rho\in\{1,\ldots,n_{j}\}$ are the errors that we recovered using ISD in Step $j$ . We also get from the list of possible errors a list of possible messages $[\tilde{\mathbf{m}}_{j,1},\ldots,\tilde{\mathbf{m}}_{j,n_{j}}]$ . This gives us a tree, and we aim to find a branch that goes to the bottom of the tree. To find such a branch, we use a depth-first algorithm. We will first describe this procedure and then justify the choice of a depth-first algorithm over a breadth-first algorithm. As a generic decoding method, we use information set decoding, but in principle, any generic decoding method can be used.

Note further that one can also use overlapping parts of the received word for decoding and use a consistency check on the recovered messages to limit the number of possibilities in the search. In our experiments, this did not give large speedups, but it might be useful for other codes.

4.2 Using ISD for Convolutional Codes

We want to use information set decoding (ISD) as a generic method of decoding. Let $t$ be the expected weight of an error block $\tilde{\mathbf{e}}_{i}$, rounded up to the next integer. While we expect $t$ errors in each block, the probability that every block has at most $t$ errors can be very small. So we introduce a variable $\varepsilon$ and allow up to $t+\varepsilon$ errors in each block. Note that $\varepsilon$ can be chosen so that an arbitrarily high percentage of the errors satisfy this condition, but increasing $\varepsilon$ increases the number of solutions we obtain at each step and thus the computation time. We will discuss how to choose $\varepsilon$ in Section 4.3.1.

Note that the first solution we find might not be the correct one, or, if the algorithm is currently running on a “wrong” branch, no solution may be found. So aborting the ISD search as soon as an error is found is not an option. Our solution to this issue is to simply run the algorithm for a fixed number of iterations and collect all solutions. The number of iterations can be chosen such that it is almost guaranteed that the correct error will be in the list of outputs at every step, as we will explain later. Thus, if no solutions are found, we expect to be on a wrong branch.

4.2.1 The Choice of a Depth-First Algorithm

Depth-first algorithms, in general, have lower memory usage than breadth-first algorithms. However, our main reason for choosing a depth-first algorithm is that for our purposes, we expect a depth-first algorithm to run faster than a breadth-first algorithm. The reasons for this we shall explain in this subsection.

Depth-first search works as follows. We start with a tree and always move down the left-most possible branch. Once we cannot move down any further, we backtrack and then move down the left-most branch we have not explored yet. Note that we do not have to store what we have already explored; it suffices to store, in an appropriate way, which branches remain to be explored.

In Figure 1, we labeled the nodes in the tree according to the sequence in which we encounter them in a depth-first search. To be precise, we go from $1$ to $2$, $3$, $4$. Then we backtrack to $3$, from which we go to $5$. After this we backtrack all the way to $1$, from where we go to $6$ and traverse the remaining graph.

Figure 1: Depth-First Search.
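The traversal can be sketched in a few lines of Python. This is a generic illustration, not the implementation used for the experiments: `candidates` is a hypothetical callback standing in for the ISD search on the next window, and the small dictionary `tree` is made-up test data.

```python
def depth_first_decode(candidates, depth):
    """Search for a branch of length `depth` in a lazily generated tree.

    candidates(path) returns the ordered list of possible next blocks given
    the blocks chosen so far; an empty list means the branch is a dead end.
    """
    if depth == 0:
        return []
    path = []
    # One iterator per level: only the untried alternatives are kept.
    stack = [iter(candidates(path))]
    while stack:
        child = next(stack[-1], None)
        if child is None:          # level exhausted, so backtrack
            stack.pop()
            if path:
                path.pop()
            continue
        path.append(child)
        if len(path) == depth:
            return list(path)
        stack.append(iter(candidates(path)))
    return None


# Toy tree: the branch through "a" dies out, the branch through "b" succeeds.
tree = {(): ["a", "b"], ("a",): ["c"], ("a", "c"): [],
        ("b",): ["d"], ("b", "d"): ["e"]}
visited = []


def toy_candidates(path):
    visited.append(tuple(path))    # record the order in which nodes are expanded
    return tree.get(tuple(path), [])


result = depth_first_decode(toy_candidates, depth=3)
```

Keeping one iterator per level is one way to store only the branches that still have to be explored; a recursive formulation would work equally well for shallow trees.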

We only worked with the most basic instance of ISD, the Prange algorithm. Note that, after a choice of an information set $I$ , Prange succeeds in recovering an error if the support of the error is contained in the complement of $I$ . This is more likely to happen for errors of smaller weight, so errors of smaller weight have a higher probability of getting found first by the algorithm than errors of higher weight.

We allow up to $t+\varepsilon$ errors in each window. However, in most cases, the correct error vector will have a smaller weight. This means that the correct error has a tendency to appear at one of the first positions of the output lists of the ISD search at each step. So in most steps, the depth-first algorithm will immediately proceed with the correct error.

Furthermore, we heuristically expect a kind of avalanche effect or diffusion, meaning that if we have several wrong messages, the error will spread and there will be no codewords close to the received word. Hence the algorithm should not go too deep in the depth-first search for wrong errors and messages.

Our observations from the experiments are in line with these expectations.

4.2.2 Low Weight Codewords

Let $\mathbf{G}(z)=\sum\mathbf{G}_{i}z^{i}\in\mathbb{F}_{q}[z]^{k\times n}$ and $\mathbf{H}(z)=\sum\mathbf{H}_{i}z^{i}\in\mathbb{F}_{q}[z]^{(n-k)\times n}$, with $\mathbf{H}_{0}$ full rank, be a generator matrix and a parity-check matrix, respectively, of the same non-catastrophic convolutional code $\mathcal{C}$. Note that each non-catastrophic convolutional code possesses a left-prime parity-check matrix $\mathbf{H}(z)$, and hence, in particular, a parity-check matrix $\mathbf{H}(z)$ such that $\mathbf{H}_{0}$ is full rank. Then, for $\gamma\in{\mathbb{Z}_{\geq 0}}$, it is an easy consequence of Lemma 3 that $\tilde{\mathbf{H}}^{\gamma}_{0}$ is a parity-check matrix for the block code with generator matrix $\tilde{\mathbf{G}}^{\gamma}_{0}$. Note that the block code with generator matrix $\tilde{\mathbf{G}}^{\gamma}_{0}$ (like each block code) always possesses a parity-check matrix, even if the corresponding convolutional code is catastrophic. Moreover, it is important to note that, due to the restriction $\mathbf{c}_{0}\neq 0$, respectively $\mathbf{m}_{0}\neq 0$, in the definition of column distances, the distance of the block code with generator matrix $\tilde{\mathbf{G}}^{\gamma}_{0}$ is upper bounded by $d_{\gamma}^{c}$ but can be much smaller. While a convolutional code $\mathcal{C}$ with generator matrix $\mathbf{G}(z)$ may have a large free distance, the linear code generated by the matrix

\tilde{\mathbf{G}}^{\gamma}_{0}=\begin{pmatrix}\mathbf{G}_{0}&\mathbf{G}_{1}&\cdots&\mathbf{G}_{\gamma}\\ &\mathbf{G}_{0}&\ddots&\vdots\\ &&\ddots&\mathbf{G}_{1}\\ &&&\mathbf{G}_{0}\end{pmatrix}

can have codewords of small weight. Note that the code generated by $\tilde{\mathbf{G}}^{\gamma}_{0}$ contains codewords of the form $\begin{pmatrix}\mathbf{0}&\cdots&\mathbf{0}&\mathbf{c}\end{pmatrix},$ where $\mathbf{c}$ lies in $\text{rowspan}(\mathbf{G}_{0})$ . Note further that if $\mathcal{C}$ is delay-free (that is, $\mathbf{G}_{0}$ has full rank), every non-zero codeword of $\text{rowspan}(\tilde{\mathbf{G}}^{\gamma}_{0})$ is of the form

\begin{pmatrix}\mathbf{0}&\cdots&\mathbf{0}&\mathbf{c}&\mathbf{r}_{1}&\cdots&\mathbf{r}_{i}\end{pmatrix}

for a non-zero $\mathbf{c}\in\text{rowspan}(\mathbf{G}_{0})$ . So the minimum distance of the code generated by $\tilde{\mathbf{G}}^{\gamma}_{0}$ equals the one of the code generated by $\mathbf{G}_{0}$ .

More precisely, if $\mathbf{m}=(m_{0},\ldots,m_{\gamma})$ is a message and $i\in\{0,\ldots,\gamma\}$ is minimal such that $m_{i}\neq 0$, then the codeword $\mathbf{c}=(c_{0},\ldots,c_{\gamma})=\mathbf{m}\tilde{\mathbf{G}}^{\gamma}_{0}$ satisfies $c_{0}=\cdots=c_{i-1}=0$ and $c_{i}\neq 0$, and its weight is lower bounded by $d^{c}_{\gamma-i}(\mathcal{C})$.
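The shape of these low-weight codewords can be checked on a toy example. The following is a minimal sketch, not the authors' implementation: the helper names `sliding_generator` and `encode` are hypothetical, the field is assumed to be a prime field so that arithmetic is plain reduction modulo $q$, and the number of block rows is passed explicitly as `blocks`.

```python
def sliding_generator(coeffs, blocks):
    """Block upper-triangular Toeplitz matrix with `blocks` block rows.

    coeffs[i] is G_i, given as a list of k rows of length n.
    Coefficients G_i that are not supplied are treated as zero.
    """
    k, n = len(coeffs[0]), len(coeffs[0][0])
    rows = []
    for a in range(blocks):                  # block row index
        for r in range(k):
            row = []
            for b in range(blocks):          # block column index
                d = b - a                    # G_d sits at block position (a, b)
                if 0 <= d < len(coeffs):
                    row.extend(coeffs[d][r])
                else:
                    row.extend([0] * n)
            rows.append(row)
    return rows


def encode(message, matrix, q):
    """Compute message * matrix over the prime field F_q (assumption: q prime)."""
    return [sum(m * row[j] for m, row in zip(message, matrix)) % q
            for j in range(len(matrix[0]))]


# Toy (2,1) binary code with G(z) = (1, 1 + z), i.e. G_0 = (1 1), G_1 = (0 1).
G = sliding_generator([[[1, 1]], [[0, 1]]], blocks=3)
# A message supported on the last block only yields a codeword (0, ..., 0, c).
tail_word = encode([0, 0, 1], G, q=2)
```

In this toy case `tail_word` is zero outside the last block, and its last block is a row of $\mathbf{G}_{0}$, which is the situation described above.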

4.2.3 Undetectable Errors

Since we are working with Prange’s algorithm, errors cannot be detected if the support of a codeword is contained in the support of the error. This is formalized in the following proposition, which states that an information set $I$ of a linear code $C$ over $\mathbb{F}_{q}$ must intersect the support of each non-zero codeword.

Proposition 19.

Let $I$ be an information set for an $[N,K]$-code $C$ and $\mathbf{c}\in C\setminus\{0\}$. Then $\mathrm{supp}(\mathbf{c})\cap I\neq\emptyset$.

Proof.

Let $\mathbf{c}$ be in $C\setminus\{0\}$ and $I\subset\{1,\ldots,N\}$ a set of size $K$ that has trivial intersection with the support of $\mathbf{c}$ . Pick a basis of $C$ that contains $\mathbf{c}$ and arrange them in a generator matrix $\mathbf{G}$ such that the last row is $\mathbf{c}$ . The last row of the matrix $\mathbf{G}_{I}$ , consisting of the columns indexed by $I$ , is a zero row, so $\mathbf{G}_{I}$ is not invertible. This means that $I$ is not an information set. ∎

Note that in these cases, the error $\mathbf{e}$ can be written as $\mathbf{e}=\mathbf{e^{\prime}}+\mathbf{c}$ , where $\mathbf{c}$ is a codeword whose support is contained in the support of $\mathbf{e}$ and $\mathbf{e^{\prime}}$ is an error which can be recovered by the Prange algorithm. So we can ensure that $\mathbf{e}$ is in our list of outputs by pre-computing a list of low-weight codewords (for our experiments codewords of weight up to 2 were sufficient for most cases) and then, for all possible errors $\mathbf{e^{\prime}}$ in the output of the Prange algorithm and low-weight codewords $\mathbf{c}$ , appending $\mathbf{e^{\prime}}+\mathbf{c}$ at the end of the list of outputs if $\operatorname{wt}(\mathbf{e^{\prime}}+\mathbf{c})\leq t+\varepsilon$ . It is important that the new solutions get appended at the end of the list when working in a setting where it is unlikely to happen that the support of an error contains the support of a codeword.
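The post-processing step described above can be sketched as follows. This is an illustrative sketch under stated assumptions: vectors are tuples over a prime field $\mathbb{F}_{q}$, the function name `extend_with_low_weight` is hypothetical, and skipping duplicates is an addition made here for tidiness rather than something the text prescribes.

```python
def weight(v):
    """Hamming weight of a vector."""
    return sum(1 for x in v if x != 0)


def extend_with_low_weight(solutions, low_weight_codewords, bound, q):
    """Append e' + c for all ISD outputs e' and precomputed low-weight codewords c.

    New candidates are placed at the END of the list, so that the original
    ISD outputs are still tried first in the depth-first search.
    """
    extended = [tuple(e) for e in solutions]
    seen = set(extended)
    for e_prime in solutions:
        for c in low_weight_codewords:
            candidate = tuple((a + b) % q for a, b in zip(e_prime, c))
            if weight(candidate) <= bound and candidate not in seen:
                seen.add(candidate)
                extended.append(candidate)
    return extended


# Toy example over F_2 with weight bound t + eps = 1.
outputs = extend_with_low_weight([(1, 0, 0, 0)],
                                 [(1, 1, 0, 0), (0, 0, 1, 1)],
                                 bound=1, q=2)
```

Here only the first codeword produces a candidate within the weight bound, so one new vector is appended after the original output.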

4.2.4 The Choice of $\gamma$

Recall that $\gamma$ is the number of distinct $\mathbf{G}_{i}$’s in our matrix $\mathbf{G}_{0}^{\gamma}$ which we use for iterative decoding (see the beginning of Section 4). The choice of $\gamma$ can heavily impact the decoding complexity: choosing a large $\gamma$ increases the ISD cost in each step, while choosing a small $\gamma$ increases the number of steps, which could substantially increase the number of nodes the algorithm traverses in the depth-first search. Thus, $\gamma$ needs to be carefully chosen depending on the parameters of the cryptosystem. In fact, if we choose $\gamma$ to be the maximum of the memory of the generator matrix and the degree of the error vector $\mathbf{e}(z)$, then we are in the setting of decoding a single block code. On the other hand, a small $\gamma$ might also not be suitable for decoding. Consider the case

\mathbf{G}(z)=\begin{pmatrix}g_{1}(z)&g_{2}(z)\end{pmatrix},

where $g_{1}(z),g_{2}(z)\in\mathbb{F}_{2}[z]$ are of large degree. If we choose $\gamma=1$, that is, $\mathbf{G}_{0}^{\gamma}$ consists of the constant terms of $g_{1}(z)$ and $g_{2}(z)$, then only a small percentage of errors will have at most $1$ error in each block. But if we decode with at most $2$ errors per block, then we get $4$ solutions in each step, at which point we are essentially just brute-forcing all solutions.

We have not found a rigorous way of choosing $\gamma$ optimally and have resorted to experimentation.

4.3 ISD for a Fixed Amount of Errors Uniformly Distributed Over The Error Vector

Our goal in this section is to reduce the information-set decoding of the convolutional code to the ISD of equations of the form

\mathbf{m}_{i}\tilde{\mathbf{G}}^{\gamma}_{0}+\tilde{\mathbf{e}}_{i},

or equivalently

\tilde{\mathbf{H}}_{0}\tilde{\mathbf{e}}^{T}_{i}=\mathbf{s}_{i}.

The analysis consists of several steps. We first want to compute the probability that blocks of the error contain at most $t+\varepsilon$ non-zero entries for a given $\varepsilon$ .

We will assume the following: $\mathcal{C}$ is a delay-free convolutional code, the error vector $\mathbf{e}$ of length $sN$ has a total weight of $t_{e}$ , with the errors uniformly distributed over its length.

4.3.1 Exceedance Probability for the Error Distributions

Let $t_{e}$ be the total error weight, $N$ the length of the code generated by $\tilde{\mathbf{G}}^{\gamma}_{0}$ , $s$ the number of blocks of size $N$ we need for decoding, $t=\lceil{t_{e}}/{s}\rceil$ , the (rounded) expected number of errors in each size $N$ window, and $\varepsilon$ the tolerance for the error weights.

To give the exact probability of an error having at most weight $t+\varepsilon$ in each window of size $N$ , we consider the polynomial

p(z)=1+(q-1)\binom{N}{1}z+(q-1)^{2}\binom{N}{2}z^{2}+\ldots+(q-1)^{t+\varepsilon}\binom{N}{t+\varepsilon}z^{t+\varepsilon}.

We compute

q(z)=p(z)^{s}

and recover the $t_{e}$-th coefficient of $q(z)$, which gives us the number of error vectors of weight $t_{e}$ with at most $t+\varepsilon$ errors in each block of size $N$. Dividing by the total number of weight $t_{e}$ error vectors, which is

(q-1)^{t_{e}}\binom{sN}{t_{e}},

yields the probability of having at most $t+\varepsilon$ errors in all blocks. One can reformulate this in terms of a multivariate hypergeometric distribution. More information about multivariate hypergeometric distributions can be found in [johnson1997discrete]. Note that in each block we get a hypergeometric distribution which we use for an estimate of the tails in the next section.
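The coefficient extraction above is straightforward to carry out with exact integer arithmetic. The following is a minimal sketch; the function name `block_weight_probability` and the parameter name `cap` (standing for $t+\varepsilon$) are choices made here, and the polynomial power is truncated at degree $t_{e}$ since higher coefficients are never needed.

```python
from fractions import Fraction
from math import comb


def block_weight_probability(q, N, s, t_e, cap):
    """Probability that a uniformly random weight-t_e vector of length s*N
    over F_q has weight at most `cap` in each of its s blocks of length N."""
    # Coefficients of p(z) = sum_{w <= cap} (q-1)^w * binom(N, w) * z^w.
    p = [(q - 1) ** w * comb(N, w) for w in range(cap + 1)]
    power = [1]                                   # p(z)^0
    for _ in range(s):
        nxt = [0] * min(len(power) + cap, t_e + 1)
        for i, a in enumerate(power):
            for w, b in enumerate(p):
                if i + w <= t_e:                  # truncate at degree t_e
                    nxt[i + w] += a * b
        power = nxt
    favourable = power[t_e] if t_e < len(power) else 0
    total = (q - 1) ** t_e * comb(s * N, t_e)
    return Fraction(favourable, total)


# Tiny sanity check: two blocks of length 2 over F_2, total weight 2,
# at most one error per block. Four of the six weight-2 vectors qualify.
small = block_weight_probability(q=2, N=2, s=2, t_e=2, cap=1)
```

The same function can be evaluated for larger parameter sets, for instance to reproduce a curve of the probability as a function of $\varepsilon$.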

4.3.2 Estimates, Hypergeometric Distribution

The hypergeometric distribution has an exponentially decaying tail; see [chvatal1979tail]. Let us consider the tail of a hypergeometric distribution given by

H(A,B,b,l)=\binom{B}{b}^{-1}\sum_{i=l}^{b}\binom{A}{i}\binom{B-A}{b-i}.

Define $\alpha$ through the equation

l=(A/B+\alpha)b.

where $\alpha$ should be greater than $0$ . One can show as in [chvatal1979tail] that we get for the tail

H(A,B,b,l)\leq e^{-2\alpha^{2}b}.

In the setting above we have $A=N,B=Ns,b=ts$ . Hence

A/B=1/s,

and

l=t+\alpha ts.

Thus, if $\varepsilon=\alpha ts$ , then we get that the probability of having at least $t+\varepsilon$ errors is at most

e^{-2\alpha^{2}ts}=e^{-2\alpha\varepsilon}.

For the weight of the block to exceed $t+\varepsilon$ , we set $l=t+\varepsilon+1$ , getting $\alpha=\frac{\varepsilon+1}{st}.$ Then the probability of the block having weight more than $t+\varepsilon$ is at most

e^{-2\alpha^{2}ts}=e^{-2\alpha(\varepsilon+1)}.

By the Union bound, we have:

\mathbb{P}\left[\max_{j\in\{0,\ldots,s-1\}}\operatorname{wt}(\tilde{\mathbf{e}}_{j})>t+\varepsilon\right]\leq sH(N,Ns,ts,t+\varepsilon+1)\leq se^{-2\alpha^{2}ts}=se^{-2\alpha(\varepsilon+1)},

where $\tilde{\mathbf{e}}_{j}$ is the $j$ -th error coefficient.
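To get a feeling for how tight the exponential bound is, one can compare it with the exact tail $H(A,B,b,l)$. The sketch below does this with exact binomials; the function names are hypothetical, and the numerical parameters in the example ($N=60$, $s=167$, $t=1$, $\varepsilon=3$) are chosen here purely for illustration.

```python
from fractions import Fraction
from math import comb, exp


def hypergeometric_tail(A, B, b, l):
    """Exact tail H(A, B, b, l): probability of at least l marked items
    when drawing b items without replacement from B items, A of them marked."""
    total = sum(comb(A, i) * comb(B - A, b - i) for i in range(l, b + 1))
    return Fraction(total, comb(B, b))


def tail_bound(A, B, b, l):
    """Bound exp(-2 * alpha^2 * b), where alpha is defined by l = (A/B + alpha) * b."""
    alpha = l / b - A / B
    if alpha <= 0:
        raise ValueError("the bound requires alpha > 0")
    return exp(-2 * alpha ** 2 * b)


# Illustrative parameters: A = N, B = N*s, b = t*s, l = t + eps + 1.
N, s, t, eps = 60, 167, 1, 3
exact = float(hypergeometric_tail(N, N * s, t * s, t + eps + 1))
bound = tail_bound(N, N * s, t * s, t + eps + 1)
```

For small $\alpha$ the bound can be far from the exact tail, in which case the exact computation is the more informative of the two.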

4.4 Estimates on The Number of Solutions When the Amount of Errors Exceeds the Unique Decoding Radius

Given a (random) full-rank matrix $\mathbf{H}\in\mathbb{F}_{q}^{(N-K)\times N}$, an error vector $\mathbf{e}\in\mathbb{F}_{q}^{N}$ of weight $t$ and $\mathbf{s}\in\mathbb{F}_{q}^{N-K}$, we want to compute the expected number of solutions of the equation

\mathbf{He}^{T}=\mathbf{s}

(15)

of weight at most $t+\varepsilon$ . We care about the conditional expectation

\mathbb{E}[\#\text{solutions with at most }t+\varepsilon\text{ errors}\mid\text{at least one such solution exists}]\approx 1+\mathbb{E}[\#\text{solutions with at most }t+\varepsilon\text{ errors}].

(16)

Note that every solution to Equation (15) lives in

S:=\mathbf{e}+\mathrm{ker}(\mathbf{H}),

which has $q^{K}$ elements. Notice that if $\mathbf{H}$ is “random”, then we expect this set to be uniformly distributed in $\mathbb{F}_{q}^{N}$. So we expect the number of vectors of weight $\tilde{t}$ in $S$ to be

\frac{(q-1)^{\tilde{t}}}{q^{N-K}}\binom{N}{\tilde{t}}.

Summing over all $\tilde{t}\leq t+\varepsilon$ yields the estimate for the expectation in (16).
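This estimate is a one-line sum. A minimal sketch, with a hypothetical function name and with example parameters ($q=2$, $N=60$, $K=36$, weight at most $4$) picked here for illustration:

```python
from fractions import Fraction
from math import comb


def expected_solutions(q, N, K, max_weight):
    """Heuristic expected number of vectors of weight at most max_weight
    in a uniformly distributed coset of size q^K inside F_q^N."""
    count = sum((q - 1) ** w * comb(N, w) for w in range(max_weight + 1))
    return Fraction(count, q ** (N - K))


# Example: a binary [60, 36] code and solutions of weight at most 4.
estimate = expected_solutions(q=2, N=60, K=36, max_weight=4)
```

A value well below $1$ indicates that spurious solutions should be rare under the uniformity heuristic, whereas larger values mean the depth-first search has to cope with several candidates per step.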

4.4.1 Choosing Maxiter and Increased ISD Cost for Allowing $t+\varepsilon$ Errors.

The work factor of the Prange algorithm for an $[N,K]$-code with an error vector of weight $t$ is given by

\mathrm{WF}_{t}=\frac{\binom{N}{t}}{\binom{N-K}{t}}.

Similarly, if we allow $t+\varepsilon$ errors with $\varepsilon\geq 1$ the work factor changes to

\mathrm{WF}_{t+\varepsilon}=\frac{\binom{N}{t+\varepsilon}}{\binom{N-K}{t+% \varepsilon}}.

The quotient of the two work factors simplifies to

\frac{\mathrm{WF}_{t+\varepsilon}}{\mathrm{WF}_{t}}=\prod_{j=0}^{\varepsilon-1}\frac{N-t-j}{N-K-t-j}\approx\left(\frac{N-t}{N-K-t}\right)^{\varepsilon}.

Note further that while $\mathrm{WF}_{t+\varepsilon}$ is the expected number of iterations to find a solution of weight $t+\varepsilon$, it is unlikely that we will find the correct error at each step with this number of iterations. To ensure that the correct error is in the list of outputs of our ISD algorithm, we run it for a fixed number of iterations, chosen large enough for this purpose. To make this choice, note that the chance of finding a fixed detectable error of weight $t+\varepsilon$ in one iteration of Prange’s algorithm is given by

\mathrm{WF}_{t+\varepsilon}^{-1}=\frac{\binom{N-K}{t+\varepsilon}}{\binom{N}{t+\varepsilon}}.

So the probability of finding a given error of weight $t+\varepsilon$ at all $s$ steps with $W$ ISD iterations at each step is given by

\left(1-\left(1-\frac{\binom{N-K}{t+\varepsilon}}{\binom{N}{t+\varepsilon}}\right)^{W}\right)^{s}.

Note that this is a lower bound for the success probability of recovering an error vector $\mathbf{e}=\begin{pmatrix}\tilde{\mathbf{e}}_{0}&\tilde{\mathbf{e}}_{1}&\ldots&\tilde{\mathbf{e}}_{s-1}\end{pmatrix}$ with $\mathrm{wt}(\tilde{\mathbf{e}}_{i})\leq t+\varepsilon$ for all $i=0,\ldots,s-1$, since the weight of an $\tilde{\mathbf{e}}_{i}$ can be smaller than $t+\varepsilon$, which increases the probability of it getting recovered.
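The formulas of this subsection can be evaluated directly when choosing the number $W$ of iterations. The following sketch uses hypothetical function names; the example call uses a binary $[60,36]$ code with block weight $4$, $W=430$ and $s=167$ as illustrative inputs.

```python
from fractions import Fraction
from math import comb


def work_factor(N, K, w):
    """Prange work factor binom(N, w) / binom(N - K, w) for error weight w."""
    return Fraction(comb(N, w), comb(N - K, w))


def work_factor_ratio(N, K, t, eps):
    """WF_{t+eps} / WF_t written as the product over j = 0, ..., eps - 1."""
    ratio = Fraction(1)
    for j in range(eps):
        ratio *= Fraction(N - t - j, N - K - t - j)
    return ratio


def success_probability(N, K, w, W, s):
    """Lower bound (1 - (1 - 1/WF_w)^W)^s for finding a fixed detectable
    weight-w error in all s steps with W iterations per step."""
    p_single = float(1 / work_factor(N, K, w))
    return (1 - (1 - p_single) ** W) ** s


# Illustrative evaluation for a [60, 36] code.
ratio = work_factor_ratio(60, 36, t=1, eps=3)
prob = success_probability(60, 36, w=4, W=430, s=167)
```

Scanning `success_probability` over $W$ gives a simple way to pick the smallest iteration count that reaches a desired overall success rate.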

4.4.2 Choice of $\varepsilon$

In this subsection we summarize the impact of the choice of $\varepsilon$. Increasing $\varepsilon$ makes it more likely that errors satisfy the weight condition on each block and thus allows us to recover more errors and messages. However, it also increases the work factor for ISD and the number of solutions found in each step, which can heavily increase the computational time.

All of these factors need to be carefully considered when choosing $\varepsilon$ .

Figure 2: Calculating a suitable value of $\varepsilon$ for [bolkema2017variations], Example 5.11.

For example, for the attack on [bolkema2017variations] that we will discuss in Section 5.1, we have the parameters $q=2$ , $t_{e}=140$ , $s=167$ , $K=36$ and $N=60.$ This gives us $t=1$ , and we get the plot in Figure 2 which helps us to choose the value of $\varepsilon$ . We chose $\varepsilon=3$ , as about $78\%$ of errors should have block-weight at most $4$ , as can be seen in Figure 2, and the work factor is still comparatively small as discussed in Section 4.4.1, and as can be seen in Figure 3. We use this value of $\varepsilon$ for our experiments, as described in Section 5.1.

Figure 3: $\mathrm{WF}_{t+\varepsilon}/\mathrm{WF}_{t}$ for Prange on the $[60,36]$-code, as we vary $\varepsilon$.

5 Experiments

In this section we describe how to attack two cryptosystems and what modifications of the procedures described above are necessary in order to attack them in practice. The systems we attack are from [almeidaBNS21] (note that this is not the authors’ current version) and [bolkema2017variations], Chapter 5.

The first challenge is that the codes constructed in both papers are not delay-free. This does not pose a major challenge though, since we can construct non-catastrophic convolutional codes that contain these codes; see Section 2. Then we use ISD with a fixed number of iterations with respect to these non-catastrophic codes.

A second problem that we encounter with both systems is the existence of low weight codewords. To account for this we find codewords of weight $1$ or $2$ with a brute-force search. Then we add these to all the solutions ISD found and keep only the results that have the weight within specified bounds. Note that ISD might find several solutions since we are outside of the unique decoding radius.

With these modifications we use a depth first search after each round of ISD as described below.

Here $L_{0},M_{0}$ are the outputs of the first stage of ISD, i.e., $L_{0}=[\tilde{\mathbf{e}}_{0,1},\ldots,\tilde{\mathbf{e}}_{0,n_{0}}]$ and $M_{0}=[\tilde{\mathbf{m}}_{0,1},\ldots,\tilde{\mathbf{m}}_{0,n_{0}}]$. Furthermore, $t_{e}$ is the total error weight and $\tilde{\mathbf{G}}^{\gamma}_{i}$ are the matrices we construct from $\mathbf{G}$ as above. Moreover, $\mathcal{C}_{low},\mathcal{M}_{low}$ are the low weight codewords and the corresponding messages, $\mathbf{m^{\prime}_{x}}$ is the message corresponding to the error $\mathbf{x}\in L_{j}$, and $\mathbf{m^{\prime}_{c}}$ is the message corresponding to $\mathbf{c}\in\mathcal{C}_{low}$.

Algorithm 1 Depth-first sequential ISD

1: Input: a generator matrix $\mathbf{G}(z)$ of a convolutional code $\mathcal{C}$, received word $\mathbf{r}(z)$.

2: Output: $\mathbf{e},\mathbf{m}$ such that $\mathbf{r}(z)=\mathbf{m}(z)\mathbf{G}(z)+\mathbf{e}(z)$.

3: $j\leftarrow 0$, $L\leftarrow\{L_{0}\}$, $M\leftarrow\{M_{0}\}$

4: $\mathbf{e}\leftarrow\underbrace{(0,0,0,0,0,\cdots,0,0,0)}_{s\text{ blocks of length }N}$, $\mathbf{m}\leftarrow\underbrace{(0,0,0,0,0,\cdots,0,0,0)}_{s\text{ blocks of length }K}$

5: $Bool\leftarrow\text{False}$

6: while $j\leq s$

7: if $Bool$ then

8: break

9: while $L_{j}\neq\varnothing$

10: if $\mathrm{wt}(\mathbf{e})+\mathrm{wt}(L_{j,0})>t_{e}$ then

11: $L_{j}\leftarrow L_{j}-\{L_{j,0}\}$, $M_{j}\leftarrow M_{j}-\{M_{j,0}\}$

12: continue

13: $\mathbf{e}_{j}\leftarrow L_{j,0}$, $\mathbf{m}_{j}\leftarrow M_{j,0}$

14: if $j=s-1\wedge\mathbf{r}(z)-\mathbf{e}(z)\in\mathcal{C}$ then

15: $Bool\leftarrow\text{True}$

16: break

17: else

18: $L_{j}\leftarrow L_{j}-\{L_{j,0}\}$, $M_{j}\leftarrow M_{j}-\{M_{j,0}\}$, $\mathbf{e}_{j}\leftarrow 0$, $\mathbf{m}_{j}\leftarrow 0$

19: $\mathbf{r}^{\prime}_{j+1}\leftarrow\mathbf{r}_{j+1}-\sum_{i=0}^{j}\mathbf{m}_{i}\tilde{\mathbf{G}}^{\gamma}_{j-i}$

20: $L_{j+1},M_{j+1}\leftarrow\operatorname{Prange}(\tilde{\mathbf{G}}^{\gamma}_{0},\mathbf{r}^{\prime}_{j+1},t+\varepsilon)$

21: if $L_{j+1}\neq\varnothing$ then

22: for $(\mathbf{x},\mathbf{c})\in L_{j+1}\times\mathcal{C}_{low}$

23: if $\mathrm{wt}(\mathbf{x}+\mathbf{c})\leq t+\varepsilon$ then

24: $L_{j+1}\leftarrow L_{j+1}\cup\{\mathbf{x}+\mathbf{c}\}$

25: $M_{j+1}\leftarrow M_{j+1}\cup\{\mathbf{m^{\prime}_{x}}+\mathbf{m^{\prime}_{c}}\}$

26: $j\leftarrow j+1$

27: else

28: $L_{j}\leftarrow L_{j}-\{L_{j,0}\}$, $M_{j}\leftarrow M_{j}-\{M_{j,0}\}$, $\mathbf{e}_{j}\leftarrow 0$, $\mathbf{m}_{j}\leftarrow 0$

29: $\mathbf{e}_{j}\leftarrow 0$, $\mathbf{m}_{j}\leftarrow 0$

30: $j\leftarrow j-1$

31: if $L_{j+1}=\varnothing$ then

32: $L_{j}\leftarrow L_{j}-\{L_{j,0}\}$, $M_{j}\leftarrow M_{j}-\{M_{j,0}\}$
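The control flow of such a depth-first search with weight pruning and backtracking can be illustrated by a compact recursive sketch. This is not a transcription of Algorithm 1 and not the authors' code: the callbacks `candidates` and `accept` are hypothetical stand-ins for one round of ISD (including the low-weight codeword extension) and for the final membership test $\mathbf{r}(z)-\mathbf{e}(z)\in\mathcal{C}$, respectively.

```python
def weight(block):
    """Hamming weight of one error block."""
    return sum(1 for x in block if x != 0)


def depth_first_decode(s, t_e, candidates, accept):
    """Search for s error blocks of total weight at most t_e.

    candidates(j, prefix): hypothetical callback returning the ordered list
        of candidate error blocks for step j, given the blocks chosen so far.
    accept(blocks): hypothetical final check on a complete list of s blocks.
    Returns the accepted list of blocks, or None if the search space is exhausted.
    """
    def visit(prefix, used):
        j = len(prefix)
        if j == s:
            return list(prefix) if accept(prefix) else None
        for block in candidates(j, prefix):
            w = weight(block)
            if used + w > t_e:        # prune: total error weight exceeded
                continue
            prefix.append(block)
            found = visit(prefix, used + w)
            prefix.pop()              # backtrack and try the next candidate
            if found is not None:
                return found
        return None

    return visit([], 0)


# Toy run: three blocks of length 2; the final check accepts one fixed error.
target = [(1, 0), (0, 0), (0, 1)]
options = [(0, 0), (1, 0), (0, 1)]
result = depth_first_decode(3, 2, lambda j, prefix: options,
                            lambda blocks: list(blocks) == target)
```

The recursion depth equals the number $s$ of blocks, which stays within Python's default recursion limit for a few hundred blocks; an explicit stack would be the natural replacement for longer inputs.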

Next we describe the experimental results we get for both of the schemes we attack. We ran our experiments mainly on SageMath 9.4 [sagemath]. Some calculations for the attack in Section 5.2 were done with Magma v2.27/7 [magma]. The code can be found under https://git.math.uzh.ch/abmazu/isd-for-convolutional-codes.

5.1 Attack on “A Variation Based on Spatially Coupled MDPC Codes”

For this paper [bolkema2017variations] we took the generator matrix in Example 5.11 for which the authors claim that it gives 80 bits security against generic decoding attacks. The authors propose a $(5,3)$ binary convolutional code. The generator matrix has a memory of $94$ . The error is of degree up to $1999$ and has a total weight of $140$ .

When we attacked the system, we took $\gamma=12$, so ISD is done with a $36\times 60$ generator matrix. The choice of $\gamma$ was motivated by the desire of having, on average, approximately one error in each decoding window. This seems to be a good compromise, as for lower $\gamma$’s the algorithm tends to get lost more often, and for higher $\gamma$’s the algorithm is slower. At each step, Prange was run 430 times, which gives a total success rate of more than $98\%$ according to the formula in Section 4.4.1. We chose $\varepsilon=3$ as discussed in Section 4.4.2. Several seeds for error generation were used, and the ones for which the error was outside the range $t+\varepsilon$ were discarded. From the probability calculations we expect to discard about $22\%$ of the seeds. In the experiment we discarded 7 out of 27 errors, which is about $26\%$. For the remaining ones we ran $20$ experiments, the results of which can be seen in Figure 4.

Figure 4: Running time of the experiments for the scheme proposed in [bolkema2017variations].

All the experiments correctly recovered the error in less than $10$ hours. This translates to a bit security of at most $47$ bits. Note that we think that our code could be optimized quite a bit and therefore this is an overestimate. Furthermore we also ran our experiment with another generator matrix with the key generation as described in [bolkema2017variations], Section 5.5, for the same parameters to verify our results. It also finished successfully.
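The conversion from running time to bits can be sketched as follows. This is an illustrative calculation under an assumption made here, namely that one elementary operation is counted per clock cycle on a single core; the clock rate of 2.25 GHz is the one quoted for the Rambo server in Section 5.2.2, and the function name is hypothetical.

```python
from math import log2


def bits_from_runtime(hours, clock_hz, ops_per_cycle=1.0):
    """log2 of the number of elementary operations performed in `hours`
    on one core, assuming `ops_per_cycle` operations per clock cycle."""
    return log2(hours * 3600 * clock_hz * ops_per_cycle)


# Ten hours on a 2.25 GHz core, one operation per cycle (assumption).
bits = bits_from_runtime(10, 2.25e9)
```

Changing `ops_per_cycle` shifts the result by its base-2 logarithm, so a factor of $16$ operations per cycle corresponds to $4$ additional bits.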

5.2 Attack on “Smaller Keys for Code-Based Cryptography: McEliece Cryptosystems with Convolutional Encoders” 2021

We implemented the scheme proposed in [almeidaBNS21] for the parameters for $128$ bit security as given in Example 17 of that paper. We used several generator matrices and errors to estimate the security of the scheme. For this scheme we know more about the errors, namely in their example for the $128$ bit security level the authors use an error with $50$ blocks where the total error weight is $133$ and the sum of the weights of $3$ consecutive blocks is $8$ . This tells us that the blocks have weights $a,b,3,a,b,3,\ldots,a,b$ where $a+b=5$ . Therefore we take $\varepsilon=3$ , which would also be a good choice based on the probability calculations. Of course in this case this $\varepsilon$ always works.

For the experiments, we took $\gamma=1$, so $\mathbf{G}_{0}^{\gamma}=\mathbf{G}_{0}$. This means that the block code generated by $\mathbf{G}_{0}^{\gamma}$ has length $N=n=62$ and dimension $K=k=30$. At each step, the ISD runs for $700$ iterations, ensuring a total success rate of more than $99\%$.

To give estimates for the running time, we “cheated” and checked at each step at which position of the output of the ISD the correct error is. Since the algorithm usually does not run further on a wrong branch than a single node, we expect that adding up the positions of the correct error in the output of the ISD gives us a relatively precise estimate of the total number of ISD iterations needed to recover the correct error. For the estimates which indicated that we could recover the error quickly we ran the actual experiments without cheating to verify our estimates. This was the case for seeds $1$ and $10$ . In both cases, the actual time required was below the estimate. The results can be seen in Figure 5.

Figure 5: Estimated bit security for [almeidaBNS21]. For seeds 1 and 10 we also gave the actual running time.

In total we ran 20 experiments, of which 16 successfully finished. In the other cases, the correct error was not found at some step. In these cases we got only partial recovery due to low weight codewords of weight $3$ and $4$ . We did not include the seeds corresponding to those experiments in the figure above.

5.2.1 Probability of Getting Lost

We will justify that our algorithm is very unlikely to proceed further than one layer on a wrong branch. Assume the algorithm is correct up to degree $j$ , and then, for a given wrong error $\tilde{\mathbf{e}}_{j+1}$ at degree $j+1$ , finds a solution $\tilde{\mathbf{e}}_{j+2}$ for degree $j+2$ as well. Let $\mathbf{e}_{j+1}$ and $\mathbf{e}_{j+2}$ be the correct solutions at the respective degrees. Define $\mathbf{c}_{j+1}:=\mathbf{e}_{j+1}-\tilde{\mathbf{e}}_{j+1}$ and $\mathbf{c}_{j+2}:=\mathbf{e}_{j+2}-\tilde{\mathbf{e}}_{j+2}$ . Then:

• $\mathbf{c}_{j+1}$ is non-zero,

• $\begin{pmatrix}\mathbf{c}_{j+1}&\mathbf{c}_{j+2}\end{pmatrix}\in\text{rowspan}\begin{pmatrix}\mathbf{G}_{0}&\mathbf{G}_{1}\\ \mathbf{0}&\mathbf{G}_{0}\end{pmatrix}$,

• $\operatorname{wt}(\tilde{\mathbf{e}}_{j+1})=\operatorname{wt}(\mathbf{e}_{j+1})$ and $\operatorname{wt}(\tilde{\mathbf{e}}_{j+2})=\operatorname{wt}(\mathbf{e}_{j+2})$.

Thus, let $\mathbf{e}\in\mathbb{F}_{q}^{N}$ be fixed of weight $t_{e}$ and $\mathbf{c}\in\mathbb{F}_{q}^{N}$ of weight $t_{c}$ . We will assume that $N\gg t_{e},t_{c}$ , which also holds in practice. We are interested in the probability of $\operatorname{wt}(\mathbf{e}+\mathbf{c})=\operatorname{wt}(\mathbf{e})$ . Let $z$ be the number of elements of $\mathbf{e}$ that become zero after adding $\mathbf{c}$ (the number of “cancellations”). Notice that to preserve weight we then need to have $z$ non-overlaps, i.e. $z$ elements of the support of $\mathbf{c}$ are outside of the support of $\mathbf{e}$ . Notice that if $\operatorname{wt}(\mathbf{e}+\mathbf{c})=\operatorname{wt}(\mathbf{e})$ , then $z\leq\lfloor\frac{t_{c}}{2}\rfloor$ . Further, it is also necessary that $z\geq t_{c}-t_{e}$ since if $t_{c}>t_{e}$ , we have at least $t_{c}-t_{e}$ entries of $\mathbf{c}$ outside of the support of $\mathbf{e}$ , so we also need to cancel at least $t_{c}-t_{e}$ positions of $\mathbf{e}$ . Now, for a fixed number of cancellations, we have a probability of

\frac{\binom{t_{e}}{z}\binom{N-t_{e}}{z}(q-1)^{z}\binom{t_{e}-z}{t_{c}-2z}(q-2)^{t_{c}-2z}}{\binom{N}{t_{c}}(q-1)^{t_{c}}}

that the weight stays the same. Indeed, we have $\binom{t_{e}}{z}$ choices for entries of $\mathbf{c}$ that cancel entries of $\mathbf{e}$ . For entries of $\mathbf{c}$ outside of the support of $\mathbf{e}$ , we choose $z$ from $N-t_{e}$ positions, and can have $q-1$ entries in each position, which gives us $\binom{N-t_{e}}{z}(q-1)^{z}$ possibilities. Finally, we need to choose $t_{c}-2z$ positions in the remaining support of $\mathbf{e}$ and have $q-2$ choices for the entries of $\mathbf{c}$ since the entries must be non-zero and also cannot be minus the corresponding entry of $\mathbf{e}$ , which is non-zero, giving us $\binom{t_{e}-z}{t_{c}-2z}(q-2)^{t_{c}-2z}$ possibilities. Let $E_{t_{e},t_{c}}$ be the conditional event that $\operatorname{wt}(\mathbf{e}+\mathbf{c})=\operatorname{wt}(\mathbf{e})$ for an $\mathbf{e}$ and $\mathbf{c}$ such that $\operatorname{wt}(\mathbf{e})=t_{e}$ and $\operatorname{wt}(\mathbf{c})=t_{c}$ . Summing over all possible number of cancellations $z$ , we get a probability

\mathbb{P}(E_{t_{e},t_{c}})=\sum_{z=0}^{\lfloor\frac{t_{c}}{2}\rfloor}\frac{\binom{t_{e}}{z}\binom{N-t_{e}}{z}\binom{t_{e}-z}{t_{c}-2z}(q-2)^{t_{c}-2z}}{\binom{N}{t_{c}}(q-1)^{t_{c}-z}}.

We sum $z$ from $0$, as the summands for $z<t_{c}-t_{e}$ (when $t_{c}>t_{e}$) are zero anyway.

Thus, given a vector $\begin{pmatrix}\mathbf{c}_{j+1}&\mathbf{c}_{j+2}\end{pmatrix}\in\text{rowspan}\begin{pmatrix}\mathbf{G}_{0}&\mathbf{G}_{1}\\ \mathbf{0}&\mathbf{G}_{0}\end{pmatrix}$, where $\operatorname{wt}(\mathbf{c}_{j+1})=t_{{c}_{j+1}}$ and $\operatorname{wt}(\mathbf{c}_{j+2})=t_{{c}_{j+2}}$, and an error vector $\begin{pmatrix}\mathbf{e}_{j+1}&\mathbf{e}_{j+2}\end{pmatrix}$, where $\operatorname{wt}(\mathbf{e}_{j+1})=t_{{e}_{j+1}}$ and $\operatorname{wt}(\mathbf{e}_{j+2})=t_{{e}_{j+2}}$, we get that the probability of $\operatorname{wt}(\mathbf{e}_{j+1}+\mathbf{c}_{j+1})=\operatorname{wt}(\mathbf{e}_{j+1})$ and $\operatorname{wt}(\mathbf{e}_{j+2}+\mathbf{c}_{j+2})=\operatorname{wt}(\mathbf{e}_{j+2})$ is

\mathbb{P}(E_{t_{e_{j+1}},t_{c_{j+1}}})\cdot\mathbb{P}(E_{t_{e_{j+2}},t_{c_{j+2}}}).

In Table 1, we have listed $\mathbb{P}(E_{t_{e},t_{c}})$ for various values of $t_{e}$ and $t_{c}$ for the parameters in Example 17 of [almeidaBNS21].

$\begin{array}{c|ccccc}t_{c}\downarrow\;\backslash\;t_{e}\rightarrow&1&2&3&4&5\\ \hline 1&1.58\times 10^{-2}&3.17\times 10^{-2}&4.76\times 10^{-2}&6.35\times 10^{-2}&7.94\times 10^{-2}\\ 2&5.12\times 10^{-4}&1.52\times 10^{-3}&3.02\times 10^{-3}&5.02\times 10^{-3}&7.51\times 10^{-3}\\ 3&0&4.96\times 10^{-5}&1.71\times 10^{-4}&3.88\times 10^{-4}&7.23\times 10^{-4}\end{array}$

Table 1: $\mathbb{P}(E_{t_{e},t_{c}})$ for some values of $t_{e}$ and $t_{c}$.
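The probability $\mathbb{P}(E_{t_{e},t_{c}})$ can be evaluated exactly from the counting argument above. The sketch below sums the counts over the number $z$ of cancellations and divides by the number of vectors of weight $t_{c}$. The function name is hypothetical. In the example, $N=62$ is the block length from the text, whereas $q=64$ is an assumption made here because it reproduces the leading entries of the table.

```python
from fractions import Fraction
from math import comb


def same_weight_probability(N, q, t_e, t_c):
    """Probability that wt(e + c) = wt(e) for a fixed e of weight t_e and a
    uniformly random c of weight t_c in F_q^N."""
    count = 0
    for z in range(t_c // 2 + 1):        # z = number of cancellations
        if z > t_e:                      # cannot cancel more entries than e has
            break
        rest = t_c - 2 * z               # overlaps that do not cancel
        count += (comb(t_e, z)                        # cancelled positions of e
                  * comb(N - t_e, z) * (q - 1) ** z   # new positions outside supp(e)
                  * comb(t_e - z, rest) * (q - 2) ** rest)
    return Fraction(count, comb(N, t_c) * (q - 1) ** t_c)


# Example with N = 62 and the assumed field size q = 64.
p_11 = same_weight_probability(62, 64, t_e=1, t_c=1)
p_12 = same_weight_probability(62, 64, t_e=1, t_c=2)
```

Multiplying two such values, one for each of the two consecutive blocks, gives the product probability stated before the table.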

5.2.2 Technical Remarks

Each of our experiments was run on one of the UZH I-Math servers Rambo or Olive. Rambo has one AMD EPYC™ 7742 CPU at 2.25 GHz, while Olive has one AMD EPYC™ 7502P CPU at 2.5 GHz. Both are AMD EPYC™ 7002 Series CPUs, whose specifications can be found in [AMD]. Each instance used one core, as we did not parallelise our algorithm. Note that the theoretical number of FLOPs per second can exceed the base clock of the CPU, which would raise our estimates by a few bits; we have not used this in our calculations, as the maximum claimed number of FLOPs per cycle is not always achieved. Some sources, such as [NASAEPYC], claim a maximum of 16 FLOPs per cycle per core, which would add 4 bits to our calculations.

Also note that, although we have not implemented our algorithm for multiple cores, it is well-suited for parallelisation, as for the $(j+1)$-th round of ISD we only need the values of $e,m$ up to the $j$-th block. That is to say, each round is only influenced by the nodes above it, not the nodes to the side. Thus, if we had $n_{c}$ cores, each core could traverse the tree with one of the initial choices for $e_{0}$. If any core reaches the end of the tree, we could stop there, and if any core goes all the way back to the starting node, having not found the solution in that branch, it could aid any core still actively traversing the tree. This could hypothetically cut down the running time by a factor of at most $n_{c}$. We expect parallelisation to be more effective for the attack on [bolkema2017variations] than for the attack on [almeidaBNS21], as in the latter our algorithm does not seem to get lost, as discussed in Section 5.2.

6 Acknowledgements

The authors would like to thank Daniela Portillo del Valle and Joachim Rosenthal for fruitful discussions. The authors would also like to thank Diego Napp and Miguel Beltrá Vidal for answering our questions about their system. Finally, the authors would like to thank Violetta Weger and Anna-Lena Horlemann for their encouragement and Carsten Rose and the IT-team at I-Math UZH for their practical help and infinite patience while we tormented the math servers with our experiments.

This work has been supported in part by armasuisse Science and Technology (Project number: CYD-C-2020010), by the Swiss National Science Foundation under SNSF grant number 212865, and by the German research foundation, project number 513811367.

\printbibliography
