Niklas Gassner, Julia Lieb, Abhinaba Mazumder, and Michael Schaller
Abstract
In this paper, we present a framework for generic decoding of convolutional codes, which allows us to do cryptanalysis of code-based systems that use convolutional codes. We then apply this framework to information set decoding, study success probabilities and give tools to choose variables. Finally, we use this to attack two cryptosystems based on convolutional codes. In the case of [bolkema2017variations], our code recovered about 74% of errors in less than 10 hours each, and in the case of [almeidaBNS21], we give experimental evidence that 80% of the errors can be recovered in times corresponding to about 60 bits of operational security, with some instances being significantly lower.
1 Introduction
Current cryptographic systems rely on the hardness of integer factorisation or the discrete logarithm problem. These problems can be solved efficiently with Shor’s algorithm on a quantum computer [shor].
This has led to new interest in post-quantum cryptography, part of which is cryptography based on linear codes. The first code-based cryptosystem was proposed in 1978 by McEliece [mceliece] and uses binary Goppa codes. The scheme remains fundamentally unbroken, and the NIST submission Classic McEliece [NISTMcEliece] is based on it.
Round 4 of the currently ongoing post-quantum competition of the National Institute of Standards and Technology (NIST) features three code-based public-key submissions, the already mentioned Classic McEliece, BIKE [NISTBike], and HQC [NISTHQC].
There have been proposals not featured in the NIST post-quantum competition based on convolutional codes, such as [londahl2012new] (attacked in [londahlattack]), [almeidaBNS23], or [moufek2018new]. The rationale behind this is that the key size is linear in the memory of the convolutional code, while security levels are reliant on the size of the sliding generator or parity-check matrix, leading to an expected exponential increase in time for generic decoding methods and thus, increased security.
For example, in the Viterbi algorithm [viterbi] the decoding complexity is exponential in the degree.
Additionally, for the receiver of the ciphertext, decoding a convolutional code can be done sequentially.
In this paper, we provide a framework for generic decoding of convolutional codes (which are not tail-biting) which is similar to sliding window decoding and apply it to information-set decoding.
The framework resembles the idea of sequential decoding, which was introduced in [wozencraft].
We study success probabilities of several aspects of the algorithm and give tools to choose parameters in the framework. Finally, we use it to attack the system proposed in [bolkema2017variations] and a set of parameters of the system proposed in [almeidaBNS21]. Note that there exists an updated version [almeidaBNS23] with tail-biting convolutional codes, where our attack does not apply.
The paper is organised as follows.
In Section 2, we give an overview of linear codes and especially convolutional codes, where we introduce all the notions and notations necessary for understanding our work.
In Section 3, we cover the basics of information-set decoding, a class of generic decoding methods for linear codes, where we mainly talk about Prange’s algorithm.
The remaining sections cover our own contribution.
In Section 4, we discuss information-set decoding for convolutional codes. We first give a general framework for generic decoding of convolutional codes that reduces the problem to decoding a smaller block code multiple times and then discuss how we apply it to information-set decoding.
We justify our choice of a depth-first algorithm, then discuss the issues posed by low weight codewords and how we adapt the algorithm to address these issues. Many of the components of our algorithm are probability based, so we provide tools that allow us to compute success probability for given parameters and thus help us to choose parameters.
Finally, in Section 5 we discuss our implementation of the attack and the results of attacking the cryptosystem proposed in [bolkema2017variations] and a parameter set of the cryptosystem proposed in [almeidaBNS21].
Since computation times were in most cases infeasible for the latter, we only provide an estimate of the computation time necessary for recovering errors for most random seeds.
We also ran the algorithm in full for two cases where the estimate of the computation time was low enough, and managed to recover the correct error in both cases.
1.1 Notation and Conventions
For the convenience of the reader, we summarize some notation that we will use throughout the paper.
–
the finite field with elements,
–
a convolutional code, where is its length and its rank as an -submodule of ,
–
a block code of length and dimension ,
–
matrices and vectors are denoted with bold capital respectively small letters (e.g. and ),
–
and for a generator matrix respectively parity-check matrix of a convolutional code ,
–
an information set.
2 Basics of Linear Codes and Convolutional Codes
In this section we present definitions and results for linear block and convolutional codes that will be important in later sections of this paper.
Definition 1.
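Let $k, n \in \mathbb{N}$ with $k \leq n$. A linear $[n,k]$-block code $\mathcal{C}$ is a $k$-dimensional subspace of $\mathbb{F}_q^n$.
Hence, there is a full rank matrix $\mathbf{G} \in \mathbb{F}_q^{k \times n}$ such that
\[ \mathcal{C} = \{ \mathbf{u}\mathbf{G} : \mathbf{u} \in \mathbb{F}_q^k \}. \]
$\mathbf{G}$ is called generator matrix, $n$ length and $k/n$ rate of $\mathcal{C}$.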
While the generator matrix is used for the encoding of a message, for the decoding of a received word, one usually uses another matrix, called parity-check matrix, as defined in the following.
Definition 2.
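Let $\mathcal{C}$ be a linear $[n,k]$-block code.
A full rank matrix $\mathbf{H} \in \mathbb{F}_q^{(n-k) \times n}$ such that
\[ \mathcal{C} = \{ \mathbf{c} \in \mathbb{F}_q^n : \mathbf{H}\mathbf{c}^\top = \mathbf{0} \} \]
is called parity-check matrix of $\mathcal{C}$.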
Lemma 3.
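Let $\mathbf{G} \in \mathbb{F}_q^{k \times n}$ be a generator matrix of a linear block code $\mathcal{C}$. A matrix $\mathbf{H} \in \mathbb{F}_q^{(n-k) \times n}$ is a parity-check matrix for $\mathcal{C}$ if and only if $\mathbf{H}\mathbf{G}^\top = \mathbf{0}$ and $\mathbf{H}$ is full rank.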
Definition 4.
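The support of $\mathbf{v} = (v_1, \dots, v_n) \in \mathbb{F}_q^n$ is defined as
\[ \mathrm{supp}(\mathbf{v}) = \{ i \in \{1, \dots, n\} : v_i \neq 0 \}. \]
The (Hamming) weight of $\mathbf{v}$ is defined as $\mathrm{wt}(\mathbf{v}) = |\mathrm{supp}(\mathbf{v})|$, i.e. as the number of nonzero components of the vector $\mathbf{v}$.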
Definition 5.
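The minimum distance of a linear block code $\mathcal{C}$ is defined as
\[ d(\mathcal{C}) = \min\{ \mathrm{wt}(\mathbf{c}) : \mathbf{c} \in \mathcal{C} \setminus \{\mathbf{0}\} \}. \]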
In the following, we introduce the basics of convolutional codes, which can be understood as
a generalization of linear block codes.
More details about convolutional codes can e.g. be found in [bookchapter].
Definition 6.
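A convolutional code $\mathcal{C}$ of length $n$ is defined as an $\mathbb{F}_q[z]$-submodule of $\mathbb{F}_q[z]^n$. Let $k$ be the rank of $\mathcal{C}$. Then, $k/n$ is the rate of $\mathcal{C}$ and we call $\mathcal{C}$ an $(n,k)$ convolutional code. There exists a polynomial matrix $\mathbf{G}(z) \in \mathbb{F}_q[z]^{k \times n}$ such that
\[ \mathcal{C} = \{ \mathbf{u}(z)\mathbf{G}(z) : \mathbf{u}(z) \in \mathbb{F}_q[z]^k \}. \]
$\mathbf{G}(z)$ is called generator matrix of $\mathcal{C}$. If we write $\mathbf{G}(z) = \sum_{i=0}^{\mu} \mathbf{G}_i z^i$ with $\mathbf{G}_\mu \neq \mathbf{0}$, then $\mu$ is called memory of $\mathbf{G}(z)$.
The maximal degree of the full size (i.e. $k \times k$) minors of $\mathbf{G}(z)$ is called degree $\delta$ of $\mathcal{C}$.
Note that a generator matrix is not unique and the memory depends on $\mathbf{G}(z)$; however, the degree of $\mathcal{C}$ does not depend on the choice of the generator matrix.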
Remark 7.
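Two full rank matrices $\mathbf{G}(z), \tilde{\mathbf{G}}(z) \in \mathbb{F}_q[z]^{k \times n}$ are generator matrices of the same code if and only if there exists a unimodular matrix $\mathbf{U}(z) \in \mathbb{F}_q[z]^{k \times k}$ such that
\[ \tilde{\mathbf{G}}(z) = \mathbf{U}(z)\mathbf{G}(z). \]
$\mathbf{G}(z)$ and $\tilde{\mathbf{G}}(z)$ are then called equivalent.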
In the next part of this section, we consider parity-check matrices for convolutional codes. In contrast to linear block codes, not every convolutional code admits a parity-check matrix.
Definition 8.
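Let $\mathcal{C}$ be an $(n,k)$ convolutional code. A full row rank polynomial matrix $\mathbf{H}(z) \in \mathbb{F}_q[z]^{(n-k) \times n}$ is called parity-check matrix of $\mathcal{C}$, if
\[ \mathcal{C} = \{ \mathbf{c}(z) \in \mathbb{F}_q[z]^n : \mathbf{H}(z)\mathbf{c}(z)^\top = \mathbf{0} \}. \]
If such a parity-check matrix exists for the code $\mathcal{C}$, then $\mathcal{C}$ is called non-catastrophic.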
In the following, we will describe how to check whether a convolutional code possesses a parity-check matrix and how to calculate it.
Definition 9.
For , is called left prime if in all factorisations , with and , the left factor is unimodular.
Definition 10.
Let with be full (row) rank. Then, there exists a unimodular matrix such that
(1)
where are monic for and for each . The matrix is called column Hermite form of .
Theorem 11.
[bookchapter]
Consider with . The following statements are equivalent:
1.
is left prime.
2.
The column Hermite form of is .
3.
There exists such that .
4.
can be completed to a unimodular matrix, i.e. there exists such that is unimodular.
5.
for all , where denotes the algebraic closure of the field .
Definition 12.
A polynomial matrix $\mathbf{G}(z) \in \mathbb{F}_q[z]^{k \times n}$ is said to be delay-free if $\mathbf{G}(0)$ is full row rank.
A convolutional code is called delay-free if its generator matrices are delay-free.
It is easy to see that if one generator matrix of a convolutional code is delay-free, then all its generator matrices are delay-free and hence it makes sense to speak of delay-free convolutional codes. Moreover, note that by Theorem 11 all non-catastrophic convolutional codes are delay-free.
Theorem 13.
[bookchapter]
Let $\mathcal{C}$ be an $(n,k)$ convolutional code. Then, $\mathcal{C}$ admits a parity-check matrix if and only if any of the generator matrices of $\mathcal{C}$ is left prime.
Corollary 14.
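Let $\mathbf{G}(z) \in \mathbb{F}_q[z]^{k \times n}$ be a left-prime generator matrix of a convolutional code $\mathcal{C}$. A matrix $\mathbf{H}(z) \in \mathbb{F}_q[z]^{(n-k) \times n}$ is a parity-check matrix for $\mathcal{C}$ if and only if $\mathbf{H}(z)\mathbf{G}(z)^\top = \mathbf{0}$ and $\mathbf{H}(z)$ is full rank.
Next, we describe how a parity-check matrix for a convolutional code can be calculated.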
First the column Hermite form of is computed. According to Theorem 11 there exists a parity-check matrix if and only if . If this is true, calculate such that . Then, the last rows of form a parity-check matrix for the convolutional code with generator matrix .
If $\mathbf{G}(z)$ is not left prime, the corresponding convolutional code $\mathcal{C}$ does not possess a parity-check matrix. However, it is still possible to find a left prime polynomial matrix that plays the role of a parity-check matrix for $\mathcal{C}$ in the sense that
Since is not left prime, one has
where
is the column Hermite form of and unimodular and is not the identity matrix. It follows that , where consists of the first rows of , i.e. is left prime. Hence, the convolutional code with generator matrix is non-catastrophic and has a parity-check matrix . Finally, one has
In the following we introduce distance measures for convolutional codes.
Definition 15.
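The weight of a polynomial vector $\mathbf{v}(z) = \sum_{i=0}^{r} \mathbf{v}_i z^i \in \mathbb{F}_q[z]^n$ is defined as
\[ \mathrm{wt}(\mathbf{v}(z)) = \sum_{i=0}^{r} \mathrm{wt}(\mathbf{v}_i). \]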
Definition 16.
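Let $\mathcal{C}$ be an $(n,k)$ convolutional code. The free distance of $\mathcal{C}$ is defined as
\[ d_{\mathrm{free}}(\mathcal{C}) = \min\{ \mathrm{wt}(\mathbf{c}(z)) : \mathbf{c}(z) \in \mathcal{C} \setminus \{\mathbf{0}\} \}. \]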
The free distance is a measure for the total number of errors a convolutional code can correct. However, we will carry out the decoding of a convolutional code by splitting the received word and codeword into parts, so-called windows, and decode one window after the other. To this end, we need to introduce so-called sliding matrices.
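Let
\[ \mathbf{G}(z) = \sum_{i=0}^{\mu} \mathbf{G}_i z^i \]
be a generator matrix and
\[ \mathbf{H}(z) = \sum_{i=0}^{\nu} \mathbf{H}_i z^i \]
be a parity-check matrix of an $(n,k)$ convolutional code, where $\mathbf{G}_i = \mathbf{0}$ if $i > \mu$ and $\mathbf{H}_i = \mathbf{0}$ if $i > \nu$. For $j \in \mathbb{N}_0$, consider the sliding generator matrix
\[ \mathbf{G}_j^c := \begin{pmatrix} \mathbf{G}_0 & \mathbf{G}_1 & \cdots & \mathbf{G}_j \\ & \mathbf{G}_0 & \cdots & \mathbf{G}_{j-1} \\ & & \ddots & \vdots \\ & & & \mathbf{G}_0 \end{pmatrix} \in \mathbb{F}_q^{(j+1)k \times (j+1)n} \]
and the sliding parity-check matrix
\[ \mathbf{H}_j^c := \begin{pmatrix} \mathbf{H}_0 & & & \\ \mathbf{H}_1 & \mathbf{H}_0 & & \\ \vdots & \vdots & \ddots & \\ \mathbf{H}_j & \mathbf{H}_{j-1} & \cdots & \mathbf{H}_0 \end{pmatrix} \in \mathbb{F}_q^{(j+1)(n-k) \times (j+1)n}. \]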
Definition 17.
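Let $\mathcal{C}$ be an $(n,k)$ convolutional code.
For $j \in \mathbb{N}_0$, the $j$-th column distance of $\mathcal{C}$ is defined as
\[ d_j^c(\mathcal{C}) := \min\Big\{ \mathrm{wt}(\mathbf{c}_0, \dots, \mathbf{c}_j) : \mathbf{c}(z) = \sum_{i} \mathbf{c}_i z^i \in \mathcal{C},\ \mathbf{c}_0 \neq \mathbf{0} \Big\}. \]
If $\mathcal{C}$ is delay-free with generator matrix $\mathbf{G}(z)$, it follows
\[ d_j^c(\mathcal{C}) = \min\{ \mathrm{wt}((\mathbf{u}_0, \dots, \mathbf{u}_j)\mathbf{G}_j^c) : \mathbf{u}_0 \neq \mathbf{0} \}. \]
If $\mathcal{C}$ is non-catastrophic with parity-check matrix $\mathbf{H}(z)$,
one obtains
\[ d_j^c(\mathcal{C}) = \min\{ \mathrm{wt}(\mathbf{v}) : \mathbf{v} = (\mathbf{v}_0, \dots, \mathbf{v}_j),\ \mathbf{H}_j^c \mathbf{v}^\top = \mathbf{0},\ \mathbf{v}_0 \neq \mathbf{0} \}. \]
One has that
\[ d_0^c(\mathcal{C}) \leq d_1^c(\mathcal{C}) \leq d_2^c(\mathcal{C}) \leq \cdots, \]
reflecting the fact that the larger we choose the decoding window, the more errors can be corrected inside this window.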
We now write the equation with as
(6)
Assume we decoded the messages up to a time instant , then we can use the following equation:
(13)
and rewrite it as
Denote the received word after transmission by .
Then,
can be recovered by decoding
in the block code with generator matrix .
This process can be iterated by sliding the decoding window by time steps to and in this way the whole message can be recovered with several decoding steps in the block code with generator matrix .
3 Basics of Information-Set Decoding
Information-set decoding (ISD) is a generic decoding technique for linear codes. In this section we introduce the simplest form of ISD, the Prange algorithm [Prange62]. Several improvements exist, such as [LeeB88], [Stern88], [dumer1991minimum], [Peters10], [BernsteinLP11], [MayMT11], and [BeckerJMM12].
The general idea is as follows.
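Given an erroneous codeword $\mathbf{r} = \mathbf{c} + \mathbf{e}$, we wish to find a set $I \subseteq \{1, \dots, n\}$ such that $I \cap \mathrm{supp}(\mathbf{e}) = \emptyset$ and $\mathbf{c}$ is uniquely determined by the positions indexed by $I$.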
If we have done so, we can easily recover the codeword with linear algebra, hence also the error.
In order to find such a set we try random information sets, do the procedure described above and check if the weight of the recovered error is within specified bounds.
We continue until we have found an error that is small enough.
Definition 18.
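Let $\mathcal{C}$ be an $[n,k]$ linear block code and $\mathbf{G}$ be a generator matrix of $\mathcal{C}$.
For any subset $I \subseteq \{1, \dots, n\}$ we define $\mathbf{G}_I$ to be the submatrix of $\mathbf{G}$ with columns indexed by $I$.
Then any subset $I$ of size $k$ such that $\mathbf{G}_I$ is invertible is called an information set.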
More formally, we do the following in the Prange algorithm.
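Let $t$ be an upper bound for the number of errors, $\mathbf{G}$ be a generator matrix of the code, $\mathbf{r}$ the received word, $\mathbf{G}_I$ the submatrix of $\mathbf{G}$ with columns indexed by $I$ and $\mathbf{r}_I$ the vector consisting of the entries of $\mathbf{r}$ indexed by $I$.
The algorithm is as follows.
(1) Pick a random information set $I$.
(2) Solve the equation $\mathbf{u}\mathbf{G}_I = \mathbf{r}_I$ for $\mathbf{u}$.
(3) Set $\mathbf{e} = \mathbf{r} - \mathbf{u}\mathbf{G}$.
(4) If $\mathrm{wt}(\mathbf{e}) \leq t$, output $\mathbf{e}$, else go back to step (1).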
If $t$ is within the unique decoding radius, the algorithm succeeds if an information set $I$ is found such that $I$ has no intersection with the support of the error $\mathbf{e}$.
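To make the steps above concrete, the following is a minimal Python sketch of the Prange algorithm, restricted to the binary field for simplicity. The function names `solve_left_gf2` and `prange`, the iteration cap `max_iter` and the representation of matrices as lists of rows are our own illustrative choices and are not taken from the implementation referenced in Section 5.

```python
import random


def solve_left_gf2(A, b):
    """Solve u * A = b over GF(2) for a square k x k matrix A (list of rows).

    Returns u as a list of bits, or None if A is singular.
    """
    k = len(A)
    # u * A = b is equivalent to A^T u^T = b^T; build that augmented system.
    M = [[A[i][j] for i in range(k)] + [b[j]] for j in range(k)]
    for col in range(k):
        pivot = next((r for r in range(col, k) if M[r][col]), None)
        if pivot is None:
            return None  # singular: the chosen set is not an information set
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col and M[r][col]:
                M[r] = [x ^ y for x, y in zip(M[r], M[col])]
    return [M[i][k] for i in range(k)]


def prange(G, r, t, max_iter=10_000, rng=random):
    """Return (u, e) with r = u*G + e and wt(e) <= t, or None if none is found."""
    k, n = len(G), len(G[0])
    for _ in range(max_iter):
        # Step (1): pick a random candidate information set I of size k.
        I = sorted(rng.sample(range(n), k))
        G_I = [[row[j] for j in I] for row in G]
        r_I = [r[j] for j in I]
        # Step (2): solve u * G_I = r_I (skip I if G_I is not invertible).
        u = solve_left_gf2(G_I, r_I)
        if u is None:
            continue
        # Step (3): e = r - u*G, which is r XOR u*G over GF(2).
        c = [sum(u[i] & G[i][j] for i in range(k)) % 2 for j in range(n)]
        e = [rj ^ cj for rj, cj in zip(r, c)]
        # Step (4): accept if the weight bound is met.
        if sum(e) <= t:
            return u, e
    return None
```

For the attack described later, one would not stop at the first hit but keep running for a fixed number of iterations and collect every solution found; the early return here mirrors only the basic algorithm of this section.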
Assuming that every subset of size $k$ is an information set (which is only true if the code is MDS, i.e., the minimum distance is $n-k+1$) and that there is a unique solution, we get for each time we do steps (1) and (2), a probability of $\binom{n-t}{k} / \binom{n}{k}$ that $I \cap \mathrm{supp}(\mathbf{e}) = \emptyset$.
The expected number of iterations, which we call the workfactor, is the reciprocal of this, denoted by
\[ \mathrm{WF} = \frac{\binom{n}{k}}{\binom{n-t}{k}}. \]
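As a small numerical aid, the success probability per iteration and the resulting workfactor can be evaluated as follows. This is only a sketch under the same idealised assumption as above (every subset of size k is an information set, error weight exactly t); the function names are ours.

```python
from math import comb, log2


def prange_success_probability(n, k, t):
    """Probability that a random size-k set avoids a fixed support of size t."""
    return comb(n - t, k) / comb(n, k)


def prange_work_factor(n, k, t):
    """Expected number of Prange iterations (reciprocal of the success probability)."""
    good = comb(n - t, k)
    if good == 0:
        return float("inf")  # no information set can avoid the error support
    return comb(n, k) / good


def prange_work_factor_bits(n, k, t):
    """Base-2 logarithm of the workfactor, counting iterations only."""
    return log2(prange_work_factor(n, k, t))
```

Note that this counts iterations only; the cost of the linear algebra inside each iteration is not included.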
4 ISD for Delay-Free Convolutional Codes
In this section, we study ISD for delay-free convolutional codes. For this, we rewrite a sliding generator matrix of as
(14)
where has size .
Note that when , we have , but can consist of several of the ’s, for example, for , we have
and
We will also write the error vector as , where is of size .
4.1 Generic Decoding of Convolutional Codes
Given the encrypted message , we aim to find and consequently .
As before, we turn a polynomial vector into . Let be the concatenation of several consecutive ’s. We get
The idea is to iteratively decode the convolutional code with the matrix as follows:
1.
Step 0: Use generic decoding to recover from and recover from with linear algebra.
2.
Step j:
(a)
Compute .
(b)
Recover and with generic decoding for the code generated by .
For cryptographic purposes there exists only one error with at most a certain Hamming weight such that can be written as codeword plus error.
However, we might not find a unique solution in each block.
So at each step, we produce a list of possible error vectors where for are the errors that we recovered using ISD in Step .
We also get from the list of possible errors a list of possible messages .
This gives us a tree, and we aim to find a branch that goes to the bottom of the tree. To find such a branch, we use a depth-first algorithm. We will first describe this procedure and then justify the choice of a depth-first algorithm over a breadth-first algorithm.
As a generic decoding method, we use information set decoding, but in principle, any generic decoding method can be used.
Note further that one can also use overlapping parts of the received word for decoding and use a consistency check on the recovered messages to limit the number of possibilities in the search.
In our experiments, this did not give large speedups, but it might be useful for other codes.
4.2 Using ISD for Convolutional Codes
We want to use information set decoding (ISD) as a generic method of decoding. Let be the expected weight of an error block , rounded up to the next integer. While we expect errors in each block, the chance of each block having at most errors can be very small. So we introduce a variable and allow up to errors in each block. Note that can be chosen so that an arbitrarily high percentage of the errors satisfy this condition, but increasing increases the amount of solutions we receive at each step and thus computational time. We will discuss how to choose in Section 4.3.1.
Note that the first solution we find might not be the correct one, or, if the algorithm is currently running on a “wrong” branch, no solution may be found. So aborting the ISD search after an error is found is not an option. Our solution to this issue is to simply run the algorithm for a fixed amount of iterations and collect all solutions. The number of such iterations can be chosen such that it is almost guaranteed that the correct error will be in the list of outputs at every step as we will explain later. Thus, if no solutions are found, we expect to be on a wrong branch.
4.2.1 The Choice of a Depth-First Algorithm
Depth-first algorithms, in general, have lower memory usage than breadth-first algorithms. However, our main reason for choosing a depth-first algorithm is that for our purposes, we expect a depth-first algorithm to run faster than a breadth-first algorithm. The reasons for this we shall explain in this subsection.
Depth-first search works as follows.
We start with a tree and always move down the left-most possible branch.
Once we cannot move down any further we backtrack and then move down the left-most branch we have not explored yet.
Note that we do not have to store everything we have already explored; it suffices to store, in an appropriate way, which branches remain to be explored.
In Figure 1, we labeled the nodes in the tree according to the sequence in which we encounter them in a depth first search.
To be precise we go from to , , .
Then we backtrack to , from which we go to .
After this we backtrack all the way to from where we go to and traverse the remaining graph.
We only worked with the most basic instance of ISD, the Prange algorithm. Note that, after a choice of an information set , Prange succeeds in recovering an error if the support of the error is contained in the complement of .
This is more likely to happen for errors of smaller weight, so errors of smaller weight have a higher probability of getting found first by the algorithm than errors of higher weight.
We allow up to errors in each window. However, in most cases, the correct error vector will have a smaller weight. This means that the correct error has a tendency to appear at one of the first positions of the output lists of the ISD search at each step. So in most steps, the depth-first algorithm will immediately proceed with the correct error.
Furthermore we heuristically expect a kind of avalanche effect or diffusion, meaning that if we have several wrong messages the error will spread and there will be no close codewords to the received word.
Hence the algorithm should not go too deep in the depth first search for wrong errors and messages.
Our observations from the experiments are in line with these expectations.
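The depth-first strategy can be sketched in Python as follows. This is a generic illustration and not the authors' implementation: the callback `candidates` is a hypothetical stand-in for one round of ISD, which is assumed to return the list of candidate blocks for the next step given the blocks chosen so far (an empty list meaning that the current branch is a dead end).

```python
def depth_first_decode(num_steps, candidates):
    """Depth-first search for a branch of length num_steps.

    candidates(path) must return an ordered list of candidate blocks that
    extend the partial solution `path`. Returns the first complete branch
    found, or None if the whole tree is exhausted.
    """
    if num_steps == 0:
        return []
    path = []
    # One iterator per tree level; it remembers which siblings are unexplored,
    # so nothing else about the visited part of the tree has to be stored.
    stack = [iter(candidates(path))]
    while stack:
        try:
            block = next(stack[-1])
        except StopIteration:
            # Dead end at this level: backtrack one level.
            stack.pop()
            if path:
                path.pop()
            continue
        path.append(block)
        if len(path) == num_steps:
            return list(path)
        stack.append(iter(candidates(path)))
    return None
```

Because candidates are tried in the order in which they are returned, placing low-weight errors first in each list means that the search usually continues directly with the correct error, as argued above.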
4.2.2 Low Weight Codewords
Let and , with full rank, be a generator matrix and a parity-check matrix, respectively, of the same non-catastrophic convolutional code . Note that each non-catastrophic convolutional code possesses a left-prime parity-check matrix , and hence, in particular a parity-check matrix such that is full rank. Then, for , it is an easy consequence of Lemma 3 that
is a parity-check matrix for the block code with generator matrix .
Note that the block code with generator matrix (like each block code) always possesses a parity-check even if the corresponding convolutional code is catastrophic.
Moreover, it is important to note that, due to the restriction , respectively , in the definition of column distances, the distance of the block code with generator matrix is upper bounded by but can be much smaller.
While a convolutional code with generator matrix may have a large minimum distance, the linear code generated by the matrix
can have codewords of small weight.
Note that the code generated by contains codewords of the form where lies in .
Note further that if is delay-free (that is, has full rank), every non-zero codeword of is of the form
for a non-zero . So the minimum distance of the code generated by equals the one of the code generated by .
Actually, if one has a message of the form where is minimal such that , then the weight of is lower bounded by and , .
4.2.3 Undetectable Errors
Since we are working with Prange’s algorithm, errors cannot be detected if the support of a nonzero codeword is contained in the support of the error. This is formalized in the following proposition, which states that an information set of a linear code over $\mathbb{F}_q$ must intersect the support of each nonzero codeword.
Proposition 19.
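Let $I$ be an information set for an $[n,k]$-code $\mathcal{C}$ and $\mathbf{c} \in \mathcal{C} \setminus \{\mathbf{0}\}$. Then $I \cap \mathrm{supp}(\mathbf{c}) \neq \emptyset$.
Proof.
Let $\mathbf{c}$ be in $\mathcal{C} \setminus \{\mathbf{0}\}$ and $I$ a set of size $k$ that has trivial intersection with the support of $\mathbf{c}$.
Pick a basis of $\mathcal{C}$ that contains $\mathbf{c}$ and arrange its elements in a generator matrix $\mathbf{G}$ such that the last row is $\mathbf{c}$. The last row of the matrix $\mathbf{G}_I$, consisting of the columns indexed by $I$, is a zero row, so $\mathbf{G}_I$ is not invertible. This means that $I$ is not an information set.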
∎
Note that in these cases, the error can be written as , where is a codeword whose support is contained in the support of and is an error which can be recovered by the Prange algorithm. So we can ensure that is in our list of outputs by pre-computing a list of low-weight codewords (for our experiments codewords of weight up to 2 were sufficient for most cases) and then, for all possible errors in the output of the Prange algorithm and low-weight codewords , appending at the end of the list of outputs if . It is important that the new solutions get appended at the end of the list when working in a setting where it is unlikely to happen that the support of an error contains the support of a codeword.
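The post-processing step just described can be sketched as follows. The sketch assumes a prime field, with entries represented as integers modulo q, and the function names are hypothetical; new candidates are appended at the end of the list so that the errors found directly by Prange are tried first.

```python
def weight(v, q=2):
    """Hamming weight of a vector with entries taken modulo q."""
    return sum(1 for x in v if x % q != 0)


def augment_with_low_weight_codewords(errors, low_weight_codewords, max_weight, q=2):
    """Append e + c for every found error e and precomputed low-weight codeword c,
    keeping only sums whose weight stays within max_weight.

    Assumes q is prime, so that addition is integer addition modulo q.
    """
    out = [tuple(e) for e in errors]
    seen = set(out)
    for e in errors:
        for c in low_weight_codewords:
            cand = tuple((ei + ci) % q for ei, ci in zip(e, c))
            if cand not in seen and weight(cand, q) <= max_weight:
                seen.add(cand)
                out.append(cand)  # appended at the end, after the Prange outputs
    return out
```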
4.2.4 The Choice of
Recall that is the number of distinct ’s in our matrix which we use for iterative decoding (see beginning of Section 4). Note that the choice of can heavily impact the decoding complexity: choosing a large will increase the ISD cost in each step, while choosing a small increases the number of steps. This could substantially increase the number of nodes the algorithm will traverse in the depth-first search. Thus, needs to be carefully chosen depending on the parameters of the cryptosystem. In fact, if we choose to be the maximum of the memory of the generator matrix and the degree of the error vector , then we are in the setting of decoding a single block code. On the other hand, a small might also not be suitable for decoding. Consider the case
where are of large degree. If we choose , that is, consists of the constant terms of and , then only a small percentage of errors will have at most error in each block. But if we decode with at most errors per block, then we get solutions each step, at which point we are essentially just bruteforcing all solutions.
We have not found a rigorous way of choosing optimally and have resorted to experimentation.
4.3 ISD for a Fixed Amount of Errors Uniformly Distributed Over The Error Vector
Our goal in this section is to reduce the information-set decoding of the convolutional code to the ISD of equations of the form
or equivalently
The analysis consists of several steps. We first want to compute the probability that blocks of the error contain at most non-zero entries for a given .
We will assume the following: is a delay-free convolutional code, the error vector of length has a total weight of , with the errors uniformly distributed over its length.
4.3.1 Exceedance Probability for the Error Distributions
Let be the total error weight, the length of the code generated by , the number of blocks of size we need for decoding, , the (rounded) expected number of errors in each size window, and the tolerance for the error weights.
To give the exact probability of an error having at most weight in each window of size , we consider the polynomial
We compute
and recover the ’th coefficient of which gives us the number of error vectors of weight with at most errors in each block of size . Dividing by the total number of weight error vectors, which is
yields the probability of having at most errors in all blocks.
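The counting argument can be carried out numerically as in the following sketch. We assume here that the polynomial in question is $f(x) = \sum_{i=0}^{m} \binom{n}{i} x^i$, where $n$ is the block size and $m$ the maximal weight allowed per block, so that the coefficient of $x^w$ in $f(x)^N$ counts the supports of weight $w$ with at most $m$ positions in each of the $N$ blocks. Over a larger field, numerator and denominator are both multiplied by $(q-1)^w$, so the ratio is unchanged. The parameter names are ours.

```python
from fractions import Fraction
from math import comb


def block_weight_probability(n, num_blocks, w, m):
    """Probability that a uniformly random weight-w vector of length
    n * num_blocks has at most m nonzero entries in every block of size n."""
    f = [comb(n, i) for i in range(min(m, n) + 1)]
    power = [1]  # coefficients of f(x)^0, truncated above degree w
    for _ in range(num_blocks):
        new = [0] * min(len(power) + len(f) - 1, w + 1)
        for i, a in enumerate(power):
            for j, b in enumerate(f):
                if i + j <= w:
                    new[i + j] += a * b
        power = new
    count = power[w] if w < len(power) else 0
    return Fraction(count, comb(n * num_blocks, w))
```

Evaluating this function for increasing values of `m` shows how the share of admissible errors grows with the tolerance.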
One can reformulate this in terms of a multivariate hypergeometric distribution.
More information about multivariate hypergeometric distributions can be found in [johnson1997discrete].
Note that in each block we get a hypergeometric distribution which we use for an estimate of the tails in the next section.
4.3.2 Estimates, Hypergeometric Distribution
The hypergeometric distribution has an exponentially decaying tail; see [chvatal1979tail].
Let us consider the tail of a hypergeometric distribution given by
Define through the equation
where should be greater than .
One can show as in [chvatal1979tail] that we get for the tail
In the setting above we have .
Hence
and
Thus, if , then we get that the probability of having at least errors is at most
For the weight of the block to exceed , we set , getting Then the probability of the block having weight more than is at most
By the Union bound, we have:
where is the -th error coefficient.
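For comparison, the tail of a single block can also be evaluated exactly and combined with the union bound, as in the following sketch. The exponential estimate uses the bound $\Pr[X \geq (p+\tau)n] \leq e^{-2\tau^2 n}$ from [chvatal1979tail], with $p$ the overall density of errors; the parameter names (`n` for the block size, `num_blocks`, `w` for the total weight, `m` for the maximal weight per block) are our own, and the code is an illustration rather than the authors' computation.

```python
from fractions import Fraction
from math import comb, exp


def block_tail_exact(n, num_blocks, w, m):
    """Exact probability that one fixed block of size n contains more than m
    of the w errors, for errors uniformly distributed over n * num_blocks positions."""
    total = n * num_blocks
    num = sum(comb(n, i) * comb(total - n, w - i)
              for i in range(m + 1, min(n, w) + 1))
    return Fraction(num, comb(total, w))


def block_tail_chvatal(n, num_blocks, w, m):
    """Exponential upper bound exp(-2 * tau^2 * n) on the same tail."""
    p = w / (n * num_blocks)
    tau = (m + 1) / n - p
    if tau <= 0:
        return 1.0  # the bound is only informative above the mean
    return exp(-2 * tau * tau * n)


def union_bound(n, num_blocks, w, m):
    """Upper bound on the probability that some block exceeds weight m."""
    return min(1.0, num_blocks * float(block_tail_exact(n, num_blocks, w, m)))
```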
4.4 Estimates on The Number of Solutions When the Amount of Errors Exceeds the Unique Decoding Radius
Given a (random) full-rank matrix , an error vector of weight and we want to compute the expected amount of solutions of the equation
(15)
of weight at most . We care about the conditional expectation
(16)
Note that every solution to Equation (15) lives in
which has elements.
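Notice that if the generator matrix is “random”, then we expect this set to be uniformly distributed in $\mathbb{F}_q^n$. So, for a code of length $n$ and dimension $k$, we expect the number of vectors of weight $w$ in this set to be
\[ \binom{n}{w}(q-1)^w \cdot \frac{q^k}{q^n} = \frac{\binom{n}{w}(q-1)^w}{q^{n-k}}. \]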
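Under this heuristic (a coset of a code of length $n$ and dimension $k$ behaving like a uniformly random set of $q^k$ vectors), the expected number of candidate solutions can be tabulated with a few lines of Python. The function names are ours, and the result is a heuristic estimate, not an exact count for any particular code.

```python
from fractions import Fraction
from math import comb


def expected_vectors_of_weight(n, k, w, q=2):
    """Heuristic expected number of vectors of weight exactly w in a coset."""
    return Fraction(comb(n, w) * (q - 1) ** w, q ** (n - k))


def expected_vectors_up_to_weight(n, k, w_max, q=2):
    """Heuristic expected number of vectors of weight at most w_max in a coset."""
    return sum(expected_vectors_of_weight(n, k, i, q) for i in range(w_max + 1))
```

As a sanity check, summing over all weights from 0 to n gives exactly q^k, the size of the coset.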
4.4.1 Choosing Maxiter and Increased ISD Cost for Allowing Errors.
The work factor of the Prange algorithm for an code with an error vector of weight is given by
Similarly, if we allow errors with the work factor changes to
The quotient of the two work factors simplifies to
Note further that while is the expected number of iterations to find a solution of weight , it is unlikely that we will find the correct error at each step with this number of iterations. To ensure that the correct error is in the list of outputs of our ISD algorithm, we run it for a fixed amount of iterations, chosen to ensure this.
To make this choice, note that the chance of finding a fixed detectable error of weight in one iteration of Prange’s algorithm is given by
So the probability of finding a given error of weight at all steps with ISD iterations at each step is given by
Note that this is a lower bound for the success probability of recovering an error vector with for all since the weight of an can be smaller than , which increases the probability of it getting recovered.
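The choice of the number of iterations can be automated along these lines. The sketch assumes, as in the text, that a fixed detectable error of weight $w$ is found in one Prange iteration on an $[n,k]$ code with probability $p = \binom{n-w}{k}/\binom{n}{k}$, so that it is found at all of the decoding steps with probability $(1-(1-p)^{M})^{N}$ when $M$ iterations are run at each of $N$ steps. The names `num_steps`, `max_iter` and `target` are ours.

```python
from math import ceil, comb, log


def single_iteration_probability(n, k, w):
    """Probability that one Prange iteration finds a fixed error of weight w."""
    return comb(n - w, k) / comb(n, k)


def total_success_probability(n, k, w, num_steps, max_iter):
    """Probability of finding the error at every step with max_iter iterations each."""
    p = single_iteration_probability(n, k, w)
    return (1 - (1 - p) ** max_iter) ** num_steps


def minimal_max_iter(n, k, w, num_steps, target):
    """Smallest number of iterations per step reaching the target probability.

    Requires 0 < target < 1.
    """
    p = single_iteration_probability(n, k, w)
    if p <= 0:
        raise ValueError("the error can never be found: n - w < k")
    if p >= 1:
        return 1
    per_step = target ** (1 / num_steps)
    m = max(1, ceil(log(1 - per_step) / log(1 - p)))
    # Guard against floating-point rounding in the closed-form estimate.
    while total_success_probability(n, k, w, num_steps, m) < target:
        m += 1
    return m
```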
4.4.2 Choice of
In this subsection we summarize the impact of the choice of .
Increasing the tolerance makes it more likely that errors satisfy the weight condition on each block and thus allows us to recover more errors and messages. However, it also increases the work factor for ISD and the number of solutions found in each step, which can heavily increase the computational time.
All of these factors need to be carefully considered when choosing .
For example, for the attack on [bolkema2017variations] that we will discuss in Section 5.1, we have the parameters , , , and This gives us , and we get the plot in Figure 2 which helps us to choose the value of .
We chose , as about of errors should have block-weight at most , as can be seen in Figure 2, and the work factor is still comparatively small as discussed in Section 4.4.1, and as can be seen in Figure 3. We use this value of for our experiments, as described in Section 5.1.
5 Experiments
In this section we describe how to attack two cryptosystems and what modifications of the procedures described above are necessary in order to attack them in practice.
The systems we attack are from [almeidaBNS21] (note that this is not the authors’ current version) and [bolkema2017variations], Chapter 5.
The first challenge is that the codes constructed in both papers are not delay-free.
This does not pose a major challenge though since we can construct non-catastrophic convolutional codes that contain these codes, see Section 2.
Then we use ISD with a fixed amount of iterations with respect to these non-catastrophic codes.
A second problem that we encounter with both systems is the existence of low weight codewords.
To account for this we find codewords of weight or with a brute-force search.
Then we add these to all the solutions ISD found and keep only the results that have the weight within specified bounds.
Note that ISD might find several solutions since we are outside of the unique decoding radius.
With these modifications we use a depth first search after each round of ISD as described below.
are outputs of first stage of ISD, i.e., .
Furthermore is the total error weight and are the matrices we construct from as above.
Moreover are the low weight codewords and corresponding messages, and corresponds to the message for error , and for respectively.
Next we describe the experimental results we get for both of the schemes we attack.
We ran our experiments mainly on SageMath 9.4 [sagemath]. Some calculations for the attack in Section 5.2 were done with Magma v2.27/7 [magma].
The code can be found under https://git.math.uzh.ch/abmazu/isd-for-convolutional-codes.
5.1 Attack on “A Variation Based on Spatially Coupled MDPC Codes”
For this paper [bolkema2017variations] we took the generator matrix in Example 5.11 for which the authors claim that it gives 80 bits security against generic decoding attacks. The authors propose a binary convolutional code. The generator matrix has a memory of . The error is of degree up to and has a total weight of .
When we attacked the system, we took , so ISD is done with a generator matrix. The choice of was motivated by the desire of having, on average, approximately one error in each decoding window. This seems to be a good compromise as for lower ’s, the algorithm tends to get lost more often, and for higher ’s, the algorithm is slower. At each step, Prange was run 430 times, which guarantees a total success rate of more than according to the formula in Section 4.4.1. We chose as discussed in Section 4.4.2.
Several seeds for error generation were used and the ones for which the error was outside the range were discarded.
From the probability calculations we expect to discard about % of the seeds. In the experiment we discarded 7 out of 27 errors, which is about 26%.
For the remaining ones we ran experiments, the results of which can be seen in Figure 4.
All the experiments correctly recovered the error in less than hours.
This translates to a bit security of at most bits.
Note that we think that our code could be optimized quite a bit and therefore this is an overestimate.
Furthermore we also ran our experiment with another generator matrix with the key generation as described in [bolkema2017variations], Section 5.5, for the same parameters to verify our results.
It also finished successfully.
5.2 Attack on “Smaller Keys for Code-Based Cryptography: McEliece Cryptosystems with Convolutional Encoders” 2021
We implemented the scheme proposed in [almeidaBNS21] for the parameters for bit security as given in Example 17 of that paper.
We used several generator matrices and errors to estimate the security of the scheme.
For this scheme we know more about the errors, namely in their example for the bit security level the authors use an error with blocks where the total error weight is and the sum of the weights of consecutive blocks is .
This tells us that the blocks have weights where .
Therefore we take , which would also be a good choice based on the probability calculations. Of course in this case this always works.
For the experiments, we took , so . This means that the size of our block code generated by has size and dimension . At each step, the ISD runs for iterations, ensuring a total success rate of more than .
To give estimates for the running time, we “cheated” and checked at each step at which position of the output of the ISD the correct error is. Since the algorithm usually does not run further on a wrong branch than a single node, we expect that adding up the positions of the correct error in the output of the ISD gives us a relatively precise estimate of the total number of ISD iterations needed to recover the correct error.
For the estimates which indicated that we could recover the error quickly we ran the actual experiments without cheating to verify our estimates.
This was the case for seeds and .
In both cases, the actual time required was below the estimate.
The results can be seen in Figure 5.
In total we ran 20 experiments, of which 16 successfully finished. In the other cases, the correct error was not found at some step.
In these cases we got only partial recovery due to low weight codewords of weight and .
We did not include the seeds corresponding to those experiments in the figure above.
5.2.1 Probability of Getting Lost
We will justify that our algorithm is very unlikely to proceed further than one layer on a wrong branch. Assume the algorithm is correct up to degree , and then, for a given wrong error at degree , finds a solution for degree as well. Let and be the correct solutions at the respective degrees. Define and . Then:
•
is non-zero,
•
•
and .
Thus, let be fixed of weight and of weight . We will assume that , which also holds in practice. We are interested in the probability of . Let be the number of elements of that become zero after adding (the number of “cancellations”). Notice that to preserve weight we then need to have non-overlaps, i.e. elements of the support of are outside of the support of . Notice that if , then . Further, it is also necessary that since if , we have at least entries of outside of the support of , so we also need to cancel at least positions of .
Now, for a fixed number of cancellations, we have a probability of
that the weight stays the same. Indeed, we have choices for entries of that cancel entries of .
For entries of outside of the support of , we choose from positions, and can have entries in each position, which gives us possibilities.
Finally, we need to choose positions in the remaining support of and have choices for the entries of since the entries must be non-zero and also cannot be minus the corresponding entry of , which is non-zero, giving us possibilities. Let be the conditional event that for an and such that and . Summing over all possible number of cancellations , we get a probability
We sum from , as the summands for (when ) will be zero anyway.
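The counting argument above can be checked numerically on small instances. The following Python sketch uses names that are ours and chosen only for illustration: n for the vector length, q for the alphabet size, w for the weight of the fixed vector, v for the weight of the added vector, and i for the number of cancellations. It assumes that the added vector is uniformly distributed among all vectors of weight v, so that the count of weight-preserving vectors is divided by C(n, v)(q-1)^v. The brute-force routine works over the integers modulo q, which is sufficient here because only the additive structure matters.

```python
from fractions import Fraction
from itertools import product
from math import comb


def weight_preserving_probability(n, q, w, v):
    """Probability that wt(e + x) = wt(e) = w for a fixed e of weight w
    and a uniformly random x of weight v (illustrative names)."""
    favourable = 0
    # i cancellations force i new non-zero entries outside supp(e),
    # leaving v - 2i entries of x inside supp(e) that do not cancel.
    for i in range(0, min(w, v // 2) + 1):
        cancel = comb(w, i)                       # positions of e that are cancelled
        outside = comb(n - w, i) * (q - 1) ** i   # new entries outside supp(e)
        overlap = comb(w - i, v - 2 * i) * (q - 2) ** (v - 2 * i)
        favourable += cancel * outside * overlap
    total = comb(n, v) * (q - 1) ** v             # all vectors of weight v
    return Fraction(favourable, total)


def brute_force_probability(e, q, v):
    """Exhaustive check of the same probability for a concrete e."""
    def weight(vec):
        return sum(1 for c in vec if c % q != 0)

    w = weight(e)
    hits = total = 0
    for x in product(range(q), repeat=len(e)):
        if weight(x) != v:
            continue
        total += 1
        if weight([(a + b) % q for a, b in zip(e, x)]) == w:
            hits += 1
    return Fraction(hits, total)
```

Terms with too few cancellations vanish automatically, since `comb` returns zero when the lower index exceeds the upper one, which mirrors the remark that the sum may start at zero.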
Thus, given a vector
where and , and an error vector , where and , we get that the probability of and is
In Table 1, we have listed for various values of and for the parameters in Example 17 of [almeidaBNS21].
5.2.2 Technical Remarks
Each of our experiments was run on UZH I-Math servers Rambo or Olive.
Rambo has one AMD EPYC™ 7742 2.25 GHz CPU, while Olive has one AMD EPYC™ 7502P 2.5 GHz CPU. Both are AMD EPYC™ 7002 Series CPUs, and their specifications can be found in [AMD]. Each instance used one core, as we did not parallelise our algorithm.
Note that the theoretical number of floating-point operations per second can exceed the base clock rate of the CPU, which would add a few bits to our estimates. We have not accounted for this in our calculations, as the maximum claimed number of FLOPs per cycle is not always achieved. Some sources, such as [NASAEPYC], claim a maximum of 16 FLOPs per cycle per core, which would add 4 bits to our calculations.
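To illustrate how a running time translates into a bit estimate, the following Python sketch takes the base-2 logarithm of the number of operations performed. It assumes one operation per clock cycle at the base clock rate unless a different value is passed; the function name, the parameter names, and the one-hour running time in the usage lines are hypothetical and are not measurements from our experiments.

```python
import math


def bit_estimate(seconds, clock_hz, ops_per_cycle=1.0):
    """log2 of the operation count, assuming a constant rate of
    ops_per_cycle operations per clock cycle (an assumption)."""
    return math.log2(seconds * clock_hz * ops_per_cycle)


# Hypothetical one-hour run on a 2.25 GHz core.
base = bit_estimate(3600, 2.25e9)
# Same run, counted at a claimed peak of 16 operations per cycle.
peak = bit_estimate(3600, 2.25e9, ops_per_cycle=16)
extra_bits = peak - base  # log2(16) = 4 additional bits
```

The difference between the two estimates does not depend on the running time or the clock rate, which is why a factor of 16 operations per cycle always contributes exactly 4 bits.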
Also note that, although we have not implemented our algorithm for multiple cores, it is well-suited for parallelisation, as for the th round of ISD, we only need the values of up to the th block. That is to say, each round is only influenced by the nodes above it, not the nodes to the side. Thus, if we had number of cores, each core could traverse the tree with one of the initial choices for . If any core reaches the end of the tree, we could stop there, and if any core returns all the way to the starting node without having found the solution in its branch, it could aid any core still actively traversing the tree. This could hypothetically cut down the running time by a factor of at most . We expect parallelisation to be more effective for the attack on [bolkema2017variations] than for the attack on [almeidaBNS21], as in the latter, our algorithm does not seem to get lost, as discussed in Section 5.2.
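The distribution of initial choices over workers can be sketched as follows. This is not our implementation: the tree, the functions `expand` and `is_solution`, and all other names are hypothetical stand-ins for the ISD output at each node and for the final check. The sketch only assigns one subtree per initial choice and collects the results in order; it omits early termination and the reassignment of idle workers described above, and it uses threads for brevity, whereas a CPU-bound search in Python would need separate processes to gain speed.

```python
from concurrent.futures import ThreadPoolExecutor


def search_branch(prefix, depth, expand, is_solution):
    """Depth-first search below a fixed prefix of choices."""
    if len(prefix) == depth:
        return prefix if is_solution(prefix) else None
    for choice in expand(prefix):
        found = search_branch(prefix + [choice], depth, expand, is_solution)
        if found is not None:
            return found
    return None


def parallel_search(initial_choices, depth, expand, is_solution, workers=4):
    """Give each initial choice its own worker; a branch depends only on
    the nodes above it, so the subtrees can be explored independently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(search_branch, [c], depth, expand, is_solution)
            for c in initial_choices
        ]
        for future in futures:
            result = future.result()
            if result is not None:
                return result
    return None


# Toy tree: three candidates per node, one hidden target path.
target = [2, 0, 1]
toy_result = parallel_search(
    [0, 1, 2], 3, lambda prefix: [0, 1, 2], lambda path: path == target
)
```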
6 Acknowledgements
The authors would like to thank Daniela Portillo del Valle and Joachim Rosenthal for fruitful discussions. The authors would also like to thank Diego Napp and Miguel Beltrá Vidal for answering our questions about their system. Finally, the authors would like to thank Violetta Weger and Anna-Lena Horlemann for their encouragement and Carsten Rose and the IT-team at I-Math UZH for their practical help and infinite patience while we tormented the math servers with our experiments.
This work has been supported in part by armasuisse Science and Technology (project number CYD-C-2020010), by the Swiss National Science Foundation under SNSF grant number 212865, and by the German Research Foundation, project number 513811367.