KR100486697B1

KR100486697B1 - Modular arithmetic apparatus and method thereof

Info

Publication number: KR100486697B1
Application number: KR1019980019040A
Authority: KR
Inventors: 이경희; 차영태
Original assignee: 삼성전자주식회사
Priority date: 1998-05-26
Filing date: 1998-05-26
Publication date: 2005-06-16
Also published as: KR19990086179A

Abstract

The present invention relates to a modular arithmetic apparatus for use in public-key encryption/decryption and digital signature systems. The apparatus for calculating the modular operation A·B·2^-k mod N according to the invention comprises: an A memory means, a B memory means, and an N memory means, each having a storage capacity of k bits and storing A, B, and N, respectively; a multiplier that receives two w-bit values in parallel and outputs their 2w-bit product in parallel; an adder that receives two 2w-bit values in parallel and outputs their sum in parallel; an accumulator having a storage capacity of k+2w bits, which within one clock selects 2w of those bits, outputs them to the adder, and stores the value received from the adder at the position of the bits it output; a J memory means having a storage capacity of w bits and storing the precomputed value N₀^-1 mod 2^w; a q memory means having a storage capacity of w bits and storing the upper w bits of the multiplier output; a first selecting means for selecting one of the J memory means, the B memory means, and the N memory means and connecting it to the multiplier; a second selecting means for selecting one of the A memory means, the q memory means, and the accumulator and connecting it to the multiplier; and a third selecting means for selecting one of the q memory means and the adder and connecting the multiplier to it.

According to the present invention, the circuit for calculating A·B·2^-k mod N, which is the circuit needed to compute the modular multiplication or modular exponentiation required by digital signature devices and public-key encryption/decryption devices, is implemented simply, which reduces the cost of manufacturing such devices.

Description

Modular arithmetic apparatus and method

The present invention relates to a modular arithmetic apparatus, and more particularly to a modular arithmetic apparatus and method for use in public-key encryption/decryption and digital signature systems.

In 1976, Diffie and Hellman opened a new chapter in modern cryptography by introducing the concept of the public-key cryptosystem, which exploits the one-way nature of problems that are mathematically very hard to solve. In conventional (symmetric) cryptosystems, the two users who wish to communicate must share the same secret key, which makes key management difficult and digital signatures hard to implement. In a public-key cryptosystem, by contrast, a public key and a private key are computed using the one-way nature of a mathematically hard problem; the public key is published for anyone to use, and each user keeps only the private key. Anyone who holds a party's published public key can therefore communicate secretly with that party.

The hard problems most widely used in public-key cryptosystems are the discrete logarithm problem and the integer factorization problem. Representative public-key cryptosystems are the ElGamal-type cryptosystems, based on the discrete logarithm problem, and the RSA (Rivest-Shamir-Adleman) cryptosystem, based on the factorization problem. Schemes adopted as standards include the international standard ISO (International Organization for Standardization)/IEC (International Electrotechnical Commission) 9796; DSA in the United States, a variant of the ElGamal type; GOST in Russia; and KC-DSA in Korea.

Most of these public-key cryptosystems require modular exponentiation (m^e mod N), and modular exponentiation in turn requires modular multiplication (A·B mod N).

Algorithms proposed for modular multiplication include the classical algorithm, Barrett's algorithm, and the Montgomery algorithm.

The classical algorithm performs modular reduction the way one does long division with pencil and paper: it estimates the quotient one digit at a time and repeats the process of computing the remainder. It places no restriction on the modulus M and needs no pre- or post-computation, so it is the most general modular reduction algorithm and can be applied in any situation. However, estimating the quotient requires division (which is slower than multiplication), and when the estimated quotient is not exact an additional addition or subtraction is needed, so it is comparatively slow.

Barrett's algorithm uses a precomputed value for a fixed modulus to estimate the entire quotient at once, performing modular reduction with multiplications only. It outperforms the classical algorithm when the modulus M is fixed, or in modular exponentiation, where many modular multiplications are needed for the same modulus.
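The idea of estimating the whole quotient from a precomputed constant can be sketched as follows. This is a minimal illustration of Barrett reduction in its common textbook form, not a procedure taken from this document; the function names `barrett_setup` and `barrett_reduce` and the choice of `mu = floor(4^k / M)` are assumptions made for the example.

```python
def barrett_setup(M: int) -> tuple[int, int]:
    """Precompute (k, mu) for a fixed modulus M, with mu = floor(4^k / M)."""
    k = M.bit_length()
    mu = (1 << (2 * k)) // M      # the only division; done once per modulus
    return k, mu


def barrett_reduce(x: int, M: int, k: int, mu: int) -> int:
    """Return x mod M for 0 <= x < M*M using multiplies, shifts and subtracts."""
    q = (x * mu) >> (2 * k)       # quotient estimate, never larger than x // M
    r = x - q * M
    while r >= M:                 # the estimate is off by a small amount at most
        r -= M
    return r
```

Because `mu` depends only on `M`, its cost is amortized when many reductions share one modulus, which is the situation the paragraph above describes.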

The Montgomery algorithm obtains the remainder without division by transforming the number system. Because it is faster than the other algorithms, it is the most widely used in implementations of public-key cryptosystems that require modular exponentiation. The given numbers are converted into another number system in which modular reduction can be done with multiplications alone, the reduction is performed there, and the result is converted back to the original number system. In the modular exponentiation required by most public-key cryptosystems, this pre- and post-computation has almost no effect on overall speed, so the algorithm performs very well overall compared with the others.

An object of the present invention is to provide a modular arithmetic apparatus and method that use the Montgomery algorithm to efficiently perform the modular exponentiation and modular multiplication used in public-key encryption/decryption and digital signatures, and that have a simple structure.

To achieve the above object, an apparatus for calculating the modular operation A·B·2^-k mod N according to the present invention comprises: an A memory means having a storage capacity of k bits, which receives the value A in parallel, shifts toward the lower bits in units of w bits at each predetermined clock, and outputs its least significant w bits in parallel; a B memory means having a storage capacity of k bits, which receives the value B in parallel, rotates toward the lower bits in units of w bits at each predetermined clock, and outputs its least significant w bits in parallel; an N memory means having a storage capacity of k bits, which receives the value N in parallel, rotates toward the lower bits in units of w bits at each predetermined clock, and outputs its least significant w bits in parallel; a multiplier that receives two w-bit values in parallel and outputs their 2w-bit product in parallel; an adder that receives two 2w-bit values in parallel and outputs their sum in parallel; an accumulator having a storage capacity of k+2w bits, which within one clock selects 2w of those bits, outputs them to the adder, and stores the value received from the adder at the position of the bits it output; a J memory means having a storage capacity of w bits, which receives the precomputed value N₀^-1 mod 2^w in parallel and outputs it in parallel (where N₀ is the least significant w bits of N); a q memory means having a storage capacity of w bits, which receives the upper w bits of the multiplier output in parallel and outputs them in parallel; a first selecting means for selecting one of the w-bit outputs of the J memory means, the B memory means, and the N memory means and passing it to the multiplier; a second selecting means for selecting one of the w-bit outputs of the A memory means, the q memory means, and the accumulator and passing it to the multiplier; and a third selecting means for selecting one of the q memory means and the adder and passing the multiplier output to it (where k = w·s, and k, w, and s are all integers of 2 or more).

To achieve the other object above, a method for calculating the modular operation A·B·2^-k mod N using the modular arithmetic apparatus according to the present invention comprises: (a) storing the k-bit values A, B, and N in the A memory means, the B memory means, and the N memory means, respectively, precomputing N₀^-1 mod 2^w and storing it in the J memory means, and initializing the accumulator to '0'; (b) computing A₀B by repeating the following s times, shifting the B memory means toward the lower bits by w bits at every clock and moving the accumulator's input/output position toward the upper bits by w bits: (b.1) the multiplier multiplies the least significant w bits A₀ stored in the A memory means by the least significant w bits B₀ stored in the B memory means; and (b.2) the adder adds the multiplier's result to the value stored in the 2w bits S_i S_(i-1) of the accumulator (where i denotes the iteration count) and stores the sum in S_i S_(i-1) of the accumulator; (c) the multiplier multiplies the least significant w bits S₀ of the accumulator by J₀ of the J memory means and the result is stored in the q memory means; (d) computing q₀N by repeating the following s times, shifting the N memory means toward the lower bits by w bits at every clock and moving the accumulator's input/output position toward the upper bits by w bits: (d.1) the multiplier multiplies the least significant w bits N₀ stored in the N memory means by q₀ stored in the q memory means; and (d.2) the adder adds the multiplier's result to the value stored in the 2w bits S_i S_(i-1) of the accumulator (where i denotes the iteration count) and stores the sum in S_i S_(i-1) of the accumulator; (e) shifting the value stored in the accumulator toward the lower bits by w bits; (f) performing steps (b) to (e) s times while shifting the value A stored in the A memory means toward the lower bits by w bits; and (g) if the value stored in the accumulator is greater than N, performing S = S - N.

The present invention is described in detail below with reference to the accompanying drawings.

The present invention efficiently implements A·B mod N, using the Montgomery algorithm, for numbers A, B, and N of k bits each. Here A, B, and N are large numbers, each of which can be represented by s words of w bits. That is, with r = 2^w, k = s·w, where k, w, and s are all integers of 2 or more. The Montgomery function f_m(A,B,N) computes A·B·R^-1 mod N (where R = 2^k).

To perform the modular multiplication A·B mod N using the Montgomery function f_m(·), P = R² mod N is computed and stored in advance. After T = f_m(A,B,N) = A·B·R^-1 mod N has been computed, the result is obtained as P·T·R^-1 mod N = A·B mod N.
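The reason the second Montgomery product removes the unwanted factor can be written out in one line. The derivation below only substitutes the definitions P = R² mod N and T = A·B·R^-1 mod N given in the text; it assumes N is odd so that R = 2^k is invertible modulo N.

```latex
\begin{align*}
T &= f_m(A, B, N) \equiv A \cdot B \cdot R^{-1} \pmod{N},\\
f_m(P, T, N) &\equiv P \cdot T \cdot R^{-1}
  \equiv R^{2} \cdot \left(A \cdot B \cdot R^{-1}\right) \cdot R^{-1}
  \equiv A \cdot B \pmod{N}.
\end{align*}
```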

According to the Montgomery algorithm, T = f_m(A,B,N) = A·B·R^-1 mod N can be computed as follows, where r = 2^w, R = r^s, J₀ ≡ N₀^-1 mod r, and S is an accumulator that holds intermediate values.

S = 0
for i = 0 to s-1
    S = S + A_i × B
    q_i = S₀ × J₀ mod r
    S = S + q_i × N
    S = S / 2^w
endfor
if S > N then S = S - N
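The loop above can be tried out directly with arbitrary-precision integers. The following is a minimal sketch, not the circuit itself; the function name `montgomery_product` is hypothetical. Two choices in it are assumptions rather than statements from the text: J₀ is taken as the negated inverse, -N₀^-1 mod r, because that is the value for which S + q_i × N is divisible by r, and the final comparison uses `>=` so that the result is always below N.

```python
def montgomery_product(A: int, B: int, N: int, w: int, s: int) -> int:
    """Return A*B*R^-1 mod N with R = 2^(w*s); needs N odd and A, B < N < R."""
    r = 1 << w
    N0 = N % r                        # least significant word of N
    J0 = (-pow(N0, -1, r)) % r        # assumption: negated inverse of N0 mod r
    S = 0
    for i in range(s):
        A_i = (A >> (w * i)) % r      # i-th w-bit word of A
        S += A_i * B
        q_i = (S % r) * J0 % r        # q_i = S0 * J0 mod r
        S += q_i * N                  # low word of S is now zero
        S >>= w                       # S = S / 2^w, exact
    if S >= N:
        S -= N
    return S
```

Multiplying the returned value by R modulo N should give back A·B mod N, which is a convenient way to check the sketch for any odd N.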

To implement the above algorithm in hardware, the present invention proposes a method that uses a few registers (or memories), multiplexers, a w-bit parallel multiplier, and a 2w-bit adder.

As shown in FIG. 1, the modular arithmetic apparatus according to the present invention uses only one multiplier and one adder in order to minimize hardware complexity.

The A memory means (10), the B memory means (12), and the N memory means (14) are each implemented as a k-bit register or a similar form of memory. At each clock, in units of one word (w bits), the A memory means (10) shifts to the right, while the B memory means (12) and the N memory means (14) each rotate to the right.

The accumulator (20) is a (k+2w)-bit register that stores intermediate values of the computation; within one clock it reads and writes 2w bits at a time to and from the adder (18).

The J memory means (22) is implemented as a w-bit register, or a similar form of memory, that stores a precomputed value. The q memory means (24) is implemented as a w-bit register, or a similar form of memory, that stores a value computed at every clock.

The multiplier (16) receives two w-bit values in parallel and computes the 2w-bit result in one clock.

Reference numerals 26 and 28 each denote a multiplexer, and reference numeral 30 denotes a demultiplexer.

The operation of the modular arithmetic apparatus according to the present invention is described below.

The modular arithmetic apparatus according to the present invention receives A, B, and N, each consisting of s words of w bits, and computes A·B·R^-1 mod N within s·(2s+4) clocks.
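To get a feel for the s·(2s+4) figure, the small helper below evaluates it for a given operand size and word size. The function name `montgomery_clocks` and the sample sizes are illustrative assumptions; only the formula itself comes from the text.

```python
def montgomery_clocks(k: int, w: int) -> int:
    """Clock count s*(2s+4) for k-bit operands split into s words of w bits."""
    if k % w != 0:
        raise ValueError("k must be a multiple of w (k = s*w)")
    s = k // w
    return s * (2 * s + 4)


# Example: 1024-bit operands with a 32-bit multiplier give s = 32.
print(montgomery_clocks(1024, 32))   # 2176
```

Halving the word size doubles s and roughly quadruples the clock count, which is the usual trade-off between multiplier area and latency in a word-serial design.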

The circuit of FIG. 1 computes T = f_m(A,B,N) = A·B·R^-1 mod N and is based on the following algorithm.

S = 0
for i = 0 to s-1
    for j = 0 to s-1
        S = S + A_i × B_j
    endfor
    q_i = S₀ × J₀ mod r
    for j = 0 to s-1
        S = S + q_i × N_j
    endfor
    S = S / 2^w
endfor
if S > N then S = S - N
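A word-level version of the nested loops can be sketched with the accumulator held as a list of s+2 words, so that every multiplication is w bits by w bits. This is an illustration under stated assumptions, not the datapath of FIG. 1: the helper names (`to_words`, `from_words`, `mac_row`, `montgomery_words`) are hypothetical, carries are handled with a single running carry word instead of the 2w-bit accumulator window, and J₀ is again taken as -N₀^-1 mod r so that the low word cancels.

```python
def to_words(x: int, w: int, n: int) -> list[int]:
    """Split x into n words of w bits, least significant word first."""
    mask = (1 << w) - 1
    return [(x >> (w * j)) & mask for j in range(n)]


def from_words(words: list[int], w: int) -> int:
    return sum(word << (w * j) for j, word in enumerate(words))


def mac_row(S: list[int], a: int, X: list[int], w: int) -> None:
    """S += a * X in place, where a is one word and X is a list of words."""
    mask = (1 << w) - 1
    carry = 0
    for j, x_j in enumerate(X):
        t = S[j] + a * x_j + carry        # always fits in 2w bits
        S[j] = t & mask
        carry = t >> w
    for j in range(len(X), len(S)):       # ripple the last carry upward
        t = S[j] + carry
        S[j] = t & mask
        carry = t >> w


def montgomery_words(A: int, B: int, N: int, w: int, s: int) -> int:
    mask = (1 << w) - 1
    A_w, B_w, N_w = (to_words(v, w, s) for v in (A, B, N))
    J0 = (-pow(N_w[0], -1, 1 << w)) & mask   # assumption: negated inverse
    S = [0] * (s + 2)                        # k + 2w bits of accumulator
    for i in range(s):
        mac_row(S, A_w[i], B_w, w)           # inner loop over B_j
        q_i = (S[0] * J0) & mask
        mac_row(S, q_i, N_w, w)              # inner loop over N_j
        S = S[1:] + [0]                      # S = S / 2^w (S[0] is zero here)
    T = from_words(S, w)
    return T - N if T >= N else T
```

Keeping two spare words above the k bits of the operands mirrors the k+2w-bit accumulator: the intermediate sum can exceed k bits by up to one word plus a carry before it is shifted down.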

The process of computing T = f_m(A,B,N) = A·B·R^-1 mod N on the basis of FIG. 1 is described next.

(a) First, as an initialization step, the k-bit values A, B, and N are stored in the A memory means (10), the B memory means (12), and the N memory means (14), respectively. The accumulator (20) is initialized to '0'.

(b) Once all data has been loaded into the memory means (10, 12, 14), the first multiplexer (26) selects the w-bit word B₀ stored in the B memory means (12), and the second multiplexer selects the w-bit word A₀ stored in the A memory means (10).

(c) The multiplier (16) receives A₀ and B₀ in parallel and multiplies them. The product A₀×B₀ is passed to the adder (18), selected by the demultiplexer (30), and added to the value stored in the 2w bits S_i S_(i-1) of the accumulator (20) (where i denotes the iteration count). The sum is stored in S_i S_(i-1) of the accumulator (20) at the next clock, and the carry is kept for the next addition.

(d) At every clock the B memory means (12) is shifted right by w bits, and steps (b) and (c) are repeated s times to compute A₀B. During this process, the data exchanged between the accumulator (20) and the adder (18) moves toward the upper bits by w bits per clock.

(e) After A₀B has been computed, S₀ is selected by the second multiplexer and J₀ by the first multiplexer. The multiplier (16) computes q₀ at the s-th clock, and at the following clock q₀ is stored in the q memory means (24), selected by the demultiplexer (30).

(f) The first multiplexer selects N stored in the N memory means (14), and the second multiplexer selects q₀ stored in the q memory means (24).

(g) Over another s clocks, q₀N is computed in the same way that A₀B was computed in steps (b) to (d).

(h) The value stored in the accumulator (20) is shifted right by one word (w bits).

(i) While the value A stored in the A memory means (10) is shifted right by one word (w bits), steps (b) to (h) are performed s times, after which the accumulator (20) holds T = f_m(A,B,N) = A·B·R^-1 mod N.

(j) If the value stored in the accumulator (20) is greater than N, S = S - N is performed.
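The register behaviour in steps (a) to (j) can be mimicked in software to see why A is shifted while B and N are rotated. The sketch below is a behavioural model only, with hypothetical names (`montgomery_registers`, `A_reg`, `B_reg`, `N_reg`); it does not model the multiplexers, the 2w-bit adder window, or clock timing, and it assumes J₀ = -N₀^-1 mod r and a final `>=` comparison as in the usual Montgomery formulation.

```python
from collections import deque


def montgomery_registers(A: int, B: int, N: int, w: int, s: int) -> int:
    mask = (1 << w) - 1
    split = lambda x: [(x >> (w * j)) & mask for j in range(s)]
    A_reg = deque(split(A))      # shifted right: each word is used once
    B_reg = deque(split(B))      # rotated right: contents survive every pass
    N_reg = deque(split(N))      # rotated right: contents survive every pass
    J0 = (-pow(N_reg[0], -1, 1 << w)) & mask   # assumption: negated inverse
    S = 0                        # accumulator, k + 2w bits wide
    for _ in range(s):           # step (i): one pass per word of A
        a0 = A_reg[0]
        for j in range(s):       # steps (b)-(d): accumulate A0 * B
            S += (a0 * B_reg[0]) << (w * j)
            B_reg.rotate(-1)     # low word moves to the top of the register
        q0 = (S & mask) * J0 & mask            # step (e)
        for j in range(s):       # steps (f)-(g): accumulate q0 * N
            S += (q0 * N_reg[0]) << (w * j)
            N_reg.rotate(-1)
        assert S < 1 << (w * (s + 2))          # fits in k + 2w bits
        S >>= w                  # step (h)
        A_reg.popleft()          # shift A right by one word
        A_reg.append(0)
    return S - N if S >= N else S              # step (j)
```

After s rotations the B and N registers are back in their original order, so they are ready for the next pass without being reloaded, whereas each word of A is needed only once and can simply be shifted out.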

In the present invention, all elements other than the memory means are implemented as simple combinational circuits. To prevent malfunction, the clock period must therefore be long enough to cover the propagation delay of the connected combinational circuits.

FIG. 2 shows the timing relationships of this computation.

Accordingly, the modular multiplication A·B mod N is computed with the modular arithmetic apparatus according to the present invention as follows.

(1) First, P = 2^2k mod N is computed in advance.

(2) Next, C = A·B·2^-k mod N is computed using the circuit shown in FIG. 1.

(3) Finally, P·C·2^-k mod N = A·B mod N is computed.
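The three steps can be checked numerically with a reference model standing in for the circuit. In this sketch `f_m` is a hypothetical software stand-in that returns A·B·2^-k mod N using Python's built-in modular inverse, and `mod_mul` is an illustrative name; N is assumed odd so that 2^k is invertible.

```python
def f_m(A: int, B: int, N: int, k: int) -> int:
    """Reference model of the circuit output: A*B*2^-k mod N (N odd)."""
    return A * B * pow(2, -k, N) % N


def mod_mul(A: int, B: int, N: int, k: int) -> int:
    P = pow(2, 2 * k, N)          # step (1): P = 2^(2k) mod N, precomputed
    C = f_m(A, B, N, k)           # step (2): C = A*B*2^-k mod N
    return f_m(P, C, N, k)        # step (3): P*C*2^-k mod N = A*B mod N
```

Since P depends only on N and k, it can be computed once per modulus and reused for every multiplication.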

The modular exponentiation m^e mod N is computed with the modular arithmetic apparatus of FIG. 1 as follows.

(1) The exponent e is stored in a register or a similar form of memory.

(2) The modulus N is stored in register N.

(3) The accumulator S is initialized to '0'.

(4) The Montgomery modular multiplication of the base m by P is performed to obtain m' = m·P·2^-k mod N. Here P is the same value that is precomputed in step (1) of the modular multiplication procedure.

(5) m' is loaded into register B.

(6) A modular squaring is performed using the value loaded in register B. The operand A required for the Montgomery modular multiplication is loaded from register B.

(7) The most significant bit (MSB) of the exponent e, which is '1', is ignored, and the next bit becomes the current bit.

(8) A modular squaring is performed with the value stored in register B as both multiplier and multiplicand. The result is loaded into register B.

(9) If the current bit of the exponent e is '1', a modular multiplication is performed with the base m' as the multiplier and the value of register B as the multiplicand. The result is loaded into register B.

(10) After steps (8) and (9) have been performed for all bits of the exponent e, a modular multiplication is performed with 1 as the multiplier and the value of register B as the multiplicand.

After steps (1) to (10) have been performed, the value remaining in the accumulator S is the final modular exponentiation result m^e mod N.
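The bit-scanning procedure corresponds to left-to-right square-and-multiply carried out on Montgomery-form values. The sketch below is one reading of the steps, not a confirmed transcription: `f_m` is a hypothetical software stand-in for the circuit, `mod_exp` is an illustrative name, and step (6) is treated as a description of how squaring operands are loaded rather than as an extra squaring, which is an assumption.

```python
def f_m(A: int, B: int, N: int, k: int) -> int:
    """Reference model of the circuit output: A*B*2^-k mod N (N odd)."""
    return A * B * pow(2, -k, N) % N


def mod_exp(m: int, e: int, N: int, k: int) -> int:
    """Return m^e mod N for e >= 1 using only Montgomery products."""
    if e < 1:
        raise ValueError("exponent must be at least 1")
    P = pow(2, 2 * k, N)              # precomputed constant 2^(2k) mod N
    m_mont = f_m(m, P, N, k)          # step (4): m' = m * 2^k mod N
    B = m_mont                        # step (5): load m' into register B
    for bit in bin(e)[3:]:            # step (7): skip the leading '1' bit
        B = f_m(B, B, N, k)           # step (8): square
        if bit == "1":
            B = f_m(m_mont, B, N, k)  # step (9): multiply by m'
    return f_m(1, B, N, k)            # step (10): multiply by 1 to convert back
```

The conversions into and out of Montgomery form cost two extra products in total, which is small next to the one or two products spent on every bit of e.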

FIG. 1 is a block diagram of a modular arithmetic apparatus according to the present invention.

FIG. 2 is a timing diagram of the modular arithmetic apparatus of FIG. 1.

Claims

An apparatus for calculating the modular operation A·B·2^-k mod N, comprising:

an A memory means having a storage capacity of k bits, which receives the value A in parallel, shifts toward the lower bits in units of w bits at each predetermined clock, and outputs its least significant w bits in parallel;

a B memory means having a storage capacity of k bits, which receives the value B in parallel, rotates toward the lower bits in units of w bits at each predetermined clock, and outputs its least significant w bits in parallel;

an N memory means having a storage capacity of k bits, which receives the value N in parallel, rotates toward the lower bits in units of w bits at each predetermined clock, and outputs its least significant w bits in parallel;

a multiplier that receives two w-bit values in parallel and outputs their 2w-bit product in parallel;

an adder that receives two 2w-bit values in parallel and outputs their sum in parallel;

an accumulator having a storage capacity of k+2w bits, which within one clock selects 2w of those bits, outputs them to the adder, and stores the value received from the adder at the position of the bits it output;

a J memory means having a storage capacity of w bits, which receives the precomputed value N₀^-1 mod 2^w in parallel and outputs it in parallel (where N₀ is the least significant w bits of N);

a q memory means having a storage capacity of w bits, which receives the upper w bits of the multiplier output in parallel and outputs them in parallel;

a first selecting means for selecting one of the w-bit outputs of the J memory means, the B memory means, and the N memory means and passing it to the multiplier;

a second selecting means for selecting one of the w-bit outputs of the A memory means, the q memory means, and the accumulator and passing it to the multiplier; and

a third selecting means for selecting one of the q memory means and the adder and passing the multiplier output to it (where k = w·s, and k, w, and s are all integers of 2 or more).

The apparatus of claim 1, wherein the adder and the multiplier perform addition and multiplication, respectively, within one clock.

The apparatus of claim 1, wherein the first selecting means and the second selecting means are each a multiplexer, and the third selecting means is a demultiplexer.

A method for calculating the modular operation A·B·2^-k mod N using the modular arithmetic apparatus of claim 1, comprising:

(a) storing the k-bit values A, B, and N in the A memory means, the B memory means, and the N memory means, respectively, precomputing N₀^-1 mod 2^w and storing it in the J memory means, and initializing the accumulator to '0';

(b) computing A₀B by repeating steps (b.1) and (b.2) s times while shifting the B memory means toward the lower bits by w bits at every clock and moving the input/output position of the accumulator toward the upper bits by w bits:

(b.1) the multiplier multiplying the least significant w bits A₀ stored in the A memory means by the least significant w bits B₀ stored in the B memory means; and

(b.2) the adder adding the result computed by the multiplier to the value stored in the 2w bits S_i S_(i-1) of the accumulator (where i denotes the iteration count) and storing the sum in S_i S_(i-1) of the accumulator;

(c) the multiplier multiplying the least significant w bits S₀ of the accumulator by J₀ of the J memory means and storing the result in the q memory means;

(d) computing q₀N by repeating steps (d.1) and (d.2) s times while shifting the N memory means toward the lower bits by w bits at every clock and moving the input/output position of the accumulator toward the upper bits by w bits:

(d.1) the multiplier multiplying the least significant w bits N₀ stored in the N memory means by q₀ stored in the q memory means; and

(d.2) the adder adding the result computed by the multiplier to the value stored in the 2w bits S_i S_(i-1) of the accumulator (where i denotes the iteration count) and storing the sum in S_i S_(i-1) of the accumulator;

(e) shifting the value stored in the accumulator toward the lower bits by w bits;

(f) performing steps (b) to (e) s times while shifting the value A stored in the A memory means toward the lower bits by w bits; and

(g) if the value stored in the accumulator is greater than N, subtracting N from the value stored in the accumulator and storing the result in the accumulator.