A project by Alex Gustafsson and Marcus Lenander in the course MA1453 (Cryptography 2) at BTH, Sweden.
This project evaluates the relative performance of various algorithms for integer factorization, namely Trial Division, Fermat's Factorization, Pollard's Rho Algorithm, Lenstra's Elliptic Curve Factorization Algorithm and briefly the Quadratic Sieve. The algorithms are presented and described with regards to use and implementation details. No effort was made to optimize the implemented algorithms further than what is considered logical. No algorithm leverages parallel computing techniques, dynamic programming techniques or other techniques in order to gain performance. No consideration of performance was made when the software and programming stack were chosen.
The project was hosted on a private instance of CoCalc - Collaborative Calculation in the Cloud. CoCalc makes it easy to collaboratively develop math-oriented code using technologies such as SageMath, R, Julia and LaTeX whilst still being completely free and open source.
The paper is available in the branch paper.
See also: Course Notes for BTH courses and code and examples for Programming Courses at BTH.
- Download this repository and launch it in a public or private CoCalc project
- Navigate / test the code yourself or read below
Quickstart
Algorithms
Utilities
Disclaimer
Psuedocode and comments for the implemented factorization algorithms. The goal of each implemented algorithm was to be able to retrieve an unknown amount of factors. Therefore there are some changes made to some of the algorithms to allow for that goal to be fulfilled. Beyond that, we've tried to stay true to each algorithm's core meaning that we've taken no liberty to add dynamic or parallell programming techniques to speed up the factorization. In a realistic scenario, such techniques are crucial to speed factorization up - especially for numbers consisting of many (possibly equal) factors.
Found in trial_division.py
, the trial division algorithms implemented consists of the following pseudocode:
def TrialDivision(n)
f = 1
while n > 1
if n is divisible by f
f is a factor. Divide n by f
else
increment f by one
A modification of the above Trial Division that halves the needed time.
def TrialDivision(n)
f = 3
while n is divisible by 2
2 is a factor. Divide n by 2
while n > 1
if n is divisible by f
f is a factor. Divide n by f
else
increment f by 2
Found in fermats_factorization.py
, the Fermat's algorithm implemented consists of the following pseudocode:
def FermatsFactorization(n)
if n % 2 == 0:
return None
a = ceil(sqrt(n))
b2 = a*a - n
while b2 is not square:
a += 1
b2 = a*a - n
b = sqrt(b2)
return p = (a + b), q = (a - b)
Found in pollards_rho_algorithm.py
, the Pollard's algorithm implemented consists of the following pseduocode:
def PollardsRhoAlgorithm(n)
if n is 1
1 is a factor. Stop
add n to the stack 'stack' to try with the first g(x)
while the stack is not empty
pop an element a from the stack
if a is prime
a is a factor
else
f = 1, x = 2, y = 2
while f is 1
x = g(x), y = g(g(x))
f = gcd(x - y, a)
if f is n
add a to stack to try with the next g(x)
else
add f and a / f to the stack
Note: the function g(x)
is tested in the following order: g(x)=x^2+1
, g(x)=x^3+1
, g(x)=x+1
. If you're interested in why those functions are tested in that order, please refer to the paper.
Found in lenstras_elliptic_curve_factorization.py
, the Lenstra's algorithm implemented consists of the following pseudocode:
def LenstrasEllipticCurveFactorizationAlgorithm(n)
x, y, A = random element in Zn
Q = (x, y)
if a modular inverse exists:
P = Q + Q
while P + Q can be calculated
if iterations is less then maximum iterations
if modular inverse exists in kP
kP = (k-1)P + Q
else
add gcd(modular inverse, n) to the stack of factors
else
no factor found, try with new random elements
else
add gcd(modular inverse, n) to the stack of factors
Found in quadratic_sieve.py
, the Quadratic Sieve algorithm implemented is too complex to easily describe using pseudocode. Please view the file quadratic_sieve.py
and read the source code instead.
Utilities produced to make testing, benchmarking and debugging easier.
Found in factorization_function.py
, the FactorizationFunction
class is a parent class to all implemented factorization algorithms. The class enables the same interface and testability for all algorithms. Noteworthy details are mentioned below.
Class implementation:
from factorization_function import *
class MyFactorizationFunction(FactorizationFunction)
@classmethod
def getCharacteristics(self):
c = FactorizationFunctionCharacteristics()
c.canFactorizePrimeComposites = True
c.canFactorizeEvenComposites = True
return c
@classmethod
def factorize(self, n, returnBenchmark=False):
# Setup
factors = []
...
# Start of algorithm
...
# End of algorithm
return (factors, benchmark) if returnBenchmark else factors
Class usage:
MyFactorizationFunction.factorize(3*5*6)
MyFactorizationFunction.test()
MyFactorizationFunction.benchmark(numberSuite)
Each Factorization Function supports a common interface for determining capabilities.
To retrieve a Factorization Function's capabilities, call the getCharacteristics
method.
The result is expected to be a FactorizationFunctionCharacteristics
object consisting of two bools:
canFactorizePrimes
indicates whether or not the algorithm can factorize numbers consisting of prime factors. canFactorizeNonPrimes
indicates whether or not the algorithm can factorize numbers consisting of even factors.
To add specific characteristics to your function, simply define the function like this:
...
class MyFactorizationFunction(FactorizationFunction)
...
def getCharacteristics():
c = FactorizationFunctionCharacteristics()
c.canFactorizePrimes = True
c.canFactorizeNonPrimes = False
return c
Each Factorization Function supports a common interface for testing.
To test a Factorization Function, call the test
method.
MyFactorizationFunction.test()
Each Factorization Function is first tested to see whether or not the factorization
function will run at all. If not, the test will fail.
If the algorithm has been noted to support factorization to prime factors, the function is then tested using the following numbers: 2, 3, 5, 7, 11, 15, 21, 25, 45, 77, 121, 169, 21853
. These numbers have been found by us to test the overall factorization of an algorithm, including edge cases, multiple factors and other quirks.
If the algorithm has been noted to support factorization to even factors, the function is then tested using the following numbers: 2, 4, 12, 20, 30
. No further effort has been made to cover edge cases using even numbers. This is due to the fact that not all algorithms support even numbers without alteration. Furthermore, factorizing even numbers is often not interesting in a true to life scenario.
When common tests have been run, any function of the class starting with test
will be run. To add specific tests to your function, simply define a function like this:
...
class MyFactorizationFunction(FactorizationFunction)
...
def testMySpecifics():
...
Each Factorization Function supports a common interface for benchmarking. See also the implementation details on benchmarking.
To test a Factorization Function, call the benchmark
function with your number suite and an optional number of iterations for each benchmark (defaults to 10
).
MyFactorizationFunction.benchmark(numberSuite, iterations=10)
The number suite is expected to follow the following format:
# A tuple
(
# String description of the suite
description,
# A list of tuples
[
# A tuple
(
# Integer indicating amount of factors in this benchmark
numberOfFactors,
# A list of tuples
[
# A tuple
(
# The number to factorize
number,
# A list of factors
[
factor1,
factor2,
...
]
)
]
)
]
)
For a more comprohensive view, refer to the following example as well as the file number_suite.sagews
.
An example number suite testing two numbers consisting of two and three factors. The number tested for two factors is 118=1*2*59
. The number tested for three factors is 159953=1*17*97*97
.
("Example", [(2, [(118, [1, 2, 59])]), (3, [(159953, [1, 97, 17, 97])])])`
To add benchmarks to your function, define benchmarks like this:
from benchmark import *
...
class MyFactorizationFunction(FactorizationFunction)
...
def factorize():
# Setup
...
benchmark = Benchmark()
# Start of algorithm
benchmark.start()
for i in range(10):
benchmark.iterate()
benchmark.start("another timer")
...
benchmark.stop("another timer")
# End of algorithm
benchmark.stop()
...
The complete list of parameters supported by .benchmark()
:
(self, numberSuite, iterations=10, optionalArgs={}, start=2, skipFactors=None):
Parameter | Default | Description | Required |
---|---|---|---|
numberSuite |
- | See above | Yes |
iterations |
10 |
Number of times each number is tested | No |
optionalArgs |
{} |
Map of optional parameter (key) and corresponding value (value) sent to the function that is being benchmarked | No |
start |
2 |
The least number of factors to test, no matter the number suite | No |
skipFactors |
None |
A function that accepts a number, numberOfFactors and returns a boolean whether or not the benchmark should occur. Allows all numbers if None |
No |
Found in benchmark.py
, the Benchmark
class is a utility class to keep track of time and iterations.
# Import the class
from benchmark import *
# Create an instance to store values
benchmark = Benchmark()
benchmark.verbose = True
# Start a benchmark (the default one)
benchmark.start()
print "This is some heavy work"
# Stop the default benchmark
benchmark.stop()
benchmark.start("main loop")
for i in range(2):
# Add one to the default iteration counter
benchmark.iterate()
benchmark.start("inner loop")
for j in range(2):
benchmark.iterate()
benchmark.stop("inner loop")
benchmark.stop("main loop")
# Benchmark uses named timers and iterations
# If none is given, "default" is used - as seen above
benchmark.start("another timer")
benchmark.stop("another timer")
benchmark.iterate("another counter")
# When printing a benchmark, it is automatically formatted to be somewhat easy to read
print benchmark
This is some heavy work
Benchmark <Total time: {"default": 0.000479, "inner loop": 0.000051, "main loop": 0.000562, "another timer": 0.000119}, iterations: {"default": 6, "another counter": 1}>
A benchmark instance has the following noteworthy attributes:
totalTime
: a map where keys are labels and values are total time recorded for labeliterations
: a map where keys are labels and values are iterations recorded for label
A benchmark instance has the following noteworthy methods:
-
start(label)
: starts counting time forlabel
.label
defaults to "default" -
stop(label)
: stops counting time forlabel
.label
defaults to "default" -
iterate(label)
: add one tolabel
iteration counter.label
defaults to "default" -
asTables()
: return a latex string describing the benchmark as time and iterations tables - commonly used with Sage'sshow()
-
printEvents()
: an overview of the algorithm. See "Other" below.
A benchmark instance has the following operations:
+
: add to benchmarks together - useful when measuring different functions and aggregating the results
Benchmark has the following static functions:
min(benchmarks)
: returns the minimum value for each measurement in the benchmarks array as a new benchmarkmax(benchmarks)
: returns the maximum value for each measurement in the benchmarks array as a new benchmarkaverage(benchmarks)
: returns the average value for each measurement in the benchmarks array as a new benchmark
Benchmarking can be used to describe the algorithm rather cleanly. You can try to set verbose=True
on your Benchmark
instance to record logs. An overview of the algorithm can the be printed using .printEvents()
. The output below is from the example above ("Basic usage") with an added call to benchmark.printEvents()
:
▶️ : start of a timer- ⏹: stop of a timer
- 🔂: iteration
This is some heavy work
Benchmark <Total time: {"default": 0.000493, "inner loop": 0.000046, "main loop": 0.000550, "another timer": 0.000118}, iterations: {"default": 6, "another counter": 1}>
▶️ default
⏹ default (0.000493 s)
▶️ main loop
🔂 default
▶️ inner loop
🔂 default x 2
⏹ inner loop (0.000025 s)
🔂 default
▶️ inner loop
🔂 default x 2
⏹ inner loop (0.000021 s)
⏹ main loop (0.000550 s)
▶️ another timer
⏹ another timer (0.000118)
🔂 another counter
IMPORTANT NOTE:
Every time either .start()
, .stop()
or .iterate()
is called, somewhere around 40 bytes is added to RAM usage due to the logging of events. This might not seem as much, but when, for example, Lenstra's Elliptic Curve Factorization Method is run on larger numbers it may run several hundred thousand iterations, resulting in multiple gigabytes of RAM usage.
This repository holds code created by students. Although the code is most likely correct, it may not promote best practices, be factual or grammatically correct. The code is meant as a future reference for students reading the same or a similar course. The code is also made publically available alongside the report written on the subject.