Silicon Forest Math Series | RSA


Silicon Forest Math Series
Oregon Curriculum Network

Introduction to Public Key Cryptography

Here in the Silicon Forest, we do not expect everyone to become a career computer programmer.

We do expect a lot of people will wish to program at some time in their career.

Coding skills give you the power to control machines and you might find appropriate and life-enhancing uses for this type of power.

To help you get the flavor of coding, we leverage concepts we expect you're already getting through your math courses.

In moving from pre-computer math, to computer math, and back again, we develop important conceptual bridges.

Generating the Prime Numbers

Lets look at a first such concept, that of a prime number.

The Fundamental Theorem of Arithmetic says every positive integer distills into a unique list of prime number factors. Duplicates are allowed.

But what are primes in the first place? Numbers with no factors other than themselves.


In [1]:
import pprint

def primes():
    """generate successive prime numbers (trial by division)"""
    candidate = 1
    _primes_so_far = [2]     # first prime, only even prime
    yield _primes_so_far[0]  # share it!
    while True:
        candidate += 2    # check odds only from now on
        for prev in _primes_so_far:
            if prev**2 > candidate:
                yield candidate  # new prime!
                _primes_so_far.append(candidate)
                break
            if not divmod(candidate, prev)[1]: # no remainder!
                break                          # done looping
                
p = primes()  # generator function based iterator
pp = pprint.PrettyPrinter(width=40, compact=True)
pp.pprint([next(p) for _ in range(30)])  # next 30 primes please!


[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31,
 37, 41, 43, 47, 53, 59, 61, 67, 71, 73,
 79, 83, 89, 97, 101, 103, 107, 109,
 113]

The above algorithm is known as "trial by division".

Keep track of all primes discovered so far, and test divide them, in increasing order, into a candidate number, until:

(A) either one of the primes goes evenly, in which case move on to the next odd

or

(B) until we know our candidate is a next prime, in which case yield it and append it to the growing list.

If we get passed the 2nd root of the candidate, we conclude no larger factor will work, as we would have encountered it already as the smaller of the factor pair.

Passing this 2nd root milestone triggers plan B. Then we advance to the next candidate, ad infinitum.

Python pauses at each yield statement however, handing control back to the calling sequence, in this case a "list comprehension" containing a next() function for advancing to the next yield.

Coprimes, Totatives, and the Totient of a Number

From here, we jump to the idea of numbers being coprime to one another. A synonym for coprime is "stranger." Given two ordinary positive integers, they're strangers if they have no prime factors in common. For that to be true, they'd have no shared factors at all (not counting 1).

Guido van Rossum, the inventor of Python, gives us a pretty little implementation of what's known as Euclid's Method, an algorithm that's thousands of years old. It'll find the largest factor any two numbers have in common (gcd = "greatest common divisor").

Here it is:


In [2]:
def gcd(a, b):
    while b:
        a, b = b, a % b
    return a

print(gcd(81, 18))
print(gcd(12, 44))
print(gcd(117, 17)) # strangers


9
4
1

How does Euclid's Method work? That's a great question and one your teacher should be able to explain. First see if you might figure it out for yourself...

Here's one explanation:

If a smaller number divides a larger one without remainder then we're done, and that will always happen when that smaller number is 1 if not before.

If there is a remainder, what then? Lets work through an example.

81 % 18 returns a remainder of 9 in the first cycle. 18 didn't go into 81 evenly but if another smaller number goes into both 9, the remainder, and 18, then we have our answer.

9 itself does the trick and we're done.


In [3]:
print(81 % 18) # 18 goes into 
print(18 % 9)  # so the new b becomes the answer


9
0

Suppose we had asked for gcd(18, 81) instead? 18 is the remainder (no 81s go into it) whereas b was 81, so the while loop simply flips the two numbers around to give the example above.

The gcd function now gives us the means to compute totients and totatives of a number. The totients of N are the strangers less than N, whereas the totient is the number of such strangers.


In [4]:
def totatives(N):
    # list comprehension!
    return [x for x in range(1,N) if gcd(x,N)==1]  # strangers only
    
def T(N):
    """
    Returns the number of numbers between (1, N) that 
    have no factors in common with N:  called the 
    'totient of N' (sometimes phi is used in the docs)
    """
    return len(totatives(N)) # how many strangers did we find?

print("Totient of  100:", T(100))
print("Totient of 1000:", T(1000))


Totient of  100: 40
Totient of 1000: 400

Where to go next is in the direction of Euler's Theorem, a generalization of Fermat's Little Theorem. The built-in pow(m, n, N) function will raise m to the n modulo N in an efficient manner.


In [5]:
def powers(N):
    totient = T(N)
    print("Totient of {}:".format(N), totient)
    for t in totatives(N):
        values = [pow(t, n, N) for n in range(totient + 1)]
        cycle = values[:values.index(1, 1)]  # first 1 after initial 1
        print("{:>2}".format(len(cycle)), cycle)
    
powers(17)


Totient of 17: 16
 1 [1]
 8 [1, 2, 4, 8, 16, 15, 13, 9]
16 [1, 3, 9, 10, 13, 5, 15, 11, 16, 14, 8, 7, 4, 12, 2, 6]
 4 [1, 4, 16, 13]
16 [1, 5, 8, 6, 13, 14, 2, 10, 16, 12, 9, 11, 4, 3, 15, 7]
16 [1, 6, 2, 12, 4, 7, 8, 14, 16, 11, 15, 5, 13, 10, 9, 3]
16 [1, 7, 15, 3, 4, 11, 9, 12, 16, 10, 2, 14, 13, 6, 8, 5]
 8 [1, 8, 13, 2, 16, 9, 4, 15]
 8 [1, 9, 13, 15, 16, 8, 4, 2]
16 [1, 10, 15, 14, 4, 6, 9, 5, 16, 7, 2, 3, 13, 11, 8, 12]
16 [1, 11, 2, 5, 4, 10, 8, 3, 16, 6, 15, 12, 13, 7, 9, 14]
16 [1, 12, 8, 11, 13, 3, 2, 7, 16, 5, 9, 6, 4, 14, 15, 10]
 4 [1, 13, 16, 4]
16 [1, 14, 9, 7, 13, 12, 15, 6, 16, 3, 8, 10, 4, 5, 2, 11]
 8 [1, 15, 4, 9, 16, 2, 13, 8]
 2 [1, 16]

Above we see repeating cycles of numbers, with the length of the cycles all dividing 16, the totient of the prime number 17.

pow(14, 2, 17) is 9, pow(14, 3, 17) is 7, and so on, coming back around the 14 at pow(14, 17, 17) where 17 is 1 modulo 16.

Numbers raised to any kth power modulo N, where k is 1 modulo the totient of N, end up staying the same number. For example, pow(m, (n * T(N)) + 1, N) == m for any n.


In [6]:
from random import randint

def check(N):
    totient = T(N)
    for t in totatives(N):
        n = randint(1, 10)
        print(t, pow(t, (n * totient) + 1, N))

check(17)


1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
11 11
12 12
13 13
14 14
15 15
16 16

In public key cryptography, RSA in particular, a gigantic composite N is formed from two primes p and q.

N's totient will then be (p - 1) * (q - 1). For example if N = 17 * 23 (both primes) then T(N) = 16 * 22.


In [7]:
p = 17
q = 23
T(p*q) == (p-1)*(q-1)


Out[7]:
True

From this totient, we'll be able to find pairs (e, d) such that (e * d) modulo T(N) == 1.

We may find d, given e and T(N), by means of the Extended Euclidean Algorithm (xgcd below).

Raising some numeric message m to the eth power modulo N will encrypt the message, giving c.

Raising the encrypted message c to the dth power will cycle it back around to its starting value, thereby decrypting it.

c = pow(m, e, N)

m = pow(c, d, N)

where (e * d) % T(N) == 1.

For example:


In [8]:
p = 37975227936943673922808872755445627854565536638199
q = 40094690950920881030683735292761468389214899724061
RSA_100 = p * q
totient = (p - 1) * (q - 1)

In [9]:
# https://en.wikibooks.org/wiki/
# Algorithm_Implementation/Mathematics/
# Extended_Euclidean_algorithm

def xgcd(b, n):
    x0, x1, y0, y1 = 1, 0, 0, 1
    while n != 0:
        q, b, n = b // n, n, b % n
        x0, x1 = x1, x0 - q * x1
        y0, y1 = y1, y0 - q * y1
    return  b, x0, y0

# x = mulinv(b) mod n, (x * b) % n == 1
def mulinv(b, n):
    g, x, _ = xgcd(b, n)
    if g == 1:
        return x % n

e = 3
d = mulinv(e, totient)
print((e*d) % totient)


1

In [10]:
import binascii
m = int(binascii.hexlify(b"I'm a secret"), 16)
print(m)  # decimal encoding of byte string


22640069159164324404987585908

In [11]:
c = pow(m, e, RSA_100) # raise to eth power
print(c)


11604682090980443295874366312042718286665199375418874910814662777671879383859486933312

In [12]:
m = pow(c, d, RSA_100) # raise to dth power
print(m)


22640069159164324404987585908

In [13]:
binascii.unhexlify(hex(m)[2:]) # m is back where we started.


Out[13]:
b"I'm a secret"

What makes RSA hard to crack is that although N is public, d is not, and N's factors p and q have been thrown away.

Because factoring N back into p, q is a super hard problem, if N is large enough, RSA remains a secure algorithm, and a favorite one, now that the patent has expired.

The idea is when you want to encrypt a message for Alice, look up her public key N. Only she has private key d, derived from her N at the time.

You may sign your message by raising it to your own secret key power, modulo your own public N, and she'll know for sure the message is from you, given she'll have your public key to decrypt it, once her own secret d has been applied.

For Further Reading

Linked Jupyter Notebooks: