Panos Louridas
Athens University of Economics and Business
The security of the RSA cryptosystem is based on the difficulty of factoring.
That is, given a number, it is difficult to find its prime factors.
But to arrive at the RSA cryptosystem itself, we need first to see first some algorithms from Number Theory.
Euclid's algorithm allows us to find the Greatest Common Divisor (gcd) of two numbers.
In [11]:
def euclid(a, b):
if b == 0:
return a
else:
return euclid(b, a % b)
print(euclid(160, 144))
The algorithm is recursive.
To find the gcd of $a$ and $b$ is equivalent to finding the GCD of $a$ and $a \bmod b$, unless $b$ is zero, in which case the gcd is $a$.
It does not matter which of $a$, $b$ is bigger, the algorithm will provide the same answer.
In [12]:
print(euclid(144, 160))
To see why this happens, and to get a better understanding at the workings of the algorithm, we can include some tracing information in the code.
In [13]:
def trace_euclid(a, b):
print('GCD of', a, b)
if b == 0:
return a
else:
return trace_euclid(b, a % b)
In [14]:
print(trace_euclid(160, 144))
print(trace_euclid(144, 160))
So, what happens is that when $a < b$, $a$ and $b$ are swapped in the first recursive call.
That is because if $a < b$, then $a \bmod b = a$.
In [15]:
144 % 160 == 144
Out[15]:
The next step is the Extended Euclid's Algorithm.
This algorithm gives us the GCD of $a$ and $b$ and two numbers $x$ and $y$ such that: $$ GCD(a, b) = xa + yb$$
In [16]:
def extended_euclid(a, b):
if b == 0:
return (a, 1, 0)
else:
(d, x, y) = extended_euclid(b, a % b)
return (d, y, x - (a // b) * y)
print(extended_euclid(3, 352))
Indeed, we can verify that the GCD of 3 and 352 is 1 and that: $$ 1 = -117 \times 3 + 352 \times 1$$
In [17]:
(d, x, y) = extended_euclid(3, 352)
d == x * 3 + y * 352
Out[17]:
Again, we can gain a better understanding by tracing the algorithm.
In the following, Going down
means that we make recursive calls (so we go add to the recursion stack) and Going up
means that we have returned from recursive calls (so we make our way up, unwinding the recursion stack).
In effect, they are the left and the right part of Figure 5.2 in the book.
In [18]:
def trace_extended_euclid(a, b):
if b == 0:
return (a, 1, 0)
else:
print('Going down', a, b, a % b)
(d, x, y) = trace_extended_euclid(b, a % b)
print('Going up', d, x, y)
return (d, y, x - (a // b) * y)
print(trace_extended_euclid(3, 352))
We need the Extended Euclid's Algorithm in order to find the modular inverse of a number $a$ modulo $n$.
The inverse of a number $a$, is the number that we denote by $a^{-1}$ such that $a a^{-1} = 1$.
If we move over to arithmetic modulo $n$, we obtain a similar definition for the modular inverse.
The modular inverse of a number $a$ modulo $n$, is the number we denote by $a^{-1} \bmod n$ such that $a a^{-1} \bmod n = 1$.
To find the modular inverse of a number $a$ modulo $n$, we run the Extended Euclid's Algorithm with input $a$ and $n$.
The algorithm will return a triplet $(r, x, y)$.
If $r = 1$ it means we have: $$ 1 = ax + nb$$
Therefore: $$ ax = 1 - bn$$
But that is equivalent to: $$ ax \bmod n = 1$$
So, $x$ is the modular inverse of $a$ modulo $n$.
If $r \ne 1$, then the modular inverse does not exist.
Note that the Extended Euclid's Algorithm may return $(r, x, y)$ with $x < 0$.
If this happens, the modular inverse is $x + n$.
With all this, the function for finding the modular inverse is:
In [19]:
def modular_inverse(a, n):
(r, x, y) = extended_euclid(a, n)
if r != 1:
return 0
elif x < 0:
return x + n
else:
return x
For example, let's take $n = 31$ and $a = 6$.
In [20]:
modular_inverse(6, 31)
Out[20]:
We can verify that it is correct:
In [21]:
(6 * 26) % 31 == 1
Out[21]:
Now we can see the RSA cryptosystem in action.
Alice picks two primes, say $p = 17$ and $q = 23$.
She calculates $n = pq$.
In [22]:
p_alice = 17
q_alice = 23
n_alice = p_alice * q_alice
print(n_alice)
Alice then chooses a number $e$ that is relatively prime to $(p - 1)(q - 1)$.
Say she picks $e = 3$.
She then finds the modular inverse of $e$ modulo $(p - 1)(q - 1)$.
In [23]:
e_alice = 3
phi_alice = (p_alice - 1)*(q_alice - 1)
d_alice = modular_inverse(e_alice, phi_alice)
print(d_alice)
Alice's public key is $(e, n) = (3, 391)$.
Her private key is $(d, n) = (235, 391)$.
If Bob wants to encrypt, say, the message $M = 314$ and send it to Alice, he takes her public key and performs the following operation: $$ M^e \bmod n$$
In [24]:
m = 314
c = pow(m, e_alice, n_alice)
print(c)
The encrypted message is $C = 155$, which he sends to Alice.
To decrypt the message, Alice uses her private key and performs the following operation: $$C^d \bmod n$$
In [25]:
dec_c = pow(c, d_alice, n_alice)
print(dec_c)
So she did decrypt $C$ to the correct message.
Signing a message works in the same way, only the roles of the public and the private key are interchanged.
If Alice wants to sign, say, the message $M = 314$ and send it to Bob, she takes her public key and performs the following operation: $$M^d \bmod n$$
In [26]:
sig_c = pow(m, d_alice, n_alice)
print(sig_c)
The signature is $S = 274$, which she sends to Bob, along with $M = 314$.
Bob can then verify that the sender is indeed Alice by taking her public key and performing the following operation: $$S^e \bmod n$$
In [27]:
pow(sig_c, e_alice, n_alice) == m
Out[27]:
As we saw, this allows Bob to verify that Alice was the sender, but Alice sends her message as plaintext to Bob.
If she wants to encrypt it, then she will encrypt the message and the signature with Bob's public key.
Bob will decrypt the message and the signagure and then will check the signature against the decrypted message.
Suppose that Bob has selected $p = 31$, $q = 43$.
In [28]:
p_bob = 31
q_bob = 43
n_bob = 31 * 43
print(n_bob)
phi_bob = (p_bob - 1) * (q_bob - 1)
print(phi_bob)
Bob then picks $e = 23$, which is relatively prime to 1260.
He finds the modular inverse of $e$ modulo $(p - 1)(q - 1)$.
In [29]:
e_bob = 23
d_bob = modular_inverse(e_bob, phi_bob)
print(d_bob)
So Alice will send her signature of $M = 314$, and send it along with $M$ encrypted with Bob's public key:
In [30]:
c = pow(m, e_bob, n_bob)
print(c)
enc_sig = pow(sig_c, e_bob, n_bob)
print(enc_sig)
Bob can now decrypt the message and the signature using his private key:
In [31]:
dec_c = pow(c, d_bob, n_bob)
print(dec_c)
dec_sig = pow(enc_sig, d_bob, n_bob)
print(dec_sig)
This is equal to what he will get by verifying the signature:
In [32]:
pow(dec_sig, e_alice, n_alice) == m
Out[32]:
Up to this point we have been using small numbers, but in reality we use much bigger ones.
For instance, we want the sizes of our RSA keys to be 4096 bits.
Let's see how RSA works with such numbers.
In what follows we did not pulled big numbers out of thin air. In reality, we used the OpenSSL toolkit, in case you wonder how the example was built.
We'll use $e = 65537$.
In [33]:
e = 65537
We need two large primes $p$ and $q$, so here is $p$:
In [34]:
p = 0x00cfb0f468a68aee13806191c962ea\
565a4ab0d5547797d05a27aac67d6a\
ba1b279a1e7f62220b26190cefb271\
5d4bb81e28cffc290c2c05a58c9767\
184ceec95e3b333833829e346ec9ec\
f50ba2ba033f88d5b961008726f61b\
7600fd6b0d408c2d347ab8360aa14c\
eb9b4c851e61b76265bf53b510c647\
d185341571cb8fa76e1fce322906a8\
8a78fb4c588d821a08d395dc57e972\
2cc83c71cf3a138531f3e0f3918996\
eaf80c4879bf94755f0d2f470d5d6f\
bcbe1ec71fa2012341dcf8be36daf2\
3f3b00d30859a373efce255649602d\
f7664779bdc441602d3f0217240003\
a96801730aba5392796d0ea69116e8\
2e107d48b35d010c3c4a49973b3aa7\
d047
And here is $q$:
In [35]:
q = 0x00c7e596a9d7d1ee5f7e591933b3b9\
9f8e720b9f3e7b402987adb9480595\
14a5bd56d071bb9ef8637cd9c34932\
1d60ff2990bf2d4d1178d6456d7007\
ae527e3f8d7128a4f0d7fe04179b2a\
080c7c63f85207be1e21a8d322823a\
201860accf874effbc9a14bfeb266c\
83bf2957dc79f1459a34d7bda13b96\
caa239540b9337c018793d0645bf4c\
541e792bccfb43b849505296da74c9\
18ae9b47dbc544ac880baaf733df5d\
76fd0a2bfb9c9b2bfe24393a61acdb\
2cc134366ce669788ff8c85ac4806a\
b35bc8d1110a27769fabae1a474007\
33f0e63e078b15163a86b7f402d9e4\
644a594f2e42b095728d25d411eb0a\
72efe859f58e5bdd1f93d15b77fb90\
c9dd
Now we can calculate $(p - 1)(q - 1)$:
In [36]:
phi = (p - 1) * (q - 1)
print(hex(phi))
And we can also calculate $n = pq$:
In [37]:
n = p * q
print(hex(n))
The public key is $(e, n)$.
The private key is $(d, n)$ where $d$ is the multiplicative inverse modulo $(p - 1)(q - 1)$ of $e$.
That is:
In [40]:
d = modular_inverse(e, phi)
print(hex(d))
Let's encrypt the message ATTACKATDAWN
.
To do that, we need to convert the message to a number smaller than $n$.
One way to do that is to convert the string to a sequence of bytes and then interpret that sequence of bytes as an integer, using the int.from_bytes()
function that we saw in the notebook for Chapter 4.
In [53]:
encoded_m = int.from_bytes('ATTACKATDAWN'.encode('utf-8'), byteorder='big')
print(m)
The number is well below $n$, so we are fine.
We can encrypt it:
In [50]:
c = pow(encoded_m, e, n)
print(c)
And we can verify that it decrypts correctly:
In [54]:
dec_c = pow(c, d, n)
dec_c == encoded_m
Out[54]:
Finally, we can recover the original string, by converting its number encoding to a series of bytes, and that to a string.
Note that the int.to_bytes()
function requires the length of the byte array, which we calculate as we go.
In effect, we add 7 to the number of bits representing the integer encoding of the message, and then we perform an integer division by 8. In this way we get the rounded up multiple of 8, which is what we want.
In [59]:
decoded_m = dec_c.to_bytes((dec_c.bit_length() + 7) // 8, byteorder='big').decode('utf-8')
print(decoded_m)
Note, though, that if our message cannot be encoded with less bits than those required for representing $n$, the modulo, we cannot encrypt it using RSA.
In that case, we use RSA only to exchange a secret encryption key that we will use to encrypt and decrypt our message using a symmetric encryption algorithm, such as AES, using a hybrid encryption scheme: a combination of symmetric and asymmetric cryptography.
If we want to sign a large message, we usually sign a digital fingerprint of the message instead.
A fingerprint of a message $M$ is a short piece of identifying data that we get by applying a special fast function $h(M)$ to the message.
This function is called a hash function.
As a hash function maps a message of a number of bits to a digest of a smaller number of bits, we will always have collisions: different messages hashing to the same value.
However, a good hash function will make such collisions very unlikely in practice.
A good family of hash function is the Secure Hash Algorithm 2 (SHA-2) family of functions, defined as part of the Secure Hash Standard.
The functions are available in Python through the hashlib
library.
As an example, we will use SHA-256, which produces 256 bit hashes.
We can verify that even very similar messages produce very different hashes.
For example, capital and smaller case latin letters differ only by one bit, bit 100000 = 32 as we can see below:
In [83]:
import string
for lc, uc in zip(string.ascii_lowercase, string.ascii_uppercase):
bin_lc = bin(ord(lc))
bin_uc = bin(ord(uc))
diff = ord(lc) ^ ord(uc)
print(lc, bin_lc, uc, bin_uc, bin(diff), diff)
We can then see what happens if we get the SHA-256 hash value of i am going to be hashed
vs. the hash value of I am going to be hashed
.
The two messages differ by just one bit.
Yet they produce completely different SHA-256 hash values.
In [88]:
import hashlib
print(hashlib.sha256(b"i am going to be hashed").hexdigest())
print(hashlib.sha256(b"I am going to be hashed").hexdigest())
To repeat how we ended the previous notebook: you should never, ever, use any of the code in this notepad for production use if you care a bit about security.
The code we have shown serves for demonstration purposes only.
However, the development of secure code is a full-blown art. There are many details that one has to take into account when writing code for a robust, secure system.
You should always use libraries that have a good security record, that are actively supported, and are well documented.
You should never use your own cryptographic code unless you are an expert programmer, you have shared your code with the programming community, and the consensus is that the implementation is sound.