Split Secrets

Chapter 5 of Real World Algorithms.

Panos Louridas
Athens University of Economics and Business

The RSA Cryptosystem

The security of the RSA cryptosystem is based on the difficulty of factoring.

That is, given a number, it is difficult to find its prime factors.

But to arrive at the RSA cryptosystem itself, we need first to see first some algorithms from Number Theory.

Euclid's algorithm allows us to find the Greatest Common Divisor (gcd) of two numbers.



In [11]:

    
def euclid(a, b):
    if b == 0:
        return a
    else:
        return euclid(b, a % b)
    
print(euclid(160, 144))

The algorithm is recursive.

To find the gcd of $a$ and $b$ is equivalent to finding the GCD of $a$ and $a \bmod b$, unless $b$ is zero, in which case the gcd is $a$.

It does not matter which of $a$, $b$ is bigger, the algorithm will provide the same answer.



In [12]:

    
print(euclid(144, 160))

To see why this happens, and to get a better understanding at the workings of the algorithm, we can include some tracing information in the code.



In [13]:

    
def trace_euclid(a, b):
    print('GCD of', a, b)
    if b == 0:
        return a
    else:
        return trace_euclid(b, a % b)



In [14]:

    
print(trace_euclid(160, 144))
print(trace_euclid(144, 160))









    



GCD of 160 144
GCD of 144 16
GCD of 16 0
16
GCD of 144 160
GCD of 160 144
GCD of 144 16
GCD of 16 0
16

So, what happens is that when $a < b$, $a$ and $b$ are swapped in the first recursive call.

That is because if $a < b$, then $a \bmod b = a$.



In [15]:

    
144 % 160 == 144









    Out[15]:





True

The next step is the Extended Euclid's Algorithm.

This algorithm gives us the GCD of $a$ and $b$ and two numbers $x$ and $y$ such that: $$ GCD(a, b) = xa + yb$$



In [16]:

    
def extended_euclid(a, b):
    if b == 0:
        return (a, 1, 0)
    else:
        (d, x, y) = extended_euclid(b, a % b)
        return (d, y, x - (a // b) * y)
    
print(extended_euclid(3, 352))









    



(1, -117, 1)

Indeed, we can verify that the GCD of 3 and 352 is 1 and that: $$ 1 = -117 \times 3 + 352 \times 1$$



In [17]:

    
(d, x, y) = extended_euclid(3, 352)
d == x * 3 + y * 352









    Out[17]:





True

Again, we can gain a better understanding by tracing the algorithm.

In the following, Going down means that we make recursive calls (so we go add to the recursion stack) and Going up means that we have returned from recursive calls (so we make our way up, unwinding the recursion stack).

In effect, they are the left and the right part of Figure 5.2 in the book.



In [18]:

    
def trace_extended_euclid(a, b):
    if b == 0:
        return (a, 1, 0)
    else:
        print('Going down', a, b, a % b)
        (d, x, y) = trace_extended_euclid(b, a % b)
        print('Going up', d, x, y)
        return (d, y, x - (a // b) * y)
    
print(trace_extended_euclid(3, 352))









    



Going down 3 352 3
Going down 352 3 1
Going down 3 1 0
Going up 1 1 0
Going up 1 0 1
Going up 1 1 -117
(1, -117, 1)

We need the Extended Euclid's Algorithm in order to find the modular inverse of a number $a$ modulo $n$.

The inverse of a number $a$, is the number that we denote by $a^{-1}$ such that $a a^{-1} = 1$.

If we move over to arithmetic modulo $n$, we obtain a similar definition for the modular inverse.

The modular inverse of a number $a$ modulo $n$, is the number we denote by $a^{-1} \bmod n$ such that $a a^{-1} \bmod n = 1$.

To find the modular inverse of a number $a$ modulo $n$, we run the Extended Euclid's Algorithm with input $a$ and $n$.

The algorithm will return a triplet $(r, x, y)$.

If $r = 1$ it means we have: $$ 1 = ax + nb$$

Therefore: $$ ax = 1 - bn$$

But that is equivalent to: $$ ax \bmod n = 1$$

So, $x$ is the modular inverse of $a$ modulo $n$.

If $r \ne 1$, then the modular inverse does not exist.

Note that the Extended Euclid's Algorithm may return $(r, x, y)$ with $x < 0$.

If this happens, the modular inverse is $x + n$.

With all this, the function for finding the modular inverse is:



In [19]:

    
def modular_inverse(a, n):
    (r, x, y) = extended_euclid(a, n) 
    if r != 1:
        return 0 
    elif x < 0:
        return x + n
    else:
        return x

For example, let's take $n = 31$ and $a = 6$.



In [20]:

    
modular_inverse(6, 31)









    Out[20]:





26

We can verify that it is correct:



In [21]:

    
(6 * 26) % 31 == 1









    Out[21]:





True

Now we can see the RSA cryptosystem in action.

Alice picks two primes, say $p = 17$ and $q = 23$.

She calculates $n = pq$.



In [22]:

    
p_alice = 17
q_alice = 23
n_alice = p_alice * q_alice
print(n_alice)

Alice then chooses a number $e$ that is relatively prime to $(p - 1)(q - 1)$.

Say she picks $e = 3$.

She then finds the modular inverse of $e$ modulo $(p - 1)(q - 1)$.



In [23]:

    
e_alice = 3
phi_alice = (p_alice - 1)*(q_alice - 1)
d_alice = modular_inverse(e_alice, phi_alice)
print(d_alice)

Alice's public key is $(e, n) = (3, 391)$.

Her private key is $(d, n) = (235, 391)$.

If Bob wants to encrypt, say, the message $M = 314$ and send it to Alice, he takes her public key and performs the following operation: $$ M^e \bmod n$$



In [24]:

    
m = 314
c = pow(m, e_alice, n_alice)
print(c)

The encrypted message is $C = 155$, which he sends to Alice.

To decrypt the message, Alice uses her private key and performs the following operation: $$C^d \bmod n$$



In [25]:

    
dec_c = pow(c, d_alice, n_alice)
print(dec_c)

So she did decrypt $C$ to the correct message.

Signing a message works in the same way, only the roles of the public and the private key are interchanged.

If Alice wants to sign, say, the message $M = 314$ and send it to Bob, she takes her public key and performs the following operation: $$M^d \bmod n$$



In [26]:

    
sig_c = pow(m, d_alice, n_alice)
print(sig_c)

The signature is $S = 274$, which she sends to Bob, along with $M = 314$.

Bob can then verify that the sender is indeed Alice by taking her public key and performing the following operation: $$S^e \bmod n$$



In [27]:

    
pow(sig_c, e_alice, n_alice) == m









    Out[27]:





True

As we saw, this allows Bob to verify that Alice was the sender, but Alice sends her message as plaintext to Bob.

If she wants to encrypt it, then she will encrypt the message and the signature with Bob's public key.

Bob will decrypt the message and the signagure and then will check the signature against the decrypted message.

Suppose that Bob has selected $p = 31$, $q = 43$.



In [28]:

    
p_bob = 31
q_bob = 43
n_bob = 31 * 43
print(n_bob)
phi_bob = (p_bob - 1) * (q_bob - 1)
print(phi_bob)

Bob then picks $e = 23$, which is relatively prime to 1260.

He finds the modular inverse of $e$ modulo $(p - 1)(q - 1)$.



In [29]:

    
e_bob = 23
d_bob = modular_inverse(e_bob, phi_bob)
print(d_bob)

So Alice will send her signature of $M = 314$, and send it along with $M$ encrypted with Bob's public key:



In [30]:

    
c = pow(m, e_bob, n_bob)
print(c)
enc_sig = pow(sig_c, e_bob, n_bob)
print(enc_sig)

Bob can now decrypt the message and the signature using his private key:



In [31]:

    
dec_c = pow(c, d_bob, n_bob)
print(dec_c)
dec_sig = pow(enc_sig, d_bob, n_bob)
print(dec_sig)

This is equal to what he will get by verifying the signature:



In [32]:

    
pow(dec_sig, e_alice, n_alice) == m









    Out[32]:





True

Up to this point we have been using small numbers, but in reality we use much bigger ones.

For instance, we want the sizes of our RSA keys to be 4096 bits.

Let's see how RSA works with such numbers.

In what follows we did not pulled big numbers out of thin air. In reality, we used the OpenSSL toolkit, in case you wonder how the example was built.

We'll use $e = 65537$.



In [33]:

    
e = 65537

We need two large primes $p$ and $q$, so here is $p$:



In [34]:

    
p = 0x00cfb0f468a68aee13806191c962ea\
565a4ab0d5547797d05a27aac67d6a\
ba1b279a1e7f62220b26190cefb271\
5d4bb81e28cffc290c2c05a58c9767\
184ceec95e3b333833829e346ec9ec\
f50ba2ba033f88d5b961008726f61b\
7600fd6b0d408c2d347ab8360aa14c\
eb9b4c851e61b76265bf53b510c647\
d185341571cb8fa76e1fce322906a8\
8a78fb4c588d821a08d395dc57e972\
2cc83c71cf3a138531f3e0f3918996\
eaf80c4879bf94755f0d2f470d5d6f\
bcbe1ec71fa2012341dcf8be36daf2\
3f3b00d30859a373efce255649602d\
f7664779bdc441602d3f0217240003\
a96801730aba5392796d0ea69116e8\
2e107d48b35d010c3c4a49973b3aa7\
d047

And here is $q$:



In [35]:

    
q = 0x00c7e596a9d7d1ee5f7e591933b3b9\
9f8e720b9f3e7b402987adb9480595\
14a5bd56d071bb9ef8637cd9c34932\
1d60ff2990bf2d4d1178d6456d7007\
ae527e3f8d7128a4f0d7fe04179b2a\
080c7c63f85207be1e21a8d322823a\
201860accf874effbc9a14bfeb266c\
83bf2957dc79f1459a34d7bda13b96\
caa239540b9337c018793d0645bf4c\
541e792bccfb43b849505296da74c9\
18ae9b47dbc544ac880baaf733df5d\
76fd0a2bfb9c9b2bfe24393a61acdb\
2cc134366ce669788ff8c85ac4806a\
b35bc8d1110a27769fabae1a474007\
33f0e63e078b15163a86b7f402d9e4\
644a594f2e42b095728d25d411eb0a\
72efe859f58e5bdd1f93d15b77fb90\
c9dd

Now we can calculate $(p - 1)(q - 1)$:



In [36]:

    
phi = (p - 1) * (q - 1)
print(hex(phi))









    



0xa22cd18375319222e3284c004ac9a225ecce5b00446aaff5f2df68ecb6ae135c90127c58b7d99cb1a66da9cbe67db3352f3068af2381bf6d2bcf0220a416af0b008bf87eec889a95e8f81036634f2d823eccab77f79f09c693a5f0f335566c5b10c90c0ebbeef5076a273db5a86873351a7b06df09118f3fe988732ebdd4642f6a25b0ad85b49f1b68b07e946e2005cbb6143110a05df959570828c45b7aa8d2bbc78c0bddf7d27a48aaee8f569b5434229ddd4711f8c67bc6c8f52dff31b1d8df252008c8049706bc1598a1cba14ea82a2a9dd19b6d937df77106676c1c039894dd9101189f97c709f5849293774f74c4b6f69cdfb8397c4a22518a7f1aaaa720076e198df7913b3da617b04efad404d366ecf7a3e3c9ddad036f36f5bcda11b395436f9edd42a9e192e994d6573e4ce1b9ed779f17320a3dd78f8e301e0e2651c58bf8351e28caa10a12872581617a74f7ec1ce9cfc038e6de4e71dff6f047dba1b113eed5a4b7e3ac3f56fc8d2f32360167f96c3c9bcb14ff87bc1b8735d3b9394807d3e853dbf6a8cd669f0600de09c15558cd99d8371fc016d24ac73de641714b1afa8e39e1f70b1e335cedf452477509af901580cdbd1ed1fa2c1b65105dbfd2c89184c5f835e9a73d0d73604da0c2b50268a56e073220f8b121f2827d321eec3a5c582dc6c8875efb7bdf47da380bb614bf430b44e8ffb217f41df228

And we can also calculate $n = pq$:



In [37]:

    
n = p * q
print(hex(n))









    



0xa22cd18375319222e3284c004ac9a225ecce5b00446aaff5f2df68ecb6ae135c90127c58b7d99cb1a66da9cbe67db3352f3068af2381bf6d2bcf0220a416af0b008bf87eec889a95e8f81036634f2d823eccab77f79f09c693a5f0f335566c5b10c90c0ebbeef5076a273db5a86873351a7b06df09118f3fe988732ebdd4642f6a25b0ad85b49f1b68b07e946e2005cbb6143110a05df959570828c45b7aa8d2bbc78c0bddf7d27a48aaee8f569b5434229ddd4711f8c67bc6c8f52dff31b1d8df252008c8049706bc1598a1cba14ea82a2a9dd19b6d937df77106676c1c039894dd9101189f97c709f5849293774f74c4b6f69cdfb8397c4a22518a7f1aaaa8b79df92c0c546dae3c60c2ad659ec9ed9023618a96bbc3bf82677db9f58b9af6a484348d5fe0cc3fc845e5385103f5949b4916edbcbc0df537defe54cf8b1711fe21691c8fba615106210f9f449f5d0c058bc39f932a09b13c7467cff7d3b8230892c5e0e4cb6c715306b533f768d7da35f5936c1e3e7a673c6cf1397a4e9d5a5244807699dd32736b20f2ef64d85301f23487b708df4f0ed96b162a7c813d722c36b40f5c8350566c674dd4ba1f5cd3b67f54990f687e5a45896dcc01dc7e0bb91cc55f5b28df5c00d436b6e0e3f0edd5ee0c30206abd5da888be6b2d195c653fd146fc955531eeb48193761ee13a7b387158bdaa9ff4a0c71aa4cb2a568c4b

The public key is $(e, n)$.

The private key is $(d, n)$ where $d$ is the multiplicative inverse modulo $(p - 1)(q - 1)$ of $e$.

That is:



In [40]:

    
d = modular_inverse(e, phi)
print(hex(d))









    



0x5c36e1d73eff35da52922a4d0c3984d2cdc934a37d43b0d4480ad2edae9e62f2021610d09d91c6709972c7d6e233dd7fc35a625c1bf37df6c4af4bc565a86455fc349ad3090a4fe427f94db6af576948230f5bfcb6379f6663b43ac30034291ecaf796bc960e3513c73f92ee455947110e02a09097e67d2ed94ab63c00c2d148c8b1afb9ab4a5e2246affcf9c778bbf2ee90a2a992967cdf590691afbd588cc06ef7f3611810ff847ae77f08d035387700ac0513915b84f902ba67f784a12c4065fa05321911cb1290463368e491cf58fdc907be15040496dee94cbbe81431d343ea8cb633db238190df2347f9442151033e0a127f7f906864fb08a498cb7ff7d1d587bc4e2654557c23825ac53e08eac98e3ff63c219d3a4853a89c76b4b0c6d9f768c6026e4ba6af564b2d4f87822c5339d6966b46d434509fcec80a023d077afdba2e237e07b1d84a12e8f7e18e9d03ffa92973a5a6e674183a40b41a04dcc73d34a461ece7618cbaf7b52d9bd61755f00cc01dd484bdc02de06f8cb3a05ec87692c7a8adf56920e33110462232c20e87f950aa6fed2ac30a05b003c731224959f35bded69a812b690dfe24bc456db7a60c20e0ebe936936ffff3a1bade07a5571ab2e353bd01fb1de5f3683f8fa5f460626e813477d0eda723e20cac899848b49e9c3cc0031f820cbf6c7e2e22fc295c3767aaada3adc65306606a75d0a9

Let's encrypt the message ATTACKATDAWN.

To do that, we need to convert the message to a number smaller than $n$.

One way to do that is to convert the string to a sequence of bytes and then interpret that sequence of bytes as an integer, using the int.from_bytes() function that we saw in the notebook for Chapter 4.



In [53]:

    
encoded_m = int.from_bytes('ATTACKATDAWN'.encode('utf-8'), byteorder='big')
print(m)









    



20218473289907025413077555022

The number is well below $n$, so we are fine.

We can encrypt it:



In [50]:

    
c = pow(encoded_m, e, n)
print(c)









    



407610036536766973941746712970056610431835270763945506960742542700949567151960874466077831719168680539470512240285486337550969993279696657735430597939419591374065091761356463235152143710255449503873561336415103006795697846296395947461876977977394431370772059394605673469827590676144590868929203525034699443219779269405168770159479959294157741569972986104837443912443448384645504245575692381456943218026944194684571058452049478974903319436215524101922325643097716873419410243227934020948460647053999998422913455094868056924225820448155709173128595626608895554934753751544860636088412735253276591446354450386193737480234349332997586568691670839008357864280250666956934406970115168608653775115126626245410948290822146137494594341860806577958750028245578858672356969329085479075907906476430846438024985350304613047425941365722484003570780153140446461960446467394379043325391336573770382418692001929699853863734921837392277439431493254812307721121364761570747515567992948509006262597846233976186363549135380390043355308274871299976822982419028054165067879724454988158437177860111295800172420220495633529577878379178147048919884717838512877763877003097684166522722064813438850586879945356215277724271919578412510711801553294801455380141500

And we can verify that it decrypts correctly:



In [54]:

    
dec_c = pow(c, d, n)
dec_c == encoded_m









    Out[54]:





True

Finally, we can recover the original string, by converting its number encoding to a series of bytes, and that to a string.

Note that the int.to_bytes() function requires the length of the byte array, which we calculate as we go.

In effect, we add 7 to the number of bits representing the integer encoding of the message, and then we perform an integer division by 8. In this way we get the rounded up multiple of 8, which is what we want.



In [59]:

    
decoded_m = dec_c.to_bytes((dec_c.bit_length() + 7) // 8, byteorder='big').decode('utf-8')
print(decoded_m)









    



ATTACKATDAWN

Note, though, that if our message cannot be encoded with less bits than those required for representing $n$, the modulo, we cannot encrypt it using RSA.

In that case, we use RSA only to exchange a secret encryption key that we will use to encrypt and decrypt our message using a symmetric encryption algorithm, such as AES, using a hybrid encryption scheme: a combination of symmetric and asymmetric cryptography.

Message Hashing

If we want to sign a large message, we usually sign a digital fingerprint of the message instead.

A fingerprint of a message $M$ is a short piece of identifying data that we get by applying a special fast function $h(M)$ to the message.

This function is called a hash function.

As a hash function maps a message of a number of bits to a digest of a smaller number of bits, we will always have collisions: different messages hashing to the same value.

However, a good hash function will make such collisions very unlikely in practice.

A good family of hash function is the Secure Hash Algorithm 2 (SHA-2) family of functions, defined as part of the Secure Hash Standard.

The functions are available in Python through the hashlib library.

As an example, we will use SHA-256, which produces 256 bit hashes.

We can verify that even very similar messages produce very different hashes.

For example, capital and smaller case latin letters differ only by one bit, bit 100000 = 32 as we can see below:



In [83]:

    
import string

for lc, uc in zip(string.ascii_lowercase, string.ascii_uppercase):
    bin_lc = bin(ord(lc))
    bin_uc = bin(ord(uc))
    diff = ord(lc) ^ ord(uc)
    print(lc, bin_lc, uc, bin_uc, bin(diff), diff)









    



a 0b1100001 A 0b1000001 0b100000 32
b 0b1100010 B 0b1000010 0b100000 32
c 0b1100011 C 0b1000011 0b100000 32
d 0b1100100 D 0b1000100 0b100000 32
e 0b1100101 E 0b1000101 0b100000 32
f 0b1100110 F 0b1000110 0b100000 32
g 0b1100111 G 0b1000111 0b100000 32
h 0b1101000 H 0b1001000 0b100000 32
i 0b1101001 I 0b1001001 0b100000 32
j 0b1101010 J 0b1001010 0b100000 32
k 0b1101011 K 0b1001011 0b100000 32
l 0b1101100 L 0b1001100 0b100000 32
m 0b1101101 M 0b1001101 0b100000 32
n 0b1101110 N 0b1001110 0b100000 32
o 0b1101111 O 0b1001111 0b100000 32
p 0b1110000 P 0b1010000 0b100000 32
q 0b1110001 Q 0b1010001 0b100000 32
r 0b1110010 R 0b1010010 0b100000 32
s 0b1110011 S 0b1010011 0b100000 32
t 0b1110100 T 0b1010100 0b100000 32
u 0b1110101 U 0b1010101 0b100000 32
v 0b1110110 V 0b1010110 0b100000 32
w 0b1110111 W 0b1010111 0b100000 32
x 0b1111000 X 0b1011000 0b100000 32
y 0b1111001 Y 0b1011001 0b100000 32
z 0b1111010 Z 0b1011010 0b100000 32

We can then see what happens if we get the SHA-256 hash value of i am going to be hashed vs. the hash value of I am going to be hashed.

The two messages differ by just one bit.

Yet they produce completely different SHA-256 hash values.



In [88]:

    
import hashlib

print(hashlib.sha256(b"i am going to be hashed").hexdigest())
print(hashlib.sha256(b"I am going to be hashed").hexdigest())









    



52ac2af460118f051bcc42886ab7f6cbb48ecdba8eeffb141f9c811c12a34353
1b0ca72ec711fcec092e0acfec74f2033dbe70531019892e2bf9be2a0d53c84d

Cryptography in Practice

To repeat how we ended the previous notebook: you should never, ever, use any of the code in this notepad for production use if you care a bit about security.

The code we have shown serves for demonstration purposes only.

However, the development of secure code is a full-blown art. There are many details that one has to take into account when writing code for a robust, secure system.

You should always use libraries that have a good security record, that are actively supported, and are well documented.

You should never use your own cryptographic code unless you are an expert programmer, you have shared your code with the programming community, and the consensus is that the implementation is sound.