Table of Contents

Kryptos Decoding From Scratch

For the latest version of this document see https://github.com/markvanheeswijk/kryptos

Introduction

This document describes how to get started with decoding Kryptos from scratch, gradually going all the way from ROT-13 to decoding Kryptos, discussing the relevant cryptanalysis principles and code along the way. Take your time to really understand each approach and its limitations. There will be some challenge ciphers to solve along the way.

First, we will look at the Caesar cipher and 4 possible ways to attack it:

  • solution #1: brute-force all possible keys
  • solution #2: frequency analysis showing the effect applying a Caesar cipher
  • solution #3: chi-square criterion for automatically determining the best key based on alignment of the frequency plots.
  • solution #4: maximum likelihood for automatically determining the best key based on how likely it is that the decrypted ciphertext is a plaintext in a certain language.

Then we will move on to the Vigenere cipher, which is very similar to the Caesar cipher, except that the consecutive shifts of the plaintext characters are determined by a keyword. If the key has length N, then each Nth character of the plaintext will be shifted by the same shift, determined by the corresponding character in the key. Therefore, we can re-use many of the same principles which we already saw for solving the Caesar cipher.

  • pre-requisite #1: using index of coincidence for determining the key length, say N
  • (pre-requisite #2: using index of coincidence for determining the language of the message and whether it might be a transposition cipher)
  • solution #0: manually cracking the Vigenere cipher
  • solution #1: chi-square criterion for automatically determining the best keys for each of the N Caesar ciphers
  • solution #2: maximum likelihood, combined with a search procedure (in this case a genetic algorithm) for determining the best key
  • solution #3: maximum likelihood, combined brute-force for determining the best keyword in a dictionary/wordlist

Finally, we will look at the Keyed Caesar and Keyed Vigenere variants, which add an extra complication in the form of a keyed alphabet:

  • Keyed Caesar cipher and variants
    • solution #1: maximum likelihood, combined with a search procedure to solve the substitution defined by the cipher
  • Keyed Vigenere cipher
    • solution #1: recognize special case to guess vigenere key, use maximum likelihood, combined with brute-force to find the alphabet key.
    • solution #2: know the alphabet key, like in Kryptos, use maximum likelihood, combined with a search procedure to find the vigenere key.

To conclude, we will use the principles discussed to solve Kryptos K1 and K2, using nothing but the 4 panels of Kryptos.

References

  • http://practicalcryptography.com/ (highly recommended for getting started with python code for cryptanalysis of various ciphers, and the website a lot of the code here is adapted from!)

(for more references, see below)

2 Easier Pieces: Caesar meets Vigenere

Caesar Cipher

The Caesar cipher is an example of a substution cipher, where each symbol in the plaintext is replaced by another symbol through a fixed mapping. In case of the Caesar cipher, the target alphabet is a shifted version of plaintext alphabet. A particular instance of this is ROT-13, where the target alphabet is shifted by 13 positions

ABCDEFGHIJKLMNOPQRSTUVWXYZ

becomes

NOPQRSTUVWXYZABCDEFGHIJKLM

and A gets replaced by N, B by O, etc.

Now, let's define a simple function for encrypting and decrypting using a Caesar cipher:


In [1]:
def rot(s, key, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ", direction=1):
    keyval = alphabet.find(key)
    t = ""
    for sc in s:
        i = alphabet.find(sc)
        t += alphabet[(i + keyval * direction) % len(alphabet)] if i > -1 else sc
    return t

It takes a letter as key, finds its position in the alphabet, and shifts / rotates the alphabet by that amount. Symbols that are not in the alphabet are unaffected. Now, define some text and encrypt it using the function:


In [2]:
ptext = '"I HEAR AND I FORGET. I SEE AND I REMEMBER. I DO AND I UNDERSTAND." --CONFUSIUS'
ctext = rot(ptext, 'N')
ctext


Out[2]:
'"V URNE NAQ V SBETRG. V FRR NAQ V ERZRZORE. V QB NAQ V HAQREFGNAQ." --PBASHFVHF'

Since we know the key we can decrypt it:


In [3]:
decrypted_ctext = rot(ctext, 'N', direction = -1)
decrypted_ctext


Out[3]:
'"I HEAR AND I FORGET. I SEE AND I REMEMBER. I DO AND I UNDERSTAND." --CONFUSIUS'

Caesar Cipher Solution #1: Brute-force

However, what if we do not know the key? One option would be to brute-fortry all possible keys, and just read off the solution. There are only 26 possibilities after all.


In [4]:
alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
for key in alphabet:
    decrypted_ctext = rot(ctext, key, direction = -1)
    print "%s:\t%s" % (key, decrypted_ctext)


A:	"V URNE NAQ V SBETRG. V FRR NAQ V ERZRZORE. V QB NAQ V HAQREFGNAQ." --PBASHFVHF
B:	"U TQMD MZP U RADSQF. U EQQ MZP U DQYQYNQD. U PA MZP U GZPQDEFMZP." --OAZRGEUGE
C:	"T SPLC LYO T QZCRPE. T DPP LYO T CPXPXMPC. T OZ LYO T FYOPCDELYO." --NZYQFDTFD
D:	"S ROKB KXN S PYBQOD. S COO KXN S BOWOWLOB. S NY KXN S EXNOBCDKXN." --MYXPECSEC
E:	"R QNJA JWM R OXAPNC. R BNN JWM R ANVNVKNA. R MX JWM R DWMNABCJWM." --LXWODBRDB
F:	"Q PMIZ IVL Q NWZOMB. Q AMM IVL Q ZMUMUJMZ. Q LW IVL Q CVLMZABIVL." --KWVNCAQCA
G:	"P OLHY HUK P MVYNLA. P ZLL HUK P YLTLTILY. P KV HUK P BUKLYZAHUK." --JVUMBZPBZ
H:	"O NKGX GTJ O LUXMKZ. O YKK GTJ O XKSKSHKX. O JU GTJ O ATJKXYZGTJ." --IUTLAYOAY
I:	"N MJFW FSI N KTWLJY. N XJJ FSI N WJRJRGJW. N IT FSI N ZSIJWXYFSI." --HTSKZXNZX
J:	"M LIEV ERH M JSVKIX. M WII ERH M VIQIQFIV. M HS ERH M YRHIVWXERH." --GSRJYWMYW
K:	"L KHDU DQG L IRUJHW. L VHH DQG L UHPHPEHU. L GR DQG L XQGHUVWDQG." --FRQIXVLXV
L:	"K JGCT CPF K HQTIGV. K UGG CPF K TGOGODGT. K FQ CPF K WPFGTUVCPF." --EQPHWUKWU
M:	"J IFBS BOE J GPSHFU. J TFF BOE J SFNFNCFS. J EP BOE J VOEFSTUBOE." --DPOGVTJVT
N:	"I HEAR AND I FORGET. I SEE AND I REMEMBER. I DO AND I UNDERSTAND." --CONFUSIUS
O:	"H GDZQ ZMC H ENQFDS. H RDD ZMC H QDLDLADQ. H CN ZMC H TMCDQRSZMC." --BNMETRHTR
P:	"G FCYP YLB G DMPECR. G QCC YLB G PCKCKZCP. G BM YLB G SLBCPQRYLB." --AMLDSQGSQ
Q:	"F EBXO XKA F CLODBQ. F PBB XKA F OBJBJYBO. F AL XKA F RKABOPQXKA." --ZLKCRPFRP
R:	"E DAWN WJZ E BKNCAP. E OAA WJZ E NAIAIXAN. E ZK WJZ E QJZANOPWJZ." --YKJBQOEQO
S:	"D CZVM VIY D AJMBZO. D NZZ VIY D MZHZHWZM. D YJ VIY D PIYZMNOVIY." --XJIAPNDPN
T:	"C BYUL UHX C ZILAYN. C MYY UHX C LYGYGVYL. C XI UHX C OHXYLMNUHX." --WIHZOMCOM
U:	"B AXTK TGW B YHKZXM. B LXX TGW B KXFXFUXK. B WH TGW B NGWXKLMTGW." --VHGYNLBNL
V:	"A ZWSJ SFV A XGJYWL. A KWW SFV A JWEWETWJ. A VG SFV A MFVWJKLSFV." --UGFXMKAMK
W:	"Z YVRI REU Z WFIXVK. Z JVV REU Z IVDVDSVI. Z UF REU Z LEUVIJKREU." --TFEWLJZLJ
X:	"Y XUQH QDT Y VEHWUJ. Y IUU QDT Y HUCUCRUH. Y TE QDT Y KDTUHIJQDT." --SEDVKIYKI
Y:	"X WTPG PCS X UDGVTI. X HTT PCS X GTBTBQTG. X SD PCS X JCSTGHIPCS." --RDCUJHXJH
Z:	"W VSOF OBR W TCFUSH. W GSS OBR W FSASAPSF. W RC OBR W IBRSFGHOBR." --QCBTIGWIG

Caesar Cipher Solution #2: Frequency Analysis

Alternatively, statistics of the text could be used to find the most likely key. Let's look at the frequency of the letters and see what happens when you encrypt / decrypt with Caesar cipher.


In [5]:
from __future__ import division
from matplotlib import pyplot as plt
import numpy as np
%matplotlib inline
plt.rcParams["figure.figsize"] = [12,4]

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# determine letter frequency in plaintext, and plot
ptext_freq = [ptext.count(c)/len(ptext) for c in alphabet]
pfig = plt.figure()
plt.bar(range(len(alphabet)), ptext_freq, tick_label = list(alphabet), align = 'center', color = 'b')
plt.title('plaintext letter frequency')

# determine letter frequency in ciphertext, and plot
ctext_freq = [ctext.count(c)/len(ctext) for c in alphabet]
cfig = plt.figure()
plt.bar(range(len(alphabet)), ctext_freq, tick_label = list(alphabet), align = 'center', color = 'r')
plt.title('ciphertext letter frequency')


Out[5]:
<matplotlib.text.Text at 0x7f3113200310>

From these plots, it can clearly be seen that the Caesar cipher shifts the frequencies of the letters (which is what we would expect, knowing how Caesar works). Finding the right key now corresponds to finding that shift, which aligns the frequency plots as well as possible. This can be done in two ways:

  • by hand
  • automatically, using e.g. some criterion to measure how well the plots match.

In the next sections, we'll look at ways to find the alignment automatically. For now, let's align and determine the key manually by aligning the frequency plots. Use the slider to align the frequency plots.

Note: normally you would not have the frequency plot of the plaintext of course, and you would instead use the frequencies based on some sufficiently large text in the same language.


In [6]:
# https://blog.dominodatalab.com/interactive-dashboards-in-jupyter/
# http://ipywidgets.readthedocs.io/en/latest/examples/Using%20Interact.html
from __future__ import division
from matplotlib import pyplot as plt
import numpy as np
from ipywidgets import interact, interactive, fixed
#import ipywidgets as widgets

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def rotate(l, n):
    return l[n:] + l[:n]

@interact(key_i = (0,25,1))
def plot_graphs(key_i = 0):
    # decrypt ciphertext using this key
    key = alphabet[key_i]
    decrypted_ctext = rot(ctext, key, direction = -1)
    
    # determine letter frequency in plaintext, and plot # FIXME: base on separate text
    ptext_freq = np.array([ptext.count(c)/len(ptext) for c in alphabet])
    pfig = plt.subplot(2,1,1)
    plt.bar(range(len(alphabet)), ptext_freq, tick_label = list(alphabet), align = 'center', color = 'b')
    plt.title('blue = plaintext letter frequency, green = decrypted ciphertext letter frequency')

    # determine letter frequency in ciphertext, and plot
    ctext_freq = np.array([decrypted_ctext.count(c)/len(decrypted_ctext) for c in alphabet])
    cfig = plt.subplot(2,1,2)
    plt.bar(range(len(alphabet)), ctext_freq, tick_label = rotate(list(alphabet), key_i), align = 'center', color = 'g')
    plt.xlabel("key = %s" % key)
    
    plt.show()
    print decrypted_ctext


"V URNE NAQ V SBETRG. V FRR NAQ V ERZRZORE. V QB NAQ V HAQREFGNAQ." --PBASHFVHF

CHALLENGE: aligning frequency distributions to determine the Caesar key


Use the slider above to align the frequency distributions and determine the right key.

Caesar Cipher Solution #3: Chi-squared Statistic

We just saw that finding the right key consists of aligning the letter-frequency distributions. If we can measure how well aligned the distributions are, it can be done automatically. This is where the Chi-squared Statistic comes in: it measures how different an observed distribution is from an expected distribution:

$$ \chi^2(O,E) = \sum_{i=A}^Z \frac{(O_i - E_i)^2}{E_i} $$

where $O_i$ and $E_i$ are the observed and expected numbers of character $i$ in a text. It can also be expressed in terms of frequencies/probabilities:

$$ \chi^2 = N \sum_{i=A}^Z \frac{(O_i/N - p_i)^2}{p_i} = N \sum_{i=A}^Z \frac{(f_i - p_i)^2}{p_i} $$

where $f_i$ and $p_i$ are the observed and expected relative frequencies of the characters, and $N$ is the length of the text.

For more details see:

Now, let's use it to find the key:


In [7]:
import numpy as np

# english letter frequencies, estimated from small sample text # FIXME: replace by better estimates
g_english = [0.0736, 0.0148, 0.0445, 0.0302, 0.102, 0.0227, 0.0122, 0.0277, 0.0855, 0.000557, 0.00237, 0.0342, 0.0206, 0.0717, 0.103, 0.0246, 0.00181, 0.0735, 0.0608, 0.0889, 0.0392, 0.0153, 0.0173, 0.000557, 0.032, 0.000278]

def chi_square(f, g, N):
    chi2 = 0
    chi2 = N * np.sum(np.array([(ff-gg)**2/gg for (ff,gg) in zip(f,g)]))
    return chi2

In [8]:
alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
results = []
for key in alphabet:
    decrypted_ctext = rot(ctext, key, direction = -1)
    f = np.array([decrypted_ctext.count(c) for c in alphabet])
    N = np.sum(f) # note: not counting the non-alphabet characters here
    f = f / N
    chi2_decrypted_ctext = chi_square(f, g_english, N)
    results.append((chi2_decrypted_ctext, key, decrypted_ctext))
    print "%s:\t%f\t%s" % (key, chi2_decrypted_ctext, decrypted_ctext)
print "\nthe best key and plaintext:\n %s" % repr(min(results))


A:	671.990916	"V URNE NAQ V SBETRG. V FRR NAQ V ERZRZORE. V QB NAQ V HAQREFGNAQ." --PBASHFVHF
B:	2943.592275	"U TQMD MZP U RADSQF. U EQQ MZP U DQYQYNQD. U PA MZP U GZPQDEFMZP." --OAZRGEUGE
C:	798.142246	"T SPLC LYO T QZCRPE. T DPP LYO T CPXPXMPC. T OZ LYO T FYOPCDELYO." --NZYQFDTFD
D:	1358.750922	"S ROKB KXN S PYBQOD. S COO KXN S BOWOWLOB. S NY KXN S EXNOBCDKXN." --MYXPECSEC
E:	1163.735004	"R QNJA JWM R OXAPNC. R BNN JWM R ANVNVKNA. R MX JWM R DWMNABCJWM." --LXWODBRDB
F:	2178.786534	"Q PMIZ IVL Q NWZOMB. Q AMM IVL Q ZMUMUJMZ. Q LW IVL Q CVLMZABIVL." --KWVNCAQCA
G:	1390.882770	"P OLHY HUK P MVYNLA. P ZLL HUK P YLTLTILY. P KV HUK P BUKLYZAHUK." --JVUMBZPBZ
H:	2661.988098	"O NKGX GTJ O LUXMKZ. O YKK GTJ O XKSKSHKX. O JU GTJ O ATJKXYZGTJ." --IUTLAYOAY
I:	3143.131572	"N MJFW FSI N KTWLJY. N XJJ FSI N WJRJRGJW. N IT FSI N ZSIJWXYFSI." --HTSKZXNZX
J:	386.845916	"M LIEV ERH M JSVKIX. M WII ERH M VIQIQFIV. M HS ERH M YRHIVWXERH." --GSRJYWMYW
K:	786.624788	"L KHDU DQG L IRUJHW. L VHH DQG L UHPHPEHU. L GR DQG L XQGHUVWDQG." --FRQIXVLXV
L:	609.812517	"K JGCT CPF K HQTIGV. K UGG CPF K TGOGODGT. K FQ CPF K WPFGTUVCPF." --EQPHWUKWU
M:	1615.214330	"J IFBS BOE J GPSHFU. J TFF BOE J SFNFNCFS. J EP BOE J VOEFSTUBOE." --DPOGVTJVT
N:	26.819408	"I HEAR AND I FORGET. I SEE AND I REMEMBER. I DO AND I UNDERSTAND." --CONFUSIUS
O:	1891.387765	"H GDZQ ZMC H ENQFDS. H RDD ZMC H QDLDLADQ. H CN ZMC H TMCDQRSZMC." --BNMETRHTR
P:	393.888673	"G FCYP YLB G DMPECR. G QCC YLB G PCKCKZCP. G BM YLB G SLBCPQRYLB." --AMLDSQGSQ
Q:	1372.403291	"F EBXO XKA F CLODBQ. F PBB XKA F OBJBJYBO. F AL XKA F RKABOPQXKA." --ZLKCRPFRP
R:	3601.924018	"E DAWN WJZ E BKNCAP. E OAA WJZ E NAIAIXAN. E ZK WJZ E QJZANOPWJZ." --YKJBQOEQO
S:	4419.479244	"D CZVM VIY D AJMBZO. D NZZ VIY D MZHZHWZM. D YJ VIY D PIYZMNOVIY." --XJIAPNDPN
T:	1457.857145	"C BYUL UHX C ZILAYN. C MYY UHX C LYGYGVYL. C XI UHX C OHXYLMNUHX." --WIHZOMCOM
U:	2384.901200	"B AXTK TGW B YHKZXM. B LXX TGW B KXFXFUXK. B WH TGW B NGWXKLMTGW." --VHGYNLBNL
V:	1215.392741	"A ZWSJ SFV A XGJYWL. A KWW SFV A JWEWETWJ. A VG SFV A MFVWJKLSFV." --UGFXMKAMK
W:	3726.089115	"Z YVRI REU Z WFIXVK. Z JVV REU Z IVDVDSVI. Z UF REU Z LEUVIJKREU." --TFEWLJZLJ
X:	521.313193	"Y XUQH QDT Y VEHWUJ. Y IUU QDT Y HUCUCRUH. Y TE QDT Y KDTUHIJQDT." --SEDVKIYKI
Y:	1895.618815	"X WTPG PCS X UDGVTI. X HTT PCS X GTBTBQTG. X SD PCS X JCSTGHIPCS." --RDCUJHXJH
Z:	130.677063	"W VSOF OBR W TCFUSH. W GSS OBR W FSASAPSF. W RC OBR W IBRSFGHOBR." --QCBTIGWIG

the best key and plaintext:
 (26.819407968565411, 'N', '"I HEAR AND I FORGET. I SEE AND I REMEMBER. I DO AND I UNDERSTAND." --CONFUSIUS')

Caesar Cipher Solution #4: Maximum likelihood

TL;DR: for a given decoding, estimate how much like English (or another language) it is. Best key is the one that gives the decoding that is most like English (or another language)...

The final way to solve is to, for each possible key, compute the likelihood that the decrypted text is plaintext in a particular language. This of course is unnecessary for something as simple as the Caesar cipher, but now think of a cipher where you would have thousands or millions of possible keys...

For now, let's focus on understanding how it works. Spoiler: in a next section, this approach will be used to automatically determine the key for Kryptos and decrypted the K1 and K2 parts of it.

In the following we will assume this to be English, but it could just as well be a different language with different statistics. This is something that you should decide or guess from context.

Computing likelihood based on observed n-grams

In order to measure how much like English a certain string is, we need to compare its properties with strings from the English language. The criterion that will be used here to measure this will be so-called n-gram statistics, which in short, measuring for each sequence of n symbols, what is the frequency / probability that it appears in an English text.

Then, the likelihood that the observed decrypted ciphertext is an English text, is the joint probability of the observing the ngrams in English. Assuming the observations are independent and identically distributed, the joint probability of the observed ngrams in the decrypted ciphertext as follows:

$$ \mathcal{L}(English \mid decrypt(ctext, key)) = p(decrypt(ctext, key)) \mid English) = \prod_i p(ngram_i \mid English) $$

where $ngram_i$ is the $i^{th}$ sequence of $n$ letters in the decrypted ciphertext. For example, in case $decrypt(ctext, key) = \texttt{ANEXAMPLETEXT}$ and using 3-grams, the observed 3-grams are

$$ \texttt{ANE, NEX, EXA, XAM, AMP, MPL, PLE, LET, ETE, TEX, EXT}, $$

and therefore the likelihood that it is an English text can be computed as

$$ \begin{align} \mathcal{L}(English \mid \texttt{ANEXAMPLETEXT}) &= p(\texttt{ANEXAMPLETEXT} \mid English) \\ &= p(\texttt{ANE} \mid English) \cdot p(\texttt{NEX} \mid English) \cdot \ldots \cdot p(\texttt{TEX} \mid English) \cdot p(\texttt{EXT} \mid English). \end{align} $$

Since the numbers involved typically get very small, people often work with the log-probabilities instead:

$$ \begin{align} \log \mathcal{L}(English \mid decrypt(ctext, key)) &= \log \prod_i p(ngram_i \mid English)\\ &= \sum_i \log p(ngram_i \mid English) \end{align} $$

This so-called log-likelihood that the decrypted ctext is English text (i.e. $\log \mathcal{L}(English \mid decrypt(ctext, key))$) can now be used as a criterion to determine the best key. However, we need to estimate the probabilities of certain ngrams appearing in the English language (i.e. all $\log p(ngram_i \mid English)$). For this, we analyze some reference text (the longer the better).

Computing n-gram statistics for a language

Let's define a function that takes in a text, and returns its ngrams.


In [9]:
import re

def find_ngrams(text, n):
    text = re.sub(r'[^A-Z]', r'', text.upper()) #remove everything that is not A-Z
    return zip(*[text[i:] for i in range(n)]) #determine ngrams

The above function takes a text and a number n and returns that text as ngrams:

  • text[i:] for i in range(3) would be
    • $\texttt{ANEXAMPLETEXT}$
    • $\texttt{NEXAMPLETEXT}$
    • $\texttt{EXAMPLETEXT}$
  • *[...] passes these as individual arguments into the zip() function
  • zip() zips the lists together and returns a list of tuples of the 1st entries in the lists, the 2nd entries in the list, etc until one of the lists runs out:
    • $\texttt{(A,N,E)}$
    • $\texttt{(N,E,X)}$
    • ...
    • $\texttt{(E,X,T)}$

Now, using this function, we can easily create a file of ngrams and their counts in some reference text. These statistics are later used to estimate the likelihood that another string is from the same language.


In [10]:
def compute_ngram_statistics(infile, outfile, n):
    
    # read in file, and remove newlines
    s = ''
    print "reading %s..." % infile
    with open(infile) as f:
         s += " ".join(line.strip() for line in f)
    
    print "finding and counting %d-grams..." % n
    # compute ngrams
    qgs = find_ngrams(s, n)
    # count them
    d = {}
    for qg in qgs:
        if d.has_key("".join(qg)):
            d["".join(qg)]+=1
        else:
            d["".join(qg)]=1

    # convert to sorted list and write to file
    print "writing results to %s..." % outfile
    qg_list = sorted(list(d.items()), key=lambda x: x[1], reverse=True)
    f = open(outfile, 'w')
    for t in qg_list:
        f.write("%s %d\n" % t)
        
    print "done!"

In [11]:
compute_ngram_statistics('ngrams/gutenberg_sherlock.txt', 'ngrams/en_sherlock_1grams', 1)
compute_ngram_statistics('ngrams/gutenberg_sherlock.txt', 'ngrams/en_sherlock_2grams', 2)
compute_ngram_statistics('ngrams/gutenberg_sherlock.txt', 'ngrams/en_sherlock_3grams', 3)
compute_ngram_statistics('ngrams/gutenberg_sherlock.txt', 'ngrams/en_sherlock_4grams', 4)


reading ngrams/gutenberg_sherlock.txt...
finding and counting 1-grams...
writing results to ngrams/en_sherlock_1grams...
done!
reading ngrams/gutenberg_sherlock.txt...
finding and counting 2-grams...
writing results to ngrams/en_sherlock_2grams...
done!
reading ngrams/gutenberg_sherlock.txt...
finding and counting 3-grams...
writing results to ngrams/en_sherlock_3grams...
done!
reading ngrams/gutenberg_sherlock.txt...
finding and counting 4-grams...
writing results to ngrams/en_sherlock_4grams...
done!

Cracking Caesar Cipher using Maximum Likelihood

Finally, let's crack Caesar using this. We use function ngram_score (source: practicalcryptography.com) for computing the log-likelihood that a text is a particular language, given a file with ngram counts for that language.


In [12]:
from ngram_score import ngram_score
fitness = ngram_score('ngrams/en_sherlock_1grams') # load our ngram statistics

alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
results = []
for key in alphabet:
    decrypted_ctext = rot(ctext, key, direction = -1)
    ll_decrypted_ctext = fitness.score(decrypted_ctext)
    results.append((ll_decrypted_ctext, key, decrypted_ctext))
    print "%s:\t%f\t%s" % (key, ll_decrypted_ctext, decrypted_ctext)
print "\nthe most likely key and plaintext:\n %s" % repr(max(results))


A:	-261.218311	"V URNE NAQ V SBETRG. V FRR NAQ V ERZRZORE. V QB NAQ V HAQREFGNAQ." --PBASHFVHF
B:	-275.005765	"U TQMD MZP U RADSQF. U EQQ MZP U DQYQYNQD. U PA MZP U GZPQDEFMZP." --OAZRGEUGE
C:	-261.293064	"T SPLC LYO T QZCRPE. T DPP LYO T CPXPXMPC. T OZ LYO T FYOPCDELYO." --NZYQFDTFD
D:	-260.411338	"S ROKB KXN S PYBQOD. S COO KXN S BOWOWLOB. S NY KXN S EXNOBCDKXN." --MYXPECSEC
E:	-263.157676	"R QNJA JWM R OXAPNC. R BNN JWM R ANVNVKNA. R MX JWM R DWMNABCJWM." --LXWODBRDB
F:	-276.266855	"Q PMIZ IVL Q NWZOMB. Q AMM IVL Q ZMUMUJMZ. Q LW IVL Q CVLMZABIVL." --KWVNCAQCA
G:	-266.939602	"P OLHY HUK P MVYNLA. P ZLL HUK P YLTLTILY. P KV HUK P BUKLYZAHUK." --JVUMBZPBZ
H:	-271.285500	"O NKGX GTJ O LUXMKZ. O YKK GTJ O XKSKSHKX. O JU GTJ O ATJKXYZGTJ." --IUTLAYOAY
I:	-271.299750	"N MJFW FSI N KTWLJY. N XJJ FSI N WJRJRGJW. N IT FSI N ZSIJWXYFSI." --HTSKZXNZX
J:	-257.016117	"M LIEV ERH M JSVKIX. M WII ERH M VIQIQFIV. M HS ERH M YRHIVWXERH." --GSRJYWMYW
K:	-267.452924	"L KHDU DQG L IRUJHW. L VHH DQG L UHPHPEHU. L GR DQG L XQGHUVWDQG." --FRQIXVLXV
L:	-265.559114	"K JGCT CPF K HQTIGV. K UGG CPF K TGOGODGT. K FQ CPF K WPFGTUVCPF." --EQPHWUKWU
M:	-259.083646	"J IFBS BOE J GPSHFU. J TFF BOE J SFNFNCFS. J EP BOE J VOEFSTUBOE." --DPOGVTJVT
N:	-238.235661	"I HEAR AND I FORGET. I SEE AND I REMEMBER. I DO AND I UNDERSTAND." --CONFUSIUS
O:	-263.915876	"H GDZQ ZMC H ENQFDS. H RDD ZMC H QDLDLADQ. H CN ZMC H TMCDQRSZMC." --BNMETRHTR
P:	-266.795598	"G FCYP YLB G DMPECR. G QCC YLB G PCKCKZCP. G BM YLB G SLBCPQRYLB." --AMLDSQGSQ
Q:	-271.291470	"F EBXO XKA F CLODBQ. F PBB XKA F OBJBJYBO. F AL XKA F RKABOPQXKA." --ZLKCRPFRP
R:	-272.319535	"E DAWN WJZ E BKNCAP. E OAA WJZ E NAIAIXAN. E ZK WJZ E QJZANOPWJZ." --YKJBQOEQO
S:	-273.790470	"D CZVM VIY D AJMBZO. D NZZ VIY D MZHZHWZM. D YJ VIY D PIYZMNOVIY." --XJIAPNDPN
T:	-264.146556	"C BYUL UHX C ZILAYN. C MYY UHX C LYGYGVYL. C XI UHX C OHXYLMNUHX." --WIHZOMCOM
U:	-270.341111	"B AXTK TGW B YHKZXM. B LXX TGW B KXFXFUXK. B WH TGW B NGWXKLMTGW." --VHGYNLBNL
V:	-267.596702	"A ZWSJ SFV A XGJYWL. A KWW SFV A JWEWETWJ. A VG SFV A MFVWJKLSFV." --UGFXMKAMK
W:	-272.823051	"Z YVRI REU Z WFIXVK. Z JVV REU Z IVDVDSVI. Z UF REU Z LEUVIJKREU." --TFEWLJZLJ
X:	-261.034143	"Y XUQH QDT Y VEHWUJ. Y IUU QDT Y HUCUCRUH. Y TE QDT Y KDTUHIJQDT." --SEDVKIYKI
Y:	-265.728144	"X WTPG PCS X UDGVTI. X HTT PCS X GTBTBQTG. X SD PCS X JCSTGHIPCS." --RDCUJHXJH
Z:	-251.551499	"W VSOF OBR W TCFUSH. W GSS OBR W FSASAPSF. W RC OBR W IBRSFGHOBR." --QCBTIGWIG

the most likely key and plaintext:
 (-238.2356611472394, 'N', '"I HEAR AND I FORGET. I SEE AND I REMEMBER. I DO AND I UNDERSTAND." --CONFUSIUS')

Perhaps a bit underwhelming to use this for cracking the Caesar cipher. However, now we have a useful tool in hand that we can use for other more complicated ciphers as well, like the Vigenere cipher which we will now look at.


Vigenere

Now we will move on to the Vigenere cipher, which is very similar to the Caesar cipher, except that the consecutive shifts of the plaintext characters are determined by a keyword. If the key has length $N$, then each $N^{th}$ character of the plaintext will be shifted by the same shift, determined by the corresponding character in the key. This relation of Vigenere being N interleaved Caesar ciphers is also visible from the code:


In [13]:
def vigenere(s, key, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ", direction=1):
    t = []
    key_i = 0
    for i in range(len(s)):
        if s[i] in alphabet: # only process symbols from the specified alphabet
            t.append(rot(s[i],key[key_i], alphabet, direction))
            key_i = (key_i + 1) % len(key)
        else:
            t.append(s[i])
    return "".join(t)

With Caesar we had:


In [14]:
ptext = '"I HEAR AND I FORGET. I SEE AND I REMEMBER. I DO AND I UNDERSTAND." --CONFUSIUS'
ctext = rot(ptext, 'N')
ctext


Out[14]:
'"V URNE NAQ V SBETRG. V FRR NAQ V ERZRZORE. V QB NAQ V HAQREFGNAQ." --PBASHFVHF'

which can be seen as:

"I HEAR AND I FORGET. I SEE AND I REMEMBER. I DO AND I UNDERSTAND." --CONFUSIUS
 N NNNN NNN N NNNNNN  N NNN NNN N NNNNNNNN  N NN NNN N NNNNNNNNNN     NNNNNNNNN
 ------------------------------------------------------------------------------
"V URNE NAQ V SBETRG. V FRR NAQ V ERZRZORE. V QB NAQ V HAQREFGNAQ." --PBASHFVHF

Whereas, for Vigenere we have:


In [15]:
ptext = '"I HEAR AND I FORGET. I SEE AND I REMEMBER. I DO AND I UNDERSTAND." --CONFUSIUS'
ctext = vigenere(ptext, 'NO')
ctext


Out[15]:
'"V VROE OAR V TBFTSG. W FSR OAR V FRARAOSE. W QC NBQ W HBQSEGGOAR." --PCATHGVIF'

which can be seen as:

"I HEAR AND I FORGET. I SEE AND I REMEMBER. I DO AND I UNDERSTAND." --CONFUSIUS
 N ONON ONO N ONONON  O NON ONO N ONONONON  O NO NON O NONONONONO     NONONONON
 ------------------------------------------------------------------------------
"V VROE OAR V TBFTSG. W FSR OAR V FRARAOSE. W QC NBQ W HBQSEGGOAR." --PCATHGVIF

Vigenere Solution #0: Manually Cracking a Vigenere Cipher

Since Vigenere is so similar to the Caesar cipher, we can re-use many of the same principles which we already saw for solving the Caesar cipher. However, before we can attack the Vigenere cipher in the same way as the Caesar cipher, we need to guess the length of the key. Again, for this we can use the frequency distributions, however, now we look at the frequency distribution of every $N^{th}$, since each $N^{th}$ character is shifted with the same key. In summary, we are going to try different key lengths and see for which key length, the N different frequency distributions look most like shifted English distributions. The technical way to measure this is through the Index of Coincidence (IC), which characterizes piece of text by the likelihood of drawing a pair of identical letters at random from that text.

CHALLENGE: Manually Cracking a Vigenere Cipher


Before moving on to the IC, an alternative more visual approach for determining key length is used on http://www.simonsingh.net/The_Black_Chamber/vigenere_cracking_tool.html, which finds common letter combinations and how far they are apart in the text. Since these **distances between common letter combinations are likely a multiple of the key length**, investigating the common factors of these distances will give an indication of the likely key length.

Use the vigenere cracking tool at the above site to crack the following vigenere ciphertext:
YFMRFDYCYRBEEXEBKRTTKMBRYJKEQNEHXVLXHDMRIRBEEGLNWVKXDUGRSVZIVCUNWVWVLSEAJIMXDNLRYTWRVUSGXFNXKQAYUYIFHFWENKBIQAUGYNMRWKSVCKQQHEIAIZNJHDEAYIWAVQAPMRTTKMBRYJPMIFEQHPKPLOAYQPBSWTEYJWBGRYPNWVLXRFHRUIMZLAUFFCXLDNEGHFZVHEPBSUQRJFOGMVBAHZTLXZFTRESVGCMEHEAEHZXLHDSGIZNJHDEAYGWMQFSVSKPIHZCEDGBMRZPETTMWVFHRHZXLHDUFJJIHLRFRWVVXDXPUFSMXIDOZTEMSIFHRWFEWKQAYUYIFHFUFJUIXHMCUUFQRWPECJELWRZAEJGMEWUNTPVGARDD
Note how similar this approach is to the principles used to crack the Caesar cipher.

Index of Coincidence (IC)

As mentioned, the technical way to measure how much a text is like a possibly shifted English letter frequency distribution, is through the Index of Coincidence (IC), which characterizes piece of text by the likelihood of drawing (without replacement) a pair of identical letters at random from that text:

$$ IC = \sum_{i=1}^c \big( \frac{n_i}{N} \times \frac{n_i-1}{N-1} \big) $$

where $n_i$ is the count of the $i^{th}$ character in the alphabet, $c$ is the number of characters in the alphabet and $N$ is the length of the text.

Each language has its own typical IC, and for N large enough you would get:

$$ IC = \sum_{i=1}^c f_i^2 $$

where $f_i$ is the relative frequency of the $i^{th}$ character in the alphabet of that language. As you can see from the formula, a nice property of the IC is it is not affected by a substitution cipher. Therefore, it can also be used to determine the language of the plaintext if you are only given the ciphertext of a substitution cipher, or whether you are even dealing with a substitution cipher.

For more details, see:

Determining Vigenere Key Length using IC

The IC can be used to determine the most likely key length, using the fact that it is higher for plaintext (or plaintext under a particular substitution) than for random text. To illustrate how this can be used for determining the key length, consider this as a representation of a ciphertext encrypted with a key of length 5:

ABCDEABCDEABCDEABCDEABCDEABCDEABCDE

where A represents a ciphertext character encrypted with the first character of the key, B the second, etc.

Not knowing the key length, we could try a key length of 3 or 4 and obtain:

supposed key length=3
ABC
DEA
BCD
EAB
CDE
ABC
DEA
BCD
EAB
CDE
ABC
DE

and

supposed key length=4
ABCD
EABC
DEAB
CDEA
BCDE
ABCD
EABC
DEAB
CDE

It can be seen that in each column the characters are encrypted using different letters of the key. Therefore the ciphertext characters in that column will be quite random and the IC will be low. However, when trying key length 5, we get

supposed key length=5
ABCDE
ABCDE
ABCDE
ABCDE
ABCDE
ABCDE
ABCDE

Each column will now correspond to a particular substitution / Caesar cipher corresponding to that key character, and the IC will be higher. Looking at average IC of all columns, will indicate the key length.

For more details, see:

Now, suppose we have the following ciphertext:

VYCHVUYPESJMZCJZTXNOOEFSXMBQJTTNQAXWKBWTPDDKTUODCOPGJFNTMOPRLACBZGTWEAEEEOKWAKCXCHVOATLGFMVYZRWBNGHPQUCDRKSGXSGWZRZWMURLSTLCHLZWBAECCIMEESTLLPWVNCCMYLELPKAAPTCZKYCTZQOIKGICRMEQTSPWMGHKFTYOHRDCUBAQEQWTVRBPCFKGPWGGFEQTZFSDWDVCCLOYVPBFWGPVFPLYRTMCWVSDCCIHSBLGCLPWTVKPBNVNRLAVWVPQTOEACSYJIUVVPBXSFARC

Let's try different key lengths and determine the average IC for each of them.


In [16]:
from __future__ import division
import numpy as np

# computes the IC for a given string
def IC(s, alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
    s_filtered = filter(s, alphabet) # we only care about the letters in our alphabet
    ic = 0
    for c in alphabet:
        c_count = s_filtered.count(c)
        N = len(s_filtered)
        if c_count > 1:
            ic += (c_count * (c_count - 1)) / (N * (N - 1) / len(alphabet))
    return ic

# helper function to filter out non-alphabet characters
def filter(s, alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
    return "".join([c for c in s if c in alphabet])

# computes the avg IC of subsequences of each n'th character
def mean_IC(s, n, alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
    s_filtered = filter(s, alphabet) # we only care about the letters in our alphabet
    s_filtered_subseq = [s_filtered[i::n] for i in range(n)]
    ic = []
    for i in range(n):
        ic.append(IC(s_filtered_subseq[i]))
    return np.mean(ic)

In [17]:
ctext = "VYCHVUYPESJMZCJZTXNOOEFSXMBQJTTNQAXWKBWTPDDKTUODCOPGJFNTMOPRLACBZGTWEAEEEOKWAKCXCHVOATLGFMVYZRWBNGHPQUCDRKSGXSGWZRZWMURLSTLCHLZWBAECCIMEESTLLPWVNCCMYLELPKAAPTCZKYCTZQOIKGICRMEQTSPWMGHKFTYOHRDCUBAQEQWTVRBPCFKGPWGGFEQTZFSDWDVCCLOYVPBFWGPVFPLYRTMCWVSDCCIHSBLGCLPWTVKPBNVNRLAVWVPQTOEACSYJIUVVPBXSFARC"

In [18]:
from matplotlib import pyplot as plt
mean_ic_n = []
for n in range(1,32):
    mean_ic = mean_IC(ctext.upper(), n)
    mean_ic_n.append(mean_ic)
    print "%2d %02f" % (n, mean_ic)
plt.bar(range(1,len(mean_ic_n)+1), mean_ic_n, align = 'center')
plt.xlabel("key length")
plt.ylabel("average IC")


 1 1.083234
 2 1.086321
 3 1.098669
 4 1.049241
 5 1.086600
 6 1.102494
 7 1.089516
 8 1.068694
 9 0.965257
10 1.046831
11 1.029495
12 1.071087
13 1.818558
14 1.112678
15 1.079532
16 1.094513
17 1.195790
18 0.895697
19 1.145238
20 1.056190
21 1.186395
22 0.883117
23 0.992095
24 1.027778
25 1.049455
26 1.778788
27 0.826786
28 1.148052
29 1.003413
30 1.107407
31 1.337276
Out[18]:
<matplotlib.text.Text at 0x7f31127ba950>

Given the peaks at a key length of 13 and 26, it can be concluded that the key is likely 13 characters long.

Vigenere Solution #1: Chi-square Criterion for Automatically Determining the Best Key

Now we know that the key is likely 13 characters long, we can determine the most likely shifts for each of the 13 Caesar ciphers using the Chi-square criterion.


In [19]:
key_length = 13
for i in range(key_length):
    ctext_i = ctext[i::key_length]

    alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    chi2_k = []
    for key in alphabet:
        decrypted_ctext_i_k = rot(ctext_i, key, direction = -1)
        f = np.array([decrypted_ctext_i_k.count(c) for c in alphabet])
        N = np.sum(f) # note: not counting the non-alphabet characters here
        f = f / N
        chi2_decrypted_ctext_i_k = chi_square(f, g_english, N)
        chi2_k.append((chi2_decrypted_ctext_i_k, key, decrypted_ctext_i_k))
        print "%s:\t%f\t%s" % (key, chi2_decrypted_ctext_i_k, decrypted_ctext_i_k)
    print "the best key and plaintext:\n %s\n" % repr(min(chi2_k))


A:	174.321536	VCBTJGCYRUEVPGHQPDPDTVV
B:	135.161435	UBASIFBXQTDUOFGPOCOCSUU
C:	174.741152	TAZRHEAWPSCTNEFONBNBRTT
D:	749.974994	SZYQGDZVORBSMDENMAMAQSS
E:	749.753363	RYXPFCYUNQARLCDMLZLZPRR
F:	1023.499890	QXWOEBXTMPZQKBCLKYKYOQQ
G:	1064.201422	PWVNDAWSLOYPJABKJXJXNPP
H:	822.112075	OVUMCZVRKNXOIZAJIWIWMOO
I:	294.183987	NUTLBYUQJMWNHYZIHVHVLNN
J:	445.469333	MTSKAXTPILVMGXYHGUGUKMM
K:	601.732984	LSRJZWSOHKULFWXGFTFTJLL
L:	405.510551	KRQIYVRNGJTKEVWFESESIKK
M:	1436.389441	JQPHXUQMFISJDUVEDRDRHJJ
N:	122.635165	IPOGWTPLEHRICTUDCQCQGII
O:	100.114310	HONFVSOKDGQHBSTCBPBPFHH
P:	137.109864	GNMEURNJCFPGARSBAOAOEGG
Q:	1535.160590	FMLDTQMIBEOFZQRAZNZNDFF
R:	224.235327	ELKCSPLHADNEYPQZYMYMCEE
S:	1039.282345	DKJBROKGZCMDXOPYXLXLBDD
T:	516.142405	CJIAQNJFYBLCWNOXWKWKACC
U:	1101.747422	BIHZPMIEXAKBVMNWVJVJZBB
V:	262.609220	AHGYOLHDWZJAULMVUIUIYAA
W:	2900.479089	ZGFXNKGCVYIZTKLUTHTHXZZ
X:	454.418672	YFEWMJFBUXHYSJKTSGSGWYY
Y:	1342.655748	XEDVLIEATWGXRIJSRFRFVXX
Z:	433.868297	WDCUKHDZSVFWQHIRQEQEUWW
the best key and plaintext:
 (100.11430974338249, 'O', 'HONFVSOKDGQHBSTCBPBPFHH')

A:	453.233158	YJQPFTXZKRCNTIKEWVVCVPV
B:	439.889463	XIPOESWYJQBMSHJDVUUBUOU
C:	89.543826	WHONDRVXIPALRGICUTTATNT
D:	760.551356	VGNMCQUWHOZKQFHBTSSZSMS
E:	110.106465	UFMLBPTVGNYJPEGASRRYRLR
F:	921.946378	TELKAOSUFMXIODFZRQQXQKQ
G:	538.865699	SDKJZNRTELWHNCEYQPPWPJP
H:	221.746522	RCJIYMQSDKVGMBDXPOOVOIO
I:	197.791216	QBIHXLPRCJUFLACWONNUNHN
J:	303.773651	PAHGWKOQBITEKZBVNMMTMGM
K:	495.639937	OZGFVJNPAHSDJYAUMLLSLFL
L:	994.094328	NYFEUIMOZGRCIXZTLKKRKEK
M:	1448.868400	MXEDTHLNYFQBHWYSKJJQJDJ
N:	431.501269	LWDCSGKMXEPAGVXRJIIPICI
O:	318.824218	KVCBRFJLWDOZFUWQIHHOHBH
P:	183.019654	JUBAQEIKVCNYETVPHGGNGAG
Q:	820.546579	ITAZPDHJUBMXDSUOGFFMFZF
R:	169.594542	HSZYOCGITALWCRTNFEELEYE
S:	595.657600	GRYXNBFHSZKVBQSMEDDKDXD
T:	436.684515	FQXWMAEGRYJUAPRLDCCJCWC
U:	866.697138	EPWVLZDFQXITZOQKCBBIBVB
V:	118.950672	DOVUKYCEPWHSYNPJBAAHAUA
W:	2901.245597	CNUTJXBDOVGRXMOIAZZGZTZ
X:	213.717794	BMTSIWACNUFQWLNHZYYFYSY
Y:	1438.809512	ALSRHVZBMTEPVKMGYXXEXRX
Z:	468.588210	ZKRQGUYALSDOUJLFXWWDWQW
the best key and plaintext:
 (89.543826185504713, 'C', 'WHONDRVXIPALRGICUTTATNT')

A:	408.541711	CZJDNWCRSLCCCCFQGCFCKQP
B:	304.935705	BYICMVBQRKBBBBEPFBEBJPO
C:	212.584744	AXHBLUAPQJAAAADOEADAION
D:	10026.034558	ZWGAKTZOPIZZZZCNDZCZHNM
E:	332.483790	YVFZJSYNOHYYYYBMCYBYGML
F:	5013.625246	XUEYIRXMNGXXXXALBXAXFLK
G:	1026.758569	WTDXHQWLMFWWWWZKAWZWEKJ
H:	664.457528	VSCWGPVKLEVVVVYJZVYVDJI
I:	472.704359	URBVFOUJKDUUUUXIYUXUCIH
J:	215.485289	TQAUENTIJCTTTTWHXTWTBHG
K:	220.624176	SPZTDMSHIBSSSSVGWSVSAGF
L:	197.086591	ROYSCLRGHARRRRUFVRURZFE
M:	1784.333331	QNXRBKQFGZQQQQTEUQTQYED
N:	289.248260	PMWQAJPEFYPPPPSDTPSPXDC
O:	259.162427	OLVPZIODEXOOOORCSOROWCB
P:	155.400269	NKUOYHNCDWNNNNQBRNQNVBA
Q:	470.566199	MJTNXGMBCVMMMMPAQMPMUAZ
R:	701.650370	LISMWFLABULLLLOZPLOLTZY
S:	1402.325891	KHRLVEKZATKKKKNYOKNKSYX
T:	5504.038479	JGQKUDJYZSJJJJMXNJMJRXW
U:	216.868525	IFPJTCIXYRIIIILWMILIQWV
V:	275.949952	HEOISBHWXQHHHHKVLHKHPVU
W:	552.924694	GDNHRAGVWPGGGGJUKGJGOUT
X:	374.390860	FCMGQZFUVOFFFFITJFIFNTS
Y:	28.082109	EBLFPYETUNEEEEHSIEHEMSR
Z:	213.776851	DAKEOXDSTMDDDDGRHDGDLRQ
the best key and plaintext:
 (28.08210935339384, 'Y', 'EBLFPYETUNEEEEHSIEHEMSR')

A:	195.328968	HTTDTEHWGSICZRTWGCPIPTB
B:	76.854667	GSSCSDGVFRHBYQSVFBOHOSA
C:	288.790065	FRRBRCFUEQGAXPRUEANGNRZ
D:	1238.225649	EQQAQBETDPFZWOQTDZMFMQY
E:	284.850055	DPPZPADSCOEYVNPSCYLELPX
F:	561.041290	COOYOZCRBNDXUMORBXKDKOW
G:	514.677126	BNNXNYBQAMCWTLNQAWJCJNV
H:	791.944733	AMMWMXAPZLBVSKMPZVIBIMU
I:	757.405810	ZLLVLWZOYKAURJLOYUHAHLT
J:	1504.771990	YKKUKVYNXJZTQIKNXTGZGKS
K:	2281.163715	XJJTJUXMWIYSPHJMWSFYFJR
L:	363.298521	WIISITWLVHXROGILVREXEIQ
M:	226.477437	VHHRHSVKUGWQNFHKUQDWDHP
N:	436.548016	UGGQGRUJTFVPMEGJTPCVCGO
O:	79.139446	TFFPFQTISEUOLDFISOBUBFN
P:	30.965407	SEEOEPSHRDTNKCEHRNATAEM
Q:	846.856643	RDDNDORGQCSMJBDGQMZSZDL
R:	150.223946	QCCMCNQFPBRLIACFPLYRYCK
S:	782.597247	PBBLBMPEOAQKHZBEOKXQXBJ
T:	512.436507	OAAKALODNZPJGYADNJWPWAI
U:	4096.082864	NZZJZKNCMYOIFXZCMIVOVZH
V:	212.511057	MYYIYJMBLXNHEWYBLHUNUYG
W:	2044.716073	LXXHXILAKWMGDVXAKGTMTXF
X:	1077.120620	KWWGWHKZJVLFCUWZJFSLSWE
Y:	456.647460	JVVFVGJYIUKEBTVYIERKRVD
Z:	744.458223	IUUEUFIXHTJDASUXHDQJQUC
the best key and plaintext:
 (30.965407128631636, 'P', 'SEEOEPSHRDTNKCEHRNATAEM')

A:	825.129934	VXTKMAVBXTMMKMYTFLLHBOX
B:	657.855633	UWSJLZUAWSLLJLXSEKKGANW
C:	1252.566512	TVRIKYTZVRKKIKWRDJJFZMV
D:	1552.278202	SUQHJXSYUQJJHJVQCIIEYLU
E:	366.918089	RTPGIWRXTPIIGIUPBHHDXKT
F:	223.356199	QSOFHVQWSOHHFHTOAGGCWJS
G:	234.269118	PRNEGUPVRNGGEGSNZFFBVIR
H:	261.016559	OQMDFTOUQMFFDFRMYEEAUHQ
I:	287.968789	NPLCESNTPLEECEQLXDDZTGP
J:	204.046301	MOKBDRMSOKDDBDPKWCCYSFO
K:	827.942804	LNJACQLRNJCCACOJVBBXREN
L:	852.423430	KMIZBPKQMIBBZBNIUAAWQDM
M:	969.180131	JLHYAOJPLHAAYAMHTZZVPCL
N:	3004.553105	IKGXZNIOKGZZXZLGSYYUOBK
O:	1071.611202	HJFWYMHNJFYYWYKFRXXTNAJ
P:	1538.916003	GIEVXLGMIEXXVXJEQWWSMZI
Q:	95.413161	FHDUWKFLHDWWUWIDPVVRLYH
R:	326.939605	EGCTVJEKGCVVTVHCOUUQKXG
S:	370.187224	DFBSUIDJFBUUSUGBNTTPJWF
T:	14.003307	CEARTHCIEATTRTFAMSSOIVE
U:	1532.422321	BDZQSGBHDZSSQSEZLRRNHUD
V:	151.555577	ACYPRFAGCYRRPRDYKQQMGTC
W:	1813.749607	ZBXOQEZFBXQQOQCXJPPLFSB
X:	68.287552	YAWNPDYEAWPPNPBWIOOKERA
Y:	1850.970784	XZVMOCXDZVOOMOAVHNNJDQZ
Z:	201.572386	WYULNBWCYUNNLNZUGMMICPY
the best key and plaintext:
 (14.003306651079868, 'T', 'CEARTHCIEATTRTFAMSSOIVE')

A:	28.485066	UNNTOEONSLEYYEOVEOYSNES
B:	784.964487	TMMSNDNMRKDXXDNUDNXRMDR
C:	374.195903	SLLRMCMLQJCWWCMTCMWQLCQ
D:	431.623324	RKKQLBLKPIBVVBLSBLVPKBP
E:	1575.947341	QJJPKAKJOHAUUAKRAKUOJAO
F:	5183.609193	PIIOJZJINGZTTZJQZJTNIZN
G:	74.357072	OHHNIYIHMFYSSYIPYISMHYM
H:	2030.910377	NGGMHXHGLEXRRXHOXHRLGXL
I:	514.218578	MFFLGWGFKDWQQWGNWGQKFWK
J:	826.637694	LEEKFVFEJCVPPVFMVFPJEVJ
K:	143.568420	KDDJEUEDIBUOOUELUEOIDUI
L:	144.977727	JCCIDTDCHATNNTDKTDNHCTH
M:	345.112943	IBBHCSCBGZSMMSCJSCMGBSG
N:	83.921243	HAAGBRBAFYRLLRBIRBLFARF
O:	3343.373568	GZZFAQAZEXQKKQAHQAKEZQE
P:	3269.164314	FYYEZPZYDWPJJPZGPZJDYPD
Q:	1278.208874	EXXDYOYXCVOIIOYFOYICXOC
R:	1325.812760	DWWCXNXWBUNHHNXENXHBWNB
S:	158.674775	CVVBWMWVATMGGMWDMWGAVMA
T:	1502.021375	BUUAVLVUZSLFFLVCLVFZULZ
U:	637.781036	ATTZUKUTYRKEEKUBKUEYTKY
V:	2845.557349	ZSSYTJTSXQJDDJTAJTDXSJX
W:	279.609051	YRRXSISRWPICCISZISCWRIW
X:	544.407095	XQQWRHRQVOHBBHRYHRBVQHV
Y:	578.026724	WPPVQGQPUNGAAGQXGQAUPGU
Z:	1480.458917	VOOUPFPOTMFZZFPWFPZTOFT
the best key and plaintext:
 (28.485065754741353, 'A', 'UNNTOEONSLEYYEOVEOYSNES')

A:	237.053907	YOQUPEAGGCELCQHRQYRBVAF
B:	1077.530177	XNPTODZFFBDKBPGQPXQAUZE
C:	253.060179	WMOSNCYEEACJAOFPOWPZTYD
D:	958.796718	VLNRMBXDDZBIZNEONVOYSXC
E:	152.847363	UKMQLAWCCYAHYMDNMUNXRWB
F:	1089.609759	TJLPKZVBBXZGXLCMLTMWQVA
G:	417.184395	SIKOJYUAAWYFWKBLKSLVPUZ
H:	1712.998298	RHJNIXTZZVXEVJAKJRKUOTY
I:	656.908153	QGIMHWSYYUWDUIZJIQJTNSX
J:	342.573311	PFHLGVRXXTVCTHYIHPISMRW
K:	236.859457	OEGKFUQWWSUBSGXHGOHRLQV
L:	160.178130	NDFJETPVVRTARFWGFNGQKPU
M:	344.485091	MCEIDSOUUQSZQEVFEMFPJOT
N:	20.159949	LBDHCRNTTPRYPDUEDLEOINS
O:	262.412759	KACGBQMSSOQXOCTDCKDNHMR
P:	528.349439	JZBFAPLRRNPWNBSCBJCMGLQ
Q:	342.263963	IYAEZOKQQMOVMARBAIBLFKP
R:	1845.200309	HXZDYNJPPLNULZQAZHAKEJO
S:	878.521549	GWYCXMIOOKMTKYPZYGZJDIN
T:	1031.661350	FVXBWLHNNJLSJXOYXFYICHM
U:	423.183312	EUWAVKGMMIKRIWNXWEXHBGL
V:	556.267011	DTVZUJFLLHJQHVMWVDWGAFK
W:	335.601086	CSUYTIEKKGIPGULVUCVFZEJ
X:	429.941731	BRTXSHDJJFHOFTKUTBUEYDI
Y:	196.512028	AQSWRGCIIEGNESJTSATDXCH
Z:	680.520635	ZPRVQFBHHDFMDRISRZSCWBG
the best key and plaintext:
 (20.159949441418178, 'N', 'LBDHCRNTTPRYPDUEDLEOINS')

A:	17.254725	POAORETHWHSETTRBTVTLNCA
B:	767.933941	ONZNQDSGVGRDSSQASUSKMBZ
C:	286.844216	NMYMPCRFUFQCRRPZRTRJLAY
D:	1091.678030	MLXLOBQETEPBQQOYQSQIKZX
E:	277.207516	LKWKNAPDSDOAPPNXPRPHJYW
F:	1077.293351	KJVJMZOCRCNZOOMWOQOGIXV
G:	135.695235	JIUILYNBQBMYNNLVNPNFHWU
H:	437.882563	IHTHKXMAPALXMMKUMOMEGVT
I:	999.662539	HGSGJWLZOZKWLLJTLNLDFUS
J:	551.437917	GFRFIVKYNYJVKKISKMKCETR
K:	2360.671088	FEQEHUJXMXIUJJHRJLJBDSQ
L:	76.595444	EDPDGTIWLWHTIIGQIKIACRP
M:	330.252179	DCOCFSHVKVGSHHFPHJHZBQO
N:	174.373830	CBNBERGUJUFRGGEOGIGYAPN
O:	380.410634	BAMADQFTITEQFFDNFHFXZOM
P:	645.916058	AZLZCPESHSDPEECMEGEWYNL
Q:	354.744649	ZYKYBODRGRCODDBLDFDVXMK
R:	756.636367	YXJXANCQFQBNCCAKCECUWLJ
S:	883.873970	XWIWZMBPEPAMBBZJBDBTVKI
T:	263.336449	WVHVYLAODOZLAAYIACASUJH
U:	4304.916511	VUGUXKZNCNYKZZXHZBZRTIG
V:	463.874087	UTFTWJYMBMXJYYWGYAYQSHF
W:	2119.324291	TSESVIXLALWIXXVFXZXPRGE
X:	320.144818	SRDRUHWKZKVHWWUEWYWOQFD
Y:	561.832829	RQCQTGVJYJUGVVTDVXVNPEC
Z:	146.132659	QPBPSFUIXITFUUSCUWUMODB
the best key and plaintext:
 (17.254724617960768, 'A', 'POAORETHWHSETTRBTVTLNCA')

A:	1518.147850	EEXDLOLPZLTLZSDPZPMGRSR
B:	401.687423	DDWCKNKOYKSKYRCOYOLFQRQ
C:	2077.031150	CCVBJMJNXJRJXQBNXNKEPQP
D:	155.518555	BBUAILIMWIQIWPAMWMJDOPO
E:	693.278719	AATZHKHLVHPHVOZLVLICNON
F:	934.698989	ZZSYGJGKUGOGUNYKUKHBMNM
G:	1051.620416	YYRXFIFJTFNFTMXJTJGALML
H:	581.593732	XXQWEHEISEMESLWISIFZKLK
I:	434.902073	WWPVDGDHRDLDRKVHRHEYJKJ
J:	671.143548	VVOUCFCGQCKCQJUGQGDXIJI
K:	154.431495	UUNTBEBFPBJBPITFPFCWHIH
L:	29.272773	TTMSADAEOAIAOHSEOEBVGHG
M:	2530.418559	SSLRZCZDNZHZNGRDNDAUFGF
N:	316.074476	RRKQYBYCMYGYMFQCMCZTEFE
O:	1457.060617	QQJPXAXBLXFXLEPBLBYSDED
P:	442.037957	PPIOWZWAKWEWKDOAKAXRCDC
Q:	2183.230785	OOHNVYVZJVDVJCNZJZWQBCB
R:	123.737857	NNGMUXUYIUCUIBMYIYVPABA
S:	1351.844455	MMFLTWTXHTBTHALXHXUOZAZ
T:	757.573228	LLEKSVSWGSASGZKWGWTNYZY
U:	894.325538	KKDJRURVFRZRFYJVFVSMXYX
V:	1016.391684	JJCIQTQUEQYQEXIUEURLWXW
W:	176.443622	IIBHPSPTDPXPDWHTDTQKVWV
X:	118.838083	HHAGOROSCOWOCVGSCSPJUVU
Y:	230.962153	GGZFNQNRBNVNBUFRBROITUT
Z:	252.867796	FFYEMPMQAMUMATEQAQNHSTS
the best key and plaintext:
 (29.272773236728021, 'L', 'TTMSADAEOAIAOHSEOEBVGHG')

A:	315.083878	SFWCAKGQRZLPQPCCFBCCLYC
B:	510.430791	REVBZJFPQYKOPOBBEABBKXB
C:	586.526917	QDUAYIEOPXJNONAADZAAJWA
D:	5714.545030	PCTZXHDNOWIMNMZZCYZZIVZ
E:	148.273401	OBSYWGCMNVHLMLYYBXYYHUY
F:	2897.281172	NARXVFBLMUGKLKXXAWXXGTX
G:	1119.907662	MZQWUEAKLTFJKJWWZVWWFSW
H:	582.829518	LYPVTDZJKSEIJIVVYUVVERV
I:	468.206860	KXOUSCYIJRDHIHUUXTUUDQU
J:	216.361762	JWNTRBXHIQCGHGTTWSTTCPT
K:	81.869909	IVMSQAWGHPBFGFSSVRSSBOS
L:	206.918529	HULRPZVFGOAEFERRUQRRANR
M:	1507.963291	GTKQOYUEFNZDEDQQTPQQZMQ
N:	221.942748	FSJPNXTDEMYCDCPPSOPPYLP
O:	350.386509	ERIOMWSCDLXBCBOORNOOXKO
P:	226.282960	DQHNLVRBCKWABANNQMNNWJN
Q:	830.156239	CPGMKUQABJVZAZMMPLMMVIM
R:	766.091534	BOFLJTPZAIUYZYLLOKLLUHL
S:	1201.725316	ANEKISOYZHTXYXKKNJKKTGK
T:	3288.619576	ZMDJHRNXYGSWXWJJMIJJSFJ
U:	138.175067	YLCIGQMWXFRVWVIILHIIREI
V:	312.664713	XKBHFPLVWEQUVUHHKGHHQDH
W:	460.459051	WJAGEOKUVDPTUTGGJFGGPCG
X:	299.281083	VIZFDNJTUCOSTSFFIEFFOBF
Y:	17.794107	UHYECMISTBNRSREEHDEENAE
Z:	392.978830	TGXDBLHRSAMQRQDDGCDDMZD
the best key and plaintext:
 (17.794106752812969, 'Y', 'UHYECMISTBNRSREEHDEENAE')

A:	621.053047	JSKOCWFUZWLKOWUFSFWLAJ
B:	609.861131	IRJNBVETYVKJNVTEREVKZI
C:	541.123143	HQIMAUDSXUJIMUSDQDUJYH
D:	281.884542	GPHLZTCRWTIHLTRCPCTIXG
E:	233.029320	FOGKYSBQVSHGKSQBOBSHWF
F:	440.232896	ENFJXRAPURGFJRPANARGVE
G:	1884.184129	DMEIWQZOTQFEIQOZMZQFUD
H:	50.879775	CLDHVPYNSPEDHPNYLYPETC
I:	844.874803	BKCGUOXMRODCGOMXKXODSB
J:	396.621196	AJBFTNWLQNCBFNLWJWNCRA
K:	817.151212	ZIAESMVKPMBAEMKVIVMBQZ
L:	1013.788681	YHZDRLUJOLAZDLJUHULAPY
M:	1322.896820	XGYCQKTINKZYCKITGTKZOX
N:	1664.584005	WFXBPJSHMJYXBJHSFSJYNW
O:	364.019383	VEWAOIRGLIXWAIGRERIXMV
P:	946.495562	UDVZNHQFKHWVZHFQDQHWLU
Q:	187.348195	TCUYMGPEJGVUYGEPCPGVKT
R:	451.877460	SBTXLFODIFUTXFDOBOFUJS
S:	36.759849	RASWKENCHETSWECNANETIR
T:	893.017563	QZRVJDMBGDSRVDBMZMDSHQ
U:	135.669058	PYQUICLAFCRQUCALYLCRGP
V:	1295.938128	OXPTHBKZEBQPTBZKXKBQFO
W:	758.883192	NWOSGAJYDAPOSAYJWJAPEN
X:	2957.308242	MVNRFZIXCZONRZXIVIZODM
Y:	152.309657	LUMQEYHWBYNMQYWHUHYNCL
Z:	1434.597986	KTLPDXGVAXMLPXVGTGXMBK
the best key and plaintext:
 (36.759849129505348, 'S', 'RASWKENCHETSWECNANETIR')

A:	193.927581	MXBPBAMCWBPAIMBKDWVPVI
B:	769.246056	LWAOAZLBVAOZHLAJCVUOUH
C:	2806.863646	KVZNZYKAUZNYGKZIBUTNTG
D:	1261.430222	JUYMYXJZTYMXFJYHATSMSF
E:	1487.359191	ITXLXWIYSXLWEIXGZSRLRE
F:	414.023126	HSWKWVHXRWKVDHWFYRQKQD
G:	995.381983	GRVJVUGWQVJUCGVEXQPJPC
H:	75.059236	FQUIUTFVPUITBFUDWPOIOB
I:	21.718766	EPTHTSEUOTHSAETCVONHNA
J:	710.078316	DOSGSRDTNSGRZDSBUNMGMZ
K:	137.893481	CNRFRQCSMRFQYCRATMLFLY
L:	994.383282	BMQEQPBRLQEPXBQZSLKEKX
M:	470.561949	ALPDPOAQKPDOWAPYRKJDJW
N:	1936.516978	ZKOCONZPJOCNVZOXQJICIV
O:	137.243021	YJNBNMYOINBMUYNWPIHBHU
P:	786.720303	XIMAMLXNHMALTXMVOHGAGT
Q:	1602.720078	WHLZLKWMGLZKSWLUNGFZFS
R:	670.847951	VGKYKJVLFKYJRVKTMFEYER
S:	2162.223119	UFJXJIUKEJXIQUJSLEDXDQ
T:	140.662158	TEIWIHTJDIWHPTIRKDCWCP
U:	179.513887	SDHVHGSICHVGOSHQJCBVBO
V:	83.954143	RCGUGFRHBGUFNRGPIBAUAN
W:	916.634954	QBFTFEQGAFTEMQFOHAZTZM
X:	686.506064	PAESEDPFZESDLPENGZYSYL
Y:	592.684985	OZDRDCOEYDRCKODMFYXRXK
Z:	906.406128	NYCQCBNDXCQBJNCLEXWQWJ
the best key and plaintext:
 (21.7187657659424, 'I', 'EPTHTSEUOTHSAETCVONHNA')

A:	854.801822	ZMWGZKVDMAWAKGAGWGSWWU
B:	1896.590694	YLVFYJUCLZVZJFZFVFRVVT
C:	461.610523	XKUEXITBKYUYIEYEUEQUUS
D:	1096.644562	WJTDWHSAJXTXHDXDTDPTTR
E:	255.268959	VISCVGRZIWSWGCWCSCOSSQ
F:	117.562758	UHRBUFQYHVRVFBVBRBNRRP
G:	730.972262	TGQATEPXGUQUEAUAQAMQQO
H:	2666.933756	SFPZSDOWFTPTDZTZPZLPPN
I:	51.818136	REOYRCNVESOSCYSYOYKOOM
J:	1510.166700	QDNXQBMUDRNRBXRXNXJNNL
K:	336.714716	PCMWPALTCQMQAWQWMWIMMK
L:	846.634247	OBLVOZKSBPLPZVPVLVHLLJ
M:	577.169986	NAKUNYJRAOKOYUOUKUGKKI
N:	3050.594115	MZJTMXIQZNJNXTNTJTFJJH
O:	52.276471	LYISLWHPYMIMWSMSISEIIG
P:	463.576439	KXHRKVGOXLHLVRLRHRDHHF
Q:	991.237485	JWGQJUFNWKGKUQKQGQCGGE
R:	815.359047	IVFPITEMVJFJTPJPFPBFFD
S:	19.649313	HUEOHSDLUIEISOIOEOAEEC
T:	246.736459	GTDNGRCKTHDHRNHNDNZDDB
U:	270.537368	FSCMFQBJSGCGQMGMCMYCCA
V:	351.973439	ERBLEPAIRFBFPLFLBLXBBZ
W:	581.745863	DQAKDOZHQEAEOKEKAKWAAY
X:	5488.614051	CPZJCNYGPDZDNJDJZJVZZX
Y:	141.479472	BOYIBMXFOCYCMICIYIUYYW
Z:	2088.921693	ANXHALWENBXBLHBHXHTXXV
the best key and plaintext:
 (19.649313116990811, 'S', 'HUEOHSDLUIEISOIOEOAEEC')

According to the Chi-square analysis the key is OCYPTANALYSIS.


In [20]:
vigenere(ctext, 'OCYPTANALYSIS', direction = -1)


Out[20]:
'HWESCULPTUREHOHBEENBOTHAPUNOLEANDAMYSTEFNFORTHOSEWHOVDPETOCRACKTHSRYPHEREDMESSOVESCONTAINEDKXTHINTHESCULDIURESTWOTHOUGPNDALPHABETIQAETTERSINTHEHLENTYYEARSSIBREKRYPTOSWASSGECTEDTHREEOTIHEFOURSECTICCSHAVEBEENCOBUIRMEDTOHAVEPTENSOLVEDNOOBTHASYETBEENAPAETOSOLVETHEFTMAININGNINEHNSEVENCHARACHTRMESSAGE'

Pretty close, but no cigar. Let's try CRYPTANALYSIS.


In [21]:
vigenere(ctext, 'CRYPTANALYSIS', direction = -1)


Out[21]:
'THESCULPTUREHASBEENBOTHAPUZZLEANDAMYSTERYFORTHOSEWHOHOPETOCRACKTHECYPHEREDMESSAGESCONTAINEDWITHINTHESCULPTURESTWOTHOUSANDALPHABETICLETTERSINTHETWENTYYEARSSINCEKRYPTOSWASERECTEDTHREEOFTHEFOURSECTIONSHAVEBEENCONFIRMEDTOHAVEBEENSOLVEDNOONEHASYETBEENABLETOSOLVETHEREMAININGNINETYSEVENCHARACTERMESSAGE'

CHALLENGE: Using Chi-square criterion to decrypt Vigenere


Using the techniques described above, try deciphering these two messages:
TZKHT BTWJN BKGDP GVFVO HBTWA ZFFZP JVMJT VTHAV VAMFB BSBKU 
JBUKL BDOKR MWZLH RHFFH WLEGF GISTR WBDSL VOHVY UQSWN PFOXR 
WKKAN YBBXY NGHAR TJBZN BQZUE VUGUR MQOST VPBFA WWHWM OFFKU 
AMQFI AFHVR WVUFE GZHYR AMTSS OFSEZ DKTKP RDICN CQAFA OPIKG 
QMYWA AJBXB OBTWF BVFVA LZKHT REAVF BISWS VUPVN AAAXT UFTFH 
AUQKS NHSJG QMRAR FUHYE NMTSV RCSVA BWXNE QXVZY NBTWF BVFKU 
VMEKA TFFVZ JQZKA FPBVB OBTWM BTHWN VWGKU ATCCI NLOGD RTWEG 
QMIGR YEHYR BKGDP GVFVP XVFAN HFGKB KMAXI AUSIR BBFGC EZDKN 
WIXQS GTPFG QIYST RVFRA MXDGF RTGZB WIXOH BBFVN CBQEP GJBXG 
XLQUI CISIG QMRGU EUVJR LBUGN GISRE CQELH NTGFS JZSAV ROHNB 
LTGWS GPHYV BAQUT VPB
more difficult to break (due to the text not being very typical?):
IUOCA LGTAM DAQFN ISJTN MZGWS UZCBH GSSJT NM?LZ EYVKG LLZEE 
BJVPK EAGOW VQUXI EMVZB ZWING GTUSL IOOOC AYSTH FJGLS FDTSS 
PAEAT TFVWV VWRGS MWVVL OAOMP SFGWN MGEIL AONYV QMKDA NHDGG 
CFOWB TQCLL HIT?L JMQKH OVDFQ LKBUS AGLGM TTIWT MKGME XZGZW 
PWHPC PWOKT HFWZI ULLOD SVQGF ?ONMQ YELZI SXSUP AKLAT LOMKK 
AGFPV PAJTY FAIPL VEGSW GAXAF TZKGD WFMIO MVMKK IXQGK VLXIV 
FKGKG FDSOG TBZKE VFFVG KWVEO VGOJW ESFAI PLEIN VLGAX GRTZX 
QCJKE CPFFA OWSTJ VDGJG WS
Points to consider:
  1. Without solving it, do you think this is a transposition cipher of English text? (hint: investigate letter frequencies)
  2. Without solving it, do you think this is a substitution cipher of English text? (hint: investigate the IC of the full text)
  3. In case it is a Vigenere cipher (spoiler: it is):
    1. what would be the mosdt likely key length? (hint: investigate the IC for different key lengths)
    2. which positions would be shifted by the same key/amount?
    3. what can you say about the plaintext language? (hint: investigate IC of subsequences)
    4. solve the key by hand using frequency distributions of the subsequences.
    5. solve the key automatically using the Chi-squared criterion.

Vigenere Solution #2: Maximum Likelihood, Combined with a Search Procedure

For solving Vigenere, we could also use Maximum Likelihood to evaluate how much like plaintext a decrypted ciphertext is. Combined with a way to search through possible keys, we can search for the best key. The advantage of using maximum likelihood is also that we can easily consider longer subsequences, rather than just look at the message as a bag of unordered letters where each letter has a frequency. By considering statistics of longer subsequences, we can better recognize when some text is written in a particular language.

This approach is adapted from:

Note that this is just one possible way to search through the possible keys. It does not use any information like IC for determining the likely key length N, or Chi-square for determining the likely values for the shifts of the N Caesar ciphers. Instead, for a range of key lengths, it exhaustively searches the possibilities for the first 3 characters of the key and keeps track of the best candidates. Then, it extends these subkeys one character at a time, while keeping track of the best candidates.

An alternative way would be to determine the key length and likely key, using IC and Chi-square criteria, and then gradually improve they key using a stochastic search algorithm like simulated annealing.

During the search, we need a way to keep track of the best results so far. This can be easily achieved using the $\texttt{nbest}$ class:


In [22]:
# keep a list of the N best things we have seen, discard anything else
# this class can also be imported using 'import nbest' from a separate file nbest.py
class nbest(object):
    def __init__(self,N=1000):
        self.store = []
        self.N = N
        
    def add(self,item):
        self.store.append(item)
        self.store.sort(reverse=True)
        self.store = self.store[:self.N]
    
    def __getitem__(self,k):
        return self.store[k]

    def __len__(self):
        return len(self.store)

In [23]:
ctext = "VYCHVUYPESJMZCJZTXNOOEFSXMBQJTTNQAXWKBWTPDDKTUODCOPGJFNTMOPRLACBZGTWEAEEEOKWAKCXCHVOATLGFMVYZRWBNGHPQUCDRKSGXSGWZRZWMURLSTLCHLZWBAECCIMEESTLLPWVNCCMYLELPKAAPTCZKYCTZQOIKGICRMEQTSPWMGHKFTYOHRDCUBAQEQWTVRBPCFKGPWGGFEQTZFSDWDVCCLOYVPBFWGPVFPLYRTMCWVSDCCIHSBLGCLPWTVKPBNVNRLAVWVPQTOEACSYJIUVVPBXSFARC"

The search for the best key, as described above, is performed using the code below. Rather than solving and evaluating each shift individually, the criterion used to evaluate the quality of a keyword is the likelihood of the 4-grams in the decrypted ciphertext being English.


In [24]:
from itertools import permutations
from ngram_score import ngram_score

import re
import pprint as pp

qgram = ngram_score('ngrams/en_sherlock_4grams') # load our 4gram statistics
trigram = ngram_score('ngrams/en_sherlock_3grams') # load our 3gram statistics

# keep track of the 100 best keys
N=100 

# test keys up to a length 15
for KLEN in range(3,16):
    
    print "="*80
    rec = nbest(N)
    
    # exhaustively test all possible letters for first 3 entries of the key and keep track of the N best ones
    # if KLEN=7, this will test e.g. FOOAAAA and BARAAAA 
    for i in permutations('ABCDEFGHIJKLMNOPQRSTUVWXYZ',3):
        i = "".join(i)
        key = ''.join(i) + 'A'*(KLEN-len(i))
        decrypted_ctext = vigenere(ctext, key, direction = -1)
        score = 0
        for j in range(0,len(ctext),KLEN):
            score += trigram.score(decrypted_ctext[j:j+3])
        rec.add((score,''.join(i), decrypted_ctext))
    next_rec = nbest(N)
    
    # for the remaining KLEN-3 characters of the key,
    for i in range(0,KLEN-3):
        # go over the N best keys found so far...
        for k in xrange(N):
            # ...and determine the best next character of the key, while keeping best N keys so far
            for c in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ':
                key = rec[k][1] + c
                fullkey = key + 'A'*(KLEN-len(key))
                decrypted_ctext = vigenere(ctext, fullkey, direction = -1)
                score = 0
                for j in range(0,len(ctext),KLEN):
                    score += qgram.score(decrypted_ctext[j:j+len(key)])
                next_rec.add((score,key, decrypted_ctext))
        rec = next_rec
        next_rec = nbest(N)
       
    # show the results
    bestscore = rec[0][0]
    bestkey = rec[0][1]    
    #decrypted_ctext = rec[0][2]
    # always show entire decrypted ctext, even if the above analysis is done only on part of the ctext, e.g. ctext[0:100]
    decrypted_ctext = vigenere(ctext, bestkey, direction = -1) 
    print bestscore, 'klen', KLEN, ':"'+bestkey+'",', decrypted_ctext
    # uncomment the following lines to see top-10 results
    #pp.pprint(rec.store[0:10])
    #print '\n'


================================================================================
-572.199596921 klen 3 :"ZAL", WYRIVJZPTTJBACYATMOODFFHYMQRJIUNFBXLLBLUPSEKIVOSDOEHJUOTBPPGMARCZVUWTBETFOZXAZDXRIVDBTAHFBWYOSWQOGWQQJDDGLSVYSVXZGAWBVRATTADHAAWQBERDIBFEHULAQWKOCRNYAFLELAPQTRAKNDTOROXLGXDRBFQITPLNGWLFIZOWSDRVBPREFXTKSBEDFZHPLHGUFQIAFHEWSWCRMONWPQGWVQVUQLNSTBDWKTDRDIWTBAHCAQWIWKECNKORABVLWPFUOTBCHZJXVVKQBMTFPSC
================================================================================
-514.775796714 klen 4 :"LSOS", KGOPKCKXTAVUOKVHIFZWDMRAMUNYYBFVFIJEZJIBELPSICALRWBOYNZBBWBZAIOJOOFETIQMTWWEPSOFRPHWPBXOUUHGOZIJCOTXFCOLGSEOMASEOZLEBCDTHBXKWTLEQIQKRQYMTAFTAXIDCKOUNTQTESMIEBOHZGOBOYAQZOUKGUQYIABEBOTSUBKWWZPKJJMYTYIBKZNXRNWOEESOUMCBONELLLHKRTAGKXNNLOBDUXXGGBYKLDELRKUPHJXORTBEIDWXQVHVGTMDLDBYIWQIRAKRXCHDEJJAUIDK
================================================================================
-816.384029622 klen 5 :"RCOTI", EWOONDWBLKSKLJBIRJUGXCRZPVZCQLCLCHPFINDLYBPRLDMPJGYEVMFCKAWJUYOIRPRILSNCQVCFYWJPLFHVSCJSMEEWLYOKLSOHZSOKJTQSEKPULYRFKGYDBRXJZUXIISNAOPENCEADUNICFLAYFDNJBRSJNFJRTWOARZMURYRADTWZREWOVETRXCWAOJMAGISZCCDLEPNWUOISWOPERLICXRZVFBHJUUMKCHKDINHEDBSQARYJOEQPJURFEIDPAXWOCTWWTWTZYDJTICHZRALSLQKQADTHWTGQRHJL
================================================================================
-1015.21898177 klen 6 :"RLYYWY", ENEJZWHEGUNOIRLBXZWDQGJUGBDSNVCCSCBYTQYVTFMZVWSFLDRINHWIOQTTUPEDDICLGCIGNDMYEMLMEJZQJINIJOENBTADWVJRUWLSTMWIGHIYDTILOWVNBINELNILDCIELXOGIUCANRAXWREOCNNARMECYIEBOALIBSSKTVKEVONFVUTYVVJMJVHDJTHEDQCSISFIXTFRLUMITYPVHGUVIUUFAFERENSAEEDHAIYKHRPAAIOEAXBSEEMJBQNIGNYLVXORKCXPVNJKYXTSCDGCGUHYKWZXYQZUJCAR
================================================================================
-1157.6396952 klen 7 :"RLZPABO", ENDSVTKYTTUMYOSOUINNANUTIMACSIUYQZJFZCHTOPMZUFOCOXEHUFMFVDQCLZOKOHEWDMNTFZKVMTRYNHUAJIMRFLHHOSHBMSQERFCCDTHHISFIIGAHMTDUHUWCGXILCLEBORBFPSSXUEXGNBOVNMPLOWJPQECYWHRUKQNUTVJNRLQZITAWLSQZGEYNTASDFBZCNFXEVQNYRGVGOIPVGPQSLOHEHDUOLAPJVONOLHAVEBUNSEMBIEHENCHTBQMRCKBFIWVPAZECSWAUIEEREODMLHZUITHEECISEMAR
================================================================================
-1256.90806285 klen 8 :"LKPOLXEY", KONTKXURTIUYOFFBINYADHBUMCMCYWPPFQIIZESVETOWIXKFREASYIJVBEADADYDOWEITDAGTEVIPNYZRXGAPWHIUCGKOUSDCWSBFXYFGADSMVCYOHKIBXNNHJWOWOVYQQPORLIGTIEXASSXCSNYNOANEALMEWYBZONFOTKKZWTOGPASIIAIBJDMUJJAWUZEJRLCTTSVKHMBRIGIEMRSUHMVOVDPLGRERBZKKSXHLWAHUSHAGJXOLYOFRSTTHEHIRBAIIYGRQDGZGOWXLLACIRACRIJVXXRXERIEUDNE
================================================================================
-1329.86689478 klen 9 :"YOSCBXMST", XKKFUXMXLUVUXBMNBEPAWCEVLUISVBRMTOFDMNEROGRSAWALANSURMPFUMOUZIJDLORVHOMLGASUZNQFJJHWYSOUNTXKHPVEBOORCCACUYANZEOUYUNETWDTQSOQPSBIJYDFQQTGQARKODECPOKKXOSTWMMINSFNSFEFHONLYOPEDUCPWGXDOSPIEWMWOTPKSADEMXYFDPASQNRIBEEFISYABRABVGJKJNAGTOETENRHNNKBFBTEIDQCFQQOUNTEBODEAXWXZMYBZSCHETOTHWLCOAWILIDCRNFQEDFK
================================================================================
-1380.61733025 klen 10 :"RLOOIMCOLL", ENOTNIWBTHSBLOBNRJCDXTREPAZCYICCCMPKINLIYSPWLIMPRDYVVRFHKAEGUPONRURITPNTQACKYWRMLWHASHJSUBENLDOPLSWEZJOPJYQSMHPLLDRKKGGABIXOZZXIQPNROUESCEIAUEIHFQAYNANABWSONFROTNOFREMUZVRRDYWEREELVVTWXHWAWGMRGNSECCLIEGNBUTISELPVRQIHXRHSFSHOUZMKKEKUISHJDBANAIYOOJQPRRRWENDUAXELCKWBTBTZGAJKIHHERATPLHKVAITHEQGHRMJQ
================================================================================
-1461.81730388 klen 11 :"ITCJELFXRZH", NFAYRJTSNTCEGAAVISQXPXXZVDXFEWCOJSEUBXLOSMEDLBMUYDKJSGGLTMGNAVFKAZLDCRATZRTXTCJVTDKJDCMZXTTPVGREWHAHXSTZGFVPYLYDXIVLHXAMLLSAYHOREJFVUPKVAHOOUQPNUATINGHUQDSHNKYOFBLUSIVGBCXXUVFJLZNNIVCNOURGOPUYJWDZFJOATIXEXITHIONEWAFOCOTWOKTTYAJBEQUXDEGRUKOHSMEJUMOSXFRILTSETHERWELITUTENAVYFWIIAMVWRNBSJNNCNSTHADAD
================================================================================
-1481.69582345 klen 12 :"WBTCNSESBYAE", ZXJFICUXDUJIDBQXGFJWNGFOBLIOWBPVPCXSOADRCLZSSWOZGNWEWNJBLQPNPZJZMOPEDCEAINRUNSYFBJVKESSESURGYTWXRFONDCYLQMSCBRNUMZVELWRHWSSAUTVEACEYGHTCRAPTKRWRRBJKLTATOMAWTSJXXGYBYSOEOFPAEUAYSUPSQFOISBUWGTDYYAHORYSBUTBLGERECECOEGQPDEZBJLRKBNOUZOIDJOLDERLUVSTAJDOLBEIDWASEPTLESXKLFMCLETWDVXPMXNLYPAURHWVRTAEQSINK
================================================================================
-982.668177856 klen 13 :"CRYPTANALYSIS", THESCULPTUREHASBEENBOTHAPUZZLEANDAMYSTERYFORTHOSEWHOHOPETOCRACKTHECYPHEREDMESSAGESCONTAINEDWITHINTHESCULPTURESTWOTHOUSANDALPHABETICLETTERSINTHETWENTYYEARSSINCEKRYPTOSWASERECTEDTHREEOFTHEFOURSECTIONSHAVEBEENCONFIRMEDTOHAVEBEENSOLVEDNOONEHASYETBEENABLETOSOLVETHEREMAININGNINETYSEVENCHARACTERMESSAGE
================================================================================
-1511.85448167 klen 14 :"VGXNYIJILPPONA", ASFUXMPHTDUYMCOTWKPGFWUDIYOQONWASSOOZMHFCDIEWHQVTGERURATRISENSTTOREIRAJYHBMORCRINTIOFNOTHEMQOCHNAGMJTHEVICHRIETWELCJOMIDHEWOULEQENGUTABPPEGLQJZIPUTENWPXCKFUSGERBQREKCBIPALPTEVIIDAIZGMEIGAGYJSNFNNQJKZGXJSHRQVSCWLAIRSLQXHOHPICHFRLXHSXLRAHSPQSUGOUNNHONOVHXVOTEDGOIGVBONAHUYCNNNEBEARAHMBWKMMNEMIESAWW
================================================================================
-1565.3751932 klen 15 :"DARYWIOOOICRYDC", SYLJZMKBQKHVBZHWTGPSGQREPKKSGRQNZCBOWNILNMFHRROMESHSVRFRVQMPIALDDYFIQSCNGLITATEBUTHASRUICKSYITATZSTHODEAPHSPZWYILDRUVWOJPTUELDLINSCLEFKBEBVPDBIHFALOVJBLYMESBFORIHEQXNORMKAODYWOCUMUJGQMJLKATJBLWYYNEZYXNDNBUDTIMUDGOGULLREVUMXZAIOHXTTRISHTORIWOTVEANEPOUGQUYJDCURALHWBTLEPOJXVFXTIFAQSABAGGRVERFPERMJA

Here we can see that using the different criterion and search algorithm the correct key and plaintext have successfully been found.

Vigenere Solution #3: Maximum Likelihood, Combined with Brute Force

Instead of trying to determine the key character by character, we can also bruteforce it and keep track of the best key as before.

The wordlist used here was downloaded from: https://packetstormsecurity.com/Crackers/wordlists/dictionaries/


In [25]:
ctext = "VYCHVUYPESJMZCJZTXNOOEFSXMBQJTTNQAXWKBWTPDDKTUODCOPGJFNTMOPRLACBZGTWEAEEEOKWAKCXCHVOATLGFMVYZRWBNGHPQUCDRKSGXSGWZRZWMURLSTLCHLZWBAECCIMEESTLLPWVNCCMYLELPKAAPTCZKYCTZQOIKGICRMEQTSPWMGHKFTYOHRDCUBAQEQWTVRBPCFKGPWGGFEQTZFSDWDVCCLOYVPBFWGPVFPLYRTMCWVSDCCIHSBLGCLPWTVKPBNVNRLAVWVPQTOEACSYJIUVVPBXSFARC"

In [26]:
from ngram_score import ngram_score

import re
import pprint as pp

qgram = ngram_score('ngrams/en_sherlock_4grams') # load our 4gram statistics
trigram = ngram_score('ngrams/en_sherlock_3grams') # load our 3gram statistics

# keep track of the 100 best keys
N=100 
rec = nbest(N)

f = open('wordlists/websters-dictionary','r')

i = 1
for key in f:    
    key = re.sub(r'[^A-Z]','',key.upper())
    decrypted_ctext = vigenere(ctext, key, direction = -1)
    score = qgram.score(decrypted_ctext)
    rec.add((score,key, decrypted_ctext))
    i += 1
    if i % 10000 == 0:
        bestscore = rec[0][0]
        bestkey = rec[0][1]
        decrypted_ctext = vigenere(ctext, bestkey, direction = -1)
        print "%20s\t%5.2f,\t%s,\t%s,\t%s" % (key, bestscore, len(bestkey), bestkey, decrypted_ctext)


          ANTICRISIS	-1863.09,	13,	ACROPARALYSIS,	VWLTGUHPTUREHCHIFINXOTHAPUBOSFENZAMYSTETNMPVTDOSEWHOJDWFXOYRACKTHGRFQLENEDMESSCVLTGOJTAINEDYXAIMNPHESCULRIBSISPWOTHOUUPUEELLHABETIEALUXENSINTHEVLLOXYUEARSSIPRLLVYLTOSWASGGLDXEZTHREEOHIOFJOQRSECTIQCZIEVABEENCOPUPSQEZTOHAVEDTLOWOHVEDNOOPTOBWYATBEENADALUSSKLVETHETTTBMNENGNINEVNZFZEJCHARACVTYNISOAGE
         BEERBACHITE	-1863.09,	13,	ACROPARALYSIS,	VWLTGUHPTUREHCHIFINXOTHAPUBOSFENZAMYSTETNMPVTDOSEWHOJDWFXOYRACKTHGRFQLENEDMESSCVLTGOJTAINEDYXAIMNPHESCULRIBSISPWOTHOUUPUEELLHABETIEALUXENSINTHEVLLOXYUEARSSIPRLLVYLTOSWASGGLDXEZTHREEOHIOFJOQRSECTIQCZIEVABEENCOPUPSQEZTOHAVEDTLOWOHVEDNOOPTOBWYATBEENADALUSSKLVETHETTTBMNENGNINEVNZFZEJCHARACVTYNISOAGE
            CAPRIOTE	-1863.09,	13,	ACROPARALYSIS,	VWLTGUHPTUREHCHIFINXOTHAPUBOSFENZAMYSTETNMPVTDOSEWHOJDWFXOYRACKTHGRFQLENEDMESSCVLTGOJTAINEDYXAIMNPHESCULRIBSISPWOTHOUUPUEELLHABETIEALUXENSINTHEVLLOXYUEARSSIPRLLVYLTOSWASGGLDXEZTHREEOHIOFJOQRSECTIQCZIEVABEENCOPUPSQEZTOHAVEDTLOWOHVEDNOOPTOBWYATBEENADALUSSKLVETHETTTBMNENGNINEVNZFZEJCHARACVTYNISOAGE
          COMMORIENT	-1850.74,	13,	CHROMATOLYSIS,	TRLTJUFBTUREHACIFLNVATHAPUZJSFHNXMMYSTERIMPYTBASEWHOHYWFAOWDACKTHEMFQOELQDMESSAQLTJOHFAINEDWSAIPNNTESCULPDBSLSNIOTHOUSKUEHLJTABETICVLUAELEINTHETGLOAYSQARSSINMLLYYJFOSWASEBLDAEXFHREEOFDOFMOODSECTIOXZIHVYNEENCONPPSTEXFOHAVEBOLOZOFHEDNOONOOBZYYFBEENABVLUVSIXVETHEROTBPNCZGNINETIZFCEHOHARACTOYNLSMMGE
          DEGRADEDLY	-1262.44,	13,	CRYPTANALYSIS,	THESCULPTUREHASBEENBOTHAPUZZLEANDAMYSTERYFORTHOSEWHOHOPETOCRACKTHECYPHEREDMESSAGESCONTAINEDWITHINTHESCULPTURESTWOTHOUSANDALPHABETICLETTERSINTHETWENTYYEARSSINCEKRYPTOSWASERECTEDTHREEOFTHEFOURSECTIONSHAVEBEENCONFIRMEDTOHAVEBEENSOLVEDNOONEHASYETBEENABLETOSOLVETHEREMAININGNINETYSEVENCHARACTERMESSAGE
               ELDER	-1262.44,	13,	CRYPTANALYSIS,	THESCULPTUREHASBEENBOTHAPUZZLEANDAMYSTERYFORTHOSEWHOHOPETOCRACKTHECYPHEREDMESSAGESCONTAINEDWITHINTHESCULPTURESTWOTHOUSANDALPHABETICLETTERSINTHETWENTYYEARSSINCEKRYPTOSWASERECTEDTHREEOFTHEFOURSECTIONSHAVEBEENCONFIRMEDTOHAVEBEENSOLVEDNOONEHASYETBEENABLETOSOLVETHEREMAININGNINETYSEVENCHARACTERMESSAGE
            FIRELING	-1262.44,	13,	CRYPTANALYSIS,	THESCULPTUREHASBEENBOTHAPUZZLEANDAMYSTERYFORTHOSEWHOHOPETOCRACKTHECYPHEREDMESSAGESCONTAINEDWITHINTHESCULPTURESTWOTHOUSANDALPHABETICLETTERSINTHETWENTYYEARSSINCEKRYPTOSWASERECTEDTHREEOFTHEFOURSECTIONSHAVEBEENCONFIRMEDTOHAVEBEENSOLVEDNOONEHASYETBEENABLETOSOLVETHEREMAININGNINETYSEVENCHARACTERMESSAGE
              GROOTY	-1262.44,	13,	CRYPTANALYSIS,	THESCULPTUREHASBEENBOTHAPUZZLEANDAMYSTERYFORTHOSEWHOHOPETOCRACKTHECYPHEREDMESSAGESCONTAINEDWITHINTHESCULPTURESTWOTHOUSANDALPHABETICLETTERSINTHETWENTYYEARSSINCEKRYPTOSWASERECTEDTHREEOFTHEFOURSECTIONSHAVEBEENCONFIRMEDTOHAVEBEENSOLVEDNOONEHASYETBEENABLETOSOLVETHEREMAININGNINETYSEVENCHARACTERMESSAGE
           HYPOTOXIC	-1262.44,	13,	CRYPTANALYSIS,	THESCULPTUREHASBEENBOTHAPUZZLEANDAMYSTERYFORTHOSEWHOHOPETOCRACKTHECYPHEREDMESSAGESCONTAINEDWITHINTHESCULPTURESTWOTHOUSANDALPHABETICLETTERSINTHETWENTYYEARSSINCEKRYPTOSWASERECTEDTHREEOFTHEFOURSECTIONSHAVEBEENCONFIRMEDTOHAVEBEENSOLVEDNOONEHASYETBEENABLETOSOLVETHEREMAININGNINETYSEVENCHARACTERMESSAGE
            JONGLERY	-1262.44,	13,	CRYPTANALYSIS,	THESCULPTUREHASBEENBOTHAPUZZLEANDAMYSTERYFORTHOSEWHOHOPETOCRACKTHECYPHEREDMESSAGESCONTAINEDWITHINTHESCULPTURESTWOTHOUSANDALPHABETICLETTERSINTHETWENTYYEARSSINCEKRYPTOSWASERECTEDTHREEOFTHEFOURSECTIONSHAVEBEENCONFIRMEDTOHAVEBEENSOLVEDNOONEHASYETBEENABLETOSOLVETHEREMAININGNINETYSEVENCHARACTERMESSAGE
         MALAPROPISH	-1262.44,	13,	CRYPTANALYSIS,	THESCULPTUREHASBEENBOTHAPUZZLEANDAMYSTERYFORTHOSEWHOHOPETOCRACKTHECYPHEREDMESSAGESCONTAINEDWITHINTHESCULPTURESTWOTHOUSANDALPHABETICLETTERSINTHETWENTYYEARSSINCEKRYPTOSWASERECTEDTHREEOFTHEFOURSECTIONSHAVEBEENCONFIRMEDTOHAVEBEENSOLVEDNOONEHASYETBEENABLETOSOLVETHEREMAININGNINETYSEVENCHARACTERMESSAGE
           MULTIFOIL	-1262.44,	13,	CRYPTANALYSIS,	THESCULPTUREHASBEENBOTHAPUZZLEANDAMYSTERYFORTHOSEWHOHOPETOCRACKTHECYPHEREDMESSAGESCONTAINEDWITHINTHESCULPTURESTWOTHOUSANDALPHABETICLETTERSINTHETWENTYYEARSSINCEKRYPTOSWASERECTEDTHREEOFTHEFOURSECTIONSHAVEBEENCONFIRMEDTOHAVEBEENSOLVEDNOONEHASYETBEENABLETOSOLVETHEREMAININGNINETYSEVENCHARACTERMESSAGE
        OMBROPHOBOUS	-1262.44,	13,	CRYPTANALYSIS,	THESCULPTUREHASBEENBOTHAPUZZLEANDAMYSTERYFORTHOSEWHOHOPETOCRACKTHECYPHEREDMESSAGESCONTAINEDWITHINTHESCULPTURESTWOTHOUSANDALPHABETICLETTERSINTHETWENTYYEARSSINCEKRYPTOSWASERECTEDTHREEOFTHEFOURSECTIONSHAVEBEENCONFIRMEDTOHAVEBEENSOLVEDNOONEHASYETBEENABLETOSOLVETHEREMAININGNINETYSEVENCHARACTERMESSAGE
           PAUPERATE	-1262.44,	13,	CRYPTANALYSIS,	THESCULPTUREHASBEENBOTHAPUZZLEANDAMYSTERYFORTHOSEWHOHOPETOCRACKTHECYPHEREDMESSAGESCONTAINEDWITHINTHESCULPTURESTWOTHOUSANDALPHABETICLETTERSINTHETWENTYYEARSSINCEKRYPTOSWASERECTEDTHREEOFTHEFOURSECTIONSHAVEBEENCONFIRMEDTOHAVEBEENSOLVEDNOONEHASYETBEENABLETOSOLVETHEREMAININGNINETYSEVENCHARACTERMESSAGE
        POLYSYLLABIC	-1262.44,	13,	CRYPTANALYSIS,	THESCULPTUREHASBEENBOTHAPUZZLEANDAMYSTERYFORTHOSEWHOHOPETOCRACKTHECYPHEREDMESSAGESCONTAINEDWITHINTHESCULPTURESTWOTHOUSANDALPHABETICLETTERSINTHETWENTYYEARSSINCEKRYPTOSWASERECTEDTHREEOFTHEFOURSECTIONSHAVEBEENCONFIRMEDTOHAVEBEENSOLVEDNOONEHASYETBEENABLETOSOLVETHEREMAININGNINETYSEVENCHARACTERMESSAGE
           PYGOPAGUS	-1262.44,	13,	CRYPTANALYSIS,	THESCULPTUREHASBEENBOTHAPUZZLEANDAMYSTERYFORTHOSEWHOHOPETOCRACKTHECYPHEREDMESSAGESCONTAINEDWITHINTHESCULPTURESTWOTHOUSANDALPHABETICLETTERSINTHETWENTYYEARSSINCEKRYPTOSWASERECTEDTHREEOFTHEFOURSECTIONSHAVEBEENCONFIRMEDTOHAVEBEENSOLVEDNOONEHASYETBEENABLETOSOLVETHEREMAININGNINETYSEVENCHARACTERMESSAGE
           ROMERILLO	-1262.44,	13,	CRYPTANALYSIS,	THESCULPTUREHASBEENBOTHAPUZZLEANDAMYSTERYFORTHOSEWHOHOPETOCRACKTHECYPHEREDMESSAGESCONTAINEDWITHINTHESCULPTURESTWOTHOUSANDALPHABETICLETTERSINTHETWENTYYEARSSINCEKRYPTOSWASERECTEDTHREEOFTHEFOURSECTIONSHAVEBEENCONFIRMEDTOHAVEBEENSOLVEDNOONEHASYETBEENABLETOSOLVETHEREMAININGNINETYSEVENCHARACTERMESSAGE
          SHRIMPLIKE	-1262.44,	13,	CRYPTANALYSIS,	THESCULPTUREHASBEENBOTHAPUZZLEANDAMYSTERYFORTHOSEWHOHOPETOCRACKTHECYPHEREDMESSAGESCONTAINEDWITHINTHESCULPTURESTWOTHOUSANDALPHABETICLETTERSINTHETWENTYYEARSSINCEKRYPTOSWASERECTEDTHREEOFTHEFOURSECTIONSHAVEBEENCONFIRMEDTOHAVEBEENSOLVEDNOONEHASYETBEENABLETOSOLVETHEREMAININGNINETYSEVENCHARACTERMESSAGE
           STRIDENCE	-1262.44,	13,	CRYPTANALYSIS,	THESCULPTUREHASBEENBOTHAPUZZLEANDAMYSTERYFORTHOSEWHOHOPETOCRACKTHECYPHEREDMESSAGESCONTAINEDWITHINTHESCULPTURESTWOTHOUSANDALPHABETICLETTERSINTHETWENTYYEARSSINCEKRYPTOSWASERECTEDTHREEOFTHEFOURSECTIONSHAVEBEENCONFIRMEDTOHAVEBEENSOLVEDNOONEHASYETBEENABLETOSOLVETHEREMAININGNINETYSEVENCHARACTERMESSAGE
           TETRAPODA	-1262.44,	13,	CRYPTANALYSIS,	THESCULPTUREHASBEENBOTHAPUZZLEANDAMYSTERYFORTHOSEWHOHOPETOCRACKTHECYPHEREDMESSAGESCONTAINEDWITHINTHESCULPTURESTWOTHOUSANDALPHABETICLETTERSINTHETWENTYYEARSSINCEKRYPTOSWASERECTEDTHREEOFTHEFOURSECTIONSHAVEBEENCONFIRMEDTOHAVEBEENSOLVEDNOONEHASYETBEENABLETOSOLVETHEREMAININGNINETYSEVENCHARACTERMESSAGE
       UNADVENTURING	-1262.44,	13,	CRYPTANALYSIS,	THESCULPTUREHASBEENBOTHAPUZZLEANDAMYSTERYFORTHOSEWHOHOPETOCRACKTHECYPHEREDMESSAGESCONTAINEDWITHINTHESCULPTURESTWOTHOUSANDALPHABETICLETTERSINTHETWENTYYEARSSINCEKRYPTOSWASERECTEDTHREEOFTHEFOURSECTIONSHAVEBEENCONFIRMEDTOHAVEBEENSOLVEDNOONEHASYETBEENABLETOSOLVETHEREMAININGNINETYSEVENCHARACTERMESSAGE
        UNPRETENDING	-1262.44,	13,	CRYPTANALYSIS,	THESCULPTUREHASBEENBOTHAPUZZLEANDAMYSTERYFORTHOSEWHOHOPETOCRACKTHECYPHEREDMESSAGESCONTAINEDWITHINTHESCULPTURESTWOTHOUSANDALPHABETICLETTERSINTHETWENTYYEARSSINCEKRYPTOSWASERECTEDTHREEOFTHEFOURSECTIONSHAVEBEENCONFIRMEDTOHAVEBEENSOLVEDNOONEHASYETBEENABLETOSOLVETHEREMAININGNINETYSEVENCHARACTERMESSAGE
              WAXILY	-1262.44,	13,	CRYPTANALYSIS,	THESCULPTUREHASBEENBOTHAPUZZLEANDAMYSTERYFORTHOSEWHOHOPETOCRACKTHECYPHEREDMESSAGESCONTAINEDWITHINTHESCULPTURESTWOTHOUSANDALPHABETICLETTERSINTHETWENTYYEARSSINCEKRYPTOSWASERECTEDTHREEOFTHEFOURSECTIONSHAVEBEENCONFIRMEDTOHAVEBEENSOLVEDNOONEHASYETBEENABLETOSOLVETHEREMAININGNINETYSEVENCHARACTERMESSAGE

Again, the key was found successfully. Interestingly, before finding CRYPTANALYSIS, very similar words of the same length were identified as candidate keys:

- ACROPARALYSIS
- CHROMATOLYSIS
- CRYPTANALYSIS

Keyed Caesar Variants

Keyed Caesar Variants

The Caesar Cipher is a substitution cipher, which substitutes letters by letters at the same index in a shifted version of the alphabet, i.e. in case of rot13, a Caesar cipher with key 'N'

    plain  alphabet       ABCDEFGHIJKLMNOPQRSTUVWXYZ
    cipher alphabet       NOPQRSTUVWXYZABCDEFGHIJKLM (and other shifted versions of it, shown alphabet corresponds to key 'N')

A variant of the Caesar cipher is the Keyed Caesar Cipher, which uses an additional alphabet keyword to create a keyed cipher alphabet, i.e. in case the alphabet keyword is KRYPTOS:

    plain  alphabet       ABCDEFGHIJKLMNOPQRSTUVWXYZ
    cipher alphabet       KRYPTOSABCDEFGHIJLMNQUVWXZ (and other shifted versions of it, shown alphabet corresponds to key 'A')
                          RYPTOSABCDEFGHIJLMNQUVWXZK (and other shifted versions of it, shown alphabet corresponds to key 'B')


Finally, we can also use a keyed alphabet for both the plain alphabet and the cipher alphabet to obtain we will call a Caesar Cipher with a keyed alphabet here (note the difference with Keyed Caesar):

    plain  alphabet       KRYPTOSABCDEFGHIJLMNQUVWXZ
    cipher alphabet       KRYPTOSABCDEFGHIJLMNQUVWXZ (and other shifted versions of it, shown alphabet corresponds to key 'K')
                          RYPTOSABCDEFGHIJLMNQUVWXZK (and other shifted versions of it, shown alphabet corresponds to key 'R')

First, let's define a function for computing a keyed alphabet. It takes an alphabet and a keyword consisting of symbols from that alphabet. Then, it removes all but the first occurence of each character in the keyword (i.e. CRYPTANALYSIS becomes CRYPTANLSI), and then appends the remaining characters of the alphabet in order to obtain the keyed alphabet (i.e. CRYPTANLSIBDEFGHJKMOQUVWXZ)


In [27]:
import re

def compute_keyed_alphabet(keyword, alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
    # remove double keyword letters, keeping first occurence
    keyword_unique = ""
    for c in keyword:
        if c not in keyword_unique:
            keyword_unique += c
    # compute cipher alphabet
    keyed_alphabet = keyword_unique + re.sub("[%s]" % keyword, "", alphabet)
    return keyed_alphabet

In [28]:
def keyed_caesar(s, key, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ", alpha_key="A", direction=1):
    # compute cipher alphabet
    keyed_alphabet = compute_keyed_alphabet(alpha_key, alphabet)
    # set source and destination alphabet depending on direction
    if direction == 1:
        src_alphabet = alphabet
        dst_alphabet = keyed_alphabet
    else:
        src_alphabet = keyed_alphabet
        dst_alphabet = alphabet
    # encrypt / decrypt
    keyval = alphabet.find(key)
    t = ""
    for sc in s:
        i = src_alphabet.find(sc)
        t += dst_alphabet[(i + keyval * direction) % len(dst_alphabet)] if i > -1 else sc
    return t

Cryptanalysis - Keyed Caesar

Effect of Keyed Caesar on Frequency Distributions


In [29]:
ptext = "THESCULPTUREHASBEENBOTHAPUZZLEANDAMYSTERYFORTHOSEWHOHOPETOCRACKTHECYPHEREDMESSAGESCONTAINEDWITHINTHESCULPTURESTWOTHOUSANDALPHABETICLETTERSINTHETWENTYYEARSSINCEKRYPTOSWASERECTEDTHREEOFTHEFOURSECTIONSHAVEBEENCONFIRMEDTOHAVEBEENSOLVEDNOONEHASYETBEENABLETOSOLVETHEREMAININGNINETYSEVENCHARACTERMESSAGE"

Now, like we did before with the normal Caesar cipher and look at the effect of the cipher on the frequency distribution:


In [30]:
# https://blog.dominodatalab.com/interactive-dashboards-in-jupyter/
# http://ipywidgets.readthedocs.io/en/latest/examples/Using%20Interact.html
from __future__ import division
from matplotlib import pyplot as plt
import numpy as np
from ipywidgets import interact, interactive, fixed
#import ipywidgets as widgets
plt.rcParams["figure.figsize"] = [12,9]

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
keyed_alphabet = compute_keyed_alphabet("KRYPTOS", alphabet)  # UNCOMMENT IF YOU WANT ORDER THE 

def rotate(l, n):
    return l[n:] + l[:n]

@interact(key_i = (0,25,1))
def plot_graphs(key_i = 0):
    # encrypt ptext using this key
    key = alphabet[key_i]
    ctext = keyed_caesar(ptext, key, alpha_key="KRYPTOS", direction=1)
    
    # determine letter frequency in plaintext, and plot
    ptext_freq = np.array([ptext.count(c)/len(ptext) for c in alphabet])
    pfig = plt.subplot(3,1,1)
    plt.bar(range(len(alphabet)), ptext_freq, tick_label = list(alphabet), align = 'center', color = 'b')
    plt.title('blue = plaintext letter frequency, red = ciphertext letter frequency')
  
    # determine letter frequency in ciphertext, and plot, ordered by keyed alphabet
    ctext_freq = np.array([ctext.count(c)/len(ctext) for c in keyed_alphabet])
    cfig = plt.subplot(3,1,2)
    plt.bar(range(len(keyed_alphabet)), ctext_freq, tick_label = rotate(list(keyed_alphabet), key_i), align = 'center', color = 'r')
    plt.xlabel("key = %s" % key)

    # determine letter frequency in ciphertext, and plot, ordered by alphabet
    ctext_freq = np.array([ctext.count(c)/len(ctext) for c in alphabet])
    cfig = plt.subplot(3,1,3)
    plt.bar(range(len(alphabet)), ctext_freq, tick_label = rotate(list(alphabet), key_i), align = 'center', color = 'r')
    plt.xlabel("key = %s" % key)
    
    plt.show()
    print ctext


NATMYQEINQLTAKMRTTGRHNAKIQZZETKGPKFXMNTLXOHLNAHMTVAHAHITNHYLKYDNATYXIATLTPFTMMKSTMYHGNKBGTPVBNABGNATMYQEINQLTMNVHNAHQMKGPKEIAKRTNBYETNNTLMBGNATNVTGNXXTKLMMBGYTDLXINHMVKMTLTYNTPNALTTHONATOHQLMTYNBHGMAKUTRTTGYHGOBLFTPNHAKUTRTTGMHEUTPGHHGTAKMXTNRTTGKRETNHMHEUTNATLTFKBGBGSGBGTNXMTUTGYAKLKYNTLFTMMKST

We can see that due to the keyed alphabet, the effect of Keyed Caesar is not just a shift of the frequency distributions (unless we look at the frequencies in order of the keyed alphabet).

Determine Key by Aligning Frequency Distributions?

Now, we'll start with a ciphertext resulting from Keyed Caesar encryption, and look at decrypting it by aligning frequency distributions. Note that when just given the ciphertext, we have no idea what the key to create the keyed alphabet is.


In [31]:
ctext = keyed_caesar(ptext, "N", alpha_key="KRYPTOS", direction=1)
ctext


Out[31]:
'SQLOIAXYSATLQGOHLLKHRSQGYAFFXLGKJGZEOSLTEMRTSQROLCQRQRYLSRITGIWSQLIEYQLTLJZLOOGNLOIRKSGUKLJCUSQUKSQLOIAXYSATLOSCRSQRAOGKJGXYQGHLSUIXLSSLTOUKSQLSCLKSEELGTOOUKILWTEYSROCGOLTLISLJSQTLLRMSQLMRATOLISURKOQGBLHLLKIRKMUTZLJSRQGBLHLLKORXBLJKRRKLQGOELSHLLKGHXLSRORXBLSQLTLZGUKUKNKUKLSEOLBLKIQGTGISLTZLOOGNL'

In [32]:
# https://blog.dominodatalab.com/interactive-dashboards-in-jupyter/
# http://ipywidgets.readthedocs.io/en/latest/examples/Using%20Interact.html
from __future__ import division
from matplotlib import pyplot as plt
import numpy as np
from ipywidgets import interact, interactive, fixed
#import ipywidgets as widgets
plt.rcParams["figure.figsize"] = [12,6]

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def rotate(l, n):
    return l[n:] + l[:n]

@interact(key_i = (0,25,1))
def plot_graphs(key_i = 0):
    # decrypt ciphertext using this key
    key = alphabet[key_i]
    # NOTE THAT WE DO NOT KNOW THE KEY TO THE ALPHABET, SO WE CANNOT USE THAT!!!
    decrypted_ctext = keyed_caesar(ctext, key, alpha_key = "A", direction=-1)
    # determine letter frequency in plaintext, and plot # FIXME: base on separate text
    ptext_freq = np.array([ptext.count(c)/len(ptext) for c in alphabet])
    pfig = plt.subplot(2,1,1)
    plt.bar(range(len(alphabet)), ptext_freq, tick_label = list(alphabet), align = 'center', color = 'b')
    plt.title('blue = plaintext letter frequency, green = decrypted ciphertext letter frequency')

    # determine letter frequency in ciphertext, and plot
    ctext_freq = np.array([decrypted_ctext.count(c)/len(decrypted_ctext) for c in alphabet])
    cfig = plt.subplot(2,1,2)
    plt.bar(range(len(alphabet)), ctext_freq, tick_label = rotate(list(alphabet), key_i), align = 'center', color = 'g')
    plt.xlabel("key = %s" % key)
    
    plt.show()
    print decrypted_ctext


SQLOIAXYSATLQGOHLLKHRSQGYAFFXLGKJGZEOSLTEMRTSQROLCQRQRYLSRITGIWSQLIEYQLTLJZLOOGNLOIRKSGUKLJCUSQUKSQLOIAXYSATLOSCRSQRAOGKJGXYQGHLSUIXLSSLTOUKSQLSCLKSEELGTOOUKILWTEYSROCGOLTLISLJSQTLLRMSQLMRATOLISURKOQGBLHLLKIRKMUTZLJSRQGBLHLLKORXBLJKRRKLQGOELSHLLKGHXLSRORXBLSQLTLZGUKUKNKUKLSEOLBLKIQGTGISLTZLOOGNL

For easy reference, this is the subtitution when keying the alphabet with 'KRYPTOS', and using key 'N':

    plain  alphabet       ABCDEFGHIJKLMNOPQRSTUVWXYZ
    cipher alphabet       KRYPTOSABCDEFGHIJLMNQUVWXZ (and other shifted versions of it, shown alphabet corresponds to key 'A')
                          GHIJLMNQUVWXZKRYPTOSABCDEF (and other shifted versions of it, shown alphabet corresponds to key 'N')

From this plot, it can be seen that when we do not know what was the keyword for creating the keyed alphabet, the cipher cannot be broken by simply aligning the frequency distributions and a more general approach for breaking the substitution is needed.

Keyed Caesar Solution #1: Maximum Likelihood, Combined with Search Procedure

However, since it is a substitution cipher, we could find the substitution and entire cipher alphabet as follows:

  • guessing the individual substitutions based on frequency: assume most frequent ciphertext character is 'E' etc
  • given this cipher alphabet, gradually try to improve the substitution by swapping characters in the cipher alphabet
  • measure the likelihood of plaintext being English as before, and keep track of best keys found so far

More on this approach can be found here.


In [33]:
ctext = keyed_caesar(ptext, "N", alpha_key="KRYPTOS", direction=1)
ctext


Out[33]:
'SQLOIAXYSATLQGOHLLKHRSQGYAFFXLGKJGZEOSLTEMRTSQROLCQRQRYLSRITGIWSQLIEYQLTLJZLOOGNLOIRKSGUKLJCUSQUKSQLOIAXYSATLOSCRSQRAOGKJGXYQGHLSUIXLSSLTOUKSQLSCLKSEELGTOOUKILWTEYSROCGOLTLISLJSQTLLRMSQLMRATOLISURKOQGBLHLLKIRKMUTZLJSRQGBLHLLKORXBLJKRRKLQGOELSHLLKGHXLSRORXBLSQLTLZGUKUKNKUKLSEOLBLKIQGTGISLTZLOOGNL'

In [34]:
def substitution(s, key, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ", direction=1):
    if direction == 1:
        src_alphabet = alphabet
        dst_alphabet = key
    else:
        src_alphabet = key
        dst_alphabet = alphabet
    
    t = ""
    for c in s:
        if c in src_alphabet:
            t += dst_alphabet[src_alphabet.find(c)]
        else:
            t += c
    return t

In [35]:
import random
import re
from ngram_score import ngram_score
fitness = ngram_score('ngrams/en_sherlock_4grams') # load our quadgram statistics

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
maxkey = list(alphabet)
maxscore = -99e9
parentscore,parentkey = maxscore,maxkey[:]

print "Substitution Cipher solver, you may have to wait several iterations for the correct result. Halt execution by halting the kernel, using the stop button in the toolbar."
# keep going until we are killed by the user
i = 0
rec = nbest(1000)

MAX_ITER = 10

while i < MAX_ITER:
    i = i+1
    random.shuffle(parentkey)
    # KEYED CAESAR WITH KEY "A" IS JUST A SUBSTITUTION CIPHER
    deciphered = substitution(ctext, "".join(parentkey), alphabet, direction=-1)
    parentscore = fitness.score(deciphered)
    count = 0
    while count < 1000:
        a = random.randint(0,len(alphabet)-1)
        b = random.randint(0,len(alphabet)-1)
        child = parentkey[:]
        # swap two characters in the child
        child[a],child[b] = child[b],child[a]
        deciphered = substitution(ctext, "".join(child), alphabet, direction=-1)
        score = fitness.score(deciphered)
        # if the child was better, replace the parent with it
        if score > parentscore:
            parentscore = score
            parentkey = child[:]
            count = 0
        count = count+1
        rec.add((score, child, deciphered))
    # keep track of best score printed so far, and print if improved
    if parentscore > maxscore:
        maxscore = rec[0][0]
        maxkey = rec[0][1]
        #deciphered = rec[0][2]
        deciphered = substitution(ctext, "".join(maxkey), alphabet, direction=-1)
        print '\nbest score so far:',maxscore,'on iteration',i
        print '    best key: '+ "".join(maxkey)
        print '    plaintext: '+ deciphered


Substitution Cipher solver, you may have to wait several iterations for the correct result. Halt execution by halting the kernel, using the stop button in the toolbar.

best score so far: -1574.47410192 on iteration 1
    best key: RBZJLWMQIVAOHTGCPKSEUNYFXD
    plaintext: SHELIKYWSKNEHOLMEERMASHOWKXXYEORDOCTLSENTGANSHALEPHAHAWESAINOIFSHEITWHENEDCELLOVELIARSOUREDPUSHURSHELIKYWSKNELSPASHAKLORDOYWHOMESUIYESSENLURSHESPERSTTEONLLURIEFNTWSALPOLENEISEDSHNEEAGSHEGAKNLEISUARLHOBEMEERIARGUNCEDSAHOBEMEERLAYBEDRAAREHOLTESMEEROMYESALAYBESHENECOURURVRURESTLEBERIHONOISENCELLOVE

best score so far: -1262.44303835 on iteration 2
    best key: GHIJLMNQUPWXZKRYVTOSABCDEF
    plaintext: THESCULPTUREHASBEENBOTHAPUZZLEANDAMYSTERYFORTHOSEWHOHOPETOCRACKTHECYPHEREDMESSAGESCONTAINEDWITHINTHESCULPTURESTWOTHOUSANDALPHABETICLETTERSINTHETWENTYYEARSSINCEKRYPTOSWASERECTEDTHREEOFTHEFOURSECTIONSHAVEBEENCONFIRMEDTOHAVEBEENSOLVEDNOONEHASYETBEENABLETOSOLVETHEREMAININGNINETYSEVENCHARACTERMESSAGE

Cryptanalysis - Caesar with Keyed Alphabet

Effect of Caesar with Keyed Alphabet on Frequency Distributions


In [36]:
def caesar_keyed_alphabet(s, key, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ", alpha_key="", direction=1):
    # compute keyed alphabet
    keyed_alphabet = compute_keyed_alphabet(alpha_key, alphabet)
    return rot(s, key, keyed_alphabet, direction)

In [37]:
ptext = "THESCULPTUREHASBEENBOTHAPUZZLEANDAMYSTERYFORTHOSEWHOHOPETOCRACKTHECYPHEREDMESSAGESCONTAINEDWITHINTHESCULPTURESTWOTHOUSANDALPHABETICLETTERSINTHETWENTYYEARSSINCEKRYPTOSWASERECTEDTHREEOFTHEFOURSECTIONSHAVEBEENCONFIRMEDTOHAVEBEENSOLVEDNOONEHASYETBEENABLETOSOLVETHEREMAININGNINETYSEVENCHARACTERMESSAGE"
alpha_key_ = "KRYPTOS"

In [38]:
# https://blog.dominodatalab.com/interactive-dashboards-in-jupyter/
# http://ipywidgets.readthedocs.io/en/latest/examples/Using%20Interact.html
from __future__ import division
from matplotlib import pyplot as plt
import numpy as np
from ipywidgets import interact, interactive, fixed
#import ipywidgets as widgets
plt.rcParams["figure.figsize"] = [12,9]
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
keyed_alphabet = compute_keyed_alphabet(alpha_key_, alphabet)

def rotate(l, n):
    return l[n:] + l[:n]

@interact(key_i = (0,25,1))
def plot_graphs(key_i = 0):
    # encrypt plaintext using this key
    key = keyed_alphabet[key_i]
    ctext = caesar_keyed_alphabet(ptext, key, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ", alpha_key=alpha_key_, direction=1)    
    
    # determine letter frequency in plaintext, and plot
    ptext_freq = np.array([ptext.count(c)/len(ptext) for c in keyed_alphabet])
    pfig = plt.subplot(3,1,1)
    plt.bar(range(len(keyed_alphabet)), ptext_freq, tick_label = list(keyed_alphabet), align = 'center', color = 'b')
    plt.title('blue = plaintext letter frequency, red = ciphertext letter frequency')

    # determine letter frequency in ciphertext, and plot, in order keyed alphabet
    ctext_freq = np.array([ctext.count(c)/len(ctext) for c in keyed_alphabet])
    cfig = plt.subplot(3,1,2)
    plt.bar(range(len(keyed_alphabet)), ctext_freq, tick_label = rotate(list(keyed_alphabet), key_i), align = 'center', color = 'r')
    plt.xlabel("key = %s" % key)
    
    # determine letter frequency in ciphertext, and plot, in order of alphabet
    ctext_freq = np.array([ctext.count(c)/len(ctext) for c in alphabet])
    cfig = plt.subplot(3,1,3)
    plt.bar(range(len(alphabet)), ctext_freq, tick_label = rotate(list(alphabet), key_i), align = 'center', color = 'r')
    plt.xlabel("key = %s" % key)
    
    plt.show()
    print ctext


THESCULPTUREHASBEENBOTHAPUZZLEANDAMYSTERYFORTHOSEWHOHOPETOCRACKTHECYPHEREDMESSAGESCONTAINEDWITHINTHESCULPTURESTWOTHOUSANDALPHABETICLETTERSINTHETWENTYYEARSSINCEKRYPTOSWASERECTEDTHREEOFTHEFOURSECTIONSHAVEBEENCONFIRMEDTOHAVEBEENSOLVEDNOONEHASYETBEENABLETOSOLVETHEREMAININGNINETYSEVENCHARACTERMESSAGE

If we look at the frequencies in order of the keyed alphabet, then again the effect of Caesar with Keyed Alphabet can be seen as shifting the frequency distributions. However, this is not the case when the looking at the frequencies in order of the alphabet.

Determine Key by Aligning Frequency Distributions?


In [39]:
ptext = "THESCULPTUREHASBEENBOTHAPUZZLEANDAMYSTERYFORTHOSEWHOHOPETOCRACKTHECYPHEREDMESSAGESCONTAINEDWITHINTHESCULPTURESTWOTHOUSANDALPHABETICLETTERSINTHETWENTYYEARSSINCEKRYPTOSWASERECTEDTHREEOFTHEFOURSECTIONSHAVEBEENCONFIRMEDTOHAVEBEENSOLVEDNOONEHASYETBEENABLETOSOLVETHEREMAININGNINETYSEVENCHARACTERMESSAGE"
ctext = caesar_keyed_alphabet(ptext, key="N", alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ", alpha_key="KRYPTOS", direction=1)    
ctext


Out[39]:
'WATZYHDVWHQTAKZRTTFRXWAKVHMMDTKFPKEUZWTQUOXQWAXZTJAXAXVTWXYQKYNWATYUVATQTPETZZKSTZYXFWKBFTPJBWABFWATZYHDVWHQTZWJXWAXHZKFPKDVAKRTWBYDTWWTQZBFWATWJTFWUUTKQZZBFYTNQUVWXZJKZTQTYWTPWAQTTXOWATOXHQZTYWBXFZAKITRTTFYXFOBQETPWXAKITRTTFZXDITPFXXFTAKZUTWRTTFKRDTWXZXDITWATQTEKBFBFSFBFTWUZTITFYAKQKYWTQETZZKST'

In [40]:
# https://blog.dominodatalab.com/interactive-dashboards-in-jupyter/
# http://ipywidgets.readthedocs.io/en/latest/examples/Using%20Interact.html
from __future__ import division
from matplotlib import pyplot as plt
import numpy as np
from ipywidgets import interact, interactive, fixed
#import ipywidgets as widgets
plt.rcParams["figure.figsize"] = [12,6]

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def rotate(l, n):
    return l[n:] + l[:n]

@interact(key_i = (0,25,1))
def plot_graphs(key_i = 0):
    # decrypt ciphertext using this key
    key = alphabet[key_i]
    # NOTE THAT WE DO NOT KNOW THE KEY TO THE ALPHABET, SO WE CANNOT USE THAT!!!
    decrypted_ctext = caesar_keyed_alphabet(ctext, key, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ", alpha_key="A", direction=-1) 
    
    # determine letter frequency in plaintext, and plot
    ptext_freq = np.array([ptext.count(c)/len(ptext) for c in alphabet])
    pfig = plt.subplot(2,1,1)
    plt.bar(range(len(alphabet)), ptext_freq, tick_label = list(alphabet), align = 'center', color = 'b')
    plt.title('blue = plaintext letter frequency, green = decrypted ciphertext letter frequency')

    # determine letter frequency in ciphertext, and plot
    ctext_freq = np.array([decrypted_ctext.count(c)/len(decrypted_ctext) for c in alphabet])
    cfig = plt.subplot(2,1,2)
    plt.bar(range(len(alphabet)), ctext_freq, tick_label = rotate(list(alphabet), key_i), align = 'center', color = 'g')
    plt.xlabel("key = %s" % key)
    
    plt.show()
    print decrypted_ctext


WATZYHDVWHQTAKZRTTFRXWAKVHMMDTKFPKEUZWTQUOXQWAXZTJAXAXVTWXYQKYNWATYUVATQTPETZZKSTZYXFWKBFTPJBWABFWATZYHDVWHQTZWJXWAXHZKFPKDVAKRTWBYDTWWTQZBFWATWJTFWUUTKQZZBFYTNQUVWXZJKZTQTYWTPWAQTTXOWATOXHQZTYWBXFZAKITRTTFYXFOBQETPWXAKITRTTFZXDITPFXXFTAKZUTWRTTFKRDTWXZXDITWATQTEKBFBFSFBFTWUZTITFYAKQKYWTQETZZKST

Caesar with Keyed Alphabet Solution #1: Maximum Likelihood, Combined with Search Procedure


In [41]:
ptext = "THESCULPTUREHASBEENBOTHAPUZZLEANDAMYSTERYFORTHOSEWHOHOPETOCRACKTHECYPHEREDMESSAGESCONTAINEDWITHINTHESCULPTURESTWOTHOUSANDALPHABETICLETTERSINTHETWENTYYEARSSINCEKRYPTOSWASERECTEDTHREEOFTHEFOURSECTIONSHAVEBEENCONFIRMEDTOHAVEBEENSOLVEDNOONEHASYETBEENABLETOSOLVETHEREMAININGNINETYSEVENCHARACTERMESSAGE"
ctext = caesar_keyed_alphabet(ptext, key="N", alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ", alpha_key="KRYPTOS", direction=1)    
ctext


Out[41]:
'WATZYHDVWHQTAKZRTTFRXWAKVHMMDTKFPKEUZWTQUOXQWAXZTJAXAXVTWXYQKYNWATYUVATQTPETZZKSTZYXFWKBFTPJBWABFWATZYHDVWHQTZWJXWAXHZKFPKDVAKRTWBYDTWWTQZBFWATWJTFWUUTKQZZBFYTNQUVWXZJKZTQTYWTPWAQTTXOWATOXHQZTYWBXFZAKITRTTFYXFOBQETPWXAKITRTTFZXDITPFXXFTAKZUTWRTTFKRDTWXZXDITWATQTEKBFBFSFBFTWUZTITFYAKQKYWTQETZZKST'

In [42]:
import random
import re
from ngram_score import ngram_score
fitness = ngram_score('ngrams/en_sherlock_4grams') # load our quadgram statistics

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
maxkey = list(alphabet)
maxscore = -99e9
parentscore,parentkey = maxscore,maxkey[:]

print "Substitution Cipher solver, you may have to wait several iterations for the correct result."
# keep going until we are killed by the user
i = 0
rec = nbest(1000)

MAX_ITER = 10 # increase if needed
while i < MAX_ITER:
    i = i+1
    random.shuffle(parentkey)
    # KEYED CAESAR WITH KEY "A" IS JUST A SUBSTITUTION CIPHER
    deciphered = substitution(ctext, "".join(parentkey), alphabet, direction=-1)
    parentscore = fitness.score(deciphered)
    count = 0
    while count < 1000:
        a = random.randint(0,len(alphabet)-1)
        b = random.randint(0,len(alphabet)-1)
        child = parentkey[:]
        # swap two characters in the child
        child[a],child[b] = child[b],child[a]
        deciphered = substitution(ctext, "".join(child), alphabet, direction=-1)
        score = fitness.score(deciphered)
        # if the child was better, replace the parent with it
        if score > parentscore:
            parentscore = score
            parentkey = child[:]
            count = 0
        count = count+1
        rec.add((score, child, deciphered))
    # keep track of best score printed so far, and print if improved
    if parentscore > maxscore:
        maxscore = rec[0][0]
        maxkey = rec[0][1]
        #deciphered = rec[0][2]
        deciphered = substitution(ctext, "".join(maxkey), alphabet, direction=-1)
        print '\nbest score so far:',maxscore,'on iteration',i
        print '    best key: '+ "".join(maxkey)
        print '    plaintext: '+ deciphered


Substitution Cipher solver, you may have to wait several iterations for the correct result.

best score so far: -1732.88847392 on iteration 1
    best key: QIUKZSEHWNOPYAFJMTBXRCVGDL
    plaintext: INREMHYWIHARNDEURROUTINDWHQQYRDOLDGCEIRACKTAINTERPNTNTWRITMADMJINRMCWNRARLGREEDFREMTOIDSORLPSINSOINREMHYWIHAREIPTINTHEDOLDYWNDURISMYRIIRAESOINRIPROICCRDAEESOMRJACWITEPDERARMIRLINARRTKINRKTHAERMISTOENDBRURROMTOKSAGRLITNDBRURROETYBRLOTTORNDECRIURRODUYRITETYBRINRARGDSOSOFOSORICERBROMNDADMIRAGREEDFR

best score so far: -1722.60104784 on iteration 2
    best key: XNJBKRUEFMHDPWTILZQAYSOGVC
    plaintext: NTORUKLYNKSOTERFOOIFANTEYKJJLOEIMEHGRNOSGWASNTAROCTATAYONAUSEUBNTOUGYTOSOMHORREVORUAINEDIOMCDNTDINTORUKLYNKSORNCANTAKREIMELYTEFONDULONNOSRDINTONCOINGGOESRRDIUOBSGYNARCEROSOUNOMNTSOOAWNTOWAKSROUNDAIRTEPOFOOIUAIWDSHOMNATEPOFOOIRALPOMIAAIOTERGONFOOIEFLONARALPONTOSOHEDIDIVIDIONGROPOIUTESEUNOSHORREVO

best score so far: -1628.20389331 on iteration 4
    best key: XEOPTRNBFMSQDYKHGZWAUIJLVC
    plaintext: STERNPMYSPLETORFEEIFASTOYPJJMEOIDOBURSELUCALSTAREWTATAYESANLONGSTENUYTELEDBERROKERNAISOHIEDWHSTHISTERNPMYSPLERSWASTAPROIDOMYTOFESHNMESSELRHISTESWEISUUEOLRRHINEGLUYSARWORELENSEDSTLEEACSTECAPLRENSHAIRTOVEFEEINAICHLBEDSATOVEFEEIRAMVEDIAAIETORUESFEEIOFMESARAMVESTELEBOHIHIKIHIESUREVEINTOLONSELBERROKE

best score so far: -1602.73906767 on iteration 6
    best key: XMEZTJURHLNDYQKOGFWABISPVC
    plaintext: STEDMILYSINETODHEERHASTOYIBBLEORXOCGDSENGPANSTADEFTATAYESAMNOMKSTEMGYTENEXCEDDOWEDMARSOUREXFUSTURSTEDMILYSINEDSFASTAIDORXOLYTOHESUMLESSENDURSTESFERSGGEONDDURMEKNGYSADFODENEMSEXSTNEEAPSTEPAINDEMSUARDTOVEHEERMARPUNCEXSATOVEHEERDALVEXRAARETODGESHEEROHLESADALVESTENECOURURWRURESGDEVERMTONOMSENCEDDOWE

best score so far: -1262.44303835 on iteration 7
    best key: KRYPTOSABLNDEFXVGQZWHIJCUM
    plaintext: THESCULPTUREHASBEENBOTHAPUZZLEANDAMYSTERYFORTHOSEWHOHOPETOCRACKTHECYPHEREDMESSAGESCONTAINEDWITHINTHESCULPTURESTWOTHOUSANDALPHABETICLETTERSINTHETWENTYYEARSSINCEKRYPTOSWASERECTEDTHREEOFTHEFOURSECTIONSHAVEBEENCONFIRMEDTOHAVEBEENSOLVEDNOONEHASYETBEENABLETOSOLVETHEREMAININGNINETYSEVENCHARACTERMESSAGE

Keyed Vigenere

Now we will move on to the Keyed Vigenere cipher, which is very similar to the Caesar cipher with a keyed alphabet, except that the consecutive shifts of the plaintext characters are determined by a keyword. If the key has length $N$, then each $N^{th}$ character of the plaintext will be shifted by the same shift, determined by the corresponding character in the key. This relation of Vigenere being N interleaved Caesar ciphers with keyed alphabet is also visible from the code:


In [43]:
def keyed_vigenere(s, key, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ", alpha_key="", direction=1):
    # compute keyed alphabet
    keyed_alphabet = compute_keyed_alphabet(alpha_key, alphabet)
    t = []
    key_i = 0
    for i in range(len(s)):
        if s[i] in alphabet:
            t_i = rot(s[i], key[key_i % len(key)],keyed_alphabet, direction)
            t.append(t_i)
            key_i += 1
        else:
            t.append(s[i])
    return "".join(t)

Cryptanalysis - Keyed Vigenere

First, let's load some Keyed Vigenere ciphertext, with unknown key and unknown alphabet key.


In [44]:
ctext = "BYDMXVPQAMCWFJGOTCCNXTKHOGSEIPAJTWGNBYDNBHCJWLBNCBHPNRNXNHSUOXHLTQQKYAUEMBOUUBWGNQVWUFKKBRVIITHVWNOWCSEXWOICJBPANTCNRPTNHSJVVLICIRSEIAPMQEEHIIVRETZGSEIQPLAXLXKOTWPCSOJGOSEEOKETKBDOOBHSJVPMOWHLTSHHPOVGVATKBSEVTFDVHONNQALNUKOXHPCDQKYAUZSPQPSKAGFPOQWBCWUHLPLMSAAZPBIECBXOTFMIKTITTICPOFTVHONKFIZDNCAJIVWDNXUWUHLCSEWTIHTXOECKBGUUMOHSUUCNLEQJETEOMABYDRUURUHLSWSJDAPMPPDNJIOBTCADIHYNBRQOGEFKQDAEBEHXDOFCNGXEFJVSELPZEHTSCLPPJAHEAPLEIZACVODENXXACTANMLAJITWXSORTPOWDDUEMGZITKOSHSNRONPYGCXVWNHVBUSNXDNZHKQTLWNEPKCJEUYLRXECNRATZEXBNHOCPQWUKYGNBYDTOTPUYAQALSKCNRAIRTITNBTPPLLCGTCEBTVOTWHGTXMIEHKQPNRTZKHYPFAMOPUJCDUOWCSEZCGQEMGVAJTBUHLSLITJVVLICIRSEIITSQFSNCDSSLGLRXKONNPRKAJVVAEABLGSJCJRLOWFOGQDHXTGEYTOIDLVHPITAND"

Determine if Plaintext


In [45]:
IC(ctext)


Out[45]:
1.0775083214455536

The IC tells us that it is likely not a transposition of English plaintext.

Determine Key Length


In [46]:
from matplotlib import pyplot as plt
%matplotlib inline

mean_ic_n = []
for n in range(1,32):
    mean_ic = mean_IC(ctext.upper(), n)
    mean_ic_n.append(mean_ic)
    print "%2d %02f" % (n, mean_ic)
plt.rcParams["figure.figsize"] = [12,4]
plt.bar(range(1,len(mean_ic_n)+1), mean_ic_n, align = 'center')
plt.xlabel("key length")
plt.ylabel("average IC")


 1 1.077508
 2 1.080635
 3 1.262120
 4 1.086865
 5 1.075394
 6 1.296935
 7 1.062786
 8 1.072415
 9 1.732564
10 1.060137
11 1.038554
12 1.298596
13 1.063592
14 1.070083
15 1.226713
16 1.074336
17 1.069878
18 1.793372
19 1.013136
20 1.058571
21 1.320028
22 1.013282
23 1.069910
24 1.257485
25 1.119788
26 1.042735
27 1.688889
28 1.035714
29 1.081189
30 1.218928
31 1.102494
Out[46]:
<matplotlib.text.Text at 0x7f3111d22d50>

The average IC for different key lengths tells us that it is likely encrypted with a key of length 9. We therefore have to crack 9 Caesar ciphers with a Keyed Alphabet, each of them with their own shift as defined by the Vigenere key.

Frequency analysis of subsequences

Now, let's look at the frequency distributions for the 9 subsequences:


In [47]:
# https://blog.dominodatalab.com/interactive-dashboards-in-jupyter/
# http://ipywidgets.readthedocs.io/en/latest/examples/Using%20Interact.html
from __future__ import division
from matplotlib import pyplot as plt
import numpy as np
from ipywidgets import interact, interactive, fixed
#import ipywidgets as widgets
plt.rcParams["figure.figsize"] = [12,16]

def rotate(l, n):
    return l[n:] + l[:n]

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

key_length = 9

g_english = [0.0736, 0.0148, 0.0445, 0.0302, 0.102, 0.0227, 0.0122, 0.0277, 0.0855, 0.000557, 0.00237, 0.0342, 0.0206, 0.0717, 0.103, 0.0246, 0.00181, 0.0735, 0.0608, 0.0889, 0.0392, 0.0153, 0.0173, 0.000557, 0.032, 0.000278]
pfig = plt.subplot(key_length+1,1,1)
plt.bar(range(len(alphabet)), g_english, tick_label = rotate(list(alphabet), 0), align = 'center', color = 'b')

ctext_freq = []
for i in range(key_length):
    ctext_i = ctext[i::key_length]
    ctext_freq.append(np.array([ctext_i.count(c)/len(ctext_i) for c in alphabet]))
    
    cfig = plt.subplot(key_length+1,1,i+2)
    plt.bar(range(len(alphabet)), ctext_freq[i], tick_label = rotate(list(alphabet), 0), align = 'center', color = 'r')


Each distribution looks quite different as a result using Caesar ciphers keyed alphabet. Each of the 9 Caesar ciphers with keyed alphabet uses the same unknown keyed alphabet, and its own shift, as defined by the 9-letter Vigenere keyword. Therefore, we need to determine both the keyed alphabet, and the Vigenere keywords.

CHALLENGE: Determine Vigenere Key in Special Case of Keyed Alphabet Starting with Letter E


Luck has it that the key to determine the keyed alphabet starts with the letter E. As a result, it becomes pretty easy to determine the Vigenere key. Can you guess the Vigenere key from the frequency distribution plots above?

Case 1: Vigenere Key is Known, Alphabet Key Unknown

When the Vigenere key is known, the alphabet key can often be determined like before through e.g. brute force:


In [48]:
from ngram_score import ngram_score

import re
import pprint as pp

qgram = ngram_score('ngrams/en_sherlock_4grams') # load our 4gram statistics
trigram = ngram_score('ngrams/en_sherlock_3grams') # load our 3gram statistics

# keep track of the 100 best keys
N=100 
rec = nbest(N)
f = open('wordlists/websters-dictionary','r')
L = 42 # for speed, only consider first L characters in evaluation

i = 1
for alphakey_ in f:    
    alphakey_ = re.sub(r'[^A-Z]','',alphakey_.upper())
    decrypted_ctext = keyed_vigenere(ctext[:L], key = "LODESTONE", alpha_key = alphakey_, direction = -1)
    score = qgram.score(decrypted_ctext)
    rec.add((score,alphakey_, decrypted_ctext))
    i += 1
    if i % 10000 == 0:
        bestscore = rec[0][0]
        bestalphakey_ = rec[0][1]
        decrypted_ctext = keyed_vigenere(ctext, key = "LODESTONE", alpha_key = bestalphakey_, direction = -1)
        print "%20s\t%5.2f,\t%s,\t%s,\t%s" % (alphakey_, bestscore, len(bestalphakey_), bestalphakey_, decrypted_ctext)


          ANTICRISIS	-240.04,	8,	ALANGINE,	EHADENLOVKTOITHARNBOPOURALMIQCVTADLXEHAXORTFRASUNORLAKLFUEALAVEAEDJUEKSAKSBPNMDLXPCOPRTYIKUQWOSNDAFVTHAEGANNHSCVHATAKOEUEASCTWGTWKAKQYHKNXASJQTKIERYAKQOHAKPWETARROTHFTHAQAIAIALTSBFMSNMTNLJFVWTOARWMFUPMVLTSQAUELGGRAAXPKTXNTAVEOTAJUEKSURLDHATKLIOADROODSEALTDACKXHEQXNOIARIKQIOKAENNOALOGRAACDQRGHOKFZUDAXELDSEATHAIAQDOWAXNUMPSPKANMNLTAWINGALKAJVEHAKNLGSEAIOMTPKMDOLAXTJAIOBKAZSEOIKPAVARTNBVISXEEPACNLPPARSCQAALRASAIEWOLGVSKKMWIQRVPNABALFPVPAKADAKGZLGFQFQECFIPUSAKPRZLTAQEROFFHWHLNWCOXSNSSMLFAXCRYOOADUAXTTFATHTKEKTAKZERAEMODFBLDRNTHLXEHAOWALSTZNSWATTAKZQFOKAOIOOLTWPHEEAEEMFLGWLOWZWASTNMXQERCSELCVKACPTOUSFVTHACOPOAKPMVTASSEAITZLSCTWGTWKAKQNORNLMHOUQMAPTKETAAXOGIVTNCYAZSTYASTFKAAOIWHNBEWEVABAANGACNHKAKAG
         BEERBACHITE	-240.04,	8,	ALANGINE,	EHADENLOVKTOITHARNBOPOURALMIQCVTADLXEHAXORTFRASUNORLAKLFUEALAVEAEDJUEKSAKSBPNMDLXPCOPRTYIKUQWOSNDAFVTHAEGANNHSCVHATAKOEUEASCTWGTWKAKQYHKNXASJQTKIERYAKQOHAKPWETARROTHFTHAQAIAIALTSBFMSNMTNLJFVWTOARWMFUPMVLTSQAUELGGRAAXPKTXNTAVEOTAJUEKSURLDHATKLIOADROODSEALTDACKXHEQXNOIARIKQIOKAENNOALOGRAACDQRGHOKFZUDAXELDSEATHAIAQDOWAXNUMPSPKANMNLTAWINGALKAJVEHAKNLGSEAIOMTPKMDOLAXTJAIOBKAZSEOIKPAVARTNBVISXEEPACNLPPARSCQAALRASAIEWOLGVSKKMWIQRVPNABALFPVPAKADAKGZLGFQFQECFIPUSAKPRZLTAQEROFFHWHLNWCOXSNSSMLFAXCRYOOADUAXTTFATHTKEKTAKZERAEMODFBLDRNTHLXEHAOWALSTZNSWATTAKZQFOKAOIOOLTWPHEEAEEMFLGWLOWZWASTNMXQERCSELCVKACPTOUSFVTHACOPOAKPMVTASSEAITZLSCTWGTWKAKQNORNLMHOUQMAPTKETAAXOGIVTNCYAZSTYASTFKAAOIWHNBEWEVABAANGACNHKAKAG
            CAPRIOTE	-240.04,	8,	ALANGINE,	EHADENLOVKTOITHARNBOPOURALMIQCVTADLXEHAXORTFRASUNORLAKLFUEALAVEAEDJUEKSAKSBPNMDLXPCOPRTYIKUQWOSNDAFVTHAEGANNHSCVHATAKOEUEASCTWGTWKAKQYHKNXASJQTKIERYAKQOHAKPWETARROTHFTHAQAIAIALTSBFMSNMTNLJFVWTOARWMFUPMVLTSQAUELGGRAAXPKTXNTAVEOTAJUEKSURLDHATKLIOADROODSEALTDACKXHEQXNOIARIKQIOKAENNOALOGRAACDQRGHOKFZUDAXELDSEATHAIAQDOWAXNUMPSPKANMNLTAWINGALKAJVEHAKNLGSEAIOMTPKMDOLAXTJAIOBKAZSEOIKPAVARTNBVISXEEPACNLPPARSCQAALRASAIEWOLGVSKKMWIQRVPNABALFPVPAKADAKGZLGFQFQECFIPUSAKPRZLTAQEROFFHWHLNWCOXSNSSMLFAXCRYOOADUAXTTFATHTKEKTAKZERAEMODFBLDRNTHLXEHAOWALSTZNSWATTAKZQFOKAOIOOLTWPHEEAEEMFLGWLOWZWASTNMXQERCSELCVKACPTOUSFVTHACOPOAKPMVTASSEAITZLSCTWGTWKAKQNORNLMHOUQMAPTKETAAXOGIVTNCYAZSTYASTFKAAOIWHNBEWEVABAANGACNHKAKAG
          COMMORIENT	-240.04,	8,	ALANGINE,	EHADENLOVKTOITHARNBOPOURALMIQCVTADLXEHAXORTFRASUNORLAKLFUEALAVEAEDJUEKSAKSBPNMDLXPCOPRTYIKUQWOSNDAFVTHAEGANNHSCVHATAKOEUEASCTWGTWKAKQYHKNXASJQTKIERYAKQOHAKPWETARROTHFTHAQAIAIALTSBFMSNMTNLJFVWTOARWMFUPMVLTSQAUELGGRAAXPKTXNTAVEOTAJUEKSURLDHATKLIOADROODSEALTDACKXHEQXNOIARIKQIOKAENNOALOGRAACDQRGHOKFZUDAXELDSEATHAIAQDOWAXNUMPSPKANMNLTAWINGALKAJVEHAKNLGSEAIOMTPKMDOLAXTJAIOBKAZSEOIKPAVARTNBVISXEEPACNLPPARSCQAALRASAIEWOLGVSKKMWIQRVPNABALFPVPAKADAKGZLGFQFQECFIPUSAKPRZLTAQEROFFHWHLNWCOXSNSSMLFAXCRYOOADUAXTTFATHTKEKTAKZERAEMODFBLDRNTHLXEHAOWALSTZNSWATTAKZQFOKAOIOOLTWPHEEAEEMFLGWLOWZWASTNMXQERCSELCVKACPTOUSFVTHACOPOAKPMVTASSEAITZLSCTWGTWKAKQNORNLMHOUQMAPTKETAAXOGIVTNCYAZSTYASTFKAAOIWHNBEWEVABAANGACNHKAKAG
          DEGRADEDLY	-240.04,	8,	ALANGINE,	EHADENLOVKTOITHARNBOPOURALMIQCVTADLXEHAXORTFRASUNORLAKLFUEALAVEAEDJUEKSAKSBPNMDLXPCOPRTYIKUQWOSNDAFVTHAEGANNHSCVHATAKOEUEASCTWGTWKAKQYHKNXASJQTKIERYAKQOHAKPWETARROTHFTHAQAIAIALTSBFMSNMTNLJFVWTOARWMFUPMVLTSQAUELGGRAAXPKTXNTAVEOTAJUEKSURLDHATKLIOADROODSEALTDACKXHEQXNOIARIKQIOKAENNOALOGRAACDQRGHOKFZUDAXELDSEATHAIAQDOWAXNUMPSPKANMNLTAWINGALKAJVEHAKNLGSEAIOMTPKMDOLAXTJAIOBKAZSEOIKPAVARTNBVISXEEPACNLPPARSCQAALRASAIEWOLGVSKKMWIQRVPNABALFPVPAKADAKGZLGFQFQECFIPUSAKPRZLTAQEROFFHWHLNWCOXSNSSMLFAXCRYOOADUAXTTFATHTKEKTAKZERAEMODFBLDRNTHLXEHAOWALSTZNSWATTAKZQFOKAOIOOLTWPHEEAEEMFLGWLOWZWASTNMXQERCSELCVKACPTOUSFVTHACOPOAKPMVTASSEAITZLSCTWGTWKAKQNORNLMHOUQMAPTKETAAXOGIVTNCYAZSTYASTFKAAOIWHNBEWEVABAANGACNHKAKAG
               ELDER	-240.04,	8,	ALANGINE,	EHADENLOVKTOITHARNBOPOURALMIQCVTADLXEHAXORTFRASUNORLAKLFUEALAVEAEDJUEKSAKSBPNMDLXPCOPRTYIKUQWOSNDAFVTHAEGANNHSCVHATAKOEUEASCTWGTWKAKQYHKNXASJQTKIERYAKQOHAKPWETARROTHFTHAQAIAIALTSBFMSNMTNLJFVWTOARWMFUPMVLTSQAUELGGRAAXPKTXNTAVEOTAJUEKSURLDHATKLIOADROODSEALTDACKXHEQXNOIARIKQIOKAENNOALOGRAACDQRGHOKFZUDAXELDSEATHAIAQDOWAXNUMPSPKANMNLTAWINGALKAJVEHAKNLGSEAIOMTPKMDOLAXTJAIOBKAZSEOIKPAVARTNBVISXEEPACNLPPARSCQAALRASAIEWOLGVSKKMWIQRVPNABALFPVPAKADAKGZLGFQFQECFIPUSAKPRZLTAQEROFFHWHLNWCOXSNSSMLFAXCRYOOADUAXTTFATHTKEKTAKZERAEMODFBLDRNTHLXEHAOWALSTZNSWATTAKZQFOKAOIOOLTWPHEEAEEMFLGWLOWZWASTNMXQERCSELCVKACPTOUSFVTHACOPOAKPMVTASSEAITZLSCTWGTWKAKQNORNLMHOUQMAPTKETAAXOGIVTNCYAZSTYASTFKAAOIWHNBEWEVABAANGACNHKAKAG
            FIRELING	-160.88,	6,	ENIGMA,	THEMAINPARTOFTHESCULPTUREISLOCATEDINTHENORTHWESTCORNEROFTHENEWHEADQUARTERSBUILDINGCOURTYARDOUTSIDEOFTHEAGENCYSCAFETERIATHESCULPTURECOMPRISESFOURLARGECOPPERPLATESWITHOTHERELEMENTSCONSISTINGOFWATERWOODPLANTSREDANDGREENGRANITEWHITEQUARTZANDPETRIFIEDWOODTHENAMEKRYPTOSCOMESFROMTHEANCIENTGREEKWORDFORHIDDENANDTHETHEMEOFTHESCULPTUREISINTELLIGENCEGATHERINGTHEMOSTPROMINENTFEATUREISALARGEVERTICALSSHAPEDCOPPERSCREENRESEMBLINGASCROLLORAPIECEOFPAPEREMERGINGFROMACOMPUTERPRINTERHALFOFWHICHCONSISTSOFENCRYPTEDTEXTTHECHARACTERSAREALLFOUNDWITHINTHETWENTYSIXLETTERSOFTHELATINALPHABETALONGWITHQUESTIONMARKSANDARECUTOUTOFTHECOPPERPLATESTHEMAINSCULPTURECONTAINSFOURSEPARATEENIGMATICMESSAGESTHREEOFWHICHHAVEBEENDECIPHERED
              GROOTY	-160.88,	6,	ENIGMA,	THEMAINPARTOFTHESCULPTUREISLOCATEDINTHENORTHWESTCORNEROFTHENEWHEADQUARTERSBUILDINGCOURTYARDOUTSIDEOFTHEAGENCYSCAFETERIATHESCULPTURECOMPRISESFOURLARGECOPPERPLATESWITHOTHERELEMENTSCONSISTINGOFWATERWOODPLANTSREDANDGREENGRANITEWHITEQUARTZANDPETRIFIEDWOODTHENAMEKRYPTOSCOMESFROMTHEANCIENTGREEKWORDFORHIDDENANDTHETHEMEOFTHESCULPTUREISINTELLIGENCEGATHERINGTHEMOSTPROMINENTFEATUREISALARGEVERTICALSSHAPEDCOPPERSCREENRESEMBLINGASCROLLORAPIECEOFPAPEREMERGINGFROMACOMPUTERPRINTERHALFOFWHICHCONSISTSOFENCRYPTEDTEXTTHECHARACTERSAREALLFOUNDWITHINTHETWENTYSIXLETTERSOFTHELATINALPHABETALONGWITHQUESTIONMARKSANDARECUTOUTOFTHECOPPERPLATESTHEMAINSCULPTURECONTAINSFOURSEPARATEENIGMATICMESSAGESTHREEOFWHICHHAVEBEENDECIPHERED
           HYPOTOXIC	-160.88,	6,	ENIGMA,	THEMAINPARTOFTHESCULPTUREISLOCATEDINTHENORTHWESTCORNEROFTHENEWHEADQUARTERSBUILDINGCOURTYARDOUTSIDEOFTHEAGENCYSCAFETERIATHESCULPTURECOMPRISESFOURLARGECOPPERPLATESWITHOTHERELEMENTSCONSISTINGOFWATERWOODPLANTSREDANDGREENGRANITEWHITEQUARTZANDPETRIFIEDWOODTHENAMEKRYPTOSCOMESFROMTHEANCIENTGREEKWORDFORHIDDENANDTHETHEMEOFTHESCULPTUREISINTELLIGENCEGATHERINGTHEMOSTPROMINENTFEATUREISALARGEVERTICALSSHAPEDCOPPERSCREENRESEMBLINGASCROLLORAPIECEOFPAPEREMERGINGFROMACOMPUTERPRINTERHALFOFWHICHCONSISTSOFENCRYPTEDTEXTTHECHARACTERSAREALLFOUNDWITHINTHETWENTYSIXLETTERSOFTHELATINALPHABETALONGWITHQUESTIONMARKSANDARECUTOUTOFTHECOPPERPLATESTHEMAINSCULPTURECONTAINSFOURSEPARATEENIGMATICMESSAGESTHREEOFWHICHHAVEBEENDECIPHERED
            JONGLERY	-160.88,	6,	ENIGMA,	THEMAINPARTOFTHESCULPTUREISLOCATEDINTHENORTHWESTCORNEROFTHENEWHEADQUARTERSBUILDINGCOURTYARDOUTSIDEOFTHEAGENCYSCAFETERIATHESCULPTURECOMPRISESFOURLARGECOPPERPLATESWITHOTHERELEMENTSCONSISTINGOFWATERWOODPLANTSREDANDGREENGRANITEWHITEQUARTZANDPETRIFIEDWOODTHENAMEKRYPTOSCOMESFROMTHEANCIENTGREEKWORDFORHIDDENANDTHETHEMEOFTHESCULPTUREISINTELLIGENCEGATHERINGTHEMOSTPROMINENTFEATUREISALARGEVERTICALSSHAPEDCOPPERSCREENRESEMBLINGASCROLLORAPIECEOFPAPEREMERGINGFROMACOMPUTERPRINTERHALFOFWHICHCONSISTSOFENCRYPTEDTEXTTHECHARACTERSAREALLFOUNDWITHINTHETWENTYSIXLETTERSOFTHELATINALPHABETALONGWITHQUESTIONMARKSANDARECUTOUTOFTHECOPPERPLATESTHEMAINSCULPTURECONTAINSFOURSEPARATEENIGMATICMESSAGESTHREEOFWHICHHAVEBEENDECIPHERED
         MALAPROPISH	-160.88,	6,	ENIGMA,	THEMAINPARTOFTHESCULPTUREISLOCATEDINTHENORTHWESTCORNEROFTHENEWHEADQUARTERSBUILDINGCOURTYARDOUTSIDEOFTHEAGENCYSCAFETERIATHESCULPTURECOMPRISESFOURLARGECOPPERPLATESWITHOTHERELEMENTSCONSISTINGOFWATERWOODPLANTSREDANDGREENGRANITEWHITEQUARTZANDPETRIFIEDWOODTHENAMEKRYPTOSCOMESFROMTHEANCIENTGREEKWORDFORHIDDENANDTHETHEMEOFTHESCULPTUREISINTELLIGENCEGATHERINGTHEMOSTPROMINENTFEATUREISALARGEVERTICALSSHAPEDCOPPERSCREENRESEMBLINGASCROLLORAPIECEOFPAPEREMERGINGFROMACOMPUTERPRINTERHALFOFWHICHCONSISTSOFENCRYPTEDTEXTTHECHARACTERSAREALLFOUNDWITHINTHETWENTYSIXLETTERSOFTHELATINALPHABETALONGWITHQUESTIONMARKSANDARECUTOUTOFTHECOPPERPLATESTHEMAINSCULPTURECONTAINSFOURSEPARATEENIGMATICMESSAGESTHREEOFWHICHHAVEBEENDECIPHERED
           MULTIFOIL	-160.88,	6,	ENIGMA,	THEMAINPARTOFTHESCULPTUREISLOCATEDINTHENORTHWESTCORNEROFTHENEWHEADQUARTERSBUILDINGCOURTYARDOUTSIDEOFTHEAGENCYSCAFETERIATHESCULPTURECOMPRISESFOURLARGECOPPERPLATESWITHOTHERELEMENTSCONSISTINGOFWATERWOODPLANTSREDANDGREENGRANITEWHITEQUARTZANDPETRIFIEDWOODTHENAMEKRYPTOSCOMESFROMTHEANCIENTGREEKWORDFORHIDDENANDTHETHEMEOFTHESCULPTUREISINTELLIGENCEGATHERINGTHEMOSTPROMINENTFEATUREISALARGEVERTICALSSHAPEDCOPPERSCREENRESEMBLINGASCROLLORAPIECEOFPAPEREMERGINGFROMACOMPUTERPRINTERHALFOFWHICHCONSISTSOFENCRYPTEDTEXTTHECHARACTERSAREALLFOUNDWITHINTHETWENTYSIXLETTERSOFTHELATINALPHABETALONGWITHQUESTIONMARKSANDARECUTOUTOFTHECOPPERPLATESTHEMAINSCULPTURECONTAINSFOURSEPARATEENIGMATICMESSAGESTHREEOFWHICHHAVEBEENDECIPHERED
        OMBROPHOBOUS	-160.88,	6,	ENIGMA,	THEMAINPARTOFTHESCULPTUREISLOCATEDINTHENORTHWESTCORNEROFTHENEWHEADQUARTERSBUILDINGCOURTYARDOUTSIDEOFTHEAGENCYSCAFETERIATHESCULPTURECOMPRISESFOURLARGECOPPERPLATESWITHOTHERELEMENTSCONSISTINGOFWATERWOODPLANTSREDANDGREENGRANITEWHITEQUARTZANDPETRIFIEDWOODTHENAMEKRYPTOSCOMESFROMTHEANCIENTGREEKWORDFORHIDDENANDTHETHEMEOFTHESCULPTUREISINTELLIGENCEGATHERINGTHEMOSTPROMINENTFEATUREISALARGEVERTICALSSHAPEDCOPPERSCREENRESEMBLINGASCROLLORAPIECEOFPAPEREMERGINGFROMACOMPUTERPRINTERHALFOFWHICHCONSISTSOFENCRYPTEDTEXTTHECHARACTERSAREALLFOUNDWITHINTHETWENTYSIXLETTERSOFTHELATINALPHABETALONGWITHQUESTIONMARKSANDARECUTOUTOFTHECOPPERPLATESTHEMAINSCULPTURECONTAINSFOURSEPARATEENIGMATICMESSAGESTHREEOFWHICHHAVEBEENDECIPHERED
           PAUPERATE	-160.88,	6,	ENIGMA,	THEMAINPARTOFTHESCULPTUREISLOCATEDINTHENORTHWESTCORNEROFTHENEWHEADQUARTERSBUILDINGCOURTYARDOUTSIDEOFTHEAGENCYSCAFETERIATHESCULPTURECOMPRISESFOURLARGECOPPERPLATESWITHOTHERELEMENTSCONSISTINGOFWATERWOODPLANTSREDANDGREENGRANITEWHITEQUARTZANDPETRIFIEDWOODTHENAMEKRYPTOSCOMESFROMTHEANCIENTGREEKWORDFORHIDDENANDTHETHEMEOFTHESCULPTUREISINTELLIGENCEGATHERINGTHEMOSTPROMINENTFEATUREISALARGEVERTICALSSHAPEDCOPPERSCREENRESEMBLINGASCROLLORAPIECEOFPAPEREMERGINGFROMACOMPUTERPRINTERHALFOFWHICHCONSISTSOFENCRYPTEDTEXTTHECHARACTERSAREALLFOUNDWITHINTHETWENTYSIXLETTERSOFTHELATINALPHABETALONGWITHQUESTIONMARKSANDARECUTOUTOFTHECOPPERPLATESTHEMAINSCULPTURECONTAINSFOURSEPARATEENIGMATICMESSAGESTHREEOFWHICHHAVEBEENDECIPHERED
        POLYSYLLABIC	-160.88,	6,	ENIGMA,	THEMAINPARTOFTHESCULPTUREISLOCATEDINTHENORTHWESTCORNEROFTHENEWHEADQUARTERSBUILDINGCOURTYARDOUTSIDEOFTHEAGENCYSCAFETERIATHESCULPTURECOMPRISESFOURLARGECOPPERPLATESWITHOTHERELEMENTSCONSISTINGOFWATERWOODPLANTSREDANDGREENGRANITEWHITEQUARTZANDPETRIFIEDWOODTHENAMEKRYPTOSCOMESFROMTHEANCIENTGREEKWORDFORHIDDENANDTHETHEMEOFTHESCULPTUREISINTELLIGENCEGATHERINGTHEMOSTPROMINENTFEATUREISALARGEVERTICALSSHAPEDCOPPERSCREENRESEMBLINGASCROLLORAPIECEOFPAPEREMERGINGFROMACOMPUTERPRINTERHALFOFWHICHCONSISTSOFENCRYPTEDTEXTTHECHARACTERSAREALLFOUNDWITHINTHETWENTYSIXLETTERSOFTHELATINALPHABETALONGWITHQUESTIONMARKSANDARECUTOUTOFTHECOPPERPLATESTHEMAINSCULPTURECONTAINSFOURSEPARATEENIGMATICMESSAGESTHREEOFWHICHHAVEBEENDECIPHERED
           PYGOPAGUS	-160.88,	6,	ENIGMA,	THEMAINPARTOFTHESCULPTUREISLOCATEDINTHENORTHWESTCORNEROFTHENEWHEADQUARTERSBUILDINGCOURTYARDOUTSIDEOFTHEAGENCYSCAFETERIATHESCULPTURECOMPRISESFOURLARGECOPPERPLATESWITHOTHERELEMENTSCONSISTINGOFWATERWOODPLANTSREDANDGREENGRANITEWHITEQUARTZANDPETRIFIEDWOODTHENAMEKRYPTOSCOMESFROMTHEANCIENTGREEKWORDFORHIDDENANDTHETHEMEOFTHESCULPTUREISINTELLIGENCEGATHERINGTHEMOSTPROMINENTFEATUREISALARGEVERTICALSSHAPEDCOPPERSCREENRESEMBLINGASCROLLORAPIECEOFPAPEREMERGINGFROMACOMPUTERPRINTERHALFOFWHICHCONSISTSOFENCRYPTEDTEXTTHECHARACTERSAREALLFOUNDWITHINTHETWENTYSIXLETTERSOFTHELATINALPHABETALONGWITHQUESTIONMARKSANDARECUTOUTOFTHECOPPERPLATESTHEMAINSCULPTURECONTAINSFOURSEPARATEENIGMATICMESSAGESTHREEOFWHICHHAVEBEENDECIPHERED
           ROMERILLO	-160.88,	6,	ENIGMA,	THEMAINPARTOFTHESCULPTUREISLOCATEDINTHENORTHWESTCORNEROFTHENEWHEADQUARTERSBUILDINGCOURTYARDOUTSIDEOFTHEAGENCYSCAFETERIATHESCULPTURECOMPRISESFOURLARGECOPPERPLATESWITHOTHERELEMENTSCONSISTINGOFWATERWOODPLANTSREDANDGREENGRANITEWHITEQUARTZANDPETRIFIEDWOODTHENAMEKRYPTOSCOMESFROMTHEANCIENTGREEKWORDFORHIDDENANDTHETHEMEOFTHESCULPTUREISINTELLIGENCEGATHERINGTHEMOSTPROMINENTFEATUREISALARGEVERTICALSSHAPEDCOPPERSCREENRESEMBLINGASCROLLORAPIECEOFPAPEREMERGINGFROMACOMPUTERPRINTERHALFOFWHICHCONSISTSOFENCRYPTEDTEXTTHECHARACTERSAREALLFOUNDWITHINTHETWENTYSIXLETTERSOFTHELATINALPHABETALONGWITHQUESTIONMARKSANDARECUTOUTOFTHECOPPERPLATESTHEMAINSCULPTURECONTAINSFOURSEPARATEENIGMATICMESSAGESTHREEOFWHICHHAVEBEENDECIPHERED
          SHRIMPLIKE	-160.88,	6,	ENIGMA,	THEMAINPARTOFTHESCULPTUREISLOCATEDINTHENORTHWESTCORNEROFTHENEWHEADQUARTERSBUILDINGCOURTYARDOUTSIDEOFTHEAGENCYSCAFETERIATHESCULPTURECOMPRISESFOURLARGECOPPERPLATESWITHOTHERELEMENTSCONSISTINGOFWATERWOODPLANTSREDANDGREENGRANITEWHITEQUARTZANDPETRIFIEDWOODTHENAMEKRYPTOSCOMESFROMTHEANCIENTGREEKWORDFORHIDDENANDTHETHEMEOFTHESCULPTUREISINTELLIGENCEGATHERINGTHEMOSTPROMINENTFEATUREISALARGEVERTICALSSHAPEDCOPPERSCREENRESEMBLINGASCROLLORAPIECEOFPAPEREMERGINGFROMACOMPUTERPRINTERHALFOFWHICHCONSISTSOFENCRYPTEDTEXTTHECHARACTERSAREALLFOUNDWITHINTHETWENTYSIXLETTERSOFTHELATINALPHABETALONGWITHQUESTIONMARKSANDARECUTOUTOFTHECOPPERPLATESTHEMAINSCULPTURECONTAINSFOURSEPARATEENIGMATICMESSAGESTHREEOFWHICHHAVEBEENDECIPHERED
           STRIDENCE	-160.88,	6,	ENIGMA,	THEMAINPARTOFTHESCULPTUREISLOCATEDINTHENORTHWESTCORNEROFTHENEWHEADQUARTERSBUILDINGCOURTYARDOUTSIDEOFTHEAGENCYSCAFETERIATHESCULPTURECOMPRISESFOURLARGECOPPERPLATESWITHOTHERELEMENTSCONSISTINGOFWATERWOODPLANTSREDANDGREENGRANITEWHITEQUARTZANDPETRIFIEDWOODTHENAMEKRYPTOSCOMESFROMTHEANCIENTGREEKWORDFORHIDDENANDTHETHEMEOFTHESCULPTUREISINTELLIGENCEGATHERINGTHEMOSTPROMINENTFEATUREISALARGEVERTICALSSHAPEDCOPPERSCREENRESEMBLINGASCROLLORAPIECEOFPAPEREMERGINGFROMACOMPUTERPRINTERHALFOFWHICHCONSISTSOFENCRYPTEDTEXTTHECHARACTERSAREALLFOUNDWITHINTHETWENTYSIXLETTERSOFTHELATINALPHABETALONGWITHQUESTIONMARKSANDARECUTOUTOFTHECOPPERPLATESTHEMAINSCULPTURECONTAINSFOURSEPARATEENIGMATICMESSAGESTHREEOFWHICHHAVEBEENDECIPHERED
           TETRAPODA	-160.88,	6,	ENIGMA,	THEMAINPARTOFTHESCULPTUREISLOCATEDINTHENORTHWESTCORNEROFTHENEWHEADQUARTERSBUILDINGCOURTYARDOUTSIDEOFTHEAGENCYSCAFETERIATHESCULPTURECOMPRISESFOURLARGECOPPERPLATESWITHOTHERELEMENTSCONSISTINGOFWATERWOODPLANTSREDANDGREENGRANITEWHITEQUARTZANDPETRIFIEDWOODTHENAMEKRYPTOSCOMESFROMTHEANCIENTGREEKWORDFORHIDDENANDTHETHEMEOFTHESCULPTUREISINTELLIGENCEGATHERINGTHEMOSTPROMINENTFEATUREISALARGEVERTICALSSHAPEDCOPPERSCREENRESEMBLINGASCROLLORAPIECEOFPAPEREMERGINGFROMACOMPUTERPRINTERHALFOFWHICHCONSISTSOFENCRYPTEDTEXTTHECHARACTERSAREALLFOUNDWITHINTHETWENTYSIXLETTERSOFTHELATINALPHABETALONGWITHQUESTIONMARKSANDARECUTOUTOFTHECOPPERPLATESTHEMAINSCULPTURECONTAINSFOURSEPARATEENIGMATICMESSAGESTHREEOFWHICHHAVEBEENDECIPHERED
       UNADVENTURING	-160.88,	6,	ENIGMA,	THEMAINPARTOFTHESCULPTUREISLOCATEDINTHENORTHWESTCORNEROFTHENEWHEADQUARTERSBUILDINGCOURTYARDOUTSIDEOFTHEAGENCYSCAFETERIATHESCULPTURECOMPRISESFOURLARGECOPPERPLATESWITHOTHERELEMENTSCONSISTINGOFWATERWOODPLANTSREDANDGREENGRANITEWHITEQUARTZANDPETRIFIEDWOODTHENAMEKRYPTOSCOMESFROMTHEANCIENTGREEKWORDFORHIDDENANDTHETHEMEOFTHESCULPTUREISINTELLIGENCEGATHERINGTHEMOSTPROMINENTFEATUREISALARGEVERTICALSSHAPEDCOPPERSCREENRESEMBLINGASCROLLORAPIECEOFPAPEREMERGINGFROMACOMPUTERPRINTERHALFOFWHICHCONSISTSOFENCRYPTEDTEXTTHECHARACTERSAREALLFOUNDWITHINTHETWENTYSIXLETTERSOFTHELATINALPHABETALONGWITHQUESTIONMARKSANDARECUTOUTOFTHECOPPERPLATESTHEMAINSCULPTURECONTAINSFOURSEPARATEENIGMATICMESSAGESTHREEOFWHICHHAVEBEENDECIPHERED
        UNPRETENDING	-160.88,	6,	ENIGMA,	THEMAINPARTOFTHESCULPTUREISLOCATEDINTHENORTHWESTCORNEROFTHENEWHEADQUARTERSBUILDINGCOURTYARDOUTSIDEOFTHEAGENCYSCAFETERIATHESCULPTURECOMPRISESFOURLARGECOPPERPLATESWITHOTHERELEMENTSCONSISTINGOFWATERWOODPLANTSREDANDGREENGRANITEWHITEQUARTZANDPETRIFIEDWOODTHENAMEKRYPTOSCOMESFROMTHEANCIENTGREEKWORDFORHIDDENANDTHETHEMEOFTHESCULPTUREISINTELLIGENCEGATHERINGTHEMOSTPROMINENTFEATUREISALARGEVERTICALSSHAPEDCOPPERSCREENRESEMBLINGASCROLLORAPIECEOFPAPEREMERGINGFROMACOMPUTERPRINTERHALFOFWHICHCONSISTSOFENCRYPTEDTEXTTHECHARACTERSAREALLFOUNDWITHINTHETWENTYSIXLETTERSOFTHELATINALPHABETALONGWITHQUESTIONMARKSANDARECUTOUTOFTHECOPPERPLATESTHEMAINSCULPTURECONTAINSFOURSEPARATEENIGMATICMESSAGESTHREEOFWHICHHAVEBEENDECIPHERED
              WAXILY	-160.88,	6,	ENIGMA,	THEMAINPARTOFTHESCULPTUREISLOCATEDINTHENORTHWESTCORNEROFTHENEWHEADQUARTERSBUILDINGCOURTYARDOUTSIDEOFTHEAGENCYSCAFETERIATHESCULPTURECOMPRISESFOURLARGECOPPERPLATESWITHOTHERELEMENTSCONSISTINGOFWATERWOODPLANTSREDANDGREENGRANITEWHITEQUARTZANDPETRIFIEDWOODTHENAMEKRYPTOSCOMESFROMTHEANCIENTGREEKWORDFORHIDDENANDTHETHEMEOFTHESCULPTUREISINTELLIGENCEGATHERINGTHEMOSTPROMINENTFEATUREISALARGEVERTICALSSHAPEDCOPPERSCREENRESEMBLINGASCROLLORAPIECEOFPAPEREMERGINGFROMACOMPUTERPRINTERHALFOFWHICHCONSISTSOFENCRYPTEDTEXTTHECHARACTERSAREALLFOUNDWITHINTHETWENTYSIXLETTERSOFTHELATINALPHABETALONGWITHQUESTIONMARKSANDARECUTOUTOFTHECOPPERPLATESTHEMAINSCULPTURECONTAINSFOURSEPARATEENIGMATICMESSAGESTHREEOFWHICHHAVEBEENDECIPHERED

CHALLENGE: A Special Case


Break the following keyed vigenere ciphertext:
KWRGHGPEOKDBXGAWVMJDFVCVDHJZAMQGRNFHKWRLAVKWKWVFSCHLIVGRZCIHDUAWHPUFOKDNTLLEFCWHVBKBJOZRHVWCYVVGWAWADSDKNWGMBLUUPNKAHOHFZAJICCLGYKIMGZFKFODLIYCVHKHBIMGEFWEZHKZWVLMGAYDSDORZIYDJZMKBPLEMDGPSWAOJKAHPABWEXUGFIORNHHWSHWIHYOOLFFDUAODNUFOKDIQJTHIFTHIOIPNCWNDSDJOGIXTIFVAOSCDWVQTCDVRNQJUOIHKSHWIXNCHNPBTWENMNPKVNDSDGAWXNGRZCIOSFXHDDTAEMFHKAXZFRDJUWQJKWRKFHYGAWXBIGFKAYMJRLDQDKZEENMLOZHVYASWEFMPOZVOZKFWMMGEFWEOSORWGVDLRGJCMJNUVMTCXZAVQEEWKNGRFUYNTADWERMJNAEBOKUYXZJGRKYVMJZWESQDIYPLZHUCKBPLEMDOGRRLKVBEZWMFDTZVRNSWOKHMKAHMHVDKXZPBJJTPFFZHVVRNKURLDTIHMZIFKAHMAQKWRZHFMJOZYSQMRVHCGJNPFFZVYWVFMCVGHVVLOLMJTAUEDBJGWADSDPWHYNTEXUDNIGAWXJMJJICCLGYKIMGJZUFHIRWEEODEOKHFDAVOQYQGEIXNILOBIOKWHWIBXUAOKSZKSWJNDJTWKGLWRKIP

Case 2: Vigenere Key is Unknown, Alphabet Key is Known

Now, let's take the same ciphertext and suppose we only know the alphabet key.


In [49]:
ctext = "BYDMXVPQAMCWFJGOTCCNXTKHOGSEIPAJTWGNBYDNBHCJWLBNCBHPNRNXNHSUOXHLTQQKYAUEMBOUUBWGNQVWUFKKBRVIITHVWNOWCSEXWOICJBPANTCNRPTNHSJVVLICIRSEIAPMQEEHIIVRETZGSEIQPLAXLXKOTWPCSOJGOSEEOKETKBDOOBHSJVPMOWHLTSHHPOVGVATKBSEVTFDVHONNQALNUKOXHPCDQKYAUZSPQPSKAGFPOQWBCWUHLPLMSAAZPBIECBXOTFMIKTITTICPOFTVHONKFIZDNCAJIVWDNXUWUHLCSEWTIHTXOECKBGUUMOHSUUCNLEQJETEOMABYDRUURUHLSWSJDAPMPPDNJIOBTCADIHYNBRQOGEFKQDAEBEHXDOFCNGXEFJVSELPZEHTSCLPPJAHEAPLEIZACVODENXXACTANMLAJITWXSORTPOWDDUEMGZITKOSHSNRONPYGCXVWNHVBUSNXDNZHKQTLWNEPKCJEUYLRXECNRATZEXBNHOCPQWUKYGNBYDTOTPUYAQALSKCNRAIRTITNBTPPLLCGTCEBTVOTWHGTXMIEHKQPNRTZKHYPFAMOPUJCDUOWCSEZCGQEMGVAJTBUHLSLITJVVLICIRSEIITSQFSNCDSSLGLRXKONNPRKAJVVAEABLGSJCJRLOWFOGQDHXTGEYTOIDLVHPITAND"

In [50]:
from itertools import permutations
from ngram_score import ngram_score

import re
import pprint as pp

qgram = ngram_score('ngrams/en_sherlock_4grams') # load our 4gram statistics
trigram = ngram_score('ngrams/en_sherlock_3grams') # load our 3gram statistics

# keep track of the 100 best keys
N=100 
L=42 # only process L characters for speed
alphakey = "ENIGMA"

for KLEN in [9]:  #range(3,10):    
    print "="*80
    rec = nbest(N)
    
    # exhaustively test all possible letters for first 3 entries of the key and keep track of the N best ones
    # if KLEN=7, this will test e.g. FOOAAAA and BARAAAA 
    for i in permutations('ABCDEFGHIJKLMNOPQRSTUVWXYZ',3):
        i = "".join(i)
        key = ''.join(i) + 'A'*(KLEN-len(i))
        decrypted_ctext = keyed_vigenere(ctext[:L], key, alpha_key = alphakey, direction = -1)
        score = 0
        for j in range(0,len(ctext),KLEN):
            score += trigram.score(decrypted_ctext[j:j+3])
        rec.add((score,''.join(i), decrypted_ctext))
    next_rec = nbest(N)
    
    # for the remaining KLEN-3 characters of the key,
    for i in range(0,KLEN-3):
        # go over the N best keys found so far...
        for k in xrange(N):
            # ...and determine the best next character of the key, while keeping best N keys so far
            for c in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ':
                key = rec[k][1] + c
                fullkey = key + 'A'*(KLEN-len(key))
                decrypted_ctext = keyed_vigenere(ctext[:L], fullkey, alpha_key = alphakey, direction = -1)
                score = 0
                for j in range(0,len(ctext),KLEN):
                    score += qgram.score(decrypted_ctext[j:j+len(key)])
                next_rec.add((score, key, decrypted_ctext))
        rec = next_rec
        next_rec = nbest(N)
       
    # show the results
    bestscore = rec[0][0]
    bestkey = rec[0][1]    
    #decrypted_ctext = rec[0][2]
    # always show entire decrypted ctext, even if the above analysis is done only on part of the ctext, e.g. ctext[0:100]
    decrypted_ctext = keyed_vigenere(ctext, bestkey, alpha_key = alphakey, direction = -1) 
    print bestscore, 'klen', KLEN, ':"'+bestkey+'",', decrypted_ctext
    # uncomment the following lines to see top-10 results
    #pp.pprint(rec.store[0:10])
    #print '\n'


================================================================================
-110.240294115 klen 9 :"LODESTONE", THEMAINPARTOFTHESCULPTUREISLOCATEDINTHENORTHWESTCORNEROFTHENEWHEADQUARTERSBUILDINGCOURTYARDOUTSIDEOFTHEAGENCYSCAFETERIATHESCULPTURECOMPRISESFOURLARGECOPPERPLATESWITHOTHERELEMENTSCONSISTINGOFWATERWOODPLANTSREDANDGREENGRANITEWHITEQUARTZANDPETRIFIEDWOODTHENAMEKRYPTOSCOMESFROMTHEANCIENTGREEKWORDFORHIDDENANDTHETHEMEOFTHESCULPTUREISINTELLIGENCEGATHERINGTHEMOSTPROMINENTFEATUREISALARGEVERTICALSSHAPEDCOPPERSCREENRESEMBLINGASCROLLORAPIECEOFPAPEREMERGINGFROMACOMPUTERPRINTERHALFOFWHICHCONSISTSOFENCRYPTEDTEXTTHECHARACTERSAREALLFOUNDWITHINTHETWENTYSIXLETTERSOFTHELATINALPHABETALONGWITHQUESTIONMARKSANDARECUTOUTOFTHECOPPERPLATESTHEMAINSCULPTURECONTAINSFOURSEPARATEENIGMATICMESSAGESTHREEOFWHICHHAVEBEENDECIPHERED

Kryptos


In [51]:
kryptos_L1 = """
EMUFPHZLRFAXYUSDJKZLDKRNSHGNFIVJ
YQTQUXQBQVYUVLLTREVJYQTMKYRDMFD
VFPJUDEEHZWETZYVGWHKKQETGFQJNCE
GGWHKK?DQMCPFQZDQMMIAGPFXHQRLG
TIMVMZJANQLVKQEDAGDVFRPJUNGEUNA
QZGZLECGYUXUEENJTBJLBQCRTBJDFHRR
YIZETKZEMVDUFKSJHKFWHKUWQLSZFTI
HHDDDUVH?DWKBFUFPWNTDFIYCUQZERE
EVLDKFEZMOQQJLTTUGSYQPFEUNLAVIDX
FLGGTEZ?FKZBSFDQVGOGIPUFXHHDRKF
FHQNTGPUAECNUVPDJMQCLQUMUNEDFQ
ELZZVRRGKFFVOEEXBDMVPNFQXEZLGRE
DNQFMPNZGLFLPMRJQYALMGNUVPDXVKP
DQUMEBEDMHDAFMJGZNUPLGEWJLLAETG"""

kryptos_L2 = """
ENDYAHROHNLSRHEOCPTEOIBIDYSHNAIA
CHTNREYULDSLLSLLNOHSNOSMRWXMNE
TPRNGATIHNRARPESLNNELEBLPIIACAE
WMTWNDITEENRAHCTENEUDRETNHAEOE
TFOLSEDTIWENHAEIOYTEYQHEENCTAYCR
EIFTBRSPAMHHEWENATAMATEGYEERLB
TEEFOASFIOTUETUAEOTOARMAEERTNRTI
BSEDDNIAAHTTMSTEWPIEROAGRIEWFEB
AECTDDHILCEIHSITEGOEAOSDDRYDLORIT
RKLMLEHAGTDHARDPNEOHMGFMFEUHE
ECDMRIPFEIMEHNLSSTTRTVDOHW?OBKR
UOXOGHULBSOLIFBBWFLRVQQPRNGKSSO
TWTQSJQSSEKZZWATJKLUDIAWINFBNYP
VTTMZFPKWGDKZXTJCDIGKUHUAUEKCAR"""

In [52]:
ctext_kryptos = (kryptos_L1+kryptos_L2).replace(" ", "").strip().split("\n")

Prerequisites

These are the functions used for deciphering Kryptos K1 and K2 (copied from above for easy initialization of the notebook).


In [53]:
def rot(s, key, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ", direction=1):
    keyval = alphabet.find(key)
    t = ""
    for sc in s:
        i = alphabet.find(sc)
        t += alphabet[(i + keyval * direction) % len(alphabet)] if i > -1 else sc
    return t

In [54]:
import re

def compute_keyed_alphabet(keyword, alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
    # remove double keyword letters, keeping first occurence
    keyword_unique = ""
    for c in keyword:
        if c not in keyword_unique:
            keyword_unique += c
    # compute cipher alphabet
    keyed_alphabet = keyword_unique + re.sub("[%s]" % keyword, "", alphabet)
    return keyed_alphabet

In [55]:
def keyed_vigenere(s, key, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ", alpha_key="", direction=1):
    # compute keyed alphabet
    keyed_alphabet = compute_keyed_alphabet(alpha_key, alphabet)
    t = []
    key_i = 0
    for i in range(len(s)):
        if s[i] in alphabet:
            t_i = rot(s[i], key[key_i % len(key)],keyed_alphabet, direction)
            t.append(t_i)
            key_i += 1
        else:
            t.append(s[i])
    return "".join(t)

In [56]:
from __future__ import division
import numpy as np

# computes the IC for a given string
def IC(s, alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
    s_filtered = filter(s, alphabet) # we only care about the letters in our alphabet
    ic = 0
    for c in alphabet:
        c_count = s_filtered.count(c)
        N = len(s_filtered)
        if c_count > 1:
            ic += (c_count * (c_count - 1)) / (N * (N - 1) / len(alphabet))
    return ic

# helper function to filter out non-alphabet characters
def filter(s, alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
    return "".join([c for c in s if c in alphabet])

# computes the avg IC of subsequences of each n'th character
def mean_IC(s, n, alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
    s_filtered = filter(s, alphabet) # we only care about the letters in our alphabet
    s_filtered_subseq = [s_filtered[i::n] for i in range(n)]
    ic = []
    for i in range(n):
        ic.append(IC(s_filtered_subseq[i]))
    return np.mean(ic)

In [57]:
# keep a list of the N best things we have seen, discard anything else
# this class can also be imported using 'import nbest' from a separate file nbest.py
class nbest(object):
    def __init__(self,N=1000):
        self.store = []
        self.N = N
        
    def add(self,item):
        self.store.append(item)
        self.store.sort(reverse=True)
        self.store = self.store[:self.N]
    
    def __getitem__(self,k):
        return self.store[k]

    def __len__(self):
        return len(self.store)

Kryptos - K1


In [58]:
ctext = "".join(ctext_kryptos[:2])
ctext


Out[58]:
'EMUFPHZLRFAXYUSDJKZLDKRNSHGNFIVJYQTQUXQBQVYUVLLTREVJYQTMKYRDMFD'

K1 - Determine if plaintext


In [59]:
IC(ctext)


Out[59]:
0.9851510496671787

Based on the IC, it looks like this is not plaintext, or a transposition of plaintext.

K1 - Determine Key Length


In [60]:
from matplotlib import pyplot as plt
%matplotlib inline

mean_ic_n = []
for n in range(1,32):
    mean_ic = mean_IC(ctext.upper(), n)
    mean_ic_n.append(mean_ic)
    print "%2d %02f" % (n, mean_ic)
plt.rcParams["figure.figsize"] = [12,4]
plt.bar(range(1,len(mean_ic_n)+1), mean_ic_n, align = 'center')
plt.xlabel("key length")
plt.ylabel("average IC")


 1 0.985151
 2 1.025672
 3 0.949206
 4 0.897619
 5 2.054545
 6 0.840404
 7 0.928571
 8 0.232143
 9 0.825397
10 2.377143
11 0.709091
12 0.650000
13 1.933333
14 0.680952
15 1.617778
16 0.000000
17 0.254902
18 0.962963
19 0.912281
20 3.900000
21 0.412698
22 0.393939
23 0.376812
24 0.000000
25 1.040000
26 3.333333
27 1.283951
28 0.000000
29 0.298851
30 1.733333
31 0.000000
Out[60]:
<matplotlib.text.Text at 0x7f3112011150>

From the mean IC and peaks at key length 10 and 20, it looks like the key length might be 10.

K1 - Frequency Analysis of Subsequences #1


In [61]:
# https://blog.dominodatalab.com/interactive-dashboards-in-jupyter/
# http://ipywidgets.readthedocs.io/en/latest/examples/Using%20Interact.html
from __future__ import division
from matplotlib import pyplot as plt
import numpy as np
from ipywidgets import interact, interactive, fixed
#import ipywidgets as widgets
plt.rcParams["figure.figsize"] = [12,16]

def rotate(l, n):
    return l[n:] + l[:n]

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

key_length = 10

g_english = [0.0736, 0.0148, 0.0445, 0.0302, 0.102, 0.0227, 0.0122, 0.0277, 0.0855, 0.000557, 0.00237, 0.0342, 0.0206, 0.0717, 0.103, 0.0246, 0.00181, 0.0735, 0.0608, 0.0889, 0.0392, 0.0153, 0.0173, 0.000557, 0.032, 0.000278]
pfig = plt.subplot(key_length+1,1,1)
plt.bar(range(len(alphabet)), g_english, tick_label = rotate(list(alphabet), 0), align = 'center', color = 'b')

ctext_freq = []
for i in range(key_length):
    ctext_i = ctext[i::key_length]
    ctext_freq.append(np.array([ctext_i.count(c)/len(ctext_i) for c in alphabet]))
    
    cfig = plt.subplot(key_length+1,1,i+2)
    plt.bar(range(len(alphabet)), ctext_freq[i], tick_label = rotate(list(alphabet), 0), align = 'center', color = 'r')


K1 - Frequency Analysis of Subsequences #2

Now, let's try that again, but use the cipher alphabet order.


In [62]:
# https://blog.dominodatalab.com/interactive-dashboards-in-jupyter/
# http://ipywidgets.readthedocs.io/en/latest/examples/Using%20Interact.html
from __future__ import division
from matplotlib import pyplot as plt
import numpy as np
from ipywidgets import interact, interactive, fixed
#import ipywidgets as widgets
plt.rcParams["figure.figsize"] = [12,16]

def rotate(l, n):
    return l[n:] + l[:n]

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
keyed_alphabet = compute_keyed_alphabet("KRYPTOS")

key_length = 10

g_english = [0.0736, 0.0148, 0.0445, 0.0302, 0.102, 0.0227, 0.0122, 0.0277, 0.0855, 0.000557, 0.00237, 0.0342, 0.0206, 0.0717, 0.103, 0.0246, 0.00181, 0.0735, 0.0608, 0.0889, 0.0392, 0.0153, 0.0173, 0.000557, 0.032, 0.000278]
g_english_keyed = [g_english[alphabet.find(c)] for c in keyed_alphabet]
pfig = plt.subplot(key_length+1,1,1)
plt.bar(range(len(alphabet)), g_english_keyed, tick_label = rotate(list(keyed_alphabet), 0), align = 'center', color = 'b')

ctext_freq = []
for i in range(key_length):
    ctext_i = ctext[i::key_length]
    ctext_freq.append(np.array([ctext_i.count(c)/len(ctext_i) for c in keyed_alphabet]))
    
    cfig = plt.subplot(key_length+1,1,i+2)
    plt.bar(range(len(keyed_alphabet)), ctext_freq[i], tick_label = rotate(list(keyed_alphabet), 0), align = 'center', color = 'r')


K1 - Vigenere Key is Unknown, Alphabet Key is Known


In [63]:
from itertools import permutations
from ngram_score import ngram_score

import re
import pprint as pp

qgram = ngram_score('ngrams/en_sherlock_4grams') # load our 4gram statistics
trigram = ngram_score('ngrams/en_sherlock_3grams') # load our 3gram statistics

# keep track of the 100 best keys
N=100 
L=42 # only process L characters for speed
alphakey = "KRYPTOS"

for KLEN in range(3,16):    
    print "="*80
    rec = nbest(N)
    
    # exhaustively test all possible letters for first 3 entries of the key and keep track of the N best ones
    # if KLEN=7, this will test e.g. FOOAAAA and BARAAAA 
    for i in permutations('ABCDEFGHIJKLMNOPQRSTUVWXYZ',3):
        i = "".join(i)
        key = ''.join(i) + 'A'*(KLEN-len(i))
        decrypted_ctext = keyed_vigenere(ctext[:L], key, alpha_key = alphakey, direction = -1)
        score = 0
        for j in range(0,len(ctext),KLEN):
            score += trigram.score(decrypted_ctext[j:j+3])
        rec.add((score,''.join(i), decrypted_ctext))
    next_rec = nbest(N)
    
    # for the remaining KLEN-3 characters of the key,
    for i in range(0,KLEN-3):
        # go over the N best keys found so far...
        for k in xrange(N):
            # ...and determine the best next character of the key, while keeping best N keys so far
            for c in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ':
                key = rec[k][1] + c
                fullkey = key + 'A'*(KLEN-len(key))
                decrypted_ctext = keyed_vigenere(ctext[:L], fullkey, alpha_key = alphakey, direction = -1)
                score = 0
                for j in range(0,len(ctext),KLEN):
                    score += qgram.score(decrypted_ctext[j:j+len(key)])
                next_rec.add((score, key, decrypted_ctext))
        rec = next_rec
        next_rec = nbest(N)
       
    # show the results
    bestscore = rec[0][0]
    bestkey = rec[0][1]    
    #decrypted_ctext = rec[0][2]
    # always show entire decrypted ctext, even if the above analysis is done only on part of the ctext, e.g. ctext[0:100]
    decrypted_ctext = keyed_vigenere(ctext, bestkey, alpha_key = alphakey, direction = -1) 
    print bestscore, 'klen', KLEN, ':"'+bestkey+'",', decrypted_ctext
    # uncomment the following lines to see top-10 results
    #pp.pprint(rec.store[0:10])
    #print '\n'


================================================================================
-61.660059643 klen 3 :"IDC", VBFWNODAMWWIGEWUSLDARELDLTTTYSASNOQESHENDGGEGYAUFRGRMEIBLGLRPYR
================================================================================
-59.2154010926 klen 4 :"YCJO", CCOAROCFZPLNKFJOHLCFBLEHTOWHDSSEKEHINITPMGFJQBRZZYSEKEHGXNEOJPQ
================================================================================
-89.9701573757 klen 5 :"PALHX", BETXOEMKGHTLEABACCENANDOBESYXLNCESSLHASDLIEAXHDGGGNCESSINEGFION
================================================================================
-117.050498758 klen 6 :"MAUREC", NEKEMOADSEVIDHECOLADIZJDHAMMRSTCANNEPLZACGDHRJSUCTRILEFEORJRKOI
================================================================================
-127.525938083 klen 7 :"SDORNCG", OBJEDOFELASONBKKEZSBWQLHOUTSSOLICELHENNIECVELJXUHOFERRUOQMVCZPW
================================================================================
-138.792596964 klen 8 :"KPYOLCZR", EINAFOKJRCONEFACJWWFNLYMSEEHUSWIYLYITIUAQNKJOBMPRBQEEEOLKZZORPE
================================================================================
-141.899963908 klen 9 :"DEGIDGRCC", RABWNRXBMYVEGENCALISWELSOOTCRYASINUEEGANDCRFGASLFRCINEQAGGLWLPR
================================================================================
-108.919407769 klen 10 :"PALIMPSEST", BETWEENSUBTLESHADINGANDTHEABSENCEOFLIGHTLIESTHENUANCEOFIQLUSION
================================================================================
-149.613822927 klen 11 :"QGYZYCQGJUJ", LONGROOTELLTINABASFRIDASTIEDMYSUFKLMVVEHASAOYTIOZYYPFZHXGKYBCMW
================================================================================
-134.252813442 klen 12 :"AGYRRVMXLIMI", TONEYMANDWICUBTCITANNECTZREMENTMEOFOHEMANKDWOYZIQXQIRXFQCGCUEZB
================================================================================
-132.622931496 klen 13 :"KHAOZDXSNWHMZ", ETHATTREBINSPUMPERINTATOHIGOODWSTHEWASUBSIWVFNEETWTLYSWGRMPTZIV
================================================================================
-147.441605578 klen 14 :"WPIIVFNEJWRNAB", HISWAYSSEISOUGCAREPOLIEVOUSEIFARSBECORNIGHOMAYUMBKSNRRWDPZFUVKL
================================================================================
-145.859484498 klen 15 :"BRNSFCJFCRNCIAY", PLYSLOCOMEHIGHTYIANORDIDOUTTOGHICHMEOFEARGGHQCJEUZGKJEPZLGQBDEL

K1 - Determining Message Boundaries


In [64]:
ctext = "".join(ctext_kryptos[:3])
key = "PALIMPSEST"
alphakey = "KRYPTOS"
print ctext
print keyed_vigenere(ctext, key, alpha_key = alphakey, direction = -1)


EMUFPHZLRFAXYUSDJKZLDKRNSHGNFIVJYQTQUXQBQVYUVLLTREVJYQTMKYRDMFDVFPJUDEEHZWETZYVGWHKKQETGFQJNCE
BETWEENSUBTLESHADINGANDTHEABSENCEOFLIGHTLIESTHENUANCEOFIQLUSIONAQKDDTABABBNRNLJCQACEYBXYSJGFMV

From this, it can be seen that only the first 2 lines make up K1.


In [65]:
ctext_k1 = "".join(ctext_kryptos[:2])

key = "PALIMPSEST"
alphakey = "KRYPTOS"
ptext_k1 = keyed_vigenere(ctext_k1, key, alpha_key = alphakey, direction = -1)

print ctext_k1
print ptext_k1


EMUFPHZLRFAXYUSDJKZLDKRNSHGNFIVJYQTQUXQBQVYUVLLTREVJYQTMKYRDMFD
BETWEENSUBTLESHADINGANDTHEABSENCEOFLIGHTLIESTHENUANCEOFIQLUSION

In [66]:
from word_score import word_score
fitness = word_score()
print fitness.score(ptext_k1)


(-55.91447679719653, ['BETWEEN', 'SUBTLE', 'SHADING', 'AND', 'THE', 'ABSENCE', 'OF', 'LIGHT', 'LIES', 'THE', 'NUANCE', 'OF', 'IQ', 'LU', 'SION'])
BETWEEN SUBTLE SHADING AND THE ABSENCE OF LIGHT LIES THE NUANCE OF IQLUSION

Kryptos - K2

Let's start with the rest of the lines of the first panel for K2:


In [67]:
ctext = "".join(ctext_kryptos[2:14])
ctext


Out[67]:
'VFPJUDEEHZWETZYVGWHKKQETGFQJNCEGGWHKK?DQMCPFQZDQMMIAGPFXHQRLGTIMVMZJANQLVKQEDAGDVFRPJUNGEUNAQZGZLECGYUXUEENJTBJLBQCRTBJDFHRRYIZETKZEMVDUFKSJHKFWHKUWQLSZFTIHHDDDUVH?DWKBFUFPWNTDFIYCUQZEREEVLDKFEZMOQQJLTTUGSYQPFEUNLAVIDXFLGGTEZ?FKZBSFDQVGOGIPUFXHHDRKFFHQNTGPUAECNUVPDJMQCLQUMUNEDFQELZZVRRGKFFVOEEXBDMVPNFQXEZLGREDNQFMPNZGLFLPMRJQYALMGNUVPDXVKPDQUMEBEDMHDAFMJGZNUPLGEWJLLAETG'

K2 - Determine if Plaintext


In [68]:
IC(ctext)


Out[68]:
1.182131495227996

Based on the IC value, it looks like we are not dealing with plaintext, or a transposition of plaintext.

K2 - Determine Key Length


In [69]:
from matplotlib import pyplot as plt
%matplotlib inline

mean_ic_n = []
for n in range(1,32):
    mean_ic = mean_IC(ctext.upper(), n)
    mean_ic_n.append(mean_ic)
    print "%2d %02f" % (n, mean_ic)
plt.rcParams["figure.figsize"] = [12,4]
plt.bar(range(1,len(mean_ic_n)+1), mean_ic_n, align = 'center')
plt.xlabel("key length")
plt.ylabel("average IC")


 1 1.182131
 2 1.222922
 3 1.174730
 4 1.447021
 5 1.287058
 6 1.212844
 7 1.160615
 8 1.788987
 9 1.127371
10 1.305800
11 1.175498
12 1.422593
13 1.257617
14 1.182434
15 1.276889
16 1.704216
17 1.268551
18 1.129128
19 0.988304
20 1.577709
21 1.112667
22 1.097237
23 1.118244
24 1.768155
25 1.180190
26 1.379487
27 1.259259
28 1.284014
29 1.149425
30 1.415152
31 1.174194
Out[69]:
<matplotlib.text.Text at 0x7f311223fed0>

This suggests a key length of 8.

K2 - Frequency Analysis of Subsequences #1


In [70]:
# https://blog.dominodatalab.com/interactive-dashboards-in-jupyter/
# http://ipywidgets.readthedocs.io/en/latest/examples/Using%20Interact.html
from __future__ import division
from matplotlib import pyplot as plt
import numpy as np
from ipywidgets import interact, interactive, fixed
#import ipywidgets as widgets
plt.rcParams["figure.figsize"] = [12,16]

def rotate(l, n):
    return l[n:] + l[:n]

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

key_length = 8

g_english = [0.0736, 0.0148, 0.0445, 0.0302, 0.102, 0.0227, 0.0122, 0.0277, 0.0855, 0.000557, 0.00237, 0.0342, 0.0206, 0.0717, 0.103, 0.0246, 0.00181, 0.0735, 0.0608, 0.0889, 0.0392, 0.0153, 0.0173, 0.000557, 0.032, 0.000278]
pfig = plt.subplot(key_length+1,1,1)
plt.bar(range(len(alphabet)), g_english, tick_label = rotate(list(alphabet), 0), align = 'center', color = 'b')

ctext_freq = []
for i in range(key_length):
    ctext_i = ctext[i::key_length]
    ctext_freq.append(np.array([ctext_i.count(c)/len(ctext_i) for c in alphabet]))
    
    cfig = plt.subplot(key_length+1,1,i+2)
    plt.bar(range(len(alphabet)), ctext_freq[i], tick_label = rotate(list(alphabet), 0), align = 'center', color = 'r')


K2 - Frequency Analysis of Subsequences #2

Now, let's try that again, but use the cipher alphabet order instead.


In [71]:
# https://blog.dominodatalab.com/interactive-dashboards-in-jupyter/
# http://ipywidgets.readthedocs.io/en/latest/examples/Using%20Interact.html
from __future__ import division
from matplotlib import pyplot as plt
import numpy as np
from ipywidgets import interact, interactive, fixed
#import ipywidgets as widgets
plt.rcParams["figure.figsize"] = [12,16]

def rotate(l, n):
    return l[n:] + l[:n]

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
keyed_alphabet = compute_keyed_alphabet("KRYPTOS")

key_length = 8

g_english = [0.0736, 0.0148, 0.0445, 0.0302, 0.102, 0.0227, 0.0122, 0.0277, 0.0855, 0.000557, 0.00237, 0.0342, 0.0206, 0.0717, 0.103, 0.0246, 0.00181, 0.0735, 0.0608, 0.0889, 0.0392, 0.0153, 0.0173, 0.000557, 0.032, 0.000278]
g_english_keyed = [g_english[alphabet.find(c)] for c in keyed_alphabet]
pfig = plt.subplot(key_length+1,1,1)
plt.bar(range(len(alphabet)), g_english_keyed, tick_label = rotate(list(keyed_alphabet), 0), align = 'center', color = 'b')

ctext_freq = []
for i in range(key_length):
    ctext_i = ctext[i::key_length]
    ctext_freq.append(np.array([ctext_i.count(c)/len(ctext_i) for c in keyed_alphabet]))
    
    cfig = plt.subplot(key_length+1,1,i+2)
    plt.bar(range(len(keyed_alphabet)), ctext_freq[i], tick_label = rotate(list(keyed_alphabet), 0), align = 'center', color = 'r')


K2 - Frequency Analysis of Subsequences #3

For K2, it looks like it might be possible to align the frequency plots by hand to break the key. To make this easier, let's create plots with two concatenated alphabets. That way, we can print them and align them (or do this by yourself on graph paper of course).


In [73]:
# https://blog.dominodatalab.com/interactive-dashboards-in-jupyter/
# http://ipywidgets.readthedocs.io/en/latest/examples/Using%20Interact.html
from __future__ import division
from matplotlib import pyplot as plt
import numpy as np
from ipywidgets import interact, interactive, fixed
#import ipywidgets as widgets
plt.rcParams["figure.figsize"] = [18,16]

def rotate(l, n):
    return l[n:] + l[:n]

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
keyed_alphabet = compute_keyed_alphabet("KRYPTOS")
ncopies = 2

key_length = 8

g_english = [0.0736, 0.0148, 0.0445, 0.0302, 0.102, 0.0227, 0.0122, 0.0277, 0.0855, 0.000557, 0.00237, 0.0342, 0.0206, 0.0717, 0.103, 0.0246, 0.00181, 0.0735, 0.0608, 0.0889, 0.0392, 0.0153, 0.0173, 0.000557, 0.032, 0.000278]
g_english_keyed = [g_english[alphabet.find(c)] for c in keyed_alphabet]
pfig = plt.subplot(key_length+1,1,1)
plt.bar(range(len(alphabet)*ncopies), g_english_keyed + [0]*26, tick_label = rotate(list(keyed_alphabet)*ncopies, 0), align = 'center', color = 'b')

ctext_freq = []
for i in range(key_length):
    ctext_i = ctext[i::key_length]
    ctext_freq.append(np.array([ctext_i.count(c)/len(ctext_i) for c in keyed_alphabet]))
    
    cfig = plt.subplot(key_length+1,1,i+2)
    plt.bar(range(len(keyed_alphabet)*ncopies), list(ctext_freq[i])*ncopies, tick_label = rotate(list(keyed_alphabet)*ncopies, 0), align = 'center', color = 'r')


Some of these could be guessed based on the alignment: at least the A and the first S seem reasonable to guess, giving A????S?? for the key, after which the possible candidate keys can be guessed or looked up, e.g. on onelook.com.

K2 - Vigenere Key is Unknown, Alphabet Key is Known


In [74]:
from itertools import permutations
from ngram_score import ngram_score

import re
import pprint as pp

qgram = ngram_score('ngrams/en_sherlock_4grams') # load our 4gram statistics
trigram = ngram_score('ngrams/en_sherlock_3grams') # load our 3gram statistics

# keep track of the 100 best keys
N=100 
L=42 # only process first L characters for speed, increase if needed for accuracy
alphakey = "KRYPTOS"

for KLEN in [8]: #range(3,16):    
    print "="*80
    rec = nbest(N)
    
    # exhaustively test all possible letters for first 3 entries of the key and keep track of the N best ones
    # if KLEN=7, this will test e.g. FOOAAAA and BARAAAA 
    for i in permutations('ABCDEFGHIJKLMNOPQRSTUVWXYZ',3):
        i = "".join(i)
        key = ''.join(i) + 'A'*(KLEN-len(i))
        decrypted_ctext = keyed_vigenere(ctext[:L], key, alpha_key = alphakey, direction = -1)
        score = 0
        for j in range(0,len(ctext),KLEN):
            score += trigram.score(decrypted_ctext[j:j+3])
        rec.add((score,''.join(i), decrypted_ctext))
    next_rec = nbest(N)
    
    # for the remaining KLEN-3 characters of the key,
    for i in range(0,KLEN-3):
        # go over the N best keys found so far...
        for k in xrange(N):
            # ...and determine the best next character of the key, while keeping best N keys so far
            for c in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ':
                key = rec[k][1] + c
                fullkey = key + 'A'*(KLEN-len(key))
                decrypted_ctext = keyed_vigenere(ctext[:L], fullkey, alpha_key = alphakey, direction = -1)
                score = 0
                for j in range(0,len(ctext),KLEN):
                    score += qgram.score(decrypted_ctext[j:j+len(key)])
                next_rec.add((score, key, decrypted_ctext))
        rec = next_rec
        next_rec = nbest(N)
       
    # show the results
    bestscore = rec[0][0]
    bestkey = rec[0][1]    
    #decrypted_ctext = rec[0][2]
    # always show entire decrypted ctext, even if the above analysis is done only on part of the ctext, e.g. ctext[0:100]
    decrypted_ctext = keyed_vigenere(ctext, bestkey, alpha_key = alphakey, direction = -1) 
    print bestscore, 'klen', KLEN, ':"'+bestkey+'",', decrypted_ctext
    # uncomment the following lines to see top-10 results
    #pp.pprint(rec.store[0:10])
    #print '\n'


================================================================================
-107.207729249 klen 8 :"ABSCISSA", ITWASTOTALLYINVISIBLEHOWSTHATPOSSIBLE?THEYUSEDTHEEARTHSMAGNETICFIELDXTHEINFORMATIONWASGATHEREDANDTRANSMITTEDUNDERGRUUNDTOANUNKNOWNLOCATIONXDOESLANGLEYKNOWABOUTTHIS?THEYSHOULDITSBURIEDOUTTHERESOMEWHEREXWHOKNOWSTHEEXACTLOCATION?ONLYWWTHISWASHISLASTMESSAGEXTHIRTYEIGHTDEGREESFIFTYSEVENMINUTESSIXPOINTFIVESECONDSNORTHSEVENTYSEVENDEGREESEIGHTMINUTESFORTYFOURSECONDSWESTIDBYROWS

K2 - Determining Message Boundaries


In [75]:
ctext = "".join(ctext_kryptos[2:])
key = "ABSCISSA"
alphakey = "KRYPTOS"
print ctext
print keyed_vigenere(ctext, key, alpha_key = alphakey, direction = -1)


VFPJUDEEHZWETZYVGWHKKQETGFQJNCEGGWHKK?DQMCPFQZDQMMIAGPFXHQRLGTIMVMZJANQLVKQEDAGDVFRPJUNGEUNAQZGZLECGYUXUEENJTBJLBQCRTBJDFHRRYIZETKZEMVDUFKSJHKFWHKUWQLSZFTIHHDDDUVH?DWKBFUFPWNTDFIYCUQZEREEVLDKFEZMOQQJLTTUGSYQPFEUNLAVIDXFLGGTEZ?FKZBSFDQVGOGIPUFXHHDRKFFHQNTGPUAECNUVPDJMQCLQUMUNEDFQELZZVRRGKFFVOEEXBDMVPNFQXEZLGREDNQFMPNZGLFLPMRJQYALMGNUVPDXVKPDQUMEBEDMHDAFMJGZNUPLGEWJLLAETGENDYAHROHNLSRHEOCPTEOIBIDYSHNAIACHTNREYULDSLLSLLNOHSNOSMRWXMNETPRNGATIHNRARPESLNNELEBLPIIACAEWMTWNDITEENRAHCTENEUDRETNHAEOETFOLSEDTIWENHAEIOYTEYQHEENCTAYCREIFTBRSPAMHHEWENATAMATEGYEERLBTEEFOASFIOTUETUAEOTOARMAEERTNRTIBSEDDNIAAHTTMSTEWPIEROAGRIEWFEBAECTDDHILCEIHSITEGOEAOSDDRYDLORITRKLMLEHAGTDHARDPNEOHMGFMFEUHEECDMRIPFEIMEHNLSSTTRTVDOHW?OBKRUOXOGHULBSOLIFBBWFLRVQQPRNGKSSOTWTQSJQSSEKZZWATJKLUDIAWINFBNYPVTTMZFPKWGDKZXTJCDIGKUHUAUEKCAR
ITWASTOTALLYINVISIBLEHOWSTHATPOSSIBLE?THEYUSEDTHEEARTHSMAGNETICFIELDXTHEINFORMATIONWASGATHEREDANDTRANSMITTEDUNDERGRUUNDTOANUNKNOWNLOCATIONXDOESLANGLEYKNOWABOUTTHIS?THEYSHOULDITSBURIEDOUTTHERESOMEWHEREXWHOKNOWSTHEEXACTLOCATION?ONLYWWTHISWASHISLASTMESSAGEXTHIRTYEIGHTDEGREESFIFTYSEVENMINUTESSIXPOINTFIVESECONDSNORTHSEVENTYSEVENDEGREESEIGHTMINUTESFORTYFOURSECONDSWESTIDBYROWSPGRGRBQXSGBLUBTXRWUVZCRBYVWZGRBKRBUTUOUHCTWYEKDDEZOLGZZENLIPGOWVNGTMXCAFNRMHOKDFEOBVYEVBARKMOLEWIGRKXOTFNROQXOFTGTMVXGAKPZYISZDZPTUKLOFAZOSJVXTUFBYVGPWKQPMVCSWRNKQMFBATIODMXREKVOTGOOQDKXYVSZKZTCVIIOWHZOVIZRQEZOYFXGQWAYWVTTFBZROIXFZWPLQKOUXKOUSVLSTRZOKITTABCPYKBKBWPAVVRZZPYUNUEZQBVULYFETAZAUUBRQPUGYJBFSODSYSBOTYYFMKWSTBDOOTEKZWVUUATZAJ?WYLFIZLXOBFYYKXDASZNLSDQHHEHUGSNXKVILXGZBHWLOQMMIRURQEHPARHKGSRFQWGIXFMOUQHXTQMLVDKUCANHSIXSOQYKN

So K2 indeed seems to be just the remainder of the first panel.


In [76]:
ctext_k2 = "".join(ctext_kryptos[2:14])

key = "ABSCISSA"
alphakey = "KRYPTOS"
ptext_k2 = keyed_vigenere(ctext_k2, key, alpha_key = alphakey, direction = -1)

print ctext_k2
print ptext_k2


VFPJUDEEHZWETZYVGWHKKQETGFQJNCEGGWHKK?DQMCPFQZDQMMIAGPFXHQRLGTIMVMZJANQLVKQEDAGDVFRPJUNGEUNAQZGZLECGYUXUEENJTBJLBQCRTBJDFHRRYIZETKZEMVDUFKSJHKFWHKUWQLSZFTIHHDDDUVH?DWKBFUFPWNTDFIYCUQZEREEVLDKFEZMOQQJLTTUGSYQPFEUNLAVIDXFLGGTEZ?FKZBSFDQVGOGIPUFXHHDRKFFHQNTGPUAECNUVPDJMQCLQUMUNEDFQELZZVRRGKFFVOEEXBDMVPNFQXEZLGREDNQFMPNZGLFLPMRJQYALMGNUVPDXVKPDQUMEBEDMHDAFMJGZNUPLGEWJLLAETG
ITWASTOTALLYINVISIBLEHOWSTHATPOSSIBLE?THEYUSEDTHEEARTHSMAGNETICFIELDXTHEINFORMATIONWASGATHEREDANDTRANSMITTEDUNDERGRUUNDTOANUNKNOWNLOCATIONXDOESLANGLEYKNOWABOUTTHIS?THEYSHOULDITSBURIEDOUTTHERESOMEWHEREXWHOKNOWSTHEEXACTLOCATION?ONLYWWTHISWASHISLASTMESSAGEXTHIRTYEIGHTDEGREESFIFTYSEVENMINUTESSIXPOINTFIVESECONDSNORTHSEVENTYSEVENDEGREESEIGHTMINUTESFORTYFOURSECONDSWESTIDBYROWS

K2 - Determining Word Boundaries


In [77]:
from word_score import word_score
fitness = word_score()
print fitness.score(ptext_k2)


(-319.3567312524284, ['IT', 'WAS', 'TOTALLY', 'INVISIBLE', 'HOWS', 'THAT', 'POSSIBLE', '?', 'THEY', 'USED', 'THE', 'EARTHS', 'MAGNETIC', 'FIELD', 'XTHE', 'INFORMATION', 'WAS', 'GATHERED', 'AND', 'TRANSMITTED', 'UNDER', 'GRU', 'UND', 'TO', 'AN', 'UNKNOWN', 'LOCATION', 'XD', 'OES', 'LANGLEY', 'KNOW', 'ABOUT', 'THIS', '?', 'THEY', 'SHOULD', 'ITS', 'BURIED', 'OUT', 'THERE', 'SOMEWHERE', 'XW', 'HO', 'KNOWS', 'THE', 'EXACT', 'LOCATION', '?ONLYWW', 'THIS', 'WAS', 'HIS', 'LAST', 'MESSAG', 'EX', 'THIRTY', 'EIGHT', 'DEGREES', 'FIFTY', 'SEVEN', 'MINUTES', 'SIX', 'POINT', 'FIVE', 'SECONDS', 'NORTH', 'SEVENTY', 'SEVEN', 'DEGREES', 'EIGHT', 'MINUTES', 'FORTY', 'FOUR', 'SECONDS', 'WEST', 'ID', 'BY', 'ROWS'])

K2 - Correction to Ciphertext


In [78]:
ctext_k2_corrected = ctext_k2.replace("WJLLAETG", "SWJLLAETG") # insert missing character near the end

key = "ABSCISSA"
alphakey = "KRYPTOS"
ptext_k2_corrected = keyed_vigenere(ctext_k2_corrected, key, alpha_key = alphakey, direction = -1)

print ctext_k2_corrected
print ptext_k2_corrected


VFPJUDEEHZWETZYVGWHKKQETGFQJNCEGGWHKK?DQMCPFQZDQMMIAGPFXHQRLGTIMVMZJANQLVKQEDAGDVFRPJUNGEUNAQZGZLECGYUXUEENJTBJLBQCRTBJDFHRRYIZETKZEMVDUFKSJHKFWHKUWQLSZFTIHHDDDUVH?DWKBFUFPWNTDFIYCUQZEREEVLDKFEZMOQQJLTTUGSYQPFEUNLAVIDXFLGGTEZ?FKZBSFDQVGOGIPUFXHHDRKFFHQNTGPUAECNUVPDJMQCLQUMUNEDFQELZZVRRGKFFVOEEXBDMVPNFQXEZLGREDNQFMPNZGLFLPMRJQYALMGNUVPDXVKPDQUMEBEDMHDAFMJGZNUPLGESWJLLAETG
ITWASTOTALLYINVISIBLEHOWSTHATPOSSIBLE?THEYUSEDTHEEARTHSMAGNETICFIELDXTHEINFORMATIONWASGATHEREDANDTRANSMITTEDUNDERGRUUNDTOANUNKNOWNLOCATIONXDOESLANGLEYKNOWABOUTTHIS?THEYSHOULDITSBURIEDOUTTHERESOMEWHEREXWHOKNOWSTHEEXACTLOCATION?ONLYWWTHISWASHISLASTMESSAGEXTHIRTYEIGHTDEGREESFIFTYSEVENMINUTESSIXPOINTFIVESECONDSNORTHSEVENTYSEVENDEGREESEIGHTMINUTESFORTYFOURSECONDSWESTXLAYERTWO

In [79]:
from word_score import word_score
fitness = word_score()
print fitness.score(ptext_k2_corrected)


(-321.0621473857477, ['IT', 'WAS', 'TOTALLY', 'INVISIBLE', 'HOWS', 'THAT', 'POSSIBLE', '?', 'THEY', 'USED', 'THE', 'EARTHS', 'MAGNETIC', 'FIELD', 'XTHE', 'INFORMATION', 'WAS', 'GATHERED', 'AND', 'TRANSMITTED', 'UNDER', 'GRU', 'UND', 'TO', 'AN', 'UNKNOWN', 'LOCATION', 'XD', 'OES', 'LANGLEY', 'KNOW', 'ABOUT', 'THIS', '?', 'THEY', 'SHOULD', 'ITS', 'BURIED', 'OUT', 'THERE', 'SOMEWHERE', 'XW', 'HO', 'KNOWS', 'THE', 'EXACT', 'LOCATION', '?ONLYWW', 'THIS', 'WAS', 'HIS', 'LAST', 'MESSAG', 'EX', 'THIRTY', 'EIGHT', 'DEGREES', 'FIFTY', 'SEVEN', 'MINUTES', 'SIX', 'POINT', 'FIVE', 'SECONDS', 'NORTH', 'SEVENTY', 'SEVEN', 'DEGREES', 'EIGHT', 'MINUTES', 'FORTY', 'FOUR', 'SECONDS', 'WE', 'STX', 'LAYER', 'TWO'])
IT WAS TOTALLY INVISIBLE HOWS THAT POSSIBLE? THEY USED THE EARTHS MAGNETIC FIELD X
THE INFORMATION WAS GATHERED AND TRANSMITTED UNDERGRUUND TO AN UNKNOWN LOCATION X
DOES LANGLEY KNOW ABOUT THIS? THEY SHOULD ITS BURIED OUT THERE SOMEWHERE X
WHO KNOWS THE EXACT LOCATION? ONLY WW THIS WAS HIS LAST MESSAGE X 
THIRTY EIGHT DEGREES FIFTY SEVEN MINUTES SIX POINT FIVE SECONDS NORTH SEVENTY SEVEN DEGREES EIGHT MINUTES FORTY FOUR SECONDS WEST X 
LAYER TWO