This notebook is designed to reproduce several findings from Emily Thornbury's chapter "The Poet Alone" in her book Becoming a Poet in Anglo-Saxon England, in particular Fig. 4.5 on page 170.
First, however, we're going to think about what we might do with lists of strings. After all, how else can we count features of a string unless we can somehow make a list of items out of it?
Here's a list:
In [ ]:
["þæt", "wearð", "underne"]
How do I know?
In [ ]:
type(["þæt", "wearð", "underne"])
We can assign these to variables too!
In [ ]:
first_hemistich = ["þæt", "wearð", "underne"]
second_hemistich = ["eorðbuendum"]
And combine them with the + operator, which concatenates lists:
In [ ]:
print(first_hemistich + second_hemistich)
Let's assign that to first_line:
In [ ]:
first_line = first_hemistich + second_hemistich
You can get the length of a list using the len function:
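As a minimal sketch (the lists are redefined here only so the snippet runs on its own, and the names n_words and n_chars are illustrative), len counts the items in a list, while on a single string it counts characters:

```python
# Rebuild the line from the two hemistichs defined earlier
first_hemistich = ["þæt", "wearð", "underne"]
second_hemistich = ["eorðbuendum"]
first_line = first_hemistich + second_hemistich

# On a list, len counts items (words here), not characters
n_words = len(first_line)
print(n_words)

# On a single string, len counts characters instead
n_chars = len(first_line[0])
print(n_chars)
```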
In [ ]:
You can index lists with brackets []. Let's get the first word of the first line:
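A short sketch of indexing, assuming the same four-word line as above (first_word and last_word are illustrative names). Python counts positions from 0, and negative indices count back from the end:

```python
first_line = ["þæt", "wearð", "underne", "eorðbuendum"]

# Index 0 is the first item, because Python counts from 0
first_word = first_line[0]
print(first_word)

# Negative indices count from the end of the list
last_word = first_line[-1]
print(last_word)
```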
In [ ]:
In [ ]:
You can get ranges (slices) using a colon :
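The following sketch shows a few common slices on the same four-word line; the variable names are illustrative. A slice written start:stop includes the start position and excludes the stop position:

```python
first_line = ["þæt", "wearð", "underne", "eorðbuendum"]

# start:stop includes index 0 and 1, but excludes index 2
first_two = first_line[0:2]
# Omitting the stop means "up to the end of the list"
from_third = first_line[2:]
# Omitting the start means "from the beginning"; -1 excludes the last item
all_but_last = first_line[:-1]

print(first_two)
print(from_third)
print(all_but_last)
```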
In [ ]:
In [ ]:
first_line = ['þæt', 'wearð', 'underne', 'eorðbuendum,']
second_line = ['þæt', 'meotod', 'hæfde', 'miht', 'and', 'strengðo']
third_line = ['ða', 'he', 'gefestnade', 'foldan', 'sceatas.']
In [ ]:
For now, think of a list comprehension as a fast way to sift out items from a list, instead of writing a for loop that appends to a new one.
In [ ]:
[word for word in first_line if "e" in word]
In [ ]:
has_e = []
for word in first_line:
    if "e" in word:
        # keep only the words that contain an "e"
        has_e.append(word)
has_e
Now you know why list comprehensions are one of the best parts of Python!
Especially for text analysis, these will come in handy when we want to parse and sift through text.
In [ ]:
In [ ]:
with open('data/christ-and-satan.txt', 'r') as f:
    # read the whole file into a single string
    christ_and_satan = f.read()
In [ ]:
tokens = christ_and_satan.split()
In [ ]:
Looks like a decent start. But we still have verse numbering in there, as well as some punctuation. What if we just want the words?
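One possible way to do this, sketched on a made-up sample string rather than the real file, is to strip punctuation and digits from the ends of each token and then drop any token that ends up empty. The names sample, unwanted, and clean_tokens are assumptions for illustration:

```python
from string import punctuation, digits

# Hypothetical sample imitating the raw text: a verse number plus punctuation
sample = "1 þæt wearð underne eorðbuendum,"
sample_tokens = sample.split()

# Characters to trim from both ends of each token
unwanted = punctuation + digits

# str.strip only trims the ends, so characters inside a word are left alone
clean_tokens = [tok.strip(unwanted) for tok in sample_tokens]
# A bare verse number becomes an empty string; drop those
clean_tokens = [tok for tok in clean_tokens if tok]
print(clean_tokens)
```

Note that this approach leaves any punctuation in the middle of a token untouched.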
In [ ]:
from string import punctuation, digits
In [ ]:
In [ ]:
In [ ]:
Python comes with the convenient Counter class from the collections module. It returns a dictionary-like object that gives the frequency of each key.
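Here is a small sketch on a made-up token list (sample_tokens and sample_counts are illustrative names) showing the two features used most often: looking up one key, and asking for the most common items:

```python
from collections import Counter

# Hypothetical tokens, not taken from the file
sample_tokens = ["þæt", "meotod", "and", "miht", "and", "strengðo", "and"]
sample_counts = Counter(sample_tokens)

# Look up a single key; keys that never occurred count as 0
and_count = sample_counts["and"]
missing_count = sample_counts["god"]
print(and_count, missing_count)

# most_common(n) returns (item, count) pairs, highest count first
top_two = sample_counts.most_common(2)
print(top_two)
```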
In [ ]:
from collections import Counter
cs_dict = Counter(tokens)
In [ ]:
In [ ]:
In [ ]:
In [ ]:
Believe it or not, even 1000 years ago "and" was already used all the time :)
In [ ]:
In [ ]:
%matplotlib inline
from datascience import *
import numpy as np
words, frequency = zip(*cs_dict.items())
t = Table(["Words", "Frequency"])
t.append_column("Words", words)
t.append_column("Frequency", frequency)
top_table = t.sort("Frequency", descending=True).take(np.arange(5))
# horizontal bar chart of the five most frequent words
top_table.barh("Words")
We can now put together our knowledge of strings, list comprehensions, and plotting frequencies to look at frequency of alliteration letters. Remember: Alliteration is the repetition of a sound at the beginning of two or more words in the same line.
Let's start by looking at the first letter of every word in the whole text:
In [ ]:
cs_tokens = christ_and_satan.lower().split()
first_letters = [x[0] if x[0] not in ['a','e','i','o','u','y'] else 'a' for x in cs_tokens]
first_l_dict = Counter(first_letters)
first_l_freq = first_l_dict.most_common()
In [ ]:
# plot
letters, frequency = zip(*first_l_dict.items())
t = Table(["Letters", "Frequency"])
t.append_column("Letters", letters)
t.append_column("Frequency", frequency)
top_table = t.sort("Frequency", descending=True).take(np.arange(5))
# horizontal bar chart of the five most frequent first letters
top_table.barh("Letters")
Cool! But we need it within a line, and Thornbury specifically does it for each Fitt. What's a "Fitt"? It's a further division in poetry constituted by a group of lines. Luckily this is nicely delimited by double line breaks (\n\n).
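To make "within a line" concrete, here is a minimal sketch for a single line. It uses the same simplification as the rest of this notebook: the most frequent initial letter is treated as the alliterating one, and all vowels are collapsed to 'a'. The helper name alliterating_letter is hypothetical:

```python
from collections import Counter

VOWELS = "aeiouy"

def alliterating_letter(words):
    """Return (letter, count) for the most frequent initial in a list of words.

    All vowels are collapsed to 'a'. Returns None for an empty line.
    """
    initials = ["a" if w[0] in VOWELS else w[0] for w in words if w]
    if not initials:
        return None
    return Counter(initials).most_common(1)[0]

second_line = ['þæt', 'meotod', 'hæfde', 'miht', 'and', 'strengðo']
result = alliterating_letter(second_line)
print(result)
```

Here "meotod" and "miht" share an initial, so 'm' comes out as the line's alliterating letter.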
In [ ]:
cs_fitts = christ_and_satan.split('\n\n')
In [ ]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 10))
# iterate through fitts
for i in range(len(cs_fitts)):
    # lowercase the string and get the tokens for each line back
    fitt_tokens = [l.split() for l in cs_fitts[i].lower().split('\n')]
    # collect letter of most freq alliteration
    most_freq_allit = []
    # cycle through lines
    for l in fitt_tokens:
        # get first letter of all words in line (all vowels count as 'a')
        first_letters = [x[0] if x[0] not in ['a','e','i','o','u','y'] else 'a' for x in l]
        # count freq of all first letters
        allit_freq = Counter(first_letters).most_common()
        # append most freq letter (alliterated letter) to list for all lines,
        # skipping blank lines, which have no tokens
        if allit_freq:
            most_freq_allit.append(allit_freq[0][0])
    # use Counter to get the most common alliterations
    allit_freq = Counter(most_freq_allit).most_common()
    # need keys for x axis
    common_keys = [x[0] for x in allit_freq]
    # need values for y axes
    common_values = [x[1] for x in allit_freq]
    # normalize so we can compare across Fitts despite different number of words
    normed_values = [x[1]/sum(common_values) for x in allit_freq]
    # add up to get cumulative alliteration of the four most preferred patterns
    cumulative_values = np.cumsum(normed_values)
    # add the Fitt to the plot
    plt.xticks(range(4), ['1st','2nd','3rd','4th'], rotation='vertical')
    # NOTE: the colormap name is garbled in the source; plt.cm.viridis is an assumption
    plt.plot(cumulative_values[:4], color=plt.cm.viridis(i*.085), lw=3)
plt.legend(labels=['Fitt '+str(i+1) for i in range(12)], loc=0)
In poetry, an acrostic is a message created by taking certain letters in a pattern over lines. One 9th-century German writer, Otfrid of Weissenburg, was notorious for his early use of acrostics, one instance of which is in the text below: Salomoni episcopo Otfridus. His message can be found by taking the first character of every other line. Print Otfrid's message!
In [ ]:
text = '''si sálida gimúati sálomones gúati,
ther bíscof ist nu édiles kóstinzero sédales;
allo gúati gidúe thio sín, thio bíscofa er thar hábetin,
ther ínan zi thiu giládota, in hóubit sinaz zuívalta!
lékza ih therera búachi iu sentu in suábo richi,
thaz ir irkíaset ubar ál, oba siu frúma wesan scal;
oba ir hiar fíndet iawiht thés thaz wírdig ist thes lésannes:
iz iuer húgu irwállo, wísduames fóllo.
mir wárun thio iuo wízzi ju ófto filu núzzi,
íueraz wísduam; thes duan ih míhilan ruam.
ófto irhugg ih múates thes mánagfalten gúates,
thaz ír mih lértut hárto íues selbes wórto.
ni thaz míno dohti giwérkon thaz io móhti,
odo in thén thingon thio húldi so gilángon;
iz datun gómaheiti, thio íues selbes gúati,
íueraz giráti, nales míno dati.
emmizen nu ubar ál ih druhtin férgon scal,
mit lón er iu iz firgélte joh sínes selbes wórte;
páradyses résti gébe iu zi gilústi;
ungilónot ni biléip ther gotes wízzode kleip.
in hímilriches scóne so wérde iz iu zi lóne
mit géltes ginúhti, thaz ír mir datut zúhti.
sínt in thesemo búache, thes gómo theheiner rúache;
wórtes odo gúates, thaz lích iu iues múates:
chéret thaz in múate bi thia zúhti iu zi gúate,
joh zellet tház ana wánc al in íuweran thanc.
ofto wírdit, oba gúat thes mannes júngoro giduat,
thaz es líwit thráto ther zúhtari gúato.
pétrus ther rícho lono iu es blídlicho,
themo zi rómu druhtin gráp joh hús inti hóf gap;
óbana fon hímile sént iu io zi gámane
sálida gimýato selbo kríst ther gúato!
oba ih irbálden es gidár, ni scal ih firlázan iz ouh ál,
nub ih ío bi iuih gerno gináda sina férgo,
thaz hóh er iuo wírdi mit sínes selbes húldi,
joh iu féstino in thaz múat thaz sinaz mánagfalta gúat;
firlíhe iu sines ríches, thes hohen hímilriches,
bi thaz ther gúato hiar io wíaf joh émmizen zi góte riaf;
rihte íue pédi thara frúa joh míh gifúage tharazúa,
tház wir unsih fréwen thar thaz gotes éwiniga jár,
in hímile unsih blíden, thaz wízi wir bimíden;
joh dúe uns thaz gimúati thúruh thio síno guati!
dúe uns thaz zi gúate blídemo múate!
mit héilu er gibóran ward, ther io thia sálida thar fand,
uuanta es ni brístit furdir (thes gilóube man mír),
nirfréwe sih mit múatu íamer thar mit gúatu.
sélbo krist ther guato firlíhe uns hiar gimúato,
wir íamer fro sin múates thes éwinigen gúates!'''
In [ ]:
# HINT: remember what % does, (maybe) lookup enumerate
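As a nudge for the hint above, here is a sketch of the enumerate and % pattern on a made-up list of lines, not on Otfrid's text. The names toy_lines, picked, and message are illustrative:

```python
# Hypothetical toy lines, not Otfrid's text
toy_lines = ["heofon", "x", "eorðe", "x", "lyft", "x", "land", "x", "ofer"]

# enumerate pairs each item with its position;
# i % 2 == 0 keeps positions 0, 2, 4, ... (every other line)
picked = [line[0] for i, line in enumerate(toy_lines) if i % 2 == 0]

# join the picked characters into one string
message = "".join(picked)
print(message)
```

Applying the same idea to the text above means first splitting it into lines.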
Otfrid was more skillful than to settle for the first letter of every other line. What happens if you extract the last letter of the last word of each line, for every other line starting on the second line?
In [ ]:
# HINT: first remove punctuation, tab is represented by \t
from string import punctuation