In this exercise, read the Markdown cells and execute the Code cells (the ones with In and a number on their left).
Not sure how to execute cells in a Notebook? Check the Jupyter Notebook tutorial.
Import the collatex Python library
In [1]:
from collatex import *
Create a collation object
In [2]:
collation = Collation()
Now open the texts in "../data/Darwin" and let Python read them.
The 'par1' in each file name indicates that the file contains only the first paragraph.
The code below shows how Python reads a file: it is not CollateX code, but the general Python way of doing things. Each file is opened, read (using a specific character encoding) and stored in a variable ('witness_1859', etc.). The name of a variable cannot contain whitespace!
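As a side note, the open-then-read pattern can also be written with a `with` block, which closes the file as soon as the text has been read. The following is only a minimal sketch: the file name `sample_par1.txt` and its content are hypothetical placeholders, written to a temporary directory so that the cell runs on its own.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for one of the witness files, written to a temporary
# directory so that this cell does not depend on the data directory.
tmp_dir = Path(tempfile.mkdtemp())
sample_path = tmp_dir / "sample_par1.txt"
sample_path.write_text("This is a sample witness.", encoding="utf-8")

# The 'with' block closes the file automatically once the text has been read.
with open(sample_path, encoding="utf-8") as f:
    sample_witness = f.read()

print(sample_witness)
```

The result is the same kind of string as the one stored in 'witness_1859'; the only difference is that the file does not stay open afterwards.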
In [3]:
witness_1859 = open( "../data/Darwin/darwin1859_par1.txt", encoding='utf-8' ).read()
witness_1860 = open( "../data/Darwin/darwin1860_par1.txt", encoding='utf-8' ).read()
witness_1861 = open( "../data/Darwin/darwin1861_par1.txt", encoding='utf-8' ).read()
witness_1866 = open( "../data/Darwin/darwin1866_par1.txt", encoding='utf-8' ).read()
witness_1869 = open( "../data/Darwin/darwin1869_par1.txt", encoding='utf-8' ).read()
witness_1872 = open( "../data/Darwin/darwin1872_par1.txt", encoding='utf-8' ).read()
Just to be sure that the text in the files has been stored, try to print one of them.
In [4]:
print(witness_1859)
Or another one
In [5]:
print(witness_1872)
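Printing every witness in full can get long. A possible alternative, sketched here with hypothetical short texts in place of the real witness variables, is to print one summary line per witness: its name, its length in characters and its first few characters. The helper name `summarize` is an assumption made for this illustration.

```python
# Hypothetical short texts standing in for the witness variables.
witnesses = {
    "1859": "A first sample text.",
    "1872": "A second, slightly longer sample text.",
}

def summarize(witnesses, preview=20):
    """Return one line per witness: name, character count, first characters."""
    lines = []
    for sigil, text in witnesses.items():
        lines.append(f"{sigil}: {len(text)} characters, starts with {text[:preview]!r}")
    return lines

summary = summarize(witnesses)
for line in summary:
    print(line)
```

An empty or unexpectedly short witness shows up immediately in such a summary.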
In [6]:
collation.add_plain_witness( "1859", witness_1859 )
collation.add_plain_witness( "1860", witness_1860 )
collation.add_plain_witness( "1861", witness_1861 )
collation.add_plain_witness( "1866", witness_1866 )
collation.add_plain_witness( "1869", witness_1869 )
collation.add_plain_witness( "1872", witness_1872 )
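The six near-identical lines used above to read the files and to add the witnesses could also be produced with a loop. The sketch below assumes the file naming pattern seen above ('darwin' + year + '_par1.txt'); the function name `read_witnesses` is hypothetical, and the demonstration uses placeholder files in a temporary directory rather than the real data.

```python
import tempfile
from pathlib import Path

def read_witnesses(directory, years, pattern="darwin{year}_par1.txt"):
    """Read one file per year and return a dict mapping year (as text) to content."""
    texts = {}
    for year in years:
        path = Path(directory) / pattern.format(year=year)
        texts[str(year)] = path.read_text(encoding="utf-8")
    return texts

# Self-contained demonstration with hypothetical placeholder files.
demo_dir = Path(tempfile.mkdtemp())
for year in (1859, 1860):
    (demo_dir / f"darwin{year}_par1.txt").write_text(f"Sample text {year}", encoding="utf-8")

texts = read_witnesses(demo_dir, [1859, 1860])
print(sorted(texts))

# With the real data and an existing Collation object, the same dict could feed
# the add_plain_witness calls:
# years = [1859, 1860, 1861, 1866, 1869, 1872]
# for sigil, text in read_witnesses("../data/Darwin", years).items():
#     collation.add_plain_witness(sigil, text)
```

Writing the steps out one by one, as in the cells above, is easier to follow when starting out; the loop becomes useful when the number of witnesses grows.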
In [7]:
alignment_table = collate(collation, layout='vertical', output="html")
print(alignment_table)
In the second exercise, repeat the previous steps, now using the texts at "../data/Woolf/Lighthouse-2" and visualizing the output with the more sophisticated HTML option (HTML2).
We will be using different editions of Virginia Woolf's To the Lighthouse:
USA = New York: Harcourt, Brace & Company, 1927 (1st USA edition)
UK = London: R & R Clark Limited, 1927 (1st UK edition)
EM (EVERYMAN) = London: J. M. Dent & Sons LTD, 1938 (reprint 1952)
The facsimiles and transcriptions of the editions are available at http://woolfonline.com/. Please refer to the information in the data directory for the licence of the materials.
Note that the output 'html2' is specified this time: colors should appear!
In [8]:
from collatex import *
collation = Collation()
witness_USA = open( "../data/Woolf/Lighthouse-1/Lighthouse-1-USA.txt", encoding='utf-8' ).read()
witness_UK = open( "../data/Woolf/Lighthouse-1/Lighthouse-1-UK.txt", encoding='utf-8' ).read()
witness_EM = open( "../data/Woolf/Lighthouse-1/Lighthouse-1-EM.txt", encoding='utf-8' ).read()
collation.add_plain_witness( "USA", witness_USA )
collation.add_plain_witness( "UK", witness_UK )
collation.add_plain_witness( "EM", witness_EM )
alignment_table = collate(collation, output='html2')
You now know how to collate texts stored in files. Try with the other materials inside the data directory: the sonnet about writing a sonnet that you used when you started encoding in TEI. Collate the two versions of the sonnet.
In [10]:
from collatex import *
collation = Collation()
witness_1707 = open( "../data/sonnet/Lope_soneto_FR_1707.txt", encoding='utf-8' ).read()
witness_1822 = open( "../data/sonnet/Lope_soneto_FR_1822.txt", encoding='utf-8' ).read()
collation.add_plain_witness( "witness1707", witness_1707 )
collation.add_plain_witness( "witness1822", witness_1822 )
alignment_table = collate(collation, output='html2')