Collating for real with CollateX. Files

In this exercise, follow the instructions here: read the Markdown cells and execute the Code cells (the ones with In + a number on their left).

Not sure how to execute cells in a Notebook? Check the Jupyter Notebook tutorial.

1. First exercise (Darwin texts). Read from files and HTML output.

Import the collatex Python library


In [1]:
from collatex import *

Create a collation object


In [2]:
collation = Collation()

Read text from files

Now open the texts in "../data/Darwin" and let Python read them.

The 'par1' in the name of each file indicates that the file contains only the first paragraph.

The code below shows how Python reads a file: it is not CollateX code, but the general Python way of doing things. Each file is opened, read (using a specific character encoding) and stored in a variable ('witness_1859', etc.). The name of a variable cannot contain whitespace!


In [3]:
witness_1859 = open( "../data/Darwin/darwin1859_par1.txt", encoding='utf-8' ).read()
witness_1860 = open( "../data/Darwin/darwin1860_par1.txt", encoding='utf-8' ).read()
witness_1861 = open( "../data/Darwin/darwin1861_par1.txt", encoding='utf-8' ).read()
witness_1866 = open( "../data/Darwin/darwin1866_par1.txt", encoding='utf-8' ).read()
witness_1869 = open( "../data/Darwin/darwin1869_par1.txt", encoding='utf-8' ).read()
witness_1872 = open( "../data/Darwin/darwin1872_par1.txt", encoding='utf-8' ).read()
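When there are many witness files, the same open-and-read step can be written as a loop. The following is only a minimal sketch in general Python, not CollateX code: `read_witnesses` is a hypothetical helper name, and the two files it reads are throwaway placeholders created on the spot so that the example runs anywhere (they are not the Darwin texts).

```python
import tempfile
from pathlib import Path


def read_witnesses(directory, pattern="*.txt", encoding="utf-8"):
    """Return {file name without extension: file content} for matching files."""
    witnesses = {}
    for path in sorted(Path(directory).glob(pattern)):
        # 'with' closes the file as soon as the indented block ends.
        with open(path, encoding=encoding) as handle:
            witnesses[path.stem] = handle.read()
    return witnesses


# Demonstration on two placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo1859_par1.txt").write_text("first placeholder text", encoding="utf-8")
    Path(tmp, "demo1872_par1.txt").write_text("second placeholder text", encoding="utf-8")
    demo_witnesses = read_witnesses(tmp)

print(sorted(demo_witnesses))
```

With the real data, the directory argument would presumably be "../data/Darwin"; each text is then available under its file name, for example `demo_witnesses["demo1859_par1"]` in the demonstration above.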

Just to be sure that the text in the files has been stored, try to print one of them.


In [4]:
print(witness_1859)


WHEN we look to the individuals of the same variety or sub-variety of our older cultivated plants and animals, one of the first points which strikes us, is, that they generally differ much more from each other, than do the individuals of any one species or variety in a state of nature. When we reflect on the vast diversity of the plants and animals which have been cultivated, and which have varied during all ages under the most different climates and treatment, I think we are driven to conclude that this greater variability is simply due to our domestic productions having been raised under conditions of life not so uniform as, and somewhat different from, those to which the parent-species have been exposed under nature. There is, also, I think, some probability in the view propounded by Andrew Knight, that this variability may be partly connected with excess of food. It seems pretty clear that organic beings must be exposed during several generations to the new conditions of life to cause any appreciable amount of variation; and that when the organisation has once begun to vary, it generally continues to vary for many generations. No case is on record of a variable being ceasing to be variable under cultivation. Our oldest cultivated plants, such as wheat, still often yield new varieties: our oldest domesticated animals are still capable of rapid improvement or modification.

Or another one


In [5]:
print(witness_1872)


Causes of Variability. WHEN we compare the individuals of the same variety or sub-variety of our older cultivated plants and animals, one of the first points which strikes us is, that they generally differ more from each other than do the individuals of any one species or variety in a state of nature. And if we reflect on the vast diversity of the plants and animals which have been cultivated, and which have varied during all ages under the most different climates and treatment, we are driven to conclude that this great variability is due to our domestic productions having been raised under conditions of life not so uniform as, and somewhat different from, those to which the parent-species had been exposed under nature. There is, also, some probability in the view propounded by Andrew Knight, that this variability may be partly connected with excess of food. It seems clear that organic beings must be exposed during several generations to new conditions to cause any great amount of variation; and that, when the organisation has once begun to vary, it generally continues varying for many generations. No case is on record of a variable organism ceasing to vary under cultivation. Our oldest cultivated plants, such as wheat, still yield new varieties: our oldest domesticated animals are still capable of rapid improvement or modification.

Add them to the CollateX instance as witnesses

This is similar to what we did in the previous exercise, but instead of typing the text directly, we pass the variable containing the text read from each file.


In [6]:
collation.add_plain_witness( "1859", witness_1859 )
collation.add_plain_witness( "1860", witness_1860 )
collation.add_plain_witness( "1861", witness_1861 )
collation.add_plain_witness( "1866", witness_1866 )
collation.add_plain_witness( "1869", witness_1869 )
collation.add_plain_witness( "1872", witness_1872 )
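If the texts are kept in a dictionary that maps each siglum to its text, the six calls above can be collapsed into a loop. This is a sketch under assumptions: `add_witnesses` is a hypothetical helper, and `RecordingCollation` is a stand-in class invented here so the example runs without CollateX installed. With the real library you would pass the `Collation()` object created earlier instead.

```python
def add_witnesses(collation, witnesses):
    """Call collation.add_plain_witness(siglum, text) for each dictionary entry."""
    for siglum, text in witnesses.items():
        collation.add_plain_witness(siglum, text)
    return collation


class RecordingCollation:
    """Stand-in for a collation object: it only records the calls it receives."""

    def __init__(self):
        self.added = []

    def add_plain_witness(self, siglum, text):
        self.added.append((siglum, text))


# Short placeholder snippets instead of the full paragraphs.
recorder = add_witnesses(
    RecordingCollation(),
    {"1859": "WHEN we look to", "1872": "WHEN we compare"},
)
print(recorder.added)
```

Dictionaries keep their insertion order, so the witnesses are added in the order in which they were written.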

New output: HTML table

When you create the collation result, use the output option to specify the output you want. Here, it is set to 'html'.


In [7]:
alignment_table = collate(collation, layout='vertical', output="html")
print(alignment_table)


1859 | 1860 | 1861 | 1866 | 1869 | 1872
- | - | - | Causes of Variability. | Causes of Variability. | Causes of Variability.
WHEN we | WHEN we | WHEN we | WHEN we | WHEN we | WHEN we
look to | look to | look to | look to | compare | compare
the individuals of the same variety or sub-variety of our older cultivated plants and animals, one of the first points which strikes us | the individuals of the same variety or sub-variety of our older cultivated plants and animals, one of the first points which strikes us | the individuals of the same variety or sub-variety of our older cultivated plants and animals, one of the first points which strikes us | the individuals of the same variety or sub-variety of our older cultivated plants and animals, one of the first points which strikes us | the individuals of the same variety or sub-variety of our older cultivated plants and animals, one of the first points which strikes us | the individuals of the same variety or sub-variety of our older cultivated plants and animals, one of the first points which strikes us
, | , | , | , | - | -
is, that they generally differ | is, that they generally differ | is, that they generally differ | is, that they generally differ | is, that they generally differ | is, that they generally differ
much | - | - | - | - | -
more | more | more | more | - | more
from each other | from each other | from each other | from each other | from each other | from each other
, | - | - | - | more | -
than do the individuals of any one species or variety in a state of nature. | than do the individuals of any one species or variety in a state of nature. | than do the individuals of any one species or variety in a state of nature. | than do the individuals of any one species or variety in a state of nature. | than do the individuals of any one species or variety in a state of nature. | than do the individuals of any one species or variety in a state of nature.
When | When | When | When | And if | And if
we reflect on the vast diversity of the plants and animals which have been cultivated, and which have varied during all ages under the most different climates and treatment, | we reflect on the vast diversity of the plants and animals which have been cultivated, and which have varied during all ages under the most different climates and treatment, | we reflect on the vast diversity of the plants and animals which have been cultivated, and which have varied during all ages under the most different climates and treatment, | we reflect on the vast diversity of the plants and animals which have been cultivated, and which have varied during all ages under the most different climates and treatment, | we reflect on the vast diversity of the plants and animals which have been cultivated, and which have varied during all ages under the most different climates and treatment, | we reflect on the vast diversity of the plants and animals which have been cultivated, and which have varied during all ages under the most different climates and treatment,
I think | I think | I think | I think | - | -
we are driven to conclude that this | we are driven to conclude that this | we are driven to conclude that this | we are driven to conclude that this | we are driven to conclude that this | we are driven to conclude that this
greater | great | great | great | great | great
variability is | variability is | variability is | variability is | variability is | variability is
simply | simply | simply | simply | - | -
due to our domestic productions having been raised under conditions of life not so uniform as, and somewhat different from, those to which the parent-species | due to our domestic productions having been raised under conditions of life not so uniform as, and somewhat different from, those to which the parent-species | due to our domestic productions having been raised under conditions of life not so uniform as, and somewhat different from, those to which the parent-species | due to our domestic productions having been raised under conditions of life not so uniform as, and somewhat different from, those to which the parent-species | due to our domestic productions having been raised under conditions of life not so uniform as, and somewhat different from, those to which the parent-species | due to our domestic productions having been raised under conditions of life not so uniform as, and somewhat different from, those to which the parent-species
have | have | have | have | had | had
been exposed under nature. There is | been exposed under nature. There is | been exposed under nature. There is | been exposed under nature. There is | been exposed under nature. There is | been exposed under nature. There is
, | - | - | - | - | ,
also | also | also | also | also | also
, I think | , I think | , I think | , I think | , I think | -
, some probability in the view propounded by Andrew Knight, that this variability may be partly connected with excess of food. It seems | , some probability in the view propounded by Andrew Knight, that this variability may be partly connected with excess of food. It seems | , some probability in the view propounded by Andrew Knight, that this variability may be partly connected with excess of food. It seems | , some probability in the view propounded by Andrew Knight, that this variability may be partly connected with excess of food. It seems | , some probability in the view propounded by Andrew Knight, that this variability may be partly connected with excess of food. It seems | , some probability in the view propounded by Andrew Knight, that this variability may be partly connected with excess of food. It seems
pretty | pretty | pretty | pretty | - | -
clear that organic beings must be exposed during several generations to | clear that organic beings must be exposed during several generations to | clear that organic beings must be exposed during several generations to | clear that organic beings must be exposed during several generations to | clear that organic beings must be exposed during several generations to | clear that organic beings must be exposed during several generations to
the | the | the | the | - | -
new conditions | new conditions | new conditions | new conditions | new conditions | new conditions
of life | of life | of life | of life | - | -
to cause any | to cause any | to cause any | to cause any | to cause any | to cause any
appreciable | appreciable | appreciable | appreciable | appreciable | great
amount of variation; and that | amount of variation; and that | amount of variation; and that | amount of variation; and that | amount of variation; and that | amount of variation; and that
- | - | - | , | , | ,
when the organisation has once begun to vary, it generally | when the organisation has once begun to vary, it generally | when the organisation has once begun to vary, it generally | when the organisation has once begun to vary, it generally | when the organisation has once begun to vary, it generally | when the organisation has once begun to vary, it generally
continues | continues | continues | continues | con- tinues | continues
to vary | to vary | to vary | to vary | varying | varying
for many generations. No case is on record of a variable | for many generations. No case is on record of a variable | for many generations. No case is on record of a variable | for many generations. No case is on record of a variable | for many generations. No case is on record of a variable | for many generations. No case is on record of a variable
being | being | being | being | organism | organism
ceasing to | ceasing to | ceasing to | ceasing to | ceasing to | ceasing to
be variable | be variable | be variable | be variable | vary | vary
under cultivation. Our oldest cultivated plants, such as wheat, still | under cultivation. Our oldest cultivated plants, such as wheat, still | under cultivation. Our oldest cultivated plants, such as wheat, still | under cultivation. Our oldest cultivated plants, such as wheat, still | under cultivation. Our oldest cultivated plants, such as wheat, still | under cultivation. Our oldest cultivated plants, such as wheat, still
often | often | often | often | - | -
yield new varieties: our oldest domesticated animals are still capable of rapid improvement or modification. | yield new varieties: our oldest domesticated animals are still capable of rapid improvement or modification. | yield new varieties: our oldest domesticated animals are still capable of rapid improvement or modification. | yield new varieties: our oldest domesticated animals are still capable of rapid improvement or modification. | yield new varieties: our oldest domesticated animals are still capable of rapid improvement or modification. | yield new varieties: our oldest domesticated animals are still capable of rapid improvement or modification.
None
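Each row of the alignment table holds either a stretch of text on which the witnesses agree or a place where they differ, with '-' marking a gap. To get a feel for how such rows arise, here is a rough illustration for two witnesses only, built on Python's standard difflib module. It is not the algorithm CollateX uses, it splits tokens on whitespace only, and `align_two` is a hypothetical helper name.

```python
from difflib import SequenceMatcher


def align_two(text_a, text_b):
    """Return a list of (cell_a, cell_b) rows; '-' marks a gap in one witness."""
    tokens_a, tokens_b = text_a.split(), text_b.split()
    matcher = SequenceMatcher(None, tokens_a, tokens_b, autojunk=False)
    rows = []
    # Each opcode covers one run of tokens that is equal, replaced,
    # deleted (only in A) or inserted (only in B).
    for _tag, a1, a2, b1, b2 in matcher.get_opcodes():
        cell_a = " ".join(tokens_a[a1:a2]) or "-"
        cell_b = " ".join(tokens_b[b1:b2]) or "-"
        rows.append((cell_a, cell_b))
    return rows


rows = align_two(
    "WHEN we look to the individuals",   # opening words of the 1859 witness
    "WHEN we compare the individuals",   # same words in the 1872 witness
)
for cell_a, cell_b in rows:
    print(f"{cell_a} | {cell_b}")
```

The middle row pairs "look to" with "compare", the same variant that appears in the table above. CollateX additionally handles more than two witnesses at once.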

2. Second exercise (Woolf texts). Read from files and HTML2 output.

In the second exercise, repeat the previous steps, now using the texts at "../data/Woolf/Lighthouse-1" and visualizing the output with the more sophisticated HTML option (HTML2).

We will be using different editions of Virginia Woolf's To the Lighthouse:

USA = New York: Harcourt, Brace & Company, 1927 (1st USA edition)
UK = London: R & R Clark Limited, 1927 (1st UK edition)
EM (EVERYMAN) = London: J. M. Dent & Sons LTD, 1938 (reprint 1952)

The facsimiles and transcriptions of the editions are available at http://woolfonline.com/. Please refer to the information in the data directory for the licence of the materials.

Note that the output 'html2' is specified this time: colors should appear!


In [8]:
from collatex import *
collation = Collation()
witness_USA = open( "../data/Woolf/Lighthouse-1/Lighthouse-1-USA.txt", encoding='utf-8' ).read()
witness_UK = open( "../data/Woolf/Lighthouse-1/Lighthouse-1-UK.txt", encoding='utf-8' ).read()
witness_EM = open( "../data/Woolf/Lighthouse-1/Lighthouse-1-EM.txt", encoding='utf-8' ).read()
collation.add_plain_witness( "USA", witness_USA )
collation.add_plain_witness( "UK", witness_UK )
collation.add_plain_witness( "EM", witness_EM )
alignment_table = collate(collation, output='html2')


USA UK EM
When she looked in the glass and saw her hair grey, her cheek sunk, at fifty, she thought, possibly she might have managed things better—her husband; money; his books. But for her own part she would never for a single second regret her decision, evade difficulties, or slur over duties. She was now formidable to behold, and it was only in silence, looking up from their plates, after she had spoken so severely about Charles Tansley, that her daughters When she looked in the glass and saw her hair grey, her cheek sunk, at fifty, she thought, possibly she might have managed things better—her husband; money; his books. But for her own part she would never for a single second regret her decision, evade difficulties, or slur over duties. She was now formidable to behold, and it was only in silence, looking up from their plates, after she had spoken so severely about Charles Tansley, that her daughters When she looked in the glass and saw her hair grey, her cheek sunk, at fifty, she thought, possibly she might have managed things better—her husband; money; his books. But for her own part she would never for a single second regret her decision, evade difficulties, or slur over duties. She was now formidable to behold, and it was only in silence, looking up from their plates, after she had spoken so severely about Charles Tansley, that her daughters
,
Prue, Nancy, Rose—could sport with infidel ideas which they had brewed for themselves of a life different from hers; in Paris, perhaps; a wilder life; not always taking care of some man or other; for there was in all their minds a mute questioning of deference and chivalry, of the Bank of England and the Indian Empire, of ringed fingers and lace, though to them all there was something in this of the essence of beauty, which called out the manliness in their girlish hearts, and made them, as they sat at table beneath their mother’s eyes, honour her strange severity, her extreme courtesy, like a Prue, Nancy, Rose—could sport with infidel ideas which they had brewed for themselves of a life different from hers; in Paris, perhaps; a wilder life; not always taking care of some man or other; for there was in all their minds a mute questioning of deference and chivalry, of the Bank of England and the Indian Empire, of ringed fingers and lace, though to them all there was something in this of the essence of beauty, which called out the manliness in their girlish hearts, and made them, as they sat at table beneath their mother’s eyes, honour her strange severity, her extreme courtesy, like a Prue, Nancy, Rose—could sport with infidel ideas which they had brewed for themselves of a life different from hers; in Paris, perhaps; a wilder life; not always taking care of some man or other; for there was in all their minds a mute questioning of deference and chivalry, of the Bank of England and the Indian Empire, of ringed fingers and lace, though to them all there was something in this of the essence of beauty, which called out the manliness in their girlish hearts, and made them, as they sat at table beneath their mother’s eyes, honour her strange severity, her extreme courtesy, like a
Queen Queen queen
’s raising from the mud ’s raising from the mud ’s raising from the mud
to wash - -
a beggar a beggar a beggar
'
s dirty foot s dirty foot s dirty foot
- and washing and washing
- - it
, when she thus admonished them so very severely about that wretched atheist who had chased them , when she thus admonished them so very severely about that wretched atheist who had chased them , when she thus admonished them so very severely about that wretched atheist who had chased them
- - to
—or, speaking accurately, been invited to stay with them —or, speaking accurately, been invited to stay with them —or, speaking accurately, been invited to stay with them
- - in
in in -
the the the
Isles Isle Isle
of Skye. of Skye. of Skye.

3. Third exercise (the sonnet about writing a sonnet). Read from files and HTML2 output.

You now know how to collate texts stored in files. Try it with the other materials in the data directory: the sonnet about writing a sonnet, which you have been using to start encoding in TEI. Collate the two versions of the sonnet.


In [10]:
from collatex import *
collation = Collation()
witness_1707 = open( "../data/sonnet/Lope_soneto_FR_1707.txt", encoding='utf-8' ).read()
witness_1822 = open( "../data/sonnet/Lope_soneto_FR_1822.txt", encoding='utf-8' ).read()
collation.add_plain_witness( "witness1707", witness_1707 )
collation.add_plain_witness( "witness1822", witness_1822 )
alignment_table = collate(collation, output='html2')


witness1707 witness1822
Doris, qui sait qu'aux vers quelquefois je me plais, Me demande un sonnet Doris, qui sait qu'aux vers quelquefois je me plais, Me demande un sonnet
; ,
et je m'en désespère: Quatorze vers, grand Dieu! et je m'en désespère: Quatorze vers, grand Dieu!
le Le
moyen de les faire moyen de les faire
! ?
En voilà cependant déjà quatre de faits. Je ne pouvais d'abord trouver de rime En voilà cependant déjà quatre de faits. Je ne pouvais d'abord trouver de rime
; ,
mais mais
, -
En faisant on apprend à se tirer d'affaire En faisant on apprend à se tirer d'affaire
: .
Poursuivons Poursuivons
, :
les quatrains ne m'étonneront guère les quatrains ne m'étonneront guère
, -
Si du premier tercet je puis faire les frais. Je commence au hasard Si du premier tercet je puis faire les frais. Je commence au hasard
; ,
et et
- ,
si je ne m'abuse, Je n'ai si je ne m'abuse, Je n'ai
pas point
commencé commencé
, -
sans l'aveu de sans l'aveu de
la ma
muse, Puisqu'en si peu de temps je muse, Puisqu'en si peu de temps je
m'en me
tire tire
si du
net. J'entame le second net. J'entame le second
; ,
et ma joie est extrême; Car des vers commandés j'achève le treizième; Comptez s'ils sont quatorze et ma joie est extrême; Car des vers commandés j'achève le treizième; Comptez s'ils sont quatorze
; ,
et voilà le sonnet. et voilà le sonnet.