Collation outputs

  • Introduction
  • In practice
    • Table: plain text
    • Table: HTML
    • Table: JSON
    • Table: XML and XML/TEI
    • Graph: SVG
  • Exercise
  • What's next

Introduction

In this tutorial we will be trying different outputs for our collation, meaning different graphical representations, formats and visualizations of the result.

The visualization of the collation result is an open discussion: several possibilities have been suggested and used and new ones are always being proposed. When the output of the collation is a printed format, such as a book, it is rare to see anything different from the traditional critical apparatus. Now that output formats are more frequently digital (or at least have a digital component), collation tools tend to offer more than one visualization option. This is the case for both Juxta and CollateX. The different visualizations are not incompatible; on the contrary, they can be complementary, highlighting different aspects of the result and suitable for different users or different stages of the workflow.

In the previous tutorials we used the alignment table and the graph. The alignment table, in use since the 1960s, is the equivalent of the matrix used in bioinformatics for sequence alignment (for example, of strings of DNA). In contrast, the graph is meant to represent the fluidity of the text and its variation. The idea of a graph-oriented model for expressing textual variance was originally developed by Desmond Schmidt (2008). You can refer to this video for a presentation on Apparatus vs. Graph – an Interface as Scholarly Argument by Tara Andrews and Joris van Zundert. Other outputs, such as the histogram and the side-by-side visualization offered by Juxta, allow users to visualize the result of the comparison between two witnesses only. This reflects the way the algorithm is built and shows that the graphical representation is connected with the approach to collation that informs the software.

CollateX has two main ways to conceive of the collation result: as a table (with many different formatting options) and as a graph:

  • table formats
    • plain text table (no need to specify the output)
    • HTML table (output='html')
    • HTML vertical table with colors (output='html2')
    • JSON (output='json')
    • XML (output='xml')
    • XML/TEI (output='tei')
  • graph format
    • SVG (output='svg')

In practice

Even though we have already encountered some of these outputs, it is worth going through them one more time, focusing on the part of the code that needs to change to produce the different formats.

Table: plain text

In this tutorial we will use some simple texts already used in the previous tutorial: the fox and dog example.

Let's start with the simplest output, for which we don't need to specify any output format (note that you can name the variable containing the output anything you like, but in this tutorial we call it alignment_table, table or graph).

In the code cell below the lines starting with a hash (#) are comments and are not executed. They are there in this instance to help you remember what the different parts of the code do. You do not need to use them in your notebook (although sometimes it is helpful to add comments to your code so you remember what things do).


In [1]:
#import the collatex library
from collatex import *
#create an instance of the CollateX engine
collation = Collation()
#add witnesses to the CollateX instance
collation.add_plain_witness( "A", "The quick brown fox jumped over the lazy dog.")
collation.add_plain_witness( "B", "The brown fox jumped over the dog." )
collation.add_plain_witness( "C", "The bad fox jumped over the lazy dog." )
#collate the witnesses and store the result in a variable called 'table'
#as we have not specified an output this will be stored in plain text
table = collate(collation)
#print the collation result
print(table)


+---+-----+-------+-------+---------------------+------+------+
| A | The | quick | brown | fox jumped over the | lazy | dog. |
| B | The | -     | brown | fox jumped over the | -    | dog. |
| C | The | bad   | -     | fox jumped over the | lazy | dog. |
+---+-----+-------+-------+---------------------+------+------+
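
The alignment table can be read as a matrix with one row per witness and one column per aligned segment. The following is a minimal sketch, using only the Python standard library, that transcribes the table above by hand and finds the columns where the witnesses disagree. The names rows and variant_columns are illustrative and are not part of CollateX.

```python
# Rows transcribed by hand from the plain text table above; "-" marks a gap.
rows = {
    "A": ["The", "quick", "brown", "fox jumped over the", "lazy", "dog."],
    "B": ["The", "-", "brown", "fox jumped over the", "-", "dog."],
    "C": ["The", "bad", "-", "fox jumped over the", "lazy", "dog."],
}


def variant_columns(rows):
    """Return the indices of the columns where the witnesses do not all agree."""
    # zip(*...) turns the rows into columns, one tuple per aligned segment
    columns = zip(*rows.values())
    return [i for i, cells in enumerate(columns) if len(set(cells)) > 1]


print(variant_columns(rows))
```

For this example the variant columns are 1, 2 and 4 (counting from 0): the quick/bad column, the brown column and the lazy column.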

Table: HTML

Now let's try a different output. This time we still want a table format, but instead of plain text we would like it exported in HTML (the markup language used for web pages), and we would like it to be displayed vertically, with colors that highlight the comparison. To achieve this, all you need to do is add the keyword output to the collate command and give it the value html2.


In [2]:
table = collate(collation, output='html2')


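+---------------------+---------------------+---------------------+
| A                   | B                   | C                   |
+---------------------+---------------------+---------------------+
| The                 | The                 | The                 |
| quick               | -                   | bad                 |
| brown               | brown               | -                   |
| fox jumped over the | fox jumped over the | fox jumped over the |
| lazy                | -                   | lazy                |
| dog.                | dog.                | dog.                |
+---------------------+---------------------+---------------------+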

Before moving to the other outputs, try html instead of html2.
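
The vertical layout is the horizontal table transposed: each witness becomes a column instead of a row. The following sketch shows the idea on hand-transcribed data, using only the standard library. It does not reproduce the HTML that CollateX generates, and the variable names are illustrative.

```python
# Witness sigla and rows transcribed by hand from the horizontal table.
header = ["A", "B", "C"]
horizontal = [
    ["The", "quick", "brown", "fox jumped over the", "lazy", "dog."],
    ["The", "-", "brown", "fox jumped over the", "-", "dog."],
    ["The", "bad", "-", "fox jumped over the", "lazy", "dog."],
]

# Transpose: every tuple now holds the readings of A, B and C for one segment
vertical = list(zip(*horizontal))

for line in [tuple(header)] + vertical:
    print(" | ".join(line))
```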

Table: JSON

The same alignment table can be exported in a variety of formats, as we have seen, including JSON (JavaScript Object Notation), a format now widely used for storing and exchanging data. To produce JSON as output, we need to specify json as the output format.


In [3]:
table = collate(collation, output='json')
print(table)


{"table": [[[{"_sigil": "A", "_token_array_position": 0, "n": "The", "t": "The "}], [{"_sigil": "A", "_token_array_position": 1, "n": "quick", "t": "quick "}], [{"_sigil": "A", "_token_array_position": 2, "n": "brown", "t": "brown "}], [{"_sigil": "A", "_token_array_position": 3, "n": "fox", "t": "fox "}, {"_sigil": "A", "_token_array_position": 4, "n": "jumped", "t": "jumped "}, {"_sigil": "A", "_token_array_position": 5, "n": "over", "t": "over "}, {"_sigil": "A", "_token_array_position": 6, "n": "the", "t": "the "}], [{"_sigil": "A", "_token_array_position": 7, "n": "lazy", "t": "lazy "}], [{"_sigil": "A", "_token_array_position": 8, "n": "dog", "t": "dog"}, {"_sigil": "A", "_token_array_position": 9, "n": ".", "t": "."}]], [[{"_sigil": "B", "_token_array_position": 11, "n": "The", "t": "The "}], null, [{"_sigil": "B", "_token_array_position": 12, "n": "brown", "t": "brown "}], [{"_sigil": "B", "_token_array_position": 13, "n": "fox", "t": "fox "}, {"_sigil": "B", "_token_array_position": 14, "n": "jumped", "t": "jumped "}, {"_sigil": "B", "_token_array_position": 15, "n": "over", "t": "over "}, {"_sigil": "B", "_token_array_position": 16, "n": "the", "t": "the "}], null, [{"_sigil": "B", "_token_array_position": 17, "n": "dog", "t": "dog"}, {"_sigil": "B", "_token_array_position": 18, "n": ".", "t": "."}]], [[{"_sigil": "C", "_token_array_position": 20, "n": "The", "t": "The "}], [{"_sigil": "C", "_token_array_position": 21, "n": "bad", "t": "bad "}], null, [{"_sigil": "C", "_token_array_position": 22, "n": "fox", "t": "fox "}, {"_sigil": "C", "_token_array_position": 23, "n": "jumped", "t": "jumped "}, {"_sigil": "C", "_token_array_position": 24, "n": "over", "t": "over "}, {"_sigil": "C", "_token_array_position": 25, "n": "the", "t": "the "}], [{"_sigil": "C", "_token_array_position": 26, "n": "lazy", "t": "lazy "}], [{"_sigil": "C", "_token_array_position": 27, "n": "dog", "t": "dog"}, {"_sigil": "C", "_token_array_position": 28, "n": ".", "t": "."}]]], 
"witnesses": ["A", "B", "C"]}

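Because the JSON output is plain text, it can be loaded back into Python with the standard json module. The sketch below uses a shortened, hand-trimmed sample that follows the structure shown above: one row per witness, each cell either a list of tokens or null. It keeps only the first three columns and drops the _sigil and _token_array_position keys to stay readable. The helper cell_text is illustrative, not a CollateX function.

```python
import json

# Trimmed sample mirroring the structure of the JSON output above
# (first three columns only; some token keys omitted for brevity).
sample = """
{"table": [
  [[{"n": "The", "t": "The "}], [{"n": "quick", "t": "quick "}], [{"n": "brown", "t": "brown "}]],
  [[{"n": "The", "t": "The "}], null, [{"n": "brown", "t": "brown "}]],
  [[{"n": "The", "t": "The "}], [{"n": "bad", "t": "bad "}], null]
 ],
 "witnesses": ["A", "B", "C"]}
"""

data = json.loads(sample)


def cell_text(cell):
    """Join the original token text ("t") of a cell; a null cell is a gap."""
    if cell is None:
        return "-"
    return "".join(token["t"] for token in cell).strip()


# Pair each witness sigil with its row of cells
readings = {
    sigil: [cell_text(cell) for cell in row]
    for sigil, row in zip(data["witnesses"], data["table"])
}

for sigil, cells in readings.items():
    print(sigil, cells)
```

In the real output you would pass the string returned by collate to json.loads instead of the sample.
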
Table: XML and XML/TEI

We can use the same procedure to export the table in XML or XML/TEI. The latter produces a condensed version of the table that lists witnesses only at points of divergence, also called a negative apparatus. To do this, you just specify a different output format. Let's start with the XML output, which you can later post-process using XSLT or other tools.


In [4]:
table = collate(collation, output='xml')
print(table)


<root><app><rdg wit="#A">The </rdg><rdg wit="#B">The </rdg><rdg wit="#C">The </rdg></app><app><rdg wit="#A">quick </rdg><rdg wit="#C">bad </rdg></app><app><rdg wit="#A">brown </rdg><rdg wit="#B">brown </rdg></app><app><rdg wit="#A">fox jumped over the </rdg><rdg wit="#B">fox jumped over the </rdg><rdg wit="#C">fox jumped over the </rdg></app><app><rdg wit="#A">lazy </rdg><rdg wit="#C">lazy </rdg></app><app><rdg wit="#A">dog.</rdg><rdg wit="#B">dog.</rdg><rdg wit="#C">dog.</rdg></app></root>

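XSLT is one way to post-process this output; Python's standard xml.etree.ElementTree module is another. As a minimal sketch, the code below parses the XML shown above (copied here as a string) and reports, for each app element, the reading of every witness and which witnesses have no reading. The function name summarize is illustrative.

```python
import xml.etree.ElementTree as ET

# The XML output shown above, copied as a string
xml_output = (
    '<root><app><rdg wit="#A">The </rdg><rdg wit="#B">The </rdg>'
    '<rdg wit="#C">The </rdg></app><app><rdg wit="#A">quick </rdg>'
    '<rdg wit="#C">bad </rdg></app><app><rdg wit="#A">brown </rdg>'
    '<rdg wit="#B">brown </rdg></app><app><rdg wit="#A">fox jumped over the </rdg>'
    '<rdg wit="#B">fox jumped over the </rdg><rdg wit="#C">fox jumped over the </rdg>'
    '</app><app><rdg wit="#A">lazy </rdg><rdg wit="#C">lazy </rdg></app>'
    '<app><rdg wit="#A">dog.</rdg><rdg wit="#B">dog.</rdg>'
    '<rdg wit="#C">dog.</rdg></app></root>'
)


def summarize(xml_text, sigla):
    """Return one (readings, missing witnesses) pair per app element."""
    root = ET.fromstring(xml_text)
    summary = []
    for app in root.findall("app"):
        readings = {
            rdg.get("wit").lstrip("#"): (rdg.text or "").strip()
            for rdg in app.findall("rdg")
        }
        missing = [sigil for sigil in sigla if sigil not in readings]
        summary.append((readings, missing))
    return summary


for readings, missing in summarize(xml_output, ["A", "B", "C"]):
    print(readings, "missing:", missing)
```
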
Finally, you can test the XML/TEI output, which produces XML following the TEI guidelines for parallel segmentation encoding.


In [5]:
table = collate(collation, output='tei')
print(table)


<?xml version="1.0" ?><cx:apparatus xmlns="http://www.tei-c.org/ns/1.0" xmlns:cx="http://interedition.eu/collatex/ns/1.0">The <app><rdg wit="#A">quick</rdg><rdg wit="#C">bad</rdg></app> <app><rdg wit="#A #B">brown</rdg></app> fox jumped over the <app><rdg wit="#A #C">lazy</rdg></app> dog.</cx:apparatus>

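The TEI output uses XML namespaces, so the element names must be qualified when the result is parsed. The following sketch, again with the standard xml.etree.ElementTree module, extracts the readings and their witnesses from the TEI string shown above. The prefix tei in the lookup dictionary is an arbitrary local label chosen for this example.

```python
import xml.etree.ElementTree as ET

# The TEI output shown above, copied as a string
tei_output = (
    '<?xml version="1.0" ?>'
    '<cx:apparatus xmlns="http://www.tei-c.org/ns/1.0" '
    'xmlns:cx="http://interedition.eu/collatex/ns/1.0">'
    'The <app><rdg wit="#A">quick</rdg><rdg wit="#C">bad</rdg></app> '
    '<app><rdg wit="#A #B">brown</rdg></app> fox jumped over the '
    '<app><rdg wit="#A #C">lazy</rdg></app> dog.</cx:apparatus>'
)

# Map a local prefix to the TEI namespace declared in the output
NS = {"tei": "http://www.tei-c.org/ns/1.0"}

root = ET.fromstring(tei_output)

# One dictionary per point of divergence: reading -> list of witnesses
variants = [
    {rdg.text: rdg.get("wit").split() for rdg in app.findall("tei:rdg", NS)}
    for app in root.findall("tei:app", NS)
]

for variant in variants:
    print(variant)
```

Only the three points of divergence appear; the text shared by all witnesses sits outside the app elements.
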
Graph: SVG

And now for something different: try the graph, exported in SVG format.


In [6]:
graph = collate(collation, output='svg')


[Figure: variant graph rendered as SVG. From the start node, all three witnesses share "The"; A then reads "quick" and "brown", B reads "brown", and C reads "bad"; all three share "fox jumped over the"; A and C read "lazy", which B skips; all three share "dog." and reach the end node.]
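
In the graph, every witness is a path from the start node to the end node, and variation appears where the paths split. As a rough sketch of that idea, the code below transcribes the nodes and edges of the fox and dog graph by hand and rebuilds each witness by following its edges. This is a plain Python illustration, not the data structure CollateX uses internally, and the names nodes, edges and witness_text are made up for the example.

```python
# Nodes and edges transcribed by hand from the variant graph.
# Node 1 is the start node and node 2 the end node; both carry no text.
START, END = 1, 2
nodes = {
    1: "", 2: "", 3: "The", 4: "quick", 5: "brown",
    6: "fox jumped over the", 7: "lazy", 8: "dog.", 9: "bad",
}
# Each edge is (source, target, set of witnesses that follow it)
edges = [
    (1, 3, {"A", "B", "C"}), (3, 4, {"A"}), (3, 5, {"B"}), (3, 9, {"C"}),
    (4, 5, {"A"}), (5, 6, {"A", "B"}), (6, 7, {"A", "C"}), (6, 8, {"B"}),
    (7, 8, {"A", "C"}), (8, 2, {"A", "B", "C"}), (9, 6, {"C"}),
]


def witness_text(sigil):
    """Follow the edges labelled with this witness from start to end."""
    current, words = START, []
    while current != END:
        current = next(
            target for source, target, sigla in edges
            if source == current and sigil in sigla
        )
        if nodes[current]:
            words.append(nodes[current])
    return " ".join(words)


for sigil in ("A", "B", "C"):
    print(sigil, witness_text(sigil))
```

Following the edges labelled B, for instance, gives back "The brown fox jumped over the dog."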

Exercise

In this tutorial we have used the fox and dog example. Now try to produce a JSON or TEI output for Woolf's To the Lighthouse, which we used in the previous notebook. The data are stored in data/Woolf/Lighthouse-1.


In [7]:
from collatex import *
collation = Collation()
witness_USA = open( "../data/Woolf/Lighthouse-1/Lighthouse-1-USA.txt", encoding='utf-8' ).read()
witness_UK = open( "../data/Woolf/Lighthouse-1/Lighthouse-1-UK.txt", encoding='utf-8' ).read()
witness_EM = open( "../data/Woolf/Lighthouse-1/Lighthouse-1-EM.txt", encoding='utf-8' ).read()
collation.add_plain_witness( "USA", witness_USA )
collation.add_plain_witness( "UK", witness_UK )
collation.add_plain_witness( "EM", witness_EM )
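#the graph is produced here; replace 'svg' with 'json' or 'tei' (and print the result) to complete the exercise
graph = collate(collation, output='svg')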


[Figure: variant graph rendered as SVG for the witnesses USA, UK and EM. The three witnesses share most of the passage; the variants include "Queen" (USA, UK) against "queen" (EM), "to wash" (USA only), "and washing" (UK, EM), and "Isles" (USA) against "Isle" (UK, EM).]