Collation outputs

Introduction
In practice
- Table: plain text
- Table: HTML
- Table: JSON
- Table: XML and XML/TEI
- Graph: SVG
Exercise
What's next

Introduction

In this tutorial we will try out the different outputs available for our collation, that is, the different formats, graphical representations and visualizations of the result.

The visualization of the collation result is an open discussion: several possibilities have been suggested and used and new ones are always being proposed. When the output of the collation is a printed format, such as a book, it is rare to see anything different from the traditional critical apparatus. Now that output formats are more frequently digital (or at least have a digital component), collation tools tend to offer more than one visualization option. This is the case for both Juxta and CollateX. The different visualizations are not incompatible; on the contrary, they can be complementary, highlighting different aspects of the result and suitable for different users or different stages of the workflow.

In the previous tutorials we used the alignment table and the graph. The alignment table, in use since the 1960s, is the equivalent of the matrix used in bioinformatics for sequence alignment (for example, of DNA strings). In contrast, the graph is meant to represent the fluidity of the text and its variation. The idea of a graph-oriented model for expressing textual variance was originally developed by Desmond Schmidt (2008). For a presentation on the topic, see the video Apparatus vs. Graph – an Interface as Scholarly Argument by Tara Andrews and Joris van Zundert. Other outputs, such as the histogram and the side-by-side visualization offered by Juxta, allow users to visualize the result of the comparison between two witnesses only. This reflects the way the algorithm is built and shows that the graphical representation is connected with the approach to collation that informs the software.

CollateX has two main ways to conceive of the collation result: as a table (with many different formatting options) and as a graph:

table formats
- plain text table (no need to specify the output)
- HTML table (output='html')
- HTML vertical table with colors (output='html2')
- JSON (output='json')
- XML (output='xml')
- XML/TEI (output='tei')
graph format
- SVG (output='svg')

In practice

Even though we have already encountered some of these outputs, it is worth going through them one more time, focusing on the part of the code that needs to change to produce the different formats.

Table: plain text

In this tutorial we will use some simple texts already used in the previous tutorial: the fox and dog example.

Let's start with the simplest output, for which we don't need to specify any output format (note that you can name the variable containing the output anything you like, but in this tutorial we call it alignment_table, table or graph).

In the code cell below the lines starting with a hash (#) are comments and are not executed. They are there in this instance to help you remember what the different parts of the code do. You do not need to use them in your notebook (although sometimes it is helpful to add comments to your code so you remember what things do).



In [1]:

    
#import the collatex library
from collatex import *
#create an instance of the CollateX engine
collation = Collation()
#add witnesses to the CollateX instance
collation.add_plain_witness( "A", "The quick brown fox jumped over the lazy dog.")
collation.add_plain_witness( "B", "The brown fox jumped over the dog." )
collation.add_plain_witness( "C", "The bad fox jumped over the lazy dog." )
#collate the witnesses and store the result in a variable called 'table'
#as we have not specified an output this will be stored in plain text
table = collate(collation)
#print the collation result
print(table)

+---+-----+-------+-------+---------------------+------+------+
| A | The | quick | brown | fox jumped over the | lazy | dog. |
| B | The | -     | brown | fox jumped over the | -    | dog. |
| C | The | bad   | -     | fox jumped over the | lazy | dog. |
+---+-----+-------+-------+---------------------+------+------+
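The table above can also be read as a matrix, with one row per witness and one column per aligned segment. As a minimal sketch of that idea, the plain Python below finds the columns where the witnesses disagree. The rows are copied by hand from the output above, and the helper name variant_columns is our own, not part of CollateX.

```python
# Rows copied by hand from the plain text table above; "-" marks a gap.
# This is an illustration only and does not call CollateX.
rows = {
    "A": ["The", "quick", "brown", "fox jumped over the", "lazy", "dog."],
    "B": ["The", "-", "brown", "fox jumped over the", "-", "dog."],
    "C": ["The", "bad", "-", "fox jumped over the", "lazy", "dog."],
}


def variant_columns(rows):
    """Return the indexes of the columns in which the witnesses do not all agree."""
    # zip(*...) turns the per-witness rows into per-column tuples
    columns = zip(*rows.values())
    return [i for i, cells in enumerate(columns) if len(set(cells)) > 1]


print(variant_columns(rows))
```

Columns are counted from zero, so the first column ("The" in all three witnesses) has index 0.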

Table: HTML

Now let's try a different output. This time we still want a table format, but instead of plain text we would like it exported in HTML (the markup language used for web pages), and we would like it to be displayed vertically with nice colors to highlight the comparison. To achieve this, all you need to do is add the keyword output to the collate command and give it the value html2.



In [2]:

    
table = collate(collation, output='html2')

 
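+---------------------+---------------------+---------------------+
| A                   | B                   | C                   |
+---------------------+---------------------+---------------------+
| The                 | The                 | The                 |
| quick               | -                   | bad                 |
| brown               | brown               | -                   |
| fox jumped over the | fox jumped over the | fox jumped over the |
| lazy                | -                   | lazy                |
| dog.                | dog.                | dog.                |
+---------------------+---------------------+---------------------+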

Before moving to the other outputs, try html instead of html2.

Table: JSON

The same alignment table can be exported in a variety of formats, as we have seen, including JSON (JavaScript Object Notation), a format widely used today for storing and exchanging data. To produce JSON as output, we need to specify json as the output format.



In [3]:

    
table = collate(collation, output='json')
print(table)

{"table": [[[{"_sigil": "A", "_token_array_position": 0, "n": "The", "t": "The "}], [{"_sigil": "A", "_token_array_position": 1, "n": "quick", "t": "quick "}], [{"_sigil": "A", "_token_array_position": 2, "n": "brown", "t": "brown "}], [{"_sigil": "A", "_token_array_position": 3, "n": "fox", "t": "fox "}, {"_sigil": "A", "_token_array_position": 4, "n": "jumped", "t": "jumped "}, {"_sigil": "A", "_token_array_position": 5, "n": "over", "t": "over "}, {"_sigil": "A", "_token_array_position": 6, "n": "the", "t": "the "}], [{"_sigil": "A", "_token_array_position": 7, "n": "lazy", "t": "lazy "}], [{"_sigil": "A", "_token_array_position": 8, "n": "dog", "t": "dog"}, {"_sigil": "A", "_token_array_position": 9, "n": ".", "t": "."}]], [[{"_sigil": "B", "_token_array_position": 11, "n": "The", "t": "The "}], null, [{"_sigil": "B", "_token_array_position": 12, "n": "brown", "t": "brown "}], [{"_sigil": "B", "_token_array_position": 13, "n": "fox", "t": "fox "}, {"_sigil": "B", "_token_array_position": 14, "n": "jumped", "t": "jumped "}, {"_sigil": "B", "_token_array_position": 15, "n": "over", "t": "over "}, {"_sigil": "B", "_token_array_position": 16, "n": "the", "t": "the "}], null, [{"_sigil": "B", "_token_array_position": 17, "n": "dog", "t": "dog"}, {"_sigil": "B", "_token_array_position": 18, "n": ".", "t": "."}]], [[{"_sigil": "C", "_token_array_position": 20, "n": "The", "t": "The "}], [{"_sigil": "C", "_token_array_position": 21, "n": "bad", "t": "bad "}], null, [{"_sigil": "C", "_token_array_position": 22, "n": "fox", "t": "fox "}, {"_sigil": "C", "_token_array_position": 23, "n": "jumped", "t": "jumped "}, {"_sigil": "C", "_token_array_position": 24, "n": "over", "t": "over "}, {"_sigil": "C", "_token_array_position": 25, "n": "the", "t": "the "}], [{"_sigil": "C", "_token_array_position": 26, "n": "lazy", "t": "lazy "}], [{"_sigil": "C", "_token_array_position": 27, "n": "dog", "t": "dog"}, {"_sigil": "C", "_token_array_position": 28, "n": ".", "t": "."}]]], 
"witnesses": ["A", "B", "C"]}
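The JSON output is a string, so it can be loaded with Python's standard json module and processed further. The sketch below is only an illustration: the sample is an abbreviated copy of the output above (first three columns only), the helper name rows_by_witness is our own, and we assume that the "t" field holds the token text as it occurs in the witness, which is what the output above suggests.

```python
import json

# Abbreviated copy of the JSON output above (first three columns only),
# so that the sketch runs without CollateX installed.
sample = """
{"table": [
  [[{"_sigil": "A", "_token_array_position": 0, "n": "The", "t": "The "}],
   [{"_sigil": "A", "_token_array_position": 1, "n": "quick", "t": "quick "}],
   [{"_sigil": "A", "_token_array_position": 2, "n": "brown", "t": "brown "}]],
  [[{"_sigil": "B", "_token_array_position": 11, "n": "The", "t": "The "}],
   null,
   [{"_sigil": "B", "_token_array_position": 12, "n": "brown", "t": "brown "}]],
  [[{"_sigil": "C", "_token_array_position": 20, "n": "The", "t": "The "}],
   [{"_sigil": "C", "_token_array_position": 21, "n": "bad", "t": "bad "}],
   null]
 ],
 "witnesses": ["A", "B", "C"]}
"""


def rows_by_witness(json_text):
    """Map each witness sigil to its row of cells; a null cell (a gap) becomes '-'."""
    data = json.loads(json_text)
    rows = {}
    # "witnesses" and "table" are parallel lists: one row per witness
    for sigil, row in zip(data["witnesses"], data["table"]):
        rows[sigil] = [
            "-" if cell is None else "".join(token["t"] for token in cell).strip()
            for cell in row
        ]
    return rows


for sigil, cells in rows_by_witness(sample).items():
    print(sigil, cells)
```

In your own notebook you would pass the string returned by collate(collation, output='json') instead of the sample.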

Table: XML and XML/TEI

We can use the same procedure to export the table in XML or XML/TEI (the latter produces a condensed version of the table that lists witnesses only at points of divergence, also called a negative apparatus). To do this, you just specify a different output format. Let's start with the XML output, which you can later post-process using XSLT or other tools.



In [4]:

    
table = collate(collation, output='xml')
print(table)

<root><app><rdg wit="#A">The </rdg><rdg wit="#B">The </rdg><rdg wit="#C">The </rdg></app><app><rdg wit="#A">quick </rdg><rdg wit="#C">bad </rdg></app><app><rdg wit="#A">brown </rdg><rdg wit="#B">brown </rdg></app><app><rdg wit="#A">fox jumped over the </rdg><rdg wit="#B">fox jumped over the </rdg><rdg wit="#C">fox jumped over the </rdg></app><app><rdg wit="#A">lazy </rdg><rdg wit="#C">lazy </rdg></app><app><rdg wit="#A">dog.</rdg><rdg wit="#B">dog.</rdg><rdg wit="#C">dog.</rdg></app></root>
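XSLT is one option for post-processing; Python's standard xml.etree.ElementTree module is another. As a rough sketch, the code below reads the XML output shown above (copied verbatim, so it runs without CollateX) and lists which witness has which reading in each app element. The helper name readings is our own.

```python
import xml.etree.ElementTree as ET

# The XML output shown above, copied verbatim and split over several lines.
xml_output = (
    '<root>'
    '<app><rdg wit="#A">The </rdg><rdg wit="#B">The </rdg><rdg wit="#C">The </rdg></app>'
    '<app><rdg wit="#A">quick </rdg><rdg wit="#C">bad </rdg></app>'
    '<app><rdg wit="#A">brown </rdg><rdg wit="#B">brown </rdg></app>'
    '<app><rdg wit="#A">fox jumped over the </rdg><rdg wit="#B">fox jumped over the </rdg>'
    '<rdg wit="#C">fox jumped over the </rdg></app>'
    '<app><rdg wit="#A">lazy </rdg><rdg wit="#C">lazy </rdg></app>'
    '<app><rdg wit="#A">dog.</rdg><rdg wit="#B">dog.</rdg><rdg wit="#C">dog.</rdg></app>'
    '</root>'
)


def readings(xml_text):
    """Return one dict per <app>, mapping each witness sigil to its reading."""
    root = ET.fromstring(xml_text)
    return [
        # the wit attribute looks like "#A"; drop the leading "#"
        {rdg.get("wit").lstrip("#"): rdg.text for rdg in app.findall("rdg")}
        for app in root.findall("app")
    ]


apps = readings(xml_output)
for number, app in enumerate(apps, start=1):
    print(number, app)
```

A witness that is missing from a dictionary has a gap at that point, which corresponds to a "-" cell in the plain text table.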

Finally, you can test the XML/TEI output, which produces XML following the TEI parallel segmentation encoding guidelines.



In [5]:

    
table = collate(collation, output='tei')
print(table)

<?xml version="1.0" ?><cx:apparatus xmlns="http://www.tei-c.org/ns/1.0" xmlns:cx="http://interedition.eu/collatex/ns/1.0">The <app><rdg wit="#A">quick</rdg><rdg wit="#C">bad</rdg></app> <app><rdg wit="#A #B">brown</rdg></app> fox jumped over the <app><rdg wit="#A #C">lazy</rdg></app> dog.</cx:apparatus>
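Because the TEI output keeps the shared text outside the app elements, the reading text of a single witness can be rebuilt from it. The sketch below does this with Python's standard xml.etree.ElementTree module, using the TEI output above copied verbatim. It is an illustration under our own assumptions: the helper name witness_text is hypothetical, and whitespace is normalized at the end because gaps leave doubled spaces.

```python
import xml.etree.ElementTree as ET

# The TEI output shown above, copied verbatim and split over several lines.
tei_output = (
    '<?xml version="1.0" ?><cx:apparatus xmlns="http://www.tei-c.org/ns/1.0" '
    'xmlns:cx="http://interedition.eu/collatex/ns/1.0">The '
    '<app><rdg wit="#A">quick</rdg><rdg wit="#C">bad</rdg></app> '
    '<app><rdg wit="#A #B">brown</rdg></app> fox jumped over the '
    '<app><rdg wit="#A #C">lazy</rdg></app> dog.</cx:apparatus>'
)

# app and rdg are in the default (TEI) namespace declared on the root element
TEI = "{http://www.tei-c.org/ns/1.0}"


def witness_text(tei_text, sigil):
    """Rebuild the text of one witness from a parallel segmentation apparatus."""
    root = ET.fromstring(tei_text)
    parts = [root.text or ""]
    for app in root.findall(TEI + "app"):
        for rdg in app.findall(TEI + "rdg"):
            # wit holds a space-separated list such as "#A #B"
            if "#" + sigil in rdg.get("wit").split():
                parts.append(rdg.text or "")
        # text shared by all witnesses follows the app element as its tail
        parts.append(app.tail or "")
    # collapse the doubled spaces left where this witness has a gap
    return " ".join("".join(parts).split())


for sigil in ("A", "B", "C"):
    print(sigil, witness_text(tei_output, sigil))
```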

Graph: SVG

And now for something different: try the graph, exported in SVG format.



In [6]:

    
graph = collate(collation, output='svg')

Exercise

In this tutorial we have used the fox and dog example. Now try to produce a JSON or TEI output for Woolf's To the Lighthouse, which we used in the previous notebook. The data are stored in data/Woolf/Lighthouse-1.



In [7]:

    
from collatex import *
collation = Collation()
witness_USA = open( "../data/Woolf/Lighthouse-1/Lighthouse-1-USA.txt", encoding='utf-8' ).read()
witness_UK = open( "../data/Woolf/Lighthouse-1/Lighthouse-1-UK.txt", encoding='utf-8' ).read()
witness_EM = open( "../data/Woolf/Lighthouse-1/Lighthouse-1-EM.txt", encoding='utf-8' ).read()
collation.add_plain_witness( "USA", witness_USA )
collation.add_plain_witness( "UK", witness_UK )
collation.add_plain_witness( "EM", witness_EM )
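#the exercise asks for JSON or TEI, so we request 'tei' here (use 'json' for JSON)
alignment_table = collate(collation, output='tei')
print(alignment_table)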
