In this tutorial we will be trying different outputs for our collation, meaning different graphical representations, formats and visualizations of the result.
The visualization of the collation result is an open discussion: several possibilities have been suggested and used and new ones are always being proposed. When the output of the collation is a printed format, such as a book, it is rare to see anything other than the traditional critical apparatus. Now that output formats are more frequently digital (or at least have a digital component), collation tools tend to offer more than one visualization option. This is the case for both Juxta and CollateX. The different visualizations are not incompatible; on the contrary, they can be complementary, highlighting different aspects of the result and suitable for different users or different stages of the workflow.
In the previous tutorials we used the alignment table and the graph. The alignment table, in use since the 1960s, visualizes the information as a matrix, or table. In comparison, the graph is able to represent the fluidity of the text and its variation. The idea of a graph-oriented model for expressing textual variance was originally developed by Desmond Schmidt (2008). You can refer to this video for a presentation on Apparatus vs. Graph – an Interface as Scholarly Argument by Tara Andrews and Joris van Zundert.
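To make the contrast between the two models more concrete, here is a minimal sketch in plain Python. It does not use CollateX and is not how CollateX builds its graph internally. The small table and the function name table_to_edges are made up for illustration. The table stores one row per witness, with None marking a gap. The graph stores the same information as edges between tokens, each labelled with the witnesses that follow that path.

```python
from collections import defaultdict

# Hand-written alignment table (an assumption for illustration, not CollateX output):
# one row per witness, None marks a gap in that column.
table = {
    "A": ["The", "quick", "brown", "fox"],
    "B": ["The", None, "brown", "fox"],
    "C": ["The", None, "bad", "fox"],
}


def table_to_edges(table):
    """Map each (from_node, to_node) edge to the sorted sigla of the witnesses that follow it."""
    edges = defaultdict(list)
    for siglum, row in table.items():
        previous = "START"
        for column, token in enumerate(row):
            if token is None:
                continue  # a gap adds no node: the witness skips this column
            node = (column, token)
            edges[(previous, node)].append(siglum)
            previous = node
        edges[(previous, "END")].append(siglum)
    return {edge: sorted(sigla) for edge, sigla in edges.items()}


edges = table_to_edges(table)
for (source, target), sigla in edges.items():
    print(source, "->", target, ",".join(sigla))
```

In the table, the omission of "quick" in B and C appears as two empty cells. In the edge list it appears as a path that goes directly from "The" to the next token.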
Other outputs, such as the histogram and the side-by-side visualizations offered by Juxta, allow users to visualize the result of the comparison between two witnesses only. This reflects the way the algorithm is built, which is to say that the graphical representation is connected to the approach to collation that informs the software.
CollateX has two main ways to conceive of the collation result: as a table (with many different formatting options) and as a graph.
Even though we have already encountered some of these outputs, it is worth going through them one more time to focus on the part of the code that controls the different formats.
In this tutorial we will use some simple texts already used in the previous tutorial: the fox and dog example.
Let's start with the simplest output, for which we don't need to specify any output format. Note that you can name the variable containing the output anything you like, but in this tutorial we call it alignment_table, table or graph.
In the code cell below, the lines starting with a hash (#) are comments and are not executed. They are there in this instance to help you remember what the different parts of the code do. You do not need to use them in your notebook (although sometimes it is helpful to add comments to your code so you remember what things do).
In [ ]:
# import the collatex library
from collatex import *
# create an instance of the Collation object
collation = Collation()
# add witnesses to the Collation instance
collation.add_plain_witness( "A", "The quick brown fox jumped over the lazy dog.")
collation.add_plain_witness( "B", "The brown fox jumped over the dog." )
collation.add_plain_witness( "C", "The bad fox jumped over the lazy dog." )
# collate the witnesses and store the result in a variable called 'table'
# as we have not specified an output, it will default to a plain text table
table = collate(collation)
# print the collation result
print(table)
Now let's try a different output. This time we still want a table format, but instead of rendering it as plain text we would like it exported in HTML and displayed vertically, using color to highlight the points where the witnesses differ. To achieve this, all you need to do is add the parameter output to the collate command and give it the value 'html2'.
In [ ]:
table = collate(collation, output='html2')
Before moving to the other outputs, try to produce the simple HTML output by changing the code above. The value of the output parameter should be 'html'. The colored HTML output is always oriented vertically. The regular HTML option defaults to horizontal output, which you can override by specifying a layout parameter with a value of 'vertical'. Try that, too.
The same alignment table can be exported in a variety of formats, as we have seen, including JSON (JavaScript Object Notation), a format widely used for storing and interchanging data. In order to produce JSON as output, we need to specify 'json' as the output format.
In [ ]:
table = collate(collation, output='json')
print(table)
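Because the JSON output is a string, it can be read back into Python with the standard json module and processed further. The sketch below is only an illustration. The sample string is hand-written and assumes a layout with a "witnesses" list and a "table" with one row per witness, where each cell is a list of tokens with "t" (text) and "n" (normalized form) keys, or null for a gap. Check the output of the cell above before relying on that shape. The helper names cell_text and variant_columns are our own.

```python
import json

# Hand-written sample in the ASSUMED shape of the JSON output (not real CollateX output).
sample = """
{
  "table": [
    [[{"n": "The", "t": "The "}], [{"n": "quick", "t": "quick "}], [{"n": "brown", "t": "brown "}]],
    [[{"n": "The", "t": "The "}], null, [{"n": "brown", "t": "brown "}]],
    [[{"n": "The", "t": "The "}], null, [{"n": "bad", "t": "bad "}]]
  ],
  "witnesses": ["A", "B", "C"]
}
"""


def cell_text(cell):
    """Join the 't' values of the tokens in a cell; an empty cell becomes '-'."""
    if not cell:
        return "-"
    return "".join(token["t"] for token in cell).strip()


def variant_columns(data):
    """Return (column index, {siglum: reading}) for each column where the witnesses disagree."""
    rows = [[cell_text(cell) for cell in row] for row in data["table"]]
    result = []
    for index, column in enumerate(zip(*rows)):
        if len(set(column)) > 1:
            result.append((index, dict(zip(data["witnesses"], column))))
    return result


data = json.loads(sample)
for index, readings in variant_columns(data):
    print(index, readings)
```

To try this on a real result, replace sample with the table variable produced by the cell above, after checking that its structure matches the assumption.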
We can use the same procedure to export the table in XML or XML/TEI. The latter produces a condensed version of the table that lists witnesses only at points of divergence, also called a negative apparatus. To do this you just specify a different output format. Let's start with the XML output, which you can later post-process using XSLT or other tools.
In [ ]:
table = collate(collation, output='xml')
print(table)
And, finally, you can test the XML/TEI output that produces XML following the TEI parallel segmentation encoding guidelines.
In [ ]:
table = collate(collation, output='tei')
print(table)
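Once you have TEI parallel segmentation, the standard library is enough to pull the readings out again. The sketch below is an illustration under stated assumptions. The sample fragment is hand-written, wrapped in a p element of our choosing, and uses the TEI app and rdg elements with a wit attribute. The wrapper element and namespaces in the actual CollateX output may differ, so inspect the printed result above first. The function name list_apparatus is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Hand-written fragment in parallel segmentation style (an assumption, not real CollateX output).
sample = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">The '
    '<app><rdg wit="#A">quick brown</rdg><rdg wit="#B">brown</rdg><rdg wit="#C">bad</rdg></app>'
    ' fox jumped over the '
    '<app><rdg wit="#A #C">lazy</rdg></app>'
    ' dog.</p>'
)


def list_apparatus(xml_text):
    """Return one {siglum: reading} dictionary per app element, in document order."""
    root = ET.fromstring(xml_text)
    entries = []
    for app in root.iter(TEI_NS + "app"):
        readings = {}
        for rdg in app.findall(TEI_NS + "rdg"):
            text = "".join(rdg.itertext()).strip()
            # the wit attribute holds space-separated pointers such as "#A #C"
            for pointer in rdg.get("wit", "").split():
                readings[pointer.lstrip("#")] = text
        entries.append(readings)
    return entries


for entry in list_apparatus(sample):
    print(entry)
```

Note that witness B does not appear in the second entry: in a negative apparatus a witness that lacks the reading is simply not listed.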
In [ ]:
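# collate again, this time asking for the variant graph rendered as SVG
# (this output relies on Graphviz being installed)
graph = collate(collation, output='svg')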
In this tutorial we have used the fox and dog example. Now try to produce a JSON or TEI output of the first paragraph of Darwin's On the Origin of Species, which we already used in the first tutorial. You can find the data in fixtures/Darwin/txt (only the first paragraph: xxxx_par1).
Alternatively, or if you still have time, you can use the data in fixtures/Woolf/Lighthouse-1 and produce new outputs.
In the next tutorial, Collate outside the notebook, we will leave the notebook and learn how to create and run Python scripts using PyCharm and the terminal, and how to save the collation results in a new file.