Collation outside the notebook

Outline

  • Create a Python file
  • Run a script
    • In the notebook
    • In the terminal
  • Input files
  • Output file
  • Exercise

Running from the command line

As Tara mentioned yesterday, to run a Python file from the command line type:

python myfile.py

replacing “myfile.py” with the name of your Python program.

Where to put your Python file

You can put your Python file anywhere you want, except that it has to be able to find any files it opens. You might want to create a separate directory for this exercise on your Desktop, in your fork of the Institute repo, or in another workspace.
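
One consequence is worth spelling out: a relative path inside a script is resolved against the directory you run the command from, not against the script's own location. The sketch below shows one possible way around that, by building paths from the folder that contains the script. The helper name script_relative is hypothetical (it is not part of CollateX or the standard library).

```python
from pathlib import Path


def script_relative(script_file, relative_path):
    """Return relative_path resolved against the folder that contains script_file.

    script_relative is a hypothetical helper name used for illustration only.
    """
    # Absolute path of the directory holding the script
    script_dir = Path(script_file).resolve().parent
    # Join the relative path to it and normalise away any ".." segments
    return (script_dir / relative_path).resolve()
```

Inside a script run from the terminal you could call script_relative(__file__, "../fixtures/Darwin/txt/darwin1859_par1.txt"), because Python sets __file__ to the path of the running script. The notebook does not define __file__, so this approach only helps outside the notebook.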

Create a Python file

You can create a Python file in a plain text editor, such as BBEdit or Notepad++. Don’t use a word processor! Let’s start with the following:


from collatex import *

# Build a collation from three short witnesses typed directly into the script
collation = Collation()
collation.add_plain_witness( "A", "The quick brown fox jumped over the lazy dog." )
collation.add_plain_witness( "B", "The brown fox jumped over the dog." )
collation.add_plain_witness( "C", "The bad fox jumped over the lazy dog." )

# Align the witnesses and print the resulting alignment table
table = collate(collation)
print(table)

Now save the file as 'collate.py', inside your work directory (see above).

Run the script

Open the terminal and navigate to the folder where your script is. Then type

python collate.py

If you are not in the directory where your script is, you need to specify the path to the file. For example, if you are in your home directory and your script is in a python_scripts subdirectory, the command would look like this:

python python_scripts/collate.py

The output goes by default to stdout, which, also by default, is displayed on the screen.

Input files

In the example above we’ve supplied the full text of our witnesses inside the script, which is impractical in Real Life. We can instead collate input files that we read from the file system. In this exercise we use the data in the schedule/week_2/fixtures/Darwin/txt subdirectory of the Institute repo and produce TEI XML output. The relative paths in the code assume that the script is run from a directory that sits next to fixtures; adjust them if your script lives elsewhere. The code will look like this:


from collatex import *
collation = Collation()
with open( "../fixtures/Darwin/txt/darwin1859_par1.txt", encoding='utf-8' ) as witness_1859, \
    open( "../fixtures/Darwin/txt/darwin1860_par1.txt", encoding='utf-8' ) as witness_1860, \
    open( "../fixtures/Darwin/txt/darwin1861_par1.txt", encoding='utf-8' ) as witness_1861, \
    open( "../fixtures/Darwin/txt/darwin1866_par1.txt", encoding='utf-8' ) as witness_1866, \
    open( "../fixtures/Darwin/txt/darwin1869_par1.txt", encoding='utf-8' ) as witness_1869, \
    open( "../fixtures/Darwin/txt/darwin1872_par1.txt", encoding='utf-8' ) as witness_1872:
        collation.add_plain_witness( "1859", witness_1859.read())
        collation.add_plain_witness( "1860", witness_1860.read())
        collation.add_plain_witness( "1861", witness_1861.read())
        collation.add_plain_witness( "1866", witness_1866.read())
        collation.add_plain_witness( "1869", witness_1869.read())
        collation.add_plain_witness( "1872", witness_1872.read())
TEI = collate(collation, output='tei')
print(TEI)
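
Listing every file by hand gets tedious as the number of witnesses grows. The following is a sketch, under stated assumptions, of reading all matching files in a loop instead. The function name read_witnesses and the rule of using the file name without its extension as the siglum are both made up for illustration; with the Darwin files the sigla would be names like darwin1859_par1 rather than 1859.

```python
from pathlib import Path


def read_witnesses(directory, pattern="*.txt"):
    """Return a list of (siglum, text) pairs, one per matching file.

    Hypothetical helper: the siglum is simply the file name without
    its extension, and files are processed in alphabetical order.
    """
    witnesses = []
    for path in sorted(Path(directory).glob(pattern)):
        siglum = path.stem
        witnesses.append((siglum, path.read_text(encoding="utf-8")))
    return witnesses


# Possible use with the collation object from the script above
# (left as a comment because it needs CollateX and the Darwin files):
#
# for siglum, text in read_witnesses("../fixtures/Darwin/txt", "darwin*_par1.txt"):
#     collation.add_plain_witness(siglum, text)
```

Sorting the paths keeps the witness order stable from run to run, which makes the output easier to compare.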

Output file

In Real Life we could also specify an output file for our results, instead of letting them go to stdout. Here’s how that would look:


from collatex import *
collation = Collation()
with open( "../fixtures/Darwin/txt/darwin1859_par1.txt", encoding='utf-8' ) as witness_1859, \
    open( "../fixtures/Darwin/txt/darwin1860_par1.txt", encoding='utf-8' ) as witness_1860, \
    open( "../fixtures/Darwin/txt/darwin1861_par1.txt", encoding='utf-8' ) as witness_1861, \
    open( "../fixtures/Darwin/txt/darwin1866_par1.txt", encoding='utf-8' ) as witness_1866, \
    open( "../fixtures/Darwin/txt/darwin1869_par1.txt", encoding='utf-8' ) as witness_1869, \
    open( "../fixtures/Darwin/txt/darwin1872_par1.txt", encoding='utf-8' ) as witness_1872:
        collation.add_plain_witness( "1859", witness_1859.read())
        collation.add_plain_witness( "1860", witness_1860.read() )
        collation.add_plain_witness( "1861", witness_1861.read() )
        collation.add_plain_witness( "1866", witness_1866.read() )
        collation.add_plain_witness( "1869", witness_1869.read() )
        collation.add_plain_witness( "1872", witness_1872.read() )
# Run the collation first, then open the output file only to write the result
TEI = collate(collation, output='tei')
with open('output-tei.xml', 'w', encoding='utf-8') as outfile:
    outfile.write(TEI)

When we run the script, the result is no longer displayed. Instead, a new file, 'output-tei.xml', has been created in the directory in which we’ve run the script. You can examine it with cat or less. Note that it isn’t a complete TEI document, or even a complete well-formed XML document, so you can’t open it in <oXygen/> without raising an error. In Real Life you could copy and paste it into a TEI wrapper, or you could modify your Python script to create that wrapper for you around the collated output.
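
As a rough illustration of the second option, here is a minimal sketch of a wrapper function. The name wrap_in_tei, the placeholder header text, and the choice to put the fragment inside a single paragraph are all assumptions made for this example. It aims only at a well-formed document with the required header parts, and it assumes the fragment declares any namespace prefixes it uses.

```python
import re
from xml.sax.saxutils import escape


def wrap_in_tei(fragment, title="Collation output"):
    """Wrap a collation fragment (a string) in a minimal TEI document.

    Hypothetical helper for illustration; the header text is placeholder.
    """
    # An XML declaration may only appear at the very start of a document,
    # so drop one from the fragment if it happens to be there
    body = re.sub(r"^\s*<\?xml\s[^>]*\?>\s*", "", fragment)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<TEI xmlns="http://www.tei-c.org/ns/1.0">\n'
        "  <teiHeader>\n"
        "    <fileDesc>\n"
        f"      <titleStmt><title>{escape(title)}</title></titleStmt>\n"
        "      <publicationStmt><p>Unpublished</p></publicationStmt>\n"
        "      <sourceDesc><p>Generated by a collation script</p></sourceDesc>\n"
        "    </fileDesc>\n"
        "  </teiHeader>\n"
        "  <text>\n"
        "    <body>\n"
        f"      <p>{body}</p>\n"
        "    </body>\n"
        "  </text>\n"
        "</TEI>\n"
    )
```

In the script above you would then write wrap_in_tei(TEI) to the output file instead of TEI itself. Check the result in your XML editor, since the wrapper does not guarantee that the document is valid against the TEI schema.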

You can also create an output file when you run your script in the Jupyter notebook. Depending on the path you specify, it will be created in your 'Notebook' directory or elsewhere.

Exercise

Create a new Python script that produces JSON output, using the data in 'fixtures/Woolf/Lighthouse-1'.