Here is another way to run the scripts you produced in the previous tutorials (note: although technically they mean different things, we will use the words code, script and program interchangeably). This tutorial assumes that you have already gone through the tutorials on Collate plain texts (1 and 2) and on the different Collation outputs. Everything that we do here is also possible in a Jupyter notebook, and certain sections, such as Input files, recap material already seen in the previous tutorials.
In the Command line tutorial, we have briefly seen how to run a Python program. In the terminal, type
python myfile.py
replacing “myfile.py” with the name of your Python program.
In this tutorial, we will create Python programs. Where should you save the files that you create? Remember that we created a directory for this workshop, called 'Workshop'. Now let's create a sub-directory, called 'Scripts', to store all our Python programs.
If you are using PyCharm for these exercises, it is worth setting up a project that will automatically save the files you create to the 'Scripts' directory you just created (see above). To do this, open PyCharm and select New Project from the File menu. In the dialogue box that appears, navigate to the 'Scripts' directory by clicking the button marked '...' to the right of the location box. Then click Create. This creates a new project that saves all of its files to the folder you selected.
Let's do this step by step. First of all, create a Python file called 'collate.py' in the 'Scripts' directory, containing the following code.
In [1]:
from collatex import *
collation = Collation()
collation.add_plain_witness( "A", "The quick brown fox jumped over the lazy dog.")
collation.add_plain_witness( "B", "The brown fox jumped over the dog." )
collation.add_plain_witness( "C", "The bad fox jumped over the lazy dog.")
table = collate(collation)
print(table)
Open the terminal and navigate to the folder where your script is, using the 'cd' command (again, refer to the Command line tutorial, if you don't know what this means). Then type
python collate.py
If you are not in the directory where your script is, you need to specify the path to that file. If you are in the Home directory, for example, the command would look like
python Workshop/Scripts/collate.py
The result will appear below in the terminal.
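If you are unsure which directory Python is working from, a short check can help. The snippet below is only a sketch: it assumes the 'Workshop/Scripts/collate.py' layout described above, and it simply reports where a relative path would point when resolved from the current directory.

```python
from pathlib import Path

# The directory from which the command was launched.
print("Current directory:", Path.cwd())

# A relative path is resolved against the current directory,
# not against the place where the script is stored.
script = Path("Workshop/Scripts/collate.py")
print("Python would look for:", script.resolve())
print("Does it exist?", script.exists())
```

If the last line prints False, either move to the right directory with 'cd' or adjust the path.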
In the first tutorial, we saw how to use texts stored in files as witnesses for the collation. We used the open command to open each text file and assign its contents to a variable with an appropriately chosen name, and we didn't forget the encoding="utf-8" bit!
Let's try to do the same in our script 'collate.py', using the data in fixtures/Darwin/txt (only the first paragraph: _par1) and producing an output in XML/TEI. The code will look like this:
In [2]:
from collatex import *
collation = Collation()
witness_1859 = open( "../fixtures/Darwin/txt/darwin1859_par1.txt", encoding='utf-8' ).read()
witness_1860 = open( "../fixtures/Darwin/txt/darwin1860_par1.txt", encoding='utf-8' ).read()
witness_1861 = open( "../fixtures/Darwin/txt/darwin1861_par1.txt", encoding='utf-8' ).read()
witness_1866 = open( "../fixtures/Darwin/txt/darwin1866_par1.txt", encoding='utf-8' ).read()
witness_1869 = open( "../fixtures/Darwin/txt/darwin1869_par1.txt", encoding='utf-8' ).read()
witness_1872 = open( "../fixtures/Darwin/txt/darwin1872_par1.txt", encoding='utf-8' ).read()
collation.add_plain_witness( "1859", witness_1859 )
collation.add_plain_witness( "1860", witness_1860 )
collation.add_plain_witness( "1861", witness_1861 )
collation.add_plain_witness( "1866", witness_1866 )
collation.add_plain_witness( "1869", witness_1869 )
collation.add_plain_witness( "1872", witness_1872 )
table = collate(collation, output='tei')
print(table)
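The six nearly identical open(...) lines can also be written as a loop. What follows is a minimal sketch and not part of the original tutorial: the helper name read_witnesses is hypothetical, and it assumes that the files follow the 'darwinYEAR_par1.txt' naming pattern used above. The CollateX call appears only as a comment, so the snippet runs even without the library.

```python
from pathlib import Path


def read_witnesses(directory, years, pattern="darwin{year}_par1.txt"):
    """Return a dict mapping each year (used as siglum) to the text of its file."""
    witnesses = {}
    for year in years:
        path = Path(directory) / pattern.format(year=year)
        # Always state the encoding explicitly, as in the script above.
        witnesses[year] = path.read_text(encoding="utf-8")
    return witnesses


if __name__ == "__main__":
    years = ["1859", "1860", "1861", "1866", "1869", "1872"]
    directory = Path("../fixtures/Darwin/txt")
    if directory.is_dir():
        for year, text in read_witnesses(directory, years).items():
            # With CollateX, each text could be added to the collation here:
            #     collation.add_plain_witness(year, text)
            print(year, len(text), "characters")
    else:
        print("Directory not found:", directory)
```

Adding or removing a witness then only means changing the list of years.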
Reading the result in the terminal is not very practical, especially if we want to save it. It is better to store the result in a new file, which we call 'outfile' (you can give it another name if you prefer). To create and open 'outfile', we need to add this chunk of code:
In [3]:
outfile = open('outfile.txt', 'w', encoding='utf-8')
If we are going to produce an output in XML/TEI, we can specify that 'outfile' will be an XML file, and the same goes for any other format. Here are two examples, the first for an XML output file and the second for a JSON output file:
In [4]:
# For an XML/TEI output (use one of the two lines, not both):
outfile = open('outfile.xml', 'w', encoding='utf-8')
# For a JSON output:
outfile = open('outfile.json', 'w', encoding='utf-8')
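To keep the file extension and the output format in step, the choice can be made in one place. The sketch below is an illustration only: the function output_filename and the EXTENSIONS table are hypothetical names, and the table lists just a few formats, assuming the 'tei', 'xml', 'json' and 'table' values used with collate(..., output=...).

```python
# Hypothetical mapping from an output format to a matching file extension.
EXTENSIONS = {
    "tei": ".xml",
    "xml": ".xml",
    "json": ".json",
    "table": ".txt",
}


def output_filename(basename, output_format):
    """Build a file name whose extension matches the chosen output format."""
    # Fall back to .txt for any format that is not listed above.
    return basename + EXTENSIONS.get(output_format, ".txt")


print(output_filename("outfile", "tei"))   # outfile.xml
print(output_filename("outfile", "json"))  # outfile.json
```

The same format string could then be passed both to this helper and to collate, so the two never disagree.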
Now we add the outfile chunk to our code above. The new script is:
In [5]:
from collatex import *
collation = Collation()
witness_1859 = open( "../fixtures/Darwin/txt/darwin1859_par1.txt", encoding='utf-8' ).read()
witness_1860 = open( "../fixtures/Darwin/txt/darwin1860_par1.txt", encoding='utf-8' ).read()
witness_1861 = open( "../fixtures/Darwin/txt/darwin1861_par1.txt", encoding='utf-8' ).read()
witness_1866 = open( "../fixtures/Darwin/txt/darwin1866_par1.txt", encoding='utf-8' ).read()
witness_1869 = open( "../fixtures/Darwin/txt/darwin1869_par1.txt", encoding='utf-8' ).read()
witness_1872 = open( "../fixtures/Darwin/txt/darwin1872_par1.txt", encoding='utf-8' ).read()
outfile = open('outfile-tei.xml', 'w', encoding='utf-8')
collation.add_plain_witness( "1859", witness_1859 )
collation.add_plain_witness( "1860", witness_1860 )
collation.add_plain_witness( "1861", witness_1861 )
collation.add_plain_witness( "1866", witness_1866 )
collation.add_plain_witness( "1869", witness_1869 )
collation.add_plain_witness( "1872", witness_1872 )
table = collate(collation, output='tei')
print(table, file=outfile)
# Close the file so that everything is written to disk.
outfile.close()
When we run the script, the result no longer appears in the terminal. Instead, a new file, 'outfile-tei.xml', has been created in the directory from which you ran the script ('Scripts', if you followed the steps above). Check what's inside!
If you want to change the location of the output file, you can specify a different path. If, for example, you want your output file on the Desktop, you would write
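A common alternative for writing files is a with block, which closes the file automatically once the writing is done. The snippet below is a sketch under stated assumptions: the variable table holds a placeholder string instead of a real collation result, and the file name 'outfile-demo.txt' is invented so that no real output gets overwritten.

```python
# Placeholder for the string returned by collate(...);
# in the real script this value comes from CollateX.
table = "placeholder for the collation result"

# The file is closed automatically when the with block ends,
# so the content is fully written to disk at that point.
with open("outfile-demo.txt", "w", encoding="utf-8") as outfile:
    print(table, file=outfile)

# Read the file back to check what was saved.
with open("outfile-demo.txt", encoding="utf-8") as infile:
    saved = infile.read()
print(saved)
```

Note that print adds a line break at the end, so the saved file ends with a newline.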
In [6]:
outfile = open('C:/Users/Elena/Desktop/output.xml', 'w', encoding='utf-8')
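The path above only works on a Windows computer with a user called Elena. As a hedged alternative, the path can be built from the current user's home directory. This sketch assumes that the Desktop folder is named 'Desktop' and sits directly inside the home directory, which is not true on every system.

```python
from pathlib import Path

# Path.home() returns the current user's home directory on any system,
# so the user name does not have to be written into the script.
desktop_file = Path.home() / "Desktop" / "output.xml"
print(desktop_file)

# open() accepts a Path object directly, for example:
#     outfile = open(desktop_file, 'w', encoding='utf-8')
```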
N.B.: you can also create an output file when running your script in the Jupyter notebook! Depending on the path you specify, it will be created in your 'Notebook' directory or elsewhere.
Create a new Python script that produces an output in JSON, using the data in 'fixtures/Woolf/Lighthouse-1' (remember? We used the same data in another tutorial). Take care to specify the input files, the output file (and its extension) and the output format correctly.
In [ ]: