This tutorial explains how to install Python, CollateX, and Jupyter notebook for use in the DiXiT Workshop “Code and collation: training textual scholars”, Amsterdam, 2–4 November 2016. To avoid delaying the start of the workshop, please install the software in advance (if you get stuck, do as much as you can and we’ll help you finish the process when you arrive).
If you have already installed CollateX, make sure that you have the most recent version by running:
pip install --upgrade collatex
If not, here are the installation instructions in a nutshell:
pip install collatex
pip install python-levenshtein
(but see the note below for Windows)
pip install graphviz
If you are not sure what all that means, read on!
To run CollateX, you need first to install Python 3 and then the CollateX module, along with some other programs, packages, and modules upon which CollateX depends. Here’s how to do that in Mac OS X, Ubuntu Linux, and Windows. The process described below will probably take between thirty minutes and an hour, depending on how familiar you are with installing programs on your system. The good news is that you only have to do the installation once, and launching CollateX after that will take almost no time. This tutorial assumes that you are running Mac OS X 10.11 or later, Windows 7, 8, or 10, or Ubuntu Linux 14.04 LTS or later. In all of the steps below, if you are prompted to enter your password, you should do so.
Your system may already have some version of Python installed, but we recommend that you install and use the Anaconda Python distribution. CollateX will work with other distributions of Python 3 (not all functionality is available with Python 2), but the installation and configuration are more complicated, so for the workshop we are using Anaconda. Installing Anaconda according to the instructions on their site should not interfere with other existing Python versions on your system.
For Mac OS, Linux, and Windows, the Python installation instructions are the same: download and install Anaconda Python from http://continuum.io/downloads.html. Be sure to choose the link for Python 3.5 (not 2.7). If you are curious, there’s a useful Anaconda quick-start tutorial at https://store.continuum.io/static/img/Anaconda-Quickstart.pdf.
The Anaconda package installer on Linux is not a clickable installation program as on Mac OS and Windows. You will need to choose to save the file, and then make a note of where the installer was saved (most likely your Downloads folder). You will then open a command line window (Ctrl-Alt-T on Ubuntu) to type the command
bash Downloads/Anaconda3-4.2.0-Linux-x86_64.sh
where Downloads is replaced with the name of the folder in which you saved Anaconda, if it is different.
Some users have reported errors when trying to install from the Downloads directory. Should that happen to you, try moving the file to your home directory and installing from there.
When asked, say yes to everything. When the installation is finished, type exit to close the command line window. (You need to do this, even though you will open a new one shortly!)
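Once Anaconda is installed and you have opened a new command line window, a quick check like the following can confirm that the interpreter you are running is Python 3. This is only a minimal sketch: the function name is_python3 is made up for illustration, and the path it prints will differ from system to system.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given interpreter version is Python 3 or later."""
    return version_info[0] >= 3


# Show which interpreter is running; with Anaconda, the path usually
# contains the name of the Anaconda installation folder.
print("Interpreter:", sys.executable)
print("Version:", sys.version.split()[0])

if is_python3():
    print("This is Python 3, as needed for the workshop.")
else:
    print("This is Python 2; check that Anaconda Python 3 is installed.")
```

You can paste these lines into an interactive Python session, or save them in a file and run the file with the python command.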
Once you have installed Python, as described above, you need to install CollateX, along with a few supporting files (libraries). To do this, you will need to work with a command line window. Each operating system makes a terminal available by default, without requiring special installation: Terminal on Mac OS, Powershell on Windows, and the terminal (Ctrl-Alt-T) on Ubuntu Linux. Open the one for your system.
A window will open that displays a command line, a place where you can type instructions to be executed on the computer, with a prompt that might look something like this on a Mac OS Terminal:
Taras-Mac:~ tara$
or this in the Windows Powershell:
PS C:\Users\Tara L Andrews>
or this in a Linux terminal:
tla@ubuntu:~$
Now you are ready to type the commands that come next.
Windows users: Some of you may have used cmd.exe in the past to work at the command line. We recommend Powershell (or, for Windows 10 users, bash) because it uses many of the same commands that have always been in use on Unix-like systems, and so makes it easier for you to follow generic command-line instructions such as those we will be giving in the workshop. If you stick to cmd.exe you do so at your own risk, and the commands described below may not all be available.
The easiest way to install CollateX from the command line is with pip, a Python package manager. pip comes bundled with Anaconda, so you don’t have to install it separately, and you can install CollateX and most of the libraries on which it depends by typing:
pip install collatex
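If you would like to confirm that the installation worked, a short collation of two made-up witnesses is one way to try it. The sketch below is illustrative rather than definitive: the witness texts and the function name try_collation are our own inventions, and the import is guarded so that the script prints a hint instead of failing if CollateX is not yet installed.

```python
def try_collation():
    """Collate two short witnesses; return None if CollateX is missing."""
    try:
        from collatex import Collation, collate
    except ImportError:
        return None
    collation = Collation()
    # Each witness needs a siglum (a short identifier) and its text.
    collation.add_plain_witness("A", "The quick brown fox jumps over the dog.")
    collation.add_plain_witness("B", "The brown fox jumps over the lazy dog.")
    # By default, collate() returns an alignment table.
    return collate(collation)


result = try_collation()
if result is None:
    print("CollateX is not installed yet; run: pip install collatex")
else:
    print(result)
```

If CollateX is installed, the script should print a table that aligns the two witnesses.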
CollateX relies on the python-Levenshtein library to do near (inexact) matching of words.
Type the following at the command line:
pip install python-levenshtein
Mac OS users: You may get a popup window telling you that you require the command-line developer tools. If you get this window, choose Install. When the installation is finished, run the command again.
Once this is done, you can check that everything worked by opening a terminal, typing the following command, and hitting the Enter key:
python -c "import Levenshtein; print('This works.')"
Windows users can try one of these precompiled packages, depending on whether their Windows is 32-bit or 64-bit.
If your system is 32-bit:
pip install http://collatex.obdurodon.org/python_Levenshtein-0.12.0-cp35-none-win32.whl
If your system is 64-bit:
pip install http://collatex.obdurodon.org/python_Levenshtein-0.12.0-cp35-none-win_amd64.whl
These files are mirrored from http://www.lfd.uci.edu/~gohlke/pythonlibs/#python-levenshtein. At the time we are writing this tutorial, we’re linking to the Levenshtein files for Python 3.5 (that’s what the “cp35” means in the filenames), which is the current Anaconda version.
Windows users with an installed and configured C++ compiler can try:
pip install python-levenshtein
As noted, this will succeed only if you have a C++ compiler configured (most Windows users do not).
Once you have installed the package, you can check that everything worked with the following command:
python -c "import Levenshtein; print('This works.')"
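To give a sense of what near (inexact) matching measures, here is a minimal pure-Python sketch of the Levenshtein (edit) distance: the number of single-character insertions, deletions, and substitutions needed to turn one word into another. This is our own illustration, not the code of the library installed above (whose Levenshtein.distance function computes the same measure much faster); the function name edit_distance is made up for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    # previous[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


print(edit_distance("color", "colour"))    # 1
print(edit_distance("kitten", "sitting"))  # 3
```

Two words with a small distance, such as spelling variants, are good candidates for being treated as a near match.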
Graphviz is a program for creating graphic representations, including the variant graphs sometimes used in CollateX (see the examples at http://stemmaweb.net/stemmaweb/relation/help/Latin). Graphviz is required by CollateX only for viewing variant graphs. We recommend installing it for the workshop, but you can perform collations without it. Note that in addition to installing Graphviz, all users need to install Python bindings for Graphviz, which is a separate step, described in Section 3.5, below.
The easiest way to install Graphviz is to download the appropriate installer from the Graphviz download page (you will need to accept the license). On Mac, this will be the mountainlion current stable release. The Graphviz page is often inaccessible; should this happen, you can use the Internet Archive Wayback Machine.
If the installer refuses to run when you double-click it, then you can do the following:
This is a useful trick to remember for installing any software that you know you want, but that your Mac doesn’t trust.
Graphviz can be installed from the Terminal on Ubuntu with the command:
sudo apt-get install graphviz
The easiest way to install Graphviz on Windows is to download the appropriate installer from the Graphviz download page (you will need to accept the license). The Graphviz page is often inaccessible; should this happen, you can use the Internet Archive Wayback Machine. On Windows, use the .msi file if you can.
To confirm that the path has been set correctly, close any open Powershell or bash window you have, open a new one, and run the command where.exe dot. Do not leave off the “.exe”! The output should look something like:
PS C:\Users\Tara L Andrews> where.exe dot
C:\Program Files (x86)\Graphviz2.38\bin\dot.exe
In addition to Graphviz itself, all users on all operating systems also need to install Python bindings (support) for Graphviz, which you can do at the command line by typing:
pip install graphviz
Note that the preceding line does not install Graphviz; what it installs is just the Python bindings for Graphviz. You also need to install Graphviz itself, as described in Section 3.4 and its subsections, above.
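Because Graphviz itself and its Python bindings are installed separately, it can be handy to check for both at once. The following is a minimal sketch that uses only the Python standard library; the function name graphviz_status is our own, and it only looks for the dot program and the graphviz module without actually drawing anything.

```python
import importlib.util
import shutil


def graphviz_status():
    """Return (path to the dot program or None, True if bindings are found)."""
    # Graphviz itself: is its "dot" program on the system path?
    dot_path = shutil.which("dot")
    # Python bindings: can the "graphviz" module be located?
    has_bindings = importlib.util.find_spec("graphviz") is not None
    return dot_path, has_bindings


dot_path, has_bindings = graphviz_status()
print("Graphviz program (dot):", dot_path or "not found on the path")
print("Python bindings:", "installed" if has_bindings else "not installed")
```

If the first line reports that dot was not found, Graphviz itself is missing or not on the path; if the second reports that the bindings are not installed, run pip install graphviz.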
We will use the Jupyter (or IPython) notebook development environment in our workshop to write and test CollateX collations. Jupyter notebook is bundled with Anaconda Python and does not require any special installation.
Go to the Jupyter notebook tutorial to familiarize yourself with this working environment.
We typically use Jupyter notebook for experimentation and the command line for finished Python programs. If you are not familiar with working on the command line, please read the Command line tutorial. If you are already familiar with working on the command line, you can save your Python code in a file (give it the traditional Python filename extension .py) and run it from the directory in which you’ve saved it with:
python nameofscript.py
replacing “nameofscript.py” with the filename of your script.
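As an illustration of what such a script file might contain, here is a small, self-contained sketch that splits a sentence into words and lowercases them, a very simple version of the kind of tokenization and normalization that can precede collation. The function names and the sample sentence are invented for this example, and the script does not use CollateX itself.

```python
"""A small example script; save it with a .py extension and run it."""
import re


def tokenize(text):
    """Split text into word tokens, dropping punctuation and whitespace."""
    return re.findall(r"\w+", text)


def normalize(token):
    """Lowercase a token so that 'Fox' and 'fox' compare as equal."""
    return token.lower()


def main():
    text = "The quick brown Fox."
    tokens = [normalize(token) for token in tokenize(text)]
    print(tokens)


# Run main() only when the file is executed as a script,
# not when it is imported from another Python file.
if __name__ == "__main__":
    main()
```

Running the file from the command line prints the list of normalized tokens.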
We’ll use Jupyter Notebook for most of our Python coding in the workshop because it provides a convenient interface for entering your code, running it, and seeing the output all in one place. For more complicated development, though, you may prefer to use an IDE (integrated development environment), which offers command completion, syntax checking, and other features that help you code more quickly, efficiently, and accurately. The IDE that your instructors use and recommend is the free “community” version of PyCharm. You do not need to install PyCharm or any other IDE for the workshop, and it does involve a steeper learning curve than Jupyter, but for complicated development tasks (such as writing your own complex tokenization or normalization routines as a preparation for collation), the investment in learning your way around the IDE will repay itself quickly.