Python Club

Meeting 4


Recap of python on the farm

There are several versions of python on the farm and several ways of installing python packages.

/usr/bin/python

The default location for python on Linux is /usr/bin/python. If you haven't put anything in your .bash_profile or .bashrc (setup files that are read whenever you log in), this is the python that will be called by default. Currently this is version 2.7.3.
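If you are unsure which python you are actually getting, you can ask the interpreter itself. The following is a minimal sketch (the function name describe_interpreter is my own, purely for illustration) that reports the path and version of whichever interpreter runs it:

```python
import sys

def describe_interpreter():
    """Return (path, version string) for the interpreter running this code."""
    # sys.version_info starts with (major, minor, micro)
    version = "%d.%d.%d" % tuple(sys.version_info[:3])
    return sys.executable, version

path, version = describe_interpreter()
print("Running %s (Python %s)" % (path, version))
```

The exact path and version printed will depend on your own login setup.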

/software/

This directory contains several python versions:

dr9@farm3-head1:~$ ls /software | grep python
python-2.7.3
python-2.7.6
python-2.7.8
python-2.7.9
python-3.2.3
python-3.3.0
python-3.3.2
python-3.4.0

Note that each python version has different packages installed.
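A quick way to see whether a given python has the package you need is to try importing it. This is only a sketch: has_package is a hypothetical helper name, and the package names checked are just examples. Note that a package which is installed but broken may raise something other than ImportError, which this sketch does not handle.

```python
import importlib

def has_package(name):
    """Return True if the named package can be imported by this interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True

# Example package names only; substitute the ones you care about
for name in ("os", "numpy", "pysam"):
    status = "available" if has_package(name) else "missing"
    print("%-6s %s" % (name, status))
```

Running this under each python in /software/ should show how the installed packages differ between versions.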

/software/hgi/

HGI maintain a collection of software modules that can be loaded. Currently, HGI maintain the following versions of python:

hgi/python/2.7.8
hgi/python/2.7.8-ucs2
hgi/python/2.7.8-ucs4(latest)

(note: ucs refers to the way unicode data is stored).
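To find out which kind of build you are running, you can look at sys.maxunicode, which is 65535 (0xFFFF) on a ucs2 ("narrow") build and 1114111 (0x10FFFF) on a ucs4 ("wide") build. The sketch below wraps that check in a function; the name unicode_build is my own. On Python 3.3 and later the distinction no longer exists and the value is always 1114111.

```python
import sys

def unicode_build(maxunicode=None):
    """Return 'ucs2' for a narrow unicode build, otherwise 'ucs4'."""
    if maxunicode is None:
        maxunicode = sys.maxunicode
    if maxunicode == 0xFFFF:
        return "ucs2"
    return "ucs4"

print("This interpreter is a %s build" % unicode_build())
```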

For a complete description, see the HGI wiki page: http://mediawiki.internal.sanger.ac.uk/index.php/HGIProjectsoftware

Using the desired python interpreter.

In our first meeting I mentioned that /software/python-2.7.8/bin/python was reporting itself as Python 2.7.3. I discussed this with Emyr, and the odd behaviour occurs because, although I had explicitly called python-2.7.8's binary, its library had not been loaded. To fix this, we can prepend the python-2.7.8 directories to the PATH and LD_LIBRARY_PATH environment variables:

export PATH=/software/python-2.7.8/bin:$PATH
export LD_LIBRARY_PATH=/software/python-2.7.8/lib:$LD_LIBRARY_PATH

Now when I call python I get the correct version:

$ python
Python 2.7.8 (default, Nov 20 2014, 12:13:47)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.

If you are going to stick to one version of python from /software/ on the farm, I recommend putting the PATH and LD_LIBRARY_PATH exports in your .bashrc file. Then you don't have to worry about it. Another option is to place these lines at the top of your job script or a wrapper bash script.
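If your wrapper happens to be a python script rather than a bash script, the same "prepend" idea can be expressed on a copy of the environment. This is a rough sketch under that assumption: prepend_path is a hypothetical helper, and my_analysis.py in the final comment is a made-up script name.

```python
import os

def prepend_path(env, name, directory):
    """Return a copy of env with directory placed first in variable `name`."""
    new_env = dict(env)
    current = new_env.get(name, "")
    if current:
        new_env[name] = directory + os.pathsep + current
    else:
        # Variable was unset or empty: avoid leaving a stray separator
        new_env[name] = directory
    return new_env

env = prepend_path(os.environ, "PATH", "/software/python-2.7.8/bin")
env = prepend_path(env, "LD_LIBRARY_PATH", "/software/python-2.7.8/lib")
print(env["LD_LIBRARY_PATH"])

# A wrapper could then start the real job with this environment, e.g.:
#   import subprocess
#   subprocess.call(["python", "my_analysis.py"], env=env)
```

Working on a copy means the calling process's own environment is left untouched; only the job that is launched sees the modified paths.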

How to get additional packages

If you are using the HGI modules

  • ask HGI

If you are using /software/

  1. Ask ISG to install it
  2. If you don't want to wait, use pip install --user <package name>
  3. If you don't want to wait and pip install --user gave you an error (version conflict), or you want to make your software setup more repeatable, use virtualenv.

pip

As mentioned in the first meeting, you can install packages yourself with pip as long as you use the --user option:

$ pip install --user <package name>

These packages will belong to only you and will go into ~/.local/.
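You can ask python where it looks for these per-user packages using the standard site module. A small sketch follows; note that inside some older virtual environments the site module is replaced and these functions may be missing, so treat this as something to run with a plain interpreter.

```python
import site

user_base = site.getuserbase()          # typically ~/.local on Linux
user_site = site.getusersitepackages()  # where pip install --user puts packages

print("User base:          %s" % user_base)
print("User site-packages: %s" % user_site)
print("User site enabled:  %s" % site.ENABLE_USER_SITE)
```

The user site-packages directory includes the python version number, so packages installed with --user under one python version are not seen by another.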

virtualenv

Virtualenv is a tool that uses "virtual environments" to hold your python packages. Why do this:

  • As software is being updated all the time on the farm, it is possible to be happily using version 1 of some python library one day and version 1.1 the next without even knowing it. This is unlikely to affect your work, but it is not impossible.
  • There may be conflicts between different package versions. For example, pandas requires numpy 1.7.0+ and the python at /usr/bin/ has numpy 1.6.1, so I would have to ask ISG to update their version before I could install pandas (with pip install --user pandas). To get around this, I can create a virtual environment, install a newer version of numpy in it, and then install pandas.
  • This is very handy in terms of reproducible science and pipelines as you can "freeze" your current package setup:
$ pip freeze > myproject_python_packages.txt
$ cat myproject_python_packages.txt
numpy==1.9.1
pysam==0.8.1
wsgiref==0.1.2

...some time later, when you or someone else tries to reproduce your results, the exact same packages can be loaded into your environment:

pip install -r myproject_python_packages.txt

Virtualenv allows you to specifically pick what packages and which version of the packages to have loaded into your "environment".
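Because the frozen list is just lines of the form name==version, it is easy to inspect from python, for instance to check the numpy requirement mentioned above. The sketch below is illustrative only: parse_pinned and version_tuple are hypothetical helpers, and version_tuple only copes with plain numeric versions such as 1.9.1 (not things like 1.0rc1).

```python
def parse_pinned(lines):
    """Map package name -> version for 'name==version' lines; skip the rest."""
    pinned = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pinned[name.strip()] = version.strip()
    return pinned

def version_tuple(version):
    """Turn '1.9.1' into (1, 9, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

# The same lines as in the example package list above
frozen = ["numpy==1.9.1", "pysam==0.8.1", "wsgiref==0.1.2"]
pinned = parse_pinned(frozen)

# Is the pinned numpy new enough for pandas (1.7.0+)?
print(version_tuple(pinned["numpy"]) >= version_tuple("1.7.0"))
```

To read a real package list you could pass open('myproject_python_packages.txt') instead of the hard-coded list. Comparing tuples rather than strings matters because, as strings, '1.10.0' would sort before '1.9.1'.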

To set up virtualenv:
  1. If you want to use a version of python in /software/, enter the following lines at the terminal or make sure they have been placed in your .bashrc

    export PATH=/software/python-2.7.8/bin:$PATH
    export LD_LIBRARY_PATH=/software/python-2.7.8/lib:$LD_LIBRARY_PATH

    Test this by running python and making sure you have got the right version:

    dr9@farm3-head2:~$ python
    Python 2.7.8 (default, Nov 20 2014, 12:13:47) 
    [GCC 4.6.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>>
  2. Now make a directory to hold your virtual environments. I would recommend using the version number as a subdirectory to help with organization.

    mkdir -p ~/virtualenvs/2.7.8
  3. To create an environment called myenv:

    virtualenv ~/virtualenvs/2.7.8/myenv
  4. To start it:

    source ~/virtualenvs/2.7.8/myenv/bin/activate

    You will notice your virtual environment name appears in brackets in your prompt and typing which python will point to your virtual environment directory:

    (myenv)dr9@farm3-head2:~$ which python
    /nfs/users/nfs_d/dr9/virtualenvs/2.7.8/myenv/bin/python
  5. Once activated, you can install what you like with pip (this time without the --user option):

    pip install <package-name>
  6. To view what is installed:

    pip freeze
  7. To save what is installed into a package list:

    pip freeze > myproject_python_packages.txt
  8. To install from a package list:

    pip install -r myproject_python_packages.txt
  9. To exit the virtual environment:

    deactivate

Notes:

  • When you submit a job, you will want to activate your environment (as in step 4) in a bash script.
  • You can create as many virtual environments as you like and they won't affect each other.
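Since it is easy to forget the activation step in a job script, a python script can check for itself whether it is running inside a virtual environment. This is a sketch rather than an official recipe: in_virtualenv is my own helper name, and it relies on older virtualenv releases setting sys.real_prefix, while newer virtualenv releases and Python 3's venv make sys.base_prefix differ from sys.prefix.

```python
import sys

def in_virtualenv():
    """Return True if this interpreter appears to be inside a virtual environment."""
    # Older virtualenv releases add sys.real_prefix
    if hasattr(sys, "real_prefix"):
        return True
    # Newer virtualenv and venv: base_prefix points at the original install
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix

if in_virtualenv():
    print("Running inside a virtual environment: %s" % sys.prefix)
else:
    print("Not in a virtual environment; did you forget to activate it?")
```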

Summary

If you are using python in /software/ you must place the following lines (using 2.7.8 as an example) either in a bash script when you submit your jobs or in your ~/.bashrc if you intend to primarily use this version on the farm:

export PATH=/software/python-2.7.8/bin:$PATH
export LD_LIBRARY_PATH=/software/python-2.7.8/lib:$LD_LIBRARY_PATH

When you want packages, depending on your patience/preferences either:

  1. Ask ISG/HGI to install it
  2. Use pip install --user
  3. Create a virtualenv and install what you like

Exercises

Exercise 5.a

The following uses the logical operator not with the relational operator '>'. See if you can write this using only relational operators (!=, >, <, >=, <=).


In [1]:
x = 5
if not x > 5:
    print 'x is not bigger than 5'


x is not bigger than 5

Exercise 5.b

What is the point of elif? Why not just use two if statements in a row? Write an example that demonstrates the difference between:

if <condition>:
  <body>
if <condition>:
  <body>

and

if <condition>:
  <body>
elif <condition>:
  <body>

Exercise 5.c

We have an annotation that starts at position 1000 and ends at position 2000. Write a function that returns True if a SNP position falls anywhere within the annotation. For example:

>>> gene_in_annotation(snp=432)
False
>>> gene_in_annotation(snp=1023)
True
>>> gene_in_annotation(snp=4502)
False

Exercise 5.3

Fermat’s Last Theorem says that there are no positive integers a, b, and c such that

a**n + b**n = c**n

for any values of n greater than 2.

  1. Write a function named check_fermat that takes four parameters (a, b, c and n) and that checks to see if Fermat’s theorem holds. If n is greater than 2 and it turns out to be true that a**n + b**n = c**n, the program should print, “Holy smokes, Fermat was wrong!” Otherwise the program should print, “No, that doesn’t work.”

  2. Write a function that prompts the user to input values for a, b, c and n, converts them to integers, and uses check_fermat to check whether they violate Fermat’s theorem.

Exercise 5.4

If you are given three sticks, you may or may not be able to arrange them in a triangle. For example, if one of the sticks is 12 inches long and the other two are one inch long, it is clear that you will not be able to get the short sticks to meet in the middle. For any three lengths, there is a simple test to see if it is possible to form a triangle: If any of the three lengths is greater than the sum of the other two, then you cannot form a triangle. Otherwise, you can. (If the sum of two lengths equals the third, they form what is called a “degenerate” triangle.)

  1. Write a function named is_triangle that takes three integers as arguments, and that prints either “Yes” or “No,” depending on whether you can or cannot form a triangle from sticks with the given lengths.
  2. Write a function that prompts the user to input three stick lengths, converts them to integers, and uses is_triangle to check whether sticks with the given lengths can form a triangle.

Exercise 5.d

Bonus. Sometimes we want to split a file location into its file name and directory parts. Look into the os module documentation to figure out how to split the following up:

loc = '/a/path/name/to/something/filename.ext'
==>
('/a/path/name/to/something', 'filename.ext')