Python Club

Meeting 1

Agenda

Introduction & Expectations
Chapter 1 & 2 Discussion and Questions
- Python 2 versus 3
- Why do high-level / interpreted programming languages exist?
Python in its many forms
Modules & Packages
Python on your local machine
Python on the farm
Getting help
Practical
Next meeting

Introduction & Expectations

Anything you want to cover?

Is there anything (module, package, topic) in particular to cover?

Tool development

What tools could we build together?

Chapter 1 & 2 Discussion and Questions

Python 2 versus 3

Python 2 has been around for a very long time and as a consequence has a huge amount of code (particularly scientific) written for it. Python 3 is relatively young (end of 2008) and lacks the substantial code base (though it is certainly growing). To keep things simple, we'll focus on version 2, but also try to highlight any clear differences that arise. We'll also use both versions on the farm in the practical at the end.

To read more about this see https://wiki.python.org/moin/Python2orPython3/.
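As a small illustration of the kind of difference we will run into, the sketch below checks which major version is running and shows how division behaves. The variable names are made up for this example, and it is written so that it should run unchanged on both Python 2.7 and Python 3.

```python
import sys

# True when the interpreter is Python 3 or newer.
IS_PY3 = sys.version_info[0] >= 3

# "/" between two integers: floor division on Python 2, true division on Python 3.
half = 7 / 2

# "//" is floor division on both versions.
floor_half = 7 // 2

# print with a single parenthesised argument works on both versions.
print("Running Python %d.%d" % (sys.version_info[0], sys.version_info[1]))
print("7 / 2  = %s" % half)
print("7 // 2 = %s" % floor_half)
```

On Python 2 the first division gives 3, on Python 3 it gives 3.5; the second gives 3 on both.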

Why do high-level programming languages exist?

So that (normal) humans can read and write programs.

To see how readability improves as we move from the lowest (Machine Code) language to high-level languages, consider the following implementations that produce the Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8,...

Machine Code (108 characters)

8B542408 83FA0077
06B80000 0000C383
FA027706 B8010000
00C353BB 01000000
B9010000 008D0419
83FA0376 078BD98B
C84AEBF1 5BC3

Assembly Language (168 characters)

fib:
    mov edx, [esp+8]
    cmp edx, 0
    ja @f
    mov eax, 0
    ret

    @@:
    cmp edx, 2
    ja @f
    mov eax, 1
    ret

    @@:
    push ebx
    mov ebx, 1
    mov ecx, 1

    @@:
        lea eax, [ebx+ecx]
        cmp edx, 3
        jbe @f
        mov ebx, ecx
        mov ecx, eax
        dec edx
    jmp @b

    @@:
    pop ebx
    ret

C (136 characters)

unsigned int fib(unsigned int n)
{
    if (n <= 0)
        return 0;
    else if (n <= 2)
        return 1;
    else {
        int a,b,c;
        a = 1;
        b = 1;
        while (1) {
            c = a + b;
            if (n <= 3) return c;
            a = b;
            b = c;
            n--;
        }
    }
}

Python (68 characters)



In [9]:

def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n - 1) + fib(n - 2)



In [11]:

fib(0), fib(1), fib(2), fib(3), fib(4), fib(5), fib(6)

Out[11]:

(0, 1, 1, 2, 3, 5, 8)
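The recursive version above is the shortest to read, but it recomputes the same values many times. For comparison, here is a sketch of a loop-based version that is closer in spirit to the C code. The name fib_iterative is made up for this example.

```python
def fib_iterative(n):
    """Return the n-th Fibonacci number (fib(0) = 0, fib(1) = 1) using a loop."""
    a, b = 0, 1
    for _ in range(n):
        # Slide the window one step along the sequence.
        a, b = b, a + b
    return a


print([fib_iterative(i) for i in range(7)])  # [0, 1, 1, 2, 3, 5, 8]
```

It is still far more readable than the assembly, and it stays fast for large n.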

Why do interpreted languages exist?

So they can be transferred from one system to another without needing to be recompiled. I could email you the above Python code and you could run it immediately. However, the C code would need to be compiled.

Python in its many forms

You can use Python in several ways:

Script mode: Type your code in a text editor, save it to a .py file, and execute it with $ python code.py (more on this later).

Interactive mode: You enter an expression and immediately get a result.

$ python: The bare bones.
$ ipython: Extends the bare bones with autocompletion, inline graphics and file saving.
$ ipython notebook: What you are staring at right now. You can do inline graphics, text formatting and access the shell, all within the browser. This is the best way to share things such as inspection of data, step-by-step explanations of an algorithm, and teaching material.
$ ipython qtconsole: A similar experience to the notebook, but within a terminal-like user interface.

Which one to use? Up to you really but the following are general guidelines:

If you are making something as a standalone piece of code, it is best to use script mode.
If you want to experiment and save your work then an IPython notebook is a good way to go.
If you want to quickly enter a few lines of code to test something, plain old python is handy.

Script mode, WTP (What The Python?)

There are a variety of ways to execute a Python script:

Method 1 -- `$ python first.py`

So WTP? Well, that depends on how your system is set up. The shell searches the directories listed in the PATH environment variable, in order, and runs the first python executable it finds. The system default, /usr/bin/python, is only used if no earlier directory on the PATH provides one. On my system I have installed Anaconda, which has added an entry to my PATH, so that is what is used:



In [18]:

!which python

/Users/dr9/anaconda/bin/python



In [19]:

!cat ~/.bash_profile

# added by Anaconda 2.1.0 installer
export PATH="/Users/dr9/homebrew/bin:$PATH"
export PATH="/Users/dr9/anaconda/bin:$PATH"
export PATH="/Users/dr9/Library/samtools/bin:$PATH"
export PATH="/Users/dr9/Library/wgsim:$PATH"
export PATH="/Users/dr9/bin:$PATH"

if [ -f `brew --prefix`/etc/bash_completion ]; then
    . `brew --prefix`/etc/bash_completion
fi

alias ff='/Applications/Firefox.app/Contents/MacOS/firefox-bin --profilemanager'
alias ss='ssh  -L3128:cache1a.internal.sanger.ac.uk:3128 ssh.sanger.ac.uk'
alias ws='ssh web-wwwsand-06'
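The lookup that which performs can be sketched in a few lines of Python. This is only an approximation of what the shell does (for instance, it skips empty PATH entries, which the shell treats as the current directory), and find_on_path is a made-up name for this example.

```python
import os


def find_on_path(program, path=None):
    """Return the first executable called `program` found on PATH, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    # Directories are tried in the order they appear, so earlier entries win.
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # simplification: ignore empty entries
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


print(find_on_path("python"))
```

Because each export line in the profile above puts its directory in front of the existing PATH, the Anaconda directory is searched before /usr/bin.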

Method 2 -- `$ /Users/dr9/anaconda/bin/python first.py`

If you want to be explicit about it you can state the path to the binary of the Python interpreter. This is particularly relevant when you want to specify exactly which version of Python you want (eg 2.7.2, 2.7.8 or 3.3.2).

Method 3 -- `$ ./first.py`

We haven't told UNIX from the command line where the Python interpreter is so we need to include it in the shebang. The shebang or hashbang is the first line of a script that begins #! and is used in many scripting languages, not just Python. Some examples of these lines are:

#!/Users/dr9/anaconda/bin/python
#!/usr/bin/env python

Note that you will also need to make your Python file executable, for example with chmod +x first.py.
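To make the two ingredients of Method 3 concrete, here is a sketch that reads the shebang line of a script and sets its execute bits from Python. The helper names and the throwaway script are made up for this example; on the command line you would simply use chmod.

```python
import os
import stat
import tempfile


def read_shebang(path):
    """Return the interpreter named on the first line, or None if there is no shebang."""
    with open(path) as handle:
        first_line = handle.readline().rstrip("\n")
    if first_line.startswith("#!"):
        return first_line[2:].strip()
    return None


def make_executable(path):
    """Roughly what `chmod a+x path` does: add execute permission for everyone."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


# Demo on a throwaway file with hypothetical contents.
handle, script_path = tempfile.mkstemp(suffix=".py")
os.write(handle, b"#!/usr/bin/env python\nprint('hello')\n")
os.close(handle)

print(read_shebang(script_path))
make_executable(script_path)
print(bool(os.stat(script_path).st_mode & stat.S_IXUSR))
os.remove(script_path)
```

The /usr/bin/env form asks env to find python on the PATH, so the script follows whatever Method 1 would have used, whereas a full path in the shebang pins one specific interpreter.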

Which of these methods is best? This is a matter of preference. As long as you have your PATH set up, Method 1 should be sufficient. On the farm, you may want to specify precisely which version to use, so Methods 2 and 3 may be preferable.

Modules & Packages

A module is a piece of software that performs a specific task.
The work has already been done for you, just type import <some package>.
A fantastic selling point for lazy people.
Modules are written by “official” Python developers as well as scientists, the commercial sector, anybody really.
A package is a collection of modules. For the time being just think of these as external pieces of code that make your life easy.
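If you are curious whether something you imported is a single module or a package, one rough check is to look for a __path__ attribute, which packages carry and plain modules do not. The helper name is_package is made up for this example; json and math both come from the Standard Library.

```python
import json  # a package: a directory containing several modules
import math  # a single module


def is_package(module):
    """Packages have a __path__ attribute listing where their submodules live."""
    return hasattr(module, "__path__")


print(is_package(json))  # True
print(is_package(math))  # False
```

Either way, you import and use them in exactly the same manner.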

Some common bioinformatics packages

Package	Description
pysam	Interface to samtools
pyvcf	VCF reader/parser

Some common general purpose packages

Package	Description
NumPy	Numerical computing, very fast because it uses compiled C and Fortran code
SciPy	Scientific computing: optimization, linear algebra, integration, ...
matplotlib	Plotting
pandas	Python Data Analysis Library

How to import packages

For example, let’s try out the sys module, which is part of the Standard Library (the official collection of modules that ships with Python) and provides system-specific parameters and functions. We will use its version attribute to determine which version of Python is running the current script.



In [1]:

import sys
sys.version

Out[1]:

'2.7.9 |Anaconda 2.1.0 (x86_64)| (default, Dec 15 2014, 10:37:34) \n[GCC 4.2.1 (Apple Inc. build 5577)]'



In [2]:

import sys as zebra_face
zebra_face.version

Out[2]:

'2.7.9 |Anaconda 2.1.0 (x86_64)| (default, Dec 15 2014, 10:37:34) \n[GCC 4.2.1 (Apple Inc. build 5577)]'

The above methods are nice because, in order to use one of the module's functions or parameters, you need to precede it with the module name (or its alias). Why this is good becomes clear in the next examples.



In [30]:

from sys import version
version

Out[30]:

'2.7.9 |Anaconda 2.1.0 (x86_64)| (default, Dec 15 2014, 10:37:34) \n[GCC 4.2.1 (Apple Inc. build 5577)]'

This is OK but not ideal because what if you accidentally overwrite version with something else?



In [35]:

from sys import version
version = "something else"
print version

something else



In [31]:

from sys import *
version

Out[31]:

'2.7.9 |Anaconda 2.1.0 (x86_64)| (default, Dec 15 2014, 10:37:34) \n[GCC 4.2.1 (Apple Inc. build 5577)]'

Do this only when doing some quick and dirty scripting/interactive computing. It is bad practice because it fills your namespace with everything from the module. The namespace is basically the set of names that can be called at that moment in time. What if two different modules that contain a function with the same name are both loaded with the import * style? For example, pysam (samtools for Python) and numpy both have a sort function:



In [64]:

from pysam import *
from numpy import *
sort([4,2,1,3])

Out[64]:

array([1, 2, 3, 4])

What is happening here is that numpy's sort overwrites pysam's, since numpy was imported last. If I swap the import order, I get different behaviour:



In [65]:

from numpy import *
from pysam import *
sort([4,2,1,3])

---------------------------------------------------------------------------
UnsupportedOperation                      Traceback (most recent call last)
<ipython-input-65-4743efa2d81b> in <module>()
      1 from numpy import *
      2 from pysam import *
----> 3 sort([4,2,1,3])

/Users/dr9/anaconda/lib/python2.7/site-packages/pysam/__init__.pyc in __call__(self, *args, **kwargs)
     62         '''
     63         retval, stderr, stdout = csamtools._samtools_dispatch(
---> 64             self.dispatch, args)
     65         if retval:
     66             raise SamtoolsError(

/Users/dr9/anaconda/lib/python2.7/site-packages/pysam/csamtools.so in pysam.csamtools._samtools_dispatch (pysam/csamtools.c:2808)()

/Users/dr9/anaconda/lib/python2.7/site-packages/IPython/kernel/zmq/iostream.pyc in fileno(self)
    192 
    193     def fileno(self):
--> 194         raise UnsupportedOperation("IOStream has no fileno.")
    195 
    196     def write(self, string):

UnsupportedOperation: IOStream has no fileno.

To conclude, import module and import module as something are the best, least error prone ways of importing a module.
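If you do not have pysam installed, the same kind of clash can be reproduced with the Standard Library alone, since math and cmath both define sqrt. This is just a sketch using explicit from ... import lines in place of import *; the variable names are made up for this example.

```python
import math
import cmath

# Prefixed style: each sqrt stays behind its module name, so nothing can clash.
real_root = math.sqrt(4)      # 2.0
complex_root = cmath.sqrt(4)  # (2+0j)

# Bare-name style: the later import silently replaces the earlier one.
from math import sqrt
first = sqrt(4)   # math's sqrt
from cmath import sqrt
second = sqrt(4)  # now cmath's sqrt, with no warning

print("%r %r" % (first, second))  # 2.0 (2+0j)
```

With the prefixed style it is always obvious which sqrt you are calling.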

Python on your local machine

The easiest way to install and use Python on your local machine is to use the Anaconda distribution. It is free and very convenient (it saves you from downloading numerous modules). Be sure to download version 2.7, as this is compatible with more modules than version 3.4. Download the command line version, as this bypasses security restrictions on graphical installs. When it asks whether to add Anaconda to your PATH, go ahead and say yes.

http://continuum.io/downloads

Now that Anaconda is installed, we can install modules in two ways:

With Anaconda's conda installer:

$ conda install module_name

With Python's package management system called pip:

$ pip install module_name

What is the difference? The major difference is that conda installs prebuilt binaries, so the installer does not need to compile source code in order to install a module on your machine. This is handy because compiling can take some time and things can sometimes go wrong. pip, on the other hand, may need to compile the source code. Because conda provides binaries, somebody has to maintain them. As a consequence, some modules may not be available through conda, whereas almost every module is available through pip (you may just have to build it).

Q: Why do we need to compile anything? I thought Python was interpreted?

Where are the modules installed to?

~/anaconda/lib/python2.7/site-packages
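Rather than remembering this path, you can ask the running interpreter where it installs third-party modules. The sketch below uses the Standard Library sysconfig module; the exact directory printed will depend on your installation.

```python
import sys
import sysconfig

# Root directory of the Python installation that is running this code.
print(sys.prefix)

# Directory where pure-Python third-party modules are installed.
site_packages = sysconfig.get_paths()["purelib"]
print(site_packages)
```

With Anaconda first on your PATH, both lines should point somewhere inside the Anaconda directory.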

Python on the farm

There are several versions of Python on the farm:

dr9@farm3-head4:/software$ ls | grep python
python-2.7.3
python-2.7.6
python-2.7.8
python-3.2.3
python-3.3.0
python-3.3.2
python-3.4.0

I would suggest using python-2.7.6 (python-2.7.8 claims to be python-2.7.3 so I think something is up with it, I'll submit a ticket). In order to specify this python version you can pick any of the previous methods used for local machines. For example, method 2:

dr9@farm3-head2:~$ bsub -o <output> /software/python-2.7.6/bin/python <python script>.py

Installing Packages & Modules on the Farm

There is a good Sanger wiki page on this: http://mediawiki.internal.sanger.ac.uk/index.php/Installing_Software#Python. The crux of it is to install with pip with the user option:

$ /software/python-2.7.6/bin/pip install --user modulename

This installs software under ~/.local, in a directory that depends on the type of file:

Type of file	Installation directory
modules	~/.local/lib/python2.7/site-packages
scripts	~/.local/bin
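To check where a given interpreter will look for modules installed with the --user option, you can ask the Standard Library site module. This is a sketch; run it with the same Python binary that you used for pip, since each version has its own directory.

```python
import site

# Base directory for per-user installs (normally ~/.local on Linux).
user_base = site.getuserbase()

# Where modules installed with `pip install --user` end up for this interpreter.
user_site = site.getusersitepackages()

print(user_base)
print(user_site)
```

If a module installs fine but cannot be imported, comparing this output with the directory pip reported is a quick first check.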

Documentation

There are a variety of ways to read about Python, modules and functions:

help() to start the help browser or help("topic") where topic can be a package, module or function
topic? in IPython
The official Python docs https://docs.python.org/2/
Most packages will also have their own documentation site (eg http://pysam.readthedocs.org/en/latest/)
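The text that help() shows comes from docstrings, which you can also read programmatically. The following sketch pulls out the first line of a function's docstring using the Standard Library inspect module; os.path.join is just a convenient example, and the variable names are made up.

```python
import inspect
import os.path

# inspect.getdoc returns the cleaned-up docstring of an object.
doc = inspect.getdoc(os.path.join)

# By convention the first line is a one-sentence summary.
summary = doc.splitlines()[0]
print(summary)
```

In interactive use, help(os.path.join) or os.path.join? in IPython is usually quicker.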

Practical

Locally

Install pysam locally.
Within a text editor of your choice start writing a script to perform the following
- Import pysam and inspect (inspect is a module from the Standard Library, so you already have it installed)
- We want to print out the file location of the pysam module. (hint: https://docs.python.org/2/library/inspect.html#inspect.getfile)
Run this from the command line.
Now that that is working, we are going to add to the same script some lines to print the Python version and location that is interpreting our file.
- Import sys.
- Check out the sys docs for something that might do this. (hint: https://docs.python.org/2/library/sys.html#sys.executable)
If you've got Anaconda installed on your system you have two versions of Python. See if you can change the version this script runs with. (hint: /usr/bin and ~/anaconda/bin are the two default locations)

On the farm

Copy your (working) code on to the farm. Repeat above except now install pysam (/software/python-x.x.x/bin/pip install --user <modname>) for both 2.7.6 and 3.4.0.

Next meeting

What chapters are next?