Python 2 has been around for a very long time and as a consequence has a huge amount of code (particularly scientific) written for it. Python 3 is relatively young (end of 2008) and lacks the substantial code base (though it is certainly growing). To keep things simple, we'll focus on version 2, but also try to highlight any clear differences that arise. We'll also use both versions on the farm in the practical at the end.
To read more about this see https://wiki.python.org/moin/Python2orPython3/.
So that (normal) humans can read and write programs.
To see how readability improves as we move from the lowest (Machine Code) language to high-level languages, consider the following implementations that produce the Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8,...
8B542408 83FA0077
06B80000 0000C383
FA027706 B8010000
00C353BB 01000000
B9010000 008D0419
83FA0376 078BD98B
C84AEBF1 5BC3
fib:
mov edx, [esp+8]
cmp edx, 0
ja @f
mov eax, 0
ret
@@:
cmp edx, 2
ja @f
mov eax, 1
ret
@@:
push ebx
mov ebx, 1
mov ecx, 1
@@:
lea eax, [ebx+ecx]
cmp edx, 3
jbe @f
mov ebx, ecx
mov ecx, eax
dec edx
jmp @b
@@:
pop ebx
ret
unsigned int fib(unsigned int n)
{
    if (n <= 0)
        return 0;
    else if (n <= 2)
        return 1;
    else {
        int a,b,c;
        a = 1;
        b = 1;
        while (true) {
            c = a + b;
            if (n <= 3) return c;
            a = b;
            b = c;
            n--;
        }
    }
}
In [9]:
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n - 1) + fib(n - 2)
In [11]:
fib(0), fib(1), fib(2), fib(3), fib(4), fib(5), fib(6)
Out[11]:
(0, 1, 1, 2, 3, 5, 8)
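The recursive Python version is the easiest to read, but it recomputes the same values many times. As a minimal sketch (not part of the original comparison), the loop idea used in the C version can also be written in Python. The name fib_iterative is made up for this illustration.

```python
def fib_iterative(n):
    """Return the n-th Fibonacci number (fib(0) = 0, fib(1) = 1) using a loop."""
    a, b = 0, 1          # a holds fib(i), b holds fib(i + 1), starting at i = 0
    for _ in range(n):   # advance the pair n times
        a, b = b, a + b
    return a


# First seven values of the sequence
print([fib_iterative(i) for i in range(7)])
```

This should give the same values as the recursive fib above, while doing only n additions.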
You can use python in several ways:
Script mode: Type your code in a text editor, save it to a .py file, and execute it with $ python code.py (more on this later).
Interactive mode: You enter an expression and immediately get a result.
$ python
The barebones.
$ ipython
Extends the barebones with autocompletion, inline graphics, file saving.
$ ipython notebook
What you are staring at right now. You can do inline graphics, text formatting, access the shell all within the browser. This is the best way to share things such as inspection of data, explaining an algorithm step by step, teaching material.
$ ipython qtconsole
A similar experience to the notebook but within a terminal-like user interface.
There are a variety of ways to execute a Python script:
$ python first.py
So which Python is that? Well, that depends on how your system is set up. UNIX searches the directories listed in the PATH environment variable, in order, and runs the first python it finds. If none of the earlier entries contains one, you typically end up with the system default, /usr/bin/python. On my system I have installed Anaconda, which has added an entry to my PATH, so that is what is used on my system:
In [18]:
!which python
In [19]:
!cat ~/.bash_profile
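The PATH lookup described above can also be sketched in Python itself. This is only an illustration of the idea, not how the shell is implemented. The helper name find_on_path is made up for this example.

```python
import os
import sys


def find_on_path(program):
    """Return the first executable called `program` found on PATH, or None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, program)
        # Keep the first regular file that we are allowed to execute
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


print(find_on_path("python"))   # roughly what `which python` reports, or None
print(sys.executable)           # the interpreter running this code
```

The two printed paths can differ, for example when the notebook was started with a different interpreter from the one that comes first on your PATH.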
$ ./first.py
We haven't told UNIX on the command line where the Python interpreter is, so we need to specify it in the shebang. The shebang (or hashbang) is the first line of a script; it begins with #! and is used in many scripting languages, not just Python. Some examples of these lines are:
#!/Users/dr9/anaconda/bin/python
#!/usr/bin/env python
Note that you will also need to chmod your Python file so it is executable (for example, chmod +x first.py).
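To make the two ingredients concrete (a shebang line plus the executable bit), here is a small sketch that builds such a script from Python. It uses the absolute-path style of shebang, pointing at whichever interpreter runs the example. The file name first.py and the script body are placeholders.

```python
import os
import stat
import sys
import tempfile

# Placeholder script body; the shebang points at the interpreter
# that is running this example (absolute-path style).
script_text = "#!" + sys.executable + "\nprint('hello from first.py')\n"

# Write the script into a scratch directory
script_path = os.path.join(tempfile.mkdtemp(), "first.py")
with open(script_path, "w") as handle:
    handle.write(script_text)

# Equivalent of `chmod u+x first.py`: add the owner-execute bit
mode = os.stat(script_path).st_mode
os.chmod(script_path, mode | stat.S_IXUSR)

print(script_path)
print(bool(os.stat(script_path).st_mode & stat.S_IXUSR))
```

After this, the file at the printed path could be run directly from the shell with its full path, in the same way as ./first.py.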
Which of these methods is best? This is a matter of preference. As long as you have your PATH set up, method one should be sufficient. On the farm, you may want to specify precisely which version to use, so Methods 2 and 3 may be preferable.
import <some package>
Some common bioinformatics packages
Package | Description |
---|---|
pysam | Interface to samtools |
pyvcf | VCF reader/parser |
Some common general purpose packages
Package | Description |
---|---|
NumPy | Numerical computing, very fast because it uses compiled C and Fortran code |
SciPy | Scientific computing: optimization, linear algebra, integration, ... |
matplotlib | Plotting |
pandas | Python Data Analysis Library |
For example, let’s try out the sys module, which is part of the Standard Library (the official set of Python packages) and provides system-specific parameters and functions. We will use it to determine which version of Python is running the current script by using its version attribute.
In [1]:
import sys
sys.version
Out[1]:
In [2]:
import sys as zebra_face
zebra_face.version
Out[2]:
The above methods are nice because in order to use one of the module's functions or parameters you need to precede it with the module name (or its alias). Why this is good becomes clear in the next two examples.
In [30]:
from sys import version
version
Out[30]:
This is OK but not ideal because what if you accidentally overwrite version with something else?
In [35]:
from sys import version
version = "something else"
print version
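For contrast, here is a minimal sketch of the same situation with the plain import sys style. Because the module's value lives behind the sys. prefix, assigning to a local name called version does not touch it. print is written with parentheses so that the snippet runs under both Python 2 and Python 3.

```python
import sys

version = "something else"      # our own variable, unrelated to sys

print(version)                   # our string
print(sys.version.split()[0])    # the interpreter version, still intact
```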
In [31]:
from sys import *
version
Out[31]:
Do this only for quick and dirty scripting or interactive computing. It is bad practice because it fills your namespace with everything from the module. The namespace is basically the set of names that can be called at that moment in time. What if two modules loaded in the import * style define a function with the same name? For example, pysam (samtools for Python) and numpy both have a sort function:
In [64]:
from pysam import *
from numpy import *
sort([4,2,1,3])
Out[64]:
What is happening here is that numpy's functions overwrite pysam's functions of the same name, since numpy was imported last. If I swap the import order, I get different behaviour:
In [65]:
from numpy import *
from pysam import *
sort([4,2,1,3])
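The same effect can be reproduced with the Standard Library alone, which may be handy if pysam is not installed. This sketch uses math and cmath as stand-ins (a choice made for this illustration, not taken from the examples above). Both define sqrt, and whichever module is star-imported last decides what the bare name sqrt refers to.

```python
import cmath
import math

from math import *
from cmath import *
sqrt_after_cmath = sqrt      # cmath was imported last, so this is cmath.sqrt

from math import *
sqrt_after_math = sqrt       # math was imported last, so this is math.sqrt

print(sqrt_after_cmath is cmath.sqrt)
print(sqrt_after_math is math.sqrt)
print(sqrt_after_cmath(-1))  # cmath handles negative input, math.sqrt would raise ValueError
```

With the prefixed forms math.sqrt and cmath.sqrt there is never any doubt about which function is being called.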
To conclude, import module and import module as something are the best, least error prone ways of importing a module.
The easiest way to install and use Python on your local machine is to use the Anaconda distribution. It is free and very convenient (it saves you from downloading numerous modules). Be sure to download version 2.7 as this is compatible with more modules than version 3.4. Download the command line version as this bypasses security restrictions on graphical installs. When it asks whether to add itself to your PATH, go ahead and say yes.
Now that anaconda is installed, we can install modules in two ways:
$ conda install module_name
$ pip install module_name
What is the difference? The major difference is that conda installs binary files, that is, the installer doesn't need to compile source code in order to install a module on your machine. This is handy because compiling can take some time and things can sometimes go wrong. pip, on the other hand, may need to compile the source code. Because conda provides binary files, someone has to maintain those binaries. As a consequence, some modules may not be available on conda, though practically every module will be available through pip (you'll just have to build it).
Q: Why do we need to compile anything? I thought Python was interpreted?
~/Anaconda/lib/python2.7/site-packages
There are several versions of Python on the farm:
dr9@farm3-head4:/software$ ls | grep python
python-2.7.3
python-2.7.6
python-2.7.8
python-3.2.3
python-3.3.0
python-3.3.2
python-3.4.0
I would suggest using python-2.7.6 (python-2.7.8 claims to be python-2.7.3 so I think something is up with it, I'll submit a ticket). In order to specify this Python version you can pick any of the previous methods used for local machines. For example, method 2:
dr9@farm3-head2:~$ bsub -o <output> /software/python-2.7.6/bin/python <python script>.py
There is a good Sanger wiki page on this: http://mediawiki.internal.sanger.ac.uk/index.php/Installing_Software#Python. The crux of it is to install with pip with the user option:
$ /software/python-2.7.6/bin/pip install --user modulename
This will install software in ~/.local according to the type of file.
Type of file | Installation directory |
---|---|
modules | ~/.local/lib/python2.7/site-packages |
scripts | ~/.local/bin |
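To check where the user-level directories are for a given interpreter, the Standard Library site module can be asked directly. This is a minimal sketch. The exact paths depend on the Python version and platform, so no output is shown here.

```python
import site

user_base = site.getuserbase()            # base directory for user installs, e.g. ~/.local on Linux
user_site = site.getusersitepackages()    # where `pip install --user` puts modules

print(user_base)
print(user_site)
```

Running this with each Python version on the farm should show a different pythonX.Y directory under the same base.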
There are a variety of ways to read about Python, modules and functions.
help() to start the help browser, or help("topic") where topic can be a package, module or function
topic? in IPython
pysam locally.
pysam and inspect (a module from the Standard Library so you've already got it installed)
sys.
/software/python-x.x.x/bin/pip install --user <modname>) for both 2.7.6 and 3.4.0.
What chapters are next?