These first steps are not very methodical. Programming language classes take their time: they start with basic concepts, then gradually refine and expand on them. Universities offer such relaxed-pace classes to students, but that is a different type of learning. Students have time for many things; for the rest of us working people there are two or perhaps three steps to learning a new language or a new library, and we learn by doing. The tutorial is a simple exposure that takes you through some key aspects. Next comes the programming guide (or any decent book), where concepts are explained in greater detail. Last comes the library reference, where every detail is supposed to be documented in a concise format. Good programmers learn to read the tutorial, read from the guide the key aspects that make the library stand out, and consult the reference only when needed. Beginners in programming have a hard time knowing where to begin and are easily overwhelmed.
No matter how good I become at teaching basic Python in one or two hours, it cannot compare with reading the official tutorial, which would take much longer than the time we have at our disposal. So please understand that there is a trade-off between time and quality. As we learn by doing, I invite you to check the documentation for Python and for the libraries we are using whenever possible.
In [3]:
# This is a line comment.
"""
A multi-line
comment.
"""
a = None #a now refers to None, Python's "no value" object
print(a)
a = 1
print(a)
a = 'abc'
print(a)
b = 'xyz' #placeholder value: b must be defined before it is used in the list
a = [a, 2, b, 1., 1.2e-5, True] #This is a list.
print(a)
In [1]:
## Python is a dynamic language
a = 1
print(type(a))
print(a)
a = "spam"
print(type(a))
print(a)
In [1]:
a = 1
a
b = 'abc'
print(b)
#b
In [12]:
b
Out[12]:
Now let us switch the values of two variables.
In [4]:
c = 2.5 #placeholder value: c was not defined in the cells above
print(a, b, c)
t = c
c = b
b = t
print(a, b, c)
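The swap above goes through a temporary variable t. Python can also swap two names in a single statement by packing and unpacking a tuple. A minimal sketch, using its own placeholder values rather than the variables of the cell above:

```python
# Swap two names without a temporary variable.
# The right-hand side is evaluated first into a tuple, then unpacked.
left = 'abc'
right = 2.5
left, right = right, left
print(left, right)  # 2.5 abc
```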
In [4]:
a = 2
b = 1
b = a*(5 + b) + 1/0.5
print(b)
d = 1/a
print(d)
In [11]:
a = True
b = 3
print(b == 5)
print(a == False)
print(b < 6 and not a)
print(b < 6 or not a)
print(b < 6 and (not a or not b == 3))
In [14]:
print(False and True)
In [7]:
True == 1
Out[7]:
Functions are a great way to separate code into readable chunks. The exact size and number of functions needed to solve a problem will affect readability.
New concepts: indentation, namespaces, global and local scope, default parameters, why passing arguments "by value" or "by reference" is meaningless in Python, and what mutable and immutable types are.
In [17]:
## Indentation and function declaration, parameters of a function
def operation(a, b):
    c = 2*(5 + b) + 1/0.5
    a = 1
    return a, c
a = None
mu = 2
operation(mu, 1)
a, op = operation(a, 1)
print(a, op)
In [20]:
# Function scope, program workflow
def f(a):
    print("inside the scope of f():")
    a = 4
    print("a =", a)
    return a
a = 1
print("f is called")
f(a)
print("outside the scope of f, a=", a)
print("also outside the scope of f, f returns", f(a))
In [9]:
## Defining default parameters for a function
def f2(a, b=1):
    return a + b
print(f2(5))
print(f2(5, b=2))
Task:
In [37]:
## Strings of characters are immutable, x did not change its value
x = 'foo'
y = x
print(x, y) # foo foo
y += 'bar'
print(x, y) # foo foobar
In [52]:
## A list however is a mutable datatype in Python
x = [1, 2, 3]
y = x
print(x, y) # [1, 2, 3] [1, 2, 3]
y += [3, 2, 1]
print(x, y) # both show [1, 2, 3, 3, 2, 1]
In [10]:
## String mutable? No
def func(val):
    val += 'bar'
    return val
x = 'foo'
print(x) # foo
print(func(x))
print(x) # foo
In [13]:
## List mutable? Yes.
def func(val):
    val += [3, 2, 1]
    return val
x = [1, 2, 3]
print(x) # [1, 2, 3]
print(func(x))
print(x) # [1, 2, 3, 3, 2, 1]
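The two func examples above behave differently because += builds a new string but extends a list in place. A small sketch with the is operator makes this visible. The helper names append_bar and extend_list are made up for this illustration:

```python
def append_bar(val):
    # str is immutable: += creates a new string and rebinds the local name
    val += 'bar'
    return val

def extend_list(val):
    # list is mutable: += extends the very object the caller passed in
    val += [3, 2, 1]
    return val

text = 'foo'
items = [1, 2, 3]
text_out = append_bar(text)
items_out = extend_list(items)
print(text_out is text)    # False, a new string was created
print(items_out is items)  # True, the same list was modified
print(text, items)         # foo [1, 2, 3, 3, 2, 1]
```

This is why "by value" and "by reference" are poor labels here: the function always receives a reference to the same object, and what matters is whether that object can be changed in place.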
In [14]:
## Globals. Never use it, always abuse it!
g = 0
def f1():
    # Comment out the line below to spot the difference
    global g # Needed to modify the global g instead of creating a local one
    g = 1
def f2():
    print("f2:",g)
print(g)
f1()
print(g)
f2()
Control flow
There are two major styles of programming language, procedural and functional. Python is mostly procedural, with some simple functional elements. Procedural languages typically have very strong control flow specifications: programmers spend their time specifying how a program should run. In functional languages the time is spent defining what the program computes, while how to run it is left to the computer. Scala is a functional language that is used in bioinformatics.
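As a rough illustration of the two styles within Python itself, the sketch below computes the squares of the even numbers twice: once with explicit control flow and once by combining functions. The data and variable names are made up for the example.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Procedural style: spell out how to walk the data, step by step
squares_loop = []
for n in numbers:
    if n % 2 == 0:
        squares_loop.append(n * n)

# Functional style: describe what is wanted by combining functions
squares_func = list(map(lambda n: n * n, filter(lambda n: n % 2 == 0, numbers)))

print(squares_loop)  # [4, 16, 36]
print(squares_func)  # [4, 16, 36]
```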
In [54]:
# for loops
for b in [1, 2, 3]:
    print(b)
In [12]:
# while, break and continue
b = 0
while b < 10:
    b += 1
    a = 2
    if b%a == 0:
        #break
        continue
    print(b)
# Now do the same, but using the for loop
In [56]:
## if else: use different logical operators and see if it makes sense
a = 1
if a == 3:
    print('3')
elif a == 4:
    print('4')
else:
    print('something else..')
In [64]:
## error handling - use sparingly!
## python culture: better to apologise than to verify!
def divide(x, y):
    """catches an exception"""
    try:
        result = x / y
    except ZeroDivisionError:
        print("division by zero!")
        #raise ZeroDivisionError
        #pass
    else:
        print("result is", result)
    finally:
        print("executing finally code block..")
divide(1,0)
import xls
"How can you simply import Excel !?!"
Packages are the way code libraries are distributed. Libraries contain one or several modules. Each module can contain object classes, functions and submodules.
It often happens that some Python code you require is not well documented. To understand how to use it, you can interrogate any object at runtime. Additionally, the code is always located somewhere on your computer.
In [13]:
import math
print(dir())
print(dir(math))
help(math.log) #help() prints the documentation itself and returns None
a = 3
print(type(a))
import numpy
print(numpy.__version__)
import os
print(os.getcwd())
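To find where a module's code lives on disk, or to read the source of a function, the standard inspect module can help. This sketch looks at the standard json module, which is written in Python. For modules compiled from C (math is one on many installations) no Python source is available, so these calls may fail there.

```python
import inspect
import json

# Path of the file that defines the json package
source_path = inspect.getsourcefile(json)
print(source_path)

# First lines of the source code of a function from that package
source_text = inspect.getsource(json.dumps)
print("\n".join(source_text.splitlines()[:5]))
```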
Task:
d(p1, p2) = sqrt((x1-x2)**2 + (y1-y2)**2), where pi = (xi, yi)
In [32]:
"""
%run full(relative)path/distance.py
or
os.chdir(path)
"""
import distance
print(distance.euclidian(1, 2, 4.5 , 6))
from distance import euclidian
print(euclidian(1, 2, 4.5 , 6))
import distance as d
print(d.euclidian(1, 2, 4.5 , 6))
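The cell above assumes that a file named distance.py sits in the working directory or elsewhere on the module search path. Its contents are not shown in this notebook. A minimal sketch of what it could contain, following the formula from the task and assuming the argument order euclidian(x1, y1, x2, y2), is:

```python
# distance.py (hypothetical contents, reconstructed from the calls above)
from math import sqrt

def euclidian(x1, y1, x2, y2):
    """Return the Euclidean distance between points (x1, y1) and (x2, y2)."""
    return sqrt((x1 - x2)**2 + (y1 - y2)**2)

if __name__ == "__main__":
    # Only runs when the file is executed directly, not when it is imported
    print(euclidian(1, 2, 4.5, 6))
```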
In [17]:
import sys
print(sys.path)
sys.path.append('/my/custom/path')
print(sys.path)
In [33]:
#String declarations
statement = "Gene IDs are great. My favorite gene ID is"
name = "At5G001024"
statement = statement + " " + name
print(statement)
statement2 = 'Genes names \n \'are great. My favorite gene name is ' + 'Afldtjahd'
statement3 = """
Gene IDs are great.
My favorite genes are {} and {}.""".format(name, 'ksdyfngusy')
print(statement2)
print(statement3)
print('.\n'.join(statement.split(". ")))
print (statement.split(". "))
In [84]:
#String methods
name = "At5G001024"
print(name.lower())
print(name.index('G00'))
print(name.rstrip('402'))
print(name.strip('Add34'))
In [85]:
#Splits, joins
statement = "Gene IDs are great. My favorite gene ID is At5G001024"
words = statement.split()
print("Splitting a string:", words)
print("Joining into a string:", "\t ".join(words))
import random
random.shuffle(words)
print("Fun:", " ".join(words))
In [86]:
#Strings are immutable sequences of characters: they can be indexed and sliced like lists!
print(statement)
print(statement[0:5] + " blabla " + statement[-10:-5])
In [19]:
#a tuple is an immutable list
a = (1, "spam", 5)
#a.append("eggs")
print(a[1])
b = (1, "one")
c = (a, b, 3)
print(c)
#unpacking a collection into positional arguments
def add(a, b): #named add so that the built-in sum() is not shadowed
    return a + b
values = (5, 2)
s = add(*values)
print(s)
In [88]:
a = [1,"one",(2,"two")]
print(a[0])
print(a)
a.append(3)
print(a)
b = a + a[:2]
print(b)
In [89]:
## slicing and indexing
print(b[2:5])
del a[-1]
print(a)
print(a.index("one"))
print(len(a))
In [92]:
## in-place changes to a list argument are visible outside the function; rebinding the name is not (list is mutable)
def f(a, b):
    a[1] = "changed"
    b = [1,2]
    return
a = [(2, 'two'), 3, 1]
b = [2, "two"]
f(a, b)
print(a, b)
In [36]:
## matrix
matrix = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12]]
print(matrix)
print(matrix[0][1])
print(list(range(2,10,3)))
for x in range(len(matrix)):
    for y in range(len(matrix[x])):
        print(x,y, matrix[x][y])
In [94]:
## ranges
r = range(0, 5)
for i in r: print("step", i)
In [3]:
## list comprehensions
def f(i):
    return 2*i
a = [2*i for i in range(10)]
a = [f(i) for i in range(10)]
print(a)
b = [str(e) for e in a[4:] if e%3==0]
print(b)
In [2]:
## sorting a list of tuples
a = [(str(i), str(j)) for i in a for j in range(3)]
print(a)
a.sort(key=lambda tup: tup[1])
a.sort(key=lambda tup: len(tup[1]), reverse = True)
print(a)
In [40]:
#zipping and enumerating
y = zip('abc', 'def')
print(list(y)) # y is an iterator
print(list(y)) # second cast to list: the iterator is exhausted, so the content is empty!
print(list(zip(['one', 'two', 'three'], [1, 2, 3])))
x = [1, 2, 3]
y = [4, 5, 6]
zipped = zip(x, y)
#print(type(zipped))
print(zipped)
x2, y2 = zip(*zipped)
print (x == list(x2) and y == list(y2))
print (x2, y2)
alist = ['a1', 'a2', 'a3']
for i, e in enumerate(alist): print (i, e) #this is called a one liner
for i in range(len(alist)):
    print(i, alist[i])
print(list(range(len(alist))))
In [108]:
# mapping
a = [1, 2, 3, 4, 5]
b = [2, 2, 9, 0, 9]
print(list(map(lambda x: max(x), zip(a, b))))
print(list(zip(a, b)))
In [110]:
# deep and shallow copies on mutable objects or collections of mutable objects
lst1 = ['a','b',['ab','ba']]
lst2 = lst1 #not a copy at all: both names refer to the same list
lst2[0]='e'
print(lst1)
lst1 = ['a','b',['ab','ba']]
lst2 = lst1[:] #shallow copy: a new list whose elements are shared with lst1
lst2[0] = 'e'
lst2[2][1] = 'd'
print(lst1)
from copy import deepcopy
lst1 = ['a','b',['ab','ba']]
lst2 = deepcopy(lst1) #this is a deep copy
lst2[2][1] = "d"
lst2[0] = "c"
print(lst2)
print(lst1)
Sets have no order and cannot include identical elements. Use them when the position of elements is not relevant. Finding elements is faster than in a list. Set operations are also more straightforward. A frozen set is immutable and therefore has a hash value, so it can be used as a dictionary key.
In [111]:
# set vs. frozenset
s = set()
#s = frozenset()
s.add(1)
s = s | set([2,"three"])
s |= set([2,"three"])
s.add(2)
s.remove(1)
print(s)
print("three" in s)
In [113]:
s1 = set(range(10))
s2 = set(range(5,15))
s3 = s1 & s2
print(s1, s2, s3)
s3 = s1 - s2
print(s1, s2, s3)
print(s3 <= s1)
s3 = s1 ^ s2
print(s1, s2, s3)
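The remark that a frozen set has a hash value matters in practice: a frozenset can be a dictionary key or a member of another set, while a plain set cannot. A short sketch, with made-up gene IDs and scores:

```python
pair = frozenset(['geneid1', 'geneid2'])

# A frozenset is hashable, so it works as a dictionary key
scores = {pair: 0.8}
print(scores[frozenset(['geneid2', 'geneid1'])])  # 0.8, element order does not matter

# A regular set is mutable and therefore not hashable
try:
    scores[set(['geneid1', 'geneid2'])] = 0.5
    set_as_key_failed = False
except TypeError as err:
    set_as_key_failed = True
    print("TypeError:", err)
```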
In [114]:
d = {'geneid9': 100, 'geneid8': 90, 'geneid7': 80, 'geneid6': 70, 'geneid5': 60, 'geneid4': 50}
d
Out[114]:
In [116]:
d = {}
d['geneid10'] = 110
d
Out[116]:
In [117]:
#Creation: dict(list)
genes = ['geneid1', 'geneid2', 'geneid3']
values = [20, 30, 40]
d = dict(zip(genes, values))
print(d)
In [16]:
#Creation: dictionary comprehensions
d2 = { 'geneid'+str(i):10*(i+1) for i in range(4, 10) }
print(d2)
#Keys and values
print(d2.keys())
print(d2.values())
for k in d2.keys(): print(k, d2[k])
In [122]:
d = {'geneid9': 100, 'geneid8': 90, 'geneid7': 90, 'geneid6': 70, 'geneid5': 60, 'geneid4': 50}
def getkey(value):
    ks = set()
    # .. your code here
    return ks
print(getkey(90))
Everything is an object in Python and every variable is a reference to an object. A reference maps to the address in memory where the object lies, although Python keeps this hidden. C is known for not cleaning up automatically after memory has been allocated for its data structures; forgetting to free it causes memory leaks, which make some programs take up more and more RAM. Modern languages clean up dynamically once an object is no longer referenced, for example after the scope of a variable has ended, something called "garbage collection". However, this affects their speed of computation.
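A small sketch of these ideas, using only the standard library: two names can refer to one object, and the object is reclaimed once nothing refers to it. The class name Sample is made up. The weakref module lets us watch the object without keeping it alive, and the explicit gc.collect() call is there because the exact moment of cleanup depends on the Python implementation.

```python
import gc
import weakref

class Sample(object):
    pass

first = Sample()
second = first                  # a second reference, not a copy
print(first is second)          # True
print(id(first) == id(second))  # True

watcher = weakref.ref(first)    # a weak reference does not keep the object alive
del first
still_alive = watcher() is not None
print(still_alive)              # True, 'second' still refers to the object
del second
gc.collect()
print(watcher() is None)        # True, the object has been reclaimed
```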
New concepts:
In [4]:
class Dog(object):
    def __init__(self, name):
        self.name = name
        return
    def bark_if_called(self, call):
        if call[:-1]==self.name:
            print("Woof Woof!")
        else:
            print("*sniffs..")
        return
    def get_ball(self):
        print(self.name + " brings back ball")
d = Dog("Buffy")
print(d.name, "was created from Ether!") #name is an attribute
d.bark_if_called("Bambi!") #bark_if_called is a method
#d.bark_if_called("Buffy!")
In [135]:
class PitBull(Dog):
    def get_ball(self):
        super(PitBull, self).get_ball()
        print("*hates you")
        return
    def chew_boots(self):
        print("*drools")
        return
d2 = PitBull("Georgie")
d2.bark_if_called("Loopie!")
d2.bark_if_called("Georgie!")
d2.chew_boots()
#d.chew_boots()
d2.get_ball()
print(d2.name)
In [17]:
from time import sleep
def sleep_decorator(function):
    """
    Limits how fast the function is
    called.
    """
    def wrapper(*args, **kwargs):
        sleep(2)
        return function(*args, **kwargs)
    return wrapper
@sleep_decorator
def print_number(num):
    return num
print(print_number(222))
for num in range(1, 6):
    print(print_number(num))
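One side effect of the decorator above is that the decorated function takes on the name and docstring of wrapper. The standard functools.wraps helper copies them over from the original function. The sketch below repeats the pattern with a much shorter delay; the names delay_decorator and echo_number are made up for this illustration.

```python
from functools import wraps
from time import sleep

def delay_decorator(function):
    """Waits briefly before every call to the decorated function."""
    @wraps(function)  # keep the name and docstring of the original function
    def wrapper(*args, **kwargs):
        sleep(0.01)
        return function(*args, **kwargs)
    return wrapper

@delay_decorator
def echo_number(num):
    """Returns its argument unchanged."""
    return num

print(echo_number(222))      # 222
print(echo_number.__name__)  # echo_number
```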
https://docs.python.org/3/library/
Recommendation: Take time to explore the Python Module of the Week. It is a very good way to learn why Python comes "with batteries included".
In [137]:
import sys
print(sys.argv)
sys.exit()
##getopt, sys.exit()
##getopt.getopt(args, options[, long_options])
# import getopt
# try:
#     opts, args = getopt.getopt(sys.argv[1:],"hi:o:",["ifile=","ofile="])
# except getopt.GetoptError:
#     print('test.py -i <inputfile> -o <outputfile>')
#     sys.exit(2)
# for opt, arg in opts:
#     if opt == '-h':
#         print('test.py -i <inputfile> -o <outputfile>')
#         sys.exit()
#     elif opt in ("-i", "--ifile"):
#         inputfile = arg
#     elif opt in ("-o", "--ofile"):
#         outputfile = arg
# print(inputfile, outputfile)
In [139]:
import os
print(os.getcwd())
#os.chdir(newpath)
os.system('mkdir testdir')
f = open('testfile.txt','wt')
f.write('One line of text\n')
f.write('Another line of text\n')
f.close()
import shutil
#shutil.copy('testfile.txt', 'testdir/')
shutil.copyfile('testfile.txt', 'testdir/testfile1.txt')
shutil.copyfile('testfile.txt', 'testdir/testfile2.txt')
with open('testdir/testfile1.txt','rt') as f:
for l in f: print(l)
for fn in os.listdir("testdir/"):
    print(fn)
    #fpath = os.path.join(dirpath,filename)
    os.rename('testdir/'+fn, 'testdir/file'+fn[-5]+'.txt')
import glob
print (glob.glob('testdir/*'))
os.remove('testdir/file2.txt')
#os.rmdir('testdir')
#shutil.rmtree(path)
In [43]:
from datetime import datetime
startTime = datetime.now()
n = 10**8
for i in range(n):
    continue
print(datetime.now() - startTime)
In [46]:
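import os
import subprocess
#print(os.system('/path/yourshellscript.sh args'))
subprocess.run(["ls", "-l", "/dev/null"], stdout=subprocess.PIPE)
try:
    # check=True raises CalledProcessError when the command exits with a non-zero code
    subprocess.run("exit 1", shell=True, check=True)
except subprocess.CalledProcessError as err:
    print("command failed with exit code", err.returncode)
from subprocess import call
call(["ls", "-l"])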
In [ ]:
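from subprocess import Popen, PIPE
args = ['/path/yourshellscript.sh', '-arg1', 'value1', '-arg2', 'value2']
# shell=True is dropped: on POSIX it would run only the first list item as the command
# bufsize=-1 is the default (system buffer size)
p = Popen(args, bufsize=-1,
          stdin=PIPE, stdout=PIPE, stderr=PIPE, close_fds=True)
p.wait()
(child_stdin, child_stdout, child_stderr) = (p.stdin, p.stdout, p.stderr)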
In [ ]:
# def child():
#     print('A new child ', os.getpid())
#     os._exit(0)
# def parent():
#     while True:
#         newpid = os.fork()
#         if newpid == 0:
#             child()
#         else:
#             pids = (os.getpid(), newpid)
#             print("parent: %d, child: %d" % pids)
#         if input() == 'q': break
# parent()
How to do the equivalent of shell piping in Python? This is the basic step of an automated pipeline.
cat test.txt | grep something
Task:
p1.stdout.close(). Why is it not working?
In [ ]:
from subprocess import Popen, PIPE
p1 = Popen(["cat", "test.txt"], stdout=PIPE)
p2 = Popen(["grep", "something"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()
output = p2.communicate()[0]
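For a version that runs without a test.txt file or the cat and grep programs, the sketch below builds the same two-stage pipeline from two small Python child processes. The inline scripts producer and matcher are stand-ins invented for this example; the Popen wiring is the same as above.

```python
import sys
from subprocess import Popen, PIPE

# Stand-in for "cat test.txt": writes three lines to stdout
producer = "print('first line'); print('something here'); print('last line')"
# Stand-in for "grep something": keeps the lines that contain the word
matcher = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if 'something' in line:\n"
    "        sys.stdout.write(line)\n"
)

p1 = Popen([sys.executable, "-c", producer], stdout=PIPE)
p2 = Popen([sys.executable, "-c", matcher], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()  # the parent drops its copy of the pipe; p2 keeps reading from it
output = p2.communicate()[0]
p1.wait()
print(output.decode())
```

Note that output is a bytes object, which is why it is decoded before printing.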
Questions:
Task. Explain why this happens:
In [17]:
def run(l=[]):
    l.append(len(l))
    return l
print(run())
print(run())
print(run())
Task.
dic = {'a':[1,2,3], 'b':[4,5,6,7]}
Using list comprehension, return:
[1, 2, 3, 4, 5, 6, 7] ['a', 'a', 'a', 'b', 'b', 'b', 'b']