Materials by: John Blischak, Anthony Scopatz, and other Software Carpentry instructors (Joshua R. Smith, Milad Fatenejad, Katy Huff, Tommy Guy and many more)
Computers are very useful for doing the same operation over and over. When you know you will be performing the same operation many times, it is best to encapsulate this similar code into a function or a method. Programming functions are related to mathematical functions, e.g. $f(x)$, and it is helpful to think of them as abstract operators that produce output given some input. Let's look at some examples to solidify this concept.
The base distribution comes with many useful functions. When a function is a property of, or owned by, a specific type (ints, floats, lists, strings, dictionaries, etc.) it is called a method. First let's look at some basic string methods since they are very useful for reading data into Python.
In [ ]:
# Find the start codon of a gene
dna = 'CTGTTGACATGCATTCACGCTACGCTAGCT'
dna.find('ATG')
In [ ]:
# parsing a line from a comma-delimited file
lotto_numbers = '4,8,15,16,23,42\n'
lotto_numbers = lotto_numbers.strip()
print lotto_numbers.split(',')
In [ ]:
question = '%H%ow%z%d%@d%z%th%ez$%@p%ste%rzb%ur%nz%$%@szt%on%gue%?%'
question = question.replace('%', '')
question = question.replace('@', 'i')
question = question.replace('$', 'h')
question = question.replace('z', ' ')
print question
In [ ]:
answer = '=H=&!dr=a=nk!c=~ff&&!be=f~r&=!i=t!w=as!c=~~l.='
print answer.replace('=', '').replace('&', 'e').replace('~', 'o').replace('!', ' ')
Because the binding strength of guanine (G) to cytosine (C) is different from the binding strength of adenine (A) to thymine (T) (and many other differences), it is often useful to know the fraction of a DNA sequence that is G's or C's. Go to the string method section of the Python documentation and find the string method that will allow you to calculate this fraction.
In [ ]:
# Calculate the fraction of G's and C's in this DNA sequence
seq1 = 'ACGTACGTAGCTAGTAGCTACGTAGCTACGTA'
gc =
Check your work:
In [ ]:
round(gc, ndigits = 2) == .47
In [ ]:
def do_nothing():
    s = "I don't do much"
However, this often isn't very useful since we haven't returned any values from this function. Note that if you don't return anything from a function in Python, you have implicitly returned the special None singleton. To return values that you computed locally in the function body, use the return keyword.
def <function name>():
    <function body>
    return <local variable 1>
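As a minimal sketch of the difference, the two functions below compute the same value, but only one hands it back to the caller. The names `no_return` and `with_return` are illustrative and not part of the lesson, and the code is written so that it should run under either Python 2 or Python 3.

```python
def no_return(x):
    # The result is computed but never returned
    doubled = 2 * x

def with_return(x):
    doubled = 2 * x
    return doubled

nothing = no_return(21)      # the implicit return value is None
something = with_return(21)  # the computed value comes back to the caller

print(nothing is None)
print(something)
```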
Functions may be defined to take parameters or arguments.
def <function name>(<argument>):
    <function body>
    return <local variable 1>
The function name, arguments, and return are jointly known as the function signature since they uniquely define the function's interface.
In [ ]:
def square(x):
    sqr = x * x
    return sqr
To use a function after you have defined it, place parentheses () after its name. This is known as calling the function. If the function requires arguments, their values go inside the parentheses.
In [ ]:
square(2)
Like mathematical functions, you can compose a function with other functions or with itself!
In [ ]:
square(square(2))
Functions may be defined such that they have multiple arguments or multiple return values:
def <function name>(<arg1>, <arg2>, ...):
    <function body>
    return <var1>, <var2>, ...
In [ ]:
def hello(time, name):
    print 'Good ' + time + ', ' + name + '!'
In [ ]:
hello('afternoon', 'Software Carpentry')
In [ ]:
# return both the quotient and remainder
def quorem(a, b):
    quo = a // b  # floor division; for integers this matches Python 2's a / b
    rem = a % b
    return quo, rem
In [ ]:
quorem(42, 16)
Note that when you return multiple values you may unpack these into individual variables or into a single variable which is a tuple of both values:
In [ ]:
q, r = quorem(42, 16)
print q
print r
both = quorem(42, 16)
print both
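As a quick cross-check of the unpacking above, Python's built-in `divmod` returns the same quotient and remainder pair for integers. The sketch below redefines `quorem` with floor division so that it is self-contained and behaves the same under Python 2 and Python 3; the variable `same_as_builtin` is an illustrative name.

```python
def quorem(a, b):
    # Floor division keeps the quotient an integer in both Python 2 and 3
    return a // b, a % b

q, r = quorem(42, 16)    # unpack into two variables
both = quorem(42, 16)    # or keep the tuple as a single variable

# The built-in divmod gives the same pair for integer arguments
same_as_builtin = (both == divmod(42, 16))
print(same_as_builtin)
```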
In Python, functions also support default values for arguments. Arguments with an associated default are called keyword arguments. If this function is then called without one of these arguments being present the default value is used. All keyword arguments must come after normal arguments in the function definition:
def <function name>(<arg1>, <arg2>, <arg3>=<arg3 default>, <arg4>=<arg4 default>, ...):
    <function body>
    return <rtn>
In [ ]:
def add_space(s, t="Mom"):
    return s + " " + t
In [ ]:
print add_space("Hello")
print add_space("Morning", "Dad")
You can also call any function with its arguments, regular and keyword, named explicitly in the call. This uses equal signs in the same way that keyword arguments are defined.
In [ ]:
print add_space(s="Hello")
print add_space(s="Morning", t="Dad")
If you have many keyword arguments, then they may be out of order in the function call as long as they are explicit.
In [ ]:
def f(x=1, y=2, z=3):
    return 2*x**3 + 42*y - z
In [ ]:
f(y=17, z=15, x=2)
Warning: be careful with mutable containers as default values. The container will remember its value from previous function calls.
In [ ]:
def add_to_list(val, seq=[]):
    seq.append(val)
    return seq
In [ ]:
add_to_list(42)
In [ ]:
add_to_list(16)
In [ ]:
add_to_list(65)
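A common way to avoid this surprise is to use `None` as the default and create the list inside the function body. The following is a sketch of that idiom, not part of the original lesson; the name `add_to_fresh_list` is hypothetical.

```python
def add_to_fresh_list(val, seq=None):
    # None is only a placeholder; a new list is created on each call that omits seq
    if seq is None:
        seq = []
    seq.append(val)
    return seq

first = add_to_fresh_list(42)
second = add_to_fresh_list(16)   # does not remember the 42 from the first call

print(first)
print(second)
```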
In [ ]:
def myadd(x, y):
    return x + y
In [ ]:
addme = (42, 65)
myadd(addme[0], addme[1])
However, this becomes very tedious very quickly. To solve this, Python allows you to call a function with the original list or tuple prefixed by an asterisk *. This expands, or unpacks, its values into the function call.
In [ ]:
myadd(*addme)
In [ ]:
myadd(*[14, 18])
Not just lists, but any sequence may be expanded in this way. While this works for normal arguments, keyword arguments are more similar to dictionaries. Therefore any mapping can also be expanded into keyword arguments. To distinguish keyword argument unpacking from unpacking normal arguments, you must use a double asterisk ** to unpack a dictionary.
In [ ]:
def f(x=1, y=2, z=3):
    return 2*x**3 + 42*y - z
In [ ]:
v = {'x': 1, 'y': 2, 'z': 3}
f(**v)
This allows you to build up a dictionary of values before calling a function with it.
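For instance, the dictionary can be filled in one entry at a time and only expanded at the moment of the call. This is a small sketch that reuses the function `f` from above; the names `settings` and `result` are illustrative.

```python
def f(x=1, y=2, z=3):
    return 2*x**3 + 42*y - z

# Build up the keyword arguments step by step
settings = {}
settings['x'] = 2
settings['y'] = 17
settings['z'] = 15

# Expand the dictionary into keyword arguments at call time
result = f(**settings)
print(result)
```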
In addition to calling functions with lists or tuples and dicts, you may also define functions such that any extra arguments are put into a tuple and any extra keyword arguments are placed into a dictionary. These special catch-all containers must come after all other arguments and all other keyword arguments. This is known as argument packing and uses the single and double asterisk as above, but this time in the function definition:
def <function name>(<arg1>, <arg2>=<arg2 default>, *<other args>, **<other keyword args>):
    <function body>
    return <rtn>
In [ ]:
def what(x, y, *args):
    print "x = ", x
    print "y = ", y
    print "extra argument = ", args
In [ ]:
what(1, 10, 178.0, "hello", [42, 42])
In [ ]:
def classroom(teacher="Anthony", **kwargs):
    print "Teacher's name is ", teacher
    print "Teacher has a lot to deal with:", kwargs
In [ ]:
classroom(lesson="functions", language="Python")
In [ ]:
def sweet_nothings(x, y, *args, **kwargs):
    print "Not doing anything with x, y, z, f"
    print args
    print kwargs
In [ ]:
sweet_nothings(0, 0.0, 42, "Nothing", zero="zero", z="", f=True, empty="de nada")
So the most general function signature that you can define is:
In [ ]:
def func(*args, **kwargs):
    print args
    print kwargs
In [ ]:
func() # try me out!
In [ ]:
def count_backwards(x):
    print x
    count_backwards(x-1)
In [ ]:
count_backwards(10)
Well, that was too much. Thus it is important to ensure that recursive functions have some case which does not call itself. This will terminate the recursion.
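What stops a runaway recursion is Python's recursion limit: once the call stack gets too deep, an exception is raised. The sketch below shows this with a hypothetical function `count_forever`, which omits the printing so that the output stays short. It catches `RuntimeError`, which covers both Python 2 and Python 3 (where the more specific `RecursionError` is a subclass of `RuntimeError`).

```python
import sys

def count_forever(x):
    # No base case: every call makes another call
    return count_forever(x - 1)

try:
    count_forever(10)
    stopped_by_python = False
except RuntimeError:
    # Python 3 raises RecursionError, a subclass of RuntimeError
    stopped_by_python = True

limit = sys.getrecursionlimit()  # maximum depth of the call stack
print(stopped_by_python)
```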
In [ ]:
def count_backwards(x):
    print x
    if 0 < x:
        count_backwards(x-1)
In [ ]:
count_backwards(10)
One of the most famous recursive sequences is the Fibonacci sequence. This can be defined as a single recursive function.
In [ ]:
def fib(n):
    if n == 0 or n == 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)
In [ ]:
fib(10)
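To see the sequence itself rather than a single value, the function can be called in a loop. This sketch repeats the definition of `fib` so that it is self-contained; the list name `first_ten` is illustrative.

```python
def fib(n):
    # Base cases: fib(0) is 0 and fib(1) is 1
    if n == 0 or n == 1:
        return n
    else:
        return fib(n - 1) + fib(n - 2)

# Collect the first ten Fibonacci numbers, fib(0) through fib(9)
first_ten = []
for n in range(10):
    first_ten.append(fib(n))

print(first_ten)
```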
In Python, functions are first class objects. This means that anything any other variable can do, a function can also do. This is because they are normal variables in the language.
In [ ]:
def myfunc(x):
    return x
In [ ]:
print myfunc
In [ ]:
yourfunc = myfunc
yourfunc("hello")
In [ ]:
dir(myfunc)
This allows functions to be passed to other functions as arguments:
In [ ]:
def just_one():
    return 1

def add_one(f):
    return f() + 1

add_one(just_one)
Functions may also then define other functions and return them:
In [ ]:
def outer():
    def inner(*args, **kwargs):
        return True
    return inner
In [ ]:
outer()
When you define a function which takes a function and (usually) returns a function, this is called a decorator. Decorators are used to wrap other functions.
In [ ]:
def loud(f):
    def newfunc(*args, **kwargs):
        print "calling with:", args, kwargs
        rtn = f(*args, **kwargs)
        print "got", rtn
        return rtn
    return newfunc
In [ ]:
def myadd(x, y):
    return x + y

loudadd = loud(myadd)
v = loudadd(2, 4)
Most of the time, the wrapped function is assigned back to the original name so that the name is preserved.
In [ ]:
myadd = loud(myadd)
myadd(42, 65)
Python also has a shortcut for using decorators which preserve the same name. The 'at' symbol @ followed by the decorator name is placed on the line above the function definition.
In [ ]:
@loud
def mysub(x, y):
    return x - y
In [ ]:
v = mysub(42, 65)
For more advanced users, the standard library functools module has some powerful utilities for working with functions. These include the functools.wraps() function, which copies the wrapped function's name and docstring onto the wrapper.
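As a sketch of how `functools.wraps()` might be applied to the `loud` decorator from above: without it, the decorated function would report its name as `newfunc`. The docstring on `mysub` is added here only for illustration, and the print calls use string formatting so that the code should run under either Python 2 or Python 3.

```python
import functools

def loud(f):
    @functools.wraps(f)  # copy f's name and docstring onto newfunc
    def newfunc(*args, **kwargs):
        print("calling with: %s %s" % (args, kwargs))
        rtn = f(*args, **kwargs)
        print("got %s" % (rtn,))
        return rtn
    return newfunc

@loud
def mysub(x, y):
    """Subtract y from x."""
    return x - y

v = mysub(42, 65)
print(mysub.__name__)  # the original name survives the wrapping
```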
Lambdas are small, single expression functions that are anonymous (they have no name). They come from functional programming languages and the Lambda Calculus. Since they are so small they may be written on a single line.
lambda <args>: <expr>
In [ ]:
lambda x: x + 1
Note that just because they are implicitly anonymous, doesn't mean that you can't name them.
In [ ]:
f = lambda x, y: x + y + 1
This is much more useful than it might seem at first glance.
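One place where lambdas often show up is as a short, throwaway function passed to another function. The sketch below uses the `key` argument of the built-in `sorted` to order some made-up sequences first by length and then by their last letter; the list `sequences` is illustrative data, not from the lesson.

```python
f = lambda x, y: x + y + 1
total = f(1, 2)

sequences = ['ATGCGC', 'AT', 'GGCA']

# key is called on each item; the items are sorted by what it returns
by_length = sorted(sequences, key=lambda s: len(s))
by_last_letter = sorted(sequences, key=lambda s: s[-1])

print(by_length)
print(by_last_letter)
```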
In [ ]:
# Calculates the GC content of DNA sequence x.
# x: a string composed only of A's, T's, G's, and C's.
def calculate_gc(x):
Check your work:
In [ ]:
print round(calculate_gc('ATGC'), ndigits = 2) == 0.50
print round(calculate_gc('AGCGTCGTCAGTCGT'), ndigits = 2) == 0.60
print round(calculate_gc('ATaGtTCaAGcTCgATtGaATaGgTAaCt'), ndigits = 2) == 0.34
Python has a lot of useful data types and functions built into the language, some of which you have already seen. For a full list, you can type dir(__builtins__). However, there are even more functions stored in modules. An example is the sine function, which is stored in the math module. In order to access mathematical functions, like sin, we need to import the math module. Let's take a look at a simple example:
In [ ]:
print sin(3) # Error! Python doesn't know what sin is...yet
In [ ]:
import math # Import the math module
math.sin(3)
In [ ]:
dir(math) # See a list of everything in the math module
In [ ]:
help(math) # Get help information for the math module
It is not very difficult to use modules - you just have to know the module name and import it. There are a few variations on the import statement that can be used to make your life easier. Let's take a look at an example:
In [ ]:
from math import * # import everything from math into the global namespace (A BAD IDEA IN GENERAL)
print sin(3) # notice that we don't need to type math.sin anymore
print tan(3) # the tangent function was also in math, so we can use that too
In [ ]:
reset # Clear everything from IPython
In [ ]:
from math import sin # Import just sin from the math module. This is a good idea.
print sin(3) # We can use sin because we just imported it
print tan(3) # Error: We only imported sin - not tan
In [ ]:
reset # Clear everything
In [ ]:
import math as m # Same as import math, except we are renaming the module m
print m.sin(3) # This is really handy if you have module names that are long
Python comes with a huge number of modules available as part of the standard library (batteries included). It has a gargantuan number of third party modules as well. This ecosystem is what makes scientific software development in Python great!
Now that you can write your own functions, you too will experience the dilemma of deciding whether to spend the extra time to make your code more general, and therefore more easily reused in the future.
For this exercise we will return to the cochlear implant data first introduced in the section on the shell. In order to analyze the data, we need to import the data into Python. Furthermore, since this is something that would have to be done many times, we will write a function to do this. As before, beginners should aim to complete Part 1 and more advanced participants should try to complete Part 2 and Part 3 as well.
In [ ]:
def view_cochlear(filename):
    """Write your docstring here.
    """
Test it out:
In [ ]:
view_cochlear('/home/swc/boot-camps/shell/data/alexander/data_216.DATA')
In [ ]:
view_cochlear('/home/swc/boot-camps/shell/data/Lawrence/Data0525')
In [ ]:
def view_cochlear(filename):
    """Write your docstring here.
    """
Test it out:
In [ ]:
view_cochlear('/home/swc/boot-camps/shell/data/alexander/data_216.DATA')
In [ ]:
view_cochlear('/home/swc/boot-camps/shell/data/Lawrence/Data0525')
In [ ]:
def save_cochlear(filename):
    """Write your docstring here.
    """
Check your work:
In [ ]:
data_216 = save_cochlear("/home/swc/boot-camps/shell/data/alexander/data_216.DATA")
print data_216["Subject"]
In [ ]:
Data0525 = save_cochlear("/home/swc/boot-camps/shell/data/Lawrence/Data0525")
print Data0525["CI type"]
During transcription, an enzyme called RNA Polymerase reads the DNA sequence and creates a complementary RNA sequence. Furthermore, RNA has the nucleotide uracil (U) instead of thymine (T).
Write a function that mimics transcription. The input argument is a string that contains the letters A, T, G, and C. Create a new string following these rules:
Convert A to U
Convert T to A
Convert G to C
Convert C to G
Hint: You can iterate through a string using a for loop similarly to how you loop through a list.
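As an illustration of the hint (not a solution to the exercise), the sketch below loops over a made-up sequence one character at a time and records where each 'A' occurs. The names `seq`, `positions_of_a`, and `index` are illustrative.

```python
seq = 'GATTACA'

positions_of_a = []
index = 0
for letter in seq:
    # letter takes the value of each character in turn
    if letter == 'A':
        positions_of_a.append(index)
    index = index + 1

print(positions_of_a)
```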
In [ ]:
def transcribe(seq):
    """Write your docstring here.
    """
Check your work:
In [ ]:
transcribe('ATGC') == 'UACG'
In [ ]:
transcribe('ATGCAGTCAGTGCAGTCAGT') == 'UACGUCAGUCACGUCAGUCA'