Materials by: John Blischak, Anthony Scopatz, and other Software Carpentry instructors (Joshua R. Smith, Milad Fatenejad, Katy Huff, Tommy Guy and many more)
Computers are very useful for doing the same operation over and over. When you know you will be performing the same operation many times, it is best to encapsulate that code in a function or a method. Programming functions are related to mathematical functions, e.g. $f(x)$, and it is helpful to think of them as abstract operators that produce output given some input. Let's look at some examples to solidify this concept.
In [ ]:
# Find the start codon of a gene
dna = 'CTGTTGACATGCATTCACGCTACGCTAGCT'
dna.find('ATG')
In [ ]:
# parsing a line from a comma-delimited file
lotto_numbers = '4,8,15,16,23,42\n'
lotto_numbers = lotto_numbers.strip()
print lotto_numbers.split(',')
In [ ]:
question = '%H%ow%z%d%@d%z%th%ez$%@p%ste%rzb%ur%nz%$%@szt%on%gue%?%'
question = question.replace('%', '')
question = question.replace('@', 'i')
question = question.replace('$', 'h')
question = question.replace('z', ' ')
print question
In [ ]:
answer = '=H=&!dr=a=nk!c=~ff&&!be=f~r&=!i=t!w=as!c=~~l.='
print answer.replace('=', '').replace('&', 'e').replace('~', 'o').replace('!', ' ')
Because guanine (G) binds to cytosine (C) with a different strength than adenine (A) binds to thymine (T) (among many other differences), it is often useful to know the fraction of a DNA sequence that is G's or C's. Go to the string method section of the Python documentation and find the string method that will allow you to calculate this fraction.
In [ ]:
# Calculate the fraction of G's and C's in this DNA sequence
seq1 = 'ACGTACGTAGCTAGTAGCTACGTAGCTACGTA'
gc =
Check your work:
In [ ]:
round(gc, ndigits = 2) == .47
In [ ]:
def do_nothing():
    s = "I don't do much"
However, this often isn't very useful since we haven't returned any values from this function. Note that if you don't return anything from a function in Python, it implicitly returns the special None singleton. To return values that you computed locally in the function body, use the return keyword.
def <function name>():
    <function body>
    return <local variable 1>
In [ ]:
def square(x):
    sqr = x * x
    return sqr
Once a function has been defined, you use it by placing parentheses () after its name. This is known as calling the function. If the function requires arguments, their values go inside the parentheses.
In [ ]:
print square(2)
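Calling a function that has no return statement is a quick way to see the implicit None described earlier. The sketch below reuses do_nothing and square from this section; the variable name result is only illustrative, and print is written with parentheses so that the snippet runs under both Python 2 and Python 3.

```python
def do_nothing():
    s = "I don't do much"    # local variable that is never returned

def square(x):
    sqr = x * x
    return sqr               # explicitly hand the value back to the caller

result = do_nothing()        # no return statement, so the call evaluates to None
print(result is None)
print(square(3))
```

The local variable s disappears when do_nothing finishes, whereas the value computed inside square survives because it was returned.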
Like mathematical functions, you can compose a function with other functions or with itself!
In [ ]:
print square(square(2))
Functions may be defined such that they have multiple arguments or multiple return values:
def <function name>(<arg1>, <arg2>, ...):
    <function body>
    return <var1>, <var2>, ...
In [ ]:
def hello(time, name):
    """Print a nice message. Time and name should both be strings.
    Example: hello('morning', 'Software Carpentry')
    """
    print 'Good ' + time + ', ' + name + '!'
In [ ]:
hello('afternoon', 'Software Carpentry')
The description right below the function name is called a docstring. For best practices on composing docstrings, read PEP 257 -- Docstring Conventions.
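The hello function takes multiple arguments but returns nothing, so here is a minimal sketch of multiple return values. The function min_max is a hypothetical helper invented for this illustration; it is not part of the lesson materials. Python packs the comma-separated return values into a tuple, which the caller can unpack into separate names.

```python
def min_max(values):
    """Return the smallest and largest items of a non-empty list."""
    smallest = min(values)
    largest = max(values)
    return smallest, largest     # the two values are packed into one tuple

lotto = [4, 8, 15, 16, 23, 42]
low, high = min_max(lotto)       # unpack the tuple into two variables
print(low)
print(high)
print(min_max.__doc__)           # the docstring is stored on the function itself
```

The last line shows that a docstring is more than a comment: it is kept with the function, which is how help() can display it.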
In [ ]:
def calculate_gc(x):
    """Calculates the GC content of DNA sequence x.
    x: a string composed only of A's, T's, G's, and C's."""
Check your work:
In [ ]:
print round(calculate_gc('ATGC'), ndigits = 2) == 0.50
print round(calculate_gc('AGCGTCGTCAGTCGT'), ndigits = 2) == 0.60
print round(calculate_gc('ATaGtTCaAGcTCgATtGaATaGgTAaCt'), ndigits = 2) == 0.34
Python has a lot of useful data types and functions built into the language, some of which you have already seen. For a full list, you can type dir(__builtins__). However, there are even more functions stored in modules. An example is the sine function, which is stored in the math module. In order to access mathematical functions, like sin, we need to import the math module. Let's take a look at a simple example:
In [ ]:
print sin(3) # Error! Python doesn't know what sin is...yet
In [ ]:
import math # Import the math module
math.sin(3)
In [ ]:
dir(math) # See a list of everything in the math module
In [ ]:
help(math) # Get help information for the math module
It is not very difficult to use modules: you just have to know the module name and import it. There are a few variations on the import statement that can be used to make your life easier. Let's take a look at an example:
In [ ]:
from math import * # import everything from math into the global namespace (A BAD IDEA IN GENERAL)
print sin(3) # notice that we don't need to type math.sin anymore
print tan(3) # the tangent function was also in math, so we can use that too
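One reason importing everything is a bad idea is that imported names can silently replace names you were already using. The sketch below assumes a fresh session (for example, right after reset) and imports only pow explicitly; `from math import *` would have the same effect on the name pow, among many others. The variable shadowed is only illustrative.

```python
print(pow(2, 3, 5))        # built-in pow accepts a third argument: (2 ** 3) % 5

from math import pow       # the name pow now refers to math.pow

print(pow(2, 3))           # math.pow always returns a float

try:
    pow(2, 3, 5)           # math.pow takes exactly two arguments
    shadowed = False
except TypeError:
    shadowed = True        # the built-in behaviour is no longer reachable as pow

print(shadowed)
```

Importing only the names you need, or keeping the module prefix as in math.pow, makes this kind of replacement visible in the code.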
In [ ]:
reset # Clear everything from IPython
In [ ]:
from math import sin # Import just sin from the math module. This is a good idea.
print sin(3) # We can use sin because we just imported it
print tan(3) # Error: We only imported sin - not tan
In [ ]:
reset # Clear everything
In [ ]:
import math as m # Same as import math, except we are renaming the module m
print m.sin(3) # This is really handy if you have module names that are long
Now that you can write your own functions, you too will experience the dilemma of deciding whether to spend the extra time to make your code more general, and therefore more easily reused in the future.
For this exercise we will return to the cochlear implant data first introduced in the section on the shell. In order to analyze the data, we need to import the data into Python. Furthermore, since this is something that would have to be done many times, we will write a function to do this. As before, beginners should aim to complete Part 1 and more advanced participants should try to complete Part 2 and Part 3 as well.
In [ ]:
def view_cochlear(filename):
    """Write your docstring here.
    """
Test it out:
In [ ]:
view_cochlear('/home/swc/boot-camps/shell/data/alexander/data_216.DATA')
In [ ]:
view_cochlear('/home/swc/boot-camps/shell/data/Lawrence/Data0525')
In [ ]:
def view_cochlear(filename):
    """Write your docstring here.
    """
Test it out:
In [ ]:
view_cochlear('/home/swc/boot-camps/shell/data/alexander/data_216.DATA')
In [ ]:
view_cochlear('/home/swc/boot-camps/shell/data/Lawrence/Data0525')
In [ ]:
def save_cochlear(filename):
    """Write your docstring here.
    """
Check your work:
In [ ]:
data_216 = save_cochlear("/home/swc/boot-camps/shell/data/alexander/data_216.DATA")
print data_216["Subject"]
In [ ]:
Data0525 = save_cochlear("/home/swc/boot-camps/shell/data/Lawrence/Data0525")
print Data0525["CI type"]
During transcription, an enzyme called RNA Polymerase reads the DNA sequence and creates a complementary RNA sequence. Furthermore, RNA has the nucleotide uracil (U) instead of thymine (T).
Write a function that mimics transcription. The input argument is a string that contains the letters A, T, G, and C. Create a new string following these rules:
Convert A to U
Convert T to A
Convert G to C
Convert C to G
Hint: You can iterate through a string using a for loop similarly to how you loop through a list.
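As an illustration of the hint, the sketch below loops over a string one character at a time and builds a new string as it goes. It deliberately solves a different task (reversing the sequence) so the exercise is left open; the name reversed_dna is only illustrative.

```python
dna = 'CTGTTGACATGCATTCACGCTACGCTAGCT'

reversed_dna = ''                       # start with an empty string
for base in dna:                        # base takes each character in turn
    reversed_dna = base + reversed_dna  # put the new character in front

print(reversed_dna)
```

The same pattern of starting from an empty string and adding to it inside the loop carries over to transcription, where each character is first converted according to the rules above.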
In [ ]:
def transcribe(seq):
    """Write your docstring here.
    """
Check your work:
In [ ]:
transcribe('ATGC') == 'UACG'
In [ ]:
transcribe('ATGCAGTCAGTGCAGTCAGT') == 'UACGUCAGUCACGUCAGUCA'