In the previous lesson, we learned how to store information in variables and how to store instructions in functions for later use. In this lesson, we'll learn the basic tools for making our programs smarter: loops, which allow our programs to repeat themselves many times, and conditions, which allow our programs to make simple decisions for themselves. First, though, we'll start by learning about a new type of variable, called a list.
In :shopping = ['cheese', 'bananas', 'circuitboards']
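Key things to note: a list is written inside square brackets, its elements are separated by commas, and here every element is a string.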
Once we've created our list, we can ask for individual elements of it like so:
In :print( shopping[0] )
print( shopping[1] )
print( shopping[2] )

cheese
bananas
circuitboards
Notice that the first element in the list is referred to by 0. We call these numbers the indices of the list's elements, and they always start counting at zero for the first element.
If instead we want to count from the back of the list, we start with -1 and go down from there:
In :print( shopping[-1] )
print( shopping[-2] )
print( shopping[-3] )

circuitboards
bananas
cheese
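For a list with three elements, positive and negative indices are just two names for the same three positions. A minimal check, reusing the shopping list from above:

```python
shopping = ['cheese', 'bananas', 'circuitboards']

# In a 3-element list, index -1 is the same slot as index 2,
# -2 is the same as 1, and -3 is the same as 0.
print(shopping[2] == shopping[-1])  # True
print(shopping[0] == shopping[-3])  # True
print(shopping[-1])                 # circuitboards
```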
We can ask how long our list is with len:

In :print( len(shopping) )

3
And we can even sort our list:

In :sorted_shopping = sorted(shopping)
print( sorted_shopping )

['bananas', 'cheese', 'circuitboards']
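Note that sorted hands back a brand new list rather than rearranging the one we gave it. A short sketch showing that the original list keeps its order, and that numbers are sorted smallest first (the number list here is made up for illustration):

```python
shopping = ['cheese', 'bananas', 'circuitboards']

# sorted() builds and returns a new, sorted list...
sorted_shopping = sorted(shopping)

# ...and leaves the original list in its original order.
print(shopping)         # ['cheese', 'bananas', 'circuitboards']
print(sorted_shopping)  # ['bananas', 'cheese', 'circuitboards']

# Numbers are sorted from smallest to largest.
print(sorted([42, 8, 15]))  # [8, 15, 42]
```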
Lists are useful when we have a lot of conceptually similar data, or data that has a meaningful order. If you have a sensor that takes the same kind of reading every second, you would probably want to store that data in a list, so that you preserve the order in which the measurements came in.
Write a function that takes a list of numbers as an argument and returns another list; this returned list should have the largest number in the original list as its first element, and the length of the original list as its second element. So, if the input list is [5, 7, 1, 3], the output list should be [7, 4].
Now suppose we have a list of genetic reads, and we want the first base of each one:

In :def getLeadingBase(read):
    '''
    input: a string representing a read of a genome
    output: the leading base of the input read.
    '''
    return read[0]

myReads = ['GGATC', 'AAACC', 'TTCGT']
print(getLeadingBase(myReads[0]))
print(getLeadingBase(myReads[1]))
print(getLeadingBase(myReads[2]))

G
A
T
This works fine, but it's a bit tedious. Just like last time, when we got sick of copying and pasting our temperature conversion code, it's impractical to copy and paste that print statement for every element in the list: what if there were 3 billion reads in our list instead of only 3? Instead, we can use a for loop to ask Python to repeat the same block of code over and over, changing only which element of myReads we're looking at:
In :for read in myReads:
    print(getLeadingBase( read ))

G
A
T
Python has run the code inside the for loop once for every value in the list provided after the in keyword. A common task is to loop over a range of numbers; for this, Python provides the helper function range. Give range one number, and it produces the numbers from 0 up to but not including that number; give range two numbers, and it counts from the first (inclusive) up to but not including the second. Another common idiom is to use a range of indices to do the same thing we did above:
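Here is a quick look at what range produces with one and with two arguments. We wrap range in list() only so that we can print all of its numbers at once; in a for loop, range can be used directly:

```python
# list() is used here only so we can print every number at once.
oneArg = list(range(4))      # counts from 0 up to, but not including, 4
twoArgs = list(range(2, 5))  # counts from 2 (inclusive) up to 5 (exclusive)
print(oneArg)   # [0, 1, 2, 3]
print(twoArgs)  # [2, 3, 4]

# In a for loop, range can be used directly without list().
for n in range(3):
    print(n)    # prints 0, then 1, then 2
```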
In :for i in range(len(myReads)):
    print(getLeadingBase( myReads[i] ))

G
A
T
This does exactly the same thing as above, but also gives us a numerical index i, which we could use for something else (referring to another list, doing something special every third item...).
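As one example of "referring to another list", suppose we kept a second list of names for our reads, in the same order as the reads themselves. The names below (readNames and its entries) are made up for illustration; the point is that one index i picks out matching entries from both lists:

```python
myReads = ['GGATC', 'AAACC', 'TTCGT']
# Hypothetical labels for each read, kept in the same order as myReads.
readNames = ['read_a', 'read_b', 'read_c']

for i in range(len(myReads)):
    # The same index i picks out matching entries from both lists.
    print(readNames[i] + ': ' + myReads[i][0])
```

This prints read_a: G, read_b: A and read_c: T, one per line. It only works if both lists have the same length and order.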
Lists have a handy helper method, append(x), which adds its argument to the end of the list. So, for example, if I had

myList = [1,2,3]
myList.append(4)

then myList would now be [1,2,3,4].

Write a function called addPrefix that takes a list of strings and a prefix as arguments, and returns a new list that is the same as the original, but with the prefix added to the front of every string. So, for example, addPrefix(['GA', 'TC', 'GC'], 'CC') would return ['CCGA', 'CCTC', 'CCGC'].
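A common pattern that combines append with a for loop is to start from an empty list and add one new entry for each element we visit. As a sketch (readLengths is just an illustrative name), here is how we could collect the length of every read:

```python
myReads = ['GGATC', 'AAACC', 'TTCGT', 'GC']

# Start with an empty list, then append one entry per read.
readLengths = []
for read in myReads:
    readLengths.append(len(read))

print(readLengths)  # [5, 5, 5, 2]
```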
So far, we've learned a lot about how to get Python to repeat itself, using functions and for loops. But in real science, while we may do many similar things in an analysis, they aren't usually all completely identical; based on circumstances, we often have to make decisions and adapt to our observations. The fundamental tool for doing that in Python is the conditional statement, and it's the last tool we need before we can dive into our future lessons.
Suppose we had some genetic reads, but we only wanted to consider ones that were more than 3 bases long. We could check with a condition:
In :myReads = ['ATGTC', 'G', 'ATG', 'ATGC']
for read in myReads:
    if len(read) > 3:
        print(read)

ATGTC
ATGC
So while we looped through the entire list, we only printed out reads that passed our condition of being longer than 3 bases. We can also add alternative conditions to check for other cases:
In :for read in myReads:
    if len(read) > 3:
        print(read)
    elif len(read) == 3:
        print(read, 'is just barely long enough')

ATGTC
ATG is just barely long enough
ATGC
Finally, we can add a catch-all else at the end to do something with all the items that didn't satisfy any condition:
In :for read in myReads:
    if len(read) > 3:
        print(read)
    elif len(read) == 3:
        print(read, 'is just barely long enough')
    else:
        print(read, 'is too short.')

ATGTC
G is too short.
ATG is just barely long enough
ATGC
All conditions start with an if statement, but the number of elifs afterwards is up to you: you can check as many alternative conditions as you like (including none). Similarly, a catch-all else can do something with all the leftovers, but it isn't required. Python checks the conditions in order and runs only the block belonging to the first condition that is true.
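Because Python stops at the first condition that is true, the order of the checks matters. In this sketch, describeLength is a hypothetical helper written just for illustration; a 5-base read would also pass the elif test, but it never reaches it:

```python
def describeLength(read):
    '''Return a short description of a read's length.
    Only the first branch whose condition is True runs.'''
    if len(read) > 3:
        return 'long'
    elif len(read) > 1:
        # A 5-base read also satisfies this test, but it never
        # gets here because the first branch already matched.
        return 'medium'
    else:
        return 'short'

for read in ['ATGTC', 'G', 'ATG']:
    print(read, describeLength(read))
```

This prints ATGTC long, G short and ATG medium. If we swapped the first two checks, ATGTC would be called 'medium' instead.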
Above we saw a couple of examples of logical expressions to check in a condition; these are expressions that evaluate to True or False, like 7 < 3 (False) or 0 == 0 (True). Notice that the double equals sign asks the question 'are these two things equal?'.
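A few more comparisons we can use in conditions, all standard Python operators. The last lines show the difference between a single = (storing a value) and a double == (asking a question):

```python
print(7 < 3)      # False
print(0 == 0)     # True
print(7 != 3)     # True: != asks 'are these two things different?'
print(3 <= 3)     # True: <= means 'less than or equal to'
print('ATG' == 'ATG')  # True: strings can be compared too

# A single = stores a value in a variable; it does not compare anything.
x = 5
print(x == 5)     # True
```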
Finally, we can combine conditions using the words and and or:

In :for read in myReads:
    if len(read) > 2 and len(read) < 5:
        print(read, 'length is greater than 2 and less than 5')
    elif len(read) < 3 or len(read) == 4:
        print(read, 'length is either less than 3 or exactly 4')
    else:
        print(read, "didn't match any conditions.")

ATGTC didn't match any conditions.
G length is either less than 3 or exactly 4
ATG length is greater than 2 and less than 5
ATGC length is greater than 2 and less than 5
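Checking whether a number lies between two bounds with and is common enough that Python also lets us chain the comparisons: 2 < n < 5 means the same as 2 < n and n < 5. A small sketch comparing the two forms on our reads:

```python
myReads = ['ATGTC', 'G', 'ATG', 'ATGC']

for read in myReads:
    # Two equivalent ways to ask 'is the length strictly between 2 and 5?'
    withAnd = len(read) > 2 and len(read) < 5
    chained = 2 < len(read) < 5
    print(read, withAnd, chained)
```

Both columns agree for every read: False for ATGTC and G, True for ATG and ATGC.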
Strings can be indexed the same way as lists, so if you have myword = 'Python', then myword[2] is 't'. Write a function geneComplement that takes a genome as an argument and returns its genetic complement; that is, A is swapped with T, and G is swapped with C, so geneComplement('GGCATT') would return 'CCGTAA'.
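Just as a for loop can visit every element of a list, it can visit every character of a string. The sketch below combines that with a condition to count the G and C bases in a genome; gcCount is a hypothetical helper for illustration, not part of the exercise:

```python
def gcCount(genome):
    '''Count how many bases in genome are G or C.
    Hypothetical helper, written for illustration.'''
    count = 0
    for base in genome:   # looping over a string visits each character
        if base == 'G' or base == 'C':
            count = count + 1
    return count

myword = 'Python'
print(myword[2])          # t
print(gcCount('GGCATT'))  # 3
```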