Introduction to Python: Loops and Conditions

In the previous lesson, we learned how to store information in variables, and how to store instructions in functions for later use. In this lesson, we'll learn the basic tools for making our programs smarter: loops, which allow our programs to repeat themselves many times, and conditions, which allow our programs to make simple decisions for themselves. First though, we'll start by learning about a new type of variable, called a list.

Part 1: Lists

So far, we've put strings and numbers into our variables. Another type of information we can handle in Python is called a list. We can create a list like so:


In [18]:
shopping = ['cheese', 'bananas', 'circuitboards']

Key things to note:

  • we start with a variable name and an equals sign, like before.
  • the list is surrounded by []
  • the elements of the list are separated by ,

Once we've created our list, we can ask for individual elements of it like so:


In [19]:
print( shopping[0] )
print( shopping[1] )
print( shopping[2] )


cheese
bananas
circuitboards

Notice that the first element in the list is referred to by 0; we call these numbers the 'index' of the list element, and they always count starting at zero for the first element.

If instead we want to count from the back of the list, we start with -1 and go down from there:


In [20]:
print( shopping[-1] )
print( shopping[-2] )
print( shopping[-3] )


circuitboards
bananas
cheese

We can ask our list how long it is:


In [21]:
print( len(shopping) )


3
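The length also connects the two ways of indexing shown above. As a small sketch (the variable names lastByNegative and lastByLength are just for illustration), index -1 and index len(shopping) - 1 should point at the same element:

```python
shopping = ['cheese', 'bananas', 'circuitboards']

# Counting from the back: -1 is the last element.
lastByNegative = shopping[-1]

# Counting from the front: the last index is one less than the length,
# because indices start at zero.
lastByLength = shopping[len(shopping) - 1]

print(lastByNegative)
print(lastByLength)
```

Both print statements show circuitboards. Asking for shopping[len(shopping)] would be an error, since that index is one past the end of the list.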

And we can even sort our list:


In [22]:
sorted_shopping = sorted(shopping)
print( sorted_shopping )


['bananas', 'cheese', 'circuitboards']
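One detail worth checking for yourself: sorted gives back a new list and leaves the original alone. A minimal sketch using the same shopping list:

```python
shopping = ['cheese', 'bananas', 'circuitboards']
sorted_shopping = sorted(shopping)

# The original list keeps its original order...
print(shopping)         # ['cheese', 'bananas', 'circuitboards']

# ...while the new list holds the same items in alphabetical order.
print(sorted_shopping)  # ['bananas', 'cheese', 'circuitboards']
```

This is why the example above stores the result in a second variable, sorted_shopping, rather than expecting shopping itself to change.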

Lists are useful when we have a whole lot of conceptually similar data, or data that has a meaningful order; if you have a sensor that takes the same reading every second, you would probably want to store that data in a list, so that you can preserve what order those measurements came in.
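As a rough illustration of the sensor idea, here is a sketch with made-up temperature readings (the name temperatures and the values are hypothetical, not real data):

```python
# Hypothetical readings, stored in the order they were taken.
temperatures = [20.1, 20.4, 19.8, 21.0]

firstReading = temperatures[0]    # the earliest measurement
latestReading = temperatures[-1]  # the most recent measurement

print(firstReading)
print(latestReading)
print(len(temperatures))          # how many readings we have so far
```

Because the list preserves order, the position of each number tells us when it was measured relative to the others.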

Challenge Problem #1

Write a function that takes a list of numbers as an argument, and returns another list; this returned list should have the largest number in the original list as its first element, and the length of the original list as its second element. So, if the input list is [5, 7, 1, 3], the output list should be [7, 4].

Part 2: Loops

Now that we understand lists, we can learn about one of the most fundamental tools in programming: the for loop. Suppose you had a list of data, and a function that you wanted to apply to each one:


In [23]:
def getLeadingBase(read):
    '''
    input: a string representing a read of a genome
    output: the leading base of the input read.
    '''
    
    return read[0]

myReads = ['GGATC', 'AAACC', 'TTCGT']

print(getLeadingBase(myReads[0]))
print(getLeadingBase(myReads[1]))
print(getLeadingBase(myReads[2]))


G
A
T

This works fine, but it's a bit tedious; just like last time when we got sick of cutting and pasting our temperature conversion code, it's impractical to cut and paste that print statement for everything in the list - what if there were 3 billion reads in our list, instead of only 3? By using a for loop, we can ask Python to repeat the same block of code over and over again, changing only the element of myReads that we're looking at:


In [24]:
for read in myReads:
    print(getLeadingBase( read ))


G
A
T

Python has run the code inside the for loop once for every value in the list provided after the in keyword. A common task is to loop over a range of numbers; for this, Python provides the helper function range:


In [25]:
range(6)


Out[25]:
range(0, 6)

In [26]:
range(2,4)


Out[26]:
range(2, 4)

Give range one number, and it returns a range object that counts from 0 up to but not including that number; give range two numbers, and it returns a range object counting from the first (inclusive) up to but not including the last. Another common idiom is to use a range of indices to do the same thing we did above:
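Since printing a range only shows its start and end, it can help to see the numbers it actually contains. One way, sketched here, is to pass the range to Python's built-in list function, which has not been covered in this lesson (the names firstSix and twoToFour are just for illustration):

```python
# list(...) collects every number the range produces into a list.
firstSix = list(range(6))
twoToFour = list(range(2, 4))

print(firstSix)   # [0, 1, 2, 3, 4, 5]
print(twoToFour)  # [2, 3]
```

Notice that 6 and 4 themselves never appear: the end of a range is not included.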


In [27]:
for i in range(len(myReads)):
    print(getLeadingBase( myReads[i] ))


G
A
T

This does the exact same thing as above, but gives us a numerical index i, which we could use for something else (referring to another list, doing something special every third item...).
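As a sketch of the 'referring to another list' case, suppose we also had a second list holding a label for each read. The list readNames and its contents are hypothetical, invented for this example:

```python
myReads = ['GGATC', 'AAACC', 'TTCGT']
readNames = ['read_A', 'read_B', 'read_C']  # hypothetical labels, one per read

# The same index i picks out matching elements from both lists.
for i in range(len(myReads)):
    print(i, readNames[i], myReads[i][0])
```

This prints one line per read: 0 read_A G, then 1 read_B A, then 2 read_C T. It only works if both lists have the same length and the same order.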

Challenge Problem #2

Lists have a handy helper function append(x), which adds the argument to the end of the list. So for example, if I had

myList = [1,2,3]
myList.append(4)

myList would now be [1,2,3,4]. Write a function called addPrefix that takes a list of strings and a prefix as arguments, and returns another list the same as the original, but with prefix added to the front of every string. So for example,

addPrefix(['GA', 'TC', 'GC'], 'CC') would return ['CCGA', 'CCTC', 'CCGC'].
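Before attempting this, it may help to see append used inside a for loop to build up a new list one element at a time. This is only a sketch of that pattern on a different task (collecting leading bases), not a solution to the challenge; the name leadingBases is made up for the example:

```python
myReads = ['GGATC', 'AAACC', 'TTCGT']

# Start with an empty list, then add one item per pass through the loop.
leadingBases = []
for read in myReads:
    leadingBases.append(read[0])

print(leadingBases)  # ['G', 'A', 'T']
```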

Part 3: Conditions

So far, we've learned a lot about how to get Python to repeat itself, using functions and for loops. But in real science, while we may do many similar things in an analysis, they aren't usually all completely identical; based on circumstances, we often have to make decisions and adapt to our observations. The fundamental tool for doing that in Python is the conditional statement, and it's the last tool we need before we can dive into our future lessons.

Suppose we had some genetic reads, but we only wanted to consider ones that were more than 3 bases long. We could check with a condition:


In [28]:
myReads = ['ATGTC', 'G', 'ATG', 'ATGC']

for read in myReads:
    if len(read) > 3:
        print(read)


ATGTC
ATGC

So while we looped through the entire list, we only printed out reads that passed our condition of being longer than 3 bases. We can also add alternative conditions to check for other cases:


In [29]:
for read in myReads:
    if len(read) > 3:
        print(read)
    elif len(read) == 3:
        print(read, 'is just barely long enough')


ATGTC
ATG is just barely long enough
ATGC

Finally, we can add a catch-all statement to the end to do something with all the items that didn't satisfy any condition:


In [30]:
for read in myReads:
    if len(read) > 3:
        print(read)
    elif len(read) == 3:
        print(read, 'is just barely long enough')
    else:
        print(read, 'is too short.')


ATGTC
G is too short.
ATG is just barely long enough
ATGC

All conditions start with an if statement, but the number of elifs afterwards is up to you - you can check as many alternate conditions as you like (including none). Similarly, a catch-all else can do something for all the leftovers, but it isn't required.
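One thing to keep in mind when chaining elifs: Python checks the conditions from top to bottom and runs only the first branch whose condition is true, skipping the rest. A small sketch, using a hypothetical function describeLength written just for this illustration:

```python
def describeLength(read):
    '''
    input: a string representing a read
    output: a word describing how long the read is.
    '''
    if len(read) > 3:
        return 'long'
    elif len(read) > 1:
        # Only reached when the first condition was False.
        return 'medium'
    else:
        return 'short'

print(describeLength('ATGTC'))  # long
print(describeLength('ATG'))    # medium
print(describeLength('G'))      # short
```

'ATGTC' is longer than 1 as well as longer than 3, but it is only reported as long, because the elif is never checked once the if has matched.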

Above we saw a couple of examples of making logical expressions to check in a condition; these are expressions that evaluate to True or False, like 7 < 3 (False), or 0 == 0 (True) - notice the double equals sign asks the question 'are these two things equal?'.
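These expressions can be printed or stored in variables like any other value, which is a handy way to check what a condition will do before using it in an if. A short sketch (the variable name isLong is just for illustration, and != meaning 'not equal' is an operator not otherwise covered in this lesson):

```python
print(7 < 3)   # False
print(0 == 0)  # True

# The result of a comparison can be stored in a variable.
isLong = len('ATGTC') > 3
print(isLong)  # True

# != asks the opposite question to ==: 'are these two things different?'
print('ATG' != 'ATGC')  # True
```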

Finally, we can combine conditions together using the words and and or:


In [31]:
for read in myReads:
    if len(read) > 2 and len(read) < 5:
        print(read, 'length is greater than 2 and less than 5')
    elif len(read) < 3 or len(read) == 4:
        print(read, 'length is either less than 3 or exactly 4')
    else:
        print(read, 'didnt match any conditions.')


ATGTC didnt match any conditions.
G length is either less than 3 or exactly 4
ATG length is greater than 2 and less than 5
ATGC length is greater than 2 and less than 5
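To see how and and or arrive at their answers, it can help to evaluate each half separately. Here is a sketch for a single read; the variable names are hypothetical:

```python
read = 'ATG'

longerThanTwo = len(read) > 2   # True: 'ATG' has 3 bases
shorterThanFive = len(read) < 5 # True
exactlyFour = len(read) == 4    # False

# 'and' is True only when both sides are True.
print(longerThanTwo and shorterThanFive)  # True

# 'or' is True when at least one side is True.
print(exactlyFour or longerThanTwo)       # True

# With 'and', a single False side makes the whole expression False.
print(exactlyFour and longerThanTwo)      # False
```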

Challenge Problem #3

Strings can be indexed the same way as lists - so if you have myword = 'Python', then myword[2] will be 't'. Write a function geneComplement that takes a genome as an argument, and returns its genetic complement - i.e., A is swapped with T, and G is swapped with C, so geneComplement('GGCATT') would return 'CCGTAA'.

