Python: Flow Control

Materials by: John Blischak and other Software Carpentry instructors (Joshua R. Smith, Milad Fatenejad, Katy Huff, Tommy Guy, Anthony Scopatz, and many more)

In this lesson we will cover how to write code that will execute only if specified conditions are met and also how to automate repetitive tasks using loops.

Boolean operators

Python provides the comparison operators <, >, <=, >=, == and !=. Every comparison returns one of the two Boolean values, True or False, so these operators can be used to test values against one another. For example,


In [ ]:
2 + 2 == 4

Comparisons between strings are defined, but may not give the answers you expect:


In [ ]:
'big' > 'small'

Comparisons can be chained together with the Python keywords and and or.


In [ ]:
1 == 1.0 and 'hello' == 'hello'

In [ ]:
1 > 10 or False

In [ ]:
42 < 24 or True and 'wow' != 'mom'
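
When and and or appear in the same expression, Python evaluates and first, so the last expression above is read as 42 < 24 or (True and 'wow' != 'mom'). A minimal sketch that makes the grouping visible with parentheses (the variable names are only illustrative; print is written with parentheses so the cell runs the same under Python 2 and Python 3):

```python
# "and" binds more tightly than "or".
implicit = True or False and False      # read as: True or (False and False)
and_first = True or (False and False)   # same grouping, written out
or_first = (True or False) and False    # forcing "or" first changes the answer

print(implicit)
print(and_first)
print(or_first)
```

When in doubt, adding parentheses costs nothing and makes the intent clear to the next reader.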

Comparisons may also be negated using the not keyword.


In [ ]:
not 2 + 2 == 5

Be careful: comparisons will sometimes return a value even when the things being compared have different types:


In [ ]:
'1' < 2          # comparing a string and an int

In [ ]:
True == 'True'   # comparing a Boolean and a string
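
One way to avoid surprises like these is to convert both sides to the same type before comparing. The sketch below assumes the string really holds a number; text_number is an illustrative name. (For reference, Python 3 raises a TypeError for an ordering comparison such as '1' < 2 instead of returning a value.)

```python
text_number = '1'                    # a number that arrived as text (illustrative)

as_ints = int(text_number) < 2       # int compared with int
as_strings = str(True) == 'True'     # str compared with str

print(as_ints)
print(as_strings)
```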

Strings are compared in lexicographical order: the leftmost characters are compared first, and later characters are only consulted to break a tie.


In [ ]:
'Bears' > 'Packers'
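
A few more cases may help show how the character-by-character rule plays out. Note in particular that uppercase ASCII letters sort before lowercase ones, because the comparison uses character codes (which ord reports). The strings here are only illustrative.

```python
tie_break = 'apple' < 'apricot'   # 'ap' ties, then 'p' comes before 'r'
case_matters = 'Zebra' < 'apple'  # 'Z' has a smaller code than 'a'

print(tie_break)
print(case_matters)
print(ord('Z'))
print(ord('a'))
```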

As two special cases, the Boolean values False and True are equal to the integers 0 and 1 respectively, but to no other integers:


In [ ]:
True == 0

In [ ]:
True == 1
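
Because True behaves like 1 and False like 0, Booleans can also take part in arithmetic. A small sketch, using a made-up list of answers, that counts how many entries are True:

```python
answers = [True, False, True, True]   # illustrative data

n_true = sum(answers)        # each True contributes 1, each False contributes 0
two = True + True
matches_two = True == 2      # True is equal to 1 only

print(n_true)
print(two)
print(matches_two)
```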

If statements

Comparisons can be used to control program behavior by placing them in an if statement. Such statements have the following form:

if <condition>:
    <indented block of code>

The indented code will only be executed if the condition evaluates to True, which is a special Boolean value.


In [ ]:
x = 5
if x < 0:
    print "x is negative"

In [ ]:
x = -5
if x < 0:
    print "x is negative"

The if statement can be combined to great effect with a corresponding else clause.

if <condition>:
    <if-block>
else:
    <else-block>

When the condition is True the if-block is executed. When the condition is False the else-block is executed instead.


In [ ]:
x = 5
if x < 0:
    print "x is negative"
else:
    print "x is non-negative"

Many cases may be tested by using elif clauses. These come between the if and the else:

if <if-condition>:
    <if-block>
elif <elif-condition>:
    <elif-block>
else:
    <else-block>

When if-condition is true, only the if-block is executed. When if-condition is false and elif-condition is true, only the elif-block is executed. When neither of these is true, the else-block is executed.


In [ ]:
x = 5
if x < 0:
    print "x is negative"
elif x == 0:
    print "x is zero"
else:
    print "x is positive"

While there must be one if statement, and there may be at most one else statement, there may be as many elif statements as are desired.

if <if-condition>:
    <if-block>
elif <elif-condition1>:
    <elif-block1>
elif <elif-condition2>:
    <elif-block2>
elif <elif-condition3>:
    <elif-block3>
...
else:
    <else-block>

Only the block for the topmost condition that is true is executed.


In [ ]:
x = 5
if x < 0:
    print "x is negative"
elif x == 0:
    print "x is zero"
elif x == 1:
    print "x is one"
elif x == 2:
    print "x is two"
else:
    print "x is positive and is not equal to 0, 1, or 2"
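
The order of the conditions matters when more than one of them could be true. In the following sketch (the value and labels are only illustrative) both conditions hold for x = 15, but only the first matching block runs, so the second branch can never be reached:

```python
x = 15
if x > 5:
    label = "greater than 5"
elif x > 10:
    # Never reached: any x greater than 10 is also greater than 5,
    # so the branch above always wins.
    label = "greater than 10"
else:
    label = "5 or less"

print(label)
```

Putting the most specific test first (x > 10 before x > 5) would let both branches be reached.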

Aside About Indentation

The indentation is a feature of Python syntax. Some other programming languages use brackets to denote a command block. Python uses indentation to recognize blocks of code that are part of conditional statements and loops. The end of the indentation (that is, the first line that has the same indentation level as the if) indicates the end of the block subject to the if's control. This if has more than one statement in each block:


In [ ]:
x = 5
if x < 0:
    print "x is negative"
    print "x is still negative"
else:
    print "x is nonnegative"
    print "x is still nonnegative"
print "But what is x?"

The last statement always prints, because it is not inside of either of the two indented blocks controlled by the if.

Python will recognize an indented block as long as everything in the same block is indented the same amount. The Python Style Guide (PEP8) recommends that you use four spaces.
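
Blocks can be nested, and each level of nesting adds one more level of indentation. A minimal sketch, with illustrative messages collected in a list so the order in which the lines run is easy to see:

```python
x = 7
messages = []
if x > 0:
    messages.append("x is positive")
    if x > 5:
        # Two levels deep: runs only when both conditions hold.
        messages.append("x is also greater than 5")
    # Back to one level: belongs to the outer if only.
    messages.append("done with the positive case")
# No indentation: runs whatever the value of x.
messages.append("this line runs for every x")

print(messages)
```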

Exercise

Write an if statement that prints whether x is even or odd.

Hint: % is the modulo operator. a % b returns the remainder after a is divided by b:


In [ ]:
print 3 % 2 
print 4 % 2
print 5 % 2
print 6 % 2

In [ ]:
print x
#  Your code goes here

Loops

Loops come in two flavors: while and for. While loops have the following structure:

while <condition>:
    <indented block of code>

As long as the condition is True, the code in the block will continue to execute. This may lead to infinitely executing loops!


In [ ]:
fruits = ['apples', 'oranges', 'pears', 'bananas']
i = 0
while i < len(fruits):
    print fruits[i]
    i = i + 1
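
A while loop is most natural when the number of iterations is not known in advance. The sketch below, with an arbitrarily chosen starting value, counts how many times a number can be halved before it reaches 1. The loop ends only because n changes on every pass; forgetting that line would produce an infinite loop.

```python
n = 100
steps = 0
while n > 1:
    n = n // 2          # integer division: n shrinks on every pass
    steps = steps + 1

print(steps)
```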

Meanwhile, for-loops have the following structure:

for <loop variable name> in <iterable>:
    <indented block of code>

Here <loop variable name> is the name of a variable that the for loop will set on each iteration, and <iterable> is an object that knows how to give us its items in succession. Lists, sets, and file handles can be iterated over like this. The for loop assigns the first value of <iterable> to <loop variable name> and runs the block of code. It then assigns the next value and runs the block again, repeating until <iterable> is exhausted.


In [ ]:
for fruit in fruits:
    print fruit

range(6) returns a list of integers from 0 to 5.


In [ ]:
print range(6)
for i in range(6):
    print i
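
range also accepts a start value and a step. A short sketch of the three calling forms; the list(...) wrapper makes no difference in Python 2 and is only needed under Python 3, where range produces its numbers lazily.

```python
up_to = list(range(4))             # stop only: starts at 0
from_to = list(range(2, 6))        # start and stop: stop is excluded
by_threes = list(range(0, 10, 3))  # start, stop and step
countdown = list(range(5, 0, -1))  # a negative step counts down

print(up_to)
print(from_to)
print(by_threes)
print(countdown)
```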

Since range returns a list, we can use it to count from 0 to len(fruits)-1, and use these numbers to access the contents of the list fruits. This loop prints both the items and their indexes.


In [ ]:
# Use range for a range of integers
for i in range(len(fruits)):
    print i, fruits[i]
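
The built-in enumerate offers another way to get an index alongside each item, without calling len or indexing into the list. This is offered as an alternative sketch, not a replacement for the range approach above:

```python
fruits = ['apples', 'oranges', 'pears', 'bananas']

indexed = list(enumerate(fruits))   # pairs of (index, item)
for i, fruit in enumerate(fruits):
    print(str(i) + " " + fruit)
```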

zip is a function that takes two lists and returns a list of tuples, pairing the first items together, then the second items, and so on.


In [ ]:
fruits = ['apples', 'oranges', 'pears', 'bananas']
prices = [0.49, 0.99, 1.49, 0.32]
print zip(fruits, prices)

We can use this list to go through both lists at the same time:


In [ ]:
# Use zip to iterate over two lists at once
fruits = ['apples', 'oranges', 'pears', 'bananas']
prices = [0.49, 0.99, 1.49, 0.32]
for fruit, price in zip(fruits, prices):
    print fruit, "cost", price, "each"
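
One detail worth knowing: when the lists have different lengths, zip stops at the end of the shorter one and silently drops the leftover items. A sketch with a deliberately incomplete price list (the data is illustrative):

```python
fruits = ['apples', 'oranges', 'pears', 'bananas']
prices = [0.49, 0.99]                  # two prices are missing

paired = list(zip(fruits, prices))     # list() keeps this working in Python 3
print(paired)
print(len(paired))
```

Checking that len(fruits) == len(prices) before zipping is a simple guard against losing data this way.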

.items() is a method defined for dictionaries that allows us to easily iterate over key, value pairs. It returns a list of tuples [(key1, value1), (key2, value2)...]


In [ ]:
# Use "items" to iterate over a dictionary
# Note the order is non-deterministic
prices = {'apples': 0.49, 'oranges': 0.99, 'pears': 1.49, 'bananas': 0.32}
for fruit, price in prices.items():
    print fruit, "cost", price, "each"
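
If a predictable order is needed, one option is to loop over the sorted keys and look each value up. A minimal sketch (the ordered list exists only to record the order visited):

```python
prices = {'apples': 0.49, 'oranges': 0.99, 'pears': 1.49, 'bananas': 0.32}

ordered = []
for fruit in sorted(prices):        # sorted() on a dict returns its keys in order
    ordered.append(fruit)
    print(fruit + " cost " + str(prices[fruit]) + " each")
```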

A loop can also update a variable as it goes, for example to accumulate a running total:


In [ ]:
# Calculating a sum
values = [1254, 95818, 61813541, 1813, 4]
total = 0
for x in values:
    total = total + x
print total
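
The same pattern, a variable set before the loop and updated inside it, works for more than sums. This sketch combines it with an if statement to track the largest value seen so far (Python's built-in max does the same job; the loop is shown only to illustrate the pattern):

```python
values = [1254, 95818, 61813541, 1813, 4]

largest = values[0]          # start with the first value as the best so far
for x in values:
    if x > largest:
        largest = x          # found a bigger one: remember it

print(largest)
```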

Short Exercise

Using a loop, calculate the factorial of 7 (the product of all positive integers up to and including 7).


In [ ]:

break, continue, and else

A break statement causes a loop to stop iterating immediately, without finishing the current block. It helps avoid infinite loops by cutting off loops when they're clearly going nowhere.


In [ ]:
reasonable = 10
for n in range(1,2000):
    if n == reasonable:
        break
    print n

continue causes the loop to skip immediately to the beginning of the next iteration, without completing the rest of the commands in the indented block.


In [ ]:
reasonable = 10
for n in range(1,20):
    if n == reasonable:
        continue
    print n

What is the difference between the output of these two?
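
Loops can also take an else clause: its block runs only when the loop finishes without hitting a break. A minimal sketch of a search that reports failure this way (the list and the wanted item are illustrative):

```python
fruits = ['apples', 'oranges', 'pears', 'bananas']
wanted = 'kiwis'

for fruit in fruits:
    if fruit == wanted:
        result = "found " + wanted
        break
else:
    # Runs only because the loop ended without a break.
    result = "no " + wanted + " in the list"

print(result)
```

Changing wanted to 'pears' triggers the break, and the else block is skipped.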

We can combine loops and flow control to take actions that are more complex, and that depend on the data. First, let us define a dictionary with some names and titles, and a list with a subset of the names we will treat differently.


In [ ]:
knights = {"Sir Belvedere":"the Wise", 
           "Sir Lancelot":"the Brave", 
           "Sir Galahad":"the Pure", 
           "Sir Robin":"the Brave", 
           "The Black Knight":"John Cleese"} # create a dict with names and titles
favorites = knights.keys()      # create a list of favorites with all the knights
favorites.remove("Sir Robin") # change favorites to include all but one.
print knights
print favorites

We can loop through the dict of names and titles and do one of two different things for each by putting an if statement inside the for loop:


In [ ]:
for name, title in knights.items(): 
    string = name + ", "
    if name in favorites:   # this returns True if any of the values in favorites match.
        string = string + title
    else:
        string = string + title + ", but not quite so brave as Sir Lancelot." 
    print string

Reading from a file


In [ ]:
my_file = open("example.txt")
for line in my_file:
    print line.strip()
my_file.close()

Writing to a file


In [ ]:
new_file = open("example2.txt", "w")
lines = ['Does this lion eat ants?', 'have you confused your cat recently?', "It's just a flesh wound!"]
for i in lines:
    new_file.write(i + '\n')
new_file.close()
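
Files can also be opened using a with statement, which closes the file automatically when the indented block ends, even if an error occurs inside it. The sketch below writes a file and reads it back. It uses a temporary directory and a made-up file name (example3.txt) so that it does not depend on any existing file.

```python
import os
import tempfile

# Hypothetical file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "example3.txt")
lines = ['first line', 'second line']

with open(path, "w") as new_file:
    for line in lines:
        new_file.write(line + '\n')
# new_file is closed here; no explicit close() is needed.

lines_read = []
with open(path) as my_file:
    for line in my_file:
        lines_read.append(line.strip())   # strip removes the trailing newline

print(lines_read)
```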

Longer Exercise: Convert genotypes

If most of this material is brand new to you, your goal is to complete Part 1. If you are more experienced, please move on to Part 2 and Part 3. And don't forget to talk to your neighbor!

Motivation:

A biologist is interested in the genetic basis of height. She measures the heights of many subjects and sends their DNA samples to a core facility for genotyping arrays. These arrays determine the DNA bases at the variable sites of the genome (known as single nucleotide polymorphisms, or SNPs). Since humans are diploid, i.e. have two copies of each chromosome, each data point consists of two DNA bases, one from each of the two chromosomes in an individual. At each SNP there are only three possible genotypes, e.g. AA, AG, and GG for an A/G SNP. To test the correlation between a SNP genotype and height, she wants to perform a regression with an additive genetic model. However, she cannot do this with the data in its current form. She needs to convert the genotypes, e.g. AA, AG, and GG, to the numbers 0, 1, and 2, respectively (in this example the number corresponds to the number of G bases the person has at that SNP). Since she has too much data to do this manually, e.g. in Excel, she comes to you for ideas on how to transform the data efficiently.

Part 1:

Create a new list which has the converted genotype for each subject ('AA' -> 0, 'AG' -> 1, 'GG' -> 2).


In [ ]:
genos = ['AA', 'GG', 'AG', 'AG', 'GG']
genos_new = []  # define an empty list.  Your code needs to fill in the values.
# Your code goes here

Check your work:


In [ ]:
genos_new == [0, 2, 1, 1, 2]

Part 2:

Sometimes there are errors and the genotype cannot be determined. Adapt your code from above to deal with this problem (in this example missing data is assigned NA for "Not Available"). Note: the input and output variables have different names here and in the previous exercise. Don't read from genos_new.


In [ ]:
genos_w_missing = ['AA', 'NA', 'GG', 'AG', 'AG', 'GG', 'NA']
genos_w_missing_new = []   # define an empty list.  Your code needs to fill in the values.
# The missing data should not be converted to a number, but remain 'NA' in the new list

Check your work:


In [ ]:
genos_w_missing_new == [0, 'NA', 2, 1, 1, 2, 'NA']

Part 3:

The file genos.txt has a column of genotypes. Read in the data and convert the genotypes as above. Hint: You'll need to use the built-in string method strip to remove the new-line characters (See the example of reading in a file above. We will cover string methods in the next section).


In [ ]:
# Store the genotypes from genos.txt in this list
genos_from_file = []

Check your work:


In [ ]:
genos_from_file[:15] == [2, 2, 1, 1, 0, 0, 2, 2, 2, 0, 'NA', 1, 0, 0, 2]

Bonus material: List comprehensions

Python has another way to perform iteration called list comprehensions. A list comprehension builds a new list from an iterable in a single expression, optionally keeping only the items that pass an if test.


In [ ]:
# Multiply every number in a list by 2 using a for loop
nums1 = [5, 1, 3, 10]
nums2 = []
for i in range(len(nums1)):
    nums2.append(nums1[i] * 2)
    
print nums2

In [ ]:
# Multiply every number in a list by 2 using a list comprehension
nums2 = [x * 2 for x in nums1]

print nums2

In [ ]:
# Multiply every number in a list by 2, but only if the number is greater than 4
nums1 = [5, 1, 3, 10]
nums2 = []
for i in range(len(nums1)):
    if nums1[i] > 4:
        nums2.append(nums1[i] * 2)
    
print nums2

In [ ]:
# And using a list comprehension
nums2 = [x * 2 for x in nums1 if x > 4]

print nums2
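
List comprehensions combine with the other tools from this lesson. As one sketch, the example below pairs two lists with zip inside a comprehension; the prices (in cents, to keep the arithmetic exact) and the quantities are made up for illustration.

```python
prices_in_cents = [49, 99, 149, 32]
quantities = [2, 0, 1, 3]

# Cost of each item: price times quantity, pair by pair.
costs = [p * q for p, q in zip(prices_in_cents, quantities)]

# Same, but keep only the items that were actually bought.
bought = [p * q for p, q in zip(prices_in_cents, quantities) if q > 0]

print(costs)
print(bought)
```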