Python: Flow Control

Materials by: John Blischak and other Software Carpentry instructors (Joshua R. Smith, Milad Fatenejad, Katy Huff, Tommy Guy and many more)

In this lesson we will cover how to automate repetitive tasks using loops.

Loops

Loops come in two flavors: while and for.


In [2]:
fruits = ['apples', 'oranges', 'pears', 'bananas']
i = 0
while i < len(fruits):
    print fruits[i]
    i = i + 1


apples
oranges
pears
bananas

In [ ]:
# What is the final value of i?

# What happens if you initialize i = 10? What will print?

In [ ]:
# A for loop pulls the items out of your container, one at a time,
# and puts each value into the temporary variable fruit
for fruit in fruits:
    print fruit

In [ ]:
# What is the current value of fruit?

In [ ]:
## Just to be explicit
print fruits

for fruit in fruits:
    print fruit
print 'in the end fruit is ', fruit

In [6]:
# You could use range to get the index value for each fruit like this
for val in range(len(fruits)):
    print val, fruits[val]

# Python has a better way: enumerate, which counts your items for you
# and is more readable
print ## This will print a blank line
print 'Now using enumerate!!'
for val, current_fruit in enumerate(fruits):
    print current_fruit, val


0 apples
1 oranges
2 pears
3 bananas

Now using enumerate!!
apples 0
oranges 1
pears 2
bananas 3
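
If you would rather count from 1 than from 0, enumerate accepts an optional second argument giving the starting number. The sketch below is a small variation on the cell above. The names position and numbered are introduced here only for illustration, and print is written with parentheses so that the snippet runs under both Python 2 and Python 3.

```python
fruits = ['apples', 'oranges', 'pears', 'bananas']

# enumerate(fruits, 1) starts counting at 1 instead of 0
numbered = []  # illustrative name: collects (position, fruit) pairs
for position, current_fruit in enumerate(fruits, 1):
    numbered.append((position, current_fruit))
    print("%d %s" % (position, current_fruit))
```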

In [ ]:
# Use zip to iterate over two lists at once
fruits = ['apples', 'oranges', 'pears', 'bananas']
prices = [0.49, 0.99, 1.49, 0.32]
for fruit, price in zip(fruits, prices):
    print fruit, "cost", price, "each"
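
One thing to watch for with zip: if the lists have different lengths, it stops as soon as the shorter one runs out, without any warning. A minimal sketch, assuming a hypothetical list short_prices that is missing its last entry:

```python
fruits = ['apples', 'oranges', 'pears', 'bananas']
short_prices = [0.49, 0.99, 1.49]  # hypothetical: one price fewer than there are fruits

# zip pairs items up until the shorter list is exhausted,
# so 'bananas' never appears in the result
pairs = list(zip(fruits, short_prices))
print(len(pairs))
```

Checking that len(fruits) == len(short_prices) before looping is one simple way to catch this kind of mismatch.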

In [7]:
# Use "items" to iterate over a dictionary
# Note the order is arbitrary: it need not match the order in which you typed the items
# (e.g. which fruit:price pair will print first?)
prices = {'apples': 0.49, 'oranges': 0.99, 'pears': 1.49, 'bananas': 0.32}
for fruit, price in prices.items():
    print fruit, "cost", price, "each"


pears cost 1.49 each
apples cost 0.49 each
oranges cost 0.99 each
bananas cost 0.32 each
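
If you need the items in a predictable order, one option is to loop over the sorted keys instead. This is only a sketch of that idea; the list ordered is an illustrative name, and print is written with parentheses so that the snippet runs under both Python 2 and Python 3.

```python
prices = {'apples': 0.49, 'oranges': 0.99, 'pears': 1.49, 'bananas': 0.32}

# sorted(prices) returns the dictionary keys in alphabetical order
ordered = []  # illustrative name: collects (fruit, price) pairs in key order
for fruit in sorted(prices):
    ordered.append((fruit, prices[fruit]))
    print("%s cost %s each" % (fruit, prices[fruit]))
```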

In [ ]:
# Calculating a sum
# (the accumulator is named total so that it does not hide Python's built-in sum function)
values = [1254, 95818, 61813541, 1813, 4]
total = 0
for x in values:
    total = total + x
total
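
Python also has a built-in function called sum that does this accumulation in one call. The sketch below compares the two approaches. It assumes the name sum still refers to the built-in, that is, it has not been reassigned to a number earlier in your session. The names running_total and builtin_total are illustrative.

```python
values = [1254, 95818, 61813541, 1813, 4]

# accumulate by hand, one value per pass through the loop
running_total = 0
for x in values:
    running_total = running_total + x

# the built-in sum adds up the whole list in a single call
builtin_total = sum(values)
print(running_total == builtin_total)
```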

Short Exercise

Using a loop, calculate the factorial of 42 (the product of all positive integers up to and including 42).


In [9]:
## You have been seeing how tests can help you...so let's write a test

def test_myfactorial():
    input_val = 6
    ## we know 6! = 6 X 5 X 4 X 3 X 2 X 1
    expected_result = 6 * 5 * 4 * 3 * 2 * 1
    res = myfactorial(6)
    assert res == expected_result

In [10]:
## since we have not defined myfactorial, this will FAIL, and raise an Exception
test_myfactorial()


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-10-4fdfeed1a3ee> in <module>()
      1 ## since we have not defined myfactorial, this will FAIL, and raise an Exception
----> 2 test_myfactorial()

<ipython-input-9-9e663bf42d73> in test_myfactorial()
      5     ## we know 6! = 6 X 5 X 4 X 3 X 2 X 1
      6     expected_result = 6 * 5 * 4 * 3 * 2 * 1
----> 7     res = myfactorial(6)
      8     assert res == expected_result

NameError: global name 'myfactorial' is not defined

In [ ]:
## Now you need to create myfactorial (any input) so our test doesn't raise an exception
## NOTE: `pass` below is just a placeholder, you need to delete it and replace with code
##       to calculate the factorial of a given integer
##  HINT:
## range(2)
## [0,1]

def myfactorial(x):
    """ calculates x! (x factorial) and returns result"""
    pass

In [ ]:
## now watch your test pass!!!
test_myfactorial()

Woo Hoo!!

You just did test-driven development! Good job!

break, continue, and else

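A break statement immediately exits the innermost loop that contains it, no matter how many iterations remain. It is useful for stopping early once continuing is clearly pointless, and it gives you a way out of a loop that would otherwise run forever.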


In [ ]:
reasonable = 10
for n in range(1,2000):
    if n == reasonable :
        break
    print n

Instead of breaking out of the loop entirely, you might want to give up only on the current iteration and move on to the next one. The continue statement does exactly that.


In [ ]:
reasonable = 10
for n in range(1,20):
    if n == reasonable :
        continue
    print n

What is the difference between the output of these two?
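
The heading above also mentions else. A loop can carry an else clause, which runs only if the loop finishes without reaching a break. Here is a minimal sketch of a search over the fruits list; the names wanted and message are made up for this illustration, and print is written with parentheses so that the snippet runs under both Python 2 and Python 3.

```python
fruits = ['apples', 'oranges', 'pears', 'bananas']
wanted = 'kiwis'  # hypothetical item to search for

for fruit in fruits:
    if fruit == wanted:
        message = 'found ' + wanted
        break
else:
    # this branch runs only when the loop ended without hitting break
    message = wanted + ' not found'
print(message)
```

Try changing wanted to 'pears' and predict which branch sets message.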

Codecademy, for example, does not cover working with files until Lesson 12. However, because we want to get to interesting ways of dealing with data, we introduce the basics here!

Reading from a file


In [ ]:
less example.txt

In [ ]:
my_file = open("example.txt")
for line in my_file:
    print line.strip()
my_file.close()

Writing to a file


In [ ]:
new_file = open("example2.txt", "w")
dwight = ['bears', 'beets', 'Battlestar Galactica']
for i in dwight:
    new_file.write(i + '\n')
new_file.close()

In [ ]:
less example2.txt
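
It is easy to forget the call to close. A common alternative is the with statement, which closes the file for you when the indented block ends, even if an error occurs inside it. The sketch below writes a file and reads it back. The file name example3.txt and the variable names are assumptions made for this illustration.

```python
lines_out = ['bears', 'beets', 'Battlestar Galactica']

# the file is closed automatically when the with block ends
with open("example3.txt", "w") as out_file:  # example3.txt is a made-up name
    for item in lines_out:
        out_file.write(item + '\n')

# read the lines back, stripping the trailing newline from each one
lines_in = []
with open("example3.txt") as in_file:
    for line in in_file:
        lines_in.append(line.strip())
print(lines_in == lines_out)
```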

Longer Exercise: Convert genotypes

If most of this material is brand new to you, your goal is to complete Part 1. If you are more experienced, please move on to Part 2 and Part 3. And don't forget to talk to your neighbor!

Motivation:

A biologist is interested in the genetic basis of height. She measures the heights of many subjects and sends their DNA samples to a core facility for genotyping arrays. These arrays determine the DNA bases at the variable sites of the genome (known as single nucleotide polymorphisms, or SNPs). Since humans are diploid, i.e. have two copies of each chromosome, each data point will be two DNA bases corresponding to the two chromosomes in each individual. At each SNP there will be only three possible genotypes, e.g. AA, AG, and GG for an A/G SNP. In order to test the correlation between a SNP genotype and height, she wants to perform a regression with an additive genetic model. However, she cannot do this with the data in its current form. She needs to convert the genotypes, e.g. AA, AG, and GG, to the numbers 0, 1, and 2, respectively (in this example the number corresponds to the number of G bases the person has at that SNP). Since she has too much data to do this manually, e.g. in Excel, she comes to you for ideas on how to transform the data efficiently.

Part 1:

Create a new list which has the converted genotype for each subject ('AA' -> 0, 'AG' -> 1, 'GG' -> 2).


In [ ]:
genos = ['AA', 'GG', 'AG', 'AG', 'GG']
genos_new = []
# Use your knowledge of if/else statements and loop structures below.

Check your work:


In [ ]:
genos_new == [0, 2, 1, 1, 2]

Part 2:

Sometimes there are errors and the genotype cannot be determined. Adapt your code from above to deal with this problem (in this example missing data is assigned NA for "Not Available").


In [ ]:
genos_w_missing = ['AA', 'NA', 'GG', 'AG', 'AG', 'GG', 'NA']
genos_w_missing_new = []
# The missing data should not be converted to a number, but remain 'NA' in the new list

Check your work:


In [ ]:
genos_w_missing_new == [0, 'NA', 2, 1, 1, 2, 'NA']