Materials by: John Blischak and other Software Carpentry instructors (Joshua R. Smith, Milad Fatenejad, Katy Huff, Tommy Guy and many more)
In this lesson we will cover how to automate repetitive tasks using loops.
In [2]:
fruits = ['apples', 'oranges', 'pears', 'bananas']
i = 0
while i < len(fruits):
    print(fruits[i])
    i = i + 1
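A `while` loop is most useful when you cannot know ahead of time how many iterations you need. Here is a minimal sketch that keeps halving a number until it drops below 1 and counts the steps. The starting value of 100 is an arbitrary choice for illustration.

```python
# Halve a value until it drops below 1, counting how many halvings it takes.
# The number of steps is not known in advance, so `while` fits better than `for`.
value = 100
steps = 0
while value >= 1:
    value = value / 2
    steps = steps + 1
print(steps, value)
```

The condition is checked before every pass, so the loop stops as soon as `value` falls below 1.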
In [ ]:
# What is the final value of i?
# What happens if you initialize i = 10? What will print?
In [ ]:
# A for loop pulls the items out of your container one at a time
# and puts each value into the temporary variable fruit
for fruit in fruits:
    print(fruit)
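A `for` loop works on any container, not only lists. For example, a string hands over one character at a time. This sketch collects the characters of an arbitrarily chosen word into a list so you can see each value the loop variable takes:

```python
# A string is also a container: each pass of the loop gets one character.
letters = []
for letter in 'pears':
    letters.append(letter)
print(letters)  # ['p', 'e', 'a', 'r', 's']
```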
In [ ]:
# What is the current value of fruit?
In [ ]:
## Just to be explicit
print(fruits)
for fruit in fruits:
    print(fruit)
print('in the end fruit is', fruit)
In [6]:
# You could use range to get the index of each fruit, like this:
for val in range(len(fruits)):
    print(val, fruits[val])
## Python has a better way: enumerate counts your items for you
## and is more readable
print()  # prints a blank line
print('Now using enumerate!!')
for val, current_fruit in enumerate(fruits):
    print(val, current_fruit)
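By default `enumerate` starts counting at 0, matching list indices. It also accepts an optional `start` argument, which is handy when you want human-friendly numbering. A small sketch:

```python
fruits = ['apples', 'oranges', 'pears', 'bananas']

# start=1 makes the count begin at 1 instead of 0
for number, fruit in enumerate(fruits, start=1):
    print(number, fruit)

# list() reveals the (count, item) pairs that enumerate produces
pairs = list(enumerate(fruits, start=1))
print(pairs)  # [(1, 'apples'), (2, 'oranges'), (3, 'pears'), (4, 'bananas')]
```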
In [ ]:
# Use zip to iterate over two lists at once
fruits = ['apples', 'oranges', 'pears', 'bananas']
prices = [0.49, 0.99, 1.49, 0.32]
for fruit, price in zip(fruits, prices):
    print(fruit, "cost", price, "each")
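One behavior of `zip` worth knowing: if the lists have different lengths, it stops as soon as the shortest one runs out, without any warning. In this sketch the price list (named `short_prices` here purely for illustration) is deliberately one item short:

```python
fruits = ['apples', 'oranges', 'pears', 'bananas']
short_prices = [0.49, 0.99, 1.49]  # deliberately missing a price for bananas

# zip stops after the third pair, so 'bananas' is silently skipped
for fruit, price in zip(fruits, short_prices):
    print(fruit, "cost", price, "each")

paired = list(zip(fruits, short_prices))
print(len(paired))  # 3
```

If you are on Python 3.10 or later, `zip(..., strict=True)` raises a `ValueError` instead of silently dropping the extra items.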
In [7]:
# Use items() to iterate over a dictionary's key-value pairs
# Note: the order depends on your Python version. Python 3.7+ keeps insertion
# order; older versions did not guarantee any particular order
# (e.g., which fruit: price pair would print first?)
prices = {'apples': 0.49, 'oranges': 0.99, 'pears': 1.49, 'bananas': 0.32}
for fruit, price in prices.items():
    print(fruit, "cost", price, "each")
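If you want a predictable order regardless of Python version, one option is to sort the pairs first. `sorted()` on `(key, value)` tuples orders them by key, so the fruits come out alphabetically. A sketch that also records the order it printed:

```python
prices = {'apples': 0.49, 'oranges': 0.99, 'pears': 1.49, 'bananas': 0.32}

order = []
# sorted() compares the tuples by their first element (the fruit name)
for fruit, price in sorted(prices.items()):
    print(fruit, "cost", price, "each")
    order.append(fruit)
print(order)  # ['apples', 'bananas', 'oranges', 'pears']
```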
In [ ]:
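# Calculating a sum: keep a running total and add each value to it
values = [1254, 95818, 61813541, 1813, 4]
total = 0  # avoid naming this `sum`, which would hide Python's built-in sum()
for x in values:
    total = total + x
total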
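Python also has a built-in `sum` function that performs this accumulation for you. If an earlier cell assigned a variable named `sum`, the built-in is hidden for the rest of the session; reaching it through the standard `builtins` module works either way. This sketch checks that the loop and the built-in agree:

```python
import builtins

values = [1254, 95818, 61813541, 1813, 4]

# The loop version: a running total
total = 0
for x in values:
    total = total + x

# builtins.sum is the original built-in, even if a variable named `sum` shadows it
total_builtin = builtins.sum(values)
print(total, total_builtin)  # 61912430 61912430
```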
In [9]:
## You have been seeing how tests can help you... so let's write a test
def test_myfactorial():
    input_val = 6
    ## we know 6! = 6 x 5 x 4 x 3 x 2 x 1
    expected_result = 6 * 5 * 4 * 3 * 2 * 1
    res = myfactorial(input_val)
    assert res == expected_result
In [10]:
## Since we have not defined myfactorial yet, this will FAIL and raise an exception (a NameError)
test_myfactorial()
In [ ]:
## Now you need to write myfactorial so it works for any non-negative integer and our test doesn't raise an exception
## NOTE: `pass` below is just a placeholder; delete it and replace it with code
## that calculates the factorial of the given integer
## HINT:
## list(range(2))
## [0, 1]
def myfactorial(x):
    """Calculate x! (x factorial) and return the result."""
    pass
In [ ]:
## now watch your test pass!!!
test_myfactorial()
In [ ]:
# break exits the loop entirely as soon as n reaches reasonable
reasonable = 10
for n in range(1, 2000):
    if n == reasonable:
        break
    print(n)
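A common real use of `break` is a search that can stop at the first match instead of checking every remaining item. This sketch reuses the fruit names and prices from the `zip` example; the price threshold of 1.00 is an arbitrary choice for illustration:

```python
fruits = ['apples', 'oranges', 'pears', 'bananas']
prices = [0.49, 0.99, 1.49, 0.32]

first_expensive = None  # stays None if nothing matches
for fruit, price in zip(fruits, prices):
    if price > 1.00:
        first_expensive = fruit
        break  # we only need the first match, so stop searching
print(first_expensive)  # pears
```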
Instead of breaking out of the loop entirely, you might want to give up on the current iteration and move on to the next one. That is what `continue` does.
In [ ]:
reasonable = 10
for n in range(1, 20):
    if n == reasonable:
        continue
    print(n)
What is the difference between the outputs of these two loops?
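`continue` is handy for skipping items you do not want to process while still handling everything after them. In this sketch the list of readings is made up, and negative values are treated as invalid and left out of the total:

```python
readings = [3, -1, 4, -1, 5]  # made-up data; negative values mean "invalid"

total = 0
for r in readings:
    if r < 0:
        continue  # skip the rest of this iteration and go to the next reading
    total = total + r
print(total)  # 12
```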
In [ ]:
less example.txt
In [ ]:
my_file = open("example.txt")
# Looping over a file object gives you one line at a time
for line in my_file:
    print(line.strip())  # strip() removes the trailing newline
my_file.close()
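Why the `.strip()`? Each line read from a file still ends with its newline character. You can see this without creating a file: `io.StringIO` from the standard library wraps a string in a file-like object that loops line by line the same way. The text in this sketch is arbitrary:

```python
import io

# A stand-in for a real file, so this runs without example.txt
fake_file = io.StringIO("apples\noranges\npears\n")

raw = []
cleaned = []
for line in fake_file:
    raw.append(line)              # each line still ends with '\n'
    cleaned.append(line.strip())  # strip() removes surrounding whitespace, including '\n'
print(raw)      # ['apples\n', 'oranges\n', 'pears\n']
print(cleaned)  # ['apples', 'oranges', 'pears']
```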
In [ ]:
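new_file = open("example2.txt", "w")  # "w" opens for writing and replaces any existing contents
dwight = ['bears', 'beets', 'Battlestar Galactica']
for item in dwight:
    new_file.write(item + '\n')  # write() does not add a newline for you
new_file.close()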
In [ ]:
less example2.txt
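An alternative to `write()` is `print()` with its `file` argument, which adds the newline for you. This sketch uses a `with` block, which closes the file automatically, and writes into a temporary directory so nothing in your working folder changes; the filename `example3.txt` is hypothetical:

```python
import os
import tempfile

dwight = ['bears', 'beets', 'Battlestar Galactica']

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example3.txt')  # hypothetical filename
    with open(path, 'w') as out_file:
        for item in dwight:
            print(item, file=out_file)  # print() appends '\n' after each item
    # Read it back before the temporary directory is deleted
    with open(path) as in_file:
        contents = in_file.read()

print(contents)
```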
If most of this material is brand new to you, your goal is to complete Part 1. If you are more experienced, please move on to Part 2 and Part 3. And don't forget to talk to your neighbor!
A biologist is interested in the genetic basis of height. She measures the heights of many subjects and sends their DNA samples to a core facility for genotyping on arrays. These arrays determine the DNA bases at the variable sites of the genome (known as single nucleotide polymorphisms, or SNPs). Since humans are diploid, i.e. have two copies of each chromosome, each data point consists of two DNA bases, one from each chromosome copy. At each SNP there are only three possible genotypes, e.g. AA, AG, and GG for an A/G SNP. To test the correlation between SNP genotype and height, she wants to perform a regression with an additive genetic model. However, she cannot do this with the data in its current form: she needs to convert the genotypes AA, AG, and GG to the numbers 0, 1, and 2, respectively (each number is the count of G bases the person has at that SNP). Since she has too much data to do this manually, e.g. in Excel, she comes to you for ideas on how to transform the data efficiently.
In [ ]:
genos = ['AA', 'GG', 'AG', 'AG', 'GG']
genos_new = []
# Use your knowledge of if/else statements and loop structures below.
Check your work:
In [ ]:
genos_new == [0, 2, 1, 1, 2]
In [ ]:
genos_w_missing = ['AA', 'NA', 'GG', 'AG', 'AG', 'GG', 'NA']
genos_w_missing_new = []
# The missing data should not be converted to a number, but remain 'NA' in the new list
Check your work:
In [ ]:
genos_w_missing_new == [0, 'NA', 2, 1, 1, 2, 'NA']