Materials by: John Blischak and other Software Carpentry instructors (Joshua R. Smith, Milad Fatenejad, Katy Huff, Tommy Guy, Anthony Scopatz, and many more)
In this lesson we will cover how to write code that will execute only if specified conditions are met and also how to automate repetitive tasks using loops.
In [ ]:
2 + 2 == 4
Comparisons between strings are defined, but may not give the answers you expect:
In [ ]:
'big' > 'small'
Comparisons can be combined using the Python keywords and and or.
In [ ]:
1 == 1.0 and 'hello' == 'hello'
In [ ]:
1 > 10 or False
In [ ]:
42 < 24 or True and 'wow' != 'mom'
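The last expression mixes or and and without parentheses. Python evaluates and before or, so it is read as 42 < 24 or (True and 'wow' != 'mom'). The sketch below makes that grouping explicit; the variable names are only for illustration.

```python
# `and` binds more tightly than `or`, so these two expressions are equivalent.
implicit = 42 < 24 or True and 'wow' != 'mom'
explicit = 42 < 24 or (True and 'wow' != 'mom')

# With other operands, the grouping changes the answer.
a = True or False and False      # read as True or (False and False)
b = (True or False) and False    # parentheses force the `or` first

print(implicit == explicit)
print(a != b)
```

When in doubt, adding parentheses costs nothing and makes the intent clear to the reader.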
Comparisons may also be negated using the not keyword.
In [ ]:
not 2 + 2 == 5
Be careful; comparisons will sometimes return a value even when comparing things of different types:
In [ ]:
'1' < 2 # comparing a string and an int
In [ ]:
True == 'True' # comparing a Boolean and a string
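One way to avoid surprises is to convert both sides to the same type before comparing. The following is a small sketch using the built-in int and str conversions; the variable names are made up for illustration. It is written so that it also runs under Python 3, where '1' < 2 raises a TypeError instead of returning a value.

```python
# Convert explicitly so both sides of the comparison have the same type.
text_number = '1'
as_int_comparison = int(text_number) < 2      # compares 1 with 2
as_str_comparison = str(True) == 'True'       # compares 'True' with 'True'

# Without conversion, a Boolean never equals a string.
unconverted = (True == 'True')

print(as_int_comparison)
print(as_str_comparison)
print(unconverted)
```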
Comparisons between strings use lexicographical order, in which the leftmost characters are compared first.
In [ ]:
'Bears' > 'Packers'
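A few more cases may help show what comparing the leftmost characters first means in practice. This is an illustrative sketch, not part of the original exercise; the variable names are invented.

```python
# Strings are compared character by character, starting from the left.
first_letter_decides = 'Bears' < 'Packers'   # 'B' comes before 'P'

# Uppercase letters sort before lowercase letters.
case_matters = 'Zebra' < 'apple'

# Digits inside strings are compared as characters, not as numbers.
as_strings = '10' < '9'    # '1' comes before '9'
as_numbers = 10 < 9

print(first_letter_decides)
print(case_matters)
print(as_strings)
print(as_numbers)
```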
As two special cases, the Boolean values match 0 and 1, but no other integers:
In [ ]:
True == 0
In [ ]:
True == 1
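Because True and False match 1 and 0, they can also take part in arithmetic. The short sketch below (variable names are illustrative) uses this to count how many comparisons in a list came out True.

```python
# True and False behave like the integers 1 and 0 in arithmetic.
two = True + True

# Summing a list of Booleans counts how many of them are True.
is_big = [1 > 2, 3 > 2, 5 > 2]
count_big = sum(is_big)

# No other integer matches a Boolean.
matches_two = (True == 2)

print(two)
print(count_big)
print(matches_two)
```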
In [ ]:
x = 5
if x < 0:
    print "x is negative"
In [ ]:
x = -5
if x < 0:
    print "x is negative"
The if statement can be combined to great effect with a corresponding else clause.
if <condition>:
    <if-block>
else:
    <else-block>
When the condition is True, the if-block is executed. When the condition is False, the else-block is executed instead.
In [ ]:
x = 5
if x < 0:
    print "x is negative"
else:
    print "x is non-negative"
Additional cases can be tested with elif clauses. These come between the if and the else:
if <if-condition>:
    <if-block>
elif <elif-condition>:
    <elif-block>
else:
    <else-block>
When if-condition is true, only the if-block is executed. When if-condition is false and elif-condition is true, only the elif-block is executed. When neither condition is true, the else-block is executed.
In [ ]:
x = 5
if x < 0:
    print "x is negative"
elif x == 0:
    print "x is zero"
else:
    print "x is positive"
While there must be exactly one if statement, and there may be at most one else statement, there may be as many elif statements as desired.
if <if-condition>:
    <if-block>
elif <elif-condition1>:
    <elif-block1>
elif <elif-condition2>:
    <elif-block2>
elif <elif-condition3>:
    <elif-block3>
...
else:
    <else-block>
Only the block for the topmost condition that is true is executed.
In [ ]:
x = 5
if x < 0:
    print "x is negative"
elif x == 0:
    print "x is zero"
elif x == 1:
    print "x is one"
elif x == 2:
    print "x is two"
else:
    print "x is positive and is not equal to 0, 1, or 2"
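To see that only the topmost true condition runs, it helps to use conditions that overlap. In this sketch (the list name matched is invented for illustration), both x > 0 and x > 3 are true for x = 5, yet only the first branch is taken.

```python
x = 5
matched = []
if x > 0:
    matched.append('x > 0')
elif x > 3:
    # Also true for x = 5, but never tested because the first branch ran.
    matched.append('x > 3')
else:
    matched.append('neither')

print(matched)
```

The order of the conditions therefore matters: put the most specific test first if several of them can be true at once.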
The indentation is a feature of Python syntax. Some other programming languages use brackets to denote a command block. Python uses indentation to recognize blocks of code that are part of conditional statements and loops. The end of the indentation (that is, the first line that has the same indentation level as the if) indicates the end of the block subject to the if's control. This if has more than one statement in each block:
In [ ]:
x = 5
if x < 0:
    print "x is negative"
    print "x is still negative"
else:
    print "x is nonnegative"
    print "x is still nonnegative"
print "But what is x?"
The last statement always prints, because it is not inside either of the two indented blocks controlled by the if.
Python will recognize an indented block as long as everything in the same block is indented the same amount. The Python Style Guide (PEP8) recommends that you use four spaces.
In [ ]:
print 3 % 2
print 4 % 2
print 5 % 2
print 6 % 2
In [ ]:
print x
# Your code goes here
In [ ]:
fruits = ['apples', 'oranges', 'pears', 'bananas']
i = 0
while i < len(fruits):
    print fruits[i]
    i = i + 1
Meanwhile, for-loops have the following structure:
for <loop variable name> in <iterable>:
    <indented block of code>
Here <loop variable name> is the name of a new variable that the for loop will define on each iteration, and <iterable> is an object that knows how to give us its items in succession. Lists, sets, and file handles can be iterated over like this. The for loop assigns the first value of <iterable> to <loop variable name> and runs the block of code. It then gets the next value of <iterable> and runs the block again, repeating until <iterable> is exhausted.
In [ ]:
for fruit in fruits:
    print fruit
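The loop variable is an ordinary variable: it is reassigned on every pass and keeps its last value once the loop ends. The sketch below (the list name lengths is invented for illustration) collects the length of each item and then checks what fruit holds afterwards.

```python
fruits = ['apples', 'oranges', 'pears', 'bananas']
lengths = []
for fruit in fruits:
    # On each pass, `fruit` refers to the next item of the list.
    lengths.append(len(fruit))

print(lengths)
# After the loop, `fruit` still holds the last item that was visited.
print(fruit)
```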
range(6) returns a list of integers from 0 to 5.
In [ ]:
print range(6)
for i in range(6):
    print i
Since range returns a list, we can use it to count from 0 to len(fruits)-1, and use these numbers to access the contents of the list fruits. This loop prints both the items and their indexes.
In [ ]:
# Use range for a range of integers
for i in range(len(fruits)):
    print i, fruits[i]
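As an alternative to range(len(fruits)), Python's built-in enumerate function yields each index together with its item. This is a different technique from the one shown above, sketched here only for comparison; the list name pairs is invented.

```python
fruits = ['apples', 'oranges', 'pears', 'bananas']
pairs = []
for i, fruit in enumerate(fruits):
    # `i` is the index and `fruit` is the item at that index.
    pairs.append((i, fruit))

print(pairs)
```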
zip is a function that takes two lists and returns a list of tuples.
In [ ]:
fruits = ['apples', 'oranges', 'pears', 'bananas']
prices = [0.49, 0.99, 1.49, 0.32]
print zip(fruits, prices)
We can use this list to go through both lists at the same time:
In [ ]:
# Use zip to iterate over two lists at once
fruits = ['apples', 'oranges', 'pears', 'bananas']
prices = [0.49, 0.99, 1.49, 0.32]
for fruit, price in zip(fruits, prices):
    print fruit, "cost", price, "each"
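One detail worth knowing is that zip stops at the end of the shorter list, so unmatched items are silently dropped. In this sketch the prices list is deliberately shortened to show the effect; wrapping the result in list() keeps the example working under Python 3 as well, where zip returns an iterator rather than a list.

```python
fruits = ['apples', 'oranges', 'pears', 'bananas']
prices = [0.49, 0.99]   # deliberately shorter than fruits

# zip pairs items up only while both lists still have items left.
pairs = list(zip(fruits, prices))

print(pairs)
print(len(pairs))
```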
.items() is a method defined for dictionaries that allows us to easily iterate over key, value pairs. It returns a list of tuples [(key1, value1), (key2, value2)...]
In [ ]:
# Use "items" to iterate over a dictionary
# Note the order is non-deterministic
prices = {'apples': 0.49, 'oranges': 0.99, 'pears': 1.49, 'bananas': 0.32}
for fruit, price in prices.items():
    print fruit, "cost", price, "each"
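If a predictable order matters, one option is to sort the key, value pairs before looping. The sketch below uses the built-in sorted function, which orders the tuples by key first; the list name ordered is invented for illustration.

```python
prices = {'apples': 0.49, 'oranges': 0.99, 'pears': 1.49, 'bananas': 0.32}

ordered = []
for fruit, price in sorted(prices.items()):
    # Tuples are sorted by their first element, here the fruit name.
    ordered.append(fruit)

print(ordered)
```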
We can even use the data to modify variables:
In [ ]:
# Calculating a sum
values = [1254, 95818, 61813541, 1813, 4]
total = 0
for x in values:
    total = total + x
print total
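For this particular task Python also provides the built-in sum function. The sketch below runs the same loop and compares its result with sum, as a quick way to check the accumulation pattern.

```python
values = [1254, 95818, 61813541, 1813, 4]

# Accumulate the total by hand, as in the loop above.
total = 0
for x in values:
    total = total + x

# The built-in sum gives the same result in one call.
builtin_total = sum(values)

print(total)
print(total == builtin_total)
```

The explicit loop remains useful when each step needs more than addition, for example skipping or transforming some values.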
In [ ]:
reasonable = 10
for n in range(1,2000):
    if n == reasonable:
        break
    print n
continue causes the loop to skip immediately to the beginning of the next iteration, without completing the rest of the commands in the indented block.
In [ ]:
reasonable = 10
for n in range(1,20):
    if n == reasonable:
        continue
    print n
What is the difference between the output of these two?
We can combine loops and flow control to take actions that are more complex, and that depend on the data. First, let us define a dictionary with some names and titles, and a list with a subset of the names we will treat differently.
In [ ]:
knights = {"Sir Belvedere":"the Wise",
"Sir Lancelot":"the Brave",
"Sir Galahad":"the Pure",
"Sir Robin":"the Brave",
"The Black Knight":"John Cleese"} # create a dict with names and titles
favorites = knights.keys() # create a list of favorites with all the knights
favorites.remove("Sir Robin") # change favorites to include all but one.
print knights
print favorites
We can loop through the dict of names and titles and do one of two different things for each by putting an if statement inside the for loop:
In [ ]:
for name, title in knights.items():
    string = name + ", "
    if name in favorites: # this returns True if any of the values in favorites match.
        string = string + title
    else:
        string = string + title + ", but not quite so brave as Sir Lancelot."
    print string
In [ ]:
my_file = open("example.txt")
for line in my_file:
    print line.strip()
my_file.close()
In [ ]:
new_file = open("example2.txt", "w")
lines = ['Does this lion eat ants?', 'have you confused your cat recently?', "It's just a flesh wound!"]
for i in lines:
    new_file.write(i + '\n')
new_file.close()
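Forgetting to call close() is an easy mistake. A common alternative is the with statement, which closes the file automatically when its indented block ends. The sketch below writes a file and reads it back; the temporary directory and the names path and read_back are assumptions made so the example does not overwrite any real file.

```python
import os
import tempfile

lines = ['Does this lion eat ants?', "It's just a flesh wound!"]

# Hypothetical location: a fresh temporary directory, so nothing is overwritten.
path = os.path.join(tempfile.mkdtemp(), 'example2.txt')

# The file is closed automatically at the end of each `with` block.
with open(path, 'w') as new_file:
    for line in lines:
        new_file.write(line + '\n')

read_back = []
with open(path) as my_file:
    for line in my_file:
        read_back.append(line.strip())

print(read_back == lines)
```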
A biologist is interested in the genetic basis of height. She measures the heights of many subjects and sends their DNA samples to a core facility for genotyping arrays. These arrays determine the DNA bases at the variable sites of the genome (known as single nucleotide polymorphisms, or SNPs). Since humans are diploid, i.e. have two copies of each chromosome, each data point will be two DNA bases corresponding to the two chromosomes in each individual. At each SNP, there will be only three possible genotypes, e.g. AA, AG, GG for an A/G SNP. In order to test the correlation between a SNP genotype and height, she wants to perform a regression with an additive genetic model. However, she cannot do this with the data in its current form. She needs to convert the genotypes, e.g. AA, AG, and GG, to the numbers 0, 1, and 2, respectively (in this example the number corresponds to the number of G bases the person has at that SNP). Since she has too much data to do this manually, e.g. in Excel, she comes to you for ideas on how to transform the data efficiently.
In [ ]:
genos = ['AA', 'GG', 'AG', 'AG', 'GG']
genos_new = [] # define an empty list. Your code needs to fill in the values.
# Your code goes here
Check your work:
In [ ]:
genos_new == [0, 2, 1, 1, 2]
Sometimes there are errors and the genotype cannot be determined. Adapt your code from above to deal with this problem (in this example missing data is assigned NA for "Not Available"). Note: the input and output variables have different names here and in the previous exercise. Don't read from genos_new.
In [ ]:
genos_w_missing = ['AA', 'NA', 'GG', 'AG', 'AG', 'GG', 'NA']
genos_w_missing_new = [] # define an empty list. Your code needs to fill in the values.
# The missing data should not be converted to a number, but remain 'NA' in the new list
Check your work:
In [ ]:
genos_w_missing_new == [0, 'NA', 2, 1, 1, 2, 'NA']
In [ ]:
# Store the genotypes from genos.txt in this list
genos_from_file = []
Check your work:
In [ ]:
genos_from_file[:15] == [2, 2, 1, 1, 0, 0, 2, 2, 2, 0, 'NA', 1, 0, 0, 2]
Python has another way to perform iteration called list comprehensions. These are succinct ways to perform routine initialization or setting of variables.
In [ ]:
# Multiply every number in a list by 2 using a for loop
nums1 = [5, 1, 3, 10]
nums2 = []
for i in range(len(nums1)):
    nums2.append(nums1[i] * 2)
print nums2
In [ ]:
# Multiply every number in a list by 2 using a list comprehension
nums2 = [x * 2 for x in nums1]
print nums2
In [ ]:
# Multiply every number in a list by 2, but only if the number is greater than 4
nums1 = [5, 1, 3, 10]
nums2 = []
for i in range(len(nums1)):
    if nums1[i] > 4:
        nums2.append(nums1[i] * 2)
print nums2
In [ ]:
# And using a list comprehension
nums2 = [x * 2 for x in nums1 if x > 4]
print nums2
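List comprehensions are not limited to numbers. As a further sketch (the names long_names and long_names_loop are invented for illustration), the same filter-and-transform pattern applied to strings looks like this, with the equivalent for loop alongside for comparison.

```python
fruits = ['apples', 'oranges', 'pears', 'bananas']

# Upper-case every name that has more than 5 characters, using a comprehension.
long_names = [fruit.upper() for fruit in fruits if len(fruit) > 5]

# The same result with an explicit for loop.
long_names_loop = []
for fruit in fruits:
    if len(fruit) > 5:
        long_names_loop.append(fruit.upper())

print(long_names)
print(long_names == long_names_loop)
```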