When an operation needs to be repeated multiple times, for example on all of the items in a list, we avoid having to type (or copy and paste) repetitive code by creating a loop. There are two ways of creating loops in Python, the for loop and the while loop.
The for loop in Python iterates over each item in a sequence (such as a list or tuple) in the order that they appear in the sequence. This means that a variable (code in the example below) is set to each item from the sequence of values in turn, and each time this happens the indented block of code is executed again.
In [ ]:
codeList = ['NA06984', 'NA06985', 'NA06986', 'NA06989', 'NA06991']
for code in codeList:
    print(code)
A for loop can iterate over the individual characters in a string:
In [ ]:
dnaSequence = 'ATGGTGTTGCC'
for base in dnaSequence:
    print(base)
And also over the keys of a dictionary:
In [ ]:
rnaMassDict = {"G":345.21, "C":305.18, "A":329.21, "U":302.16}
for x in rnaMassDict:
    print(x, rnaMassDict[x])
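As an aside, dictionaries also provide the .items() method, which yields each key together with its value, so the lookup inside the loop is not needed. A minimal sketch using the same dictionary (the loop variable names base and mass are illustrative, not from the original example):

```python
rnaMassDict = {"G": 345.21, "C": 305.18, "A": 329.21, "U": 302.16}

# .items() yields (key, value) pairs, unpacked here into two loop variables
for base, mass in rnaMassDict.items():
    print(base, mass)
```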
Any variables that are defined before the loop can be accessed from inside the loop. For example, to calculate the sum of the items in a list of values, we could initially set the total to zero and add each value to the total in the loop:
In [ ]:
total = 0
values = [1, 2, 4, 8, 16]
for v in values:
    total = total + v
    # total += v
    print(total)
print(total)
Naturally we can combine a for loop with an if statement, noting that we need two indentation levels, one for the outer loop and another for the conditional blocks:
In [ ]:
geneExpression = {
    'Beta-Catenin': 2.5,
    'Beta-Actin': 1.7,
    'Pax6': 0,
    'HoxA2': -3.2
}
for gene in geneExpression:
    if geneExpression[gene] < 0:
        print(gene, "is downregulated")
    elif geneExpression[gene] > 0:
        print(gene, "is upregulated")
    else:
        print("No change in expression of", gene)
In addition to the for loop that operates on a collection of items, there is a while loop that simply repeats while some statement evaluates to True and stops when it is False. Note that if the tested expression never evaluates to False then you have an “infinite loop”, which will keep running until the program is interrupted.
In this example we generate a series of numbers by doubling a value after each iteration, until a limit is reached:
In [ ]:
value = 0.25
while value < 8:
    value = value * 2
    print(value)
print("final value:", value)
What's going on here is that the value is doubled in each iteration, and once it gets to 8 the while test fails (8 is not less than 8) and that last value is preserved. Note that if the test were instead value <= 8 then we would get one more doubling and the value would reach 16.
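The effect of changing the test to value <= 8 can be checked directly. A minimal sketch, reusing the loop above with only the comparison changed:

```python
value = 0.25
while value <= 8:  # <= instead of <, so the loop also runs when value is 8
    value = value * 2
    print(value)

print("final value:", value)
```

The last value printed inside the loop, and the final value, is now 16.0 rather than 8.0.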
Python has two ways of affecting the flow of the for or while loop inside the block. The continue statement means that the rest of the code in the block is skipped for this particular item in the collection, i.e. jump to the next iteration. In this example negative numbers are left out of a summation:
In [ ]:
values = [10, -5, 3, -1, 7]
total = 0
for v in values:
    if v < 0:
        continue # Skip this iteration
    total += v
print(total)
The other way of affecting a loop is with the break statement. In contrast to the continue statement, this immediately causes all looping to finish, and execution is resumed at the next statement after the loop.
In [ ]:
geneticCode = {'TAT': 'Tyrosine', 'TAC': 'Tyrosine',
               'CAA': 'Glutamine', 'CAG': 'Glutamine',
               'TAG': 'STOP'}
sequence = ['CAG','TAC','CAA','TAG','TAC','CAG','CAA']
for codon in sequence:
    if geneticCode[codon] == 'STOP':
        break # Quit looping at this point
    else:
        print(geneticCode[codon])
An internal counter is used to keep track of which item is used next, and this is incremented on each iteration. When this counter has reached the length of the sequence the loop terminates. This means that if you delete the current item from the sequence, the next item will be skipped (since it gets the index of the current item which has already been treated). Likewise, if you insert an item in a sequence before the current item, the current item will be treated again the next time through the loop. This can lead to nasty bugs that can be avoided by making a temporary copy using a slice of the whole sequence.
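The skipping behaviour described above can be seen in a small sketch. The list contents and the names buggy and safe are illustrative assumptions, chosen only to show the effect:

```python
buggy = [1, -2, -3, 4]
# Removing items from the list being looped over: once -2 is removed,
# -3 moves into the position just visited and is never examined
for v in buggy:
    if v < 0:
        buggy.remove(v)
print(buggy)

safe = [1, -2, -3, 4]
# Looping over a temporary copy (a slice of the whole list) avoids this,
# because the copy is not changed while the original is modified
for v in safe[:]:
    if v < 0:
        safe.remove(v)
print(safe)
```

The first loop leaves -3 in the list, while the second removes both negative values.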
If you would like to iterate over a numeric sequence then this is possible by combining the range() function and a for loop.
In [ ]:
print(list(range(10)))
print(list(range(5, 10)))
print(list(range(0, 10, 3)))
print(list(range(7, 2, -2)))
Looping through ranges
In [ ]:
for x in range(8):
    print(x*x)
In [ ]:
squares = []
for x in range(8):
    s = x*x
    squares.append(s)
print(squares)
Given a sequence, enumerate() allows you to iterate over the sequence, generating a tuple containing each value along with a corresponding index.
In [ ]:
letters = ['A','C','G','T']
for index, letter in enumerate(letters):
    print(index, letter)
In [ ]:
numbered_letters = list(enumerate(letters))
print(numbered_letters)
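By default the index produced by enumerate() starts at 0. If numbering from 1 is more convenient, the optional start argument can be used. A minimal sketch (the loop variable name position is illustrative):

```python
letters = ['A', 'C', 'G', 'T']

# start=1 makes the numbering begin at 1 rather than the default 0
for position, letter in enumerate(letters, start=1):
    print(position, letter)
```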
In [ ]:
city_pops = {
    'London': 8200000,
    'Cambridge': 130000,
    'Edinburgh': 420000,
    'Glasgow': 1200000
}
big_cities = []
for city in city_pops:
    if city_pops[city] >= 1000000:
        big_cities.append(city)
print(big_cities)
In [ ]:
total = 0
for city in city_pops:
    total += city_pops[city]
print("total population:", total)
In [ ]:
pops = list(city_pops.values())
print("total population:", sum(pops))
Constructing more complex strings from a mix of variables of different types can be cumbersome, and sometimes you want more control over how values are interpolated into a string. Python provides a powerful mechanism for formatting strings with the built-in .format() string method, which uses "replacement fields" surrounded by curly braces {}. Each replacement field starts with an optional field name, followed by a colon : and a format specification.
There are lots of these specifiers, but here are 3 useful ones:
d: decimal integer
f: floating point number
s: string
You can specify the number of decimal places to use in a floating point number with, e.g., .2f to use 2 decimal places, or +.2f to use 2 decimal places and always show the sign.
In [ ]:
print('{:.2f}'.format(0.4567))
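To see the three specifiers listed above side by side, here is a minimal sketch. The values being formatted are arbitrary illustrations:

```python
print('{:d}'.format(42))          # d: decimal integer
print('{:.2f}'.format(3.14159))   # f: floating point, 2 decimal places
print('{:+.2f}'.format(3.14159))  # f with +: the sign is always shown
print('{:s}'.format('Pax6'))      # s: string
```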
In [ ]:
geneExpression = {
    'Beta-Catenin': 2.5,
    'Beta-Actin': 1.7,
    'Pax6': 0,
    'HoxA2': -3.2
}
for gene in geneExpression:
    print('{:s}\t{:+.2f}'.format(gene, geneExpression[gene])) # s is optional
    # could also be written using variable names
    #print('{gene:s}\t{exp:+.2f}'.format(gene=gene, exp=geneExpression[gene]))
gc, which we will use to count the number of Gs or Cs in our sequence.
gc variable.
Go to our next notebook: python_basic_2_3