Review of String work, and moving on to lists

Let's start off with a small challenge to refresh our skills from yesterday. The code below is broken and incomplete; complete the challenge by fixing it so that it generates and prints the double-stranded DNA sequence of the HIV 'nef' gene.


In [ ]:
# store the hiv genome as a variable
hiv_genome = uggaagggcuaauucacucccaacgaagacaagauauccuugaucuguggaucuaccacacacaaggcuacuucccugauuagcagaacuacacaccagggccagggaucagauauccacugaccuuuggauggugcuacaagcuaguaccaguugagccagagaaguuagaagaagccaacaaaggagagaacaccagcuuguuacacccugugagccugcauggaauggaugacccggagagagaaguguuagaguggagguuugacagccgccuagcauuucaucacauggcccgagagcugcauccggaguacuucaagaacugcugacaucgagcuugcuacaagggacuuuccgcuggggacuuuccagggaggcguggccugggcgggacuggggaguggcgagcccucagauccugcauauaagcagcugcuuuuugccuguacugggucucucugguuagaccagaucugagccugggagcucucuggcuaacuagggaacccacugcuuaagccucaauaaagcuugccuugagugcuucaaguagugugugcccgucuguugugugacucugguaacuagagaucccucagacccuuuuagucaguguggaaaaucucuagcaguggcgcccgaacagggaccugaaagcgaaagggaaaccagaggagcucucucgacgcaggacucggcuugcugaagcgcgcacggcaagaggcgaggggcggcgacuggugaguacgccaaaaauuuugacuagcggaggcuagaaggagagagaugggugcgagagcgucaguauuaagcgggggagaauuagaucgaugggaaaaaauucgguuaaggccagggggaaagaaaaaauauaaauuaaaacauauaguaugggcaagcagggagcuagaacgauucgcaguuaauccuggccuguuagaaacaucagaaggcuguagacaaauacugggacagcuacaaccaucccuucagacaggaucagaagaacuuagaucauuauauaauacaguagcaacccucuauugugugcaucaaaggauagagauaaaagacaccaaggaagcuuuagacaagauagaggaagagcaaaacaaaaguaagaaaaaagcacagcaagcagcagcugacacaggacacagcaaucaggucagccaaaauuacccuauagugcagaacauccaggggcaaaugguacaucaggccauaucaccuagaacuuuaaaugcauggguaaaaguaguagaagagaaggcuuucagcccagaagugauacccauguuuucagcauuaucagaaggagccaccccacaagauuuaaacaccaugcuaaacacaguggggggacaucaagcagccaugcaaauguuaaaagagaccaucaaugaggaagcugcagaaugggauagagugcauccagugcaugcagggccuauugcaccaggccagaugagagaaccaaggggaagugacauagcaggaacuacuaguacccuucaggaacaaauaggauggaugacaaauaauccaccuaucccaguaggagaaauuuauaaaagauggauaauccugggauuaaauaaaauaguaagaauguauagcccuaccagcauucuggacauaagacaaggaccaaaggaacccuuuagagacuauguagaccgguucuauaaaacucuaagagccgagcaagcuucacaggagguaaaaaauuggaugacagaaaccuuguugguccaaaaugcgaacccagauuguaagacuauuuuaaaagcauugggaccagcggcuacacuagaagaaaugaugacagcaugucagggaguaggaggacccggccauaaggcaagaguuuuggcugaagcaaugagccaaguaacaaauucagcuaccauaaugaugcagagaggcaauuuuaggaaccaaagaaagauuguuaaguguuucaauuguggcaaagaagggc
acacagccagaaauugcagggccccuaggaaaaagggcuguuggaaauguggaaaggaaggacaccaaaugaaagauuguacugagagacaggcuaauuuuuuagggaagaucuggccuuccuacaagggaaggccagggaauuuucuucagagcagaccagagccaacagccccaccagaagagagcuucaggucugggguagagacaacaacucccccucagaagcaggagccgauagacaaggaacuguauccuuuaacuucccucaggucacucuuuggcaacgaccccucgucacaauaaagauaggggggcaacuaaaggaagcucuauuagauacaggagcagaugauacaguauuagaagaaaugaguuugccaggaagauggaaaccaaaaaugauagggggaauuggagguuuuaucaaaguaagacaguaugaucagauacucauagaaaucuguggacauaaagcuauagguacaguauuaguaggaccuacaccugucaacauaauuggaagaaaucuguugacucagauugguugcacuuuaaauuuucccauuagcccuauugagacuguaccaguaaaauuaaagccaggaauggauggcccaaaaguuaaacaauggccauugacagaagaaaaaauaaaagcauuaguagaaauuuguacagagauggaaaaggaagggaaaauuucaaaaauugggccugaaaauccauacaauacuccaguauuugccauaaagaaaaaagacaguacuaaauggagaaaauuaguagauuucagagaacuuaauaagagaacucaagacuucugggaaguucaauuaggaauaccacaucccgcaggguuaaaaaagaaaaaaucaguaacaguacuggaugugggugaugcauauuuuucaguucccuuagaugaagacuucaggaaguauacugcauuuaccauaccuaguauaaacaaugagacaccagggauuagauaucaguacaaugugcuuccacagggauggaaaggaucaccagcaauauuccaaaguagcaugacaaaaaucuuagagccuuuuagaaaacaaaauccagacauaguuaucuaucaauacauggaugauuuguauguaggaucugacuuagaaauagggcagcauagaacaaaaauagaggagcugagacaacaucuguugagguggggacuuaccacaccagacaaaaaacaucagaaagaaccuccauuccuuuggauggguuaugaacuccauccugauaaauggacaguacagccuauagugcugccagaaaaagacagcuggacugucaaugacauacagaaguuaguggggaaauugaauugggcaagucagauuuacccagggauuaaaguaaggcaauuauguaaacuccuuagaggaaccaaagcacuaacagaaguaauaccacuaacagaagaagcagagcuagaacuggcagaaaacagagagauucuaaaagaaccaguacauggaguguauuaugacccaucaaaagacuuaauagcagaaauacagaagcaggggcaaggccaauggacauaucaaauuuaucaagagccauuuaaaaaucugaaaacaggaaaauaugcaagaaugaggggugcccacacuaaugauguaaaacaauuaacagaggcagugcaaaaaauaaccacagaaagcauaguaauauggggaaagacuccuaaauuuaaacugcccauacaaaaggaaacaugggaaacaugguggacagaguauuggcaagccaccuggauuccugagugggaguuuguuaauaccccucccuuagugaaauuaugguaccaguuagagaaagaacccauaguaggagcagaaaccuucuauguagauggggcagcuaacagggagacuaaauuaggaaaagcaggauauguuacuaauagaggaagacaaaaaguugucacccuaacugacacaacaaaucagaagacugaguuac
aagcaauuuaucuagcuuugcaggauucgggauuagaaguaaacauaguaacagacucacaauaugcauuaggaaucauucaagcacaaccagaucaaagugaaucagaguuagucaaucaaauaauagagcaguuaauaaaaaaggaaaaggucuaucuggcauggguaccagcacacaaaggaauuggaggaaaugaacaaguagauaaauuagucagugcuggaaucaggaaaguacuauuuuuagauggaauagauaaggcccaagaugaacaugagaaauaucacaguaauuggagagcaauggcuagugauuuuaaccugccaccuguaguagcaaaagaaauaguagccagcugugauaaaugucagcuaaaaggagaagccaugcauggacaaguagacuguaguccaggaauauggcaacuagauuguacacauuuagaaggaaaaguuauccugguagcaguucauguagccaguggauauauagaagcagaaguuauuccagcagaaacagggcaggaaacagcauauuuucuuuuaaaauuagcaggaagauggccaguaaaaacaauacauacugacaauggcagcaauuucaccggugcuacgguuagggccgccuguuggugggcgggaaucaagcaggaauuuggaauucccuacaauccccaaagucaaggaguaguagaaucuaugaauaaagaauuaaagaaaauuauaggacagguaagagaucaggcugaacaucuuaagacagcaguacaaauggcaguauucauccacaauuuuaaaagaaaaggggggauugggggguacagugcaggggaaagaauaguagacauaauagcaacagacauacaaacuaaagaauuacaaaaacaaauuacaaaaauucaaaauuuucggguuuauuacagggacagcagaaauccacuuuggaaaggaccagcaaagcuccucuggaaaggugaaggggcaguaguaauacaagauaauagugacauaaaaguagugccaagaagaaaagcaaagaucauuagggauuauggaaaacagauggcaggugaugauuguguggcaaguagacaggaugaggauuagaacauggaaaaguuuaguaaaacaccauauguauguuucagggaaagcuaggggaugguuuuauagacaucacuaugaaagcccucauccaagaauaaguucagaaguacacaucccacuaggggaugcuagauugguaauaacaacauauuggggucugcauacaggagaaagagacuggcauuugggucagggagucuccauagaauggaggaaaaagagauauagcacacaaguagacccugaacuagcagaccaacuaauucaucuguauuacuuugacuguuuuucagacucugcuauaagaaaggccuuauuaggacacauaguuagcccuaggugugaauaucaagcaggacauaacaagguaggaucucuacaauacuuggcacuagcagcauuaauaacaccaaaaaagauaaagccaccuuugccuaguguuacgaaacugacagaggauagauggaacaagccccagaagaccaagggccacagagggagccacacaaugaauggacacuagagcuuuuagaggagcuuaagaaugaagcuguuagacauuuuccuaggauuuggcuccauggcuuagggcaacauaucuaugaaacuuauggggauacuugggcaggaguggaagccauaauaagaauucugcaacaacugcuguuuauccauuuucagaauugggugucgacauagcagaauaggcguuacucgacagaggagagcaagaaauggagccaguagauccuagacuagagcccuggaagcauccaggaagucagccuaaaacugcuuguaccaauugcuauuguaaaaaguguugcuuucauugccaaguuuguuucauaacaaaagccuuaggcaucuccuauggcaggaagaagcgga
gacagcgacgaagagcucaucagaacagucagacucaucaagcuucucuaucaaagcaguaaguaguacauguaacgcaaccuauaccaauaguagcaauaguagcauuaguaguagcaauaauaauagcaauaguugugugguccauaguaaucauagaauauaggaaaauauuaagacaaagaaaaauagacagguuaauugauagacuaauagaaagagcagaagacaguggcaaugagagugaaggagaaauaucagcacuuguggagauggggguggagauggggcaccaugcuccuugggauguugaugaucuguagugcuacagaaaaauugugggucacagucuauuaugggguaccuguguggaaggaagcaaccaccacucuauuuugugcaucagaugcuaaagcauaugauacagagguacauaauguuugggccacacaugccuguguacccacagaccccaacccacaagaaguaguauugguaaaugugacagaaaauuuuaacauguggaaaaaugacaugguagaacagaugcaugaggauauaaucaguuuaugggaucaaagccuaaagccauguguaaaauuaaccccacucuguguuaguuuaaagugcacugauuugaagaaugauacuaauaccaauaguaguagcgggagaaugauaauggagaaaggagagauaaaaaacugcucuuucaauaucagcacaagcauaagagguaaggugcagaaagaauaugcauuuuuuuauaaacuugauauaauaccaauagauaaugauacuaccagcuauaaguugacaaguuguaacaccucagucauuacacaggccuguccaaagguauccuuugagccaauucccauacauuauugugccccggcugguuuugcgauucuaaaauguaauaauaagacguucaauggaacaggaccauguacaaaugucagcacaguacaauguacacauggaauuaggccaguaguaucaacucaacugcuguuaaauggcagucuagcagaagaagagguaguaauuagaucugucaauuucacggacaaugcuaaaaccauaauaguacagcugaacacaucuguagaaauuaauuguacaagacccaacaacaauacaagaaaaagaauccguauccagagaggaccagggagagcauuuguuacaauaggaaaaauaggaaauaugagacaagcacauuguaacauuaguagagcaaaauggaauaacacuuuaaaacagauagcuagcaaauuaagagaacaauuuggaaauaauaaaacaauaaucuuuaagcaauccucaggaggggacccagaaauuguaacgcacaguuuuaauuguggaggggaauuuuucuacuguaauucaacacaacuguuuaauaguacuugguuuaauaguacuuggaguacugaagggucaaauaacacugaaggaagugacacaaucacccucccaugcagaauaaaacaaauuauaaacauguggcagaaaguaggaaaagcaauguaugccccucccaucaguggacaaauuagauguucaucaaauauuacagggcugcuauuaacaagagauggugguaauagcaacaaugaguccgagaucuucagaccuggaggaggagauaugagggacaauuggagaagugaauuauauaaauauaaaguaguaaaaauugaaccauuaggaguagcacccaccaaggcaaagagaagaguggugcagagagaaaaaagagcagugggaauaggagcuuuguuccuuggguucuugggagcagcaggaagcacuaugggcgcagccucaaugacgcugacgguacaggccagacaauuauugucugguauagugcagcagcagaacaauuugcugagggcuauugaggcgcaacagcaucuguugcaacucacagucuggggcaucaagcagcuccaggcaagaauccuggcuguggaaagauaccuaaa
ggaucaacagcuccuggggauuugggguugcucuggaaaacucauuugcaccacugcugugccuuggaaugcuaguuggaguaauaaaucucuggaacagauuuggaaucacacgaccuggauggagugggacagagaaauuaacaauuacacaagcuuaauacacuccuuaauugaagaaucgcaaaaccagcaagaaaagaaugaacaagaauuauuggaauuagauaaaugggcaaguuuguggaauugguuuaacauaacaaauuggcugugguauauaaaauuauucauaaugauaguaggaggcuugguagguuuaagaauaguuuuugcuguacuuucuauagugaauagaguuaggcagggauauucaccauuaucguuucagacccaccucccaaccccgaggggacccgacaggcccgaaggaauagaagaagaagguggagagagagacagagacagauccauucgauuagugaacggauccuuggcacuuaucugggacgaucugcggagccugugccucuucagcuaccaccgcuugagagacuuacucuugauuguaacgaggauuguggaacuucugggacgcagggggugggaagcccucaaauauugguggaaucuccuacaguauuggagucaggaacuaaagaauagugcuguuagcuugcucaaugccacagccauagcaguagcugaggggacagauaggguuauagaaguaguacaaggagcuuguagagcuauucgccacauaccuagaagaauaagacagggcuuggaaaggauuuugcuauaagauggguggcaaguggucaaaaaguagugugauuggauggccuacuguaagggaaagaaugagacgagcugagccagcagcagauagggugggagcagcaucucgagaccuggaaaaacauggagcaaucacaaguagcaauacagcagcuaccaaugcugcuugugccuggcuagaagcacaagaggaggaggagguggguuuuccagucacaccucagguaccuuuaagaccaaugacuuacaaggcagcuguagaucuuagccacuuuuuaaaagaaaaggggggacuggaagggcuaauucacucccaaagaagacaagauauccuugaucuguggaucuaccacacacaaggcuacuucccugauuagcagaacuacacaccagggccaggggucagauauccacugaccuuuggauggugcuacaagcuaguaccaguugagccagauaagauagaagaggccaauaaaggagagaacaccagcuuguuacacccugugagccugcaugggauggaugacccggagagagaaguguuagaguggagguuugacagccgccuagcauuucaucacguggcccgagagcugcauccggaguacuucaagaacugcugacaucgagcuugcuacaagggacuuuccgcuggggacuuuccagggaggcguggccugggcgggacuggggaguggcgagcccucagauccugcauauaagcagcugcuuuuugccuguacugggucucucugguuagaccagaucugagccugggagcucucuggcuaacuagggaacccacugcuuaagccucaauaaagcuugccuugagugcuucaaguagugugugcccgucuguugugugacucugguaacuagagaucccucagacccuuuuagucaguguggaaaaucucuagca'

# convert the HIV RNA genome to DNA

hiv_genome = hiv_genome.rep('u', t)

# isolate the nef gene (start:8797, end:9417)

nef_gene = hiv_genome[8797]

# print the nef gene in FASTA format using the header 'nef type 1 (HXB2)'

fasta_header = '>nef type 1 (HXB2)'
print fasta_heade, nef_gene

# calculate and report the GC content of the nef gene
nef_gc_content = (nef_gene.count('c') + nef_gene.count('g')) / len(nef_gene)
print "The GC content of the nef gene is: ", nef_gc_content * 100, "%"

Introducing lists

Now that we have played a bit with strings, it's time to introduce the next variable type. So far we have worked with several types of variables and data, including:

  • integers
  • floats
  • strings

The next data type is a list. Lists are just what you would expect: a collection. Lists have a few special properties we'll need to understand. Lists are:

  • ordered
  • indexed
  • iterable

Let's explore these properties by creating our own list, which in Python is done using square brackets [].


In [ ]:
my_list = []

Perhaps it seems nothing much has happened, but you should be able to verify that Python thinks that my_list is a list; please try:


In [ ]:
type(my_list)

So far, we have created [] - the empty list - and assigned it the name my_list. We can start adding things to my_list using the .append method. For example:


In [ ]:
my_list =[]
# We can add a string
my_list.append('gag')
print my_list
# We can add another string
my_list.append('pol')
print my_list
# We can add yet another string - please add the string 'env'



# We can also declare lists by naming all its members

my_other_list = ['DNA',
                 'mRNA',
                 'Protein',]
print my_other_list

A list maintains the order of its elements. Lists are indexed (starting at 0) in a way that is similar to strings.

Index List Element
0 'gag'
1 'pol'
2 'env'
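
As a small illustration of the index table above, here is a minimal sketch of list indexing. The name hiv_genes is chosen here just for the example, and print is written with parentheses so that the lines should run under both Python 2 and Python 3.

```python
# A three-element list matching the index table above
hiv_genes = ['gag', 'pol', 'env']

# Indexing starts at 0, just as it does for strings
first_gene = hiv_genes[0]

# Negative indices count backwards from the end of the list
last_gene = hiv_genes[-1]

# len() reports how many elements the list holds
gene_count = len(hiv_genes)

# Indexing lets us build a new list in whatever order we like
reordered = [hiv_genes[2], hiv_genes[0], hiv_genes[1]]

print(first_gene)
print(last_gene)
print(gene_count)
print(reordered)
```

The same idea of picking elements out by index is one possible way to approach the ordering exercise in the next cell.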

In [ ]:
# Print the list of these HIV genes in order given the list below
# The correct order is 
# gag, pol, vif, vpr, vpu, env, nef

hiv_gene_names = ['env',
                  'gag',
                  'vif',
                  'pol',
                  'vpr',
                  'vpu',
                  'nef']

Iteration and 'for' loops

This topic is important enough to get its own section! Not only are we going to talk about iteration, but we are going to introduce a very important concept in computing - a loop. In a loop, we are able to get the computer to repeat a set of instructions without us having to write out every command. This is at the heart of what makes computers useful - being able to carry out repetitive tasks without our input.

Let's look at our first for loop; to start we will use a list of nucleic acids:


In [ ]:
nucleic_acids = ['adenine',
                 'thymine',
                 'cytosine',
                 'guanine',
                 'uracil']
print nucleic_acids

If we wanted to, we could print the items in this list one by one using several print statements:


In [ ]:
print nucleic_acids[0]
print nucleic_acids[1]
print nucleic_acids[2]
print nucleic_acids[3]
print nucleic_acids[4]

In [ ]:
#Alternatively, we can do this using a for loop:

for nucleotide in nucleic_acids:
    print nucleotide

A for loop has the following structure:

for temporary_variable in iterable:

(indent)instruction[temporary_variable]

Let's break this down a bit...

  • for - a for loop must start with a for statement
  • temporary_variable - the word right after the for is the name of a special variable. This variable is a placeholder for each object in turn as the loop moves through the iterable.
  • in - this in must be included and tells Python what iterable it should execute the for loop on
  • iterable - the iterable is any ordered collection (such as a string or a list). A : must come after the iterable.
  • (indent) - the next line of a for loop must always be indented. The best practice is to use 4 spaces (not the tab key)
  • instruction - these are the instructions you want Python to execute. If your instructions make use of the variable (they don't have to), you will use temporary_variable (whatever you have named it).
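
The pieces described above can be seen together in a short sketch. The variable names (hiv_genes, gene, base and so on) are chosen here for illustration only, and print is written with parentheses so that the code should run under both Python 2 and Python 3.

```python
# The iterable can be a list...
hiv_genes = ['gag', 'pol', 'env']
upper_case_genes = []
for gene in hiv_genes:
    # 'gene' is the temporary variable; it holds one list element per pass
    upper_case_genes.append(gene.upper())

# ...or a string, in which case each pass sees one character
sequence = 'atgc'
bases_seen = []
for base in sequence:
    bases_seen.append(base)

print(upper_case_genes)
print(bases_seen)
```

Notice that the temporary variable can have any name; choosing a descriptive one (gene, base) makes the loop easier to read.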

In [ ]:
# Try the following with for loops

nucleic_acids = ['adenine',
                 'thymine',
                 'cytosine',
                 'guanine',
                 'uracil']


# Write a for loop that prints the names of the nucleotides

# Write a for loop that prints 'nucleotide!' for each of the nucleotides

# Write a for loop that prints each nucleotide name

# and its one-letter abbreviation

Conditionals

One of the key functionalities in computing is the ability to make comparisons and choices. In Python, we have several ways to use this. In each case, the answer to a conditional statement is a simple binary result: True or False. Run the following cells and also make some changes to see that you understand how Python is evaluating the statement.


In [ ]:
1 > 0

How about 1 > 0 + 1 ?


In [ ]:

How about 99 >= 99 ?


In [ ]:

What about 0 <= 1 ?


In [ ]:

And try 1 == 1


In [ ]:

The conditionals above all use comparison operators. A more complete list is as follows:

Operator Description
== Equal - True if both operands are equal
!= Not equal - True if both operands are not equal
<> Not equal - True if both operands are not equal (Python 2 only; removed in Python 3)
> Greater than - True if left operand is greater than right
< Less than - True if left operand is less than right
>= Greater than or equal to - True if left operand is greater than or equal to right
<= Less than or equal to - True if left operand is less than or equal to right
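
To see the operators in the table at work, here is a minimal sketch that stores the result of each comparison in a variable. The names left and right and the values 3 and 5 are arbitrary choices for the example; the <> operator is left out because it only exists in Python 2.

```python
left = 3
right = 5

is_equal = (left == right)           # 3 equals 5?
is_not_equal = (left != right)       # 3 differs from 5?
is_greater = (left > right)          # 3 greater than 5?
is_less = (left < right)             # 3 less than 5?
is_greater_or_equal = (left >= 3)    # 3 greater than or equal to 3?
is_less_or_equal = (left <= 2)       # 3 less than or equal to 2?

# Comparisons also work on strings
same_codon = ('atg' == 'atg')

print(is_equal)
print(is_not_equal)
print(is_greater)
print(is_less)
print(is_greater_or_equal)
print(is_less_or_equal)
print(same_codon)
```

Each result is itself a value (True or False) that can be stored, printed, or tested later.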

Random number and conditionals - Heads or Tails

Now, let's combine randomness with our conditional operators to make a simple simulation: flipping a coin.

Python has a module called NumPy. NumPy contains a number of useful functions, including the ability to generate 'random' numbers. Generating a truly random number is a science in itself, but the NumPy random module will be sufficient for our purposes. See how we use this function in the next cell:


In [ ]:
# Using the from xxx import xxx statement, we tell Python we want to use a package that 
# is not part of the default set of Python packages
# NumPy happens to be installed already for us, otherwise we would have to download it

from numpy import random

# We create a variable and then use the . notation to get the random number
# in this case, we are requesting a random int from 1 up to, but not including, 10

my_random_int = random.randint(1,10)

print 'My random int is %d' % my_random_int

# rerun this cell a few times to see that you get only the numbers 1-9

Notice a new feature in the print statement. We haven't used it before, but this string formatting feature allows us to insert the value of a variable into a string: just put %d in the string where you want an integer to appear, then, after closing the string, put another % sign followed by the variable name.
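
Here is a short sketch of the same formatting idea without any random numbers, so the output is the same every time. The variable names and the length value are illustrative only; %s (the placeholder for a string) is included alongside %d, and print is written with parentheses so that the code should run under both Python 2 and Python 3.

```python
gene_name = 'nef'
gene_length = 621   # illustrative value only

# %d marks where an integer should appear
message_int = 'The gene is %d nucleotides long' % gene_length

# %s marks where a string should appear
message_str = 'The gene is called %s' % gene_name

# Several values can be supplied together inside parentheses
message_both = '%s is %d nucleotides long' % (gene_name, gene_length)

print(message_int)
print(message_str)
print(message_both)
```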

You can also generate floats:


In [ ]:
# returns a float in the range [0.0, 1.0)
my_random_float = random.ranf()

print 'My random float is %f' % my_random_float

# You can also control precision of the float
print 'My random float is %0.3f to 3 digits' % my_random_float
print 'My random float is %0.9f to 9 digits' % my_random_float
print 'My random float is %0.30f to 30 digits' % my_random_float

# You can do this multiple times in the same string
print 'My random float is %0.3f or %0.9f' % (my_random_float, my_random_float)

if else statements

We are now ready to combine the conditions and random number generator to do our first simulation. To do so we will need to make an if else statement:


In [ ]:
if 1 == 1:
    print '1 is equal to 1'

The if statement uses the following pattern:

if conditional_to_evaluate:

(Indent) instruction

  • if - if statements begin with an if
  • conditional_to_evaluate - this is some conditional statement that Python will evaluate as True or False. This statement will be followed by a :
  • (indent) - the next line of an if statement must always be indented. The best practice is to use 4 spaces (not the tab key)
  • instruction - these are the instructions you want Python to execute. The instructions will be executed only if the conditional statement is True
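
As a minimal sketch of this pattern, the two if statements below test a toy sequence. The names sequence and messages are chosen for the example; results are collected in a list so it is easy to see which instruction actually ran.

```python
sequence = 'atggcc'
messages = []

# The condition is True, so the indented instruction runs
if len(sequence) > 3:
    messages.append('sequence is longer than one codon')

# The condition is False, so the indented instruction is skipped
if sequence == 'atg':
    messages.append('sequence is exactly a start codon')

print(messages)
```

Only the first message ends up in the list, because only the first condition evaluated to True.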

Write a few conditional statements and see what happens when the statement is True or False.


In [ ]:


In [ ]:


In [ ]:

We can supplement the if statement by telling Python what to do if the conditional is false, using the else statement:


In [ ]:
if 1 == 2:
    print 'one is now equal to two'
else:
    print 'one is NOT equal to two'

Remembering that indenting is important, try writing a few if else statements yourself:


In [ ]:


In [ ]:


In [ ]:

As powerful as if/else statements can be, we sometimes wish to let Python explore several contingencies. We do this using elif (else if), which allows us to test another condition only if the preceding if statement is False. Complete the next two cells to see an example:


In [ ]:
# What day is today? Enter it as a string below

today = ''

# Things to do

if today == 'Monday':
    print 'Walk the dog'
elif today == 'Tuesday':
    print 'Pick up the laundry'
elif today == 'Wednesday':
    print 'Go shopping'
elif today == 'Thursday':
    print 'Call mom'
elif today == 'Friday':
    print 'Plan for the weekend'
else:
    print 'It must be the weekend, nothing to do'

To recap: the above if/elif/else statement covered several explicit contingencies (if the day of the week was Monday-Friday) as well as a final contingency if none of the above were True (the final else statement). Write a statement below using the if/elif/else chain of conditionals. Remember to pay attention to indenting.

Putting it all together

Using what you have learned so far, write some code to simulate flipping a coin.


In [ ]:
# Use the random number function of NumPy to generate a float

# If the float is greater than or equal to 0.5 consider that 'Heads' otherwise 'Tails'

# work with your partner to write out the steps in pseudocode first, then use this cell to try.

Simulating mutation of the HIV genome

Mutation is (at least in part) a random process that drives the change of a genome. Viruses in particular use this to their advantage. Mutations in viruses can allow them to evade their hosts' immune responses, confer drug resistance, or even lead to the acquisition of new functions.

According to Abram et al. 2010, the mutation rate for the HIV-1 genome is about 4.4E-05 or 0.000044 mutations per single cell infection cycle. The most common mutation type is the single nucleotide polymorphism (SNP).

In our toy simulation we will use Python to simulate the following:

  • flip a coin weighted to the probability of an HIV-1 mutation (genome size * mutation rate)
  • choose a random nucleotide in the HIV-1 genome to mutate (using the .randint() method)
  • flip a weighted coin to choose what type of mutation it should be (using the following information, and assuming the genome size is 9181 nucleotides)
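
The weighting in the first step can be worked out with a little arithmetic. This is only a sketch under the assumptions stated in the text: the rate is treated as applying to each nucleotide (which is what "genome size * mutation rate" implies), and the genome size is taken to be 9181. The variable names are chosen for the example.

```python
genome_size = 9181        # nucleotides, as assumed in the text
mutation_rate = 4.4e-05   # rate quoted in the text

# Probability that a mutation occurs in one round of replication
mutation_probability = genome_size * mutation_rate

# The remaining probability covers the 'no mutation' outcome
no_mutation_probability = 1.0 - mutation_probability

print('P(mutation) = %0.6f' % mutation_probability)
print('P(no mutation) = %0.6f' % no_mutation_probability)
```

Because the two values sum to 1.0, they could serve as the pair of probabilities for a two-state weighted coin.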

Here are some code examples that will help:


In [ ]:
# unfair coin

from numpy import random

# Coins have two sides (states) - heads or tails; use these as a list
coin_state = ['Heads','Tails']
              
# A fair coin would have a 50/50 chance of being heads or tails. Represent these probabilities as
# floats which sum to 1.0
fair_coin_probabilities = [0.5,0.5]

#flip the fair coin using numpy's random.choice method
fair_flip = random.choice(coin_state,p = fair_coin_probabilities)

#print the result
print "My fair coin is %s" %fair_flip

# An unfair coin could be weighted like this
unfair_coin_probabilities = [0.1,0.9]

# Therefore...
unfair_flip = random.choice(coin_state,p = unfair_coin_probabilities)
print "My unfair coin is %s" %unfair_flip

1. Write a simulation which determines whether, in one round of replication, HIV will mutate or not


In [ ]:
# Set the states (mutation,no_mutation)

# Set the probabilities

# flip the coin (make the choice)

2. Determine how often HIV would mutate in 20 rounds of replication

We will use a for loop to repeat the coin flip 20 times. We can use a special function, range(), to tell Python how many times to execute the for loop. Use the following coin flipping example to improve your HIV simulation.
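
Before flipping any coins, it may help to look at what range() itself produces. The sketch below uses names chosen for the example (rounds, round_count, replication_round) and wraps range() in list() so that the values can be inspected under both Python 2 and Python 3.

```python
# range(1, 21) starts at 1 and stops just before 21
rounds = list(range(1, 21))

# Count how many times the loop body runs
round_count = 0
for replication_round in range(1, 21):
    round_count = round_count + 1

print(rounds)
print(round_count)
```

The loop body runs 20 times, once for each of the values 1 through 20; the stop value (21) is never reached.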


In [ ]:
from numpy import random
coin_state = ['Heads','Tails']
fair_coin_probabilities = [0.5,0.5]

for flip in range(1,21):
    fair_flip = random.choice(coin_state,p = fair_coin_probabilities)
    print fair_flip

You can take this even further by saving the result as a list:


In [ ]:
from numpy import random
coin_state = ['Heads','Tails']
fair_coin_probabilities = [0.5,0.5]

# tip: notice how the list is created before the for loop. If you declared 
# flip_results = [] in the for loop, it would be reset 20 times

flip_results = []

for flip in range(1,21):
    fair_flip = random.choice(coin_state,p = fair_coin_probabilities)
    flip_results.append(fair_flip)

Don't forget you can print the result to see the list:


In [ ]:
print flip_results

3. If HIV is in the mutation state, determine which nucleotide to mutate

Let's use our coin to determine if I should walk the dog on Monday or Tuesday:


In [ ]:
from numpy import random
coin_state = ['Heads','Tails']
fair_coin_probabilities = [0.5,0.5]
flip_results = []

for flip in range(1,21):
    fair_flip = random.choice(coin_state,p = fair_coin_probabilities)
    flip_results.append(fair_flip)

# Tip - pay attention to the indenting in this for loop that contains an if/else statement
for result in flip_results:
    if result == 'Heads':
        print "Walk the dog Monday"
    elif result == 'Tails':
        print "Walk the dog Tuesday"

Besides using the print instruction, I can also place my results into a new list based on the conditional outcome:


In [ ]:
from numpy import random
coin_state = ['Heads','Tails']
fair_coin_probabilities = [0.5,0.5]
flip_results = []

# Initialize some new lists for my conditional outcomes

monday_results = []
tuesday_results = []

for flip in range(1,21):
    fair_flip = random.choice(coin_state,p = fair_coin_probabilities)
    flip_results.append(fair_flip)
    
for result in flip_results:
    if result == 'Heads':
        monday_results.append("Walk the dog Monday")
    elif result == 'Tails':
        tuesday_results.append("Walk the dog Tuesday")

        
# We can print how many times we had each type of result stored in our lists

print "My coin said to walk the dog Monday %d times" % len(monday_results)
print "My coin said to walk the dog Tuesday %d times" % len(tuesday_results)

Use the above examples, and your knowledge of how to slice strings, to:

  • determine which nucleotide in the HIV-1 genome to mutate
  • flip a coin weighted to the probabilities of mutation given in the 'Class 1: single nt substitution' chart above. In that chart, the number of observed mutations from a nucleotide on the y-axis to one on the x-axis is shown.
  • use the replace() function to mutate your HIV-1 genome
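
One thing to watch for in the last step: calling replace() on the whole genome changes every matching nucleotide, not just one. The sketch below shows one possible way to change a single position by slicing around it. It uses a toy sequence, and the position and new base are picked by hand purely for illustration; in the exercise they would come from your random choices.

```python
genome = 'atgcatgc'   # toy sequence standing in for the HIV-1 genome
position = 3          # index of the nucleotide to mutate (picked by hand)
new_base = 't'        # replacement nucleotide (picked by hand)

# Look up the nucleotide currently at that position
original_base = genome[position]

# replace() on the whole string changes every matching nucleotide
all_replaced = genome.replace(original_base, new_base)

# Slicing around the position changes only that one nucleotide
mutated = genome[:position] + new_base + genome[position + 1:]

print(original_base)
print(all_replaced)
print(mutated)
```

Comparing all_replaced with mutated shows the difference: the first has lost both 'c' nucleotides, while the second differs from the original at index 3 only.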

Bonus

  • determine and report in which gene your mutations arise (ignore genes less than 200nt)
  • determine and report if the mutation in any particular gene introduces a stop codon in reading frame one
  • determine and report if the mutation in any particular gene introduces a stop codon in the actual reading frame of that gene

In [ ]: