Review of lists, loops, and more...

Here is another broken piece of code. Using what you learned in yesterday's lessons:

  • fix what is broken
  • make comments to explain what is going on line-by-line (talk to the duck)
  • if you have time make some improvements!

In [ ]:
# build a random dna sequence...

from numpy import random
final_sequence_length = eighty
initial_sequence_length = 81
dna_sequence = ''
my_nucleotides = [a,t,g,c]
my_nucleotide_probs = [0.25,0.25,0.25,0.3]
while initial_sequence_length < final_sequence_length:
nucleotide = random.choice(my_nucleotides,p=my_nucleotide_p)
dna_sequence =  dna_sequence + nucleotide
initial_sequence_length = initial_sequence_length + 1
print '>random_sequence (length:%d)\n%s' % (len(dna_sequence), dna_sequence)

Moving on to dictionaries

We are just about done with the data structures we will use in this Python course. There are other Python data structures, and many more concepts, but for now we will consider dictionaries:

Check the type of my_dictionary in the cell below


In [ ]:
my_dictionary = {}

As with lists, a dictionary can be initialized empty, this time with braces {}. Dictionaries have some properties in common with lists and strings, but there are some key differences. Dictionaries are:

  • iterable
  • unordered
  • indexed (by keys)
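
A minimal sketch of these three properties, using a small hypothetical dictionary (mouse_counts is only an illustration, not part of the lesson data). The print calls use parentheses so the cell should run under either Python 2 or Python 3.

```python
# Hypothetical example dictionary: group name -> number of mice
mouse_counts = {'alpha': 3, 'beta': 5, 'gamma': 6}

# indexed (by keys): look up a value using its key, not a position
alpha_count = mouse_counts['alpha']
print(alpha_count)

# iterable: a for loop visits each key in turn
# unordered: the key order should not be relied on, so sorted() is used here
group_names = []
for group in sorted(mouse_counts):
    group_names.append(group)
print(group_names)
```

Asking for mouse_counts[0] would fail with a KeyError, because 0 is not one of the keys.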

Try printing the following dictionary based on some of the data recorded in a chart we used earlier

Group   Number of Mice   Average Mass (g)   Group Id
alpha   3                17.0               CGJ28371
beta    5                16.4               SJW99399
gamma   6                17.8               PWS29382

In [ ]:
my_mouse_exp = {'alpha_id':'CGJ28371',
                'alpha_avr_mass':17.0,
                'alpha_no_mice':'3'}

print my_mouse_exp

Based on the chart above, add the values for Group Id, Average Mass (g), and Number of Mice for the beta and gamma groups using parallel variable names (e.g. group_id...):


In [ ]:
my_mouse_exp = {'alpha_id':'CGJ28371',
                'alpha_avr_mass':17.0,
                'alpha_no_mice':'3',}

You can also explicitly add individual entries to your dictionary:


In [ ]:
my_mouse_exp['alpha_experimenter'] = 'CGJ'
print my_mouse_exp

You can also use variables and other string slicing methods we used earlier:


In [ ]:
beta_group_id = 'SJW99399'

my_mouse_exp['beta_experimenter'] = beta_group_id[0:3]
print my_mouse_exp

One important property of a dictionary is that you can look up entries explicitly by name (rather than referencing indices like 0, 1, or 2). First, here is some terminology for a dictionary object:

dictionary = { key:value }

A dictionary consists of some key (this is the name you choose for your entry) and some value (this is the entry itself). Keys are generally strings, but they can be any immutable (hashable) object, such as a number or a tuple; a list cannot be used as a key. A value can be just about anything.
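
To make the rule about keys concrete, here is a small sketch. The dictionary cage_info and its contents are made up for illustration, and the try/except construct is only used here to catch the error so the cell keeps running.

```python
# Hypothetical dictionary used only to illustrate which keys are allowed
cage_info = {}

cage_info['alpha'] = 'string key'        # strings are the usual choice
cage_info[7] = 'integer key'             # numbers work too
cage_info[('alpha', 1)] = 'tuple key'    # tuples cannot change, so they work

# A list can change after it is created, so Python refuses it as a key
try:
    cage_info[['alpha', 1]] = 'list key'
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(len(cage_info))
print(list_key_allowed)
```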

You can call a specific value from a dictionary by giving its key:


In [ ]:
my_mouse_exp['alpha_id']

You can also see a list of the keys a dictionary has:


In [ ]:
my_mouse_exp.keys()

You can also check the values:


In [ ]:
my_mouse_exp.values()
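
Keys and values are often used together. The sketch below loops over the keys and looks up each value in turn; it rebuilds my_mouse_exp so that it is self-contained, and the list key_value_lines is a name introduced just for this example. Note that in Python 3 .keys() returns a view rather than a list, so sorted() is used to get a list in a predictable order in either version.

```python
# Same entries as the alpha group dictionary used above
my_mouse_exp = {'alpha_id': 'CGJ28371',
                'alpha_avr_mass': 17.0,
                'alpha_no_mice': '3'}

# Visit the keys in alphabetical order and look up each value by its key
key_value_lines = []
for key in sorted(my_mouse_exp.keys()):
    key_value_lines.append('%s -> %s' % (key, my_mouse_exp[key]))

for line in key_value_lines:
    print(line)
```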

Translating RNA > Protein

RNA codons are translated into amino acids according to a standard genetic code (see chart below); amino acids are represented here by their one-letter abbreviations.


In [ ]:
amino_acids = {
    'AUA':'I', 'AUC':'I', 'AUU':'I', 'AUG':'M',
    'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACU':'T',
    'AAC':'N', 'AAU':'N', 'AAA':'K', 'AAG':'K',
    'AGC':'S', 'AGU':'S', 'AGA':'R', 'AGG':'R',
    'CUA':'L', 'CUC':'L', 'CUG':'L', 'CUU':'L',
    'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCU':'P',
    'CAC':'H', 'CAU':'H', 'CAA':'Q', 'CAG':'Q',
    'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGU':'R',
    'GUA':'V', 'GUC':'V', 'GUG':'V', 'GUU':'V',
    'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCU':'A',
    'GAC':'D', 'GAU':'D', 'GAA':'E', 'GAG':'E',
    'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGU':'G',
    'UCA':'S', 'UCC':'S', 'UCG':'S', 'UCU':'S',
    'UUC':'F', 'UUU':'F', 'UUA':'L', 'UUG':'L',
    'UAC':'Y', 'UAU':'Y', 'UAA':'_', 'UAG':'_',
    'UGC':'C', 'UGU':'C', 'UGA':'_', 'UGG':'W'
}
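
As a warm-up, a single codon can be looked up like any other dictionary key. This sketch uses a three-entry subset of the table (codon_subset and rna_fragment are names made up for the example) and only handles the first codon; it is a hint, not a full translation.

```python
# A small subset of the codon table, just for illustration
codon_subset = {'AUG': 'M', 'CAU': 'H', 'UAG': '_'}

rna_fragment = 'AUGCAU'

# A codon is three letters, so slice the first three characters
first_codon = rna_fragment[0:3]

# Use the codon as the key to get its one-letter amino acid
first_amino_acid = codon_subset[first_codon]

print(first_codon)
print(first_amino_acid)
```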

Using what you have learned so far

Write the appropriate code to translate an RNA string to a protein sequence:


In [ ]:
rna = 'AUGCAUGCGAAUGCAGCGGCUAGCAGACUGACUGUUAUGCUGGGAUCGUGCCGCUAG'


#This may or may not be helpful, but remember
#you can iterate over an arbitrary range of elements/numbers
#using the range() function

Does your code work on the following RNA sequence?


In [ ]:
rna = 'AUGCAAGACAGGGAUCUAUUUACGAUCAGGCAUCGAUCGAUCGAUGCUAGCUAGCGGGAUCGCACGAUACUAGCCCGAUGCUAGCUUUUAUGCUCGUAGCUGCCCGUACGUUAUUUAGCCUGCUGUGCGAAUGCAGCGGCUAGCAGACUGACUGUUAUGCUGGGAUCGUGCCGCUAG'

Bonus: Can you translate this sequence in all 3 reading frames?


In [ ]:
rna = 'AUGCAAGACAGGGAUCUAUUUACGAUCAGGCAUCGAUCGAUCGAUGCUAGCUAGCGGGAUCGCACGAUACUAGCCCGAUGCUAGCUUUUAUGCUCGUAGCUGCCCGUACGUUAUUUAGCCUGCUGUGCGAAUGCAGCGGCUAGCAGACUGACUGUUAUGCUGGGAUCGUGCCGCUAG'

Translating DNA to RNA to protein

Now that we have worked with each of the major biological sequences, you should be able to convert a DNA sequence into RNA, and then translate that RNA into a protein.


In [ ]:
dna = 'ACGTCGTTTACGTACGGGAGTCGTACGATCCTCCCGTAGCTCGGGATCGTTTTATCGTAGCGGGAT'

FUNctions

Now that we have learned how to do several things together, let's wrap them into a function. We have already been using several functions, so now we can write our own:


In [ ]:
def print_double():
    print "Hello world"
    print "Hello world"
    
print_double()

Two things are happening in this cell: the function is defined, and then it is called. Let's examine the elements of the definition first:

def function_name( ):
### (indent) instruction_block

  • def - this special word indicates you are defining a new Python function
  • function_name(): - this is the arbitrary name for your function, followed by parentheses
  • instruction_block - following an indent, this is a block of instructions. Everything included in this function must be indented to this level.

There is also one other element:

function_name( )

This line is the function call. As long as a function is defined above this call, the function will be run.

Fix this code block, and call the function twice:


In [ ]:
print_tripple()

def print_tripple():
    print "Hello world"
    print "Hello world"
    print "Hello world"

Local vs global

One other important element of functions is that variables created inside the function are not defined outside of the function:


In [ ]:
def prints_dna_len():
    dna = 'gatgcattatcgtgagc'
    

prints_dna_len()
  
print dna
print len(dna)

Variables defined inside the function are local to that function. Conversely, variables defined outside of the function are global and can be read everywhere in the block of code, including inside functions. This concept is referred to as namespace (or scope).


In [ ]:
more_dna = 'aaatcgatttttttt'

def prints_dna_twice():
    print more_dna
    print more_dna
    
prints_dna_twice()
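
Both halves of this rule can be seen side by side in one sketch. The names shows_scope, global_dna, and local_dna are made up for this example, and try/except is only used to catch the error that the earlier cell produced so that the code keeps running.

```python
global_dna = 'aaatcgatttttttt'        # defined outside: global

def shows_scope():
    local_dna = 'gatgcattatcgtgagc'   # defined inside: local
    print(global_dna)                 # a function can read a global variable
    print(local_dna)                  # and, of course, its own local variable

shows_scope()

# Outside the function, local_dna does not exist, so this raises NameError
try:
    print(local_dna)
    local_is_visible = True
except NameError:
    local_is_visible = False

print(local_is_visible)
```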

Returning values

Functions can also themselves return a value for use in other parts of your code. In this case the return keyword explicitly returns the value rna_1.


In [ ]:
def dna_to_rna():
    dna_1 = 'agcttttacgtcgatcctgcta'
    rna_1 = dna_1.replace('t','u')
    return rna_1


print dna_to_rna()
print type(dna_to_rna())
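
A returned value does not have to be printed straight away. As a small sketch, the result can be stored in a variable and reused; my_rna and rna_length are names chosen for this example, and the function is repeated so the cell is self-contained.

```python
def dna_to_rna():
    dna_1 = 'agcttttacgtcgatcctgcta'
    rna_1 = dna_1.replace('t', 'u')
    return rna_1

# Store the returned string so it can be used later in the code
my_rna = dna_to_rna()

# The stored value behaves like any other string
rna_length = len(my_rna)

print(my_rna)
print(rna_length)
```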

Challenge: Write some functions to do the following:

Write a function that calculates the GC content of a DNA string


In [ ]:

Write a function that generates a random string of DNA of a random length


In [ ]:

Parameters

Functions can also accept one or more parameters. We could expand our definition like this:

def function_name(parameter1, parameter2, parameterN):
### (indent) instruction_block

The parameter can have any name: the name of the parameter becomes the name of a local variable used within the function:


In [ ]:
def prints_rna_sequence(rna):
    if 't' not in rna:
        print rna
    else:
        print 'this is not rna!'
    
prints_rna_sequence('agaucgagcuacgua')
prints_rna_sequence('atcgcgcatcgatct')

In the definition above, we tell the function that it should be called with one parameter, and that the value passed in should be assigned to the local name rna within the function. Calling the function without that argument raises a TypeError:


In [ ]:
prints_rna_sequence()
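
Parameters combine naturally with return values. As a sketch, here is a version of the earlier DNA-to-RNA conversion that works on whatever sequence is passed in; the name transcribe_dna is made up for this example.

```python
def transcribe_dna(dna):
    # dna is a local name for whatever string the caller passes in
    rna = dna.replace('t', 'u')
    return rna

# The same function can now be reused with different sequences
first_rna = transcribe_dna('agatccgtcg')
second_rna = transcribe_dna('atcgcgcatcgatct')

print(first_rna)
print(second_rna)
```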

In [ ]:
# The .find() method returns the index where the search string first occurs
# my_string = 'abc'
# my_string.find('a') would return 0 (i.e. my_string[0] is 'a')
# If there is no match to the search, the .find() method returns -1


def print_dna_and_rna(dna,rna):
    if dna.find('t')!= -1:
        print 'here is your dna %s' % dna
    elif dna.find('u')!= -1:
        print 'This is RNA!: %s' % dna
    if rna.find('t')!= -1:
        print 'This is DNA!: %s' % rna
    elif rna.find('u')!= -1:
        print 'here is your rna %s' % rna


print_dna_and_rna('agatccgtcg','uagcugacug')
print_dna_and_rna('uagcugacug','agatccgtcg')
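
The return values of .find() described in the comments can be checked directly. In this short sketch the variable names are chosen for the example only.

```python
my_string = 'abc'

index_of_a = my_string.find('a')   # 'a' is at position 0
index_of_c = my_string.find('c')   # 'c' is at position 2
index_of_z = my_string.find('z')   # 'z' is not in the string, so -1

print(index_of_a)
print(index_of_c)
print(index_of_z)

# Comparing against -1 gives the same answer as the 'in' test used earlier
found_with_find = my_string.find('b') != -1
found_with_in = 'b' in my_string
print(found_with_find == found_with_in)
```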

Function parameters can also be made optional. To make a parameter optional, you must give it a default value. That value could be the keyword None, an empty value like '', or any other value that makes sense as a default:


In [ ]:
def print_dna_and_rna(dna, rna='', number_of_times_to_print=1):
    if dna.find('t')!= -1:
        print 'here is your dna %s \n' % dna * number_of_times_to_print 
    elif dna.find('u')!= -1:
        print 'This is RNA!: %s \n' % dna * number_of_times_to_print 
    if rna.find('t')!= -1:
        print 'This is DNA!: %s \n' % rna * number_of_times_to_print 
    elif rna.find('u')!= -1:
        print 'here is your rna %s \n'% rna * number_of_times_to_print 

print_dna_and_rna('agatccgtcg',)
print_dna_and_rna('uagcugacug','agatccgtcg',2)
print_dna_and_rna('agatccgtcg', number_of_times_to_print=6)
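
The cell above uses '' and 1 as defaults. Here is a sketch of the third option, None, which is handy when "nothing was given" needs to be told apart from an empty string. The function describe_sequence and its parameter names are hypothetical and written only for this illustration.

```python
def describe_sequence(sequence, label=None, repeats=1):
    # If the caller did not supply a label, fall back to a generic one
    if label is None:
        label = 'sequence'
    line = '%s: %s' % (label, sequence)
    # Return the same line as many times as requested
    return [line] * repeats

# Only the required parameter
default_result = describe_sequence('agatccgtcg')

# Optional parameters given by name, in any order
named_result = describe_sequence('uagcugacug', repeats=2, label='rna')

print(default_result)
print(named_result)
```

Passing optional parameters by name, as in the last call, means you do not need to remember their order.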

Challenge: Write a function that generates a random string of DNA of a random length. Use optional parameters to set the length of the string and the probabilities of the nucleotides.


In [ ]: