In [ ]:
# build a random dna sequence...
from numpy import random
final_sequence_length = 80
initial_sequence_length = 0
dna_sequence = ''
my_nucleotides = ['a','t','g','c']
# the probabilities must sum to 1
my_nucleotide_probs = [0.25,0.25,0.25,0.25]
while initial_sequence_length < final_sequence_length:
    nucleotide = random.choice(my_nucleotides,p=my_nucleotide_probs)
    dna_sequence = dna_sequence + nucleotide
    initial_sequence_length = initial_sequence_length + 1
print '>random_sequence (length:%d)\n%s' % (len(dna_sequence), dna_sequence)
In [ ]:
my_dictionary = {}
As with lists, a dictionary can be initialized empty, this time with the braces {}. Dictionaries have some properties in common with lists and strings, but there are some key differences: entries are looked up by a key you choose rather than by position, and each key can appear only once.
Try printing the following dictionary, which is based on some of the data recorded in a chart we used earlier:
Group | Number of Mice | Average Mass(g) | Group Id |
---|---|---|---|
alpha | 3 | 17.0 | CGJ28371 |
beta | 5 | 16.4 | SJW99399 |
gamma | 6 | 17.8 | PWS29382 |
In [ ]:
my_mouse_exp = {'alpha_id':'CGJ28371',
                'alpha_avr_mass':17.0,
                'alpha_no_mice':'3'}
print my_mouse_exp
Based on the chart above, add the values for Group Id, Average Mass(g), and Number of Mice for the beta and gamma groups using parallel variable names (e.g. group_id...):
In [ ]:
my_mouse_exp = {'alpha_id':'CGJ28371',
                'alpha_avr_mass':17.0,
                'alpha_no_mice':'3',}
You can also explicitly add individual entries to your dictionary:
In [ ]:
my_mouse_exp['alpha_experimenter'] = 'CGJ'
print my_mouse_exp
You can also use variables and the string slicing methods we used earlier:
In [ ]:
beta_group_id = 'SJW99399'
my_mouse_exp['beta_experimenter'] = beta_group_id[0:3]
print my_mouse_exp
One important property of a dictionary is that you can call entries explicitly (rather than referencing indices like 0, 1, or 2). First, here is some terminology for a dictionary object:
A dictionary consists of keys (a key is the name you choose for an entry) and values (a value is the entry itself). Keys are generally strings, but they can be almost any object that cannot be changed in place, such as a number or a tuple; a list cannot be a key. A value can be just about anything.
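As a small illustration of which objects can serve as keys, the sketch below uses a hypothetical dictionary named example_keys (not part of the lesson data). Strings, numbers, and tuples work as keys; a list raises a TypeError. The try/except is there only so the example keeps running after the error, and print is written with parentheses so the code should behave the same under Python 2 and Python 3.

```python
# example_keys is a hypothetical dictionary used only for illustration
example_keys = {}
example_keys['alpha'] = 'a string key'
example_keys[3] = 'an integer key'
example_keys[('alpha', 3)] = 'a tuple key'

# a list cannot be used as a key because lists can be changed in place
try:
    example_keys[['alpha', 3]] = 'a list key'
    list_key_worked = True
except TypeError:
    list_key_worked = False

print(len(example_keys))   # number of entries that were stored
print(list_key_worked)     # False: the list key was rejected
```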
You can call a specific value from a dictionary by giving its key:
In [ ]:
my_mouse_exp['alpha_id']
You can also see a list of the keys a dictionary has:
In [ ]:
my_mouse_exp.keys()
You can also check the values:
In [ ]:
my_mouse_exp.values()
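Keys and values can be combined: looping over the keys and looking each one up retrieves every entry. The sketch below rebuilds a small copy of the mouse dictionary under the hypothetical name mouse_copy so that it runs on its own, and it sorts the keys so the output order is predictable.

```python
# mouse_copy is a hypothetical stand-in for my_mouse_exp
mouse_copy = {'alpha_id': 'CGJ28371',
              'alpha_avr_mass': 17.0,
              'alpha_no_mice': '3'}

lines = []
for key in sorted(mouse_copy.keys()):
    # look up the value that belongs to this key
    lines.append('%s -> %s' % (key, mouse_copy[key]))

print('\n'.join(lines))

# the `in` keyword checks whether a key is present
print('beta_id' in mouse_copy)
```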
In [ ]:
amino_acids = {
'AUA':'I', 'AUC':'I', 'AUU':'I', 'AUG':'M',
'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACU':'T',
'AAC':'N', 'AAU':'N', 'AAA':'K', 'AAG':'K',
'AGC':'S', 'AGU':'S', 'AGA':'R', 'AGG':'R',
'CUA':'L', 'CUC':'L', 'CUG':'L', 'CUU':'L',
'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCU':'P',
'CAC':'H', 'CAU':'H', 'CAA':'Q', 'CAG':'Q',
'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGU':'R',
'GUA':'V', 'GUC':'V', 'GUG':'V', 'GUU':'V',
'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCU':'A',
'GAC':'D', 'GAU':'D', 'GAA':'E', 'GAG':'E',
'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGU':'G',
'UCA':'S', 'UCC':'S', 'UCG':'S', 'UCU':'S',
'UUC':'F', 'UUU':'F', 'UUA':'L', 'UUG':'L',
'UAC':'Y', 'UAU':'Y', 'UAA':'_', 'UAG':'_',
'UGC':'C', 'UGU':'C', 'UGA':'_', 'UGG':'W'
}
In [ ]:
rna = 'AUGCAUGCGAAUGCAGCGGCUAGCAGACUGACUGUUAUGCUGGGAUCGUGCCGCUAG'
#This may or may not be helpful, but remember
#you can iterate over an arbitrary range of elements/numbers
#using the range() function
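As a hint on the range() comment above, the sketch below steps through a short made-up string (example_rna, which is not one of the exercise sequences) three letters at a time. It only splits the string into codons; looking each codon up in amino_acids is left for the exercise.

```python
example_rna = 'AUGGCCUAA'  # made-up sequence for illustration only

codons = []
# range(start, stop, step) yields 0, 3, 6 for a string of length 9
for start in range(0, len(example_rna), 3):
    codons.append(example_rna[start:start + 3])

print(codons)
```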
Does your code work on the following RNA sequence?
In [ ]:
rna = 'AUGCAAGACAGGGAUCUAUUUACGAUCAGGCAUCGAUCGAUCGAUGCUAGCUAGCGGGAUCGCACGAUACUAGCCCGAUGCUAGCUUUUAUGCUCGUAGCUGCCCGUACGUUAUUUAGCCUGCUGUGCGAAUGCAGCGGCUAGCAGACUGACUGUUAUGCUGGGAUCGUGCCGCUAG'
Bonus: Can you translate this sequence in all 3 reading frames?
In [ ]:
rna = 'AUGCAAGACAGGGAUCUAUUUACGAUCAGGCAUCGAUCGAUCGAUGCUAGCUAGCGGGAUCGCACGAUACUAGCCCGAUGCUAGCUUUUAUGCUCGUAGCUGCCCGUACGUUAUUUAGCCUGCUGUGCGAAUGCAGCGGCUAGCAGACUGACUGUUAUGCUGGGAUCGUGCCGCUAG'
In [ ]:
dna = 'ACGTCGTTTACGTACGGGAGTCGTACGATCCTCCCGTAGCTCGGGATCGTTTTATCGTAGCGGGAT'
In [ ]:
def print_double():
    print "Hello world"
    print "Hello world"
print_double()
Two things are happening in this cell: a function is defined, and then it is called. Let's examine the elements of the definition:
def - this special word indicates you are defining a new Python function
function_name(): - this is the arbitrary name for your function, followed by parentheses and a colon
instruction_block - following an indent, this is a block of instructions. Everything included in this function must be indented to this level.
There is also one other element:
print_double() - this line is the function call. As long as a function is defined above this call, the function will be run.
Fix this code block, and call the function twice:
In [ ]:
print_tripple()
def print_tripple():
    print "Hello world"
    print "Hello world"
    print "Hello world"
In [ ]:
def prints_dna_len():
    dna = 'gatgcattatcgtgagc'
prints_dna_len()
# dna was defined inside the function, so the next line raises a NameError
print dna
print len(dna)
Variables defined inside the function are local to that function. Conversely, variables defined outside of the function are global and can be read everywhere in the block of code. This concept is referred to as namespace.
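To make the local/global distinction concrete, here is a minimal sketch using hypothetical names (global_dna, local_dna, show_dna_length). The try/except appears only so the example finishes running instead of stopping at the error, and print is written with parentheses so the code should behave the same under Python 2 and Python 3.

```python
global_dna = 'gatgcattatcgtgagc'   # defined outside any function: global

def show_dna_length():
    # local_dna exists only while the function is running
    local_dna = global_dna + 'aaa'
    print(len(local_dna))

show_dna_length()

# local_dna is not defined out here, so using it raises a NameError
try:
    local_dna
    local_is_visible = True
except NameError:
    local_is_visible = False

print(local_is_visible)
```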
In [ ]:
more_dna = 'aaatcgatttttttt'
def prints_dna_twice():
    print more_dna
    print more_dna
prints_dna_twice()
In [ ]:
def dna_to_rna():
    dna_1 = 'agcttttacgtcgatcctgcta'
    rna_1 = dna_1.replace('t','u')
    return rna_1
print dna_to_rna()
print type(dna_to_rna())
Challenge: Write some functions to do the following:
Write a function that calculates the GC content of a DNA string
In [ ]:
Write a function that generates a random string of DNA of a random length
In [ ]:
The parameter can have any name: the name of the parameter becomes the name of a local variable used within the function:
In [ ]:
def prints_rna_sequence(rna):
    if 't' not in rna:
        print rna
    else:
        print 'this is not rna!'
prints_rna_sequence('agaucgagcuacgua')
prints_rna_sequence('atcgcgcatcgatct')
In the statement above, we tell the function that it should be called with one parameter, and that the value passed in should be assigned to the name rna within the function.
In [ ]:
prints_rna_sequence()
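The call above leaves out the required parameter, so Python stops with a TypeError. A minimal sketch of the same situation follows, using a hypothetical function named echo_sequence and catching the error so the example finishes running.

```python
def echo_sequence(sequence):
    # sequence is a required parameter: it has no default value
    return sequence

try:
    echo_sequence()          # no argument supplied
    call_succeeded = True
except TypeError:
    call_succeeded = False

print(call_succeeded)
print(echo_sequence('agaucg'))   # works once an argument is given
```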
In [ ]:
#.find method returns the string index (e.g. string[x]) if the search string is found
# my_string = 'abc'
# my_string.find('a') would have the value 0 (e.g. string[0])
# If there is no match to the search, the .find() function returns -1
def print_dna_and_rna(dna,rna):
    if dna.find('t')!= -1:
        print 'here is your dna %s' % dna
    elif dna.find('u')!= -1:
        print 'This is RNA!: %s' % dna
    if rna.find('t')!= -1:
        print 'This is DNA!: %s' % rna
    elif rna.find('u')!= -1:
        print 'here is your rna %s' % rna
print_dna_and_rna('agatccgtcg','uagcugacug')
print_dna_and_rna('uagcugacug','agatccgtcg')
Function parameters can also be made optional. To make a parameter optional, you must give it a default value. That value could be the keyword None, an empty value like '', or any other default value:
In [ ]:
def print_dna_and_rna(dna, rna='', number_of_times_to_print=1):
    if dna.find('t')!= -1:
        print 'here is your dna %s \n' % dna * number_of_times_to_print
    elif dna.find('u')!= -1:
        print 'This is RNA!: %s \n' % dna * number_of_times_to_print
    if rna.find('t')!= -1:
        print 'This is DNA!: %s \n' % rna * number_of_times_to_print
    elif rna.find('u')!= -1:
        print 'here is your rna %s \n'% rna * number_of_times_to_print
print_dna_and_rna('agatccgtcg',)
print_dna_and_rna('uagcugacug','agatccgtcg',2)
print_dna_and_rna('agatccgtcg', number_of_times_to_print=6)
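The cell above uses '' and 1 as defaults. As a sketch of the None case, the hypothetical function describe_sequence below treats None as "no label given" and substitutes a placeholder. The function name, the parameter names, and the 'unnamed' placeholder are assumptions made for illustration.

```python
def describe_sequence(sequence, label=None):
    # None signals that the caller did not supply a label
    if label is None:
        label = 'unnamed'
    return '%s: %s (length %d)' % (label, sequence, len(sequence))

# called without the optional parameter
print(describe_sequence('agatccgtcg'))
# called with the optional parameter given by name
print(describe_sequence('uagcugacug', label='rna_1'))
```

Testing for None with `is None` separates "nothing was passed" from an empty string, which a caller might pass on purpose.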
Challenge: Write a function that generates a random string of DNA of a random length: use optional parameters to set the length of the string and the probabilities of the nucleotides
In [ ]: