Dictionaries, lists and looping structures

Overview

Recap:

  • Dictionary

    • Structure for storing values with keys
  • List

    • Structure for storing ordered values
  • Loop (for/while)

    • Allows iterative processing (e.g. line-by-line)

Recap on strings, printing etc...


In [10]:
message = "Hello world"
print ("My message is:", message)


My message is: Hello world

Vanilla use of print


In [37]:
message = "Hello world"
print ("My 1st message is:", message)
print ("My 2nd message is: %s" % message)


My 1st message is: Hello world
My 2nd message is: Hello world

The second print uses a “formatter”. Formatters (%s, %r and %d) provide an alternative way to print: each symbol in the string is replaced by the corresponding variable given after the % operator. When printing many variables in a single statement, this is often more concise and easier to read.


In [13]:
message = "Hello world"
print ("My 1st message is:", message, end=". ")
print ("My 2nd message is: %s " % message)


My 1st message is: Hello world. My 2nd message is: Hello world 

By default, print adds a newline character (\n) after its output. You can change this with the end argument: end="" suppresses the newline, and end=". " (as above) replaces it with other text.
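A small sketch of how the end and sep arguments change what print writes. So that the result can be inspected, it prints into an in-memory buffer (io.StringIO, passed through print's file argument); the names buffer and captured are illustrative only.

```python
import io

buffer = io.StringIO()                         # in-memory text stream standing in for the screen

print("A", "B", file=buffer)                   # defaults: sep=" " and end="\n"
print("A", "B", sep="-", end="", file=buffer)  # custom separator, no newline
print("C", file=buffer)                        # so this continues on the same line

captured = buffer.getvalue()
print(repr(captured))                          # repr() shows the newline characters explicitly
```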

More on strings, printing etc...


In [39]:
Name = "Romissa"
Donut_number = 30
print("%s has %d donuts" % (Name, Donut_number))


Romissa has 30 donuts

%s is intended for string substitution, whereas %d is for integers. %r converts using the repr() function rather than str() and is useful when you want to return something in valid Python syntax (see the next example).


In [40]:
import datetime
date = datetime.date.today()
a = str(date)
b = repr(date)
print(a, end=" ")
print(b)
print("%s %r" % (date, date))


2017-02-09 datetime.date(2017, 2, 9)
2017-02-09 datetime.date(2017, 2, 9)

Don’t worry about datetime; it’s just used to illustrate the difference between str() and repr(), or equivalently between %s and %r.

Think about why there are three print statements but only two lines of output
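The three formatters can also be compared on plain strings and numbers. This is a minimal sketch reusing the donut example; the names line_s, line_d and line_r exist only for this illustration.

```python
name = "Romissa"
donuts = 30

line_s = "%s has %s donuts" % (name, donuts)   # %s calls str() on each value
line_d = "%s has %d donuts" % (name, donuts)   # %d expects a number
line_r = "%r has %r donuts" % (name, donuts)   # %r calls repr(): strings keep their quotes

print(line_s)
print(line_d)
print(line_r)
```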


In [14]:
seq = input("Enter a DNA sequence:").upper()
print(seq + seq)


Enter a DNA sequence:agtagcagcatcagct
AGTAGCAGCATCAGCTAGTAGCAGCATCAGCT

Recap on maths


In [41]:
x = 5
y = 10
print("Result 1 = " , x * y )
print("Result 2 = " , x ** y ) # ** - meaning “to the power of”
print("Result 3 =  , x ** y  ")


Result 1 =  50
Result 2 =  9765625
Result 3 =  , x ** y  

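Small mistakes can be difficult to spot: in the third print, the closing quote is in the wrong place, so x ** y is printed as text rather than calculated. You will learn to avoid and spot these mistakes with practice.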
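A short sketch of the difference between text inside the quotes and an expression outside them. The variable names are illustrative, and the formatter version is just one possible way to write the intended line.

```python
x = 5
y = 10

inside_quotes = "Result 3 =  , x ** y  "      # everything between the quotes is just text
outside_quotes = "Result 3 = %d" % (x ** y)   # the expression is evaluated, then formatted

print(inside_quotes)
print(outside_quotes)
```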

Recap on lists

  • A list stores an ordered list of values
  • Loop through all elements using for
  • Useful list functions:

    • extend(list) - adds a list to the end of a list

    • append(value) - adds a value to the end of a list

    • Also insert(index, value) and pop(index)

    • sort

      • sorted_names = sorted(names) #using the function
      • names.sort() #using the method
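The list functions above can be tried together in one short sketch. The names used here are made up for illustration.

```python
names = ["Chris", "Andy"]

names.append("Bob")              # add one value to the end
names.extend(["Eve", "Dan"])     # add every value from another list
names.insert(0, "Zoe")           # put a value at a given index
removed = names.pop(1)           # remove and return the value at index 1

sorted_names = sorted(names)     # function: returns a new, sorted list
print(names)                     # the original order is unchanged so far
names.sort()                     # method: sorts the list in place
print(names)
```

Note that sorted() leaves the original list alone, whereas sort() changes it.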

Recap on list methods


In [42]:
values = [5, 7, 4, 6, 1, 2]

print (values)

print(values[3]) #By the position

for temp_value in values: #Using a loop
    print (temp_value)


[5, 7, 4, 6, 1, 2]
6
5
7
4
6
1
2

Two common ways to access elements in a list:

By the position (a.k.a. index) - values[2]

or

Use a loop to grab every value and put it into a temporary variable (here called temp_value)

Recap on list methods


In [43]:
names = ["Andy", "Bob", "Chris"]

removed_name = names.pop(0)
print (removed_name)
print(names)


Andy
['Bob', 'Chris']

The opposite of list.pop(index) is list.insert(index,value)
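A minimal sketch of pop and insert undoing each other; after_pop is an illustrative copy taken so that both states of the list can be printed.

```python
names = ["Andy", "Bob", "Chris"]

removed_name = names.pop(0)      # take "Andy" out of position 0
after_pop = list(names)          # copy of the list at this point

names.insert(0, removed_name)    # put it back at the same position
print(after_pop)
print(names)
```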

Adding to lists in a loop


In [44]:
numbers = []

for i in range(1,24,3):
    numbers.append(i)

print(numbers)


[1, 4, 7, 10, 13, 16, 19, 22]

A list needs to be created (a.k.a. initialised) before you can apply methods to it.

Initialising the list before running the loop is important. If we had tried to initialise within the loop structure the list would be reset after each iteration of the loop!
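The effect of initialising in the wrong place can be seen by running both versions side by side. This is a sketch; reset_each_time is a name chosen for the illustration.

```python
for i in range(1, 24, 3):
    reset_each_time = []         # re-created on every pass, so earlier values are lost
    reset_each_time.append(i)

numbers = []                     # created once, before the loop
for i in range(1, 24, 3):
    numbers.append(i)

print(reset_each_time)           # only the value from the final pass survives
print(numbers)
```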

Dictionaries

  • Dictionaries are a lookup key-value system:

    • Very similar to a list except the index is replaced by a defined key
    • The keys can be any immutable value (e.g. strings or numbers); the values can be anything you like.
    • However, keys must be unique.
  • Simple to declare:

    • A variable: Value = 10
    • A list: Values = [10,11,12,13]
    • A dictionary: Value_lookup = {"Apple" : "Pie", "Banana" : "split", 3 : 60}

You can retrieve elements from a list using their numerical index whereas a dictionary can use any value specified by the programmer.

Declaring a dictionary requires curly braces {} and a colon : between key and value
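A brief sketch of declaring a dictionary, looking values up, and what key uniqueness means in practice. The replacement value "Crumble" and the key "Cherry" are invented for the example.

```python
value_lookup = {"Apple": "Pie", "Banana": "split", 3: 60}

print(value_lookup["Apple"])             # look up a value by its key
value_lookup["Apple"] = "Crumble"        # same key again: the old value is replaced
print(len(value_lookup))                 # still three entries, because keys are unique

has_banana = "Banana" in value_lookup    # "in" checks the keys
missing = value_lookup.get("Cherry")     # get() returns None rather than raising KeyError
print(has_banana, missing)
```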

Dictionary example


In [45]:
student_records = {20071213: "Alistair Darby"} # declaring a dictionary 
student_records[20081423] = "John Smith"   # adding some more records
student_records[20096137] = "Jane Doe"
student_records[20109334] = "Fred Blogs"

print(student_records[20081423])           # printing a specific record


John Smith

Dictionary advanced


In [46]:
student_records = {20071213: ["Alistair Darby", "A.C. Darby"]}
student_records[20081423] = ["John Smith", "J. L. Smith"]
student_records[20096137] = ["Jane Doe", "J. P. Doe"]
student_records[20109334] = ["Fred Blogs","Frederick Blogs","F. J. Blogs"]

print(student_records[20109334])        # print the list for this specific key
for name in student_records[20109334]:  # iterate through the list and print 
    print(name, end=" ")                # each value


['Fred Blogs', 'Frederick Blogs', 'F. J. Blogs']
Fred Blogs Frederick Blogs F. J. Blogs 

Sometimes you’ll find you need a data structure more complex than a dictionary. More advanced data structures can be created by combining simpler structures, e.g. a dictionary of lists.
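A dictionary of lists is often built up one item at a time rather than declared in full. The sketch below assumes the data arrives as (ID, name) pairs; the records list and the aliases dictionary are illustrative names.

```python
aliases = {}                                  # student ID -> list of names

records = [(20081423, "John Smith"),
           (20109334, "Fred Blogs"),
           (20109334, "Frederick Blogs"),
           (20081423, "J. L. Smith")]

for student_id, name in records:
    if student_id not in aliases:             # first time this key is seen
        aliases[student_id] = []              # start an empty list for it
    aliases[student_id].append(name)

print(aliases[20109334])
```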

Codon table

TTT F Phe    TTC F Phe    TTA L Leu    TTG L Leu
TCT S Ser    TCC S Ser    TCA S Ser    TCG S Ser
TAT Y Tyr    TAC Y Tyr    TAA * Ter    TAG * Ter
TGT C Cys    TGC C Cys    TGA * Ter    TGG W Trp
CTT L Leu    CTC L Leu    CTA L Leu    CTG L Leu
CCT P Pro    CCC P Pro    CCA P Pro    CCG P Pro
CAT H His    CAC H His    CAA Q Gln    CAG Q Gln
CGT R Arg    CGC R Arg    CGA R Arg    CGG R Arg
ATT I Ile    ATC I Ile    ATA I Ile    ATG M Met
ACT T Thr    ACC T Thr    ACA T Thr    ACG T Thr
AAT N Asn    AAC N Asn    AAA K Lys    AAG K Lys
AGT S Ser    AGC S Ser    AGA R Arg    AGG R Arg
GTT V Val    GTC V Val    GTA V Val    GTG V Val
GCT A Ala    GCC A Ala    GCA A Ala    GCG A Ala
GAT D Asp    GAC D Asp    GAA E Glu    GAG E Glu
GGT G Gly    GGC G Gly    GGA G Gly    GGG G Gly

In [47]:
codons = {
       "ATT" : "Ile",
       "ATC" : "Ile",
       "ATA" : "Ile",
       "CTT" : "Leu",
       "CTC" : "Leu",
       "CTA" : "Leu",
       "CTG" : "Leu",
     #... And so on...
        }

Codon_1 = "CTC"
print("%s encodes %s" % (Codon_1, codons[Codon_1]))
print(Codon_1, "encodes", codons[Codon_1])


CTC encodes Leu
CTC encodes Leu

The two print statements illustrate alternatives

When using multiple formatters, variables are provided in order within brackets

Be careful with multiple parentheses!

Get dictionary size – keys method


In [6]:
student_records = {20071213: "Andy Jones"}
student_records[20081423] = "John Smith"
student_records[20096137] = "Jane Doe"

print("Total number of students =" , len(student_records.keys()))
print(student_records.keys()) 
list(student_records.keys())


Total number of students = 3
dict_keys([20096137, 20071213, 20081423])
Out[6]:
[20096137, 20071213, 20081423]

The keys() method returns a view of the keys which can be looped through like a list or turned into a list
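A short sketch of working with the keys. It also shows that len() can be applied to the dictionary directly, which gives the same count as len() of its keys.

```python
student_records = {20071213: "Andy Jones", 20081423: "John Smith", 20096137: "Jane Doe"}

count = len(student_records)                  # same result as len(student_records.keys())
key_list = sorted(student_records.keys())     # sorted() accepts the view and returns a list

print(count)
print(key_list)
print(20081423 in student_records.keys())     # views support membership tests
```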

Delete a record from a dictionary


In [1]:
student_records = {20071213: "Andy Jones", 
                   20081423 : "John Smith", 
                   20096137 :"Jane Doe"}

del student_records[20081423]
print("Total number of students = " , len(student_records.keys()))


Total number of students =  2
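del is not the only way to remove a record. The sketch below shows the dictionary pop method, which also returns the removed value, and what happens when the key is no longer there. The try/except block is one possible way to handle the error, not something covered so far in this session.

```python
student_records = {20071213: "Andy Jones", 20081423: "John Smith", 20096137: "Jane Doe"}

removed = student_records.pop(20081423)       # removes the key and returns its value
again = student_records.pop(20081423, None)   # key already gone: returns the default

try:
    del student_records[20081423]             # del on a missing key raises KeyError
except KeyError:
    print("No such student ID")

print(removed, again, len(student_records))
```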

Loop through a dictionary with for


In [3]:
student_records = {20071213: "Andy Jones", 
                    20081423 : "John Smith", 
                    20096137 :"Jane Doe"}

for name in student_records.keys():
     print(name, ":", student_records[name])


20096137 : Jane Doe
20071213 : Andy Jones
20081423 : John Smith

Here we set up a temporary variable (called name) that holds each key in turn, and use it to retrieve the corresponding value. Note that, despite the variable’s name, the keys here are the student ID numbers.
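Two common variations on this loop, as a sketch: items() gives the key and value together, and sorted() visits the keys in ascending order. The names student_id and lines are illustrative.

```python
student_records = {20071213: "Andy Jones", 20081423: "John Smith", 20096137: "Jane Doe"}

lines = []
for student_id, name in student_records.items():    # key and value together
    lines.append("%d : %s" % (student_id, name))

for student_id in sorted(student_records):          # keys in ascending order
    print(student_id, ":", student_records[student_id])
```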

Manipulating strings

Slicing is very useful for manipulating strings and lists


In [10]:
DNA = "ACTGATCGACTGATCGATCGA"

# Remember - range takes the arguments (start, stop, step); step defaults to 1
for index in range(0, len(DNA), 3):
    chunk = DNA[index:index+3] # Remember - you can slice a string by giving [start:stop] indexes
    print(chunk,end=" ")


ACT GAT CGA CTG ATC GAT CGA 
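Splitting a sequence into triplets combines naturally with a codon dictionary. The sketch below uses only a four-codon subset of the codon table and a made-up sequence; a codon outside the subset would raise a KeyError.

```python
codons = {"ATG": "Met", "CTC": "Leu", "ATT": "Ile", "TAA": "Ter"}  # small subset only

dna = "ATGCTCATTTAA"             # hypothetical sequence, chosen to fit the subset
protein = []

for index in range(0, len(dna), 3):
    codon = dna[index:index + 3]      # take three bases at a time
    protein.append(codons[codon])     # look up the amino acid for this codon

print("-".join(protein))
```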
  • The index method can be used to find the position of the first occurrence of a specific substring

In [5]:
DNA = "ACTGATCGACTGATCGATCGA"
print("Index for CGA : ", DNA.index( "CGA" ))
print("Index for GAT : ", DNA.index( "GAT" ))


Index for CGA :  6
Index for GAT :  3

Remember – the numbering starts from 0!
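index() only reports the first match. As a sketch, the optional start argument can be used to search further along the string, and the related find() method is an alternative when the substring might be absent (index() raises a ValueError in that case).

```python
DNA = "ACTGATCGACTGATCGATCGA"

first = DNA.index("CGA")                # position of the first match
second = DNA.index("CGA", first + 1)    # start searching just after the first hit
absent = DNA.find("TTT")                # find() returns -1 when there is no match

print(first, second, absent)
```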

  • index() and slicing can also be used with lists

In [17]:
codons = ["ATG", "GAC", "TTG"]
print("Index for TTG : ", codons.index( "TTG" ))
print("Index for ATG : ", codons.index( "ATG" ))


Index for TTG :  2
Index for ATG :  0

Remember – the numbering starts from 0!
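Slicing a list works the same way as slicing a string. A minimal sketch, with "TAA" added to the list purely for illustration:

```python
codons = ["ATG", "GAC", "TTG", "TAA"]

first_two = codons[0:2]       # items at positions 0 and 1 (stop is excluded)
from_second = codons[1:]      # leave out stop to go to the end
last_codon = codons[-1]       # negative indexes count from the end

print(first_two, from_second, last_codon)
```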

Incrementally adding to a string


In [13]:
import random                        #Import the random module
dna = "GCTAGCTACGTACGATCGT"          #Starting string

for i in range(0,10):                #Loop 10 times (i runs from 0 to 9)
    dna += random.choice("CGTA")     #Choose a random base and add it

print(dna)


GCTAGCTACGTACGATCGTGTACGCGTTT

Output will be different every time.

Don’t worry about the random bit. It’s just an example of some useful code that’s already been written. It’s more important to understand what the rest of the code is doing than how random.choice() works at this stage.

Randomly generated DNA sequences can be very useful, e.g. if you are trying to establish whether a certain pattern occurs more often than expected by chance.
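One rough way to estimate what "expected by chance" looks like is to count a pattern in many random sequences. In this sketch the motif, the sequence length, the number of trials and the fixed seed are all arbitrary choices made for the example.

```python
import random

random.seed(1)                       # fixed seed so the run is repeatable (demo assumption)
motif = "CG"
counts = []

for trial in range(100):             # 100 random sequences of 50 bases each
    dna = ""
    for i in range(50):
        dna += random.choice("CGTA")
    counts.append(dna.count(motif))  # non-overlapping occurrences of the motif

print("Mean occurrences of", motif, "=", sum(counts) / len(counts))
```

The count seen in a real sequence of the same length could then be compared with this average.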


In [16]:
s = "-"
seq = ["a", "b", "c"] # This is a sequence of strings.
print (seq)
print (s.join( seq ))


['a', 'b', 'c']
a-b-c
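join() is called on the separator string and glues together the items of the sequence. A short sketch of some variations; note that every item must already be a string, so numbers are converted with str() first.

```python
bases = ["A", "C", "G", "T"]

joined_plain = "".join(bases)       # empty separator: simple concatenation
joined_tabs = "\t".join(bases)      # any string can be the separator

numbers = [1, 2, 3]
joined_numbers = ",".join(str(n) for n in numbers)   # convert each number to a string

print(joined_plain)
print(joined_tabs)
print(joined_numbers)
```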

Summary

  • Printing with formatters
  • Dictionaries
    • Very useful and flexible structure for storing keys and values
  • index() method and slicing
  • Use of methods inside loops