Unit 2: Programming Design

Lesson 11: More Sequences

Notebook Authors

(fill in your two names here)

Facilitator: (fill in name)
Spokesperson: (fill in name)
Process Analyst: (fill in name)
Quality Control: (fill in name)

If there are only three people in your group, have one person serve as both spokesperson and process analyst for the rest of this activity.

At the end of this Lesson, you will be asked to record how long each Model required for your team. The Facilitator should keep track of time for your team.

Computational Focus: Python Sequences

Model 1: Tuples

Another type of sequence is a tuple, which is an immutable sequence whose items are enclosed in parentheses rather than square brackets.

Type the following in separate Jupyter code cells:

months = ('January','February','March','April','May','June',
'July','August','September','October','November','December')
print(months[1])
months.append('Undecember')

Table 1: Common operations with sequences (strings, lists, and tuples).
More info can be found at https://docs.python.org/3.6/library/stdtypes.html#typesseq-common
b and c are sequence variables, i, j, k are integers, and x is an element of b

Operation     Result
b[i]          indexing
b[i:j]        slice of b from i up to (but not including) j
b[i:j:k]      slice of b from i up to (but not including) j, with step k
b + c         concatenation: combines sequences together
b * i         repetition: concatenates b with itself i times
len(b)        length
min(b)        returns the smallest element
max(b)        returns the greatest element
x in b        True if an item of b is equal to x, else False
x not in b    False if an item of b is equal to x, else True
for x in b:   iteration: repeats the following statements once for each item in the sequence
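
The operations in Table 1 can be tried on any sequence. The following is a minimal sketch that uses two made-up strings (b and c here are illustration values, not part of the Model); you should still test each operation on the months tuple yourselves.

```python
# Hypothetical example strings, chosen only for illustration
b = "GATTACA"
c = "TT"

print(b[0])          # indexing: the item at position 0
print(b[1:4])        # slice: positions 1, 2 and 3
print(b[::2])        # slice with step 2: every other item
print(b + c)         # concatenation
print(c * 3)         # repetition
print(len(b))        # length
print(min(b), max(b))  # smallest and greatest item (alphabetical for strings)
print("C" in b)      # membership test
print("N" not in b)  # negated membership test

for x in c:          # iteration: runs once per item
    print(x)
```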

Critical Thinking Questions

1. Explain why months[1] is February, not January.

2. Give the code to access the last month (December) in the tuple.

3. Explain the reason for the error in the command months.append('Undecember').

4. Experiment with the operations in Table 1 and demonstrate that they all work.

5. The parentheses are optional when you program with tuples (although they make your code easier to read). Explain how you have actually been using tuples throughout the course to return multiple values from a function.
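
As a starting point for question 5, here is a small sketch of a function that returns two values. The function name min_and_max and its input list are hypothetical and were chosen only for illustration.

```python
def min_and_max(values):
    """Return the smallest and largest items of a non-empty sequence."""
    # The comma builds a tuple; no parentheses are required here
    return min(values), max(values)

result = min_and_max([4, 9, 2])
print(type(result))   # shows what kind of object the function handed back
low, high = result    # tuple unpacking into two separate names
print(low, high)
```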

Model 2: Dictionaries

Another useful data type built into Python is a dictionary, in which you can store items and retrieve them by key. A dictionary can be created by placing a comma-separated list of key:value pairs within {} braces.

Type the following in separate Jupyter code cells:

elements = {'C':'carbon', 'H':'hydrogen', 'O':'oxygen', 'N':'nitrogen'}
elements.keys()
elements.values()
elements.items()
elements["C"]
atom="N"
elements[atom]
elements[C]
elements[1]
elements["oxygen"]
letters = ["C","O","N","H"]
for let in letters:
    print(elements[let])
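
Some of the lookups above fail because the key is not in the dictionary. The sketch below shows two common ways to guard a lookup; the dictionary colors and its contents are hypothetical and are used only for illustration.

```python
colors = {'r': 'red', 'g': 'green', 'b': 'blue'}   # hypothetical data

key = 'y'
if key in colors:                  # "in" checks the keys, not the values
    print(colors[key])
else:
    print(key, 'is not a key')

# get() returns the supplied default instead of raising KeyError
name = colors.get(key, 'unknown')
print(name)
```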

Critical Thinking Questions

6. List all the keys in the dictionary elements.

7. Consider the output returned by the three dictionary methods (keys(), values(), and items()). What can you conclude about the order of the output of these three methods? Provide justification for your group’s answer.

8. What is the data type of the keys in the dictionary elements?

9. How does accessing a particular “value” of a dictionary differ from accessing a value of a list?

10. Explain the reason for the error in each of the following commands:
10a. elements[C]

10b. elements[1]

10c. elements["oxygen"]

11. Write a Python statement to display the value “nitrogen” by using the appropriate element from the list letters.

12. Recall from Lesson 10 that DNA is composed of two strands of nucleotides. The strands are referred to as complementary because the base A bonds only to T, and C bonds only to G. Write Python code to create a dictionary that can be used to look up complementary bases.

13. If a dictionary is created with two different values assigned to the same key, which value is displayed when you look up that key?

14a. Write pseudocode for a new function called convert_to_one that takes a single parameter: a list of amino acids, each represented by a three-letter string - e.g. ['Phe', 'Ala', 'Ile']. The function should convert this list of three-letter amino acid codes to a list of one-letter codes - e.g. ['F', 'A', 'I']. All the amino acids and their corresponding one-letter codes are given in columns 2 and 3 in Table 1 from Lesson 10.

Your pseudocode should not be Python code yet.

14b. Explain why it is easier for computers to use the single-letter amino acid codes.

15. Write pseudocode for a function called reverse_complement that:

  • takes a single string that is a sequence of DNA (5'→3') as its parameter and
  • returns a single string of DNA that is the reverse complement of the input string

Keep reading for more info...

It is important to consider some DNA sequence conventions first. The two complementary strands of DNA each have an orientation defined by their chemical structure: the 5' end has a phosphate group and the 3' end has a hydroxyl group. In the sequence below, the top strand read from the left is 5'→3' and the bottom strand read from the left is 3'→5'. You may wish to refer back to Figure 2 from Lesson 10 to make sure you understand the problem.

5' TACGTATGATCGGCTATAGCCGATGCATTAGCTAGTGCTGATACTGATCG  3' ("top")   
3' ATGCATACTAGCCGATATCGGCTACGTAATCGATCACGACTATGACTAGC  5' ("bottom")

So, when working with DNA sequences, we use the following conventions:

  • for a double strand (as shown above), the top strand is always 5'→3'
  • for a single strand, it is always written left to right from 5'→3' (first base is most 5')
  • the reverse of a sequence is that sequence written backwards, so a strand read 3'→5' is now written 5'→3' (see example below)
  • the complement of a sequence is the other half of the double helix, but its orientation is now labeled 5'→3' since that is the convention
  • the reverse complement is just the reverse of the complement strand (it is the form most commonly used, which is why we want you to convert to it)

Using the top strand above as an example:

reverse: 5' GCTAGTCATAGTCGTGATCGATTACGTAGCCGATATCGGCTAGTATGCAT 3'
complement: 5' ATGCATACTAGCCGATATCGGCTACGTAATCGATCACGACTATGACTAGC 3'
reverse complement: 5' CGATCAGTATCAGCACTAGCTAATGCATCGGCTATAGCCGATCATACGTA 3'

Your pseudocode should not be Python code yet.
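
The relationships among the example strands in question 15 can be checked with slicing alone. The sketch below copies the four strings from the example and uses variable names chosen here for illustration. It only checks the reversals and deliberately does not compute the complement, which is left for your pseudocode.

```python
# Strings copied from the worked example in question 15
top = "TACGTATGATCGGCTATAGCCGATGCATTAGCTAGTGCTGATACTGATCG"
reverse = "GCTAGTCATAGTCGTGATCGATTACGTAGCCGATATCGGCTAGTATGCAT"
complement = "ATGCATACTAGCCGATATCGGCTACGTAATCGATCACGACTATGACTAGC"
reverse_complement = "CGATCAGTATCAGCACTAGCTAATGCATCGGCTATAGCCGATCATACGTA"

# A slice with step -1 walks the string from its last item to its first
print(top[::-1] == reverse)                     # True
print(complement[::-1] == reverse_complement)   # True
print(len(top))                                 # 50
```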

Model 3: Modifying Dictionaries

Unlike tuples and strings, but similar to lists, dictionaries are mutable - we can modify a dictionary.

Type the following in separate Jupyter code cells:

len(elements)
elements['B'] = 'boron'
len(elements)
elements.items()
del elements['C']
len(elements)
elements.items()
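
Deleting a key that is not in the dictionary raises a KeyError. The sketch below shows two ways to remove items more defensively; the dictionary stock and its contents are hypothetical, and pop is a standard dictionary method that this Model does not otherwise use.

```python
stock = {'apples': 3, 'pears': 5}     # hypothetical data

removed = stock.pop('pears')          # removes the key and returns its value
print(removed, stock)

missing = stock.pop('kiwis', None)    # the default avoids KeyError for an absent key
print(missing)

if 'apples' in stock:                 # guard before using del
    del stock['apples']
print(len(stock))
```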

Critical Thinking Questions

16a. Give code that assigns a different value to an existing key in a dictionary. Run the code and provide results that show that it has worked.

16b. Does the size of the dictionary change afterwards?

17. Compare the syntax for modifying dictionaries to the syntax for modifying other sequences.
17a. Explain the similarities or differences between adding to a dictionary versus adding to lists.

17b. Explain the similarities or differences between deleting from a dictionary versus deleting from lists.

17c. Why can't tuples be added to or deleted from?

18. Consider the sort and append methods for a list.
18a. Explain why these methods do not make sense for dictionaries.

18b. Explain why these methods do not make sense for tuples.

19. We can also create a dictionary from two lists by using the built-in zip function. Type the following into Jupyter cells:

symbols = ['H','He','Li','Be','B','C','N','O','F','Ne']
atomic_weights = [1.0079,4.0026,6.941,9.0122,10.811,12.0107,14.0067,15.9994,18.9984,20.1797]
table = dict(zip(symbols,atomic_weights))
print(table['He'])

19a. What is the length of the two lists?

19b. What is the length of the dictionary?

19c. Why does the order of the two lists matter, even though items in a dictionary are retrieved by key rather than by position?
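
To see what zip does in question 19 before the pairs are handed to dict, it can help to look at the pairs directly. This is a minimal sketch with two short, made-up lists (names and scores are hypothetical); one list is longer on purpose.

```python
names = ['a', 'b', 'c']        # hypothetical data
scores = [1, 2, 3, 4]          # one item longer on purpose

# zip pairs items by position and stops at the end of the shorter list
pairs = list(zip(names, scores))
print(pairs)

lookup = dict(pairs)           # each pair becomes one key:value entry
print(lookup['b'])
```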

20. Work as a team of 4 to identify and explain at least 2 applications of a Python dictionary in science. Clearly articulate what the keys are and what the values are for each example.

Temporal Analysis Report

How much time did it require for your team to complete each Model?

Model 1:

Model 2:

Model 3: