Unit 2: Programming Design

Lesson 10: Strings and Biomolecules

Notebook Authors

(fill in your two names here)

Facilitator: (fill in name)
Spokesperson: (fill in name)
Process Analyst: (fill in name)
Quality Control: (fill in name)

If there are only three people in your group, have one person serve as both spokesperson and process analyst for the rest of this activity.

At the end of this Lesson, you will be asked to record how long each Model required for your team. The Facilitator should keep track of time for your team.

Computational Focus: Strings

Model 1: General Sequence Operations

Recall that a string is defined by enclosing characters (letters, digits, or other symbols) in single (') or double (") quotes, as long as the opening and closing quotes match. Depending on the application, we can treat the character string as a single entity, or access its individual components with square brackets [ ], just as we do with a list. The operator [m:n] returns a segment of a sequence called a slice. Table 1 at the end of this lesson describes several other operations that can be used with any of the Python sequence types, including strings and lists.
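
As a small warm-up sketch of the points above (the variable names and the sequence 'ACGU' are made up for illustration and are not part of the Model), the following compares the two quote styles and shows square brackets used on a string and on a list:

```python
# Hypothetical sequences for illustration only.
single = 'ACGU'
double = "ACGU"
print(single == double)          # the quote style does not change the string

bases = ['A', 'C', 'G', 'U']     # a list holding the same characters
print(single[1], bases[1])       # square brackets work on both types
print(single[1:3], bases[1:3])   # slicing a string gives a string; slicing a list gives a list
```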

Let's get started.
Type the following in separate Jupyter code cells:

dna = 'CTGACCACTT'
dna[3]
dna[10]
len(dna)
dna[5:10]
triplet = dna[2:5]
print(triplet)
dna[:6]
dna[6:]
dna2 = dna + triplet
print(dna2)
for base in triplet:
    print(base)

Critical Thinking Questions

1. Consider the notation for creating a slice segment.
1a. Explain what m and n mean when the operator [m:n] is used.

1b. Explain what it means when m or n is omitted in creating a slice, as in [m:] or [:n].

2. Why does the statement dna[5:10] execute successfully, but dna[10] doesn’t?

3. Give two different Python statements to access the first three elements of the string. Explain how/why each works.

4. Write a Python statement to create a new string dna3 = 'GACT' using only variables and operations introduced in Model 1.

Model 2: String Methods

Like a list, a Python string has several built-in capabilities (methods) that can be called using dot notation. A list of string methods is provided in Table 2 at the end of this lesson. You can always view the methods available to an object by calling the built-in dir (directory) function on it.
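
A minimal sketch of dot notation and dir, assuming a made-up RNA string (the names rna, shout, and public_methods are hypothetical, chosen only for this example):

```python
# Hypothetical RNA string, used only for this sketch.
rna = 'augcc'

shout = rna.upper()   # dot notation: object.method(arguments)
print(shout)

# dir() returns a list of attribute names as strings. Skipping the names
# that begin with an underscore leaves the everyday string methods.
public_methods = [name for name in dir(rna) if not name.startswith('_')]
print(public_methods)
```

Filtering out the underscore names is just one way to make the dir output easier to scan; the unfiltered list is equally valid.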

Let's explore some string methods.
Type the following in separate Jupyter code cells:

dna = 'CTGACCACTT'
dna.lower()
print(dna)
lowercase = dna.lower()
print(lowercase)
dna = dna.split('A')
print(dna)
print(dna[0])
dna[0].replace('C','g')
lowercase.replace('c','G')
print(lowercase)
dir(lowercase)

Critical Thinking Questions

5. What is the result of applying the replace method on a string?

6. Does the original string variable change after applying a string method?

7. Describe the similarities and differences between using a Python string method and using a list method.

Model 3: Editing a String

Unlike a list, a string cannot have a single element changed in place, and its length cannot change once it has been created. Strings are immutable, whereas lists are mutable.
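
One way to picture the difference is with two names that refer to the same object. This is a sketch only, and all names and values in it are hypothetical:

```python
# Strings: "changing" a string really builds a new string object.
seq = 'TTTT'
original = seq            # a second name for the same string object
seq = 'C' + seq[1:]       # creates a new string and rebinds the name seq
print(seq)
print(original)           # the first string object is untouched

# Lists: an item assignment changes the one existing list object.
bases = ['T', 'T', 'T', 'T']
alias = bases             # a second name for the same list object
bases[0] = 'C'            # in-place change
print(alias)              # the change is visible through both names
```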

Let's explore the immutability of strings.
Type the following in separate Jupyter code cells:

dna1 = 'AAAA'
dna1[0] = 'G'
dna2 = 'GAAAA'
ldna = list(dna1)  # first letter in ldna is lowercase 'L', not number 1
print(ldna)
ldna[0] = 'G'
print(ldna)
dna2 = ""
print(dna2)
dna2 = dna2.join(ldna)
print(dna2)

Critical Thinking Questions

8. How does changing a list to a string differ from changing a string to a list?

9. What might be an advantage of turning a string into a list?

10. Given a DNA sequence as a string, what programming construct (sequential code, branching or looping) would you use to print out the full sequence as individual DNA codons? (hint: review the DNA info from the Pre-Activity)

11. Write pseudocode for a function that takes a DNA sequence string as its parameter and prints the DNA codons, ending at either a stop codon or the end of the DNA sequence string. In this first version, assume it just starts at the first base in the sequence. (This is not terribly biologically accurate, but it gets us going.)

12. Copy, paste, and modify the pseudocode from above to only start making codons with the first ATG found in the sequence. The pseudocode should not print anything if no ATG is found, and it does not need to worry about multiple ATG codons since any internal ATG codons just make a Met amino acid.
Show your pseudocode to your instructor before declaring victory.

Temporal Analysis Report

How much time did it require for your team to complete each Model?

Model 1:

Model 2:

Model 3:

Table 1: Common operations with sequences (strings, lists, and tuples).
More info can be found at https://docs.python.org/3.6/library/stdtypes.html#typesseq-common
b and c are sequence variables, i, j, k are integers, and x is an element of b

Operation      Result
b[i]           Indexing: the item of b at position i
b[i:j]         Slice of b from i to j
b[i:j:k]       Slice of b from i to j with step k
b + c          Concatenation: combines the sequences together
b * i          Repetition: concatenates i copies of b
len(b)         Length of b
min(b)         Returns the smallest element
max(b)         Returns the greatest element
x in b         True if an item of b is equal to x, else False
x not in b     False if an item of b is equal to x, else True
for x in b:    Iteration: repeats the following statements once for each item in the sequence
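
A short sketch that exercises several rows of Table 1, assuming a made-up string b and list c (the values are hypothetical and chosen only to keep the results easy to check by hand):

```python
# Hypothetical sequences for illustration only.
b = 'GATTACA'
c = [3, 1, 2]

stepped = b[0:7:2]        # slice with a step of 2
doubled = b * 2           # repetition
longer = c + [9]          # concatenation of two lists
print(stepped, doubled, longer)

print(min(b), max(b))     # characters are compared by their Unicode order
print(min(c), max(c))

has_tt = 'TT' in b        # for strings, `in` also matches substrings
no_five = 5 not in c
print(has_tt, no_five)
```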

Table 2: Common string methods.
More string methods can be found at https://docs.python.org/3.6/library/stdtypes.html#string-methods
The []’s in this table indicate optional parameters.

String Method                     Description
s.count(sub [, start [, end]])    Counts occurrences of sub between start and end
s.find(sub [, start [, end]])     Finds the first occurrence of sub between start and end. If there is no occurrence, find returns -1.
s.join(words)                     Joins the list of words with s as delimiter
s.lower()                         Returns a lowercase version of s
s.rfind(sub [, start [, end]])    Finds the last occurrence of substring sub between start and end. If there is no occurrence, rfind returns -1.
s.split([sep [, maxsplit]])       Splits s into a list of words using sep as separator (default whitespace), making at most maxsplit splits
s.replace(old, new [, count])     Returns a copy of the string with all occurrences of substring old replaced by new. If count is given, only the first count occurrences are replaced.
s.splitlines([keepends])          Returns a list of the lines in the string, breaking at line boundaries
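
The following sketch tries out several of the methods in Table 2 on a made-up sequence (the string s and the result names are hypothetical, not taken from the Models above):

```python
# Hypothetical sequence for illustration only.
s = 'GGATGCCATGA'

n_atg = s.count('ATG')        # how many times 'ATG' appears
first_atg = s.find('ATG')     # index of the first 'ATG'
last_atg = s.rfind('ATG')     # index of the last 'ATG'
missing = s.find('TTT')       # a substring that does not occur
print(n_atg, first_atg, last_atg, missing)

joined = '-'.join(['ATG', 'CCA', 'TGA'])     # the separator goes before .join
words = 'ATG CCA TGA'.split()                # split on whitespace
one_split = 'ATG CCA TGA'.split(' ', 1)      # stop after one split
print(joined, words, one_split)

partly = s.replace('G', 'g', 2)              # replace only the first two 'G's
lines = 'line one\nline two'.splitlines()
print(partly, lines)
```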
