Day 2: slices, lists, methods, and loops

Review exercises (10 minutes)

```python
a = 1; b = 2; c = 2.0; d = 'abcdefg'; e = '1'
1 / 2     # compare: Python 2 gives 0 (integer division), Python 3 gives 0.5
a / b
a / c
float(a) / b
a / 'b'
a + b
d + e
d + 'e'

if 'd' in d:
    print('found d')
if d in 'd':
    print('found abcdefg')
else:
    print('the variable d was not in the string "d"')
```




Exercises 4: slices (20 minutes)

Set up the variable `x` to hold the string `'UAUCGCUU'`, then run these operations on it:

```python
x[0]
x[2:4]
x[4]
x[:]
x[:-3]
x[-2:-1]
x[-5:4]
x[-5:-9]
x[1:]
```




Try some values of your own to see if you can work out how negative indices and omitted start and end arguments behave.
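For comparison, here is a small sketch of the same rules on a different, made-up string (`word` is just an illustrative name), so it does not give away the answers above:

```python
word = 'PYTHON'

print(word[0])      # P     (positive indices count from the left, starting at 0)
print(word[-1])     # N     (negative indices count from the right; -1 is the last character)
print(word[-3:])    # HON   (start at the third-from-last character, run to the end)
print(word[:2])     # PY    (an omitted start means "from the beginning")
print(word[2:])     # THON  (an omitted end means "to the end")
print(word[4:1])    # prints an empty line: the slice is '' and no error is raised
print(word[-2] == word[len(word) - 2])   # True: both refer to the same character
```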




Store the string `'gene_1_UAUCCUA_0.3'` as a variable and write slice notation to retrieve the third character (the 'two-eth' character, since the first item is item number 0). This should give back `'n'`.




Write a second slice notation for your variable that retrieves the substring running from the second character up to, but not including, the fifth character (the second, third, and fourth items). This should give back `'ene'`.
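The start index is included and the stop index is excluded, so a slice `[start:stop]` always has `stop - start` characters (when both lie inside the string). A quick check on a made-up string:

```python
s = 'bioinformatics'
part = s[1:4]        # characters at positions 1, 2 and 3; position 4 is excluded
print(part)          # ioi
print(len(part))     # 3, which is stop - start = 4 - 1
```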




Write a slice notation that uses negative indices to retrieve the substring composed of the last 3 characters.




Write a slice notation that uses negative indices to retrieve everything except the last 3 characters (everything up until '0.3' including the '_' that precedes it).




Write a slice notation that retrieves the RNA portion of the string and nothing else.




Exercises 5: lists and splitting things (20-30 minutes)

Assign the string from the last set of exercises to the variable `x` and then try this command: `x2 = x.split('_')`.




What happens if you print out `x2`? Note the `[` and `]` symbols, the commas, and the extra quotation marks. These show that `x2` is a list (the square brackets at either end mark where the list begins and ends) composed of several strings (commas separate the individual elements of the list). The `split` command that created this list from a string is covered in part 5.
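A short sketch on a hypothetical record in the same underscore-separated style, showing that `split` returns a list and that its elements are still strings (the `type` output shown is Python 3's; Python 2 prints `<type 'list'>`):

```python
record = 'geneA_liver_12.5'     # made-up record, not from the exercises
parts = record.split('_')
print(parts)          # ['geneA', 'liver', '12.5']
print(type(parts))    # <class 'list'>
print(len(parts))     # 3
print(parts[2])       # 12.5  (still the string '12.5', not a number)
```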




With both `x` and `x2`, try these commands to compare the slice notation properties of a list with the slice notation properties of a string.

```python
type(x)
type(x2)
x[0]
x2[0]
x[0:2]
x2[0:2]
x[1:4][1:3]
x2[1:4][1:3]
type(x2[1:3])
type(x2[1:3][1])
x2[1:3][1][3]
x[1:3][1][3]
```




Using what you’ve learned, use slice notation to grab the RNA portion of the original string (as in the last exercise of Exercises 4), only this time using the new list variable `x2`.




The `split` method can be generalized as follows: `some_list = 'some_string'.split(some_delimiter)`. There are two inputs to this split statement (the string that needs splitting and the delimiter used to split) and one output (the output list). Try the following commands to familiarize yourself with how split goes about splitting up a string into a list.

```python
x = 'abcdefgh'.split('c')
print(x)
print('abcdefgh'.split('r'))
a = 'hi, my name is Bob, this is Una'
print(a.split(','))
z = a.split(' ')
print(z)
print(z[1])
l = a.split() # this is a very useful trait of split, look up what happens when you split with empty parentheses
print(l)
caveman = a.split(' is ') # notice that multiple characters can be used as the delimiter
print(caveman)
```
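One way to see the difference between `split(' ')` and `split()` is a made-up string with runs of several spaces. Splitting on a single space produces an empty string for every extra space, while `split()` with no argument treats any run of whitespace as one separator and ignores leading and trailing whitespace:

```python
messy = 'gene1  gene2   gene3'     # two spaces, then three spaces
print(messy.split(' '))   # ['gene1', '', 'gene2', '', '', 'gene3']
print(messy.split())      # ['gene1', 'gene2', 'gene3']
```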




Mentally, our eyes split things into lists all the time. We're so used to doing it that it's easy to forget the exact delimiter we're using. Figure out the delimiter that splits these strings, and then use this delimiter to make useful lists that no longer have these delimiters.

'ACCGCGU,LLMNAQR,2.4'
'gene1 gene2 gene3'
'gene1, gene2, gene3'
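A hint on choosing the delimiter, shown on a different made-up string: if the delimiter you pick is only part of what separates the items, the leftover characters stay attached to the elements.

```python
colors = 'red; green; blue'
print(colors.split(';'))    # ['red', ' green', ' blue']  (the spaces are left behind)
print(colors.split('; '))   # ['red', 'green', 'blue']
```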




15-minute break

Exercises 6: more properties of lists and 'methods' like split (20-30 minutes)

Try assigning new values to slices of lists and strings. For example:

```python
x = [1, 2, 3]
y = 'ABCDE'
x[1] = 'ab'
print(x)
x[1:3] = 'R'
print(x)
y[1] = 'L'   # what happens here? (comment this line out to run the rest)
print(y)
y = y[0:1] + 'hello' + y[2:]
print(y)
```
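The contrast, sketched on made-up data: a list element can be replaced in place, whereas a string cannot be changed in place, so the usual approach is to build a new string from slices and store it under the same name.

```python
codons = ['ATG', 'CCC', 'TGA']
codons[1] = 'GGG'                 # lists are mutable: this changes the list in place
print(codons)                     # ['ATG', 'GGG', 'TGA']

seq = 'ATGCCCTGA'
# seq[3] = 'G' would raise a TypeError, because strings are immutable.
# Instead, glue slices together into a new string and reassign the name.
seq = seq[:3] + 'GGG' + seq[6:]
print(seq)                        # ATGGGGTGA
```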




First, split the following string of DNA:

`'ATGCACTATTGCGTTAACTAGATGGGGCATTTTTAAATGGGACCCTGA'`

into potential open reading frames (ORFs) using the start codon (`'ATG'`) as your delimiter. Next, we need to fix each individual ORF in the list (as each ORF is now missing its start codon). To fix each ORF, replace each element in the list with the start codon plus the element (hint: use the string concatenation operator '`+`').
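The same split-then-repair pattern, sketched on a made-up string with the hypothetical marker `'GO'` standing in for the start codon. Loops have not been covered yet, so each element is fixed by its index. Note that when a string begins with the delimiter, the first element of the split is an empty string `''`.

```python
seq = 'xxGOaaGObb'
pieces = seq.split('GO')          # ['xx', 'aa', 'bb']: the 'GO' markers are gone
# Put the marker back on the front of every piece that followed a marker.
pieces[1] = 'GO' + pieces[1]
pieces[2] = 'GO' + pieces[2]
print(pieces)                     # ['xx', 'GOaa', 'GObb']

print('GOxx'.split('GO'))         # ['', 'xx']  (leading delimiter gives an empty first element)
```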




In Python, an expression of the form `something.name(...)` usually calls a __method__ of whatever came before the dot. Methods are a major feature of the Python programming language. We’ve been using the `split` method of strings, which operates on a string and returns a list. Usually, a method uses whatever came before the dot as the input item to operate on, takes whatever is in the parentheses as parameters, and returns some value that you can store as a variable. Importantly, each of these three components can be written in very creative ways, so long as they evaluate to values that the method knows what to do with (here, `split` needs a string to split and a second string to use as the delimiter). Try these bizarre-looking exercises to test this.

```python
x = ['abcdefgh', 'cd']
y = x[0].split(x[1])
y = x[0].split(x[0][2])
y = x[0].split(x[0][4])[1]
y = x[0].split(x[0][4])[x[1].index('c')]
y = x[0].split(x[0][4])[1].split('g')
```
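Other string methods follow the same input-before-the-dot pattern, and because each call returns a new value, calls can be chained left to right. A sketch using the standard string methods `strip` (remove surrounding whitespace) and `upper` on a made-up label:

```python
label = '  gene7:Liver  '
clean = label.strip()                         # 'gene7:Liver' (surrounding spaces removed)
print(clean.split(':'))                       # ['gene7', 'Liver']
print(label.strip().split(':')[1].upper())    # LIVER
print(label == '  gene7:Liver  ')             # True: methods return new strings, label is unchanged
```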




In general, you can make your code compact by putting the content of one operation into the input fields of the next, or readable by storing each step as a variable. Here is an alternate version of the final statement:

```python
first_string = x[0]
delimiter = x[0][4]
first_list = first_string.split(delimiter)
new_string = first_list[1]
final_list = new_string.split('g')
```
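To convince yourself that the two styles really do the same work, you can compute the final statement from the exercise both ways and compare the results (the name `compact` is just for illustration):

```python
x = ['abcdefgh', 'cd']

# Compact: every step nested inside one expression.
compact = x[0].split(x[0][4])[1].split('g')

# Readable: the same steps, with one named variable per step.
first_string = x[0]
delimiter = x[0][4]
first_list = first_string.split(delimiter)
new_string = first_list[1]
final_list = new_string.split('g')

print(compact == final_list)    # True
```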

Write a couple of lines of code that go through the following string: look only at the `interesting_genes` portion of the string (with `split`), split this portion by `'gene5'` to look only at the part that comes after `'gene5'` (with a second split), and return the value associated with gene5 (with a third split).

'boring_genes; gene1:2.6, gene2:3.8, interesting_genes; gene4:1.9, gene5:8.2, gene6:9.1'
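Here is the same three-split idea sketched on a hypothetical string with the same layout but different section and gene names, so you still get to solve the exercise yourself. The final `[1:]` slice drops the leading `':'`:

```python
report = 'controls; geneA:1.0, geneB:4.5, targets; geneC:0.7, geneD:6.3'   # made-up data

targets = report.split('targets')[1]        # '; geneC:0.7, geneD:6.3'
after_geneC = targets.split('geneC')[1]     # ':0.7, geneD:6.3'
value = after_geneC.split(',')[0][1:]       # '0.7'
print(value)
```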




Repeat part 4, but modify it with an `if` statement so that it finds the gene5 expression level regardless of whether gene5 is in `interesting_genes` or `boring_genes`, and so that it reports which set of genes gene5 was in.
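A sketch of one possible structure, again on the hypothetical `report` string with made-up `controls`/`targets` sections (written for Python 3, where `print` with several arguments separates them by spaces). It assumes the gene appears in exactly one of the two sections:

```python
report = 'controls; geneA:1.0, geneB:4.5, targets; geneC:0.7, geneD:6.3'   # made-up data
gene = 'geneB'                       # hypothetical gene to look up

halves = report.split('targets')     # [controls section, targets section]
if gene in halves[1]:
    group = 'targets'
    section = halves[1]
else:
    group = 'controls'
    section = halves[0]

value = section.split(gene)[1].split(',')[0][1:]
print(gene, 'is in', group, 'with value', value)   # geneB is in controls with value 4.5
```

Keep in mind that `in` tests for substrings, so a name like `'gene1'` would also match `'gene10'`, and a gene present in neither section would fall into the `else` branch here.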




Create your own list from scratch using square brackets to denote a list and commas to denote separate elements, and anything you want as the individual elements. Store this list as a variable.




Create a different list and store this as a second variable.




Create a third list, and use the variables you assigned to the previous two lists as the elements of this third list. Save this list to a third variable.




Use slice notation, `print`, and `len` to explore the properties of your new nested list.
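As an example of the kind of probing meant here, a sketch with two made-up lists nested inside a third. Each extra pair of square brackets goes one layer deeper, and the innermost layer can even be a character of a string:

```python
fruits = ['apple', 'pear']
numbers = [1, 2, 3]
both = [fruits, numbers]    # a list whose elements are themselves lists

print(both)                 # [['apple', 'pear'], [1, 2, 3]]
print(len(both))            # 2  (two inner lists)
print(len(both[1]))         # 3
print(both[1][0])           # 1
print(both[0][1][0])        # p  (first character of 'pear')
```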




Exercises 7: loops and nested lists (20-30 minutes)

Run these loops:

```python
for hamster_plan in 'ALMJKLKJ':
    print(hamster_plan)

for horse_vitamin in ['frosted', 'berry', 'cereal']:
    print(horse_vitamin)
```




What is being iterated through in the string `'ALMJKLKJ'`? In other words, what is each `hamster_plan`? What is being iterated through in the list `['frosted', 'berry', 'cereal']`? What is each `horse_vitamin`? Do you notice the difference between the type of data retrieved by the `hamster_plan` loop (which is going through a string) and the `horse_vitamin` loop (which is going through a list)? What will `bean_juice` be if you nest the loops as below:

```python
for horse_vitamin in ['frosted', 'berry', 'cereal']:
    for bean_juice in horse_vitamin:
        print(bean_juice)
```
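If the printed output is hard to compare, one option is to collect what each loop hands you into a list with the list method `append` and print the whole list at once. A sketch on short made-up inputs:

```python
letters = []
for item in 'AQT':                   # looping over a string yields 1-character strings
    letters.append(item)
print(letters)                       # ['A', 'Q', 'T']

words = []
for item in ['frosted', 'berry']:    # looping over a list yields its elements
    words.append(item)
print(words)                         # ['frosted', 'berry']
```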




Create a loop that goes through all characters in the string `'ALSQRWQT'` and prints each character.




Make it so that the above loop prints `'found Q'` every time the character is `'Q'`




Recall from Exercises 2, part 5, that you can redefine a variable to hold new values. In the line just before your loop from above (the part 4 loop), set `x` to some initial value of your choosing, and put `x = x + 1` inside the loop. What happens to `x` as you go through the loop? Use this handy property to print the letter number at which `'Q'` was found, whenever `'Q'` is found.
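A sketch of the counter pattern on a different, made-up string and letter (Python 3 `print`; `position` is just an illustrative name). Starting at 0 and adding 1 before the check gives 1-based letter numbers; start at -1 instead if you want 0-based indices that match slice notation:

```python
seq = 'GATTACA'
position = 0                      # counter, increased once per character
for letter in seq:
    position = position + 1       # 1 for the first letter, 2 for the second, ...
    if letter == 'T':
        print('found T at letter number', position)
# prints:
# found T at letter number 3
# found T at letter number 4
```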




Remember that lists can be nested. Sometimes the structure of the list can become difficult to discern. Find the length of the nested list stored as `x` (below) to figure out how many lists are in `x`. Use a loop to print each of the lists that make up `x`.

```python
x = [[['gene1', 'heart'], ['gene2', 'brain']], [['gene4', 'appendix'], ['gene5', 'stomach'], ['gene6', 'esophagus']]]
```




For each of the component lists in `x`, see how many items the component list has (with the `len` function), and make a loop that prints out what those items are.




Continue investigating the nested list. Using nested loops, see how many layers of lists you can descend into, querying the elements at each layer, without getting confused, and use slice notation to experiment with pulling out elements of interest (e.g. `x[1]` or `x[1][1]`). If you have time, try looping through the elements of slices instead of the full list (replacing `x` in the outermost loop with `x[1]`, `x[1][0:2]`, etc.).
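A sketch of three nested loops, one per layer, on a smaller made-up list with the same shape (groups of `[id, tissue]` pairs). Each loop variable holds one layer, and `len` reports how many items that layer has (Python 3 `print`):

```python
samples = [[['s1', 'blood'], ['s2', 'skin']], [['s3', 'bone']]]   # hypothetical data

print(len(samples))                        # 2 groups
for group in samples:                      # layer 1: each group is a list of pairs
    print('group with', len(group), 'pairs')
    for pair in group:                     # layer 2: each pair is [id, tissue]
        for field in pair:                 # layer 3: each field is a string
            print('   ', field)
```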




Use a while loop to find the character right after the 10th occurrence of `'A'` in the string `'AATTACCGCATTCCACGGGACCTACGAATTATAGTACCTAAA'` (complete the code below).




    
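```python
dna = 'AATTACCGCATTCCACGGGACCTACGAATTATAGTACCTAAA'
i = 0        # current position in dna
count = 0    # number of 'A's seen so far
while count < 10:
    ....     # complete the loop body: update count and i
print(dna[i])
```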
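A sketch of the while-loop pattern with a different, hypothetical target (the 3rd `'G'` instead of the 10th `'A'`), so the exercise is still yours to finish. One variable counts matches and a second tracks the position; because `i` is advanced after every check, it ends up one past the last match:

```python
dna = 'AATTACCGCATTCCACGGGACCTACGAATTATAGTACCTAAA'
target = 'G'      # hypothetical: look for the 3rd 'G'
wanted = 3

count = 0         # how many targets seen so far
i = 0             # current position in dna
while count < wanted:
    if dna[i] == target:
        count = count + 1
    i = i + 1     # move on; after the last match, i points at the next character
print(dna[i])     # the character right after the 3rd 'G'
```

This sketch assumes the target occurs at least `wanted` times and that the last match is not the final character; otherwise `dna[i]` raises an `IndexError`.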