Programming Bootcamp 2016

Lesson 3 Exercises - ANSWERS



1. Guess the output: loop practice (1pt)

For the following blocks of code, first try to guess what the output will be, and then run the code yourself. Points will be given for filling in the guesses; guessing wrong won't be penalized.


In [1]:
for i in range(1, 10, 2):
    print i


1
3
5
7
9

In [2]:
for i in range(5, 1, -1):
    print i


5
4
3
2

In [3]:
count = 0
while (count < 5):
    print count
    count = count + 1


0
1
2
3
4

In [4]:
total = 0
for i in range(4):
    total = total + i
print total


6

In [5]:
name = "Mits"
for i in name:
    print i


M
i
t
s

In [6]:
name = "Wilfred"
newName = ""
for letter in name:
    newName = newName + letter
print newName


Wilfred

In [7]:
name = "Wilfred"
newName = ""
for letter in name:
    newName = letter + newName
print newName


derfliW

This ends up reversed because we did

newName = letter + newName

instead of

newName = newName + letter

So we're basically adding each new letter to the beginning of what's already in newName.
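To watch the prepending happen step by step, here is a small sketch that records newName after every pass through the loop. The list steps is added purely for illustration (it is not part of the exercise), and print is written with parentheses so the sketch runs under both Python 2 and Python 3.

```python
name = "Wilfred"
newName = ""
steps = []  # illustration only: a snapshot of newName after each iteration

for letter in name:
    newName = letter + newName  # the new letter goes in FRONT of what we have so far
    steps.append(newName)
    print(newName)
```

Each printed line is one letter longer than the last, with the newest letter on the left, which is why the final value comes out reversed.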


In [8]:
seq = "AGCTGATGC"
count = 0
for letter in seq:
    count = count + 1
print count


9

In [9]:
seq = "AGCTGATGC"
count = 0
for letter in seq:
    if letter == "T":
        count = count + 1
print count


2

2. Spot the endless loop (1pt)

For the following examples, first guess whether or not the loop will be endless. Then run the code to find out (if it doesn't stop within a few seconds, you can assume it's endless).

NOTE: If you hit an endless loop, you will not be able to run anything else until you stop it! Use the kernel interrupt button (square button up top) to stop the execution of the loop.


In [ ]:
count = 0
while count < 5:
    print count
print "Done"

Endless loop or not? Yes - count is never changed inside the loop, so count < 5 stays true forever


In [11]:
count = 0
while count > 0:
    print count
print "Done"


Done

Endless loop or not? No - it never even enters the loop!


In [ ]:
count = 0
while count < 10:
    print count
count = count + 1
print "Done"

Endless loop or not? Yes - we didn't increment count inside the loop
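For comparison, here is a sketch of the same loop with the increment indented so that it belongs to the loop body. The list printed is added only for illustration, so the values can be shown on one line, and print uses parentheses so the sketch runs under both Python 2 and Python 3.

```python
count = 0
printed = []  # illustration only: collects the values instead of printing one per line

while count < 10:
    printed.append(count)
    count = count + 1  # indented, so it now runs on every pass through the loop

print(printed)
print("Done")
```

With the increment inside the loop, count eventually reaches 10, the condition becomes false, and the loop ends.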


In [10]:
count = 10
while count > 0:
    count = count - 1
print "Done"


Done

Endless loop or not? No


In [ ]:
a = True
count = 0
while a:
    count = count + 1
print "Done"

Endless loop or not? Yes - a is never set to False inside the loop, so the condition never changes


In [ ]:
x = 1
while x != 100:
    x = x + 5
print "Done"

Endless loop or not? Yes - we skip right over x = 100!
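The skipped value is easy to confirm by recording what x actually holds on each pass. In this sketch, the list seen and the cap MAX_STEPS are assumptions added for illustration; the cap is there only so the demonstration stops by itself. print uses parentheses so the sketch runs under both Python 2 and Python 3.

```python
x = 1
seen = []       # illustration only: every value x takes
MAX_STEPS = 25  # hypothetical safety cap so this sketch always terminates

while x != 100 and len(seen) < MAX_STEPS:
    x = x + 5
    seen.append(x)

# Show the values on either side of 100: x jumps from 96 straight to 101.
print(seen[18:21])
```

Because x is always 1 more than a multiple of 5, it can never equal 100, so only the safety cap stops this version.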


In [ ]:
x = 1
while x <= 100:
    x = x + 5
print "Done"

Endless loop or not? No - x reaches 101 after 20 passes, which fails the x <= 100 test


3. Simple loop practice (4pts)

Write code to accomplish each of the following tasks using a for loop or a while loop. Choose whichever type of loop you want for each problem (you can try both, if you want extra practice).

(A) (1pt) Print the integers between 8 and 33, inclusive.


In [13]:
for i in range(8,34):
    print i


8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33

(B) (1pt) Starting with x = 1, double x until it's greater than 1000. Print each value of x as you go along.


In [14]:
x = 1
while x <= 1000:
    x = x*2
    print x


2
4
8
16
32
64
128
256
512
1024

(C) (1pt) Print the positive integers less than 500 that are multiples of 13.


In [15]:
for i in range(13, 500, 13):
    print i


13
26
39
52
65
78
91
104
117
130
143
156
169
182
195
208
221
234
247
260
273
286
299
312
325
338
351
364
377
390
403
416
429
442
455
468
481
494

(D) (1pt) Print each character of the string "AGTAATCGCGATGAATACCATCGCAGCC" on a separate line.


In [16]:
nts = "AGTAATCGCGATGAATACCATCGCAGCC"
for i in nts:
    print i


A
G
T
A
A
T
C
G
C
G
A
T
G
A
A
T
A
C
C
A
T
C
G
C
A
G
C
C

4. File reading and processing (6pts)

For these problems, use the file sequences.txt provided on Piazza. This file contains several DNA sequences of different lengths. You can assume each sequence is on a separate line.

Note: I recommend saving sequences.txt in the same directory as this notebook to make things easier.

(A) (1pt) Using a loop, read in each sequence from the file and print it. Make sure to remove any newline characters (\n) while reading in the data.


In [17]:
fileName = "sequences.txt"
ins = open(fileName, 'r')
for line in ins:
    line = line.rstrip('\r\n')  # \r is another possible line ending; strip it too, just in case
    print line
ins.close()


CTGTGCCTGATCTTGAGGTGCCAATGAGACTCAGCGA
TAAATCACCGCCCAAGAAGTATAATGCTTGGGGGTGATAGGTTTTACATATTTTTAAGTTCGCTAGCTAAAAATTATCCGTATCATAGGCTGAA
CAGTCCTGCCAATAAAAGAAATATCCCAAGACAGATTAAGCTTTAATCTTTGCTCAACCACGTCGTGGTTATGAATTCGCTAAATTAGTTGATCTCGTTGG
TGAGGGCGAATTACCCAAGCACCGACTCACTTGTCACGGAAAAATACCGGACAATTTGTATAACTCAACAAAGTTTCGGA
TAGCTATGTCAGCCGGAGACCAGAAAGACTCTCTTGTATTTAAGGTCAGGGCTATGGCTATCGAGT
TACCGTTATTGCGTGAAACGGTGTAGCTATAGGGCTGAGTGTGTCTTTGTTTCTTCACTCCTATTGGGCTGACTACGATTGCCCTTAGGTTTTCATTTAGTTGTTAATAATCGCTACT
TTCTTAAGACCGCCGAGCTTCGTCTTTTATGGCC
GAACGACCAACATGCACGGTTAGGGGTTGGAATGCTATCGATTACGTCGGACCGAAAAGTCAGGAAAAAG
ATGTGTTGGGGGTCTGGGACCGCGTCGACACCTAGCGCCTTCCACGTAGCATACAGCCTGGCTCACGCGGTTCTGCGGACCCTACATAGT
GATCCGATTTGTTTCTACCGGAAGCTCCACGCAGGAGGGAGCAACGCAA
CAATAATTAGCCTCTCCCCAGGGCTCACATGCCCCCATGGTTAAATAGCACAAAGCAGATCGGTGACTGGAACCCCCTTCGTTGATGTCCGCTAATCGATGAG
CTAATTACCGCTGACTGCAGGGTGTTTCTGGTGTACACTATTCCTATATCGCAATCAATT
ACAAGTGATCATCCCGGTCATGCTAAAACGGTGATTAAGGGTACTATGCGAAGTGTAGATATGCCCTGAGCCTCTGGCCGGGCCCATCTTGCA
CCGGAACGTGGGAGCTGTTTAAAGGCCGAACATATAACGGATAAGTCTGTGTTAGCGACTAGGCCTGCAGATCAGTTTGAGCTAATAAATTCCA
AAAGTGGACTTGAGTAGAGGTGTCGAACAATGATGAGGCCTCTATTTGAATATAAACTGAACGCCAGTAGGTCCAGG
AGTCTTCAAAGAGCTGGGAAGGATCTCAGAGTGCGCCACCGACCAGCGTCCGTCCTTAGGTTGATTCTAACGCGAGGGTCTGTACATAACTTCTGTTTGACCTAAATGTATCACA
TATTGAATATCAGGCTGAGCGTCCTGACCGGTAAAAAAAACATAAAT
CAGAATAGGGGTCTTTCTCTCCCTGTTCATGTATTGTGCACACCTGGCAATGGTACTA
CGTAACCTCATGGAAGTTGCCATTAATGTAGAGTCAGACTTGCCCAGCTTCTCGATCACCCAAAATG
TTCGTAAGCCCTGACGTGTCTAGCTAAGTTTGTCCGCACGGAGCTA
GCGCTGCCGCCATCGGTTGTGCGTCATCGCAATTAGTACCAGGACGGGCGTAGCTAA
ACTCGGCCCAACGCTGGCGATATGGGGAAAAACACGGGTACAGGACGACCCTGCGAGCCTCGGAGACAGGCGATAGCGCCCGATCCTGAACT
GACGATAATAGCGGCTTTTAAACCCATAGATGGGAAACGCAATGGGTGCGCACGGTGCAGGTAAAGAGTAACAACACGGTGAACGGA
ACCCCAACCCTTCAACCCATCTTGGCCCACACTGATCAGTCCAGGTATGAACTGAGGAAGGATAAGGGCAGTGCTGTGTCATACGGGCACCCCACATAACGCCGAT
TTACACTAGCCCCGCTATGTTAACACTCGCCCCCCGTGGGCTTTTGCTCCACTGATGTTCGATCTTGTCAGGTCGCGTCTAGGTGAGTGAGTGAAGAT

(B) (1pt) Now, instead of printing the sequences themselves, print the length of each sequence. At the end, print the average length of the sequences.

Hint: use the concept of an "accumulator" variable to help with computing the average, and watch out for integer division!

[ Check your answer ] You should get 77.56 as the average.


In [18]:
fileName = "sequences.txt"
totalLen = 0
numSeqs = 0

ins = open(fileName, 'r')
for line in ins:
    line = line.rstrip('\r\n')
    print len(line)

    totalLen = totalLen + len(line)
    numSeqs = numSeqs + 1
ins.close()

print ""
print "Avg len:", float(totalLen) / numSeqs


37
94
101
80
66
118
34
70
90
49
103
60
93
94
77
115
47
58
67
46
57
92
87
106
98

Avg len: 77.56
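The integer-division pitfall from the hint can be seen with the totals from this problem (the 25 lengths above add up to 1939). This is only a sketch: it uses the // operator to show floor division explicitly, since that is what / does in Python 2 when both operands are ints. The names floorAvg and trueAvg are made up for illustration.

```python
totalLen = 1939  # sum of the 25 sequence lengths printed above
numSeqs = 25

# Floor division throws away the fractional part. In Python 2, a plain "/"
# between two ints behaves this way.
floorAvg = totalLen // numSeqs

# Converting one operand to float first keeps the fraction.
trueAvg = float(totalLen) / numSeqs

print(floorAvg)
print(trueAvg)
```

The first value loses the .56, which is why the solution wraps totalLen in float() before dividing.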

(C) (2pts) Instead of printing lengths, print the GC content of each sequence (GC content is the number of G's and C's in a DNA sequence divided by the total sequence length). At the end print the average GC content.

[ Check your answer ] You should get ~0.49 (0.4877) as the average.


In [19]:
fileName = "sequences.txt"
totalGC = 0
numSeqs = 0

ins = open(fileName, 'r')
for line in ins:
    line = line.rstrip('\r\n')

    seqGC = 0 #this needs to be reset to 0 for each sequence
    for nt in line:
        if (nt == "G") or (nt == "C"):
            seqGC = seqGC + 1

    fractGC = float(seqGC) / len(line)
    print fractGC

    # add this fraction to the running total so we can compute the avg
    totalGC = totalGC + fractGC
    numSeqs = numSeqs + 1
ins.close()

print ""
print "Avg GC:", float(totalGC) / numSeqs


0.540540540541
0.36170212766
0.386138613861
0.4375
0.469696969697
0.415254237288
0.5
0.485714285714
0.611111111111
0.551020408163
0.514563106796
0.416666666667
0.516129032258
0.457446808511
0.441558441558
0.486956521739
0.36170212766
0.465517241379
0.462686567164
0.521739130435
0.59649122807
0.619565217391
0.494252873563
0.547169811321
0.530612244898

Avg GC: 0.487669412538
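As an aside, the inner counting loop can also be replaced with the string method .count(). This is an optional shortcut rather than the approach used in the solution, shown here as a sketch on the first sequence from the file. print uses parentheses so the sketch runs under both Python 2 and Python 3.

```python
# First sequence from sequences.txt, as printed in part (A)
seq = "CTGTGCCTGATCTTGAGGTGCCAATGAGACTCAGCGA"

# .count() returns how many times a substring occurs in the string
seqGC = seq.count("G") + seq.count("C")

fractGC = float(seqGC) / len(seq)  # float() guards against integer division
print(fractGC)
```

This should agree with the first value printed by the loop-based solution (about 0.54).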

(D) (2pts) Convert each sequence to its reverse complement and print it. This means changing each nucleotide to its complement (A->T, T->A, G->C, C->G) and reversing the entire sequence.

Hint: we've already touched on everything you need to know to do this -- see problem 1 above for some clues!

[ Check your answer ] Spot check this by comparing at least one sequence from the file to its reverse complement and make sure it looks correct.


In [20]:
fileName = "sequences.txt"

ins = open(fileName, 'r')
for line in ins:
    line = line.rstrip('\r\n')

    # Here I am reversing and complementing at the same time.
    # You could also do these as two separate steps; that would just be
    # less efficient (not a big deal on a small dataset).
    revCompl = ""
    for nt in line:
        if nt == "A":
            revCompl = "T" + revCompl
        elif nt == "T":
            revCompl = "A" + revCompl
        elif nt == "G":
            revCompl = "C" + revCompl
        elif nt == "C":
            revCompl = "G" + revCompl
        else:
            # The below isn't strictly necessary, but it makes for more robust code.
            # A good warning/error message should state: (1) the problem, (2) the data that 
            # caused the problem, and (3) what the code will do about the problem.
            print ">> Warning: Encountered unknown nt:", nt
            print "   Keeping unknown nt as-is."
            revCompl = nt + revCompl 

    # We don't want to print until we've looped through the whole sequence.
    # So this print statement should be outside of the sequence for-loop
    # (but NOT outside the file for-loop)
    print revCompl 

ins.close()


TCGCTGAGTCTCATTGGCACCTCAAGATCAGGCACAG
TTCAGCCTATGATACGGATAATTTTTAGCTAGCGAACTTAAAAATATGTAAAACCTATCACCCCCAAGCATTATACTTCTTGGGCGGTGATTTA
CCAACGAGATCAACTAATTTAGCGAATTCATAACCACGACGTGGTTGAGCAAAGATTAAAGCTTAATCTGTCTTGGGATATTTCTTTTATTGGCAGGACTG
TCCGAAACTTTGTTGAGTTATACAAATTGTCCGGTATTTTTCCGTGACAAGTGAGTCGGTGCTTGGGTAATTCGCCCTCA
ACTCGATAGCCATAGCCCTGACCTTAAATACAAGAGAGTCTTTCTGGTCTCCGGCTGACATAGCTA
AGTAGCGATTATTAACAACTAAATGAAAACCTAAGGGCAATCGTAGTCAGCCCAATAGGAGTGAAGAAACAAAGACACACTCAGCCCTATAGCTACACCGTTTCACGCAATAACGGTA
GGCCATAAAAGACGAAGCTCGGCGGTCTTAAGAA
CTTTTTCCTGACTTTTCGGTCCGACGTAATCGATAGCATTCCAACCCCTAACCGTGCATGTTGGTCGTTC
ACTATGTAGGGTCCGCAGAACCGCGTGAGCCAGGCTGTATGCTACGTGGAAGGCGCTAGGTGTCGACGCGGTCCCAGACCCCCAACACAT
TTGCGTTGCTCCCTCCTGCGTGGAGCTTCCGGTAGAAACAAATCGGATC
CTCATCGATTAGCGGACATCAACGAAGGGGGTTCCAGTCACCGATCTGCTTTGTGCTATTTAACCATGGGGGCATGTGAGCCCTGGGGAGAGGCTAATTATTG
AATTGATTGCGATATAGGAATAGTGTACACCAGAAACACCCTGCAGTCAGCGGTAATTAG
TGCAAGATGGGCCCGGCCAGAGGCTCAGGGCATATCTACACTTCGCATAGTACCCTTAATCACCGTTTTAGCATGACCGGGATGATCACTTGT
TGGAATTTATTAGCTCAAACTGATCTGCAGGCCTAGTCGCTAACACAGACTTATCCGTTATATGTTCGGCCTTTAAACAGCTCCCACGTTCCGG
CCTGGACCTACTGGCGTTCAGTTTATATTCAAATAGAGGCCTCATCATTGTTCGACACCTCTACTCAAGTCCACTTT
TGTGATACATTTAGGTCAAACAGAAGTTATGTACAGACCCTCGCGTTAGAATCAACCTAAGGACGGACGCTGGTCGGTGGCGCACTCTGAGATCCTTCCCAGCTCTTTGAAGACT
ATTTATGTTTTTTTTACCGGTCAGGACGCTCAGCCTGATATTCAATA
TAGTACCATTGCCAGGTGTGCACAATACATGAACAGGGAGAGAAAGACCCCTATTCTG
CATTTTGGGTGATCGAGAAGCTGGGCAAGTCTGACTCTACATTAATGGCAACTTCCATGAGGTTACG
TAGCTCCGTGCGGACAAACTTAGCTAGACACGTCAGGGCTTACGAA
TTAGCTACGCCCGTCCTGGTACTAATTGCGATGACGCACAACCGATGGCGGCAGCGC
AGTTCAGGATCGGGCGCTATCGCCTGTCTCCGAGGCTCGCAGGGTCGTCCTGTACCCGTGTTTTTCCCCATATCGCCAGCGTTGGGCCGAGT
TCCGTTCACCGTGTTGTTACTCTTTACCTGCACCGTGCGCACCCATTGCGTTTCCCATCTATGGGTTTAAAAGCCGCTATTATCGTC
ATCGGCGTTATGTGGGGTGCCCGTATGACACAGCACTGCCCTTATCCTTCCTCAGTTCATACCTGGACTGATCAGTGTGGGCCAAGATGGGTTGAAGGGTTGGGGT
ATCTTCACTCACTCACCTAGACGCGACCTGACAAGATCGAACATCAGTGGAGCAAAAGCCCACGGGGGGCGAGTGTTAACATAGCGGGGCTAGTGTAA
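The if/elif chain can also be written as a dictionary lookup. This is an alternative sketch, not the method used in the solution, and it relies on dictionaries and function definitions, which may not have been covered yet at this point in the course. The names COMPLEMENT and reverseComplement are made up for illustration. Unknown characters are kept as-is, like in the solution, but without printing a warning.

```python
# Maps each nucleotide to its complement
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverseComplement(seq):
    # hypothetical helper: builds the reverse complement one nucleotide at a time
    revCompl = ""
    for nt in seq:
        # .get(nt, nt) falls back to the original character if nt is not A/T/G/C
        revCompl = COMPLEMENT.get(nt, nt) + revCompl
    return revCompl

# First sequence from sequences.txt
print(reverseComplement("CTGTGCCTGATCTTGAGGTGCCAATGAGACTCAGCGA"))
```

A handy spot check: taking the reverse complement twice should give back the original sequence.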

5. Guessing game (2pts)

Write code that plays a number guessing game with the user. The code will loop until the user gets the number right or quits. Follow the directions below.

  • First, have your program generate a random integer between 1 and 20 (the "secret number").
  • Then prompt the user "Guess a number between 1 and 20 (enter 0 to quit): "
  • Read in their answer with raw_input() (recall that you'll need to convert the return value to an int) and save it in a variable
  • Compare their guess to your "secret number":
    • If the guess is correct, print "You got it!" and end the loop.
    • If the guess is higher than the secret number, print "Too high!" and allow the user to keep guessing.
    • If the guess is lower than the secret number, print "Too low!" and allow the user to keep guessing.
    • If they entered 0, print "Ending program" and end the loop.

Tip: It will be easier for you to test your program if you initially set the secret number to be a number of your choosing instead of something random. Then when you run the program, you will know if the responses are correct. Make sure to test all possibilities (higher, lower, equal, zero). Once you're sure the logic is correct, you can change the secret number to random generation.


In [21]:
import random

secretNum = random.randint(1,20)
quitLoop = False

while not quitLoop:
    guess = int(raw_input("Guess a number between 1 and 20 (enter 0 to quit): "))

    if guess == 0:
        print "Ending program."
        quitLoop = True

    elif guess == secretNum:
        print "You got it!"
        quitLoop = True

    elif guess > secretNum:
        print "Too high!"

    elif guess < secretNum:
        print "Too low!"


Guess a number between 1 and 20 (enter 0 to quit): 10
Too low!
Guess a number between 1 and 20 (enter 0 to quit): 15
Too high!
Guess a number between 1 and 20 (enter 0 to quit): 13
Too low!
Guess a number between 1 and 20 (enter 0 to quit): 14
You got it!
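One way to follow the tip about testing every possibility (higher, lower, equal, zero) without typing guesses by hand is to pull the comparison logic into a function and call it with a fixed secret number. This is only a sketch: checkGuess is a hypothetical helper, and function definitions may not have been covered yet at this point in the course.

```python
def checkGuess(guess, secretNum):
    # hypothetical helper: returns the message for a single guess
    if guess == 0:
        return "Ending program."
    elif guess == secretNum:
        return "You got it!"
    elif guess > secretNum:
        return "Too high!"
    else:
        return "Too low!"

secretNum = 14  # fixed while testing; switch to random.randint(1, 20) once the logic works

# Try one guess of each kind: lower, higher, equal, and the quit value
for guess in [10, 15, 14, 0]:
    print(checkGuess(guess, secretNum))
```

Once all four cases print the expected message, the same branches can go back inside the while loop.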

6. Family simulation (5pts)

Use a simulation to determine the average number of children in a family if all families had children until they had a girl (and then stopped). Assume equal probability of girls and boys and use a random generator to simulate each birth. Simulate 10,000 families and output the average number of children they had. Your answer should be close to 2.

Note: I'm purposely not giving any step by step instructions here because I'd like you to practice translating a word problem into code. It may help if you try to break down the problem into smaller parts first. For example, see if you can simulate just one family first, then once that's working, add code to make it run through 10,000 families. Alternatively, you might feel more comfortable writing code to print "hi" 10,000 times, and then replace the print statement with code for simulating each family. Do whatever makes the problem feel more manageable to you. Learning to break down big problems into smaller parts like this is an essential skill for programming, so I encourage you to work through this. If you get it, then I think it's safe to say you've mastered the material in this lesson and are well on your way to becoming a true programmer!

I'll give partial credit for good efforts!


In [29]:
import random

# By convention, variables with a set, pre-defined value ("constants")
# are indicated by ALL_CAPS variable names and are defined at the top.
# It's not really necessary to define GIRL = 1, but it will make
# the code easier to read and understand, so it's considered good form.
NUM_FAMS = 10000 
GIRL = 1

# regular variables, like this counter, are lower case/camelCase
totalKidsAllFams = 0

for i in range(NUM_FAMS):
    numKids = 0
    hadGirl = False

    while not hadGirl:
        numKids = numKids + 1
        gender = random.randint(0,1) #the miracle of birth
        if gender == GIRL:
            hadGirl = True
        
    # once we exit the while loop, we must have had a girl.
    # so add the number of kids this fam had to the running total, and
    # move on to the next family
    totalKidsAllFams = totalKidsAllFams + numKids

# finally, divide the total number of kids by the number of families to get the avg
print "Avg kids per fam:", float(totalKidsAllFams) / NUM_FAMS


Avg kids per fam: 2.0132
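Why should the answer be close to 2? A family ends up with exactly k children when the first k-1 births are boys and the k-th is a girl, which has probability (1/2)^k. The expected family size is therefore the sum of k * (1/2)^k over k = 1, 2, 3, ..., and that sum equals 2. The sketch below checks this numerically by adding up the first 50 terms. The cutoff MAX_KIDS = 50 is an arbitrary assumption, chosen because the later terms are negligibly small.

```python
MAX_KIDS = 50  # arbitrary cutoff (assumption): terms beyond this are tiny

expected = 0.0
for k in range(1, MAX_KIDS + 1):
    # k children happens with probability (1/2)^k
    expected = expected + k * 0.5 ** k

print(expected)
```

The simulation above estimates the same quantity by random sampling, so its result wobbles slightly around 2 from run to run.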

Extra Problems (0pts)

The following problems are for people who would like more practice. They will not be counted for points.

(A) Computing factorials. The factorial of a number n is defined as:

n! = n * (n - 1) * (n - 2) * ... * 1

Prompt the user to enter a positive integer and then output the factorial of that number. Do not use any modules (i.e. don't use the factorial function in the math module)!


In [8]:
num = int(raw_input("Enter a positive integer: "))
factorial = 1
for i in range(1, num+1):
    factorial = factorial * i
print factorial


Enter a positive integer: 5
120

(B) Fibonacci sequence. The Fibonacci sequence is a series where each number is the sum of the previous two numbers. The first two numbers are always 0 and 1. So for example, the first 10 numbers of the series are: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34. Prompt the user to enter a positive integer, and then output that number of terms in the Fibonacci sequence.


In [19]:
num = int(raw_input("Enter a positive integer: "))
fibA = 0
fibB = 1
fibCurr = 1

# always print the first term
print fibA

# only print the second term if at least 2 terms are requested
if num > 1:
    print fibB

# print num-2 additional terms (if num <= 2, this loop won't run)
for i in range(num-2):
    fibCurr = fibA + fibB
    print fibCurr
    fibA = fibB
    fibB = fibCurr


Enter a positive integer: 15
0
1
1
2
3
5
8
13
21
34
55
89
144
233
377

(C) Counting mutations. Prompt the user to input two DNA sequences of the same length. Count up the number of nucleotides that differ between the sequences and output this number.

[ Check your answer ] Try the following to check that your program is correct:
Seq1 = AAGTCGTACA
Seq2 = AAGTCGGACG
Num differences: 2

Seq1 = GTGTGATGAGCGCGACA
Seq2 = GAAAGATGAGCGTGTCA
Num differences: 5


In [16]:
seq1 = raw_input("Enter seq1: ")
seq2 = raw_input("Enter seq2 (same length): ")
diffCount = 0

for i in range(len(seq1)):
    if seq1[i] != seq2[i]:
        diffCount = diffCount + 1
        
print diffCount


Enter seq1: GTGTGATGAGCGCGACA
Enter seq2 (same length): GAAAGATGAGCGTGTCA
5
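The solution above trusts the user to enter sequences of equal length; if seq2 is shorter, seq2[i] raises an IndexError. Here is a sketch of the same count with a length check added. countDifferences is a hypothetical helper, and returning None for mismatched lengths is just one possible choice.

```python
def countDifferences(seq1, seq2):
    # hypothetical helper: number of positions where the two sequences differ
    if len(seq1) != len(seq2):
        return None  # assumption: signal "cannot compare" instead of crashing

    diffCount = 0
    for i in range(len(seq1)):
        if seq1[i] != seq2[i]:
            diffCount = diffCount + 1
    return diffCount

# The two check-your-answer pairs from the problem statement
print(countDifferences("AAGTCGTACA", "AAGTCGGACG"))
print(countDifferences("GTGTGATGAGCGCGACA", "GAAAGATGAGCGTGTCA"))
```

Both calls should reproduce the counts given in the check-your-answer examples (2 and 5).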

(D) Exact motif search. Prompt the user to input a "reference" sequence (a longish DNA sequence) and a "query" (a shorter DNA sequence). Print out the locations (nt position) of all exact matches (if any) of the query within the reference.

There are shortcuts for doing this in Python, such as the .find() string method, but see if you can do it without such things.

[ Check your answer ]
Reference: ATGCGCTAAAGCGCTAGATCTCTAGCTAAAGCTAGCTTATTCGGATGGGCTAG
Query: AAAGC
Matches at positions 7 and 27 within the reference (counting starting at 0)


In [18]:
refSeq = raw_input("Enter the reference sequence: ")
querySeq = raw_input("Enter the query: ")

refLength = len(refSeq)
queryLength = len(querySeq)

for i in range(refLength - queryLength + 1): 
    matchedPositions = 0
    for j in range(queryLength):
        if querySeq[j] == refSeq[i+j]:
            matchedPositions += 1 
    if matchedPositions == queryLength: 
        print "Match found at index", i, "in reference."


Enter the reference sequence: ATGCGCTAAAGCGCTAGATCTCTAGCTAAAGCTAGCTTATTCGGATGGGCTAG
Enter the query: AAAGC
Match found at index 7 in reference.
Match found at index 27 in reference.
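A middle ground between the nested loops above and the .find() shortcut is to compare a slice of the reference against the query at each starting position. This is an alternative sketch, not the solution's method. The sequences are hard-coded from the check-your-answer example instead of being read with raw_input(), and the list matches is added for illustration.

```python
refSeq = "ATGCGCTAAAGCGCTAGATCTCTAGCTAAAGCTAGCTTATTCGGATGGGCTAG"
querySeq = "AAAGC"

queryLength = len(querySeq)
matches = []  # illustration only: start positions of every exact match

for i in range(len(refSeq) - queryLength + 1):
    # refSeq[i:i + queryLength] is the window of the reference starting at i
    if refSeq[i:i + queryLength] == querySeq:
        matches.append(i)

print(matches)
```

The slice comparison replaces the inner loop and the matchedPositions counter, and the positions found should match the ones listed in the check-your-answer example.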