This is a "closed book" examination - in particular, you are not to use any resources outside of this notebook (except possibly pen and paper). You may consult help from within the notebook using ? but not any online references. You should turn wireless off or set your laptop in "Airplane" mode prior to taking the exam.
You have 2 hours to complete the exam.
In [38]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
Q1 (10 points).
Given the 2 matrices
A = np.array([[1,2,3],[4,5,6]])
B = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
Perform matrix multiplication of A and B using the following methods:
for loops without the dot function (4 points)
In [49]:
import numpy as np
In [40]:
A = np.array([[1,2,3],[4,5,6]])
B = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
In [41]:
m, n = A.shape
n, p = B.shape
C = np.zeros((m, p))
# C[i, j] is the dot product of row i of A and column j of B
for i in range(m):
    for j in range(p):
        for k in range(n):
            C[i,j] += A[i,k] * B[k, j]
C
Out[41]:
In [42]:
A @ B
Out[42]:
In [51]:
%load_ext rpy2.ipython
In [43]:
%R -iA,B A %*% B
Out[43]:
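For comparison with the loop solution above, here is a minimal sketch of the same product in plain Python, using nested lists instead of NumPy arrays. The function name `matmul` is hypothetical and not part of the exam; the sketch assumes both inputs are non-empty, rectangular lists of rows.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows (hypothetical helper)."""
    if len(A[0]) != len(B):
        raise ValueError("inner dimensions do not match")
    # zip(*B) iterates over the columns of B, so each entry of the result
    # is the dot product of one row of A with one column of B.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


A = [[1, 2, 3], [4, 5, 6]]
B = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
print(matmul(A, B))  # [[38, 44, 50, 56], [83, 98, 113, 128]]
```

The result has shape 2 x 4, matching the (m, p) shape allocated for C in the loop version.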
In [44]:
%R -o iris
Q2 (10 points)
Read the data/iris.csv data set into a Pandas DataFrame, and answer the following questions:
In [50]:
import pandas as pd
In [47]:
df = pd.read_csv('data/iris.csv')
df.groupby('Species').agg(['mean', 'min', 'max'])
Out[47]:
In [48]:
df[df['Petal.Length'] < df['Sepal.Width']].mean(numeric_only=True)
Out[48]:
Q3 (10 points)
Find the longest sequence of repeated letters (e.g. 'AAA') in the string below. Print 1) the length, 2) the index of the starting location, 3) the actual sequence. If there are ties, print the last sequence found. You can assume that only the letters A, C, T and G are found in the string.
TGTAGTCCATGCGGAATTCCACAGGGGCTCTGGGGACAGATTCGGACCTTTCTGTCAACGCCAATCATGGAGGTAGTGTGAGGTATAAATTTGGTCGGCGTAGGTCAAGAAAACCCACCTGCGCTGCTGTACGACACATGGCCGAGGCTTCAAGGGCATTCCACGAAGAGGCTCATGGCAACGCCTCTCGAAAGCTGGCGCTCAGGAAGGTACGATCACCCTCGAAATCAAAGATTTCATCTGAAATAAAAGTTAGTACGCCACTTTAGGGTATCGAGTACTTACCCATTTATAACGGAGGCTGAGCGAACGCTTGGCTGATGAAAAAACAACACTCGGTATAAACGGCGATTTCCACTGATCCAGGTAAAGCATGTTTGTGGATAGCAAGGGCAAGTAGTATGCAGCGAGTTTCGTGACAGTATAGCTCGACATGTATATCTCTGTGGGCGCATTTGGATGCTGTATACTGTAGAAGCAGTATATTCCCTGATGACCGAACTTACTACAAGTTGTTGTCTCGACAGGTAGTACGTGTGATCTGTGTCTGAGACCTGCAACTGGTGCGCATTGAAACTTCGTACATAAACCTACCGACTTCACCGTTTCGGCGTCGGCTTGTAACTGGAGAGTGTTGTTGCGTCATGGTCGATTGAGGATTTGGCCTAAATGTAGCGCGTATACACTGCATTATTAGCGGCTTCGAGGAACATGTAATGGGCGAGGACAGAGAATTGTATGAGATTCAAACTGCCAGGTTTTATGGCGGACCCCTGCTCCCATTGTAATCGACCGGCGGCTGGGGTACGCCCGCACGAGGGTATCGGTAGTATATCTAGCTAAGCTCCGGTGTATGCTGTTGAGACACCATTCATGCGCAAAGCCCCACCGTGCACGCATGCGATGATAAATAAGGATGACTATGGCTTACAGAGATCTTTTTCAGGGGCGTCTTGCAATAATGGTTGATAAATGTGTTTTGCCGAATCAACTGCGCGGC
In [52]:
import re
In [53]:
s = "TGTAGTCCATGCGGAATTCCACAGGGGCTCTGGGGACAGATTCGGACCTTTCTGTCAACGCCAATCATGGAGGTAGTGTGAGGTATAAATTTGGTCGGCGTAGGTCAAGAAAACCCACCTGCGCTGCTGTACGACACATGGCCGAGGCTTCAAGGGCATTCCACGAAGAGGCTCATGGCAACGCCTCTCGAAAGCTGGCGCTCAGGAAGGTACGATCACCCTCGAAATCAAAGATTTCATCTGAAATAAAAGTTAGTACGCCACTTTAGGGTATCGAGTACTTACCCATTTATAACGGAGGCTGAGCGAACGCTTGGCTGATGAAAAAACAACACTCGGTATAAACGGCGATTTCCACTGATCCAGGTAAAGCATGTTTGTGGATAGCAAGGGCAAGTAGTATGCAGCGAGTTTCGTGACAGTATAGCTCGACATGTATATCTCTGTGGGCGCATTTGGATGCTGTATACTGTAGAAGCAGTATATTCCCTGATGACCGAACTTACTACAAGTTGTTGTCTCGACAGGTAGTACGTGTGATCTGTGTCTGAGACCTGCAACTGGTGCGCATTGAAACTTCGTACATAAACCTACCGACTTCACCGTTTCGGCGTCGGCTTGTAACTGGAGAGTGTTGTTGCGTCATGGTCGATTGAGGATTTGGCCTAAATGTAGCGCGTATACACTGCATTATTAGCGGCTTCGAGGAACATGTAATGGGCGAGGACAGAGAATTGTATGAGATTCAAACTGCCAGGTTTTATGGCGGACCCCTGCTCCCATTGTAATCGACCGGCGGCTGGGGTACGCCCGCACGAGGGTATCGGTAGTATATCTAGCTAAGCTCCGGTGTATGCTGTTGAGACACCATTCATGCGCAAAGCCCCACCGTGCACGCATGCGATGATAAATAAGGATGACTATGGCTTACAGAGATCTTTTTCAGGGGCGTCTTGCAATAATGGTTGATAAATGTGTTTTGCCGAATCAACTGCGCGGC"
In [97]:
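n = 0
idx = None
current = s[0]
count = 1
for i, ch in enumerate(s[1:], 1):
    if ch == current:
        count += 1
    else:
        # a run just ended at position i; >= keeps the last run on ties
        if count >= n:
            n = count
            idx = i - count
        count = 1
        current = ch
# the final run ends at the end of the string and must be checked too
if count >= n:
    n = count
    idx = len(s) - count
print(n, idx, s[idx:(idx+n)])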
In [90]:
n = 0
idx = None
for m in re.finditer(r'(.)\1+', s):
    x = m.group(0)  # the whole run, including its first letter
    if len(x) >= n:  # >= keeps the last run on ties
        n = len(x)
        idx = m.start()
print(n, idx, s[idx:(idx+n)])
In [88]:
n = 0
idx = None
for m in re.finditer(r'(A+|C+|T+|G+)', s):
    x = m.group(1)
    if len(x) >= n:  # >= keeps the last run on ties
        n = len(x)
        idx = m.start()
print(n, idx, s[idx:(idx+n)])
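As one more way to approach this question, the run search can be sketched with `itertools.groupby`, which groups consecutive equal characters. The function name `longest_run` and the short test string are illustrative assumptions, not part of the exam; the sketch follows the stated tie rule by keeping the last run of maximal length.

```python
from itertools import groupby


def longest_run(seq):
    """Return (length, start index, run) of the longest run of one letter.

    Ties are resolved in favour of the last run found.
    """
    best_len, best_start = 0, None
    pos = 0  # start index of the current run
    for _, group in groupby(seq):
        length = sum(1 for _ in group)
        if length >= best_len:  # >= keeps the last run on ties
            best_len, best_start = length, pos
        pos += length
    if best_start is None:  # empty input
        return 0, None, ""
    return best_len, best_start, seq[best_start:best_start + best_len]


print(longest_run("AACCCGTTT"))  # (3, 6, 'TTT')
```

Calling `longest_run(s)` on the exam string gives a result to compare against the loop and regular expression solutions.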
Q4 (10 points)
Euclid's algorithm for finding the greatest common divisor of two numbers is
gcd(a, 0) = a
gcd(a, b) = gcd(b, a modulo b)
Note:
In [98]:
def gcd(a, b):
    if b == 0:
        return a
    else:
        return gcd(b, a % b)
In [99]:
gcd(17384, 1928)
Out[99]:
In [104]:
def lcm(a, b):
    return (a*b) // gcd(a, b)
In [105]:
lcm(17384, 1928)
Out[105]:
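The recursive definition can also be unrolled into a loop. The following is a minimal sketch, not the exam's required form; the names `gcd_iter` and `lcm_iter` are hypothetical, and `lcm_iter` assumes both arguments are positive integers. The standard library's `math.gcd` is used only as a cross-check.

```python
import math


def gcd_iter(a, b):
    """Euclid's algorithm as a loop: replace (a, b) by (b, a mod b) until b is 0."""
    while b != 0:
        a, b = b, a % b
    return a


def lcm_iter(a, b):
    """Least common multiple of two positive integers."""
    # Dividing before multiplying keeps the intermediate value small.
    return a // gcd_iter(a, b) * b


# Trace for the exam inputs: (17384, 1928) -> (1928, 32) -> (32, 8) -> (8, 0)
print(gcd_iter(17384, 1928))  # 8
print(lcm_iter(17384, 1928))  # 4189544
assert gcd_iter(17384, 1928) == math.gcd(17384, 1928)
```

The loop form avoids Python's recursion limit, although Euclid's algorithm needs few steps even for large inputs.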
Q5 (10 points)
Write a function to flatten a list of lists using
the reduce higher-order function (4 points)
For example, flatten([[1,2], [3,4,5],[6,7,8,9]]) should return [1,2,3,4,5,6,7,8,9]
In [106]:
def flatten1(list_of_lists):
    xs = []
    for alist in list_of_lists:
        for item in alist:
            xs.append(item)
    return xs
In [111]:
def flatten2(list_of_lists):
    return [item for alist in list_of_lists for item in alist]
In [133]:
from functools import reduce
In [135]:
def flatten3(list_of_lists):
    return list(reduce(lambda a, b: a + b, list_of_lists, []))
In [ ]:
xs = [[1,2], [3,4,5],[6,7,8,9]]
In [124]:
flatten1(xs)
Out[124]:
In [125]:
flatten2(xs)
Out[125]:
In [136]:
flatten3(xs)
Out[136]:
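Beyond the approaches the question asks for, the standard library offers `itertools.chain.from_iterable` for one level of flattening. The sketch below is an optional alternative, with the hypothetical name `flatten_chain`. Unlike the `reduce` version, which builds a new list on every `a + b`, it visits each item once.

```python
from itertools import chain


def flatten_chain(list_of_lists):
    """Flatten one level of nesting by chaining the inner lists together."""
    return list(chain.from_iterable(list_of_lists))


xs = [[1, 2], [3, 4, 5], [6, 7, 8, 9]]
print(flatten_chain(xs))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Like the three solutions above, this flattens only one level: inner lists nested more deeply are left as they are.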