Midterm exam

This is a "closed book" examination - in particular, you are not to use any resources outside of this notebook (except possibly pen and paper). You may consult help from within the notebook using ? but not any online references. You should turn wireless off or set your laptop in "Airplane" mode prior to taking the exam.

You have 2 hours to complete the exam.


In [24]:
%matplotlib inline

Q1 (10 points).

  1. Read the data/iris.csv data set into a Pandas DataFrame. Display the first 4 rows of the DataFrame. (2 points)
  2. Create a new DataFrame showing the mean SepalLength, SepalWidth, PetalLength and PetalWidth for the 3 different types of irises. (4 points)
  3. Make a scatter plot of SepalLength against PetalLength where each species is assigned a different color. (4 points)
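As a rough illustration of parts 1 and 2, the sketch below uses a small in-memory frame as a hypothetical stand-in for data/iris.csv. The `Species` column name and the six sample rows are assumptions made for the example, not taken from the actual file; the scatter plot of part 3 is left out.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv("data/iris.csv").
# The "Species" column name and these six rows are assumptions for illustration.
iris = pd.DataFrame({
    "SepalLength": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "SepalWidth":  [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "PetalLength": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "PetalWidth":  [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "Species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# Part 1: the first 4 rows.
first_rows = iris.head(4)

# Part 2: one row per species, one column per measurement, holding the group means.
means = iris.groupby("Species").mean()
```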

In [ ]:

Q2 (10 points)

Write a function peek(df, n) to display a random selection of $n$ rows of any dataframe (without repetition). Use it to show 5 random rows from the iris data set. The function should take as inputs a dataframe and an integer. Do not use the pandas sample method.
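One possible shape for such a function, sketched under the assumption that drawing distinct row positions with `np.random.choice` and indexing with `iloc` is acceptable. The `demo` frame is a hypothetical stand-in for the iris data.

```python
import numpy as np
import pandas as pd

def peek(df, n):
    """Return n distinct rows of df chosen at random, without DataFrame.sample."""
    # replace=False draws n distinct positions from 0..len(df)-1
    # (NumPy raises ValueError if n exceeds the number of rows).
    idx = np.random.choice(len(df), size=n, replace=False)
    return df.iloc[idx]

# Hypothetical stand-in frame; the exam itself uses the iris data set.
demo = pd.DataFrame({"x": np.arange(10), "y": np.arange(10) ** 2})
rows = peek(demo, 5)
```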


In [ ]:

Q3 (10 points)

Write a function that when given $m$ vectors of length $k$ and another $n$ vectors of length $k$, returns an $m \times n$ matrix of the cosine distance between each pair of vectors. Take the cosine distance to be $$ \frac{A \cdot B}{\|A\| \|B\|} $$ for any two vectors $A$ and $B$.

Do not use the scipy.spatial.distance.cosine function or any functions from np.linalg or scipy.linalg.
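A minimal sketch of one vectorized approach, assuming the two sets of vectors arrive as arrays of shape (m, k) and (n, k). The function name `cosine_matrix` and the sample inputs are hypothetical. Note that the quantity defined above is what is more commonly called cosine similarity.

```python
import numpy as np

def cosine_matrix(X, Y):
    """Return the (m, n) matrix of A.B / (|A| |B|) for rows A of X and rows B of Y."""
    X = np.asarray(X, dtype=float)  # shape (m, k)
    Y = np.asarray(Y, dtype=float)  # shape (n, k)
    # Row norms from summed squares, so nothing from np.linalg is needed.
    x_norm = np.sqrt((X ** 2).sum(axis=1))  # shape (m,)
    y_norm = np.sqrt((Y ** 2).sum(axis=1))  # shape (n,)
    # X @ Y.T holds every pairwise dot product; the outer product of the
    # norms supplies the matching denominators.
    return (X @ Y.T) / np.outer(x_norm, y_norm)

# Hypothetical inputs: m = 2 and n = 3 vectors of length k = 2.
X = np.array([[1.0, 0.0], [1.0, 1.0]])
Y = np.array([[0.0, 2.0], [3.0, 0.0], [1.0, 1.0]])
C = cosine_matrix(X, Y)
```

The sketch does not guard against zero-length vectors, which would produce a division by zero.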


In [ ]:

Q4 (10 points)

Consider the following matrix $A$ with dimensions (4,6), to be interpreted as 4 rows of the measurements of 6 features.

np.array([[5, 5, 2, 6, 2, 0],
          [8, 6, 7, 8, 9, 7],
          [9, 5, 0, 4, 6, 8],
          [8, 7, 9, 3, 6, 1]])
  1. Add 1 to the first row, 2 to the second row, 3 to the third row and 4 to the fourth row using a vector v = np.array([1,2,3,4]) and broadcasting. (2 points)
  2. Normalize A so that its row means are all 0 and call it A1. (2 points)
  3. What are the singular values of A1? (2 points)
  4. What are the eigenvalues of the covariance matrix of A1? (2 points)
  5. Find the least squares solution vector $x$ if $Ax = y$ where y = np.array([1,2,3,4]).T (2 points)
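The five parts can be sketched roughly as follows. Two readings are assumed here and are not settled by the question: the centering in part 2 is applied to the original A rather than the result of part 1, and the covariance in part 4 treats rows as variables (the `np.cov` default); passing `rowvar=False` would give the 6 x 6 feature covariance instead.

```python
import numpy as np

A = np.array([[5, 5, 2, 6, 2, 0],
              [8, 6, 7, 8, 9, 7],
              [9, 5, 0, 4, 6, 8],
              [8, 7, 9, 3, 6, 1]])
v = np.array([1, 2, 3, 4])
y = np.array([1, 2, 3, 4])

# Part 1: v[:, None] has shape (4, 1), so it broadcasts across the 6 columns.
A_plus = A + v[:, None]

# Part 2 (assumption: center the original A, not A_plus).
# keepdims=True keeps the means as a (4, 1) column for broadcasting.
A1 = A - A.mean(axis=1, keepdims=True)

# Part 3: singular values only, in descending order.
s = np.linalg.svd(A1, compute_uv=False)

# Part 4 (assumption: rows are the variables, np.cov's default rowvar=True).
# eigvalsh suits the symmetric covariance matrix and returns ascending eigenvalues.
eigvals = np.linalg.eigvalsh(np.cov(A1))

# Part 5: A is 4 x 6, so lstsq returns the minimum-norm least squares solution.
x, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
```

Under the row-variable reading, the covariance matrix is A1 A1ᵀ / 5 because the row means are already zero, so its eigenvalues equal `s**2 / 5`.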

In [ ]:

Q10 (10 points)

We want to calculate the first 100 Catalan numbers. The $n^\text{th}$ Catalan number is given by $$ C_n = \prod_{k=2}^n \frac{n+k}{k} $$ for $n \ge 0$.

  1. Use numpy to find the first 100 Catalan numbers - the function should take a single argument $n$ and return an array [Catalan(1), Catalan(2), ..., Catalan(n)] (4 points).
  2. Use numba to find the first 100 Catalan numbers (starting from 1) fast using JIT compilation (4 points)
  3. Use cython to find the first 100 Catalan numbers (starting from 1) fast using AOT compilation (4 points)

In each case, code readability and efficiency are important.
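For the numpy part, one direct way to evaluate the product formula is to lay the factors out in a two-dimensional array and multiply along each row. This is a sketch rather than a tuned implementation: it returns floats, since $C_{100}$ is far beyond the 64-bit integer range, so the larger values are approximate.

```python
import numpy as np

def catalan(n):
    """Return [C_1, ..., C_n] as floats using C_m = prod_{k=2}^m (m + k) / k."""
    m = np.arange(1, n + 1)[:, None]  # column of indices m = 1..n
    k = np.arange(2, n + 1)[None, :]  # row of factors k = 2..n
    # Keep the factor (m + k) / k only where k <= m; pad with 1 elsewhere
    # so the padding does not change the row product.
    terms = np.where(k <= m, (m + k) / k, 1.0)
    return terms.prod(axis=1)

first = catalan(100)
```

As a hand check of the formula, $C_3 = \frac{5}{2} \cdot \frac{6}{3} = 5$ and $C_4 = \frac{6}{2} \cdot \frac{7}{3} \cdot \frac{8}{4} = 14$.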


In [ ]: