CSE 6040, Fall 2015: Practice Exam

This notebook contains a set of practice questions that should help you review for the final exam. It is fairly close in form and content to what you should expect on the final.

Run these cells, which load modules you are likely to need. Also, be sure to download or pull the latest version of cse6040utils.py and quickly review it.


In [ ]:
import math
from decimal import Decimal
import numpy as np
import scipy as sp

import urllib
import sqlite3 as db
import pandas as pd

In [ ]:
import matplotlib.pyplot as plt
from IPython.display import display
import seaborn as sns

%matplotlib inline

In [ ]:
%reload_ext autoreload
%autoreload 2

import cse6040utils

Problem 1: Can you compute the variance quickly and accurately?

Let $x_0, x_1, \ldots, x_{n-1}$ be a sequence of $n$ observations in some experiment. The sample mean is defined as,

$$ \begin{array}{rcl} \mu_n & \equiv & \frac{1}{n} \sum_{i=0}^{n-1} x_i. \end{array} $$

The sample variance is defined as,

$$ \begin{array}{rcl} \sigma_n^2 & \equiv & \frac{1}{n} \sum_{i=0}^{n-1} (x_i - \mu_n)^2 \end{array} $$

The standard way to compute the sample variance is to first sweep through the data to compute the sample mean ($\mu_n$), and then sweep through it again to accumulate the squared deviations that yield the sample variance ($\sigma_n^2$). Thus, the standard approach requires two passes through the data.
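For example, if $x = (1, 2, 3)$, then $\mu_3 = \frac{1 + 2 + 3}{3} = 2$ and $\sigma_3^2 = \frac{(1-2)^2 + (2-2)^2 + (3-2)^2}{3} = \frac{2}{3}$.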

Part a) Write a Python function to compute the sample mean. You may assume the input, x, is a Numpy array of floating-point values.

Ground rules: Do not use Numpy's mean() function. (You may use other Numpy functions.)


In [ ]:
def sample_mean (x):
    """Returns the sample mean of the Numpy array x."""
    assert type (x) is np.ndarray
    assert x.dtype == float
    
    # @YOUSE: Complete this function
    pass

In [ ]:
# Testing code -- Not graded, so you can modify this cell when debugging
x = np.random.rand (5)

print "Input:", x
print "Your result:", sample_mean (x)
print "Error compared to Numpy's built-in function:", sample_mean (x) - np.mean (x)

Part b) Write a Python function to compute the sample variance, using the standard ("two-pass") approach. That is, your implementation should first compute $\mu_n$ and use it to then complete the computation of $\sigma_n^2$.

Ground rules: You can use sample_mean() above, but do not use Numpy's mean(), var(), or std() functions.


In [ ]:
def sample_variance (x):
    """Returns the sample variance of the Numpy array x."""
    assert type (x) is np.ndarray
    assert x.dtype == float
    
    # @YOUSE: Complete this function
    pass

In [ ]:
# Testing code -- Not graded, so you can modify this cell
print "Your result:", sample_variance (x)
print "Error compared to Numpy's built-in function:", sample_variance (x) - np.var (x)

Part c) Suppose the dataset is very large, so that sweeping through it is slow. You might naturally ask whether there is a one-pass approach, that is, whether it is possible to compute the sample variance making just one sweep through the data.

Your colleague suggests a one-pass method. The idea derives from the following observation.

$$ \begin{array}{rcl} \sigma_n^2 & \equiv & \frac{1}{n} \sum_{i=0}^{n-1} (x_i - \mu_n)^2 \\ & = & \frac{1}{n} \sum_{i=0}^{n-1} \left( x_i^2 - 2 x_i \mu_n + \mu_n^2 \right) \\ & = & \left( \frac{1}{n} \sum_{i=0}^{n-1} x_i^2 \right) - 2 \mu_n \underbrace{\left( \frac{1}{n} \sum_{i=0}^{n-1} x_i \right)}_{= \mu_n} + \mu_n^2 \\ & = & \frac{1}{n} \left( \sum_{i=0}^{n-1} x_i^2 \right) - \mu_n^2. \end{array} $$

Thus, in just one pass through the data, you could accumulate the pair $(a, b)^T$ by the vector sum below, and then obtain the variance estimate, $v$:

$$ \begin{array}{rcl} \left(\begin{array}{c} a \\ b \end{array}\right) & \leftarrow & \sum_{i=0}^{n-1} \left(\begin{array}{c} x_i \\ x_i^2 \end{array}\right), \\ v & \leftarrow & \frac{b}{n} - \left (\frac{a}{n} \right)^2. \end{array} $$
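For example, on the data $x = (1, 2, 3)$ from before, a single sweep yields $a = 1 + 2 + 3 = 6$ and $b = 1 + 4 + 9 = 14$, whence $v = \frac{14}{3} - \left(\frac{6}{3}\right)^2 = \frac{2}{3}$, matching the two-pass result.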

For this part of the exercise, implement this one-pass scheme.


In [ ]:
def sample_variance__one_pass (x):
    """Returns the sample variance of the Numpy array x, in one pass."""
    assert type (x) is np.ndarray
    assert x.dtype == float
    # @YOUSE: Implement the one-pass scheme
    pass

In [ ]:
# Testing code -- Not graded, so you can modify this cell when debugging
print "Numpy's result:", np.var (x)
print "Your two-pass algorithm:", sample_variance (x)
print "Your one-pass algorithm:", sample_variance__one_pass (x)

Part d) The one-pass approach can have a downside with respect to its numerical accuracy. Explain what the problem can be.

(Enter response here)
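If you wish to check your explanation empirically, here is a quick sketch (not graded; it assumes your sample_variance() and sample_variance__one_pass() implementations above are complete, and the name x_shifted is hypothetical). Shifting low-variance data by a large constant should leave the variance unchanged, so any large discrepancy between the two algorithms is revealing.


In [ ]:
# Demonstration only: shift data with tiny variance by a large
# constant. The true variance is ~1/12, but the one-pass formula
# subtracts two nearly equal quantities of order 1e16, so it can
# lose most or all of its accuracy.
x_shifted = 1.0e8 + np.random.rand (100000)
print "Numpy's result:", np.var (x_shifted)
print "Two-pass result:", sample_variance (x_shifted)
print "One-pass result:", sample_variance__one_pass (x_shifted)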

Problem 2: Linear Regression via Maximum Likelihood Estimation

Suppose you are given a sequence of observations, $(x_0, y_0), (x_1, y_1), \ldots, (x_{n-1}, y_{n-1})$, where $x_i$ and $y_i$ are real numbers. You decide to fit a noisy line to these observations:

$$ \begin{array}{rcl} y_i & = & \theta_0 + \theta_1 x_i + \epsilon_i, \end{array} $$

where $\theta_0$ is a y-intercept, $\theta_1$ is the slope of a line, and $\epsilon_i$ is an unknown random variable intended to model the error or noise in the observation. In other words, you believe the underlying phenomenon should produce the value $\theta_0 + \theta_1 x_i$ given $x_i$, but each measurement instead produces a noisy value $\theta_0 + \theta_1 x_i + \epsilon_i$, where $\epsilon_i$ is drawn from some distribution.

This model can be written more compactly as $y = X\theta + \epsilon$ using our usual conventions,

$$ \begin{array}{rcl} y & \equiv & (y_0, \ldots, y_{n-1})^T \\ X & \equiv & \left(\begin{array}{cc} 1 & x_0 \\ \vdots & \vdots \\ 1 & x_{n-1} \end{array}\right) \\ \theta & \equiv & (\theta_0, \theta_1)^T \\ \epsilon & \equiv & (\epsilon_0, \ldots, \epsilon_{n-1})^T. \end{array} $$

Suppose you further believe that each error $\epsilon_i$ is independent and identically distributed as a normal (Gaussian) random variable, $\mathcal{N}(0, \sigma^2)$, with mean $0$ and variance $\sigma^2$. In this case, given $x_i$ and the parameters $\theta$ and $\sigma$, $y_i$ will also be normally distributed, as $\mathcal{N}(\theta_0 + \theta_1 x_i, \sigma^2)$. Thus, the conditional probability of the observed value $y_i$ may be written as

$$ \begin{array}{rcl} \mbox{Pr}[y_i \,|\, x_i; \theta, \sigma] & \equiv & \displaystyle \frac{1}{\sigma \sqrt{2 \pi}} \exp \left[-\frac{(y_i - \theta_0 - \theta_1 x_i)^2}{2\sigma^2}\right]. \end{array} $$
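To make the model concrete, here is a minimal simulation sketch (the `_demo` names are hypothetical, introduced only for illustration): it draws $x_i$ values, forms the augmented matrix $X$, and generates noisy responses for a known $\theta$ and $\sigma$.


In [ ]:
# A minimal simulation of the model y = X*theta + eps (illustration
# only; all '_demo' names are hypothetical).
n_demo = 100
x_demo = np.random.rand (n_demo)
X_demo = np.insert (x_demo.reshape ((n_demo, 1)), 0, 1.0, axis=1) # prepend a column of ones
theta_demo = np.array ([[1.0], [2.0]]) # (intercept, slope)
eps_demo = np.random.normal (0.0, 0.1, size=(n_demo, 1)) # N(0, 0.1^2) noise
y_demo = X_demo.dot (theta_demo) + eps_demo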

Part a) Show that the log-likelihood of the parameters, $(\theta, \sigma)$, given the observations is

$$ \begin{array}{rcl} \mathcal{L}(\theta, \sigma) & = & \displaystyle - n \ln \left(\sigma \sqrt{2 \pi}\right) - \frac{1}{2\sigma^2} \|y - X\theta\|_2^2. \end{array} $$
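(Hint: since the observations are independent, the likelihood is the product $\prod_{i=0}^{n-1} \mbox{Pr}[y_i \,|\, x_i; \theta, \sigma]$, and the log-likelihood $\mathcal{L}(\theta, \sigma)$ is the natural logarithm of this product.)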

Part b) Show that the gradient of $\mathcal{L}$ with respect to $\theta$ is:

$$ \begin{array}{rcl} \nabla_\theta \mathcal{L}(\theta, \sigma) & = & \displaystyle \frac{X^T \left(y - X \theta\right)}{\sigma^2}. \end{array} $$

Part c) Show that the partial derivative of $\mathcal{L}$ with respect to $\sigma$ is:

$$ \begin{array}{rcl} \displaystyle {\frac{\partial} {\partial \sigma}} \mathcal{L}(\theta, \sigma) & = & \displaystyle -\frac{n}{\sigma} + \frac{1}{\sigma^3} \|y - X \theta\|_2^2. \end{array} $$

Part d) Suppose you determine the parameters that maximize the log-likelihood:

$$ \begin{array}{rcl} (\theta_*, \sigma_*) & = & \arg\max_{\theta, \sigma} \, \mathcal{L}(\theta, \sigma) \end{array} $$

Show that these estimates are given by,

$$ \begin{array}{rcl} X^T X \theta_* & = & X^T y \\ \sigma_*^2 & = & \displaystyle \frac{1}{n} \|y - X\theta_*\|_2^2. \end{array} $$

Observe that the MLE procedure, being based on an explicit probabilistic model, also yields more information about the data than the ad hoc least-squares minimization procedure: namely, an estimate of $\sigma^2$.
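As a quick numerical check of the first condition, the normal equations can be solved directly. This sketch reuses the hypothetical X_demo and y_demo from the simulation cell above; note that np.linalg.lstsq() is a more numerically robust alternative to forming $X^T X$ explicitly.


In [ ]:
# Sketch (assumes the '_demo' simulation cell above has been run):
# solve X^T X theta = X^T y, then estimate sigma from the residual.
theta_check = np.linalg.solve (X_demo.T.dot (X_demo), X_demo.T.dot (y_demo))
sigma_check = math.sqrt ((np.linalg.norm (y_demo - X_demo.dot (theta_check)) ** 2) / n_demo)
print "Recovered (theta, sigma):", theta_check.T, sigma_check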

Part e) Write a Python function to determine $\theta$ and $\sigma$ from a set of observations.


In [ ]:
def linreg_mle (X, y):
    """
    Returns (theta, sigma), the maximum likelihood estimates of the
    parameters of the linear regression model y = X*theta + noise.
    """
    assert type (y) is np.ndarray
    assert type (X) is np.ndarray
    assert len (y) == len (X)
    
    # Your code should compute and return the following:
    theta = np.zeros ((X.shape[1], 1)) # one coefficient per column of X
    sigma = 0.0
    
    # @YOUSE: Fill in your code here to update or compute
    # theta and sigma.
    pass

    return (theta, sigma)

In [ ]:
# This code cell just tests your implementation on a synthetic data set.
# It compares your MLE-based procedure against conventional least squares.
# It is not graded so you are free to modify it for testing and debugging.

n = 50 # Number of observations
theta_true = cse6040utils.generate_model (1)
sigma_true = 0.1
(X, y) = cse6040utils.generate_data (n, theta_true, sigma=sigma_true)

print X.shape
print theta_true.shape
print y.shape

print "\nCondition number of the data matrix: ", np.linalg.cond (X)
print "True model coefficients:", theta_true.T, sigma_true

theta_lstsq = cse6040utils.linreg_fit_lstsq (X, y)
(theta_mle, sigma_mle) = linreg_mle (X, y)

print "\nEstimated model coefficients via least squares:", theta_lstsq.T
print "Relative error in the coefficients:", cse6040utils.rel_diff (theta_lstsq, theta_true)

print "\nEstimated model coefficients via MLE:", theta_mle.T, sigma_mle
print "Relative error in the coefficients:", cse6040utils.rel_diff (theta_mle, theta_true)

fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.plot (X[:, 1], y, 'bo') # Noisy observations
ax1.plot (X[:, 1], X.dot (theta_lstsq), 'r^') # Fit via least squares
ax1.plot (X[:, 1], X.dot (theta_mle), 'gv')
ax1.plot (X[:, 1], X.dot (theta_true), 'm*') # True solution

Problem 3: A linear model of exam scores

In this problem, you'll try to predict a student's final exam score from his or her midterm scores, using your linreg_mle() procedure from Problem 2 (above).

These data come from "Test scores for general psychology", available at: http://college.cengage.com/mathematics/brase/understandable_statistics/7e/students/datasets/mlr/frames/frame.html

Let's start by reading these scores into a Pandas data frame.


In [ ]:
df_psych = pd.read_csv ('http://cse6040.gatech.edu/fa15/exam-scores.csv')
display (df_psych.head ())

Regard this data set as having 3 predictors (Exam1, Exam2, and Exam3) and 1 response variable (FinalExam).

Part a) Use Seaborn's pairplot() routine to inspect how the predictors and the response variable relate.


In [ ]:
# @YOUSE: Fill in your implementation here
#sns.pairplot (...)
pass

Part b) Use your linreg_mle() procedure to estimate the $\theta_*$ and $\sigma_*$ that fit these data.

To help you out, we've provided some code to build $X$ and $y$; see the two code cells below, and fill in the third code cell to compute $\theta_*$ and $\sigma_*$.


In [ ]:
def df_to_augmat (df):
    """
    Converts a Pandas data frame, 'df', into an augmented data matrix,
    that is, a matrix whose columns are taken from the columns of 'df'
    with an additional column of all ones prepended to it.
    """
    A = df.as_matrix ()
    return np.insert (A, 0, 1.0, axis=1)

In [ ]:
# Extract response variable (final exam) and predictors (exams 1, 2, and 3)
y_psych = df_psych[['FinalExam']].as_matrix ()
X_psych = df_to_augmat (df_psych[['Exam1', 'Exam2', 'Exam3']])

print "Data matrix (first few rows):"
print X_psych[0:5, :]
print "   ..."

print "\nResponse variable (first few entries):"
print y_psych[0:5, :]
print "   ..."

In [ ]:
# @YOUSE: Write some code here to use your `linreg_mle()`
# routine to estimate theta and sigma.
pass

Problem 4: Linear regression via gradient descent

In this problem, you'll continue Problem 2 by developing a gradient descent method for linear regression.

Recall that linear regression, assuming $n$ i.i.d. samples and normally distributed errors (with mean $0$ and variance $\sigma^2$), has a log-likelihood given by,

$$ \begin{array}{rcl} \mathcal{L}(\theta, \sigma) & = & \displaystyle - n \ln \left(\sigma \sqrt{2 \pi}\right) - \frac{1}{2\sigma^2} \|y - X\theta\|_2^2, \end{array} $$

where $X$ is the $n \times (d+1)$ augmented data matrix and $y$ the vector of observed response variable values.

In the MLE procedure, your goal is to determine the $\theta_*, \sigma_*$ values that maximize $\mathcal{L}(\theta, \sigma)$.

Next, recall the generic gradient descent procedure discussed in Lab 26. In the following code cells, we ask you to specialize it to the case of linear regression. Refer back to your results in Problem 2 and complete this implementation.
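As a reminder of the generic template: since we are maximizing $\mathcal{L}$ here, each step moves in the direction of the gradient rather than against it (gradient ascent). The following sketch is a hypothetical helper shown only to fix ideas; it is not part of the graded code.


In [ ]:
# Generic gradient-ascent template (sketch only). 'grad' is a
# function returning the gradient at the current parameters p_t.
def gradient_ascent_sketch (grad, p0, ALPHA=0.1, MAX_STEP=1000):
    p_t = p0
    for t in range (MAX_STEP):
        p_t = p_t + ALPHA * grad (p_t) # step along the gradient
    return p_t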

Part a) Complete the following function, which should return the gradient with respect to $\theta$ of the log-likelihood for linear regression.


In [ ]:
def linreg_grad_theta_loglikelihood (X, y, theta, sigma):
    """
    Returns the gradient of the linear regression log-likelihood
    with respect to 'theta', evaluated at (X, y, theta, sigma).
    """
    # @YOUSE: Fill in this implementation
    pass

Part b) Complete the following function, which should return the partial derivative with respect to $\sigma$ of the log-likelihood for linear regression.


In [ ]:
def linreg_grad_sigma_loglikelihood (X, y, theta, sigma):
    """
    Returns the partial derivative of the linear regression
    log-likelihood with respect to 'sigma', evaluated at
    the point (X, y, theta, sigma).
    """
    # @YOUSE: Fill in this implementation
    pass

Part c) The function below contains the "main loop" of gradient descent. Complete the function, in particular by updating theta_t and sigma_t inside the loop.

This function is written to optionally save and return all of the theta_t and sigma_t estimates across iterations, for later inspection in Part d).
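One natural update rule, since the goal is to ascend the log-likelihood, is

$$ \begin{array}{rcl} \theta_{t+1} & \leftarrow & \theta_t + \alpha \, \nabla_\theta \mathcal{L}(\theta_t, \sigma_t) \\ \sigma_{t+1} & \leftarrow & \sigma_t + \alpha \, \displaystyle {\frac{\partial}{\partial \sigma}} \mathcal{L}(\theta_t, \sigma_t), \end{array} $$

where $\alpha$ is the step size (ALPHA in the code below).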


In [ ]:
def linreg_mle_gd (X, y, ALPHA=0.1, MAX_STEP=1000, history=False):
    """
    Returns (theta, sigma), the parameters of a linear regression model
    based on maximum likelihood estimation using a gradient descent
    fitting procedure. The data are given by the matrix of predictors,
    X, and the vector of responses, y.
    
    If 'history' is set to True, then this routine returns the
    parameters at each iteration.
    """
    n = X.shape[0] # No. of data points
    d = X.shape[1] # Dimension (no. of features per data point)

    # Initial guesses
    theta_t = np.zeros ((d, 1))
    sigma_t = 1.0
    
    if history:
        # Make space for history and record the initial guess
        theta_all = np.zeros ((d, MAX_STEP+1))
        theta_all[:, 0:1] = theta_t
        sigma_all = np.zeros (MAX_STEP+1)
        sigma_all[0] = sigma_t

    for t in range (MAX_STEP):
        # @YOUSE: Fill in the code to update (theta_t, sigma_t)
        pass

        # Records your new (theta_t, sigma_t) values
        if history:
            theta_all[:, t+1:t+2] = theta_t
            sigma_all[t+1] = sigma_t
    
    if history:
        return (theta_all, sigma_all)
    else:
        return (theta_t, sigma_t)

In [ ]:
# Tests your implementation of linreg_mle_gd. This code cell
# is not graded, so you can modify it for testing and debugging.

df_psych = pd.read_csv ('http://cse6040.gatech.edu/fa15/exam-scores.csv')

print "First few rows of the data:"
display (df_psych.head ())

y_psych = df_psych[['FinalExam']].as_matrix ()
X_psych = df_to_augmat (df_psych[['Exam1', 'Exam2', 'Exam3']])

(thetas, sigmas) = linreg_mle_gd (X_psych, y_psych, history=True)
theta_psych_mle_gd = thetas[:, -1:]
sigma_psych_mle_gd = sigmas[-1]

print "\nEstimated parameters:", theta_psych_mle_gd.T.flatten (), sigma_psych_mle_gd

y_psych_predicted_mle_gd = X_psych.dot (theta_psych_mle_gd)
relerr_psych_mle_gd = np.divide (y_psych_predicted_mle_gd - y_psych, y_psych)
mean_sq_relerr_psych_mle_gd = (np.linalg.norm (relerr_psych_mle_gd, ord=2) ** 2) / len (y_psych)
print "\nMean squared relative error:", mean_sq_relerr_psych_mle_gd

Part d) Run the code below to plot the magnitude of the parameter estimates versus iteration count. Does the magnitude eventually converge, and if so, after how many iterations?


In [ ]:
plot_df = pd.DataFrame (np.arange (thetas.shape[1]), columns=['Iteration'])
params = np.insert (thetas, thetas.shape[0], sigmas, axis=0)
plot_df['norm(Params)'] = np.linalg.norm (params, ord=2, axis=0)

display (plot_df.head ())
print "   ..."
display (plot_df.tail ())
sns.lmplot ('Iteration', 'norm(Params)', plot_df, fit_reg=False)

Problem 5: Linear regression via gradient descent, revisited: an SQL-based implementation

In this problem, you will take your solution from the preceding question and write a version that can work when the data resides in a SQLite database.

First, suppose the exam scores are stored in an SQLite database. The following code cell downloads a prepared SQL database and saves it to disk.


In [ ]:
exams_db_url = 'http://cse6040.gatech.edu/fa15/exam-scores-tuples.db'
(exams_db_filename, _) = urllib.urlretrieve (exams_db_url)
print "Downloaded database to:", exams_db_filename

In [ ]:
exams_db = db.connect (exams_db_filename)
pd.read_sql_query ('SELECT * FROM ExamScores LIMIT 10', exams_db)

Start by selecting the predictor variables (Exam1, Exam2, and Exam3) into a data matrix, $X$, augmented with a column of ones. Let's store this data matrix as a table, also called X. It will be a set of tuples, (I, J, V), where each tuple corresponds to the values $(i, j, x_{ij})$ of $X$.

You may assume that all $x_{ij}$ entries exist and are unique.


In [ ]:
exams_cursor = exams_db.cursor ()
exams_cursor.execute ('DROP TABLE IF EXISTS X')
exams_cursor.execute ("""
  CREATE TABLE X AS
    SELECT StudentId AS I, Assignment AS J, Score AS V
      FROM ExamScores
      WHERE Assignment IN ('Exam1', 'Exam2', 'Exam3')
""")
exams_cursor.execute ("""
  INSERT INTO X
    SELECT DISTINCT StudentId AS I, 'Intercept' AS J, 1.0 AS V
      FROM ExamScores
""")
exams_db.commit ()

display (pd.read_sql_query ('SELECT * FROM X ORDER BY I LIMIT 10', exams_db))

Similarly, let's create a table Y to hold the response variable, which is the final exam score. That is, Y will have columns (I, V), and each row will correspond to student $i$'s final exam score, $y_i$.

(Again, assume all $y_i$ exist and each $i$ is unique.)


In [ ]:
exams_cursor.execute ('DROP TABLE IF EXISTS Y')
exams_cursor.execute ("""
  CREATE TABLE Y AS
    SELECT StudentId AS I, Score AS V
      FROM ExamScores
      WHERE Assignment = 'FinalExam'
""")
exams_db.commit ()

display (pd.read_sql_query ('SELECT * FROM Y LIMIT 5', exams_db))

Given matrix and vector objects stored as database tables, you can carry out linear algebra operations using joins. For example, suppose you wish to compute the product, $z = X^T y$, which is a subexpression of the gradient of the log-likelihood. Using the same convention that an output vector, when stored as a table, has an index column I and a value column V, you might run the following query.


In [ ]:
exams_cursor.execute ('DROP TABLE IF EXISTS Z')
exams_cursor.execute ("""
  CREATE TABLE Z AS
    SELECT X.J AS I, SUM (X.V * Y.V) AS V
      FROM X, Y
      WHERE X.I = Y.I
      GROUP BY X.J
""")
exams_db.commit ()

display (pd.read_sql_query ('SELECT * FROM Z ORDER BY I LIMIT 5', exams_db))
exams_cursor.execute ('DROP TABLE Z') # clean-up

With the preceding conventions for creating matrices and vectors as database tables, here is your task: use SQL to implement the gradient descent procedure for computing a linear regression!

Start by filling in the generic functions below, which perform basic linear algebra operations. To help you get started, we've created some for you.

By convention, if the computed result is a vector or matrix, then the following functions create a table in the database with a given name to store that result.

However, if the computed result is a scalar, the function simply returns that value rather than creating a table just to store it.

For instance, here is a function that queries a given matrix table in the database and returns the number of rows it has.


In [ ]:
def sql_get_mat_num_rows (X, db):
    """Returns the number of rows in X."""
    c = db.cursor ()
    n = pd.read_sql_query ("""
      SELECT COUNT (*) AS V
        FROM (SELECT DISTINCT I FROM {X})
    """.format (X=X), db).iloc[0]['V']
    return n

# Example: Call with the name of a matrix table
print "The matrix X has", sql_get_mat_num_rows ('X', exams_db), "rows."

Here is some more "freebie" code, which encapsulates as a standalone function the matrix-vector product, $y \leftarrow A^T \cdot x$, from the motivating example above.


In [ ]:
def sql_mat_trans_dot_vec (y, A, x, db):
    """
    Computes y = A^T * x.
    
    The name of the input matrix table is given by A and the name
    of the input vector table is given by x. The name of the
    desired output vector table is y; if y exists, it is replaced
    (dropped and recreated).
    """
    c = db.cursor ()
    c.execute ('DROP TABLE IF EXISTS {y}'.format (y=y))
    c.execute ("""
      CREATE TABLE {y} AS
        SELECT {A}.J AS I, SUM ({A}.V * {x}.V) AS V
          FROM {A}, {x}
          WHERE {A}.I = {x}.I
          GROUP BY {A}.J
    """.format (y=y, A=A, x=x))
    db.commit ()

For instance, let's repeat the above example of computing $z \leftarrow X^T y$, where $X$ and $y$ are the predictors matrix (X table) and response vector (Y table), respectively. Recall that the output $z$ is just a dummy vector, which we'll store in a dummy table (Z) and then promptly remove.


In [ ]:
# Demo of sql_mat_trans_dot_vec()
sql_mat_trans_dot_vec ('Z', 'X', 'Y', exams_db)
display (pd.read_sql_query ('SELECT * FROM Z ORDER BY I LIMIT 5', exams_db))
exams_cursor.execute ('DROP TABLE Z') # clean-up

Part a) Complete the implementation for a matrix-vector product, $y \leftarrow A \cdot x$.
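(Hint: compare with sql_mat_trans_dot_vec() above. For $y \leftarrow A \cdot x$, the roles of the matrix's I and J columns are exchanged: join on {A}.J = {x}.I and group by {A}.I.)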


In [ ]:
def sql_mat_dot_vec (y, A, x, db):
    """Computes y = A*x."""
    c = db.cursor ()
    c.execute ('DROP TABLE IF EXISTS {y}'.format (y=y))
    
    # @YOUSE: Fill in this implementation
    pass

    db.commit ()

Part b) Complete the implementation of an axpy operation (pronounced "ax-pee"), which computes $z \leftarrow \alpha \cdot x + y$, where $\alpha$ is a scalar and $x$, $y$, and $z$ are vectors.
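For example, with $\alpha = 2$, $x = (1, 2)^T$, and $y = (3, 4)^T$, the result is $z = (2 \cdot 1 + 3, \; 2 \cdot 2 + 4)^T = (5, 8)^T$.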


In [ ]:
def sql_axpy (z, a, x, y, db):
    """
    Computes z = a*x + y, where x, y, and z are vectors and
    a is a scalar.
    """
    c = db.cursor ()
    c.execute ('DROP TABLE IF EXISTS {z}'.format (z=z))
    
    # @YOUSE: Fill in this implementation
    pass

    db.commit ()

Part c) Complete the following function, which should return $x^T x$, given a vector $x$.

By the convention mentioned above, since $x^T x$ is a scalar, this function should simply return that scalar value, rather than creating a table just to hold the scalar.
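For example, $x = (1, 2, 3)^T$ gives $x^T x = 1 + 4 + 9 = 14$.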


In [ ]:
def sql_get_vec_dot_self (x, db):
    """
    Returns x^T * x.
    
    Note that this function does _not_ modify the database.
    Instead, it should run a query to compute the dot
    product and return that value as a scalar.
    """
    # @YOUSE: Fill in this implementation
    pass

With those complete, let's give you some more freebies.

The next two functions have a special feature: they update tables in-place!

For instance, sql_scale_inplace() takes an existing vector table and scales all the values (column V in the table) by a given scalar value a.


In [ ]:
def sql_scale_inplace (y, a, db):
    """
    Computes y = a*y, that is, scaling all values of a matrix
    or vector table object by the numerical factor a. Note
    that this operation completes _in-place_.
    """
    c = db.cursor ()
    c.execute ('UPDATE {y} SET V = V * ({a})'.format (y=y, a=a))
    db.commit ()

In [ ]:
def sql_vec_update (y, x, db):
    """Computes y = y + x. This action occurs in-place."""
    c = db.cursor ()
    query = """
      REPLACE INTO {y} (I, V)
        SELECT {y}.I AS I, {y}.V + {x}.V AS V
          FROM {y}, {x} WHERE {y}.I = {x}.I
    """.format (y=y, x=x)
    c.execute (query)
    db.commit ()

To implement the gradient descent procedure, you will need to create a fitted-parameters vector. In this problem, let's use a table to store this vector of parameter values, i.e., $\theta$ and $\sigma$.

The parameter vector's indices will come from the column labels of $X$, that is, the distinct J values in the X table. In addition, let's augment the parameter vector with an additional entry, named Sigma_, to hold the estimated value of $\sigma$.


In [ ]:
def sql_create_params (X, params, db):
    """
    Creates an initial parameters vector. It has 1 entry for each
    column of the table (matrix) X, plus an additional entry named
    `Sigma_` (note trailing underscore).
    """
    c = db.cursor ()
    c.execute ('DROP TABLE IF EXISTS {params}'.format (params=params))
    c.execute ("""
        CREATE TABLE {params} (I TEXT PRIMARY KEY, V REAL)
      """.format (params=params))
    
    # The initial parameter vector is set entirely to '0', except
    # for sigma (named 'Sigma_'), which is set to '1.0'
    c.execute ("""
        INSERT INTO {params}
          SELECT DISTINCT {X}.J AS I, 0.0 AS V FROM {X}
      """.format (params=params, X=X))
    c.execute ("""
        INSERT INTO {params} VALUES ('Sigma_', 1.0)
      """.format (params=params))
    
    db.commit ()

In [ ]:
# Test code for the above routine
sql_create_params ('X', 'Theta', exams_db)
pd.read_sql_query ('SELECT * FROM Theta', exams_db)

Part d) Now for the fun part: we want you to implement the gradient descent procedure in the code cell(s) below. If you need additional helper subroutines beyond the ones shown above, feel free to create additional code cells for them.


In [ ]:
def linreg_mle_gd_sql (X, y, params, db, ALPHA=0.1, MAX_STEP=1000):
    """
    Uses gradient descent to do maximum likelihood estimation for
    the linear regression problem. The input predictors are
    stored in a data matrix as a table named X; the response is a
    vector stored as a table named y. The output is a table named
    params, consisting of the fitted parameters.
    
    For more info on the parameters vector, see `sql_create_params()`.
    """
    # @YOUSE: Implement this procedure. One possible outline:
    #   1. Create the initial parameters table via sql_create_params().
    #   2. On each of the MAX_STEP iterations, combine the building
    #      blocks above (e.g., sql_mat_dot_vec, sql_mat_trans_dot_vec,
    #      sql_axpy, sql_get_vec_dot_self, sql_scale_inplace,
    #      sql_vec_update) to evaluate the gradients from Problem 4
    #      and update the entries of the parameters table.
    pass

The following two cells run your procedure and print its results. How does it compare to the pure Python implementation from Problem 4 (above)?


In [ ]:
linreg_mle_gd_sql ('X', 'Y', 'Theta', exams_db, MAX_STEP=1000)
pd.read_sql_query ('SELECT * FROM Theta', exams_db)

In [ ]:
exams_db.close () # Clean-up

Part e) You probably found the SQL-based implementation to be much more tedious than the Python version. Explain why and under what circumstances one might nevertheless want an SQL-based implementation.

(Enter your response here)