In this homework you are going to implement and test linear model fitting functions and data quality checking functions. You will need to install (at least) Python, Jupyter, matplotlib, and numpy to do this assignment. Installing Anaconda is a quick way to get all of those things.
There are 4 problems worth a total of 32 points. The description for each problem will tell you how many points each part is worth.
You should not need to edit the boilerplate code in this notebook, but wherever you see ## YOUR CODE HERE ##, you should replace that with your own code (obviously).
To turn in your homework, email your finished .ipynb file to huth@cs.utexas.edu. This homework is due on 10/17.
In [ ]:
# Dependencies
%config InlineBackend.figure_format = 'retina'
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
In [ ]:
# Make testing data
def q_make_data(signal_size, n_repeats, n_timepoints):
    # shared signal (scaled to have variance signal_size) plus unit-variance noise per repeat
    signal = np.random.randn(n_timepoints)
    data = np.random.randn(n_repeats, n_timepoints) + (signal_size ** 0.5 * signal)
    return data
q_data = q_make_data(signal_size=0.5, n_repeats=50, n_timepoints=300)
# The scenario: you've done 50 repeats of the same 300-second experiment,
# while measuring the output of 1 neuron.
# q_data is a 50 x 300 matrix with the output at each second in each of the 50 repeats.
# (Since the shared signal has variance signal_size and the noise has variance 1,
# the true SNR -- signal variance over noise variance -- is signal_size = 0.5 here.)
print(q_data.shape)
In [ ]:
def snr_func(data):
## YOUR CODE HERE ##
print('Estimated SNR:', snr_func(q_data))
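If you're unsure where to start, here is a minimal sketch of one possible estimator (the name snr_sketch is just illustrative). It assumes SNR is defined as the variance of the repeat-averaged response (the signal estimate) divided by the variance of the per-repeat residuals (the noise estimate); if your class uses a different definition, adapt accordingly.
# A sketch of one possible SNR estimator (not necessarily the definition from class):
# treat the mean across repeats as signal and the per-repeat residuals as noise.
def snr_sketch(data):
    signal_est = data.mean(axis=0)   # average across repeats -> signal estimate
    noise = data - signal_est        # what's left over -> noise estimate
    return signal_est.var() / noise.var()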
In [ ]:
def ev_func(data):
## YOUR CODE HERE ##
print('Estimated EV:', ev_func(q_data))
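Similarly, a minimal sketch of one possible EV estimator (ev_sketch is illustrative), assuming EV is defined as the fraction of a single repeat's variance that is explainable by the shared signal:
# A sketch of one possible EV estimator (check it against the definition from lecture):
def ev_sketch(data):
    signal_est = data.mean(axis=0)          # signal estimate
    noise_var = (data - signal_est).var()   # residual (noise) variance
    return 1 - noise_var / data.var()       # fraction of total variance that is signal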
In [ ]:
n_tests = 150
n_repeats = np.arange(5, 45, 5)
snr_estimates = np.array([[snr_func(q_make_data(signal_size=0.5, n_repeats=r, n_timepoints=300))
for _ in range(n_tests)]
for r in n_repeats])
plt.figure(figsize=(12,4))
plt.subplot(1,2,1)
plt.errorbar(n_repeats, snr_estimates.mean(1), yerr=snr_estimates.std(1))
plt.xlabel('Number of repeats')
plt.ylabel('Estimated SNR')
plt.grid()
plt.subplot(1,2,2)
plt.plot(n_repeats, snr_estimates.std(1), 'o-')
plt.xlabel('Number of repeats')
plt.ylabel('Standard deviation of estimated SNR')
plt.grid();
In [ ]:
ev_estimates = np.array([[ev_func(q_make_data(signal_size=0.5, n_repeats=r, n_timepoints=300))
for _ in range(n_tests)]
for r in n_repeats])
plt.figure(figsize=(12,4))
plt.subplot(1,2,1)
plt.errorbar(n_repeats, ev_estimates.mean(1), yerr=ev_estimates.std(1))
plt.xlabel('Number of repeats')
plt.ylabel('Estimated EV')
plt.grid();
plt.subplot(1,2,2)
plt.plot(n_repeats, ev_estimates.std(1), 'o-')
plt.xlabel('Number of repeats')
plt.ylabel('Standard deviation of estimated EV')
plt.grid();
Please answer each question below by modifying this cell.
Answer here.
Answer here.
Answer here.
In [ ]:
n_tests = 150
n_timepoints = np.arange(100, 500, 50)
snr_estimates = np.array([[snr_func(q_make_data(signal_size=0.3, n_repeats=50, n_timepoints=t))
for _ in range(n_tests)]
for t in n_timepoints])
plt.figure(figsize=(12,4))
plt.subplot(1,2,1)
plt.errorbar(n_timepoints, snr_estimates.mean(1), yerr=snr_estimates.std(1))
plt.xlabel('Number of timepoints')
plt.ylabel('Estimated SNR')
plt.grid()
plt.subplot(1,2,2)
plt.plot(n_timepoints, snr_estimates.std(1), 'o-')
plt.xlabel('Number of timepoints')
plt.ylabel('Standard deviation of estimated SNR')
plt.grid();
In [ ]:
ev_estimates = np.array([[ev_func(q_make_data(signal_size=0.3, n_repeats=50, n_timepoints=t))
for _ in range(n_tests)]
for t in n_timepoints])
plt.figure(figsize=(12,4))
plt.subplot(1,2,1)
plt.errorbar(n_timepoints, ev_estimates.mean(1), yerr=ev_estimates.std(1))
plt.xlabel('Number of timepoints')
plt.ylabel('Estimated EV')
plt.grid();
plt.subplot(1,2,2)
plt.plot(n_timepoints, ev_estimates.std(1), 'o-')
plt.xlabel('Number of timepoints')
plt.ylabel('Standard deviation of estimated EV')
plt.grid();
Suppose you have enough time to collect 15,000 total datapoints, but you can choose whether you want to collect a lot of repetitions of a short experiment, or a few repetitions of a long experiment. Let's also suppose that the shortest experiment you can do would be 10 timepoints long. How would you set n_repeats and n_timepoints to get the best estimate of SNR or EV? And how would you define "best"?
In [ ]:
# Insert code for answering the bonus question here, if you want!
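If you do want to attack the bonus empirically, one possible approach (a sketch, assuming you have implemented snr_func above) is to sweep (n_repeats, n_timepoints) pairs with a fixed product of 15,000 and compare the spread of the resulting estimates:
# Sketch: fixed budget of 15,000 datapoints, different repeat/timepoint splits.
budget = 15000
for r in [10, 30, 50, 150, 500, 1500]:
    t = budget // r
    ests = [snr_func(q_make_data(signal_size=0.5, n_repeats=r, n_timepoints=t))
            for _ in range(50)]
    print(r, t, np.mean(ests), np.std(ests))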
In [ ]:
# Make testing data
def gd_make_data(nsamp=100, noise=0):
# Generate a two-dimensional stimulus (e.g. two pixels) with correlated dimensions and nsamp samples (e.g. points in time)
# First pixel data
x1 = np.random.randn(nsamp)
# Second pixel that is correlated with the first
x2 = .4 * x1 + .6 * np.random.randn(nsamp)
# Stack into a stimulus matrix - here rows are dimensions and columns are time points.
x = np.vstack([x1, x2])
## Generate weights and the corresponding one dimensional response
# Set weights on each channel
b = np.array([1, 7])
# Make response of system - this is the output of our toy neuron
y = np.dot(x.T, b) + np.random.randn(nsamp) * noise
return x, y
x, y = gd_make_data()
# Plot timeseries
plt.plot(x[0])
plt.plot(x[1])
plt.plot(y);
In [ ]:
# We are going to pretend we don't know b and search for b values by setting up
# a grid of potential values for b1 and b2
b1, b2 = np.meshgrid(np.arange(-1, 10, .2), np.arange(-1, 10, .2))
bs = np.vstack([b1.ravel(), b2.ravel()])
# get responses from each set of weights
ys = np.dot(x.T, bs)
# calculate error between the response, y, and each of the possible responses, ys.
errfun = np.sum((y[:,None] - ys) ** 2, 0)
# reshape for plotting
errfun = errfun.reshape(b1.shape)
In [ ]:
## plot contour of error surface. Note that the surface is angled
# because the two variables are correlated.
plt.figure(figsize=(5,5))
plt.contour(b1, b2, errfun, 50, colors='k', alpha=0.3);
plt.axis('equal');
One way to solve this problem is using gradient descent. Take the derivative of the error function, and then take small steps down the error surface (i.e. subtract the gradient multiplied by a small step size, eps, from your estimate of beta).
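Concretely, if we stack the features as the $2 \times T$ matrix $X$ (the variable x) and write the squared error as $E(\beta) = \|y - X^\top \beta\|^2$, then the gradient is $\nabla_\beta E = -2 X (y - X^\top \beta)$, and each step is $\beta \leftarrow \beta - \epsilon \nabla_\beta E$.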
For this problem, the features are in the variable x, and the response is in the variable y.
Write code that uses gradient descent to solve this problem. Store the estimated betas after each gradient step, and then plot the path that the gradient descent took (i.e. the values of all the betas along the way) on top of the error contours.
In [ ]:
# Gradient descent!
steps = 100 # how many steps to take
eps = 0.001 # the size of each step
b_est = np.array([0.0, 0.0]) # store your current estimate of beta here
b_est_history = np.zeros([steps+1, 2]) # assume b_est_history[0] = result before you started
for ii in range(steps):
## YOUR CODE HERE ##
## plot contour of error surface and your regression path
plt.figure(figsize=(5,5))
plt.contour(b1, b2, errfun, 50, colors='k', alpha=0.3)
## YOUR CODE HERE ##
plt.axis('equal');
Remember that an alternative to gradient descent is coordinate descent, where you only change the one weight (out of the two betas in this problem) that had the largest derivative on each step.
Write code that uses coordinate descent to solve this problem. Again, store the estimated betas after each step, and then plot the results on top of the error contours.
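A minimal sketch of what one coordinate descent step might look like, using the gradient written out above (b_est and eps as defined in the cell below):
# Sketch of one coordinate descent step: move only the coordinate
# whose partial derivative has the largest magnitude.
grad = -2 * x.dot(y - x.T.dot(b_est))   # gradient of the squared error
j = np.argmax(np.abs(grad))             # coordinate with the steepest slope
b_est[j] -= eps * grad[j]               # update only that coordinate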
In [ ]:
# Coordinate descent!
steps = 100 # how many steps to take
eps = 0.001 # the size of each step
b_est = np.array([0.0, 0.0])
b_est_history = np.zeros([steps+1, 2])
for ii in range(steps):
## YOUR CODE HERE ##
## plot contour of error surface and your regression path
plt.figure(figsize=(5,5))
plt.contour(b1, b2, errfun, 50, colors='k', alpha=0.3)
## YOUR CODE HERE ##
plt.axis('equal');
One way to regularize your regression solution is to stop doing gradient descent before you make it all the way to the bottom. This is done by checking the error after each step on a separate dataset (here called the validation set). This can be useful when the data is noisy (unlike the situation above, which was noiseless).
For this problem the training features and responses (that you should use to update the weights) are in the variables trnx and trny. The validation features and responses (that you should use to test your model along the way) are in the variables valx and valy.
Implement code to do gradient descent here, but now compute and store the mean squared error on both the validation and training datasets after each step. (You don't need to actually stop early, do the same number of steps as before.)
Then plot the training and validation error as a function of step number. Explain the plot.
Then plot the path that the gradient descent took on top of the error contours. Note that the contours are for the noiseless case, while the data you're using here is noisy.
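A minimal sketch of the per-step bookkeeping inside the loop, assuming trnx and valx have the same features-by-timepoints layout as x:
# Sketch of one step: gradient update on training data, then record both errors.
grad = -2 * trnx.dot(trny - trnx.T.dot(b_est))
b_est = b_est - eps * grad
b_est_history[ii + 1] = b_est
trn_err_history[ii] = np.mean((trny - trnx.T.dot(b_est)) ** 2)
val_err_history[ii] = np.mean((valy - valx.T.dot(b_est)) ** 2)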
In [ ]:
heldout_data = np.load('gd-heldout.npz')
trnx = heldout_data['trnx']
trny = heldout_data['trny']
valx = heldout_data['valx']
valy = heldout_data['valy']
In [ ]:
# Gradient descent!
steps = 100
eps = 0.001
b_est = np.array([0.0, 0.0])
b_est_history = np.zeros([steps+1, 2])
trn_err_history = np.zeros([steps])
val_err_history = np.zeros([steps])
for ii in range(steps):
## YOUR CODE HERE ##
## plot the training and validation error as a function of step number
plt.figure()
## YOUR CODE HERE ##
plt.legend();
print('Best step in held-out set:', val_err_history.argmin(), 'Weights:', b_est_history[val_err_history.argmin()])
print('Where gradient descent ended up:', b_est_history[-1])
## plot the betas along the way
plt.figure(figsize=(5,5))
plt.contour(b1, b2, errfun, 50, colors='k', alpha=0.3)
## YOUR CODE HERE ##
plt.axis('equal');
Explain what's going on in the error plots here.
In [ ]:
# Coordinate descent!
steps = 100
eps = 0.001
b_est = np.array([0.0, 0.0])
b_est_history = np.zeros([steps+1, 2])
trn_err_history = np.zeros([steps])
val_err_history = np.zeros([steps])
for ii in range(steps):
## YOUR CODE HERE ##
## plot the training and validation error as a function of step number
plt.figure()
## YOUR CODE HERE ##
plt.legend();
print('Best step in held-out set:', val_err_history.argmin())
print(b_est_history[val_err_history.argmin()])
print(b_est_history[-1])
## plot the beta path
plt.figure(figsize=(5,5))
plt.contour(b1, b2, errfun, 50, colors='k', alpha=0.3)
## YOUR CODE HERE ##
plt.axis('equal');
Now that you've implemented gradient descent and coordinate descent, let's do the easy versions! For this problem you'll be finding analytic solutions to the ordinary least squares (OLS) problem and the ridge problem.
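For reference, with the features stacked as the $2 \times T$ matrix $X$ (rows are features, as in the variable x from problem 2), the standard closed forms are $\hat\beta_{\text{OLS}} = (X X^\top)^{-1} X y$ and $\hat\beta_{\text{ridge}} = (X X^\top + \lambda I)^{-1} X y$.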
In [ ]:
beta_ols = ## YOUR CODE HERE ##
plt.figure(figsize=(5,5))
plt.contour(b1, b2, errfun, 50, colors='k', alpha=0.3)
## YOUR CODE HERE ##
plt.axis('equal');
Solve the regression problem using features x and responses y using ridge. Test the different ridge regularization coefficients (lambdas) given here. For each lambda, store the betas that you find. Then plot the resulting beta path on top of an error contour plot.
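A minimal sketch of the loop body, using the ridge closed form above (np.linalg.solve is numerically preferable to explicitly inverting the matrix):
# Sketch: solve (X X^T + lambda I) beta = X y for each lambda.
for ii, lam in enumerate(lambdas):
    betas_ridge[ii] = np.linalg.solve(x.dot(x.T) + lam * np.eye(2), x.dot(y))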
In [ ]:
lambdas = np.logspace(0, 4, 10)
betas_ridge = np.zeros((len(lambdas), 2))
for ii in range(len(lambdas)):
## YOUR CODE HERE ##
## Plot the ridge solutions on the error contours
plt.figure(figsize=(5,5))
plt.contour(b1, b2, errfun, 50, colors='k', alpha=0.3)
## YOUR CODE HERE ##
plt.axis('equal');
In [ ]:
# Generate high-dimensional data
n_features = 400 # the number of features
n_timepoints = 600 # the number of timepoints
n_training = 450 # the number of timepoints that we'll use for training
noise_level = 5.0 # how much noise to add
# generate the "true" betas, the ones that will be used to generate the data
beta_true = np.random.randn(n_features)
# generate the feature matrix, x
# this uses a trick to make the different features in x be pretty correlated
u,s,vh = np.linalg.svd(np.random.randn(n_timepoints, n_features), full_matrices=False)
x_all = (u*(s**5)).dot(vh)
x_all /= x_all.max()
# generate the responses, y = x . beta + noise
y_all = x_all.dot(beta_true) + np.random.randn(n_timepoints) * noise_level
# split x and y into training part (first n_training timepoints) ..
x = x_all[:n_training]
y = y_all[:n_training]
# .. and validation part (remaining timepoints)
x_val = x_all[n_training:]
y_val = y_all[n_training:]
# plot y, let's see what it looks like
plt.plot(y_all);
In [ ]:
def mean_squared_error(z, z_hat):
return ## YOUR CODE HERE ##
In [ ]:
best_trn_mse = ## YOUR CODE HERE ##
best_val_mse = ## YOUR CODE HERE ##
print('Best possible MSE on training set:')
print(best_trn_mse)
print('Best possible MSE on validation set:')
print(best_val_mse)
In [ ]:
betazero_trn_mse = ## YOUR CODE HERE ##
betazero_val_mse = ## YOUR CODE HERE ##
print('MSE on training set with beta=0:')
print(betazero_trn_mse)
print('MSE on validation set with beta=0:')
print(betazero_val_mse)
(Note that the feature matrix x is transposed here as compared to problem 3.)
If you do this right, the training error should be way better than the minimum possible. That is truly incredible. Unbelievable. Literally. And the validation error is much worse than what you would get from just guessing that $\beta=0$. Explain why below.
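Since x here is timepoints by features, the OLS closed form becomes $\hat\beta = (X^\top X)^{-1} X^\top y$; a least-squares solver is a numerically safer sketch than an explicit inverse (beta_ols_sketch is just an illustrative name):
# Sketch: OLS for the timepoints-by-features orientation used in this problem.
beta_ols_sketch = np.linalg.lstsq(x, y, rcond=None)[0]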
In [ ]:
beta_ols = ## YOUR CODE HERE ##
y_hat = ## YOUR CODE HERE ##
y_val_hat = ## YOUR CODE HERE ##
print('Training MSE:', mean_squared_error(y, y_hat))
print('Validation MSE:', mean_squared_error(y_val, y_val_hat))
Explain what the heck is going on here.
(Note again that x is transposed relative to problem 3.)
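A minimal sketch of one iteration of the loop below, assuming the same timepoints-by-features orientation and your mean_squared_error from above:
# Sketch of one iteration: ridge solve plus MSE bookkeeping.
lam = lambdas[ii]
b = np.linalg.solve(x.T.dot(x) + lam * np.eye(n_features), x.T.dot(y))
betas_ridge[ii] = b
trn_mse[ii] = mean_squared_error(y, x.dot(b))
val_mse[ii] = mean_squared_error(y_val, x_val.dot(b))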
In [ ]:
lambdas = np.logspace(-3, 5, 10) # let's check 10 lambdas between 10^-3 and 10^5. play with this range if you like
betas_ridge = np.zeros((len(lambdas), n_features))
trn_mse = np.zeros(len(lambdas))
val_mse = np.zeros(len(lambdas))
for ii in range(len(lambdas)):
## YOUR CODE HERE ##
In [ ]:
## YOUR CODE HERE ##
Explanation here.
In [ ]:
## YOUR CODE HERE ##
Explanation here.