In [1]:
import numpy as np
import scipy as sp
import scipy.optimize
We have a dataset that relates the final university grade of students to their high-school grades and their SAT scores [1]. The dataset is loaded below and contains the following columns: 1. high school grade point average, 2. Math SAT score, 3. Verbal SAT score, 4. Computer science grade point average, 5. Overall university grade point average.
Compute a linear regression to predict the overall university grade point average from the remaining variables. Hint: the standard linear regression model reads:
$ y = X\beta $
and the coefficients $\beta$ can be computed using the following formula:
$ \beta = (X^TX)^{-1}X^Ty $
Compute the predicted values for the overall university grade ($y$) and the residuals ($y_i - \mathrm{data}_i$, i.e. the predicted value minus the observed value for student $i$). Compute the mean residual.
[1] Source: http://onlinestatbook.com/2/case_studies/sat.html
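The formula in the hint can be sketched on made-up data before applying it to the real dataset. The snippet below is only an illustration under stated assumptions: the numbers are synthetic (generated with a fixed seed), the helper name `fit_linear_regression` is hypothetical, and a column of ones is prepended to $X$ so that the model includes an intercept, which the exercise text does not explicitly ask for.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Return beta solving the normal equations (X^T X) beta = X^T y."""
    # Solving the linear system is equivalent to applying (X^T X)^{-1} X^T y,
    # but avoids forming the explicit inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)


# Synthetic stand-in for the real dataset: 50 samples, 4 predictors.
# These values are invented for illustration only.
rng = np.random.default_rng(0)
n_samples = 50
predictors = rng.normal(size=(n_samples, 4))
assumed_beta = np.array([0.5, -1.0, 2.0, 0.25])
observed = 1.5 + predictors @ assumed_beta + 0.1 * rng.normal(size=n_samples)

# Assumption: add a column of ones so the model has an intercept term.
X = np.column_stack([np.ones(n_samples), predictors])

beta = fit_linear_regression(X, observed)
predicted = X @ beta                # predicted values
residuals = predicted - observed    # predicted minus observed
mean_residual = residuals.mean()
```

When $X$ contains a column of ones, the least-squares residuals sum to zero, so `mean_residual` is zero up to floating-point error. Without the intercept column this is not guaranteed.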
In [6]:
import pickle
import pylab

# Load the dataset; the with-block closes the file after reading.
with open('data.p', 'rb') as f:
    data = pickle.load(f)
In [ ]: