We've seen the following types of regression:
We've seen the following changes we can make to a problem to make regression possible:
Finally, we've seen the following modifications due to measurement error:
We saw a sketch of a derivation for the error equations in lecture 2.
Now we'll look at some examples and choose which of the above approaches to take.
In [30]:
plt.plot(pop_x, pop_y, 'o')
plt.xlabel('Year')
plt.ylabel('Population [Millions]')
plt.show()
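Population growth like this is roughly exponential, so one change that makes ordinary regression possible is to take the logarithm of the response and fit a line. Here is a minimal sketch using synthetic data of the same form as above (the true parameters 275.9 and 0.005 are from the data-generation cell; the fit should recover values near them):

```python
import numpy as np
import scipy.stats as ss

# Synthetic population data, same form as the cell below
np.random.seed(0)
pop_x = np.arange(1998, 2016)
pop_y = 275.9 * np.exp((pop_x - 1998) * 0.005) + ss.norm.rvs(size=len(pop_x)) * 0.2

# Model: y = A * exp(k * t), so ln(y) = ln(A) + k * t is linear in t
t = pop_x - 1998
k, ln_A = np.polyfit(t, np.log(pop_y), 1)

print('growth rate k =', k)   # should be near 0.005
print('A =', np.exp(ln_A))    # should be near 275.9
```

Note that taking the log also transforms the noise, so this is an approximation; it works well here because the noise is small relative to the population values.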
This is a two-dimensional problem: temperature and a yes/no indicator for graphene fibers. It can be answered using OLS-ND. One more issue, though, is that we need an intercept: we would not expect selectivity to be 0 at absolute zero. Therefore our x-matrix will have the following columns: $[1, T, \delta_{g}]$, where $\delta_g$ indicates whether graphene fibers are used.
Here's an example of building this with real data. Let's say we did 5 experiments with and 5 experiments without graphene at a range of temperatures.
In [40]:
# 1 for the 5 experiments with graphene, 0 for the 5 without
graphene_used = np.concatenate( (np.ones(5), np.zeros(5)) )
# each set of experiments spans the same temperatures
temperature = np.concatenate( (T, T) )
# column of ones for the intercept term
intercept = np.ones(10)
# columns are [1, T, delta_g]
x_mat = np.column_stack( (intercept, temperature, graphene_used) )
print(x_mat)
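With the x-matrix built, the OLS-ND coefficients come from solving the least-squares problem. Here is a sketch using `np.linalg.lstsq` with made-up response values (the coefficients 2.0, 0.01, and 0.5 are invented for illustration, not from any real experiment):

```python
import numpy as np

# Rebuild the x-matrix from above
T = np.arange(280, 280 + 5 * 5, 5)
graphene_used = np.concatenate((np.ones(5), np.zeros(5)))
temperature = np.concatenate((T, T))
x_mat = np.column_stack((np.ones(10), temperature, graphene_used))

# Made-up noiseless response: selectivity = 2.0 + 0.01 * T + 0.5 * delta_g
y = x_mat @ np.array([2.0, 0.01, 0.5])

# Solve x_mat @ beta = y in the least-squares sense
beta, *_ = np.linalg.lstsq(x_mat, y, rcond=None)
print(beta)
```

The returned `beta` holds the intercept, temperature slope, and graphene effect, in the same order as the columns of the x-matrix.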
In [3]:
#Ignore these
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import scipy.stats as ss
In [2]:
# synthetic population data: exponential growth plus small noise
pop_x = np.arange(1998, 2016)
pop_y = 275.9 * np.exp((pop_x - 1998) * 0.005) + ss.norm.rvs(size=len(pop_x)) * 0.2
# five temperatures from 280 K in steps of 5 K
T = np.arange(280, 280 + 5 * 5, 5)