Linear Regression



In [1]:

    
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns



In [2]:

    
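# 100 feature values drawn uniformly from [0, 1)
X = np.random.rand(100)
# Target: y equals X plus Gaussian noise with standard deviation 0.1
y = X + 0.1 * np.random.randn(100)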



In [3]:

    
plt.scatter(X, y);
plt.show()

This notebook follows the steps prescribed by Jake VanderPlas in his excellent book, Python Data Science Handbook. He has also kindly made all of his code available on GitHub.

Step 1. Choose a class of model.

In this case, we are using linear regression.



In [4]:

    
from sklearn.linear_model import LinearRegression

Step 2. Choose model hyperparameters.



In [5]:

    
model = LinearRegression(fit_intercept=True)

Step 3. Arrange data into a features matrix and target vector.



In [6]:

    
X = X.reshape(-1, 1)



In [7]:

    
X.shape









    Out[7]:





(100, 1)
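scikit-learn estimators expect the features as a two-dimensional array of shape (n_samples, n_features), which is why the one-dimensional X is reshaped above. The sketch below shows a few equivalent ways to get that shape with NumPy. The seed and the variable names (x_1d, a, b, c) are assumptions made only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily for this sketch
x_1d = rng.random(100)           # plain 1-D array, shape (100,)

# Three equivalent ways to build a (n_samples, n_features) column matrix
a = x_1d.reshape(-1, 1)          # -1 lets NumPy infer the number of rows
b = x_1d[:, np.newaxis]          # add a new trailing axis
c = np.atleast_2d(x_1d).T        # make it (1, 100), then transpose

print(x_1d.shape, a.shape, b.shape, c.shape)
```

The target vector y can stay one-dimensional, with one entry per row of the features matrix.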

Step 4. Fit the model to your data.



In [8]:

    
model.fit(X, y)









    Out[8]:





LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)



In [9]:

    
model.coef_









    Out[9]:





array([ 0.97408915])



In [10]:

    
model.intercept_









    Out[10]:





0.022535905418693603
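The data were generated with a true slope of 1 and a true intercept of 0, so the fitted values should land close to those numbers. As a sanity check, the fit can also be compared against an independent least-squares routine. The sketch below uses np.polyfit for that purpose. It regenerates its own data with an arbitrary seed, so the numbers it prints will differ from the outputs shown above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)            # arbitrary seed, for repeatability only
X = rng.random(100)
y = X + 0.1 * rng.standard_normal(100)

# Fit with scikit-learn (needs a 2-D features matrix)
model = LinearRegression(fit_intercept=True).fit(X.reshape(-1, 1), y)

# Fit a degree-1 polynomial with NumPy; coefficients come highest degree first
slope, intercept = np.polyfit(X, y, deg=1)

print("slope:    ", model.coef_[0], slope)
print("intercept:", model.intercept_, intercept)
```

Both routines solve the same ordinary least-squares problem, so the two slopes and the two intercepts should agree up to floating-point error.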

If you are statistically trained, you would normally dig into further diagnostics, such as checking the residuals for normality and autocorrelation. You may also want to evaluate the estimated parameters. These are valid statistical modelling questions.
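For readers who want those checks anyway, here is a minimal sketch of two residual diagnostics. It assumes SciPy is installed and uses scipy.stats.shapiro for a normality test. The Durbin-Watson statistic is computed by hand from its definition. The data are regenerated with an arbitrary seed, and the variable names are illustrative only.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)             # arbitrary seed for this sketch
X = rng.random(100).reshape(-1, 1)
y = X.ravel() + 0.1 * rng.standard_normal(100)

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)           # observed minus fitted values

# Shapiro-Wilk test: a small p-value suggests the residuals are not normal
w_stat, p_value = stats.shapiro(residuals)

# Durbin-Watson statistic: values near 2 suggest no first-order autocorrelation
dw = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

print("Shapiro-Wilk W:", w_stat, "p-value:", p_value)
print("Durbin-Watson:", dw)
```

Note that the Durbin-Watson statistic is only meaningful when the observations have a natural order, such as a time series. The rows here are in random order, so it is shown purely to illustrate the calculation.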

Machine learning focuses on prediction. You will not find this information in the scikit-learn package. Do take note of this key difference between statistics and machine learning.
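If parameter estimates with standard errors are needed, they can be worked out from the fitted model with a little linear algebra. The sketch below applies the textbook ordinary least-squares formula, in which the covariance of the coefficients is the residual variance times the inverse of A^T A, with A being the design matrix including a column of ones. The data, seed, and variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)             # arbitrary seed for this sketch
x = rng.random(100)
y = x + 0.1 * rng.standard_normal(100)

model = LinearRegression().fit(x.reshape(-1, 1), y)
residuals = y - model.predict(x.reshape(-1, 1))

# Design matrix: a column of ones for the intercept, then the feature
A = np.column_stack([np.ones_like(x), x])
n, p = A.shape

# Unbiased estimate of the noise variance, using n - p degrees of freedom
sigma2 = residuals @ residuals / (n - p)

# Covariance matrix of the estimates; its diagonal holds the squared standard errors
cov = sigma2 * np.linalg.inv(A.T @ A)
se_intercept, se_slope = np.sqrt(np.diag(cov))

# t-statistic for the slope under the null hypothesis that the slope is zero
t_slope = model.coef_[0] / se_slope

print("slope:", model.coef_[0], "+/-", se_slope, "t =", t_slope)
print("intercept:", model.intercept_, "+/-", se_intercept)
```

A dedicated statistics package would report the same quantities in a summary table. The manual version is shown here only to make the calculation explicit.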

Step 5. Predict labels for unknown data.



In [11]:

    
x_test = np.linspace(0, 1)  # 50 evenly spaced points (the default num) from 0 to 1
x_test









    Out[11]:





array([ 0.        ,  0.02040816,  0.04081633,  0.06122449,  0.08163265,
        0.10204082,  0.12244898,  0.14285714,  0.16326531,  0.18367347,
        0.20408163,  0.2244898 ,  0.24489796,  0.26530612,  0.28571429,
        0.30612245,  0.32653061,  0.34693878,  0.36734694,  0.3877551 ,
        0.40816327,  0.42857143,  0.44897959,  0.46938776,  0.48979592,
        0.51020408,  0.53061224,  0.55102041,  0.57142857,  0.59183673,
        0.6122449 ,  0.63265306,  0.65306122,  0.67346939,  0.69387755,
        0.71428571,  0.73469388,  0.75510204,  0.7755102 ,  0.79591837,
        0.81632653,  0.83673469,  0.85714286,  0.87755102,  0.89795918,
        0.91836735,  0.93877551,  0.95918367,  0.97959184,  1.        ])



In [12]:

    
y_pred = model.predict(x_test.reshape(-1,1))



In [13]:

    
plt.scatter(X, y)
plt.plot(x_test, y_pred);
plt.show()
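Since the emphasis here is on prediction, a natural follow-up is to measure how well the model predicts data it has not seen. The sketch below is one common way to do that, not part of the five steps above. It holds out a quarter of the data with train_test_split and scores the predictions with r2_score and mean_squared_error. The split size, the seeds, and the variable names are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)             # arbitrary seed for this sketch
X = rng.random(100).reshape(-1, 1)
y = X.ravel() + 0.1 * rng.standard_normal(100)

# Hold out 25% of the rows; the model never sees them during fitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# R^2 near 1 and a small mean squared error indicate good predictions
r2 = r2_score(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)

print("R^2 on held-out data:", r2)
print("MSE on held-out data:", mse)
```

Because the noise has a standard deviation of 0.1, the mean squared error on held-out data should be in the neighbourhood of 0.01.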