Linear Regression - Part 2

In this tutorial, we shall see where linear regression reaches its limitations.

Imports


In [ ]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model

%matplotlib inline

Data


In [ ]:
n_samples = 30

# True underlying function: a non-linear cosine curve
true_fun = lambda X: np.cos(1.5 * np.pi * X)
X = np.sort(np.random.rand(n_samples)).reshape(-1, 1)
# Per-sample Gaussian noise; randn(n_samples, 1) matches the shape of X,
# whereas a bare randn() would add the same single offset to every sample
Y = true_fun(X) + np.random.randn(n_samples, 1) * 0.1
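
A quick shape check confirms that each sample in X has a matching noisy response in Y:

In [ ]:
X.shape, Y.shape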

In [ ]:
X[:5]

In [ ]:
Y[:5]

Modelling


In [ ]:
regr = linear_model.LinearRegression()
regr.fit(X, Y)
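
Since the model is a straight line, its learned slope and intercept can be inspected directly via the standard LinearRegression attributes:

In [ ]:
# Slope (coefficient) and intercept of the fitted line
regr.coef_, regr.intercept_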

Evaluation


In [ ]:
from sklearn.metrics import mean_squared_error
mean_squared_error(Y, regr.predict(X))
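
To put this error in context, here is a minimal sketch comparing against a trivial baseline that always predicts the mean of Y, using sklearn's DummyRegressor:

In [ ]:
from sklearn.dummy import DummyRegressor

# A constant-mean predictor; a useful model should beat this MSE
baseline = DummyRegressor(strategy='mean').fit(X, Y)
mean_squared_error(Y, baseline.predict(X))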

Visualisation


In [ ]:
pred_Y = regr.predict(X)

plt.figure(figsize=(15, 5))
plt.scatter(X, Y, color='b', alpha=0.4, label='Actual data')
plt.plot(X, pred_Y, color='r', alpha=0.4, label='Predicted line')
plt.legend()
plt.xlabel('X - Input values')
plt.ylabel('Y - Response values')

Notes:

  • Linear models are great, but they have their limitations.
    • Example: as above, they cannot describe non-linear complexity well.
  • Polynomial models generally do a great job at fitting complex relationships, but they too have their limits (see the sketch below).
    • Fitting can take more time.
    • The number of features explodes as the number of input dimensions increases.
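
As a minimal sketch of the polynomial alternative, the same data can be fitted with sklearn's PolynomialFeatures combined with LinearRegression; the choice of degree=4 here is an assumption for illustration:

In [ ]:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# Expand X into powers up to degree 4 (an assumed degree for illustration),
# then fit an ordinary linear model on the expanded features
poly_regr = make_pipeline(PolynomialFeatures(degree=4), linear_model.LinearRegression())
poly_regr.fit(X, Y)
mean_squared_error(Y, poly_regr.predict(X))

The feature explosion noted above can also be seen directly: for 10 input dimensions, a degree-3 expansion already produces 286 features:

In [ ]:
# The number of polynomial features grows combinatorially with input
# dimension: for 10 inputs and degree 3 there are C(13, 3) = 286 columns
PolynomialFeatures(degree=3).fit_transform(np.zeros((1, 10))).shape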