Linear regression is perhaps the most straightforward supervised learning technique for regression problems. This notebook walks through the basic algorithm behind the model and builds this learning tool from scratch in Python.
1) First, we define the notation:
$m$ - Number of training examples
$n$ - Number of features
$X$ - Features
$y$ - Target
$\theta$ - Parameters
$h_\theta(x)$ - Hypothesis
2) The cost function (with L2 regularization term) is:
$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$$
where $h_\theta(x) = \theta^T x$ and the intercept term $\theta_0$ is excluded from the regularization penalty.
3) The gradient is:
$$\frac{\partial J}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$$
$$\frac{\partial J}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} \theta_j \quad \text{for } j = 1, \dots, n$$
We want to minimize the cost function with respect to $\theta$. Once we know the cost function and the gradient, we can apply gradient descent or other more advanced optimization algorithms to learn the parameter values, as sketched below.
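For intuition, here is a bare-bones batch gradient descent loop implementing the update $\theta := \theta - \alpha \nabla J(\theta)$ for the unregularized cost. The learning rate alpha, the iteration count, and the function name are illustrative choices; the class implemented next delegates this minimization to scipy's BFGS routine instead.

import numpy as np

def gradient_descent(X, y, alpha=0.01, n_iters=1000):
    # Minimal sketch: X is assumed to already contain a leading column of ones
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        # gradient of the unregularized squared-error cost
        grad = X.T.dot(X.dot(theta) - y) / m
        # step against the gradient
        theta = theta - alpha * grad
    return theta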
In [1]:
# Linear Regression
import numpy as np
import scipy.optimize


class LinearRegression():

    def __init__(self, lamda):
        self._lamda = lamda
        self._mu = None
        self._sigma = None
        self._coef = None

    def _feature_norm(self, X):
        # Normalize all features to expedite the gradient descent process
        mu = np.mean(X, axis=0)
        sigma = np.std(X, axis=0)
        X_norm = (X - mu) / sigma
        return X_norm, mu, sigma

    def _cost_calc(self, theta, X, y):
        # Regularized squared-error cost function
        m, n = X.shape
        y = y.reshape((m, 1))
        theta = theta.reshape((n, 1))
        J = 1.0 / (2 * m) * np.sum((X.dot(theta) - y) ** 2) \
            + self._lamda / (2.0 * m) * np.sum(theta[1:] ** 2)
        return J

    def _gradient_calc(self, theta, X, y):
        # Gradient of the cost function; the intercept term is not regularized
        m, n = X.shape
        y = y.reshape((m, 1))
        theta = theta.reshape((n, 1))
        grad = np.zeros((n, 1))
        grad[0] = 1.0 / m * np.sum(X.dot(theta) - y)
        grad[1:] = 1.0 / m * X[:, 1:].T.dot(X.dot(theta) - y) \
            + float(self._lamda) / m * theta[1:]
        return grad.ravel()

    def fit(self, X, y):
        # Fit the model: normalize features, add the intercept column,
        # then minimize the cost with BFGS using the analytic gradient
        m, n = X.shape
        X, self._mu, self._sigma = self._feature_norm(X)
        X = np.c_[np.ones((m, 1)), X]
        theta = np.zeros(X.shape[1])
        result = scipy.optimize.minimize(fun=self._cost_calc, x0=theta, args=(X, y),
                                         method='BFGS', jac=self._gradient_calc,
                                         options={"maxiter": 100, "disp": False})
        self._coef = result.x

    def predict(self, X):
        # Predict with the fitted model, applying the training normalization
        m, n = X.shape
        X = np.c_[np.ones((m, 1)), (X - self._mu) / self._sigma]
        y_pred = X.dot(self._coef.reshape((n + 1, 1)))
        return y_pred
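Before moving to real data, a quick sanity check on synthetic data can confirm that fit and predict recover a known linear relationship. The coefficients, sample size, and random seed below are arbitrary illustrative choices.

# Sanity check on synthetic data with a known linear relationship
np.random.seed(0)
X_toy = np.random.rand(200, 2)
y_toy = 3.0 + 2.0 * X_toy[:, 0] - 1.0 * X_toy[:, 1]

toy_model = LinearRegression(lamda=0)
toy_model.fit(X_toy, y_toy)
print(toy_model.predict(X_toy[:3]).ravel())  # should closely match y_toy[:3]
print(y_toy[:3])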
Now we can run a demo with the Boston Housing data. sklearn.datasets already includes this dataset, so we can load it directly from there:
http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_boston.html
In [2]:
from sklearn.datasets import load_boston
boston = load_boston()
X = boston['data']
y = boston['target']
print(X.shape)
print(y.shape)
We can then randomly split the data into training and test sets.
In [3]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=22)
Finally, we fit the model on the training set using the LinearRegression() class, then predict on the test set and check the Root Mean Squared Error (RMSE).
In [4]:
lr = LinearRegression(lamda=0)
lr.fit(X_train, y_train)
y_pred = lr.predict(X_test)
print "RMSE is {:.4}".format(np.sqrt(np.mean((y_test.reshape((len(y_test), 1)) - y_pred) ** 2)))