Logistic Regression is a fundamental learning algorithm for classification problems. It is simple yet elegant in form. This notebook covers the algorithm behind Logistic Regression and builds it from scratch in Python.
1) Let's first define the notation:
$m$ - Number of training examples
$n$ - Number of features
$X$ - Features
$y$ - Target (discrete)
$(X^{(i)}, y^{(i)})$ - $i^{th}$ training example
$\theta: (\theta_0,\theta_1,\theta_2,...,\theta_n)^T$ - model parameters (including the intercept term $\theta_0$)
2) The hypothesis of Logistic Regression is:

$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$

Here $g(z) = \frac{1}{1 + e^{-z}}$ is called the Sigmoid Function. We will also use it later as the activation function of neurons in Neural Networks.
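To get a feel for the Sigmoid Function (these are its standard properties, not results specific to this notebook):

$$g(0) = \frac{1}{1 + e^{0}} = 0.5, \qquad \lim_{z \to +\infty} g(z) = 1, \qquad \lim_{z \to -\infty} g(z) = 0$$

In particular, $h_\theta(x) \geq 0.5$ exactly when $\theta^T x \geq 0$, which is why thresholding the predicted probability at 0.5 (as the predict() method does below) gives a linear decision boundary.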
3) The log-loss is then used to define the cost function of Logistic Regression, with an L2 regularization term (the intercept $\theta_0$ is not regularized):

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(X^{(i)}) + \left(1-y^{(i)}\right)\log\left(1-h_\theta(X^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$
4) To minimize the cost function with respect to the parameters, we calculate its gradient (with $X_0^{(i)} = 1$ for the intercept):

$$\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(X^{(i)}) - y^{(i)}\right)$$

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(X^{(i)}) - y^{(i)}\right)X_j^{(i)} + \frac{\lambda}{m}\theta_j \quad \text{for } j \geq 1$$
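With the gradient available, any gradient-based optimizer can minimize $J(\theta)$. For reference, a plain batch gradient descent update would look like the following (the learning rate $\alpha$ is introduced here only for illustration; the implementation below instead passes the cost and gradient to scipy.optimize.minimize with the BFGS method):

$$\theta_j := \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}$$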
In [1]:
# Binary Logistic Regression
import numpy as np
import scipy.optimize


class BinaryLogisticRegression():

    # Define class variables
    _mu = None
    _sigma = None
    _coef = None

    def __init__(self, lamda):
        self._lamda = lamda

    def _feature_norm(self, X):
        # Normalize all features to zero mean and unit variance to expedite optimization
        mu = np.mean(X, axis=0)
        sigma = np.std(X, axis=0)
        X_norm = (X - mu) / sigma
        return X_norm, mu, sigma

    def _sigmoid(self, z):
        # Formulate the sigmoid function g(z) = 1 / (1 + e^(-z))
        return 1.0 / (1.0 + np.exp(-1.0 * z))

    def _cost_calc(self, theta, X, y):
        # Formulate the regularized log-loss cost (the intercept theta[0] is not regularized)
        m, n = X.shape
        y = y.reshape((m, 1))
        theta = theta.reshape((n, 1))
        z = X.dot(theta)
        h_x = self._sigmoid(z)
        J = -1.0 / m * (y.T.dot(np.log(h_x)) + (1 - y).T.dot(np.log(1 - h_x))) \
            + self._lamda / (2.0 * m) * np.sum(theta[1:] ** 2)
        return J.item()

    def _gradient_calc(self, theta, X, y):
        # Formulate the gradient of the cost function
        m, n = X.shape
        y = y.reshape((m, 1))
        theta = theta.reshape((n, 1))
        z = X.dot(theta)
        h_x = self._sigmoid(z)
        grad = np.zeros((n, 1))
        grad[0] = 1.0 / m * np.sum(h_x - y)
        grad[1:] = 1.0 / m * X[:, 1:].T.dot(h_x - y) + float(self._lamda) / m * theta[1:]
        return grad.ravel()

    def fit(self, X, y):
        # Fit the model: normalize features, add the intercept column, then minimize the cost
        m, n = X.shape
        X, self._mu, self._sigma = self._feature_norm(X)
        X = np.c_[np.ones((m, 1)), X]
        theta = np.zeros(X.shape[1])
        result = scipy.optimize.minimize(fun=self._cost_calc, x0=theta, args=(X, y),
                                         method='BFGS', jac=self._gradient_calc,
                                         options={"maxiter": 100, "disp": False})
        self._coef = result.x
        return self

    def predict_prob(self, X):
        # Predict probabilities with the fitted model
        m, n = X.shape
        X = np.c_[np.ones((m, 1)), (X - self._mu) / self._sigma]
        y_prob = self._sigmoid(X.dot(self._coef.reshape((n + 1, 1))))
        return y_prob.ravel()

    def predict(self, X):
        # Predict class labels by thresholding the probability at 0.5
        p = self.predict_prob(X)
        y_predict = np.copy(p)
        y_predict[p > 0.5] = 1
        y_predict[p <= 0.5] = 0
        return y_predict.ravel()
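Before moving on to multiple classes, here is a minimal sanity check of BinaryLogisticRegression on a small synthetic two-class data set (the toy data and seed below are assumptions for illustration, not part of the Iris experiment that follows):

import numpy as np

rng = np.random.RandomState(26)
# 50 points around (-1, -1) labeled 0 and 50 points around (+1, +1) labeled 1
X_demo = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)),
                    rng.normal(1.0, 1.0, size=(50, 2))])
y_demo = np.r_[np.zeros(50), np.ones(50)]

blr = BinaryLogisticRegression(lamda=1).fit(X_demo, y_demo)
print("Training accuracy: {:.2%}".format(np.mean(blr.predict(X_demo) == y_demo)))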
For a multi-class problem, we can use the "one-versus-all" scheme to extend the binary model above. To be specific, for each class, we code it as the positive class (1) while the rest of the classes are coded as the negative class (0). We then apply Binary Logistic Regression and predict the probability of being positive for each class against all the others. The class with the highest probability of being positive is chosen as the final prediction of the classification task, as illustrated below.
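As a concrete illustration of this recoding (the toy label vector below is hypothetical):

import numpy as np

y_toy = np.array([0, 2, 1, 2, 0])
for value in np.unique(y_toy):
    # the current class becomes 1, every other class becomes 0
    print("class {} vs. rest: {}".format(value, (y_toy == value).astype(int)))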
In [2]:
class MultiClassLogisticRegression():

    # Define class variables
    _values = None
    _n_samples = None
    _n_classes = None

    def __init__(self, lamda):
        self._lamda = lamda
        # One binary classifier per class; kept as an instance attribute
        # so that separate models do not share the same list
        self._lr_list = []

    def fit(self, X, y):
        # Fit one binary model per class (one-versus-all)
        self._values = np.unique(y)
        self._n_samples = X.shape[0]
        self._n_classes = len(self._values)
        y_copy = np.array(y)
        for value in self._values:
            idx = (y == value)
            y_copy[idx] = 1
            y_copy[~idx] = 0
            lr = BinaryLogisticRegression(lamda=self._lamda)
            self._lr_list.append(lr.fit(X, y_copy))
        return self

    def predict_prob(self, X):
        # Predict the probability of each class, normalized across classes
        y_prob_ini = np.empty((X.shape[0], self._n_classes))
        for k, lr in enumerate(self._lr_list):
            y_prob_ini[:, k] = lr.predict_prob(X)
        row_sums = y_prob_ini.sum(axis=1)
        y_prob = y_prob_ini / row_sums[:, np.newaxis]
        return y_prob

    def predict(self, X):
        # Predict with the one-versus-all scheme: pick the most probable class
        p = self.predict_prob(X)
        y_predict_idx = np.argmax(p, axis=1)
        y_predict = np.array([self._values[i] for i in y_predict_idx])
        return y_predict
It's time to test the learning tool we just built on a real data set.
We load the Iris data set from the scikit-learn module. It has 4 features and 3 classes.
In [3]:
from sklearn.datasets import load_iris
iris = load_iris()
X = iris['data']
y = iris['target']
print(X.shape)
print(y.shape)
print("Number of Classes: {}".format(len(np.unique(y))))
We can split the original data into training and test sets for evaluation.
In [4]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=26)
This is a multi-class problem, so we use the MultiClassLogisticRegression class defined above to train the model and predict the labels of the test examples.
In [5]:
lr = MultiClassLogisticRegression(lamda=1)
lr = lr.fit(X_train, y_train)
y_predict = lr.predict(X_test)
print "True Values: {}".format(y_test)
print "Predicted Values: {}".format(y_predict)
print "Prediction Accuracy: {:.2%}".format(np.mean((y_predict == y_test).astype(float)))
Other things to consider: