Logistic regression is a tool that we can use to solve classification problems with an arbitrary number of classes. To simplify the equations, we focus on binary classification problems.
Solving classification problems with linear regression models has some issues.
Figure 4.2 [#] illustrates the difference between the linear function (left) and the logistic function (right). Since our data points can only take one of two values on the $y$-axis, a linear regression model is not a good fit: it dips below zero. The logistic function is a better fit because it never estimates values outside the $[0, 1]$ interval.
Logistic regression models the probability that $y$ is class 1 given $x$, instead of modelling $y$ as a function of $x$ directly as in linear regression. In other words, just like a linear regression model, a logistic regression model first computes a weighted sum of the input features (plus the intercept), but instead of outputting the result directly, it outputs the logistic of this result. The logistic model is:
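$$p(x) = \Pr(y = 1 \mid x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$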
We can rewrite the equation above as:
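$$\frac{p(x)}{1 - p(x)} = e^{\beta_0 + \beta_1 x}$$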
The left-hand side of this equality is called the odds. We get the log-odds or logit if we take the logarithm of both sides:
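$$\log\left(\frac{p(x)}{1 - p(x)}\right) = \beta_0 + \beta_1 x$$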
In general, $\mathrm{logit}(a) = \log(a / (1 - a))$.
If we have $p$ predictors $x_1, x_2, \cdots, x_p$, we can collect them in a vector $X = (1, x_1, x_2, \cdots, x_p)$. Similarly, the corresponding coefficients can be put in a vector $\beta = (\beta_0, \beta_1, \beta_2, \cdots, \beta_p)$. Now, we can construct a linear transformation:
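$$\beta^T X = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$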
We can write the logistic function as:
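$$p(X) = \frac{1}{1 + e^{-\beta^T X}} = \frac{e^{\beta^T X}}{1 + e^{\beta^T X}}$$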
The coefficients, $\beta_0$ and $\beta_1$ in the single-predictor case, can be learned using the maximum likelihood method.
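As a rough sketch of what maximum likelihood fitting does, the cell below estimates $\beta_0$ and $\beta_1$ on a small synthetic data set by minimising the negative log-likelihood with `scipy.optimize.minimize`; the synthetic data and the choice of SciPy's optimiser are only illustrative assumptions, not part of the derivation above.

In [ ]:
import numpy as np
from scipy.optimize import minimize

# Illustrative synthetic data: one predictor, binary labels
# (assumed values; any data set with a binary response would do)
rng = np.random.default_rng(0)
x = rng.normal(size=200)
p_true = 1 / (1 + np.exp(-(-0.5 + 2.0 * x)))   # assumed "true" beta_0 = -0.5, beta_1 = 2.0
y = rng.binomial(1, p_true)

def neg_log_likelihood(beta):
    # Negative log-likelihood of the logistic model for coefficients (beta_0, beta_1)
    z = beta[0] + beta[1] * x
    p = 1 / (1 + np.exp(-z))
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Maximum likelihood estimate: minimise the negative log-likelihood
result = minimize(neg_log_likelihood, x0=np.zeros(2))
result.x  # estimated (beta_0, beta_1)

In practice one would typically rely on a library implementation such as scikit-learn's `LogisticRegression`, which performs this fitting for you (with regularisation by default).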
In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
The logistic function is characterised by its S-shaped (sigmoid) curve:
In [2]:
# Plot the logistic (sigmoid) function over the interval [-10, 10]
X = np.linspace(-10, 10, 100)
y = 1 / (1 + np.exp(-X))
plt.plot(X, y)
Out[2]: