So far we have been looking at regression problems, where the target is a continuous variable. Let us now look at a binary classification problem, where we need to identify the correct class (0 or 1).
In [1]:
    
import numpy as np
import pandas as pd
    
In [2]:
    
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('fivethirtyeight')
plt.rcParams['figure.figsize'] = (10, 6)
    
In [3]:
    
pop = pd.read_csv('data/cars_small.csv')
    
In [4]:
    
pop.head()
    
    Out[4]: (first five rows of the cars dataframe)
Let's say we want to classify the vehicles as 'Hatchback' or 'Sedan'.
In [5]:
    
class_mapping = {'Hatchback': 0, 'Sedan': 1}
    
In [6]:
    
pop['types'] = pop['type'].map(class_mapping)
    
In [7]:
    
plt.scatter(pop.price, pop.types, c=pop.types, s=150, alpha=0.8)
plt.xlabel('price')
plt.ylabel('types')
    
    Out[7]: (scatter plot of types against price)
In [8]:
    
def ols(df, xlabel, ylabel):
    # Closed-form ordinary least squares: beta = (X^T X)^-1 X^T y
    n = df.shape[0]
    x0 = np.ones(n)                    # intercept column
    x1 = df[xlabel]
    X = np.c_[x0, x1]
    X = np.asmatrix(X)
    y = np.transpose(np.asmatrix(df[ylabel]))
    X_T = np.transpose(X)
    X_pseudo = np.linalg.inv(X_T * X) * X_T    # pseudo-inverse of X
    beta = X_pseudo * y
    return beta
    
In [9]:
    
def plot_ols(df, xlabel, ylabel):
    beta = ols(df, xlabel, ylabel)
    beta_0 = beta.item(0)
    beta_1 = beta.item(1)
    plt.scatter(df[xlabel], df[ylabel], c=df[ylabel], s=150, alpha=0.8)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    
    # Fitted regression line
    y = beta_0 + beta_1 * df[xlabel]
    plt.plot(df[xlabel], y, '-')
    
    # Cutoff where the fitted line crosses 0.5, i.e. the implied decision boundary
    cutoff = (0.5 - beta_0)/beta_1
    plt.vlines(cutoff, -0.4, 1.4)
    
In [10]:
    
ols(pop, 'price', 'types')
    
    Out[10]: (matrix with the fitted intercept and slope)
In [11]:
    
plot_ols(pop, 'price', 'types')
    
    
However, there are two problems with this approach: the predictions of a straight line are not confined to the interval [0, 1], so they cannot be read as probabilities, and the fit is very sensitive to outliers.
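The first issue can be checked directly from the fitted values. A minimal sketch, reusing the ols helper defined above:

# Check the range of the OLS fitted values; a straight line is not restricted to [0, 1]
beta = ols(pop, 'price', 'types')
X_lin = np.asmatrix(np.c_[np.ones(pop.shape[0]), pop['price']])
fitted = X_lin * beta
print('min fitted value:', fitted.min())
print('max fitted value:', fitted.max())

The second issue is easy to demonstrate by perturbing a couple of prices, which is what the next few cells do.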
In [12]:
    
pop1 = pop.copy()
    
In [13]:
    
pop1.tail()
    
    Out[13]: (last five rows of pop1)
In [14]:
    
# Let's create a couple of outliers
pop1.loc[37,'price'] = 1500
pop1.loc[41,'price'] = 2000
    
In [15]:
    
plot_ols(pop1, 'price', 'types')
    
    
The outliers pull the fitted line and shift the 0.5 cutoff, even though the class memberships themselves have not changed. What we need instead is a function that squashes the linear combination into the interval (0, 1): the logistic (sigmoid) function.
In [16]:
    
z = np.linspace(-10, 10, 100)
    
In [17]:
    
p = 1/(1+np.exp(-z))
    
In [18]:
    
plt.plot(z,p)
plt.hlines(0.5, -20,20)
plt.vlines(0, 0,1)
plt.xlabel('z')
plt.ylabel('P(z)')
    
    Out[18]: (plot of the sigmoid P(z), with reference lines at P = 0.5 and z = 0)
So now we can transform our linear regression problem with this logistic function. The linear model
$$ y = X\beta $$
becomes
$$ P(y) = P(X\beta) $$
and we predict
$$ y = 1, when \, P(X\beta) > 0.5 $$
which is equivalent to
$$ X\beta > 0 $$
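A quick numerical check of that equivalence, as a minimal sketch using the same sigmoid as above:

# P(z) > 0.5 exactly when z > 0
z_check = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
p_check = 1/(1 + np.exp(-z_check))
print((p_check > 0.5) == (z_check > 0))   # expect all True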
Let's see this in our example with the 'kmpl' and 'price' variables, for the class $types$:
$$ \beta_0 + \beta_1 \cdot kmpl + \beta_2 \cdot price > 0 $$
So we are looking for a line in the (kmpl, price) plane which will fulfill the above condition.
In [19]:
    
plt.scatter(pop['kmpl'], pop['price'], c=pop['types'], s=150, alpha=0.8)
plt.xlabel('kmpl')
plt.ylabel('price')
    
    Out[19]: (scatter of price against kmpl, coloured by class)
For the cost function we penalise confident but wrong predictions: when the true class is 1, the cost of predicting probability $p$ is $-log(p)$; when the true class is 0, it is $-log(1-p)$. The next cells plot these two curves.
In [20]:
    
z = np.linspace(0.001, 0.999, 1000)
    
In [21]:
    
c1 = -np.log(z)
c2 = -np.log(1-z)
    
In [22]:
    
plt.plot(z,c1)
plt.plot(z,c2)
#plt.hlines(0.5, -10,10)
#plt.vlines(0, 0,1)
plt.xlabel('z')
plt.ylabel('Cost')
    
    Out[22]: (the two cost curves, -log(z) and -log(1-z))
To make this easier to work with, we can combine both cases into a single expression, the (negative) log-likelihood cost function:
$$ C(\beta) = \frac{1}{n} \left( - y^T log(P(X\beta)) - (1 - y^T) log(1 - P(X\beta)) \right) $$
Differentiating this gives a gradient of the same form as the linear regression one (we keep the same $\frac{2}{n}$ scaling as in linear regression; the constant factor is simply absorbed by the learning rate):
$$ \nabla C(\beta) = \frac{2}{n} X^T(P(X\beta) - y) $$
and our gradient descent update is
$$ \beta_{i+1} = \beta_{i} - \eta \, \nabla C(\beta_i) $$
In [23]:
    
n = pop.shape[0]
x0 = np.ones(n)    # intercept column
x1 = pop.kmpl      # feature 1: fuel efficiency
x2 = pop.price     # feature 2: price
X_actual = np.c_[x1, x2]
    
In [24]:
    
# Standardise both features to zero mean and unit standard deviation
X_norm = (X_actual - np.mean(X_actual, axis=0)) / np.std(X_actual, axis=0)
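Gradient descent converges more reliably when the features are on comparable scales, which is why both columns are standardised. A quick check (a minimal sketch) that the transformed columns have mean about 0 and standard deviation about 1:

# Each standardised column should have mean ~0 and standard deviation ~1
print(np.mean(X_norm, axis=0))
print(np.std(X_norm, axis=0))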
    
In [25]:
    
# Prepend the intercept column of ones
X = np.c_[x0, X_norm]
    
In [26]:
    
X = np.asmatrix(X)
y = np.asmatrix(pop.types.values.reshape(-1,1))
b = np.asmatrix([[0],[0],[0]])    # initial coefficients: intercept, kmpl, price
    
In [27]:
    
def P(z):
    # Logistic (sigmoid) function: maps any real z into (0, 1)
    return 1.0/(1+np.exp(-z))
    
In [28]:
    
def cost(X,y,b,n):
    # Average cross-entropy (negative log-likelihood) cost
    C = (- y.T*np.log(P(X*b))-(1-y.T)*np.log(1-P(X*b)))/n
    return C[0,0]
    
In [29]:
    
def gradient(X,y,b,n):
    # Gradient of the cost with respect to beta (2.0 keeps the division float-valued on Python 2)
    g = (2.0/n)*X.T*(P(X*b) - y)
    return g
    
In [30]:
    
def gradient_descent_logistic(eta, epochs, X, y, n):
    
    # Set Initial Values 
    b = np.asmatrix([[0],[0],[0]])
    c = cost(X,y,b,n)
    c_all = []
    c_all.append(c)
    # Run the calculation for those many epochs
    for i in range(epochs):
        g = gradient(X,y,b,n)
        b = b - eta * g
        c = cost(X,y,b,n)
        c_all.append(c)
            
    return c_all, b
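As a quick usage sketch (the learning rate and number of epochs here are illustrative choices), the function returns the cost history and the fitted coefficients:

# Example run: inspect the final cost and the fitted coefficients
costs, beta_hat = gradient_descent_logistic(0.05, 1000, X, y, n)
print('final cost:', costs[-1])
print('coefficients:', beta_hat)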
    
In [31]:
    
from __future__ import division    # ensure float division so the grid step below is not truncated on Python 2
x1_min, x1_max = -3, 3
x2_min, x2_max = -3, 3
xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, (x1_max - x1_min)/100),
                       np.arange(x2_min, x2_max, (x2_max - x2_min)/100))
# Grid of points (with intercept column) covering the standardised feature space
xx = np.c_[np.ones(xx1.ravel().shape[0]), xx1.ravel(), xx2.ravel()]
    
In [32]:
    
def plot_gradient_descent(eta, epoch, gradient_func):
    
    es, bs = gradient_func(eta, epoch, X, y, n)
    
    # Left panel: probability contour and data points
    plt.subplot(1, 2, 1)
    #plt.tight_layout()
    
    # Plot the probability contour implied by the fitted coefficients
    Z = P(xx*bs)
    Z = Z.reshape(xx1.shape)
    cs = plt.contourf(xx1, xx2, Z, cmap=plt.cm.viridis, alpha=0.5)
    plt.colorbar(cs)
    
    # Plot the (standardised) data points, coloured by class
    plt.scatter(X[:,1], X[:,2], c=pop.types, s=150, alpha=0.8)
    plt.xlabel('kmpl')
    plt.ylabel('price')
    
    # Right panel: cost against epochs
    plt.subplot(1, 2, 2)
    plt.plot(es)
    plt.xlabel('Epochs')
    plt.ylabel('Error')
    
In [33]:
    
plot_gradient_descent(0.05, 1000, gradient_descent_logistic)
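Once the coefficients are fitted, they can be used to score a new observation. A minimal sketch, with hypothetical kmpl and price values (not taken from the dataset), applying the same standardisation as the training data:

# Score a hypothetical new car (the kmpl/price values are made up for illustration)
costs, beta_hat = gradient_descent_logistic(0.05, 1000, X, y, n)
new_car = np.array([[18.0, 600.0]])    # [kmpl, price], hypothetical values
new_norm = (new_car - np.mean(X_actual, axis=0)) / np.std(X_actual, axis=0)
new_X = np.asmatrix(np.c_[np.ones(1), new_norm])
prob_sedan = P(new_X * beta_hat)[0, 0]
print('P(Sedan):', prob_sedan, '-> predicted class:', int(prob_sedan > 0.5))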