Exercise 2: Logistic Regression


In [ ]:
import pandas
import numpy as np
import scipy.optimize
import matplotlib.pyplot as plt
%matplotlib inline

Part 1: Plotting

We start the exercise by first plotting the data to understand the problem we are working with.


In [ ]:
data1 = pandas.read_csv("ex2data1.txt", header=None, names=['test1', 'test2', 'accepted'])
data1.head()

Plotting data with + indicating (y = 1) examples and o indicating (y = 0) examples.


In [ ]:
def plotData(data):
    fig, ax = plt.subplots()
    results_accepted = data[data.accepted == 1]
    results_rejected = data[data.accepted == 0]
    ax.scatter(results_accepted.test1, results_accepted.test2, marker='+', c='b', s=40)
    ax.scatter(results_rejected.test1, results_rejected.test2, marker='o', c='r', s=30)
    return ax

ax = plotData(data1)
ax.set_ylim([20, 130])
ax.legend(['Admitted', 'Not admitted'], loc='best')
ax.set_xlabel('Exam 1 score')
ax.set_ylabel('Exam 2 score')

In [ ]:
X = data1[['test1', 'test2']].values
y = data1.accepted.values
m, n = X.shape
X = np.insert(X, 0, np.ones(len(X)), 1)
m, n

Part 2: Compute Cost and Gradient

In this part of the exercise, you will implement the cost and gradient for logistic regression. You need to complete the code in the sigmoid, cost, and gradient functions below.
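
For reference, the (unregularized) cost and its gradient are

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left[ -y^{(i)} \log\big(h_\theta(x^{(i)})\big) - (1 - y^{(i)}) \log\big(1 - h_\theta(x^{(i)})\big) \right], \qquad \frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_j^{(i)},$$

where $h_\theta(x) = \mathrm{sigmoid}(\theta^T x)$.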


In [ ]:
def sigmoid(z):
    #SIGMOID Compute sigmoid function
    #   J = SIGMOID(z) computes the sigmoid of z.
    
    # You need to return the following variables correctly 
    g = np.zeros(z.shape)

    # ====================== YOUR CODE HERE ======================
    # Instructions: Compute the sigmoid of each value of z (z can be a matrix,
    #               vector or scalar).

    
    # =============================================================
    
    return g
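
If you want something to check your own implementation against, here is one minimal vectorized sketch (the name sigmoid_reference is purely illustrative; it relies on np.exp broadcasting, so it handles scalars, vectors and matrices alike):


In [ ]:
def sigmoid_reference(z):
    # 1 / (1 + e^{-z}), applied element-wise; np.asarray lets plain scalars through too
    return 1.0 / (1.0 + np.exp(-np.asarray(z)))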

In [ ]:
def cost(X, y, theta, lambda_=0):
    #COSTFUNCTION Compute cost and gradient for logistic regression
    #   J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
    #   parameter for logistic regression and the gradient of the cost
    #   w.r.t. to the parameters.

    # Initialize some useful values
    m = len(y)
    
    # You need to return the following variables correctly
    J = 0
    
    
    # ====================== YOUR CODE HERE ======================
    # Instructions: Compute the cost of a particular choice of theta.
    #               You should set J to the cost.
    #               Compute the partial derivatives and set grad to the partial
    #               derivatives of the cost w.r.t. each parameter in theta
    #

    
    
    # =============================================================
    
    return J
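
One possible vectorized version of the cost, shown only as a reference sketch (the name cost_reference is illustrative, it assumes the sigmoid function above, and its lambda_ term anticipates the regularized part later in the exercise):


In [ ]:
def cost_reference(X, y, theta, lambda_=0):
    # Cross-entropy cost, plus an optional L2 penalty that skips the intercept theta[0]
    m = len(y)
    h = sigmoid(X.dot(theta))
    J = (-y.dot(np.log(h)) - (1 - y).dot(np.log(1 - h))) / m
    J += lambda_ / (2 * m) * np.sum(theta[1:] ** 2)
    return J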

In [ ]:
def gradient(X, y, theta, lambda_=0):
    # Initialize some useful values
    m = len(y)
    
    # You need to return the following variables correctly
    grad = np.zeros(theta.shape)
    
    # ====================== YOUR CODE HERE ======================
    
    
    
    
    
    # =============================================================
    
    return grad
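
Likewise, a possible vectorized sketch of the gradient (again, gradient_reference is just an illustrative name; the intercept theta[0] is not regularized):


In [ ]:
def gradient_reference(X, y, theta, lambda_=0):
    # (1/m) * X^T (h - y), plus (lambda_/m) * theta for the non-intercept terms
    m = len(y)
    h = sigmoid(X.dot(theta))
    grad = X.T.dot(h - y) / m
    grad[1:] += (lambda_ / m) * theta[1:]
    return grad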

In [ ]:
initial_theta = np.zeros(n + 1)
initial_theta.shape

The cost at initial theta (zeros) should be about 0.693.


In [ ]:
cost(X, y, np.array(initial_theta))

The gradient at the initial theta should be approximately [-0.1, -12.01, -11.26].


In [ ]:
gradient(X, y, np.array([0,0,0]))

Part 3: Optimizing using fminunc

In this exercise, you will use an off-the-shelf SciPy optimizer (scipy.optimize.fmin_ncg, playing the role of Octave's fminunc) to find the optimal parameters theta.


In [ ]:
def mycost(t):
    return cost(X, y, t)

def mygrad(t):
    return gradient(X, y, t)

optimal_theta = scipy.optimize.fmin_ncg(mycost,
                                        initial_theta,
                                        fprime=mygrad)
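
If you prefer the newer SciPy interface, scipy.optimize.minimize can do the same job; this is only an alternative sketch, with the optimized theta available as res.x:


In [ ]:
res = scipy.optimize.minimize(mycost, initial_theta, jac=mygrad, method='Newton-CG')
res.x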

Value of theta that minimizes the cost function:


In [ ]:
optimal_theta

We plot the decision boundary.


In [ ]:
ax = plotData(data1)
x_plot = np.array([np.max(X[:, 1]), np.min(X[:,1])])
y_plot = (-optimal_theta[0] - optimal_theta[1]*x_plot) / (optimal_theta[2])
ax.plot(x_plot, y_plot)

Part 4: Predict and Accuracies

After learning the parameters, you'd like to use them to predict the outcomes on unseen data. In this part, you will use the logistic regression model to predict the probability that a student with a score of 45 on exam 1 and a score of 85 on exam 2 will be admitted.

Furthermore, you will compute the training set accuracy of your model.

Your task is to complete the code in predict.


In [ ]:
def predict(t, x):
    #PREDICT Predict whether the label is 0 or 1 using learned logistic 
    #regression parameters theta
    #   p = PREDICT(theta, X) computes the predictions for X using a 
    #   threshold at 0.5 (i.e., if sigmoid(theta'*x) >= 0.5, predict 1)
    
    m = x.shape[0] # Number of examples in x
    
    # You need to return the following variables correctly
    p = np.zeros(m)
    
    # ====================== YOUR CODE HERE ======================
    # Instructions: Complete the following code to make predictions using
    #               your learned logistic regression parameters. 
    #               You should set p to a vector of 0's and 1's
    #


    
    
    # =========================================================================
    
    return p
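
One possible one-line implementation, as a reference sketch (predict_reference is just an illustrative name; it assumes the sigmoid function above):


In [ ]:
def predict_reference(t, x):
    # Predict 1 whenever the estimated probability sigmoid(x @ t) is at least 0.5
    return (sigmoid(x.dot(t)) >= 0.5).astype(int)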

Let's predict the admission probability of a student with scores 45 and 85:


In [ ]:
sigmoid(np.array([1, 45, 85]).dot(optimal_theta))

Training set accuracy:


In [ ]:
np.mean(predict(optimal_theta, X) == y)

Part 2: Regularized Logistic Regression

In this part, you are given a dataset with data points that are not linearly separable. However, you would still like to use logistic regression to classify the data points.

To do so, you introduce more features to use -- in particular, you add polynomial features to our data matrix (similar to polynomial regression).

You're expected to modify the cost and gradient functions you've already written so that they take the regularization constant into account and perform regularization.
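
For reference, regularization adds a penalty on the parameters (the intercept term $\theta_0$ is conventionally left unregularized):

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left[ -y^{(i)} \log\big(h_\theta(x^{(i)})\big) - (1 - y^{(i)}) \log\big(1 - h_\theta(x^{(i)})\big) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$$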


In [ ]:
data2 = pandas.read_csv("./ex2data2.txt", header=None, names=['test1', 'test2', 'accepted'])
data2.head()

In [ ]:
ax = plotData(data2)
ax.legend(['y = 1', 'y = 0'], loc='best')
ax.set_xlabel('Microchip test 1')
ax.set_ylabel('Microchip test 2')

In [ ]:
def mapFeature(x1, x2):
    # Map the two input features to all polynomial terms x1^(i-j) * x2^j of
    # total degree 1 through 6, then prepend a column of ones for the
    # intercept (28 features in total).
    ret = np.array([x1**(i - j) * x2**j
                    for i in range(1, 7) for j in range(i + 1)
                   ])
    return np.insert(ret, 0, np.ones(len(x1)), 0).T

mapFeature(np.array([2, 3]), np.array([3, 2]))[:, :10]

Note that mapFeature also adds a column of ones for us, so the intercept term is handled.


In [ ]:
X = mapFeature(data2.test1, data2.test2)
y = data2.accepted.values
initial_theta = np.zeros(X.shape[1])
X.shape, y.shape, initial_theta.shape

The cost at the initial theta (zeros) should be about 0.693:


In [ ]:
cost(X, y, initial_theta, lambda_=1)

Part 2: Regularization and Accuracies

Optional exercise: In this part, you will try different values of lambda and see how regularization affects the decision boundary.

Try the following values of lambda (0, 1, 10, 100).

How does the decision boundary change when you vary lambda? How does the training set accuracy vary?
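
As a quick way to compare them, the loop below is a rough sketch that reuses the cost, gradient and predict functions above and prints the training set accuracy for each suggested lambda (disp=False merely silences the optimizer's output):


In [ ]:
for lam in [0, 1, 10, 100]:
    theta_lam = scipy.optimize.fmin_bfgs(lambda t: cost(X, y, t, lam),
                                         initial_theta,
                                         fprime=lambda t: gradient(X, y, t, lam),
                                         disp=False)
    print(lam, np.mean(predict(theta_lam, X) == y))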


In [ ]:
lambda_ = 0

In [ ]:
optimal_theta = scipy.optimize.fmin_bfgs(lambda t: cost(X, y, t, lambda_),
                                         initial_theta,
                                         fprime=lambda t: gradient(X, y, t, lambda_))

At the optimal theta, the training set accuracy is:


In [ ]:
np.mean(predict(optimal_theta, X) == y)

In [ ]:
optimal_theta

The decision boundary:


In [ ]:
contour_x = np.linspace(-1, 1.5)
contour_y = np.linspace(-1, 1.5)
def calc_z(x, y):
    return mapFeature(np.array([x]), np.array([y])).dot(optimal_theta)

z = np.zeros((len(contour_x), len(contour_y)))
for i, c_x in enumerate(contour_x):
    for j, c_y in enumerate(contour_y):
        z[i, j] = calc_z(c_x, c_y)[0]

ax = plotData(data2)
# contour expects z indexed as z[y, x], so transpose before plotting
ax.contour(contour_x, contour_y, z.T, levels=[0])