We will implement regularized logistic regression to predict whether microchips from a fabrication plant pass quality assurance (QA). During QA, each microchip goes through various tests to ensure it is functioning correctly.

We have the test results for some microchips on two different tests. From these two tests, we would like to determine whether the microchips should be accepted or rejected. To help us make the decision, we have a dataset of test results on past microchips, from which we can build a logistic regression model.


In [6]:
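import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the QA data: two test scores per microchip and a 0/1 label
df = pd.read_csv('ex2data2.txt', header=None)
df.columns = ["test1", "test2", "y"]

# Split by outcome so each class is plotted as its own series
pos = df[df.y == 1]  # accepted
neg = df[df.y == 0]  # rejected

plt.scatter(pos['test1'], pos['test2'], label='accepted')
plt.scatter(neg['test1'], neg['test2'], label='rejected')
plt.xlabel('test1')
plt.ylabel('test2')
plt.legend()
plt.show()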
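
As a preview of the model described in the introduction, here is a minimal sketch of a regularized logistic regression cost. It assumes an L2 penalty that leaves the intercept unpenalized, which is one common choice and not necessarily the one used later in this notebook. The names `sigmoid`, `regularized_cost`, and `lam`, as well as the toy arrays, are illustrative; the toy numbers are made up and are not values from ex2data2.txt.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

def regularized_cost(theta, X, y, lam):
    """Mean cross-entropy plus an L2 penalty on all weights except theta[0].

    X is assumed to carry a leading column of ones for the intercept.
    """
    m = y.size
    h = sigmoid(X @ theta)
    eps = 1e-12  # small offset to avoid log(0)
    data_term = -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))
    penalty = lam / (2 * m) * np.sum(theta[1:] ** 2)
    return data_term + penalty

# Made-up toy data: intercept column plus two hypothetical test scores
X_toy = np.array([
    [1.0,  0.5, -0.2],
    [1.0, -0.3,  0.8],
    [1.0,  0.1,  0.1],
    [1.0, -0.7, -0.6],
])
y_toy = np.array([1.0, 0.0, 1.0, 0.0])

theta0 = np.zeros(3)
cost0 = regularized_cost(theta0, X_toy, y_toy, lam=1.0)
print(cost0)
```

With all-zero weights every predicted probability is 0.5, so the printed cost is ln 2 (about 0.693) regardless of the data, which makes a convenient sanity check before running an optimizer.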