Logistic regression is a scheme for binary classification problems involving $d$ variables $x_i, i = 1,\ldots,d$. The output variable $y$ can take only the values $0$ or $1$. The classification scheme goes as follows:
Other references:
https://github.com/justmarkham/gadsdc1/blob/master/logistic_assignment/kevin_logistic_sklearn.ipynb
https://github.com/jcgillespie/Coursera-Machine-Learning
http://www.ats.ucla.edu/stat/r/dae/logit.htm
http://blog.yhat.com/posts/logistic-regression-and-python.html
http://blog.smellthedata.com/2009/06/python-logistic-regression-with-l2.html
Nando de Freitas's YouTube course (see the logistic regression video). The basic idea is to write down the likelihood function, take the negative of the log-likelihood to get the error function, and then minimize that error by a gradient descent approach.
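Concretely, the negative log-likelihood described above is the cross-entropy error $E(\mathbf{w}) = -\sum_i \left[ y_i \log s(\mathbf{w}^T \mathbf{x}_i) + (1-y_i) \log(1 - s(\mathbf{w}^T \mathbf{x}_i)) \right]$, whose gradient with respect to $\mathbf{w}$ is $X^T(s(X\mathbf{w}) - \mathbf{y})$. A minimal sketch of both (the names `neg_log_likelihood` and `gradient` are my own, not from the references):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neg_log_likelihood(w, X, y):
    # cross-entropy error E(w) = -sum_i [ y_i log s_i + (1-y_i) log(1-s_i) ]
    p = sigmoid(X @ w)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradient(w, X, y):
    # dE/dw = X^T (s(Xw) - y)
    return X.T @ (sigmoid(X @ w) - y)
```

Before handing the gradient to a descent routine, it is worth checking it against a finite-difference approximation of `neg_log_likelihood`.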
In [1]:
import numpy as np
import pandas as pd
%pwd
Out[1]:
The logistic function is $s(z) = \frac{1}{1+e^{-z}}$ and its derivative is $s'(z) = s(z) \cdot (1-s(z))$.
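The derivative identity can be verified directly from the quotient rule, using $\frac{e^{-z}}{1+e^{-z}} = 1 - \frac{1}{1+e^{-z}} = 1 - s(z)$:

$$
s'(z) = \frac{e^{-z}}{(1+e^{-z})^2} = \frac{1}{1+e^{-z}} \cdot \frac{e^{-z}}{1+e^{-z}} = s(z)\,\bigl(1 - s(z)\bigr).
$$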
In [2]:
# data set consists of two variables representing scores on two exams
# and decision on admission: 0 or 1
data = np.loadtxt(r'data/ex2data1.txt', delimiter=',')
X = data[:, 0:2]
y = data[:, 2]
print(type(X), X.shape)
print(len(y))
In [4]:
def sigmoid(z):
    return 1.0/(1.0+np.exp(-z))

def der_sigmoid(z):
    s = sigmoid(z)
    return s*(1.0-s)
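With `sigmoid` in hand, a plain batch gradient-descent fit can be sketched as follows. The helper name `train_logistic`, the learning rate, and the iteration count are illustrative choices, not taken from the references; note that for raw exam scores the features should be standardized first, or the exponentials will saturate:

```python
import numpy as np

def sigmoid(z):
    return 1.0/(1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.1, n_iter=2000):
    # prepend a column of ones so w[0] acts as the intercept
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        # gradient of the mean cross-entropy error: X^T (s(Xw) - y) / n
        w -= lr * Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
    return w
```

For the exam-score data above, subtract each column's mean and divide by its standard deviation before calling `train_logistic`; predictions are then `sigmoid(Xb @ w) > 0.5`.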