# Perceptron

Based on Chapter 1 of "Learning from Data: A Short Course"

## classifier

$h(x) = \text{sign}(w^T x)$

where $x \in X = \{1\} \times R^{d} = \{[x_0,\ldots,x_d]^T | x_0 = 1, x_1 \in R, \ldots, x_d \in R \}$

If the data set $(x_i, y_i) \in D$, where $x_i$ is the data point and $y_i$ is the binary classification $y \in Y = \{-1,+1\}$, can be linearly separated then a $w$ can be found such that $y_i = h(x_i), \forall (x_i,y_i) \in D$

## Perceptron Learning Algorithm (PLA)

Until $y_i = h(x_i) \forall i$, $w_{t+1} = w_t + y_i x_i, \forall i \text{ s.t. } y_i \ne h(x_i)$

Note that,

• $y_i w^T_t x_i < 0$
• $y_i w^T_{t+1} x_i > y_i w^T_t x_i$
• $w_{t+1} = w_t + y_i x_i$ is "a move in the right direction"

Consider the energy function, $E(w_t,x,y) = \sum_i y_i \text{sign}(w^T_t x_i)$, which is maximized when all points are correctly classified,

$\frac{ \partial E }{ \partial w } \approx \sum_i y_i x_i$

indicates that the PLA step $w_{t+1} = w_t + y_i x_i$ is steping in the gradient direction, which is "a move in the right direction".



In [12]:

using PyPlot






## Generate some data that is linearly separable



In [28]:

c1 = [-5,-2]
D1 = randn(10,2)*2 + ones(10,1)*c1'
c2 = [5,2]
D2 = randn(10,2)*2 + ones(10,1)*c2'
D = [ones(20) [D1; D2]]
y = [ones(10); -1*ones(10)]

plot(D1[:,1],D1[:,2],"ro")
hold(:on)
plot(D2[:,1],D2[:,2],"b+")




Out[28]:

1-element Array{Any,1}:
PyObject <matplotlib.lines.Line2D object at 0x114c50450>




In [30]:

[D y]




Out[30]:

20x4 Array{Float64,2}:
1.0  -4.68766   0.358691   1.0
1.0  -5.18645  -3.50959    1.0
1.0  -5.19938  -1.47336    1.0
1.0  -3.46683  -4.58769    1.0
1.0  -3.67418   0.233168   1.0
1.0  -4.47334  -3.40176    1.0
1.0  -5.8959   -2.12598    1.0
1.0  -8.0742   -3.58883    1.0
1.0  -5.7666    0.149347   1.0
1.0  -2.97506  -0.694423   1.0
1.0   3.78955  -0.842331  -1.0
1.0   6.02272   2.83102   -1.0
1.0   4.56045   0.4476    -1.0
1.0   6.27679   3.5642    -1.0
1.0   2.75056   4.70278   -1.0
1.0   8.357     2.60472   -1.0
1.0   3.83944   1.33819   -1.0
1.0   2.74907   2.72218   -1.0
1.0   5.54815  -1.97183   -1.0
1.0   5.9011    1.85294   -1.0



## Run PLA on the data and observe how the decision boundary changes



In [27]:

w0 = rand(3)*10




Out[27]:

3-element Array{Float64,1}:
7.41404
6.04215
5.28556




In [31]:

function h(w,d)
sign(d*w)
end




Out[31]:

h (generic function with 1 method)




In [36]:

W = zeros(30,3)
W[1,:] = w0
for i=2:size(W,1)
H = h(W[i-1,:]',D)
mis = find(H .!= y)
if length(mis) > 0
j = mis[1] # pick first misclassified point
W[i,:] = W[i-1,:] + y[i]*D[i,:]
else
W = W[1:i-1,:]
break
end
end

W




Out[36]:

3x3 Array{Float64,2}:
7.41404   6.04215   5.28556
8.41404   0.855702  1.77597
9.41404  -4.34368   0.302609




In [39]:

function drawline(w,style,xmin,xmax,npts=10)
xs = linspace(xmin,xmax,npts)
ys = -w[1]/w[3] - (w[2]/w[3]) * xs
plot(xs,ys,style)
end




Out[39]:

drawline (generic function with 2 methods)




In [43]:

plot(D1[:,1],D1[:,2],"ro")
hold(:on)
plot(D2[:,1],D2[:,2],"b+")

xmin = minimum(D[:,2])
xmax = maximum(D[:,3])
for i = 1:size(W,1)-1
drawline(W[i,:]',"g--",xmin,xmax)
end
drawline(W[end,:]',"g",xmin,xmax)




Out[43]:

1-element Array{Any,1}:
PyObject <matplotlib.lines.Line2D object at 0x115583b90>




In [ ]: