Perceptron

Based on Chapter 1 of "Learning from Data: A Short Course"

classifier

$h(x) = \text{sign}(w^T x)$

where $x \in X = \{1\} \times R^{d} = \{[x_0,\ldots,x_d]^T | x_0 = 1, x_1 \in R, \ldots, x_d \in R \}$

If the data set $(x_i, y_i) \in D$, where $x_i$ is the data point and $y_i$ is the binary classification $y \in Y = \{-1,+1\}$, can be linearly separated then a $w$ can be found such that $y_i = h(x_i), \forall (x_i,y_i) \in D$

Perceptron Learning Algorithm (PLA)

Until $y_i = h(x_i) \forall i$, $w_{t+1} = w_t + y_i x_i, \forall i \text{ s.t. } y_i \ne h(x_i)$

Note that,

  • $y_i w^T_t x_i < 0$
  • $y_i w^T_{t+1} x_i > y_i w^T_t x_i$
  • $w_{t+1} = w_t + y_i x_i$ is "a move in the right direction"

Consider the energy function, $E(w_t,x,y) = \sum_i y_i \text{sign}(w^T_t x_i)$, which is maximized when all points are correctly classified,

$\frac{ \partial E }{ \partial w } \approx \sum_i y_i x_i$

indicates that the PLA step $w_{t+1} = w_t + y_i x_i$ is steping in the gradient direction, which is "a move in the right direction".


In [12]:
using PyPlot


INFO: Loading help data...

Generate some data that is linearly separable


In [28]:
c1 = [-5,-2]
D1 = randn(10,2)*2 + ones(10,1)*c1'
c2 = [5,2]
D2 = randn(10,2)*2 + ones(10,1)*c2'
D = [ones(20) [D1; D2]]
y = [ones(10); -1*ones(10)]

plot(D1[:,1],D1[:,2],"ro")
hold(:on)
plot(D2[:,1],D2[:,2],"b+")


Out[28]:
1-element Array{Any,1}:
 PyObject <matplotlib.lines.Line2D object at 0x114c50450>

In [30]:
[D y]


Out[30]:
20x4 Array{Float64,2}:
 1.0  -4.68766   0.358691   1.0
 1.0  -5.18645  -3.50959    1.0
 1.0  -5.19938  -1.47336    1.0
 1.0  -3.46683  -4.58769    1.0
 1.0  -3.67418   0.233168   1.0
 1.0  -4.47334  -3.40176    1.0
 1.0  -5.8959   -2.12598    1.0
 1.0  -8.0742   -3.58883    1.0
 1.0  -5.7666    0.149347   1.0
 1.0  -2.97506  -0.694423   1.0
 1.0   3.78955  -0.842331  -1.0
 1.0   6.02272   2.83102   -1.0
 1.0   4.56045   0.4476    -1.0
 1.0   6.27679   3.5642    -1.0
 1.0   2.75056   4.70278   -1.0
 1.0   8.357     2.60472   -1.0
 1.0   3.83944   1.33819   -1.0
 1.0   2.74907   2.72218   -1.0
 1.0   5.54815  -1.97183   -1.0
 1.0   5.9011    1.85294   -1.0

Run PLA on the data and observe how the decision boundary changes


In [27]:
w0 = rand(3)*10


Out[27]:
3-element Array{Float64,1}:
 7.41404
 6.04215
 5.28556

In [31]:
function h(w,d)
    sign(d*w)
end


Out[31]:
h (generic function with 1 method)

In [36]:
W = zeros(30,3)
W[1,:] = w0
for i=2:size(W,1)
    H = h(W[i-1,:]',D)
    mis = find(H .!= y)
    if length(mis) > 0
        j = mis[1] # pick first misclassified point
        W[i,:] = W[i-1,:] + y[i]*D[i,:]
    else
        W = W[1:i-1,:]
        break
    end
end

W


Out[36]:
3x3 Array{Float64,2}:
 7.41404   6.04215   5.28556 
 8.41404   0.855702  1.77597 
 9.41404  -4.34368   0.302609

In [39]:
function drawline(w,style,xmin,xmax,npts=10)
    xs = linspace(xmin,xmax,npts)
    ys = -w[1]/w[3] - (w[2]/w[3]) * xs
    plot(xs,ys,style)
end


Out[39]:
drawline (generic function with 2 methods)

In [43]:
plot(D1[:,1],D1[:,2],"ro")
hold(:on)
plot(D2[:,1],D2[:,2],"b+")

xmin = minimum(D[:,2])
xmax = maximum(D[:,3])
for i = 1:size(W,1)-1
    drawline(W[i,:]',"g--",xmin,xmax)
end
drawline(W[end,:]',"g",xmin,xmax)


Out[43]:
1-element Array{Any,1}:
 PyObject <matplotlib.lines.Line2D object at 0x115583b90>

In [ ]: