Based on Chapter 1 of "Learning from Data: A Short Course"
$h(x) = \text{sign}(w^T x)$
where $x \in X = \{1\} \times \mathbb{R}^{d} = \{[x_0,\ldots,x_d]^T \mid x_0 = 1,\; x_1,\ldots,x_d \in \mathbb{R}\}$
If the data set $D = \{(x_i, y_i)\}$, where $x_i$ is a data point and $y_i \in Y = \{-1,+1\}$ is its binary label, is linearly separable, then a $w$ can be found such that $y_i = h(x_i), \forall (x_i,y_i) \in D$.
The PLA update rule: until $y_i = h(x_i)$ for all $i$, pick a misclassified point, i.e. some $i$ with $y_i \ne h(x_i)$, and set $w_{t+1} = w_t + y_i x_i$.
To see why this update helps, consider the energy function $E(w_t) = \sum_i y_i \,\text{sign}(w_t^T x_i)$, which is maximized exactly when all points are correctly classified. Treating $\text{sign}$ as approximately linear,
$\frac{ \partial E }{ \partial w } \approx \sum_i y_i x_i,$
which indicates that the PLA step $w_{t+1} = w_t + y_i x_i$ is stepping along one term of this gradient, i.e. it is "a move in the right direction".
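The same conclusion follows without any gradient approximation: a single PLA update strictly increases the (signed) agreement on the misclassified point it uses, since $y_i^2 = 1$:

$$y_i\, w_{t+1}^T x_i = y_i (w_t + y_i x_i)^T x_i = y_i\, w_t^T x_i + \|x_i\|^2 > y_i\, w_t^T x_i.$$

Because $x_i$ was misclassified, $y_i\, w_t^T x_i \le 0$, so the update pushes this quantity toward the positive value needed for a correct classification (though it may temporarily misclassify other points along the way).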
In [12]:
using PyPlot
In [28]:
c1 = [-5,-2]                            # center of class +1
D1 = randn(10,2)*2 + ones(10,1)*c1'     # 10 points scattered around c1
c2 = [5,2]                              # center of class -1
D2 = randn(10,2)*2 + ones(10,1)*c2'     # 10 points scattered around c2
D = [ones(20) [D1; D2]]                 # prepend the bias coordinate x_0 = 1
y = [ones(10); -1*ones(10)]             # labels: +1 for D1, -1 for D2
plot(D1[:,1],D1[:,2],"ro")
hold(:on)
plot(D2[:,1],D2[:,2],"b+")
Out[28]:
In [30]:
[D y]
Out[30]:
In [27]:
w0 = rand(3)*10
Out[27]:
In [31]:
function h(w,d)
    sign(d*w)   # ±1 prediction for each row (data point) of d
end
Out[31]:
In [36]:
W = zeros(30,3)      # run at most 30 PLA steps, keeping each weight vector
W[1,:] = w0
for i=2:size(W,1)
    H = h(W[i-1,:]',D)
    mis = find(H .!= y)                    # indices of misclassified points
    if length(mis) > 0
        j = mis[1]                         # pick first misclassified point
        W[i,:] = W[i-1,:] + y[j]*D[j,:]    # PLA update: w ← w + y_j x_j
    else
        W = W[1:i-1,:]                     # converged; drop unused rows
        break
    end
end
W
Out[36]:
In [39]:
function drawline(w,style,xmin,xmax,npts=10)
    # decision boundary: w[1] + w[2]*x + w[3]*y = 0, solved for y
    xs = linspace(xmin,xmax,npts)
    ys = -w[1]/w[3] - (w[2]/w[3]) * xs
    plot(xs,ys,style)
end
Out[39]:
In [43]:
plot(D1[:,1],D1[:,2],"ro")
hold(:on)
plot(D2[:,1],D2[:,2],"b+")
xmin = minimum(D[:,2])   # column 2 holds the x-coordinates (column 1 is the bias)
xmax = maximum(D[:,2])
for i = 1:size(W,1)-1
drawline(W[i,:]',"g--",xmin,xmax)
end
drawline(W[end,:]',"g",xmin,xmax)
Out[43]:
In [ ]: