The perceptron - limitations


In [2]:
%matplotlib inline
from pylab import *
from utils import *
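
The helper functions step and build_dataset used below come from utils.py, which is not shown in this notebook. As a reference, here is a minimal sketch of what we assume the step activation does (a Heaviside threshold at zero):

# Hypothetical sketch of the step activation assumed to be in utils.py:
# it returns 1 when the weighted sum is non-negative, 0 otherwise.
def step(net):
    return 1.0*(net >= 0)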

As in the previous simulation we implement a very simple network. It has only two input units plus a bias unit, as in the figure. We will feed two groups of input patterns to the network. This time the two groups are not linearly separable, and we will see how the perceptron fails to categorize this kind of data.
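
With two inputs $p_1$ and $p_2$, a bias weight $w_0$ and two input weights $w_1$ and $w_2$, the output of the unit is

$$y = \mathrm{step}(w_0 + w_1\,p_1 + w_2\,p_2)$$

so the border between the two possible answers is always a straight line in the $(p_1, p_2)$ plane: this is the limitation we want to show.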

Training

Initializing data and parameters

We create the input data with pseudo-random number generation. We start from two sets of points, the centroids of the groups, and create the patterns of each group (belonging/not belonging to the class) by adding noise to the centroids. The first set of centroids lies on a circle of radius 2, while the second group is composed of points near the origin. Thus the second group of patterns is surrounded by the first group.
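
The function build_dataset is defined in utils.py and is not shown here. The following is only a plausible sketch of what we assume it does - Gaussian noise around the centroids, the second group stored in the first half of the rows with label 0, the first group in the second half with label 1, and the desired output in the third column - which is consistent with how the patterns are indexed later in the notebook:

# Hypothetical sketch of build_dataset from utils.py (all details below
# are assumptions, including the default standard deviation).
def build_dataset(n_patterns, centroids1, centroids2, std_deviation = 0.2):
    data = zeros([n_patterns, 3])
    half = n_patterns//2
    for i in xrange(n_patterns):
        if i < half:
            centroids, label = centroids2, 0   # not belonging to the class
        else:
            centroids, label = centroids1, 1   # belonging to the class
        c = array(centroids[i%len(centroids)])
        data[i,:2] = c + std_deviation*randn(2)   # noisy copy of a centroid
        data[i,2] = label                         # desired output
    return data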


In [3]:
#-------------------------------------------------
# Training


# Constants

# Number of input elements
n = 2        

# Learning rate
eta = 0.0001 

# number of training patterns
n_patterns = 2000

# Number of repetitions of 
# the pattern series
epochs = 30

# Number of timesteps
stime = n_patterns*epochs


# Variables

# we define a set of 20 angles
angles = linspace(-pi, pi,20)

# the first group of centroids is a set of points 
# lying on a circle of radius=2
centroids1 = [ [2*cos(x), 2*sin(x)] for x in angles ]

# the second group of centroids is a set of points 
# lying on a circle of radius=0.001
centroids2 = [ [0.001*cos(x), 0.001*sin(x)] for x in angles ]

# generate training data (function build_dataset in utils.py)
data = build_dataset(n_patterns, centroids1 = centroids1, centroids2=centroids2 )

# Each row of P is an input pattern
P = data[:,:2]

# Each element of o is the desired output 
# relative to an input pattern
o = data[:,2]

# Initialize weights
w = zeros(n+1)

# Initialize the weight history storage
dw = zeros([n+1,stime])

# Initialize the error history storage
squared_errors = zeros(epochs)

Let's plot the input points. Red points belong to the class to be learned, while blue ones do not belong to it. You can see how the blue points form a cloud that is surrounded by a ring of red points.


In [4]:
# limits
upper_bound = P.max(0) + 0.2*(P.max(0)-P.min(0))
lower_bound = P.min(0) - 0.2*(P.max(0)-P.min(0))


# Create the figure
fig = figure(figsize=(4,4))

scatter(*P[(n_patterns/2):,:].T, s = 20,  c = '#ff8888' )
scatter(*P[:(n_patterns/2),:].T, s = 20,  c = '#8888ff' )

xlim( [lower_bound[0], upper_bound[0]] )
ylim( [lower_bound[1], upper_bound[1]] )

show()


Spreading of the network during training

Here starts the core part, the iteration over timesteps. We divide the training phase into epochs; each epoch is a single presentation of the whole series of input patterns. The sum of squared errors will be grouped by epoch.
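
At each timestep the weights are updated with the standard perceptron learning rule, which is what the line w += eta*(o[k] - y)*x in the code below implements:

$$\mathbf{w} \leftarrow \mathbf{w} + \eta\,(o_k - y)\,\mathbf{x}_k$$

where $\mathbf{x}_k$ is the current input pattern with the bias element prepended, $o_k$ is its desired output and $y$ is the actual output of the network.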


In [5]:
# Create a list of pattern indices.
# We will reshuffle it at each 
# repetition of the series
pattern_indices = arange(n_patterns)

# counter of repetitions 
# of the series of patterns
epoch = -1

for t in xrange(stime) :
    
    # Reiterate the input pattern 
    # sequence through timesteps
    
    # Reshuffle at the end 
    # of the series
    if t%n_patterns == 0:
        shuffle(pattern_indices)
        epoch += 1
        
    # Current pattern 
    k = pattern_indices[t%n_patterns]
    
    # MAIN STEP CALCULATIONS
     
    # Bias-plus-input vector
    x = hstack([1, P[k]])
    
    # Weighted sum - !!dot product!!
    net = dot(w, x)
    
    # Activation
    y = step(net)
    
    # Learning
    w += eta*(o[k] - y)*x
       
    # Store current weights
    dw[:,t] = w
    
    # Current error
    squared_errors[epoch] += 0.5*(o[k] - y)**2

Plotting the results of training

We plot the final decision boundary together with the history of the squared errors through the epochs. As you can see below, the network cannot find a line that divides the points belonging to the class from those not belonging to it: the perceptron can only define a linear boundary, so no correct solution exists for this dataset. The background of the "Decision boundary" plot is all gray because the network keeps exploring all the possible inclinations of the boundary during training. You can also see in the error plot that the error curve does not converge to zero. The network cannot find a minimum in this case.
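
The boundary line plotted below is obtained by setting the weighted sum to zero:

$$w_0 + w_1\,p_1 + w_2\,p_2 = 0 \;\Longrightarrow\; p_2 = -\frac{w_1\,p_1 + w_0}{w_2}$$

which is exactly the expression evaluated in the code, both for the stored weight history and for the final weights.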


In [6]:
# Create the figure
fig = figure(figsize=(10,4))
ax = fig.add_subplot(121)
ax.set_title('Decision boundary')

# Choose the x-axis coords of the
# two points to plot the decision 
# boundary line
x1 = array([lower_bound[0],upper_bound[0]])

# Calculate the y-axis coords of the
# two points to plot the decision 
# boundary line as it changes 
for t in xrange(stime) :
    
    # Show every 10th timestep
    if t%10 == 0:
        
        if dw[2,t] != 0 :
            # Evaluate x2 based on current weights
            x2 = -(dw[1,t]*x1 + dw[0,t])/dw[2,t]
        
            # Plot the changes in the boundary line during learning
            ax.plot(x1,x2, c='#cccccc', linewidth = 1, zorder = 1)

# Evaluate x2 based on the final weights
x2 = -(w[1]*x1 + w[0])/w[2]

# Plot the learned boundary line
plot(x1,x2, c= '#000000', linewidth = 2, zorder = 1)

# Plot in red points belonging to the class
scatter(*P[(n_patterns/2):,:].T, s = 50,  c = '#ff8888', zorder = 2 )       
# Plot in blue points not belonging to the class 
scatter(*P[:(n_patterns/2),:].T, s = 50,  c = '#8888ff', zorder = 2 )

# Limits and labels of the plot
xlim( [lower_bound[0], upper_bound[0]] )
ylim( [lower_bound[1], upper_bound[1]] )
xlabel("$p_1$", size = 'xx-large')
ylabel("$p_2$", size = 'xx-large')

# Plot squared errors
ax = fig.add_subplot(122)
ax.set_title('Error')
ax.plot(squared_errors)

# Labels and ticks of the plot
xlabel("epochs", size = 'xx-large')
ylabel("SSE", size = 'xx-large')
xticks(range(epochs)) 

show()


Testing

Initializing data and parameters

We now create a new dataset to test the network by generating a cloud of random points all over the input space:


In [7]:
#-------------------------------------------------
# Test

# Number of test patterns
n_patterns = 50000

# Generating test data - we use a single repeated centroid
# so we have a single cloud of points spreading across
# the decision boundary line
test_centroid = lower_bound +(upper_bound-lower_bound)/2.0

# Generating test data - build_dataset function from utils.py. 
# We change the standard deviation 
data = build_dataset(n_patterns, 
                      centroids1 = [ test_centroid ],
                      centroids2 = [ test_centroid ],
                      std_deviation = 2.6 )

# Each row of P is a test pattern
P = data[:,:2]

y = zeros(n_patterns)

Classifying the input patterns

We feed each test pattern to the network and collect its answer.


In [8]:
# iterate tests
for t in xrange(n_patterns) :
    
    # Bias-plus-input vector
    x = hstack([1, P[t]])
    
    # Weighted sum - !!dot product!!
    net = dot(w, x)
    
    # Activation
    y[t] = step(net)
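
As a side note, since the weights are not updated during the test, the whole loop above can be replaced by a single vectorized computation. A minimal sketch, assuming the same pylab namespace and writing the threshold explicitly instead of relying on step being vectorized:

# Equivalent vectorized test (sketch): prepend a column of ones for the
# bias, compute all the weighted sums at once and apply the threshold.
X = hstack([ones([n_patterns, 1]), P])
y = 1.0*(dot(X, w) >= 0)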

Plotting the results of the test

We can plot all the test patterns, using the output of the network to color them. Red and blue dots correspond to patterns classified as belonging or not belonging to the class. You can see that the network divided the inputs into two groups with a linear separation. These two groups do not correspond at all to the desired division!


In [9]:
# Create the figure
fig = figure(figsize=(5,4))


title('Tests - average error = {}'.format(mean(squared_errors).round(4)))


# Show points
ax = scatter(*P.T, s = 2,  c = y, edgecolors='none', zorder = 2, cmap = cm.coolwarm )

#limits
xlim( [lower_bound[0], upper_bound[0]] )
ylim( [lower_bound[1], upper_bound[1]] )
xlabel("$p_1$", size = 'xx-large')
ylabel("$p_2$", size = 'xx-large')
show()


The next cell is just for styling


In [1]:
from IPython.core.display import HTML
def css_styling():
    styles = open("../style/ipybn.css", "r").read()
    return HTML(styles)
css_styling()

