This notebook demonstrates a fundamental motivation for representation learning in machine learning: the XOR function.
Learning the XOR function is impossible for any separating-hyperplane (linear) classifier, because the four points of the XOR truth table are not linearly separable, unless an alternative representation of the input is employed. Neural networks can be viewed as representation learners and can therefore learn the XOR function, provided the architecture is adequate.
In [ ]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
In [ ]:
# Build data set (XOR truth table)
X = np.array([[0,0],[0,1],[1,0],[1,1]])
y = np.array([0,1,1,0])
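As a quick sanity check of the claim above, we can fit a purely linear classifier to this data set; no hyperplane separates it, so it cannot get all four points right. This sketch assumes scikit-learn is available (it is not used elsewhere in this notebook).
In [ ]:
# A linear model cannot represent XOR: at best 3 of the 4 points
# can be classified correctly, and logistic regression typically
# settles for chance-level accuracy here
from sklearn.linear_model import LogisticRegression
linear = LogisticRegression()
linear.fit(X, y)
print(linear.score(X, y))  # well below 1.0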
In [ ]:
# Build a network with two layers: the first with two neurons, the second with one
# Observe that each of these neurons individually looks just like a logistic regression model
from keras.models import Sequential
from keras.layers import Dense, Activation
model = Sequential()
model.add(Dense(units=2, input_dim=2, kernel_initializer='random_uniform'))
model.add(Activation('sigmoid'))
model.add(Dense(units=1))
model.add(Activation('sigmoid'))
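Once the layers are in place, Keras can print a summary of the architecture, which is a convenient way to check the parameter counts before training.
In [ ]:
# The first Dense layer has 2*2 weights + 2 biases = 6 parameters,
# the second has 2*1 weights + 1 bias = 3, for 9 in total
model.summary()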
In [ ]:
# Keras models need to be compiled once they have been defined
# Here we determine the loss function, the optimization algorithm and the
# metrics to be computed during optimization
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
In [ ]:
# This call runs the optimization algorithm for as many passes over the data as the "epochs" argument specifies.
# Generally, this network will have a hard time learning the XOR function, even though it is capable of
# representing it. This illustrates a common problem with neural networks: they can be difficult to train.
model.fit(X, y, epochs=10, verbose=1, batch_size=4)
In [ ]:
# This call applies the function learned by the network to the input data
model.predict(X)
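The values above are sigmoid outputs in (0, 1), not hard class labels. To read them as XOR predictions we can threshold at 0.5 (the usual convention for a binary cross-entropy model, assumed here rather than stated earlier in the notebook).
In [ ]:
# Turn the predicted probabilities into 0/1 labels
(model.predict(X) > 0.5).astype(int)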
In [ ]:
# We can print the parameters learned by the network
weights = model.get_weights()
weights
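get_weights returns one NumPy array per parameter tensor, alternating kernels and biases layer by layer. Printing their shapes makes the indexing used in the next cell easier to follow.
In [ ]:
# Expected shapes: (2, 2) kernel and (2,) bias for the hidden layer,
# then (2, 1) kernel and (1,) bias for the output layer
for w in weights:
    print(w.shape)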
In [ ]:
# We can also modify them by hand.
# By formalizing the problem we can derive a set of inequalities that lead to suitable weights (try it at home!)
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Given the first-layer weights below and an output weight of -1, these ratios
# bound the range in which we can look for the remaining output weight:
# it must lie between the ratios for a=1 and a=2 (0.85 does)
for a in range(3):
    print(sigmoid(a) / sigmoid(2 * a))
# These values are correct for the XOR problem.
# weights[0] is the first layer's 2x2 kernel, weights[1] its two biases,
# weights[2] is the second layer's 2x1 kernel, weights[3] its bias.
weights[0][0][0] = 1
weights[0][0][1] = 2
weights[0][1][0] = 1
weights[0][1][1] = 2
weights[1][0] = 0
weights[1][1] = 0
weights[2][0] = -1
weights[2][1] = 0.85
weights[3] = np.array([0.])
model.set_weights(weights)
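We can verify that the hand-picked weights implement XOR by running the forward pass again and thresholding at 0.5 as before; the result should match the truth table [0, 1, 1, 0].
In [ ]:
# With the weights set above the raw sigmoid outputs hover near 0.5,
# but they fall on the correct side of the threshold for every input
(model.predict(X) > 0.5).astype(int)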