This notebook demonstrates a fundamental motivation for representation learning in machine learning: the XOR function.
Learning the XOR function is impossible for any separating-hyperplane (linear) classifier, because the four points of the XOR truth table are not linearly separable, unless an alternative representation of the input is employed. Neural networks can be viewed as representation learners and can therefore learn the XOR function, provided the architecture is adequate.
In [ ]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
In [ ]:
# Build data set (XOR truth table)
X = np.array([[0,0],[0,1],[1,0],[1,1]])
y = np.array([0,1,1,0])
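As a quick sanity check of the claim above, we can fit a purely linear classifier to this data set; no hyperplane separates it, so it cannot get all four points right. This sketch assumes scikit-learn is available (it is not used elsewhere in this notebook).
In [ ]:
# A linear model cannot represent XOR: at best 3 of the 4 points
# can be classified correctly, and logistic regression typically
# settles for chance-level accuracy here
from sklearn.linear_model import LogisticRegression
linear = LogisticRegression()
linear.fit(X, y)
print(linear.score(X, y))  # well below 1.0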
In [ ]:
# Build a network with two layers: the first with two neurons, the second with one
# Observe that each of these neurons individually looks just like a logistic regression model
from keras.models import Sequential
from keras.layers import Dense, Activation
model = Sequential()
model.add(Dense(units=2, input_dim=2, kernel_initializer='random_uniform'))
model.add(Activation('sigmoid'))
model.add(Dense(units=1))
model.add(Activation('sigmoid'))
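Once the layers are in place, Keras can print a summary of the architecture, which is a convenient way to check the parameter counts before training.
In [ ]:
# The first Dense layer has 2*2 weights + 2 biases = 6 parameters,
# the second has 2*1 weights + 1 bias = 3, for 9 in total
model.summary()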
In [ ]:
# Keras models need to be compiled once they have been defined
# Here we determine the loss function, the optimization algorithm and the
# metrics to be computed during optimization
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
In [ ]:
# This call runs the optimization algorithm for as many passes over the data as the "epochs" argument specifies.
# Generally, this network will have a hard time learning the XOR function, even though it is capable of
# representing it. This illustrates a common problem with neural networks: they can be difficult to train.
model.fit(X, y, epochs=10, verbose=1, batch_size=4)
In [ ]:
# This call applies the function learned by the network to the input data
model.predict(X)
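The values above are sigmoid outputs in (0, 1), not hard class labels. To read them as XOR predictions we can threshold at 0.5 (the usual convention for a binary cross-entropy model, assumed here rather than stated earlier in the notebook).
In [ ]:
# Turn the predicted probabilities into 0/1 labels
(model.predict(X) > 0.5).astype(int)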
In [ ]:
# We can print the parameters learned by the network
weights = model.get_weights()
weights
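get_weights returns one NumPy array per parameter tensor, alternating kernels and biases layer by layer. Printing their shapes makes the indexing used in the next cell easier to follow.
In [ ]:
# Expected shapes: (2, 2) kernel and (2,) bias for the hidden layer,
# then (2, 1) kernel and (1,) bias for the output layer
for w in weights:
    print(w.shape)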
In [ ]:
# We can also modify them by hand.
# By formalizing the problem we can derive a set of inequalities that lead to suitable weights (try it at home!)
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Given the first-layer weights below and an output weight of -1, these ratios
# bound the range in which we can look for the remaining output weight:
# it must lie between the ratios for a=1 and a=2 (0.85 does)
for a in range(3):
    print(sigmoid(a) / sigmoid(2 * a))
# These values are correct for the XOR problem.
# weights[0] is the first layer's 2x2 kernel, weights[1] its two biases,
# weights[2] is the second layer's 2x1 kernel, weights[3] its bias.
weights[0][0][0] = 1
weights[0][0][1] = 2
weights[0][1][0] = 1
weights[0][1][1] = 2
weights[1][0] = 0
weights[1][1] = 0
weights[2][0] = -1
weights[2][1] = 0.85
weights[3] = np.array([0.])
model.set_weights(weights)
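We can verify that the hand-picked weights implement XOR by running the forward pass again and thresholding at 0.5 as before; the result should match the truth table [0, 1, 1, 0].
In [ ]:
# With the weights set above the raw sigmoid outputs hover near 0.5,
# but they fall on the correct side of the threshold for every input
(model.predict(X) > 0.5).astype(int)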