Deep Learning - Chapter 6

Theory

Output activation
The function applied to the nodes in the output layer (a short code sketch follows the list below)

  • Binary output (yes/no): Sigmoid
  • Multi-class output (red, green, blue): Softmax
  • Gaussian output: Linear
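
As a rough sketch (plain numpy, independent of Keras; the function names are my own):

import numpy as np

def sigmoid(z):
    # Squashes any real number into (0, 1); read as P(y = 1)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Turns a vector of scores into a probability distribution over classes
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

# The linear output activation is simply the identity: f(z) = z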

Activation functions
The function applied to the nodes in the hidden layers. A full list is available at https://en.wikipedia.org/wiki/Activation_function. The most commonly used (sketched in code after the list):

  • sigmoid
  • tanh
  • Rectified Linear Unit (ReLU)
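
Sigmoid is sketched above; tanh and ReLU are one-liners in numpy (again, just an illustrative sketch):

import numpy as np

def tanh(z):
    # Like sigmoid, but squashes into (-1, 1), keeping activations zero-centered
    return np.tanh(z)

def relu(z):
    # max(0, z): zero for negative inputs, identity for positive inputs
    return np.maximum(0.0, z)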

Cost functions
The optimization goal, i.e. the quantity that training minimizes. The most commonly used (sketched in code after the list):

  • Mean-squared error: continuous output
  • Cross-entropy: binary / categorical output
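
A minimal numpy sketch of both losses, where y holds the true targets and p the model's predictions (names are my own):

import numpy as np

def mean_squared_error(y, p):
    # Average squared distance between prediction and target
    return np.mean((y - p) ** 2)

def binary_cross_entropy(y, p, eps=1e-12):
    # Clip predictions away from 0 and 1 so the logarithms stay finite
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))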

Back-propagation
Update each weight of the neural network by the negative derivative of the error with respect to that weight, multiplied by the learning rate (see the sketch below)
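
The resulting update rule is plain gradient descent. A minimal sketch (the numbers are made up for illustration):

import numpy as np

def sgd_step(weights, gradients, learning_rate=0.1):
    # Move each weight against its error gradient, scaled by the learning rate
    return weights - learning_rate * gradients

w = np.array([0.5, -0.3])      # current weights
grad = np.array([0.2, -0.1])   # dE/dw, as computed by back-propagation
w = sgd_step(w, grad)          # -> [0.48, -0.29]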

Practical

This section shows how to use Keras to train a neural network on the XOR function. Keras is a high-level neural network API that builds on TensorFlow or Theano. Be aware that both TensorFlow and Theano WILL use your GPU if available, so your computer can run quite warm or the battery can drain quickly.

First off, let's verify that a linear model cannot learn the XOR function.


In [26]:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from sklearn import linear_model
import keras.callbacks

In [27]:
# Truth table
X = np.array([[0,0],[0,1],[1,0],[1,1]])
y = np.array([[0],[1],[1],[0]])

In [28]:
# Fit a linear model (predictions in the next cell)
reg = linear_model.LinearRegression()
reg.fit(X, y)


Out[28]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

In [29]:
# XOR is not linearly separable, so the best least-squares fit predicts the mean label (0.5) everywhere
reg.predict(X)


Out[29]:
array([[ 0.5],
       [ 0.5],
       [ 0.5],
       [ 0.5]])

In [45]:
# This is the Neural Network using the Keras language

# Just instantiate a new object
model = Sequential() 

# The first hidden layer: 2 hidden nodes (each with a bias term). The input dimension is two
model.add(Dense(2, input_dim=2, activation='tanh', kernel_initializer='glorot_uniform')) 

# This is the final output layer. There is one binary output
model.add(Dense(1, activation='sigmoid'))

# The model is optimized with stochastic gradient descent; the goal is to minimize the binary cross-entropy
model.compile(loss='binary_crossentropy', optimizer='sgd')
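
If you want to sanity-check the architecture before training, Keras can print a per-layer summary; with the model above it should report 2*2 + 2 = 6 parameters for the hidden layer and 2*1 + 1 = 3 for the output layer:

model.summary()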

In [46]:
# Fit the model
model.fit(X, y, batch_size=1, epochs=3000, verbose=0)
print(model.predict_classes(X))


4/4 [==============================] - 0s
[[0]
 [1]
 [1]
 [0]]
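
Note: predict_classes was removed in later Keras releases. If the call above fails on your installation, thresholding the raw sigmoid outputs at 0.5 gives the same classes:

print((model.predict(X) > 0.5).astype(int))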