Introduction to Neural Networks:

Authors:

Dr. Rahul Remanan

Dr. Jesse Kanter

CEO and Chief Imagination Officer

Moad Computer

This is a hands-on workshop notebook on deep-learning using python 3. In this notebook, we will learn how to implement a neural network from scratch using numpy. Once we have implemented this network, we will visualize the predictions it generates and compare its classification boundaries with those of a logistic regression model. This workshop aims to provide an intuitive understanding of neural networks.

In practical code development, there is seldom a use case for building a neural network from scratch. Real-world neural networks are typically implemented using a deep-learning framework such as tensorflow. But building a neural network with very minimal dependencies helps one gain an understanding of how neural networks work, and this understanding is essential to designing effective neural network models. Towards the end of the session, we will use the tensorflow deep-learning library to build a neural network, to illustrate the advantages of building a neural network using a deep-learning framework.

Architecture of the basic XOR gate neural network:

XOR gate problem and neural networks -- Background:

The XOR gate is an interesting problem in neural networks. Marvin Minsky and Seymour Papert, in their book 'Perceptrons' (1969), showed that the XOR gate cannot be solved by a two-layer perceptron, since the solution to the XOR gate is not linearly separable. This conclusion led to a significantly reduced interest in Frank Rosenblatt's perceptrons as a mechanism for building artificial intelligence applications.

Some of the earliest work in AI used networks or circuits of connected units to simulate intelligent behavior; this kind of work is called "connectionism". After the publication of 'Perceptrons', interest in connectionism declined significantly, until it was renewed by the works of John Hopfield and David Rumelhart.

Minsky made the assertions in 'Perceptrons' in spite of his thorough knowledge that powerful perceptrons have multiple layers and that Rosenblatt's basic feed-forward perceptrons have three layers. In the book, to deceive unsuspecting readers, Minsky defined a perceptron as a two-layer machine that can handle only linearly separable problems and, for example, cannot solve the exclusive-OR problem. The Minsky-Papert collaboration is now believed by some knowledgeable scientists to have been a political maneuver and a hatchet job for contract funding. This strong, one-dimensional and misplaced criticism of perceptrons essentially halted work on practical, powerful artificial intelligence systems based on neural networks for nearly a decade.

Part 1 of this notebook explains how to build a very basic neural network in numpy. This perceptron-like neural network is trained to predict the output of an XOR gate.

XOR gate table:

  input_1 | input_2 | output
  --------|---------|-------
     0    |    0    |   0
     0    |    1    |   1
     1    |    0    |   1
     1    |    1    |   0

Image below shows an example of a linearly separable dataset:

Image below shows the XOR gate problem, where no linear separation exists:


In [0]:
# XOR logic: the output is 1 exactly when the two inputs differ
for input_1, input_2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
  if input_1 == input_2:
    output = 0
  else:
    output = 1
  print (input_1, input_2, output)

Part 01a -- Simple neural network as XOR gate using sigmoid activation function:

The XOR gate neural network implementation uses a two-layer perceptron with a sigmoid activation function. This portion of the notebook is a modified fork of the neural network implementation in numpy by Milo Spencer-Harper.

Import the dependent libraries -- numpy and matplotlib:


In [0]:
import numpy as np
import matplotlib.pyplot as plt

Create the sigmoid function:

  • The sigmoid function takes two input arguments: x and a boolean argument called 'derivative'.
  • When the boolean argument is set to True, the sigmoid function calculates the derivative of x.
  • The derivative of x is required when calculating the error and performing back-propagation.
  • The sigmoid function runs in every single neuron.
  • The sigmoid function feeds the data forward by converting the numeric matrices to probabilities.

To implement the logistic sigmoid function using numpy, we use the mathematical formula:

$\sigma(x) = \frac{1}{1+e^{-x}}$

Backpropagation:

If the sigmoid function is expressed as follows:

$\sigma(x) = \frac{1}{1+e^{-x}}$

Then, the first derivative of this function can be expressed as:

$\sigma'(x) = \frac{e^{-x}}{(1+e^{-x})^{2}} = \sigma(x)\,(1-\sigma(x))$
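
As a quick numerical sanity check (a minimal sketch; 'sigmoid_fn' is a throwaway helper, distinct from the 'sigmoid' function implemented below), the analytic derivative $\sigma(x)(1-\sigma(x))$ can be compared against a central finite-difference approximation:

In [0]:
import numpy as np

def sigmoid_fn(z):
  return 1/(1+np.exp(-z))

x = -1.2
eps = 1e-6
analytic = sigmoid_fn(x)*(1-sigmoid_fn(x))
numeric = (sigmoid_fn(x+eps)-sigmoid_fn(x-eps))/(2*eps)
print (analytic, numeric) # both are approximately 0.1779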

Forwardpropagation and backpropagation functions using sigmoid activation:

Implementing the sigmoid function using the math library in python:


In [0]:
import math
x = -1.2
y = 1/(1+math.exp(-x))
print (y)

In [0]:
import numpy as np
y = 1/(1+np.exp(-x))
print (y)

In [0]:
def sigmoid(x, derivative=False):
    """
    Parameters:
      x: input
      derivative: boolean to specify if the derivative of the function should be computed
    """
    if derivative:
        # Note: assumes x is already the sigmoid output s, so this returns s*(1-s)
        return (x*(1-x))
    return (1/(1+np.exp(-x)))

In [0]:
sigmoid(-1.2, derivative=False)

In [0]:
x = -1.2
y_d = (1/(1+np.exp(-x)))*(1-(1/(1+np.exp(-x))))

In [0]:
y_d

In [0]:
sigmoid(0.23147521650098238, derivative=True)

Plotting sigmoid activation function:


In [0]:
xmin = -10
xmax = 10
ymin = -0.1
ymax = 1.1
step_size = 0.01

x = list(np.arange(xmin, xmax, step_size))
y = []
for i in x:
  y_i = sigmoid(i)
  y.append(y_i)
  

axis = [xmin, xmax, ymin, ymax]
plt.axhline(y=0.5, color='C2', alpha=0.5)
plt.axvline(x=0, color='C2', alpha=0.5)
plt.axis(axis)
plt.plot(x, y, linewidth=2.0)

Create an input data matrix as a numpy array:

  • A matrix of shape: number of samples x number of features (here 4 x 2).

In [0]:
import numpy as np

In [0]:
x = np.asarray([[0,0],
                [1,1],
                [1,0],
                [0,1]])

In [0]:
print (x.shape)

In [0]:
x.shape[1]

In [0]:
x.shape[0]

In [0]:
x_ = (1, 2, 3, 4)

In [0]:
len(x_)

In [0]:
for i in range(len(x_)):
    print ("This is the {} element in the tuple".format(i))
    print ("The value is: {}".format(x_[i]))

Define the output data matrix as numpy array:


In [0]:
y = np.asarray([[0],
                [0],
                [1],
                [1]])

In [0]:
y.shape

Create a random number seed:

  • Random number seeding is useful for producing reproducible results.

In [0]:
seed = 1
np.random.seed(seed)

Create a synapse matrix:

  • A synapse holds the weights matrix connecting two layers of the network.
  • For the first synapse, a weights matrix of shape: input_shape_1 x hidden_layer_size is created.
  • For the second synapse, a weights matrix of shape: hidden_layer_size x output_dim is created.
  • This cell also introduces the first hyper-parameter in neural network tuning, called 'bias_val'; with bias_val = 1, the randomly initialized weights start in the range [-1, 1).

In [0]:
bias_val = 1

output_dim = 1

input_shape_1 = x.shape[1]
input_shape_2 = x.shape[0]

hidden_layer_size = 5

synapse_0 = 2*np.random.random((input_shape_1, hidden_layer_size)) - bias_val
synapse_1 = 2*np.random.random((hidden_layer_size, output_dim)) - bias_val

loss_col = []

In [0]:
print (synapse_0.shape)

In [0]:
synapse_0

In [0]:
print (synapse_1.shape)

Implement a single forward pass of the XOR input table

Create the input layer


In [0]:
layer_0 = x

Create the activation function for layer 1


In [0]:
bias_val = 1
layer_1 = sigmoid(np.dot(layer_0, synapse_0) - bias_val)

In [0]:
layer_1.shape

In [0]:
layer_1

Create an activation function for layer 2


In [0]:
layer_2 = sigmoid(np.dot(layer_1, synapse_1) - bias_val)

In [0]:
layer_2.shape

In [0]:
layer_2

Part 01b -- Backpropagation

This backpropagation example implements the derivative of the squared-error loss: $\frac{\partial E}{\partial y} = \frac{\partial}{\partial y}\left(\frac{1}{2}(t-y)^{2}\right) = y-t$ and computes each layer delta using: $\Delta w_{ij} = -\eta \frac{\partial E}{\partial w_{ij}}$

In this implementation, learning rate ($\eta$) = 1


Implement a single backprop pass


In [0]:
outputLoss_derivative = (y - layer_2) # target minus prediction, matching the additive weight updates below

In [0]:
outputLoss_derivative

In [0]:
layer_2_delta = (outputLoss_derivative*sigmoid(layer_2,derivative=True))

In [0]:
layer_2_delta

In [0]:
layer_1_error = (layer_2_delta.dot(synapse_1.T))

In [0]:
layer_1_error

In [0]:
layer_1_delta=layer_1_error*sigmoid(layer_1,derivative=True)

In [0]:
layer_1_delta

Updating the weights/synapses of the neural network


In [0]:
synapse_1 += layer_1.T.dot(layer_2_delta)

In [0]:
synapse_1.shape

In [0]:
synapse_0 += layer_0.T.dot(layer_1_delta)

In [0]:
synapse_0

Training the simple XOR gate neural network:

  • Note: There is no function that defines a neuron! In practice, a neuron is just an abstract concept for understanding the probability function.
  • The data is continuously fed through the neural network.
  • The weights of the network are updated through backpropagation.
  • During training, the model becomes better and better at predicting the output values.
  • The layers are just matrix multiplications that apply the sigmoid function to the product of the corresponding layer and its synapse matrix.
  • The backpropagation portion of the training is the machine-learning portion of this code.
  • The backpropagation function reduces the prediction error during each training step.
  • Synapses and weights are synonymous here.

In [0]:
training_steps = 50000
update_freq = 10

input_data = x
output_data = y

bias_val_1 = 1e-2
bias_val_2 = 10

learning_rate = 0.1

for t in range(training_steps):
  # Creating the layers of the neural network:
  layer_0 = input_data
  layer_1 = sigmoid(np.dot(layer_0, synapse_0)+bias_val_1)
  layer_2 = sigmoid(np.dot(layer_1, synapse_1)+bias_val_2)
  
  
  # Backpropagation:
  outputLoss_derivative = output_data - layer_2
  loss_col.append(np.mean(np.abs(outputLoss_derivative)))
  if ((t*update_freq) % training_steps == 0):
    print ('Training step :' + str(t))
    print ('Prediction error during training :' + str(np.mean(np.abs(outputLoss_derivative))))
    
  # Layer-wise delta functions:
  layer_2_delta = outputLoss_derivative*sigmoid(layer_2, derivative = True)
  layer_1_error = layer_2_delta.dot(synapse_1.T) # Matrix multiplication of the layer 2 delta with the transpose of the second synapse matrix.
  layer_1_delta = layer_1_error*sigmoid(layer_1, derivative = True)
  
  # Updating synapses or weights, applying the learning rate once per update:
  synapse_1 += learning_rate*layer_1.T.dot(layer_2_delta)
  synapse_0 += learning_rate*layer_0.T.dot(layer_1_delta)
  
print ('Training completed ...')
print ('Predictions :' + str (layer_2))

In [0]:
plt.plot(loss_col)
plt.show()

delete_model = True

if delete_model:
  # Clean up intermediate objects before the next part
  for _name in ('loss_col', 'input_data', 'output_data', 'x', 'y',
                'layer_2', 'synapse_0', 'synapse_1'):
    try:
      del globals()[_name]
    except KeyError:
      pass
  import gc
  gc.collect()

Part 01c -- Neural network based XOR gate using the rectified linear units (ReLU) activation function:


In [0]:
import numpy as np
import matplotlib.pyplot as plt

Plotting the rectified linear units (ReLU) activation function:


In [0]:
def ReLU(x, derivative=False):
  if derivative:
    # Gradient of ReLU: 1 where the input is positive, 0 elsewhere
    return np.where(x > 0, 1.0, 0.0)
  return np.maximum(x, 0)

In [0]:
x = list(np.arange(-6.0, 6.0, 0.1))
y = []
for i in x:
  y_i = ReLU(i)
  y.append(y_i)
  
xmin = -6
xmax = 6
ymin = -0.5
ymax = 6
axis = [xmin, xmax, ymin, ymax]
plt.axhline(y=0.5, color='C2', alpha=0.5)
plt.axvline(x=0, color='C2', alpha=0.5)
plt.axis(axis)
plt.plot(x, y, linewidth=2.0)

Create input and output data


In [0]:
x = np.array([[0, 0], 
              [0, 1], 
              [1, 0], 
              [1, 1]])

y = np.array([[0], 
              [1], 
              [1], 
              [0]])

N is batch size (sample size); D_in is input dimension; H is hidden dimension; D_out is output dimension:


In [0]:
N, D_in, H, D_out = x.shape[0], x.shape[1], 30, 1 # 4 samples, 2 input features, 30 hidden units, 1 output

Randomly initialize weights:


In [0]:
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

In [0]:
learning_rate = 0.002
update_freq = 10
training_steps = 200

loss_col = []

ReLU as the activation function and squared error as the loss function:


In [0]:
for t in range(training_steps):
    # Forward pass: compute predicted y
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)  # using ReLU as the activation function
    y_pred = h_relu.dot(w2)

    # Compute and print loss
    loss = np.square(y_pred - y).sum() # squared error as the loss function
    loss_col.append(loss)
    if ((t*update_freq) % training_steps ==0):
      print ('Training step :' + str(t))
      print ('Loss function during training :' + str(loss))

    # Backprop to compute gradients of w1 and w2 with respect to loss
    grad_y_pred = 2.0 * (y_pred - y) # the last layer's error
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T) # the second layer's error 
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0  # the derivate of ReLU
    grad_w1 = x.T.dot(grad_h)

    # Update weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
    
print ('Training completed ...')
print ('Predictions :' + str (y_pred))

In [0]:
plt.plot(loss_col)
plt.show()

In [0]:
delete_model = True

if delete_model:
  # Clean up intermediate objects before the next part
  for _name in ('loss_col', 'input_data', 'output_data', 'x', 'y'):
    try:
      del globals()[_name]
    except KeyError:
      pass
  import gc
  gc.collect()

Part 02 -- Comparing a neural network with logistic regression, in the form of classification boundaries:

Importing dependent libraries:


In [0]:
import matplotlib.pyplot as plt # pip3 install matplotlib
import numpy as np # pip3 install numpy
import sklearn # pip3 install scikit-learn
import sklearn.datasets
import sklearn.linear_model
import matplotlib

Plotting the hyperbolic tangent (tanh) activation function:


In [0]:
def tanh(x, derivative=False):
    if derivative:
        # Note: assumes x is already the tanh output t, so this returns 1 - t**2
        return (1 - (x ** 2))
    return np.tanh(x)

In [0]:
x = list(np.arange(-6.0, 6.0, 0.1))
y = []
for i in x:
  y_i = tanh(i)
  y.append(y_i)
  
xmin = -6
xmax = 6
ymin = -1.1
ymax = 1.1
axis = [xmin, xmax, ymin, ymax]
plt.axhline(y=0, color='C2', alpha=0.5)
plt.axvline(x=0, color='C2', alpha=0.5)
plt.axis(axis)
plt.plot(x, y, linewidth=2.0)

Display plots inline and change default figure size:


In [0]:
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)

Generate a dataset and create a plot:


In [0]:
np.random.seed(0)
X, y = sklearn.datasets.make_moons(200, noise=0.20)
plt.scatter(X[:,0], X[:,1], s=40, c=y, cmap=plt.cm.Spectral)

Train the logistic regression classifier:

The classification problem can be summarized as creating a boundary between the red and the blue dots.


In [0]:
linear_classifier = sklearn.linear_model.LogisticRegressionCV()
linear_classifier.fit(X, y)

Visualize the logistic regression classifier output:


In [0]:
def plot_decision_boundary(prediction_function):
  # Setting minimum and maximum values for giving the plot function some padding
  x_min, x_max = X[:, 0].min() - .5, \
                 X[:, 0].max() + .5
  y_min, y_max = X[:, 1].min() - .5, \
                 X[:, 1].max() + .5
  h = 0.01
  # Generate a grid of points with distance h between them
  xx, yy = np.meshgrid(np.arange(x_min, x_max, h), \
                       np.arange(y_min, y_max, h))
  # Predict the function value for the whole grid
  Z = prediction_function(np.c_[xx.ravel(), yy.ravel()])
  Z = Z.reshape(xx.shape)
  # Plotting the contour and training examples
  plt.contourf(xx, yy, Z, cmap=plt.cm.get_cmap("Spectral"))
  plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.get_cmap("Spectral"))

Plotting the decision boundary:


In [0]:
plot_decision_boundary(lambda x: linear_classifier.predict(x))
plt.title("Logistic Regression")

Create a neural network:


In [0]:
num_examples = len(X) # training set size
nn_input_dim = 2 # input layer dimensionality
nn_output_dim = 2 # output layer dimensionality

Gradient descent parameters:


In [0]:
epsilon = 0.01 # learning rate for gradient descent
reg_lambda = 0.01 # regularization strength

Compute loss function on the dataset:

Calculating predictions using forward propagation


In [0]:
def loss_function(model):
  W1, b1, W2, b2 = model['W1'], \
                   model['b1'], \
                   model['W2'], \
                   model['b2']
  z1 = X.dot(W1) + b1
  a1 = np.tanh(z1)
  z2 = a1.dot(W2) + b2
  exp_scores = np.exp(z2)
  probabilities = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
  # Calculating the loss function:
  correct_logprobs = -np.log(probabilities[range(num_examples), y])
  data_loss = np.sum(correct_logprobs)
  # Adding the regularization term to the loss function
  data_loss += reg_lambda/2 * (np.sum(np.square(W1)) + np.sum(np.square(W2)))
  return 1./num_examples * data_loss

Function that predicts the output of either 0 or 1:


In [0]:
def predict(model, x):
    W1, b1, W2, b2 = model['W1'], \
                     model['b1'], \
                     model['W2'], \
                     model['b2']
    # Design a network with forward propagation
    z1 = x.dot(W1) + b1
    a1 = np.tanh(z1)
    z2 = a1.dot(W2) + b2
    exp_scores = np.exp(z2)
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    return np.argmax(probs, axis=1)

This function learns parameters for the neural network and returns the model:

  • nn_hdim: Number of nodes in the hidden layer
  • num_passes: Number of passes through the training data for gradient descent
  • print_loss: If True, print the loss every 1000 iterations

In [0]:
def build_model(nn_hdim, num_passes=20000, print_loss=False):
    
    # Initialize the parameters to random values. We need to learn these.
    np.random.seed(0)
    W1 = np.random.randn(nn_input_dim, nn_hdim) / np.sqrt(nn_input_dim)
    b1 = np.zeros((1, nn_hdim))
    W2 = np.random.randn(nn_hdim, nn_output_dim) / np.sqrt(nn_hdim)
    b2 = np.zeros((1, nn_output_dim))

    # This is what we return at the end
    model = {}
    
    # Gradient descent. For each batch...
    for i in range(0, num_passes):

        # Forward propagation
        z1 = X.dot(W1) + b1
        a1 = np.tanh(z1)
        z2 = a1.dot(W2) + b2
        exp_scores = np.exp(z2)
        probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

        # Backpropagation
        delta3 = probs
        delta3[range(num_examples), y] -= 1
        dW2 = (a1.T).dot(delta3)
        db2 = np.sum(delta3, axis=0, keepdims=True)
        delta2 = delta3.dot(W2.T) * (1 - np.power(a1, 2))
        dW1 = np.dot(X.T, delta2)
        db1 = np.sum(delta2, axis=0)

        # Add regularization terms (b1 and b2 don't have regularization terms)
        dW2 += reg_lambda * W2
        dW1 += reg_lambda * W1

        # Gradient descent parameter update
        W1 += -epsilon * dW1
        b1 += -epsilon * db1
        W2 += -epsilon * dW2
        b2 += -epsilon * db2
        
        # Assign new parameters to the model
        model = { 'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}
        
        # Optionally print the loss.
        # This is expensive because it uses the whole dataset, so we don't want to do it too often.
        if print_loss and i % 1000 == 0:
          print("Loss after iteration %i: %f" %(i, loss_function(model)))
    
    return model

Build a model with a 50-dimensional hidden layer:


In [0]:
model = build_model(50, print_loss=True)

Plot the decision boundary:


In [0]:
plot_decision_boundary(lambda x: predict(model, x))
plt.title("Decision Boundary for hidden layer size  50")

Visualizing the decision boundaries for hidden layers of varying sizes:


In [0]:
plt.figure(figsize=(16, 32))
hidden_layer_dimensions = [1, 2, 3, 4, 5, 20, 50]
for i, nn_hdim in enumerate(hidden_layer_dimensions):
    plt.subplot(5, 2, i+1)
    plt.title('Hidden Layer size %d' % nn_hdim)
    model = build_model(nn_hdim)
    plot_decision_boundary(lambda x: predict(model, x))
plt.show()

Part 03 -- Example illustrating the importance of learning rate in hyper-parameter tuning:

  • Learning rate is typically a decreasing function of time.
  • Two forms that are commonly used are (see the sketch below):
    • 1) a linear function of time
    • 2) a function that is inversely proportional to the time t
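
Below is an illustrative sketch of these two schedules; the initial learning rate, decay constant and step count are hypothetical values chosen only for demonstration:

In [0]:
initial_lr = 0.1    # hypothetical starting learning rate
decay = 1e-3        # hypothetical decay constant
total_steps = 10000 # hypothetical training length

def linear_decay(t):
  # 1) a linear function of time: decays from initial_lr to 0 at total_steps
  return initial_lr*(1 - t/total_steps)

def inverse_time_decay(t):
  # 2) inversely proportional to the time t
  return initial_lr/(1 + decay*t)

for t in [0, 1000, 5000, 10000]:
  print (t, linear_decay(t), inverse_time_decay(t))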

Create a noisier, more complex dataset:


In [0]:
np.random.seed(0)
X, y = sklearn.datasets.make_moons(20000, noise=0.5)
plt.scatter(X[:,0], X[:,1], s=40, c=y, cmap=plt.cm.Spectral)

In [0]:
linear_classifier = sklearn.linear_model.LogisticRegressionCV()
linear_classifier.fit(X, y)

In [0]:
plot_decision_boundary(lambda x: linear_classifier.predict(x))
plt.title("Logistic Regression")

In [0]:
num_examples = len(X) # training set size
nn_input_dim = 2 # input layer dimensionality
nn_output_dim = 2 # output layer dimensionality

In [0]:
epsilon = 0.01 # learning rate for gradient descent
reg_lambda = 0.01 # regularization strength

In [0]:
model = build_model(50, print_loss=True)

Plotting output of the model that failed to learn, given a set of hyper-parameters:


In [0]:
plot_decision_boundary(lambda x: predict(model, x))
plt.title("Decision Boundary for hidden layer size  50")

Adjusting the learning rate such that the neural network re-starts learning:


In [0]:
epsilon = 1e-6 # learning rate for gradient descent
reg_lambda = 0.01 # regularization strength

In [0]:
model = build_model(50, print_loss=True)

Plotting the decision boundary generated by the improved neural network model:


In [0]:
plot_decision_boundary(lambda x: predict(model, x))
plt.title("Decision Boundary for hidden layer size  50")

Part 04 -- Building a neural network using tensorflow:

  • A neural network that predicts the y value given an x value.
  • Implemented using tensorflow, an open-source deep-learning library.

Import dependent libraries:


In [0]:
import tensorflow as tf
import numpy as np

Create a synthetic dataset for training and generating predictions:


In [0]:
x_data = np.float32(np.random.rand(2,500))
y_data = np.dot([0.5, 0.7], x_data) + 0.6

  • Variable objects store tensors in tensorflow.
  • Tensorflow treats all input data as tensors.
  • Tensors are n-dimensional arrays.

Constructing a linear model:


In [0]:
bias = tf.Variable(tf.zeros([1]))
synapses = tf.Variable(tf.random_uniform([1, 2], -1, 1))

In [0]:
y = tf.matmul(synapses, x_data) + bias

Gradient descent optimizer:

  • Imagine a ball in a valley.
  • The goal of the optimizer is to move the ball to the lowest point in the valley (see the sketch below).
  • The loss function is reduced over the course of the training.
  • Mean squared error is used as the loss function.
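
Here is a minimal numeric sketch of the ball-in-a-valley analogy; the quadratic loss and the parameter values are illustrative assumptions, not part of the tensorflow model below:

In [0]:
# Illustrative only: minimize loss(w) = (w - 3)**2 with gradient descent
w = 10.0 # the ball starts high up the valley wall
lr = 0.1 # learning rate: the size of each downhill step
for step in range(100):
  grad = 2*(w - 3) # d(loss)/dw
  w -= lr*grad     # roll one step downhill
print (w) # approaches 3.0, the lowest point of the valley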

In [0]:
lr = 0.01

loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(lr)

Training function:

  • In tensorflow the computation is wrapped inside a graph.
  • Tensorflow makes it easier to visualize the training sessions.

In [0]:
train = optimizer.minimize(loss)

Initialize the variables for the computational graph:


In [0]:
init = tf.global_variables_initializer()

Launching the tensorflow computational graph:


In [0]:
sess = tf.Session()
sess.run(init)

Training the model:


In [0]:
training_steps = 60000

for step in range (0, training_steps):
  sess.run(train)
  if step % 1000 == 0:
    print ('Training step: ' + str(step) + ' weights: ' + str(sess.run(synapses)) + ' bias: ' + str(sess.run(bias)))

Part 05 -- Neural net XOR gate solver using Tensorflow and Keras


In [0]:
import keras
import numpy as np
import os

from keras import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

In [0]:
MODEL_PATH = './XOR_gate_keras_network.h5'

In [0]:
! wget https://github.com/rahulremanan/python_tutorial/raw/master/Fundamentals_of_deep-learning/weights/XOR_gate_keras_network.h5 -O XOR_gate_keras_network.h5

Create input and output data


In [0]:
x = np.array([[0, 0], 
              [0, 1], 
              [1, 0], 
              [1, 1]])

y = np.array([[0], 
              [1], 
              [1], 
              [0]])

Create a neural network using Keras Sequential API


In [0]:
model = Sequential()
model.add(Dense(5, activation="relu", 
                input_shape=(2,)))
model.add(Dense(5, activation="relu"))
model.add(Dense(1, activation="relu"))

Select optimizer


In [0]:
optimizer = keras.optimizers.SGD(lr=1e-4)

Compile keras model


In [0]:
model.compile(optimizer=optimizer, 
              loss="binary_crossentropy",
              metrics=['accuracy'])

Load model weights


In [0]:
if os.path.exists(MODEL_PATH):
  model.load_weights(MODEL_PATH)

Summarize keras model


In [0]:
model.summary()

Visualize model architecture


In [0]:
! apt-get install -y graphviz libgraphviz-dev && pip3 install pydot graphviz

In [0]:
from keras.utils import plot_model 
import pydot 
import graphviz # apt-get install -y graphviz libgraphviz-dev && pip3 install pydot graphviz 
from IPython.display import SVG 
from keras.utils.vis_utils import model_to_dot

In [0]:
output_dir = './'
plot_model(model, to_file= output_dir + '/model_summary_plot.png') 
SVG(model_to_dot(model).create(prog='dot', format='svg'))

Train model


In [0]:
model.fit(x, y, batch_size=4, epochs=1000)

Save model weights


In [0]:
model.save_weights(MODEL_PATH)

In [0]:
model.predict(x)

In [0]:
from google.colab import files
files.download(MODEL_PATH)

Reference implementations of common activation functions and their first derivatives:

In [0]:
import numpy as np

def sigmoid(x, derivative=False):
    # Convention: for the derivative, x is the *output* of the sigmoid, s = sigmoid(z)
    if derivative:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

def tanh(x, derivative=False):
    # Convention: for the derivative, x is the *output* of tanh
    if derivative:
        return 1 - (x ** 2)
    return np.tanh(x)

def relu(x, derivative=False):
    # Derivative taken with respect to the raw input: 1 where x > 0, else 0
    if derivative:
        return np.where(x > 0, 1.0, 0.0)
    return np.maximum(x, 0)

def arctan(x, derivative=False):
    # Convention: for the derivative, x is the *output* of arctan,
    # since d/dz arctan(z) = 1/(1 + z**2) = cos(arctan(z))**2
    if derivative:
        return np.cos(x) ** 2
    return np.arctan(x)

def step(x, derivative=False):
    # Heaviside step; its derivative is zero almost everywhere
    if derivative:
        return np.zeros_like(x, dtype=float)
    return np.where(x > 0, 1.0, 0.0)

def squash(x, derivative=False):
    # Softsign: x / (1 + |x|); derivative w.r.t. the raw input is 1/(1 + |x|)**2
    if derivative:
        return 1 / (1 + np.abs(x)) ** 2
    return x / (1 + np.abs(x))

def gaussian(x, derivative=False):
    # Gaussian: exp(-x**2); derivative w.r.t. the raw input is -2x exp(-x**2)
    if derivative:
        return -2 * x * np.exp(-x ** 2)
    return np.exp(-x ** 2)
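
A short usage check of these helpers (the input values are illustrative); note that sigmoid, tanh and arctan expect the activated output when derivative=True, while relu, step, squash and gaussian expect the raw input:

In [0]:
z = np.array([[-2.0, -0.5, 0.0, 0.5, 2.0]])
print (relu(z))                     # forward pass
print (relu(z, derivative=True))    # gradient mask: 1.0 where z > 0
s = sigmoid(z)
print (sigmoid(s, derivative=True)) # derivative computed from the sigmoid output s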

Concluding notes -- Frank Rosenblatt, connectionism and the Perceptron:

Image 1: Frank Rosenblatt working on his Mark 1 Perceptron at Cornell Aeronautical Laboratory in Buffalo, New York, circa 1960.

This notebook was created to coincide with the 90th birth anniversary of the pioneering psychologist and artificial intelligence researcher Frank Rosenblatt (born July 11, 1928; died July 11, 1971). He is known for his work on connectionism and for the remarkable Mark 1 Perceptron. This notebook aims to remember the promise, the controversy and the resurgence of connectionism and neural networks as a tool in artificial intelligence.

Here is a brief biography of Frank Rosenblatt (Via Wikipedia):

Frank Rosenblatt was born in New Rochelle, New York, as the son of Dr. Frank and Katherine Rosenblatt. After graduating from The Bronx High School of Science in 1946, he attended Cornell University, where he obtained his A.B. in 1950 and his Ph.D. in 1956.

He then went to Cornell Aeronautical Laboratory in Buffalo, New York, where he was successively a research psychologist, senior psychologist, and head of the cognitive systems section. This is also where he conducted the early work on perceptrons, which culminated in the development and hardware construction of the Mark I Perceptron in 1960. This was essentially the first computer that could learn new skills by trial and error, using a type of neural network that simulates human thought processes.

Rosenblatt’s research interests were exceptionally broad. In 1959 he went to Cornell’s Ithaca campus as director of the Cognitive Systems Research Program and also as a lecturer in the Psychology Department. In 1966 he joined the Section of Neurobiology and Behavior within the newly formed Division of Biological Sciences, as associate professor. Also in 1966, he became fascinated with the transfer of learned behavior from trained to naive rats by the injection of brain extracts, a subject on which he would publish extensively in later years.

In 1970 he became field representative for the Graduate Field of Neurobiology and Behavior, and in 1971 he shared the acting chairmanship of the Section of Neurobiology and Behavior. Frank Rosenblatt died in July 1971 on his 43rd birthday, in a boating accident in Chesapeake Bay.

Image 2: Mark 1 Perceptron at Smithsonian Institute, Washington DC