This is a hands-on workshop notebook on deep learning using Python 3. In this notebook, we will learn how to implement a neural network from scratch using NumPy. Once we have implemented this network, we will visualize the predictions generated by the neural network and compare them with those of a logistic regression model, in the form of classification boundaries. This workshop aims to provide an intuitive understanding of neural networks.
In practical code development, there is seldom a use case for building a neural network from scratch. Real-world neural networks are typically implemented using a deep-learning framework such as TensorFlow. But building a neural network with very minimal dependencies helps one gain an understanding of how neural networks work, and this understanding is essential for designing effective neural network models. Towards the end of the session, we will also use the TensorFlow deep-learning library to build a neural network, to illustrate the advantages of building neural networks with a deep-learning framework.
The XOR gate is an interesting problem in neural networks. Marvin Minsky and Seymour Papert, in their book 'Perceptrons' (1969), showed that the XOR gate cannot be solved by a two-layer perceptron, since XOR is not linearly separable. This conclusion led to a significantly reduced interest in Frank Rosenblatt's perceptrons as a mechanism for building artificial intelligence applications.
Some of the earliest work in AI used networks or circuits of connected units to simulate intelligent behavior; this kind of work is called "connectionism". After the publication of 'Perceptrons', interest in connectionism declined significantly, until it was renewed by the work of John Hopfield and David Rumelhart.
Minsky made the assertions in 'Perceptrons' in spite of his thorough knowledge that powerful perceptrons have multiple layers and that Rosenblatt's basic feed-forward perceptrons have three layers. In the book, misleading unsuspecting readers, Minsky defined a perceptron as a two-layer machine that can handle only linearly separable problems and, for example, cannot solve the exclusive-OR problem. Some knowledgeable scientists now regard the Minsky-Papert collaboration as a political maneuver and a hatchet job for contract funding. This strong, one-dimensional and misplaced criticism of perceptrons essentially halted work on practical, powerful neural-network-based artificial intelligence systems for nearly a decade.
Part 1 of this notebook explains how to build a very basic neural network in NumPy. This perceptron-like neural network is trained to predict the output of an XOR gate.
In [0]:
if input_1 == input_2:
output = 0
else:
output = 1
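As a quick sanity check, the rule above can be enumerated over all four input combinations. The snippet below is a small illustrative sketch of the XOR truth table, not part of the network itself.
In [0]:
# Enumerate the XOR truth table using the rule above (illustrative only)
for input_1 in (0, 1):
    for input_2 in (0, 1):
        output = 0 if input_1 == input_2 else 1
        print(input_1, input_2, '->', output)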
The XOR gate neural network implementation uses a two-layer perceptron with a sigmoid activation function. This portion of the notebook is a modified fork of the neural network implementation in NumPy by Milo Harper.
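Before walking through the implementation cell by cell, here is a minimal sketch of the forward pass that Part 1 builds up step by step. The layer sizes (2 inputs, 5 hidden units, 1 output) mirror the network defined later in this notebook; the random weights and the helper names used here are purely illustrative.
In [0]:
import numpy as np

def _sigmoid(z):
    return 1 / (1 + np.exp(-z))

x_demo = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # (4, 2) input patterns
w_hidden = 2 * np.random.random((2, 5)) - 1           # (2, 5) input-to-hidden weights
w_output = 2 * np.random.random((5, 1)) - 1           # (5, 1) hidden-to-output weights

hidden = _sigmoid(x_demo.dot(w_hidden))               # (4, 5) hidden activations
output = _sigmoid(hidden.dot(w_output))               # (4, 1) one prediction per input pattern
print(output.shape)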
In [0]:
import numpy as np
import matplotlib.pyplot as plt
To implement the logistic sigmoid function using NumPy, we use the mathematical formula: $\sigma(x) = \frac{1}{1 + e^{-x}}$
In [0]:
import math
x = -1.2
y = 1/(1+math.exp(-x))
print (y)
In [0]:
import numpy as np
y = 1/(1+np.exp(-x))
print (y)
In [0]:
def sigmoid(x, derivative=False):
    """
    Parameters:
    x: input (for the derivative, x is expected to be an already-activated sigmoid output)
    derivative: boolean to specify if the derivative of the function should be computed
    """
    if derivative:
        # Uses the identity sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))
In [0]:
sigmoid(-1.2, derivative=False)
In [0]:
x = -1.2
y_d = (1/(1+np.exp(-x)))*(1-(1/(1+np.exp(-x))))  # sigmoid(x) * (1 - sigmoid(x))
In [0]:
y_d
In [0]:
sigmoid(0.23147521650098238, derivative=True)
In [0]:
xmin= -10
xmax = 10
ymin = -0.1
ymax = 1.1
step_size = 0.01
x = list(np.arange(xmin, xmax, step_size))
y = []
for i in x:
y_i = sigmoid(i)
y.append(y_i)
axis = [xmin, xmax, ymin, ymax]
plt.axhline(y=0.5, color='C2', alpha=0.5)
plt.axvline(x=0, color='C2', alpha=0.5)
plt.axis(axis)
plt.plot(x, y, linewidth=2.0)
In [0]:
import numpy as np
In [0]:
x = np.asarray([[0,0],
[1,1],
[1,0],
[0,1]])
In [0]:
print (x.shape)
In [0]:
x.shape[1]
In [0]:
x.shape[0]
In [0]:
x_ = (1 , 2, 3, 4)
In [0]:
len(x_)
In [0]:
for i in range(len(x_)):
print ("This is the {} element in the tuple".format(i))
print ("The value is: {}".format(x_[i]))
In [0]:
y = np.asarray([[0],
[0],
[1],
[1]])
In [0]:
y.shape
In [0]:
seed = 1
np.random.seed(seed)
In [0]:
bias_val = 1
output_dim = 1
input_shape_1 = x.shape[1]
input_shape_2 = x.shape[0]
hidden_layer_size = 5
synapse_0 = 2*np.random.random((input_shape_1, hidden_layer_size)) - bias_val  # random weights in [-1, 1)
synapse_1 = 2*np.random.random((hidden_layer_size, output_dim)) - bias_val  # random weights in [-1, 1)
loss_col = []
In [0]:
print (synapse_0.shape)
In [0]:
synapse_0
In [0]:
print (synapse_1.shape)
In [0]:
layer_0 = x
In [0]:
bias_val = 1
layer_1 = sigmoid(np.dot(layer_0, synapse_0) - bias_val)
In [0]:
layer_1.shape
In [0]:
layer_1
In [0]:
layer_2 = sigmoid(np.dot(layer_1, synapse_1) - bias_val)
In [0]:
layer_2.shape
In [0]:
layer_2
This backpropagation example uses the squared-error loss, whose derivative with respect to the output is $\frac{\partial E}{\partial o_j} = \frac{\partial}{\partial o_j} \frac{1}{2}(t-y)^2 = y-t$, and updates the weights using: $\Delta w_{ij} = -\eta \frac{\partial E}{\partial w_{ij}}$
In this implementation, learning rate ($\eta$) = 1
Read more by following the backpropagation link above.
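Written out for this two-layer network, the layer deltas and weight updates follow the standard chain-rule expansion (the notation $z$, $a$, $W$ below is introduced only for this explanation and does not appear in the code):

$\delta^{(2)} = (y - t)\,\sigma'(z^{(2)}), \qquad \delta^{(1)} = \left(\delta^{(2)} W_2^{T}\right)\sigma'(z^{(1)})$

$\Delta W_2 = -\eta\, a^{(1)T} \delta^{(2)}, \qquad \Delta W_1 = -\eta\, x^{T} \delta^{(1)}$

Here $\sigma'$ is evaluated with the identity $\sigma'(z) = \sigma(z)\,(1-\sigma(z))$, which is why sigmoid(..., derivative=True) is applied to the already-activated layer outputs in the cells that follow.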
In [0]:
outputLoss_derivative = (layer_2 - y)  # dE/d(output) for the squared-error loss: prediction minus target
In [0]:
outputLoss_derivative
In [0]:
layer_2_delta = (outputLoss_derivative*sigmoid(layer_2,derivative=True))
In [0]:
layer_2_delta
In [0]:
layer_1_error = (layer_2_delta.dot(synapse_1.T))
In [0]:
layer_1_error
In [0]:
layer_1_delta=layer_1_error*sigmoid(layer_1,derivative=True)
In [0]:
layer_1_delta
In [0]:
synapse_1 += layer_1.T.dot(layer_2_delta)
In [0]:
synapse_1.shape
In [0]:
synapse_0 += layer_0.T.dot(layer_1_delta)
In [0]:
synapse_0
In [0]:
training_steps = 50000
update_freq = 10
input_data = x
output_data = y
bias_val_1 = 1e-2
bias_val_2 = 10
learning_rate = 0.1
for t in range(training_steps):
# Creating the layers of the neural network:
layer_0 = input_data
layer_1 = sigmoid(np.dot(layer_0, synapse_0)+bias_val_1)
layer_2 = sigmoid(np.dot(layer_1, synapse_1)+bias_val_2)
# Backpropagation:
outputLoss_derivative = output_data - layer_2
loss_col.append(np.mean(np.abs(outputLoss_derivative)))
if ((t*update_freq) % training_steps == 0):
print ('Training step :' + str(t))
print ('Prediction error during training :' + str(np.mean(np.abs(outputLoss_derivative))))
    # Layer-wise delta functions:
layer_2_delta = (learning_rate*outputLoss_derivative*sigmoid(layer_2, derivative = True))
layer_1_error = layer_2_delta.dot(synapse_1.T) # Matrix multiplication of the layer 2 delta with the transpose of the first synapse function.
layer_1_delta = (layer_1_error*learning_rate)*(sigmoid(layer_1, derivative = True))
# Updating synapses or weights:
synapse_1 += layer_1.T.dot(layer_2_delta)
synapse_0 += layer_0.T.dot(layer_1_delta)
del layer_0
del layer_1
print ('Training completed ...')
print ('Predictions :' + str (layer_2))
In [0]:
plt.plot(loss_col)
plt.show()
delete_model = True
if delete_model:
try:
del loss_col
except:
pass
try:
del input_data
except:
pass
try:
del output_data
except:
pass
try:
del x
except:
pass
try:
del y
except:
pass
try:
del layer_2
except:
pass
try:
del synapse_0
except:
pass
try:
del synapse_1
except:
pass
import gc
gc.collect()
In [0]:
import numpy as np
import matplotlib.pyplot as plt
In [0]:
def ReLU(x, derivative=False):
    if derivative:
        # Derivative of ReLU: 1 where the input is positive, 0 elsewhere
        return np.where(x > 0, 1.0, 0.0)
    return np.maximum(x, 0)
In [0]:
x = list(np.arange(-6.0, 6.0, 0.1))
y = []
for i in x:
y_i = ReLU(i)
y.append(y_i)
xmin= -6
xmax = 6
ymin = -0.5
ymax = 6.5
axis = [xmin, xmax, ymin, ymax]
plt.axhline(y=0.5, color='C2', alpha=0.5)
plt.axvline(x=0, color='C2', alpha=0.5)
plt.axis(axis)
plt.plot(x, y, linewidth=2.0)
In [0]:
x = np.array([[0, 0],
[0, 1],
[1, 0],
[1, 1]])
y = np.array([[0],
[1],
[1],
[0]])
In [0]:
N, D_in, H, D_out = x.shape[0], x.shape[1], 30, 1  # batch size, input dim, hidden dim, output dim
In [0]:
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)
In [0]:
learning_rate = 0.002
update_freq = 10
training_steps = 200
loss_col = []
In [0]:
for t in range(training_steps):
# Forward pass: compute predicted y
h = x.dot(w1)
    h_relu = np.maximum(h, 0) # using ReLU as the activation function
y_pred = h_relu.dot(w2)
# Compute and print loss
loss = np.square(y_pred - y).sum() # squared error as the loss function
loss_col.append(loss)
if ((t*update_freq) % training_steps ==0):
print ('Training step :' + str(t))
print ('Loss function during training :' + str(loss))
# Backprop to compute gradients of w1 and w2 with respect to loss
grad_y_pred = 2.0 * (y_pred - y) # the last layer's error
grad_w2 = h_relu.T.dot(grad_y_pred)
grad_h_relu = grad_y_pred.dot(w2.T) # the second layer's error
grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0 # the derivative of ReLU
grad_w1 = x.T.dot(grad_h)
# Update weights
w1 -= learning_rate * grad_w1
w2 -= learning_rate * grad_w2
print ('Training completed ...')
print ('Predictions :' + str (y_pred))
In [0]:
plt.plot(loss_col)
plt.show()
In [0]:
delete_model = True
if delete_model:
try:
del loss_col
except:
pass
try:
del input_data
except:
pass
try:
del output_data
except:
pass
try:
del x
except:
pass
try:
del y
except:
pass
import gc
gc.collect()
In [0]:
import matplotlib.pyplot as plt # pip3 install matplotlib
import numpy as np # pip3 install numpy
import sklearn # pip3 install scikit-learn
import sklearn.datasets
import sklearn.linear_model
import matplotlib
In [0]:
def tanh(x, derivative=False):
    if derivative:
        # Uses the identity tanh'(z) = 1 - tanh(z)^2; x here must already be a tanh output
        return 1 - x ** 2
    return np.tanh(x)
In [0]:
x = list(np.arange(-6.0, 6.0, 0.1))
y = []
for i in x:
y_i = tanh(i)
y.append(y_i)
xmin=-6
xmax = 6
ymin = -1.1
ymax = 1.1
axis = [xmin, xmax, ymin, ymax]
plt.axhline(y=0, color='C2', alpha=0.5)
plt.axvline(x=0, color='C2', alpha=0.5)
plt.axis(axis)
plt.plot(x, y, linewidth=2.0)
In [0]:
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (10.0, 8.0)
In [0]:
np.random.seed(0)
X, y = sklearn.datasets.make_moons(200, noise=0.20)
plt.scatter(X[:,0], X[:,1], s=40, c=y, cmap=plt.cm.Spectral)
In [0]:
linear_classifier = sklearn.linear_model.LogisticRegressionCV()
linear_classifier.fit(X, y)
In [0]:
def plot_decision_boundary(prediction_function):
# Setting minimum and maximum values for giving the plot function some padding
x_min, x_max = X[:, 0].min() - .5, \
X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, \
X[:, 1].max() + .5
h = 0.01
# Generate a grid of points with distance h between them
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), \
np.arange(y_min, y_max, h))
# Predict the function value for the whole grid
Z = prediction_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
# Plotting the contour and training examples
plt.contourf(xx, yy, Z, cmap=plt.cm.get_cmap("Spectral"))
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.get_cmap("Spectral"))
In [0]:
plot_decision_boundary(lambda x: linear_classifier.predict(x))
plt.title("Logistic Regression")
In [0]:
num_examples = len(X) # training set size
nn_input_dim = 2 # input layer dimensionality
nn_output_dim = 2 # output layer dimensionality
In [0]:
epsilon = 0.01 # learning rate for gradient descent
reg_lambda = 0.01 # regularization strength
In [0]:
def loss_function(model):
W1, b1, W2, b2 = model['W1'], \
model['b1'], \
model['W2'], \
model['b2']
z1 = X.dot(W1) + b1
a1 = np.tanh(z1)
z2 = a1.dot(W2) + b2
exp_scores = np.exp(z2)
probabilities = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
# Calculating the loss function:
    correct_logprobs = -np.log(probabilities[range(num_examples), y])
    data_loss = np.sum(correct_logprobs)
    # Adding the regularization term to the loss function
data_loss += reg_lambda/2 * (np.sum(np.square(W1)) + np.sum(np.square(W2)))
return 1./num_examples * data_loss
In [0]:
def predict(model, x):
W1, b1, W2, b2 = model['W1'], \
model['b1'], \
model['W2'], \
model['b2']
# Design a network with forward propagation
z1 = x.dot(W1) + b1
a1 = np.tanh(z1)
z2 = a1.dot(W2) + b2
exp_scores = np.exp(z2)
probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
return np.argmax(probs, axis=1)
In [0]:
def build_model(nn_hdim, num_passes=20000, print_loss=False):
# Initialize the parameters to random values. We need to learn these.
np.random.seed(0)
W1 = np.random.randn(nn_input_dim, nn_hdim) / np.sqrt(nn_input_dim)
b1 = np.zeros((1, nn_hdim))
W2 = np.random.randn(nn_hdim, nn_output_dim) / np.sqrt(nn_hdim)
b2 = np.zeros((1, nn_output_dim))
# This is what we return at the end
model = {}
# Gradient descent. For each batch...
for i in range(0, num_passes):
# Forward propagation
z1 = X.dot(W1) + b1
a1 = np.tanh(z1)
z2 = a1.dot(W2) + b2
exp_scores = np.exp(z2)
probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
# Backpropagation
delta3 = probs
delta3[range(num_examples), y] -= 1
dW2 = (a1.T).dot(delta3)
db2 = np.sum(delta3, axis=0, keepdims=True)
delta2 = delta3.dot(W2.T) * (1 - np.power(a1, 2))
dW1 = np.dot(X.T, delta2)
db1 = np.sum(delta2, axis=0)
# Add regularization terms (b1 and b2 don't have regularization terms)
dW2 += reg_lambda * W2
dW1 += reg_lambda * W1
# Gradient descent parameter update
W1 += -epsilon * dW1
b1 += -epsilon * db1
W2 += -epsilon * dW2
b2 += -epsilon * db2
# Assign new parameters to the model
model = { 'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}
# Optionally print the loss.
# This is expensive because it uses the whole dataset, so we don't want to do it too often.
if print_loss and i % 1000 == 0:
print("Loss after iteration %i: %f" %(i, loss_function(model)))
return model
In [0]:
model = build_model(50, print_loss=True)
In [0]:
plot_decision_boundary(lambda x: predict(model, x))
plt.title("Decision Boundary for hidden layer size 50")
In [0]:
plt.figure(figsize=(16, 32))
hidden_layer_dimensions = [1, 2, 3, 4, 5, 20, 50]
for i, nn_hdim in enumerate(hidden_layer_dimensions):
plt.subplot(5, 2, i+1)
plt.title('Hidden Layer size %d' % nn_hdim)
model = build_model(nn_hdim)
plot_decision_boundary(lambda x: predict(model, x))
plt.show()
In [0]:
np.random.seed(0)
X, y = sklearn.datasets.make_moons(20000, noise=0.5)
plt.scatter(X[:,0], X[:,1], s=40, c=y, cmap=plt.cm.Spectral)
In [0]:
linear_classifier = sklearn.linear_model.LogisticRegressionCV()
linear_classifier.fit(X, y)
In [0]:
plot_decision_boundary(lambda x: linear_classifier.predict(x))
plt.title("Logistic Regression")
In [0]:
num_examples = len(X) # training set size
nn_input_dim = 2 # input layer dimensionality
nn_output_dim = 2 # output layer dimensionality
In [0]:
epsilon = 0.01 # learning rate for gradient descent
reg_lambda = 0.01 # regularization strength
In [0]:
model = build_model(50, print_loss=True)
In [0]:
plot_decision_boundary(lambda x: predict(model, x))
plt.title("Decision Boundary for hidden layer size 50")
In [0]:
epsilon = 1e-6 # learning rate for gradient descent
reg_lambda = 0.01 # regularization strength
In [0]:
model = build_model(50, print_loss=True)
In [0]:
plot_decision_boundary(lambda x: predict(model, x))
plt.title("Decision Boundary for hidden layer size 50")
In [0]:
import tensorflow as tf
import numpy as np
In [0]:
x_data = np.float32(np.random.rand(2,500))
y_data = np.dot([0.5, 0.7], x_data) + 0.6
In [0]:
bias = tf.Variable(tf.zeros([1]))
synapses = tf.Variable(tf.random_uniform([1, 2], -1, 1))
In [0]:
y = tf.matmul(synapses, x_data) + bias
In [0]:
lr = 0.01
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(lr)
In [0]:
train = optimizer.minimize(loss)
In [0]:
init = tf.global_variables_initializer()
In [0]:
sess = tf.Session()
sess.run(init)
In [0]:
training_steps = 60000
for step in range (0, training_steps):
sess.run(train)
if step % 1000 == 0:
        print('Training step: ' + str(step) + ', weights: ' + str(sess.run(synapses)) + ', bias: ' + str(sess.run(bias)))
In [0]:
import keras
import numpy as np
import os
from keras import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
In [0]:
MODEL_PATH = './XOR_gate_keras_network.h5'
In [0]:
! wget https://github.com/rahulremanan/python_tutorial/raw/master/Fundamentals_of_deep-learning/weights/XOR_gate_keras_network.h5 -O XOR_gate_keras_network.h5
In [0]:
x = np.array([[0, 0],
[0, 1],
[1, 0],
[1, 1]])
y = np.array([[0],
[1],
[1],
[0]])
In [0]:
model = Sequential()
model.add(Dense(5, activation="relu",
input_shape=(2,)))
model.add(Dense(5, activation="relu"))
model.add(Dense(1, activation="relu"))
In [0]:
optimizer = keras.optimizers.SGD(lr=1e-4)
In [0]:
model.compile(optimizer=optimizer,
loss="binary_crossentropy",
metrics=['accuracy'])
In [0]:
if os.path.exists(MODEL_PATH):
model.load_weights(MODEL_PATH)
In [0]:
model.summary()
In [0]:
! apt-get install -y graphviz libgraphviz-dev && pip3 install pydot graphviz
In [0]:
from keras.utils import plot_model
import pydot
import graphviz # apt-get install -y graphviz libgraphviz-dev && pip3 install pydot graphviz
from IPython.display import SVG
from keras.utils.vis_utils import model_to_dot
In [0]:
output_dir = './'
plot_model(model, to_file= output_dir + '/model_summary_plot.png')
SVG(model_to_dot(model).create(prog='dot', format='svg'))
In [0]:
model.fit(x, y, batch_size=4,epochs=1000)
In [0]:
model.save_weights(MODEL_PATH)
In [0]:
model.predict(x)
In [0]:
from google.colab import files
files.download(MODEL_PATH)
In [0]:
import numpy as np
def sigmoid(x, derivative=False):
if (derivative == True):
return x * (1 - x)
return 1 / (1 + np.exp(-x))
def tanh(x, derivative=False):
if (derivative == True):
return (1 - (x ** 2))
return np.tanh(x)
def relu(x, derivative=False):
if (derivative == True):
for i in range(0, len(x)):
for k in range(len(x[i])):
if x[i][k] > 0:
x[i][k] = 1
else:
x[i][k] = 0
return x
for i in range(0, len(x)):
for k in range(0, len(x[i])):
if x[i][k] > 0:
pass # do nothing since it would be effectively replacing x with x
else:
x[i][k] = 0
return x
def arctan(x, derivative=False):
    if (derivative == True):
        # Uses the identity arctan'(z) = cos(arctan(z))^2; x here must already be an arctan output
        return (np.cos(x) ** 2)
    return np.arctan(x)
def step(x, derivative=False):
    if (derivative == True):
        # The step function has zero derivative everywhere it is defined
        for i in range(0, len(x)):
            for k in range(len(x[i])):
                x[i][k] = 0
        return x
for i in range(0, len(x)):
for k in range(0, len(x[i])):
if x[i][k] > 0:
x[i][k] = 1
else:
x[i][k] = 0
return x
def squash(x, derivative=False):
if (derivative == True):
for i in range(0, len(x)):
for k in range(0, len(x[i])):
if x[i][k] > 0:
x[i][k] = (x[i][k]) / (1 + x[i][k])
else:
x[i][k] = (x[i][k]) / (1 - x[i][k])
return x
for i in range(0, len(x)):
for k in range(0, len(x[i])):
x[i][k] = (x[i][k]) / (1 + abs(x[i][k]))
return x
def gaussian(x, derivative=False):
    if (derivative == True):
        for i in range(0, len(x)):
            for k in range(0, len(x[i])):
                x[i][k] = -2 * x[i][k] * np.exp(-x[i][k] ** 2)
        return x  # return the derivative instead of falling through
    for i in range(0, len(x)):
        for k in range(0, len(x[i])):
            x[i][k] = np.exp(-x[i][k] ** 2)
    return x
This notebook was created to coincide with the 90th birth anniversary of the pioneering psychologist and artificial intelligence researcher Frank Rosenblatt (born July 11, 1928; died July 11, 1971). He is known for his work on connectionism and for the remarkable Mark I Perceptron. This notebook aims to remember the promise, the controversy and the resurgence of connectionism and neural networks as a tool in artificial intelligence.
Here is a brief biography of Frank Rosenblatt (Via Wikipedia):
Frank Rosenblatt was born in New Rochelle, New York, the son of Dr. Frank and Katherine Rosenblatt. After graduating from The Bronx High School of Science in 1946, he attended Cornell University, where he obtained his A.B. in 1950 and his Ph.D. in 1956.
He then went to Cornell Aeronautical Laboratory in Buffalo, New York, where he was successively a research psychologist, senior psychologist, and head of the cognitive systems section. This is also where he conducted the early work on perceptrons, which culminated in the development and hardware construction of the Mark I Perceptron in 1960. This was essentially the first computer that could learn new skills by trial and error, using a type of neural network that simulates human thought processes.
Rosenblatt’s research interests were exceptionally broad. In 1959 he went to Cornell’s Ithaca campus as director of the Cognitive Systems Research Program and also as a lecturer in the Psychology Department. In 1966 he joined the Section of Neurobiology and Behavior within the newly formed Division of Biological Sciences, as associate professor. Also in 1966, he became fascinated with the transfer of learned behavior from trained to naive rats by the injection of brain extracts, a subject on which he would publish extensively in later years.
In 1970 he became field representative for the Graduate Field of Neurobiology and Behavior, and in 1971 he shared the acting chairmanship of the Section of Neurobiology and Behavior. Frank Rosenblatt died in July 1971 on his 43rd birthday, in a boating accident in Chesapeake Bay.