In [1]:
# numpy is the main library used by the book
import numpy as np
# cause graphs
%matplotlib inline
import matplotlib.pyplot as plt
This Jupyter notebook contains my notes while reading Grokking Deep Learning by Andrew Trask. The book is sold as "a very gentle introduction to Deep Learning" and covers intuition more than theory.
Deep Learning (DL) is an intersection of Machine Learning (ML) and Artificial Intelligence (AI). This book covers the science under the hood of the major DL frameworks so you can understand what's going on when you use popular frameworks like Torch, TensorFlow, Keras, etc.
The book covers everything past high-school maths needed to grok DL.
To really grok this, I should find a personal problem I'm interested in applying DL to. This could be anything where one dataset can be used to predict another. Trask (the author) used Twitter to predict the stock market, which took him from barely knowing programming to a job at a hedge fund in 18 months.
Some projects:
DL uses a subset of ML methods, primarily Artificial Neural Networks. ML is the subset of CS where machines learn tasks they weren't explicitly programmed for.
Two main types of ML models: parametric and non-parametric.
A parametric model has a fixed number of parameters to tune, while a non-parametric model's number of parameters can grow without bound.
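A toy sketch of that distinction (my own example, not from the book): a line always has exactly two parameters no matter how much data it sees, while a nearest-neighbour model's "memory" grows with the training set.

```python
# Parametric: a line y = w*x + b always has exactly 2 parameters.
parametric_params = {"w": 0.5, "b": 1.0}

def linear_predict(x):
    return parametric_params["w"] * x + parametric_params["b"]

# Non-parametric: 1-nearest-neighbour stores every training point.
train_x = [1.0, 2.0, 3.0, 10.0]
train_y = [2.0, 4.0, 6.0, 20.0]

def nn_predict(x):
    # find the stored point closest to x and return its label
    i = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[i]

print(linear_predict(4))  # uses 2 parameters
print(nn_predict(4))      # uses all 4 stored examples
```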
Supervised parametric DL models take in input data, process it with a fixed number of adjustable parameters, and make a prediction. The model learns the optimum parameters by comparing its predictions to the ground truth, then going back and tinkering with the parameters.
Unsupervised parametric models are similar in that they also use parameters, but they cluster the data into groups and come up with as many parameters as needed.
DL algos can be either supervised or unsupervised, and either parametric or non-parametric.
When using data to predict something, you need as many datapoints as you think the neural net needs to be accurate. For example, when trying to predict whether something is in an image, you probably need to feed the neural net the entire image.
Always present enough information to the network, where "enough information" is defined loosely as how much a human might need to make the same prediction.
In [2]:
# the network
weight = 0.1
def neural_network(input, weight):
prediction = input * weight
return prediction
# using the network to predict something
number_of_toes = [8.5, 9.5, 10, 9]
input = number_of_toes[0]
pred = neural_network(input,weight)
pred
Out[2]:
This is super simple - the input is multiplied by a weight and returned. The power of NNs lies in the weights and how we update them.
The above NN takes just one input, but practically everything has multiple inputs.
In [16]:
weights = [0.1, 0.2, 0]
def neural_network(input, weights):
    pred = w_sum(input, weights)
    return pred

toes  = [8.5, 9.5, 9.9, 9.0]
wlrec = [0.65, 0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]

def w_sum(a, b):
    """Takes in 2 vectors of the same length,
    multiplies the ith item of each list with the other
    and adds them all up.
    OR: we pair up each number in the two vectors, multiply
    every pair, then add up the results."""
    assert len(a) == len(b)
    return sum([a[i] * b[i] for i in range(len(a))])

for input in zip(toes, wlrec, nfans):
    print(neural_network(input, weights))
Before we move on, being able to manipulate vectors is a cornerstone technique for Deep Learning. Some functions to do vector math:
In [30]:
def ele_mul(vec_a, vec_b):
    assert len(vec_a) == len(vec_b)
    return [vec_a[i] * vec_b[i] for i in range(len(vec_a))]

def vector_sum(vec_a):
    return sum(vec_a)

def ele_add(vec_a, vec_b):
    assert len(vec_a) == len(vec_b)
    return [vec_a[i] + vec_b[i] for i in range(len(vec_a))]

def vector_average(vec_a):
    return sum(vec_a) / len(vec_a)

a = [2, 2, 4]
b = [3, 3, 9]
# to get the dot product of a and b
vector_sum(ele_mul(a, b))
Out[30]:
Testing the Wikipedia example of a dot product:
In [22]:
a = [1, 3, -5]
b = [4, -2, -1]
print('my func:', vector_sum(ele_mul(a, b)))
print('numpy dot func:', np.dot(a,b))
So we've just made a function to do a dot product.
Looking at the 3-input NN above, the most important thing is the weights:
weights = [0.1, 0.2, 0]
Crudely speaking, the weights tell us the importance of each of the 3 inputs. The third weight is zero, so the third input is essentially meaningless - the dot product of input and weights will always zero out its contribution.
The position of the weights is important - as NNs get more complex, be careful the weights don't get moved around.
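To see the zero weight in action, here is a small check (my own example): two inputs that differ only in the third entry produce identical weighted sums.

```python
# a zero weight means the matching input can never influence the sum
weights = [0.1, 0.2, 0]

def w_sum(a, b):
    assert len(a) == len(b)
    return sum(a[i] * b[i] for i in range(len(a)))

low_fans = w_sum([8.5, 0.65, 0.0], weights)
high_fans = w_sum([8.5, 0.65, 99.0], weights)
print(low_fans, high_fans)  # identical - the third input is ignored
```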
In [24]:
weights = np.array([0.1, 0.2, 0])
def neural_network(input, weights):
    pred = input.dot(weights)
    return pred
toes = np.array([8.5, 9.5, 9.9, 9.0])
wlrec = np.array([0.65, 0.8, 0.8, 0.9])
nfans = np.array([1.2, 1.3, 0.5, 1.0])
# input corresponds to every entry for the first game of the season
# to go through all the inputs
for input in zip(toes, wlrec, nfans):
    print(neural_network(np.array(input), weights))
Weights are a superpower. We can make many predictions from just a single input:
In [43]:
weights = [0.3, 0.2, 0.9]
def neural_network(input, weights):
    pred = input * np.array(weights)
    return pred

wlrec = [0.65, 0.9, 0.8, 0.8]

for input in wlrec:
    print(neural_network(input, weights))
Note that the 3 predictions are completely separate.
In [45]:
           #toes #win  #fans
weights = [[0.1, 0.1, -0.3],
           [0.1, 0.2, 0.0],
           [0.0, 1.3, 0.1]]

def neural_network(input, weights):
    pred = vect_mat_mul(input, weights)
    return pred

def vect_mat_mul(vect, matrix):
    out = []
    for m in matrix:
        out.append(np.dot(vect, m))
    return out

toes  = [8.5, 9.5, 9.9, 9.0]
wlrec = [0.65, 0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]

for input in zip(toes, wlrec, nfans):
    print(neural_network(input, weights))
In the above, each output has its own set of weights, which it dots with the input to calculate an output.
Each output node takes its own weighted sum of the input and makes a prediction.
The weights used here are a list of lists of weights - this is a matrix.
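As a quick sketch (my own example, not from the book), the same list of lists becomes a 2-D numpy array, and its shape confirms it is a 3x3 matrix; a matrix-vector product does all three row dot products at once:

```python
import numpy as np

# the list-of-lists weights as a 2-D array (a matrix)
weights = np.array([[0.1, 0.1, -0.3],
                    [0.1, 0.2, 0.0],
                    [0.0, 1.3, 0.1]])
input = np.array([8.5, 0.65, 1.2])

print(weights.shape)    # (3, 3): 3 output rows, 3 input columns
print(weights @ input)  # dot of the input with each row, all at once
```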
In [49]:
w_hid = [[0.1, 0.2, -0.1],
         [-0.1, 0.1, 0.9],
         [0.1, 0.4, 0.1]]

w_out = [[0.3, 1.1, -0.3],
         [-0.1, 0.2, 0.0],
         [0.0, 1.3, 0.1]]

weights = [w_hid, w_out]

def neural_net(input, weights):
    hid = np.dot(input, weights[0])
    # now we feed the output of the first layer into the next
    pred = np.dot(hid, weights[1])
    return pred

toes  = [8.5, 9.5, 9.9, 9.0]
wlrec = [0.65, 0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]

input = list(zip(toes, wlrec, nfans))[0]
neural_net(input, weights)
Out[49]:
In [57]:
a = np.array([0, 1, 2, 3])
b = np.array([4, 5, 6, 7])
c = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7]])
d = np.zeros((2, 4))      # 2x4 matrix of zeros
e = np.random.rand(2, 5)  # random 2x5 matrix

a * c
Out[57]:
There is a lot more to numpy - see TK.
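One numpy feature worth calling out from the `a * c` cell above is broadcasting; a minimal sketch (my own example):

```python
import numpy as np

# broadcasting: numpy stretches the smaller array across the larger
# one when the trailing shapes are compatible
a = np.array([0, 1, 2, 3])    # shape (4,)
c = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7]])  # shape (2, 4)

print(a * c)   # a is multiplied into each row of c
print(c + 10)  # a scalar broadcasts over every element
```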
To sum up chapter 3: neural networks perform weighted sums of their inputs.
Above, we made simple NNs which used weights to make predictions. We need to set the weight values better so the NN makes better predictions. We do this by:
First, we need to measure error. A popular technique is the mean squared error: (prediction - actual) ** 2. This amplifies big errors (greater than 1) and shrinks small errors (less than 1), which makes sense because we want the NN to pay more attention to big errors. We square the errors because we only care about how much we missed, not whether we missed high or low, and squaring stops positive and negative errors from cancelling each other out.
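A quick numeric check of that claim (my own example): squaring amplifies misses bigger than 1, shrinks misses smaller than 1, and erases the sign.

```python
# how squaring treats errors of different sizes and signs
for miss in [-2, -0.5, 0.5, 2]:
    print(miss, '->', miss ** 2)
# -2 and 2 both become 4 (amplified); -0.5 and 0.5 both become 0.25
```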
The simplest possible way of learning weights:
In [93]:
weight = 0.5
input = 0.5
target = 0.8
step_amount = 0.001

for iteration in range(1101):
    pred = input * weight
    err = (pred - target) ** 2
    #print(f"error {err} Preds {pred}")

    up_pred = input * (weight + step_amount)
    up_error = (up_pred - target) ** 2

    down_pred = input * (weight - step_amount)
    down_error = (down_pred - target) ** 2

    if up_error < down_error:
        weight += step_amount
    else:
        weight -= step_amount

print(f'final prediction: {input * weight} | weight: {weight}')
Now, this way is inefficient, and it can be impossible to hit the target exactly if, say, we had a large step value.
Another way is to calculate both the direction and the amount by which to update the weights from the error itself.
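A sanity check of the update rule used below (my own sketch): weight_delta = (pred - target) * input is proportional to the true slope of the error with respect to the weight - the exact derivative carries an extra factor of 2, which effectively gets folded into alpha. Comparing against a finite-difference estimate:

```python
input, weight, target = 0.5, 0.5, 0.8

def error(w):
    return (input * w - target) ** 2

# exact derivative of (input*w - target)**2 with respect to w
analytic = 2 * (input * weight - target) * input

# finite-difference estimate of the same slope
h = 1e-6
numeric = (error(weight + h) - error(weight - h)) / (2 * h)

print(analytic, numeric)  # the two slopes agree
```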
In [178]:
weight = 0.5
input = 0.5
target = 0.8
alpha = .01

predictions, errors = [], []

for i in range(1500):
    pred = input * weight
    error = (pred - target) ** 2
    # delta measures by how much the target was missed
    delta = pred - target
    weight_delta = delta * input
    weight = weight - weight_delta * alpha
    predictions.append(pred)
    errors.append(error)
    #if i % 220 == 0:
    #    print(f'Step {i:2d} Error {error:f} Prediction {pred:f}')

plt.xlabel('iterations')
plt.grid()
plt.plot(errors, label="Errors")
plt.plot(predictions, label="Predictions")
plt.legend();
In [59]:
inputs = np.array([0.25, 1.0, 0.1])
target = 70  # my weight
weights = np.array([1, 3, 1])
learning_rate = 0.1

for epoch in range(40):
    pred = np.dot(inputs, weights)
    error = (pred - target) ** 2
    delta_error = inputs * (pred - target)
    weights = weights - (delta_error * learning_rate)

print('pred:', pred)
print('weights:', weights)
In [612]:
# step 1: empty network
weights = [[0.1, 0.1, -0.3],  # hurt
           [0.1, 0.2, 0.0],   # win
           [0.0, 1.3, 0.1]]   # sad?

def neural_network(input, weights):
    pred = np.dot(input, weights)
    return pred

# predict
toes  = [8.5, 9.5, 9.9, 9.0]
wlrec = [0.65, 0.8, 0.8, 0.9]
nfans = [1.2, 1.3, 0.5, 1.0]

hurt = [0.1, 0.0, 0.0, 0.1]
win  = [1, 1, 0, 1]
sad  = [0.1, 0.0, 0.1, 0.2]

alpha = 0.01

input = [toes[0], wlrec[0], nfans[0]]
true = [hurt[0], win[0], sad[0]]

pred = neural_network(input, weights)

error = [0, 0, 0]
delta = [0, 0, 0]
for i in range(len(true)):
    error[i] = (pred[i] - true[i]) ** 2
    delta[i] = pred[i] - true[i]

def outer_prod(vec_a, vec_b):
    out = np.zeros((len(vec_a), len(vec_b)))
    for i in range(len(vec_a)):
        for j in range(len(vec_b)):
            out[i][j] = vec_a[i] * vec_b[j]
    return out

# updating the weights
weight_deltas = outer_prod(input, delta) * alpha
weights = np.array(weights) - weight_deltas

neural_network([toes[1], wlrec[1], nfans[1]], weights), [hurt[1], win[1], sad[1]]
Out[612]:
The digits dataset contains 1,797 8x8 images.
In [423]:
# Import datasets, classifiers and performance metrics
from sklearn import datasets, svm, metrics
# The digits dataset
digits = datasets.load_digits()
print(digits.data.shape)
digits.data # the actual data
Out[423]:
The dataset also contains the images themselves, which we can visualise:
In [428]:
plt.matshow(digits.images[0])
Out[428]:
digits.target contains what each datapoint represents:
In [427]:
print(digits.target.shape)
digits.target
Out[427]:
So for the image above, which looks like a zero, digits.target[0] should be 0 - and it is.
So now to build out a neural net to classify this.
In [474]:
inputs = digits.data
targets = digits.target

# one weight per input pixel
hidden_weights = np.random.random([64, 10])
output_weights = np.random.random(10)

learning_rate = 0.01
epochs = 10

def sigmoid(x):
    return 1 / (1 + np.exp(-x))
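A quick check of what sigmoid does (my own example): it squashes any real number into the open interval (0, 1), keeping the hidden outputs bounded.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# large negative inputs go to ~0, zero maps to 0.5, large positives to ~1
print(sigmoid(np.array([-100.0, 0.0, 100.0])))
```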
So the dot product of each input vector and the weights gives us 10 outputs:
In [473]:
hidden_input = np.dot(inputs[0], hidden_weights)
hidden_output = sigmoid(hidden_input)
hidden_output.shape
Out[473]:
And the dot product of the hidden outputs and the output weights gives a single output value:
In [471]:
np.dot(hidden_output, output_weights)
Now, the output numbers seem pretty high, so we might want to normalize the incoming data
In [422]:
#inputs = np.random.random([len(digits.data), 64])  # a blank input
#for i, item in enumerate(digits.data):
#    inputs[i] = (item - digits.data.mean()) / digits.data.std()
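A working sketch of the standardisation the commented-out cell hints at (using a toy array rather than digits.data): subtract the global mean and divide by the standard deviation so the values centre on 0 with unit spread.

```python
import numpy as np

# standardise: subtract the mean, divide by the standard deviation
data = np.array([[0.0, 4.0, 8.0],
                 [2.0, 6.0, 10.0]])
normalized = (data - data.mean()) / data.std()

print(normalized.mean(), normalized.std())  # ~0 and 1
```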
In [386]:
print(inputs.shape)
inputs
Out[386]:
Now calculating the hidden layer
In [387]:
hidden_inputs = np.dot(inputs, hidden_weights)
print(hidden_inputs.shape)
hidden_inputs
Out[387]:
Using the sigmoid function to calculate the output of the hidden layer:
In [388]:
hidden_outputs = sigmoid(hidden_inputs)
print(hidden_outputs.shape)
hidden_outputs
Out[388]:
The hidden layer gives 10 outputs for each of the 1797 data points.
Now to predict the output neural layer:
In [446]:
output_pred = np.dot(hidden_outputs, output_weights)
final_outputs = output_pred.round(0)
print(final_outputs.shape)
final_outputs
Out[446]:
What are the output errors?
In [451]:
output_errors = (final_outputs - digits.target)**2
print(output_errors.shape)
output_errors
Out[451]:
In [454]:
output_delta = final_outputs - digits.target
output_delta[:10]
Out[454]:
Now, how do we adjust the weights now that we have the output errors?
We need to multiply each of the 1797 final errors with the output weights, so we iterate through each of the data arrays and corresponding targets:
In [560]:
# weights
ih_weight = np.random.random([64, 10])
ho_weight = np.random.random(10)

learning_rate = 0.01
epochs = 20

for epoch in range(epochs):
    for input, target in zip(digits.data, digits.target):
        hidden_layer = np.dot(input, ih_weight)
        hidden_outputs = sigmoid(hidden_layer)
        output_pred = np.dot(hidden_outputs, ho_weight)
        output = output_pred.round(0)
        # so now we have a prediction. time to back propagate!
        # starting with the output errors:
        error = output - target
        output_delta = error * output * (1 - output)
        output_weight_delta = output_delta * hidden_outputs
        ho_weight -= output_weight_delta * learning_rate
In [561]:
output_weight_delta
Out[561]:
In [562]:
ho_weight
Out[562]:
In [563]:
ih_weight_delta.shape
Out[563]:
In [ ]: