First I'll introduce the theory behind neural nets. Then we (emphasis on we) will implement one from scratch in numpy (which is installed on the uni computers) - just type this code into your text editor of choice. I'll also show you how to define a neural net in Google's DL library TensorFlow (which is not installed on the uni computers) and train it to classify handwritten digits.
You will understand things better if you're familiar with calculus and linear algebra, but the only thing you really need to know is basic programming. Don't worry if you don't understand the equations.
For our data-sciencey purposes, it's best to think of a neural network as a function approximator or a statistical model. Surprisingly enough, it really is made up of a network of neurons. What is a neuron?
WARNING: huge oversimplification that will make neuroscientists cringe.
This is what a neuron in your brain looks like. On the right are the axons; on the left are the dendrites, which receive signals from the axons of other neurons. The dendrites are connected to the axons by synapses. If there is enough voltage across the neuron, it will "spike" and send a signal through its axon to neighbouring neurons. Some synapses are excitatory: if a signal goes through them, it increases the voltage across the next neuron, making it more likely to spike. Others are inhibitory and do the opposite. We learn by changing the strengths of synapses (well, kinda), and that is also usually how artificial neural networks learn.
This is what the simplest possible artificial neuron looks like. This neuron is connected to two input neurons named $x_1$ and $x_2$ with "synapses" $w_1$ and $w_2$. All of these symbols are just numbers (real/float). To get the neuron's output signal $h$, sum up the input neurons, weighted by their "synapses", then put the result through a nonlinear function $f$: $$ h = f(x_1 w_1 + x_2 w_2)$$
$f$ can be anything that maps a real number to a real number, but for ML you want something nonlinear and smooth. For this neuron, $f$ is the sigmoid function:
$$\sigma(x) = \frac{1}{1+e^{-x}} $$
Sigmoid squashes any real number into [0,1], so the neuron is closer to "fully firing" the more positive its input is, and closer to "not firing" the more negative its input is.
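To make this concrete, here's a minimal numpy sketch of a single sigmoid neuron; the input and weight values are made up purely for illustration:

In [ ]:
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One artificial neuron: weight the inputs, sum them, squash with the sigmoid.
x1, x2 = 0.5, -1.0   # activations of the two input neurons (made-up values)
w1, w2 = 0.8, 0.3    # strengths of the two "synapses" (made-up values)
h = sigmoid(x1 * w1 + x2 * w2)
print(h)             # somewhere between 0 ("not firing") and 1 ("fully firing")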
If you like to think in terms of graph theory, neurons are nodes and weights are edges. If you have a stats background, you might have noticed that this looks similar to logistic regression on two variables. That's because it is!
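For comparison, logistic regression on the same two variables models the probability of the positive class as a sigmoid of a weighted sum (the bias term $b$ shows up again a little later):
$$ p(y=1 \mid x_1, x_2) = \sigma(x_1 w_1 + x_2 w_2 + b) $$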
As you can see, these artificial neurons are only loosely inspired by biological neurons. That's ok, our goal is to have a good model, not simulate a brain.
There are many exciting ways to arrange these neurons into a network, but we will focus on one of the simpler, more useful topologies called a "two layer perceptron", which looks like this:
Neurons are arranged in layers, with the first hidden layer of neurons connected to a vector (think: list of numbers) of input data, $x$, sometimes referred to as an "input layer". Every neuron in a given layer is connected to every neuron in the previous layer.
Each neuron computes its "net" input as a weighted sum of the previous layer's activations: $$net = \sum_{i=1}^{N}x_i w_i = \vec{x} \cdot \vec{w}$$ where $\vec{x}$ is the vector of the previous layer's neuron activations and $\vec{w}$ is the vector of weights ("synapses"), one for each $x_i \in \vec{x}$.
Look back at the diagram again. Each of these 4 hidden units has a vector of 3 weights, one per input. We can arrange these vectors as the columns of a 3x4 matrix, which we call $W$. Then we can multiply this matrix with $\vec{x}$ and apply our nonlinearity $f$ to get a vector of neuron activations: $$\vec{h} = f( \vec{x} \cdot W )$$ Actually, in practice we also add a unique, learnable "bias" $b$ to every neuron's weighted sum, which has the effect of shifting the nonlinearity left or right: $$\vec{h} = f( \vec{x} \cdot W + \vec{b} )$$
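In numpy, that whole hidden layer is just a couple of lines. This is a sketch only; the sizes 3 and 4 match the diagram, and the random weights and example inputs are placeholders:

In [ ]:
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([1.0, 2.0, 3.0])        # the 3 input values
W = np.random.normal(size=(3, 4))    # 3x4 weight matrix: one column of 3 weights per hidden unit
b = np.zeros(4)                      # one bias per hidden unit
h = sigmoid(x.dot(W) + b)            # vector of 4 hidden activations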
In [2]:
import numpy as np

# What is a neural net? A nonlinear function approximator, which can (theoretically) learn to
# approximate any function.
# In this case, we want it to learn a function that maps from any 2d vector x = (length, width)
# to a scalar "y" which is the area of the rectangle represented by x.
# Neural nets learn by example, so we generate a dataset of 300 random rectangles and their areas.
N, D = 300, 2  # number of examples, dimension of examples in training set X
X = np.random.uniform(size=(N, D), low=0, high=20)
y = (X[:, 0] * X[:, 1]).reshape(N, 1)  # one target area per rectangle, as an (N, 1) column
def relu(x):
    """Rectified linear unit: max(0, x), applied elementwise."""
    return np.maximum(0, x)

class TwoLayerPerceptron:
    """Simple implementation of the most basic neural net."""
    def __init__(self, X, H, y):
        N, D = X.shape
        N, O = y.shape
        # Initialize the weights, or "connections between neurons", to random values.
        self.W1 = np.random.normal(size=(D, H))
        self.b1 = np.zeros(H)
        self.W2 = np.random.normal(size=(H, O))
        self.b2 = np.zeros(O)

    def forward_pass(self, X):
        hidden_inputs = X.dot(self.W1) + self.b1          # matrix multiply, then add bias
        hidden_activations = relu(hidden_inputs)          # elementwise nonlinearity
        output = hidden_activations.dot(self.W2) + self.b2
        cache = [X, hidden_inputs, hidden_activations, output]
        return cache

    def backwards_pass(self, y, cache):
        [X, hidden_inputs, hidden_activations, output] = cache
        N = X.shape[0]
        d_output = (output - y) / N                       # gradient of 0.5*mean((output - y)**2)
        d_W2 = hidden_activations.T.dot(d_output)
        d_b2 = d_output.sum(axis=0)
        d_hidden = d_output.dot(self.W2.T) * (hidden_inputs > 0)   # backprop through the relu
        d_W1 = X.T.dot(d_hidden)
        d_b1 = d_hidden.sum(axis=0)
        return d_W1, d_W2, d_b1, d_b2
In [3]:
# One step of gradient descent on the rectangle-area data, with the maths written out by hand.
H = 4                                                     # number of hidden units
W1 = np.random.normal(size=(D, H)); b1 = np.zeros(H)
W2 = np.random.normal(size=(H, 1)); b2 = np.zeros(1)
learning_rate = 1e-4
hidden_inputs = np.dot(X, W1) + b1
hidden_activations = relu(hidden_inputs)
output = np.dot(hidden_activations, W2) + b2
loss = 0.5 * np.mean((output - y) ** 2)                   # how wrong we currently are
# Backpropagate: gradient of the loss with respect to every parameter.
d_output = (output - y) / N
d_W2 = np.dot(hidden_activations.T, d_output)
d_b2 = np.sum(d_output, axis=0)
d_hidden = np.dot(d_output, W2.T) * (hidden_inputs > 0)   # derivative of the relu
d_W1 = np.dot(X.T, d_hidden)
d_b1 = np.sum(d_hidden, axis=0)
# Nudge every parameter a small step downhill.
W2 -= learning_rate * d_W2
b2 -= learning_rate * d_b2
W1 -= learning_rate * d_W1
b1 -= learning_rate * d_b1
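Putting the pieces together, here's a rough sketch of a full training loop built from the class above. The 4 hidden units, the learning rate of 1e-4 and the 500 iterations are arbitrary guesses, not tuned values, so expect to fiddle with them:

In [ ]:
net = TwoLayerPerceptron(X, 4, y)            # 4 hidden units
learning_rate = 1e-4
for step in range(500):
    cache = net.forward_pass(X)
    d_W1, d_W2, d_b1, d_b2 = net.backwards_pass(y, cache)
    # Gradient descent: nudge every parameter against its gradient.
    net.W1 -= learning_rate * d_W1
    net.b1 -= learning_rate * d_b1
    net.W2 -= learning_rate * d_W2
    net.b2 -= learning_rate * d_b2
    if step % 100 == 0:
        output = cache[-1]
        print(step, 0.5 * np.mean((output - y) ** 2))   # the loss should (roughly) shrink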
In [ ]:
from IPython.display import display, Math
display(Math(r'h_1 = \sigma(X \cdot W_1 + b)'))