Neural Network

Brief Introduction

Neural networks are a form of machine learning inspired by the biological function of the human brain. By giving the program known inputs and outputs of a function, the computer learns the pattern of an unknown function. Once the program has "learned", it is tested against testing inputs, for which it predicts solutions. If the user knows the solutions to the testing inputs, the accuracy of the program can be computed.

Base Question

In the base code of this program, the computer is given a set of 1797 handwritten digits from zero to nine. Each 64-pixel digit is given as a 2D array of values corresponding to the pixel shades; similar images will have similar arrays. Each of these 1797 images also has a "solution" set of the corresponding correct answers. After a training period is completed using a fraction of the 1797 image-solution pairs, the computer will be tested on the numerical value of the remaining 797 images. With careful construction of the neural network, the computer's accuracy on the testing set will be well above that of random guessing.
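
As a concrete sketch, assuming the digits come from scikit-learn's load_digits (which matches the 1797-image description above), the data and the 1000/797 train-test split could be set up like this:

In [ ]:
from sklearn.datasets import load_digits

digits = load_digits()
print(digits.images.shape)    # (1797, 8, 8) -- pixel-shade arrays, 64 pixels each
print(digits.target.shape)    # (1797,)      -- the "solution" set

# Train on the first 1000 image-solution pairs, test on the remaining 797
train_images, test_images = digits.images[:1000], digits.images[1000:]
train_labels, test_labels = digits.target[:1000], digits.target[1000:]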

Additional Questions

The first additional question I am exploring is how one would program multiple hidden layers, and whether accuracy increases with multiple hidden layers. It will also be interesting to compare the number of neurons needed in a single hidden layer versus in two hidden layers to achieve maximum accuracy.

The second additional question I am addressing is how many iterations through the training set it will take for the neural net to memorize a "perfect" data set. In this case, I'm defining a "perfect" data set as one in which there is a unique input for every output. This means that the neural net will be tested on exactly the material it was trained on.


In [1]:
import NNpix as npx                  # helper module containing the figures shown below
from IPython.display import HTML     # used to render the equation table
from IPython.display import display

When a Neuron Fires

The value that a neuron receives is the product of the incoming signals and their weights. The neuron fires only if its "threshold" value is met.
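
A minimal sketch of this firing rule, with made-up signal, weight, and threshold values:

In [ ]:
import numpy as np

signals = np.array([0.5, 0.9, 0.2])    # incoming signals (illustrative)
weights = np.array([0.4, 0.7, -0.3])   # one weight per signal (illustrative)
threshold = 0.5                        # the neuron's firing threshold

value = np.dot(signals, weights)       # signals multiplied by weights, summed
print(value, value >= threshold)       # 0.77 True -- the neuron fires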


In [2]:
npx.neuron_signal


Out[2]:

In [3]:
npx.neuron_no_signal


Out[3]:

The Activation Function

The "threshold" can be thought of as a step function, which we call the activation function: if the signals multiplied by the weights come to a value greater than 0, the neuron fires; otherwise it does not. Because we want a smooth function, we use the sigmoid function as the activation function.

Sigmoid Function: $$ \sigma (x) = \frac{1}{1+ e^{-x}} $$
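
The sigmoid translates directly into code; the printed values show its behavior at zero, far above it, and far below it:

In [ ]:
import numpy as np

def sigmoid(x):
    """Smooth activation function: maps any input into (0, 1)."""
    return 1 / (1 + np.exp(-x))

print(sigmoid(0))     # 0.5 -- halfway between not firing and firing
print(sigmoid(6))     # ~0.998 -- effectively fires
print(sigmoid(-6))    # ~0.002 -- effectively does not fire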


In [4]:
npx.activation


Out[4]:

Gradient Descent on Untrained Weights

  • $\hat{y}$ is the untrained network's solution
  • In an untrained network $\hat{y} \neq y $
  • Weights need to be changed based on this error.

Error Function: $$C = \frac{1}{2}(y-\hat{y})^{2} $$
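
In code, with made-up values, the error function and its derivative with respect to $\hat{y}$; this derivative is the $-(y-\hat{y})$ factor that appears in every gradient below:

In [ ]:
def cost(y, y_hat):
    """Squared error between the known solution y and the network's y_hat."""
    return 0.5 * (y - y_hat) ** 2

def cost_prime(y, y_hat):
    """dC/dy_hat = -(y - y_hat)."""
    return -(y - y_hat)

print(cost(1.0, 0.2))        # 0.32
print(cost_prime(1.0, 0.2))  # -0.8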


In [5]:
npx.derivative1


Out[5]:

In [6]:
f = open("HTMLTable.html")

In [7]:
display(HTML(f.read()))


| Diagram | Equations | Partial Derivatives |
| --- | --- | --- |
| $\hat{y}$ | $\hat{y} = \sigma(M)$ | $\frac{\partial \hat{y}}{\partial M} = \sigma'(M)$ |
| $M$ | $M = \sigma(L) \times w_2$ | $\frac{\partial M}{\partial L} = \sigma'(L) \times w_2, \quad \frac{\partial M}{\partial w_2} = \sigma(L)$ |
| $L$ | $L = K \times w_1$ | $\frac{\partial L}{\partial w_1} = K$ |

|  | Gradients with Chain Rule | Gradients with Substitution |
| --- | --- | --- |
| $w_2$ | $\frac{\partial C}{\partial w_2} = -(y - \hat{y}) \frac{\partial \hat{y}}{\partial M} \frac{\partial M}{\partial w_2}$ | $\frac{\partial C}{\partial w_2} = -(y - \hat{y}) \times \sigma'(M) \times \sigma(L)$ |
| $w_1$ | $\frac{\partial C}{\partial w_1} = -(y - \hat{y}) \frac{\partial \hat{y}}{\partial M} \frac{\partial M}{\partial L} \frac{\partial L}{\partial w_1}$ | $\frac{\partial C}{\partial w_1} = -(y - \hat{y}) \times \sigma'(M) \times \sigma'(L) \times w_2 \times K$ |

In [8]:
f.close()
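
To make the table concrete, here is a sketch that runs the forward pass and evaluates both substituted gradients for the single-chain network above. The values of $K$, $y$, and the starting weights are made up for illustration, and $\sigma'$ is computed with the identity $\sigma'(x) = \sigma(x)(1 - \sigma(x))$.

In [ ]:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1 - s)         # sigma'(x) = sigma(x) * (1 - sigma(x))

K, y = 1.0, 1.0                # input and known solution (illustrative)
w1, w2 = 0.5, 0.5              # untrained starting weights (illustrative)

# Forward pass -- the "Equations" column
L = K * w1
M = sigmoid(L) * w2
y_hat = sigmoid(M)

# Gradients -- the "Gradients with Substitution" column
dC_dw2 = -(y - y_hat) * sigmoid_prime(M) * sigmoid(L)
dC_dw1 = -(y - y_hat) * sigmoid_prime(M) * sigmoid_prime(L) * w2 * K
print(dC_dw2, dC_dw1)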

Changing the weights:

  1. Multiply the gradients by a learning rate between 0 and 1.
  2. Iterate through the training inputs and solutions; each pass changes the weights slightly to reduce the error.
  3. Use the final "trained" weights to test the network (see the sketch after this list).
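
Putting the three steps together, a self-contained sketch of the training loop for the same single-chain network; the learning rate, iteration count, and starting values are arbitrary illustrative choices:

In [ ]:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1 - s)

K, y = 1.0, 1.0                # training input and known solution (illustrative)
w1, w2 = 0.5, 0.5              # starting weights (illustrative)
learning_rate = 0.3            # step 1: a rate between 0 and 1

for _ in range(1000):          # step 2: each pass nudges the weights to reduce error
    L = K * w1
    M = sigmoid(L) * w2
    y_hat = sigmoid(M)
    dC_dw2 = -(y - y_hat) * sigmoid_prime(M) * sigmoid(L)
    dC_dw1 = -(y - y_hat) * sigmoid_prime(M) * sigmoid_prime(L) * w2 * K
    w2 -= learning_rate * dC_dw2
    w1 -= learning_rate * dC_dw1

# step 3: predict with the final "trained" weights
print(sigmoid(sigmoid(K * w1) * w2))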

Neural Network with multiple neurons in each layer:

The main difference is that arrays are used: the input to each neuron is the sum of all the previous layer's outputs multiplied by their respective weights. When working with arrays, once all the array shapes combine as expected, the neural net may need only minor debugging, if any.
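
As a sketch of that idea, a vectorized forward pass with assumed layer sizes of 64 inputs, 30 hidden neurons, and 10 outputs; the weights are random placeholders, not trained values:

In [ ]:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(0)

# Assumed sizes: 64 input pixels, 30 hidden neurons, 10 output digits
w1 = rng.uniform(-1, 1, (64, 30))   # input -> hidden weights
w2 = rng.uniform(-1, 1, (30, 10))   # hidden -> output weights

x = rng.uniform(0, 1, (1, 64))      # one 64-pixel image as a row vector

# Each neuron's input is the sum of the previous layer's outputs
# times their respective weights -- one matrix product per layer
hidden = sigmoid(x @ w1)            # shape (1, 30)
output = sigmoid(hidden @ w2)       # shape (1, 10) -- one score per digit
print(output.shape)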


In [9]:
npx.cneuron1


Out[9]:

Notebooks to Follow:

  1. Base Code
  2. 2 Hidden Layers
  3. Morse Net

Citations:

Photos:

  1. http://www.bbc.co.uk for the photo of Morse code

Basic Understanding of Neural Nets:

  1. MIT OpenCourseWare https://www.youtube.com/watch?v=q0pm3BrIUFo
  2. Stephen Welch https://www.youtube.com/user/Taylorns34
  3. Jeff Heaton https://www.youtube.com/user/HeatonResearch
  4. "mathematicalmonk" https://www.youtube.com/user/mathematicalmonk
  5. Encog Project http://www.heatonresearch.com/wiki/Main_Page
  6. Neural Networks and Deep Learning by Michael Nielsen.
  7. The Nature of Code by Daniel Shiffman.

Coding Operations:

  1. Using PIL http://effbot.org/imagingbook/image.htm
  2. Basic python operations https://docs.python.org
