In order to verify that the program is working as expected, I have split the data into separate parts: the first 1200 images will be used as training data, and the remaining 597 will be used to test the program's accuracy. The accuracy will be output and graphed over several epochs of training; a successful program will show high accuracy in later epochs.

The network itself will be set up with 64 inputs (one for each pixel, varying from 0.0 to 1.0 based on the grayscale value), an as-yet-undetermined number of hidden neurons, possibly spread across multiple layers, and 10 outputs (one for each digit, with the highest value treated as the network's answer). The structure of the hidden layers is the subject of the first additional question.
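As a sketch of this structure, the weight matrices and bias vectors for such a network can be set up with NumPy. The layer sizes here are illustrative (a single hidden layer of 20 neurons, one of the candidate setups tested below), and the random initialization scheme is my own choice, not fixed by the program:

```python
import numpy as np

# Illustrative layer sizes: 64 inputs, one hidden layer of 20 neurons, 10 outputs.
layer_sizes = [64, 20, 10]

rng = np.random.default_rng(0)
# One weight matrix per pair of adjacent layers, and one bias vector per
# non-input layer.
weights = [rng.standard_normal((m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

print([w.shape for w in weights])  # [(64, 20), (20, 10)]
```

With this layout, a 64-element pixel vector multiplied through the weight matrices produces a 10-element output vector, matching the description above.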

```
In [1]:
```%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

```
In [4]:
```"""One of the key parts of the program is the sigmoid function,
which is used in the activation function as the output of the weighted inputs."""
def sigmoid(z):
return 1/(1 + np.exp(-z))
plt.plot(np.linspace(-10, 10, 100), sigmoid(np.linspace(-10, 10, 100)))

```
Out[4]:
```

```
In [ ]:
```'''These are used to put the training data in the right format'''
from sklearn.datasets import load_digits
digits = load_digits()
#digits.data values range from 0 to 16; scale them to 0.0-1.0 as described above
trainingdata = digits.data[0:1200] / 16.0
traininganswers = digits.target[0:1200]
lc = 0.02
'''This converts the outputs into length 10 vectors, representing the
output layer. e.g. 6 -> [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]'''
#the dataset contains 1797 images in total (1200 training + 597 test)
traininganswervectors = np.zeros((1797, 10))
for n in range(1797):
    traininganswervectors[n][digits.target[n]] = 1

```
In [ ]:
```'''This is the feedforward function: it propagates an input vector
through the network, computing each layer's activations from the previous
layer's outputs, the weights, and the biases'''
def feedforward(a, weights, biases):
    #each weight matrix is paired with its layer's bias vector
    for w, b in zip(weights, biases):
        a = sigmoid(np.dot(a, w) + b)
    return a

```
In [ ]:
```'''This is used to find the "minimum" of the error using stochastic
gradient descent on random mini-batches.'''
def GradientDescent(inputs, results, batchsize, lc, epochs):
    for n in range(epochs):
        #pick random locations for input/result data
        locations = np.random.randint(0, len(inputs), batchsize)
        #create tuples (input, result) at those random locations
        minibatch = [(inputs[loc], results[loc]) for loc in locations]
        for example in minibatch:
            #train (defined later) applies one backpropagation update
            train(example, lc)

In order to find the most efficient setup for this problem, I will run several tests with different numbers of hidden neurons and graph the results by accuracy and time. I will do the same for the learning constant, looking for the value that produces the most accurate or fastest program. From this, I will try to identify an "optimal" program, with the best balance between speed and accuracy.

I will test the following hidden network setups:

20 neurons in 1 layer

10 neurons in 2 layers

5 neurons in 4 layers

4 neurons in 5 layers

These all have the same total number of neurons, but in different layers, so this will primarily test the optimal "shape" of a network.
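As a sketch, the four candidate setups can be written as layer-size lists (including the fixed 64-input and 10-output layers); this list representation is my own, not fixed by the program above:

```python
# Each entry lists layer sizes from input to output: 64 inputs, the
# hidden layers under test, and 10 outputs.
setups = [
    [64, 20, 10],              # 20 neurons in 1 layer
    [64, 10, 10, 10],          # 10 neurons in 2 layers
    [64, 5, 5, 5, 5, 10],      # 5 neurons in 4 layers
    [64, 4, 4, 4, 4, 4, 10],   # 4 neurons in 5 layers
]

# All four have the same total number of hidden neurons.
print([sum(s[1:-1]) for s in setups])  # [20, 20, 20, 20]
```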

The learning constant is simpler to vary: I will test values between 0.01 and 0.1. If the results end up too similar, I will use a larger range.
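A minimal sketch of this sweep, assuming ten evenly spaced candidate values (the grid spacing is my choice, and the loop body is a placeholder for training and evaluating a network at each learning constant):

```python
import numpy as np

# Ten evenly spaced candidate learning constants in [0.01, 0.1].
learning_constants = np.linspace(0.01, 0.1, 10)

for lc in learning_constants:
    # A full experiment would train a network with this learning constant
    # and record its test accuracy and run time; omitted here.
    pass

print(np.round(learning_constants, 2))
```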

```
In [ ]:
```

Examining the sigmoid function will work similarly to the previous problem. I will test various types of sigmoid function with the program, looking for changes in accuracy and speed. By default, my program uses the function $\frac{1}{1 + e^{-x}}$ for the sigmoid, but I will also use the following functions:

$\frac{x}{\sqrt{1 + x^2}}$

$\tanh(x)$

A piecewise function:

$-1$ when $x < -1$

$x$ when $-1 \le x \le 1$

$1$ when $x > 1$

I will see if any of these functions seem to work faster or provide better accuracy.
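As a sketch, the three alternative activation functions can be defined as drop-in replacements for sigmoid (the function names are my own):

```python
import numpy as np

def softsign(x):
    # x / sqrt(1 + x^2): a smooth s-shaped curve bounded in (-1, 1)
    return x / np.sqrt(1 + x**2)

def tanh_activation(x):
    return np.tanh(x)

def piecewise_linear(x):
    # -1 below x = -1, x itself between -1 and 1, and 1 above x = 1
    return np.clip(x, -1, 1)

print(softsign(0.0), tanh_activation(0.0), piecewise_linear(2.0))  # 0.0 0.0 1.0
```

One point to keep in mind when swapping these in: all three have outputs in $[-1, 1]$ rather than sigmoid's $(0, 1)$, which may interact with the 0/1 target vectors used for the output layer.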

```
In [ ]:
```