Artificial Neural Network

Basic

  • Single Layer
  • Optimizer: GradientDescentOptimizer

In [1]:
import tensorflow as tf
from math import exp

Download images and labels into mnist.test (10K images+labels) and mnist.train (60K images+labels)


In [2]:
from tensorflow.examples.tutorials.mnist import \
    input_data as mnist_data

mnist = mnist_data.read_data_sets(
    "data",
    one_hot=True,
    reshape=False,
    validation_size=0
)


Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz

Constants


In [3]:
width = 28
height = 28
area = width * height
final_nodes = 10

lr = .003

In [4]:
# Reset the default graph so tensor names start fresh. (Not required.)
tf.reset_default_graph()

Placeholders

Placeholders are parameters that will have data fed into them at run time.

  • Input Image: [28, 28, 1] 28px by 28px by 1 channel (grayscale)
    • Reshaped to [784] (28 x 28 x 1)
  • Output: [10] one prediction per digit (0-9)

In [5]:
X = tf.placeholder(tf.float32, [None, width, height, 1]) 
Y_ = tf.placeholder(tf.float32, [None, final_nodes])
# Input Images as a list of pixels 
XX = tf.reshape(X, [-1, area])

Weights and Bias (v1)

Variables used by the network. These get initialized at the start and are trained over time.

  • Two arrays, both initialized to zero

In [6]:
W = tf.Variable(tf.zeros([area, final_nodes]))
B = tf.Variable(tf.zeros([final_nodes]))

Activation Functions

Not used in this single-layer version; activation functions are introduced in Version 2 when a hidden layer is added.

Regression Function (V1)

Converts the network output into a prediction (a probability per class).


In [7]:
Y = tf.nn.softmax(tf.matmul(XX, W) + B)

Loss Function (V1-3)

Calculates the loss by comparing the target and prediction.

There are many ways to compute loss; the result of this function measures how far off the prediction was.

  • Negative sum of: Y_ * tf.log(Y)
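
In math form, the cell below computes the cross-entropy $-\sum_i t_i \log(y_i)$, summed over the batch, where $t$ is the one-hot target Y_ and $y$ is the predicted distribution Y.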

In [8]:
cross_entropy = -tf.reduce_sum(Y_ * tf.log(Y))

Optimizer (V1-3)

The loss calculated above is fed into the optimizer. The greater the loss, the larger the adjustment made to the weights and biases responsible for the error.

  • Back-propagation function for adjusting weights and biases
  • Uses Gradient Descent

In [9]:
optimizer = tf.train.GradientDescentOptimizer(lr)
train_step = optimizer.minimize(cross_entropy)

Accuracy (V1-6)

% of correct answers found in batch


In [10]:
is_correct = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))

Start Tensor Graph (V1-5)


In [64]:
# init = tf.initialize_all_variables() # Deprecated
init = tf.global_variables_initializer()

sess = tf.Session()
sess.run(init)

# An interactive session
# sess = tf.InteractiveSession()

Training (V1-4)


In [14]:
for i in range(1000):

    batch_X, batch_Y = mnist.train.next_batch(100)
    train_data = {X: batch_X, Y_: batch_Y}

    sess.run(train_step, feed_dict=train_data)
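
To watch training progress, the accuracy and cross_entropy ops defined above can be evaluated every so often. A minimal sketch (the print interval of 100 steps is an arbitrary choice):

for i in range(1000):
    batch_X, batch_Y = mnist.train.next_batch(100)
    train_data = {X: batch_X, Y_: batch_Y}
    sess.run(train_step, feed_dict=train_data)

    if i % 100 == 0:
        # evaluate (without training) on the current batch
        a, c = sess.run([accuracy, cross_entropy], feed_dict=train_data)
        print("step", i, "accuracy", a, "loss", c)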

Test (V1-6)

Load batch of test images / correct answers and calculate accuracy


In [15]:
test_data = {X: mnist.test.images, Y_: mnist.test.labels}
a, _ = sess.run([accuracy, cross_entropy], feed_dict=test_data)
print(a)


0.923

Version 2: Another layer and better starting points

This section will add another layer of nodes in the middle (a hidden layer).

  • Multilayer Perceptron (2 layers)
    • New starting points for weights and biases
  • Activation function: sigmoid
  • Optimizer: GradientDescentOptimizer

Constants (v2)

Adding a hidden layer


In [16]:
hidden_layer = 200

Weights and Bias (v2)

  • Adding Layers: more layers give the network more capacity to pick up features in your data.
  • Weights: truncated_normal() is used to provide small, varied starting weights.
  • Bias: tf.ones()/10 is now used, giving the biases a small positive starting value instead of zero.

In [41]:
W1 = tf.Variable(
    tf.truncated_normal(
        [area, hidden_layer], 
        stddev=0.1
    )
)
W2 = tf.Variable(
    tf.truncated_normal(
        [hidden_layer, final_nodes], 
        stddev=0.1
    )
)

B1 = tf.Variable(tf.ones([hidden_layer])/10)
B2 = tf.Variable(tf.ones([final_nodes])/10)

Activation Function (V2)

Activation functions are used on layers to determine the importance of information.

Outputs with small (strongly negative) pre-activation values are squashed toward zero and contribute little to the next layer.


In [42]:
Y = tf.nn.sigmoid(tf.matmul(XX, W1) + B1)

Regression Function V2


In [43]:
Y = tf.nn.softmax(tf.matmul(Y, W2) + B2)
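
Note that cross_entropy, train_step, and accuracy still point at the single-layer Y from Version 1. To train and test this two-layer version, those cells need to be re-run (or rebuilt) against the new Y after the cell above, roughly:

cross_entropy = -tf.reduce_sum(Y_ * tf.log(Y))
train_step = tf.train.GradientDescentOptimizer(lr).minimize(cross_entropy)
is_correct = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))

# re-initialize so the new W1/W2/B1/B2 variables get values
sess.run(tf.global_variables_initializer())

After this, the Training and Test cells can be re-run as before.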

Version 2a

This section will show how to add an arbitrary number of layers.

NOTE: These new layers will cause a drop in accuracy. Why? (Hint: the sigmoid activations saturate as the network gets deeper, and the softmax output can underflow to 0, breaking tf.log. Both issues are addressed in Version 3.)

  • Multilayer Perceptron (5 layers)

Constants (v3)

Dynamic number of layers


In [44]:
layers = [
        area,
        200,
        100,
        60,
        30,
        final_nodes
    ]

Placeholders

Assign the flattened input to Y so the layers can be built in a loop.


In [45]:
Y = XX

Weights and Bias (v3-5)

Lists of weights and biases, one entry per layer, built by looping over the layer sizes.


In [46]:
WW = [
    tf.Variable(
        tf.truncated_normal(
            [layers[i], layers[i+1]],
            stddev=0.1
        ),
        name="Weights"
    )
    for i in range(len(layers)-1)
]

BB = [
    tf.Variable(tf.ones([layers[i]])/10, name="Biases")
    for i in range(1, len(layers))
]
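
A quick, optional sanity check that the comprehensions produced the expected shapes:

print([w.get_shape().as_list() for w in WW])
# [[784, 200], [200, 100], [100, 60], [60, 30], [30, 10]]
print([b.get_shape().as_list() for b in BB])
# [[200], [100], [60], [30], [10]]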

Activation Function (v2a)

Looping Activations


In [47]:
i = 0
for i in range(len(layers)-2):
    name = "activate_" + str(i)
    Y = tf.nn.sigmoid(tf.matmul(Y, WW[i], name=name) + BB[i])

Regression Function (v2a)

Converts the final layer's output into probabilities that can be trained against the target.


In [48]:
prediction = tf.matmul(Y, WW[i+1]) + BB[i+1]
Y = tf.nn.softmax(prediction)

Version 3: Function Swapping

  • Multilayer Perceptron (5 layers)
  • Activation Function: relu
    • Replaces sigmoid with relu
  • Optimizer: AdamOptimizer
    • Replaces GradientDescentOptimizer with AdamOptimizer
  • Loss Function: computed from the logits instead of the softmax output.
    • Avoids asking tf.log to compute log(0) when the softmax output underflows.

In [49]:
Y = XX

Activation Functions (v3)

Using Relu


In [50]:
i = 0
for i in range(len(layers)-2):
    preactivate = tf.matmul(Y, WW[i], name="Product") + BB[i]
    Y = tf.nn.relu(preactivate)

Regression Functions (v3)

Break out logits for loss function


In [51]:
logits = tf.matmul(Y, WW[i+1]) + BB[i+1]
Y = tf.nn.softmax(logits)

Loss Function (V3)

Loss function computed from the logits (pre-softmax) rather than from the softmax output.

NOTE: Fixes the issue where tf.log is asked to compute log(0).


In [52]:
per_example_loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y_)
cross_entropy = tf.reduce_mean(per_example_loss) * 100

Optimizer (V3)

TensorFlow has many optimizers; AdamOptimizer tends to work well with large, high-dimensional layers.


In [53]:
optimizer = tf.train.AdamOptimizer(lr)
train_step = optimizer.minimize(cross_entropy)

Version 4: Learning Rates

  • Multilayer Perceptron (5 layers)
  • Dynamic Learning Rate: decays over time (from 0.003 to 0.00001)
  • Activation Function: relu
  • Optimizer: AdamOptimizer

Constants (v4)


In [54]:
lrmax = 0.003
lrmin = 0.00001
decay_speed = 2000.0
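
These constants feed the exponential decay used in the training loop below:

$\mathrm{lr}(i) = \mathrm{lrmin} + (\mathrm{lrmax} - \mathrm{lrmin}) \, e^{-i/\mathrm{decay\_speed}}$

so the rate starts near 0.003 at step 0 and decays toward 0.00001 as the step count i grows.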

Placeholders (V4)

For the dynamic learning rate


In [59]:
L = tf.placeholder(tf.float32)
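
The training loops below feed a value into L, but train_step (from Version 3) was built with the constant lr, so the fed value has no effect as written. Presumably the intent is to rebuild the optimizer with the placeholder; a minimal sketch of that assumption:

# Assumption: rebuild the optimizer so the fed learning rate L is actually used
train_step = tf.train.AdamOptimizer(L).minimize(cross_entropy)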

In [34]:
# Training (V4-5)
# - Learning rate decreases as time goes on.

for i in range(1000):
    batch_X, batch_Y = mnist.train.next_batch(100)
    learning_rate = lrmin + (lrmax - lrmin) * exp(-i / decay_speed)
    train_data = {X: batch_X, Y_: batch_Y, L: learning_rate}

    # train
    sess.run(train_step, feed_dict=train_data)

Version 6: Dropout

  • Multilayer Perceptron (5 layers)
  • Dropout: 90% chance of keeping a node during training
    • Prevents overfitting (the network could otherwise treat unrelated features as important)
  • Learning Rate: dynamically decays over time (from 0.003 to 0.00001)
  • Activation Function: relu
  • Optimizer: AdamOptimizer

Constants (V5)


In [61]:
keep_ratio = 0.9

Placeholders (V5)

For dropout


In [62]:
keep_prob = tf.placeholder(tf.float32)

Activation Functions (v5)

Randomly turns off some nodes during training, which prevents spurious patterns from being picked up.


In [63]:
Y = XX
i = 0
for i in range(len(layers)-2):
    name = "activate_" + str(i)
    Y = tf.nn.relu(tf.matmul(Y, WW[i], name=name) + BB[i])
    Y = tf.nn.dropout(Y, keep_prob)
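
As before, the Y rebuilt above only covers the hidden layers; the logits, softmax, loss, and optimizer cells need to be re-run (or rebuilt) on top of it before training so that dropout is actually part of the trained graph. A rough sketch, assuming the decaying learning rate from Version 4 (AdamOptimizer fed through L):

logits = tf.matmul(Y, WW[i+1]) + BB[i+1]
Y = tf.nn.softmax(logits)
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y_)) * 100
train_step = tf.train.AdamOptimizer(L).minimize(cross_entropy)

# the accuracy op should also be rebuilt so it measures this network
is_correct = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))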

Training (V6)

Trains with the decaying learning rate and feeds keep_prob so dropout is active during training.


In [65]:
for i in range(1000):
    batch_X, batch_Y = mnist.train.next_batch(100)
    learning_rate = lrmin + (lrmax - lrmin) * exp(-i / decay_speed)
    train_data = {
        X: batch_X,
        Y_: batch_Y,
        L: learning_rate,
        keep_prob: keep_ratio
    }

    sess.run(train_step, feed_dict=train_data)

In [39]:
test_data = {X: mnist.test.images, Y_: mnist.test.labels, keep_prob: 1.0}
a, c = sess.run([accuracy, cross_entropy], feed_dict=test_data)
print(a)


0.098
