Artificial Neural Network

Basic

  • Single Layer
  • Optimizer: GradientDescentOptimizer

In [1]:
import tensorflow as tf
from math import exp

Download images and labels into mnist.test (10K images+labels) and mnist.train (60K images+labels)


In [2]:
from tensorflow.examples.tutorials.mnist import \
    input_data as mnist_data

mnist = mnist_data.read_data_sets(
    "data",
    one_hot=True,
    reshape=False,
    validation_size=0
)


Extracting data/train-images-idx3-ubyte.gz
Extracting data/train-labels-idx1-ubyte.gz
Extracting data/t10k-images-idx3-ubyte.gz
Extracting data/t10k-labels-idx1-ubyte.gz

Constants


In [3]:
width = 28
height = 28
area = width * height
final_nodes = 10

lr = .003

In [4]:
# Reset the default graph so tensor names start fresh. (Not required.)
tf.reset_default_graph()

Placeholders

Placeholders are parameters that will have data fed into them at run time.

  • Input Image: [28, 28, 1] 28px by 28px by 1 channel (grayscale)
    • Reshaped to [784] (28 x 28 x 1)
  • Output: [10] one prediction per digit (0-9)

In [5]:
X = tf.placeholder(tf.float32, [None, width, height, 1]) 
Y_ = tf.placeholder(tf.float32, [None, final_nodes])
# Input Images as a list of pixels 
XX = tf.reshape(X, [-1, area])

Weights and Bias (v1)

Variables used by the network. These get initialized at the start and are trained over time.

  • Two arrays, both initialized to zero

In [6]:
W = tf.Variable(tf.zeros([area, final_nodes]))
B = tf.Variable(tf.zeros([final_nodes]))

Activation Functions

Not used in this single-layer version; activation functions are introduced in Version 2 when a hidden layer is added.

Regression Function (V1)

Converts the network output into a prediction (a probability per class).


In [7]:
Y = tf.nn.softmax(tf.matmul(XX, W) + B)

Loss Function (V1-3)

Calculates the loss by comparing the target and prediction.

There are many ways to compute loss; the result of this function measures how far off the prediction was.

  • Negative sum of: Y_ * tf.log(Y)
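
In math form, the cell below computes the cross-entropy $-\sum_i t_i \log(y_i)$, summed over the batch, where $t$ is the one-hot target Y_ and $y$ is the predicted distribution Y.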

In [8]:
cross_entropy = -tf.reduce_sum(Y_ * tf.log(Y))

Optimizer (V1-3)

The loss calculated above is fed into the optimizer. The greater the loss, the larger the adjustment made to the weights and biases responsible for the error.

  • Back-propagation function for adjusting weights and biases
  • Uses Gradient Descent

In [9]:
optimizer = tf.train.GradientDescentOptimizer(lr)
train_step = optimizer.minimize(cross_entropy)

Accuracy (V1-6)

% of correct answers found in batch


In [10]:
is_correct = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))

Start Tensor Graph (V1-5)


In [64]:
# init = tf.initialize_all_variables() # Deprecated
init = tf.global_variables_initializer()

sess = tf.Session()
sess.run(init)

# An interactive session
# sess = tf.InteractiveSession()

Training (V1-4)


In [14]:
for i in range(1000):

    batch_X, batch_Y = mnist.train.next_batch(100)
    train_data = {X: batch_X, Y_: batch_Y}

    sess.run(train_step, feed_dict=train_data)
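
To watch training progress, the accuracy and cross_entropy ops defined above can be evaluated every so often. A minimal sketch (the print interval of 100 steps is an arbitrary choice):

for i in range(1000):
    batch_X, batch_Y = mnist.train.next_batch(100)
    train_data = {X: batch_X, Y_: batch_Y}
    sess.run(train_step, feed_dict=train_data)

    if i % 100 == 0:
        # evaluate (without training) on the current batch
        a, c = sess.run([accuracy, cross_entropy], feed_dict=train_data)
        print("step", i, "accuracy", a, "loss", c)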

Test (V1-6)

Load batch of test images / correct answers and calculate accuracy


In [15]:
test_data = {X: mnist.test.images, Y_: mnist.test.labels}
a, _ = sess.run([accuracy, cross_entropy], feed_dict=test_data)
print(a)


0.923

Version 2: Another layer and better starting points

This section will add another layer of nodes in the middle (a hidden layer).

  • Multilayer Perceptron (2 layers)
    • New starting points for weights and biases
  • Activation function: sigmoid
  • Optimizer: GradientDescentOptimizer

Constants (v2)

Adding a hidden layer


In [16]:
hidden_layer = 200

Weights and Bias (v2)

  • Adding Layers: more layers give the network more capacity to pick up features in your data.
  • Weights: truncated_normal() is used to provide small, varied starting weights.
  • Bias: tf.ones()/10 is now used, giving the biases a small positive starting value instead of zero.

In [41]:
W1 = tf.Variable(
    tf.truncated_normal(
        [area, hidden_layer], 
        stddev=0.1
    )
)
W2 = tf.Variable(
    tf.truncated_normal(
        [hidden_layer, final_nodes], 
        stddev=0.1
    )
)

B1 = tf.Variable(tf.ones([hidden_layer])/10)
B2 = tf.Variable(tf.ones([final_nodes])/10)

Activation Function (V2)

Activation functions are used on layers to determine the importance of information.

Outputs with small (strongly negative) pre-activation values are squashed toward zero and contribute little to the next layer.


In [42]:
Y = tf.nn.sigmoid(tf.matmul(XX, W1) + B1)

Regression Function V2


In [43]:
Y = tf.nn.softmax(tf.matmul(Y, W2) + B2)
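
Note that cross_entropy, train_step, and accuracy still point at the single-layer Y from Version 1. To train and test this two-layer version, those cells need to be re-run (or rebuilt) against the new Y after the cell above, roughly:

cross_entropy = -tf.reduce_sum(Y_ * tf.log(Y))
train_step = tf.train.GradientDescentOptimizer(lr).minimize(cross_entropy)
is_correct = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))

# re-initialize so the new W1/W2/B1/B2 variables get values
sess.run(tf.global_variables_initializer())

After this, the Training and Test cells can be re-run as before.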

Version 2a

This section will show how to add an arbitrary number of layers.

NOTE: These new layers will cause a drop in accuracy. Why? (Hint: the sigmoid activations saturate as the network gets deeper, and the softmax output can underflow to 0, breaking tf.log. Both issues are addressed in Version 3.)

  • Multilayer Perceptron (5 layers)

Constants (v3)

Dynamic number of layers


In [44]:
layers = [
        area,
        200,
        100,
        60,
        30,
        final_nodes
    ]

Placeholders

Assign the flattened input to Y so the layers can be built in a loop.


In [45]:
Y = XX

Weights and Bias (v3-5)

Lists of weights and biases, one entry per layer, built by looping over the layer sizes.


In [46]:
WW = [
    tf.Variable(
        tf.truncated_normal(
            [layers[i], layers[i+1]],
            stddev=0.1
        ),
        name="Weights"
    )
    for i in range(len(layers)-1)
]

BB = [
    tf.Variable(tf.ones([layers[i]])/10, name="Biases")
    for i in range(1, len(layers))
]
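
A quick, optional sanity check that the comprehensions produced the expected shapes:

print([w.get_shape().as_list() for w in WW])
# [[784, 200], [200, 100], [100, 60], [60, 30], [30, 10]]
print([b.get_shape().as_list() for b in BB])
# [[200], [100], [60], [30], [10]]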

Activation Function (v2a)

Looping Activations


In [47]:
i = 0
for i in range(len(layers)-2):
    name = "activate_" + str(i)
    Y = tf.nn.sigmoid(tf.matmul(Y, WW[i], name=name) + BB[i])

Regression Function (v2a)

Converts the final layer's output into probabilities that can be trained against the target.


In [48]:
prediction = tf.matmul(Y, WW[i+1]) + BB[i+1]
Y = tf.nn.softmax(prediction)

Version 3: Function Swapping

  • Multilayer Perceptron (5 layers)
  • Activation Function: relu
    • Replaces sigmoid with relu
  • Optimizer: AdamOptimizer
    • Replaces GradientDescentOptimizer with AdamOptimizer
  • Loss Function: computed from the logits instead of the softmax output.
    • Avoids asking tf.log to compute log(0) when the softmax output underflows.

In [49]:
Y = XX

Activation Functions (v3)

Using Relu


In [50]:
i = 0
for i in range(len(layers)-2):
    preactivate = tf.matmul(Y, WW[i], name="Product") + BB[i]
    Y = tf.nn.relu(preactivate)

Regression Functions (v3)

Break out logits for loss function


In [51]:
logits = tf.matmul(Y, WW[i+1]) + BB[i+1]
Y = tf.nn.softmax(logits)

Loss Function (V3)

Loss function computed from the logits (pre-softmax) rather than from the softmax output.

NOTE: Fixes the issue where tf.log is asked to compute log(0).


In [52]:
per_example_loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y_)
cross_entropy = tf.reduce_mean(per_example_loss) * 100

Optimizer (V3)

TensorFlow has many optimizers; AdamOptimizer tends to work well with large, high-dimensional layers.


In [53]:
optimizer = tf.train.AdamOptimizer(lr)
train_step = optimizer.minimize(cross_entropy)

Version 4: Learning Rates

  • Multilayer Perceptron (5 layers)
  • Dynamic Learning Rate: decays over time (from 0.003 to 0.00001)
  • Activation Function: relu
  • Optimizer: AdamOptimizer

Constants (v4)


In [54]:
lrmax = 0.003
lrmin = 0.00001
decay_speed = 2000.0
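
These constants feed the exponential decay used in the training loop below:

$\mathrm{lr}(i) = \mathrm{lrmin} + (\mathrm{lrmax} - \mathrm{lrmin}) \, e^{-i/\mathrm{decay\_speed}}$

so the rate starts near 0.003 at step 0 and decays toward 0.00001 as the step count i grows.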

Placeholders (V4)

For the dynamic learning rate


In [59]:
L = tf.placeholder(tf.float32)
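
The training loops below feed a value into L, but train_step (from Version 3) was built with the constant lr, so the fed value has no effect as written. Presumably the intent is to rebuild the optimizer with the placeholder; a minimal sketch of that assumption:

# Assumption: rebuild the optimizer so the fed learning rate L is actually used
train_step = tf.train.AdamOptimizer(L).minimize(cross_entropy)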

In [34]:
# Training (V4-5)
# - Learning rate decreases as time goes on.

for i in range(1000):
    batch_X, batch_Y = mnist.train.next_batch(100)
    learning_rate = lrmin + (lrmax - lrmin) * exp(-i / decay_speed)
    train_data = {X: batch_X, Y_: batch_Y, L: learning_rate}

    # train
    sess.run(train_step, feed_dict=train_data)

Version 6: Dropout

  • Multilayer Perceptron (5 layers)
  • Dropout: 90% chance of keeping a node during training
    • Prevents overfitting (the network could otherwise treat unrelated features as important)
  • Learning Rate: dynamically decays over time (from 0.003 to 0.00001)
  • Activation Function: relu
  • Optimizer: AdamOptimizer

Constants (V5)


In [61]:
keep_ratio = 0.9

Placeholders (V5)

For dropout


In [62]:
keep_prob = tf.placeholder(tf.float32)

Activation Functions (v5)

Randomly turns off some nodes during training, which prevents spurious patterns from being picked up.


In [63]:
Y = XX
i = 0
for i in range(len(layers)-2):
    name = "activate_" + str(i)
    Y = tf.nn.relu(tf.matmul(Y, WW[i], name=name) + BB[i])
    Y = tf.nn.dropout(Y, keep_prob)
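
As before, the Y rebuilt above only covers the hidden layers; the logits, softmax, loss, and optimizer cells need to be re-run (or rebuilt) on top of it before training so that dropout is actually part of the trained graph. A rough sketch, assuming the decaying learning rate from Version 4 (AdamOptimizer fed through L):

logits = tf.matmul(Y, WW[i+1]) + BB[i+1]
Y = tf.nn.softmax(logits)
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=Y_)) * 100
train_step = tf.train.AdamOptimizer(L).minimize(cross_entropy)

# the accuracy op should also be rebuilt so it measures this network
is_correct = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))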

Training (V6)

Trains with the decaying learning rate and feeds keep_prob so dropout is active during training.


In [65]:
for i in range(1000):
    batch_X, batch_Y = mnist.train.next_batch(100)
    learning_rate = lrmin + (lrmax - lrmin) * exp(-i / decay_speed)
    train_data = {
        X: batch_X,
        Y_: batch_Y,
        L: learning_rate,
        keep_prob: keep_ratio
    }

    sess.run(train_step, feed_dict=train_data)

In [39]:
test_data = {X: mnist.test.images, Y_: mnist.test.labels, keep_prob: 1.0}
a, c = sess.run([accuracy, cross_entropy], feed_dict=test_data)
print(a)


0.098
