TensorFlow versus Poisonous Mushrooms

After the Keras Example, let's build a TensorFlow-based model for comparison.

Feature Extraction

This example uses the same feature extraction techniques as the Keras Example.

In summary, the data prep follows these steps...

  1. Load a pandas DataFrame from a CSV file.
  2. Transform categorical data to a one-hot representation.
  3. Split the training and test data sets.
  4. Extract edibility as labels.
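
To make step 2 concrete, here is a minimal sketch of what one-hot encoding does to a single categorical column. It uses plain Python rather than scikit-learn; the `one_hot` helper and the sample odor codes are illustrative assumptions, not part of the notebook, which relies on LabelBinarizer for this step.

```python
def one_hot(values):
    """Return (categories, rows), where each row is a one-hot list."""
    # Sort the distinct categories so the column order is deterministic.
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0.0] * len(categories)
        row[index[value]] = 1.0  # Mark the column that matches this value.
        rows.append(row)
    return categories, rows


# Hypothetical sample of single-letter odor codes.
odors = ['n', 'a', 'n', 'f']
categories, encoded = one_hot(odors)
print(categories)  # ['a', 'f', 'n']
print(encoded[0])  # [0.0, 0.0, 1.0]
```

Each distinct category becomes its own column, which is why three categorical features expand into a much wider feature matrix.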

In [1]:
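import numpy as np
from pandas import read_csv
# Import the submodule explicitly; `import sklearn` alone does not guarantee it is loaded.
import sklearn.preprocessing
from sklearn_pandas import DataFrameMapper

srooms_df = read_csv('../data/agaricus-lepiota.data.csv')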

mappings = ([
    ('edibility', sklearn.preprocessing.LabelEncoder()),
    ('odor', sklearn.preprocessing.LabelBinarizer()),
    ('habitat', sklearn.preprocessing.LabelBinarizer()),
    ('spore-print-color', sklearn.preprocessing.LabelBinarizer())
])

mapper = DataFrameMapper(mappings)
srooms_np = mapper.fit_transform(srooms_df.copy()).astype(np.float32)

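from sklearn.model_selection import train_test_split
train, test = train_test_split(srooms_np, test_size = 0.2, random_state=7)
# Column 0 holds the encoded edibility label; the remaining columns are the one-hot features.
# Slicing with 0:1 keeps the labels two-dimensional.
train_labels = train[:,0:1]
train_data = train[:,1:]
test_labels = test[:,0:1]
test_data = test[:,1:]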

Model Definition

TensorFlow requires a bit more work than Keras to define the network because we need to define the model's parameters (i.e. the weights and biases) ourselves. Here is a Keras code snippet for comparison:

from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(20, activation='relu', input_dim=25))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

Here are the key differences:

  1. TensorFlow uses name scoping to logically separate the layers.
  2. Each dense layer defines and initializes its weight and bias variables (done implicitly in Keras).
  3. TensorFlow doesn't use a sequential model. It uses a graph. The model defines Tensor references between layers.
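
As a rough illustration of what each dense layer computes, the sketch below evaluates activation(x·W + b) with plain Python lists and chains two layers by passing the output of one as the input of the next. The `dense`, `relu`, and `sigmoid` helpers and the toy weights are assumptions made for this example; the notebook itself builds the layers from TensorFlow ops.

```python
import math


def dense(samples, weights, biases, activation):
    """Compute activation(samples @ weights + biases) with plain lists."""
    outputs = []
    for sample in samples:
        row = []
        for j in range(len(biases)):
            # Weighted sum of the inputs for output unit j, plus its bias.
            z = sum(x * weights[i][j] for i, x in enumerate(sample)) + biases[j]
            row.append(activation(z))
        outputs.append(row)
    return outputs


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy sizes: 2 inputs -> 2 hidden units -> 1 output.
samples = [[1.0, 2.0]]
hidden = dense(samples, [[0.5, -1.0], [0.25, 0.5]], [0.0, 0.0], relu)
output = dense(hidden, [[1.0], [1.0]], [-1.0], sigmoid)
print(hidden)  # [[1.0, 0.0]]
print(output)  # [[0.5]]
```

The weight matrix has one row per input and one column per unit, which matches the `[input_dim, units]` shape used for the TensorFlow variables.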

In [2]:
import tensorflow as tf
import math
def inference(samples, input_dim, dense1_units, dense2_units):
    with tf.name_scope('dense_1'):
        weights = tf.Variable(
            tf.truncated_normal([input_dim, dense1_units],
                                stddev=1.0 / math.sqrt(float(input_dim))),
            name='weights')
        biases = tf.Variable(tf.zeros([dense1_units]),
                             name='biases')
        dense1 = tf.nn.relu(tf.nn.xw_plus_b(samples, weights, biases))
        
    with tf.name_scope('dropout'):
        # In TF 1.x the second argument is keep_prob, so 0.5 matches Dropout(0.5) in Keras.
        dropout = tf.nn.dropout(dense1, 0.5)
        
    with tf.name_scope('dense_2'):
        # Scale the initial weights by the layer's fan-in, as in dense_1.
        weights = tf.Variable(
            tf.truncated_normal([dense1_units, dense2_units],
                                stddev=1.0 / math.sqrt(float(dense1_units))),
            name='weights')
        biases = tf.Variable(tf.zeros([dense2_units]),
                             name='biases')
        output = tf.sigmoid(tf.nn.xw_plus_b(dropout, weights, biases))
        
    return output
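
The dropout layer deserves a closer look. In TensorFlow 1.x, `tf.nn.dropout` keeps each element with probability `keep_prob` and scales the kept elements by `1 / keep_prob`, so the expected activation is unchanged. The sketch below mimics that behaviour in plain Python; the `dropout` helper and the seed are illustrative assumptions.

```python
import random


def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob, scaled by 1 / keep_prob."""
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(7)  # Arbitrary seed, used only to make the sketch repeatable.
activations = [1.0] * 10000
dropped = dropout(activations, 0.5, rng)

# Every output is either zeroed or doubled, so the mean stays close to 1.0.
print(sorted(set(dropped)))
print(sum(dropped) / len(dropped))
```

Note that the notebook's `inference` function applies dropout unconditionally, so reusing it for evaluation would also drop units at test time.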

Model Compile

Unlike Keras, TensorFlow doesn't provide pre-canned functions for training. The model needs the following functions defined.

  1. Define a loss function. The function converts probabilities to logits, because tf.nn.sigmoid_cross_entropy_with_logits expects logits. Clipping the output first prevents log(0).
  2. Define a training function. It uses the loss to compute the gradients and applies them with the Adam optimizer.
  3. Define an accuracy function as a training metric.

Again, Keras hides these details by providing pre-canned loss and accuracy functions. The same definition in Keras is a one-liner.

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

In [3]:
def loss(output, labels, from_logits=False):
    if not from_logits:
        # Clip to avoid log(0), then invert the sigmoid to recover the logits.
        epsilon = 1e-7
        output = tf.clip_by_value(output, epsilon, 1 - epsilon)
        output = tf.log(output / (1 - output))

    # Mean binary cross-entropy over the batch.
    xentropy = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels, logits=output)
    return tf.reduce_mean(xentropy)

def training(loss):
    tf.summary.scalar('loss', loss)
    optimizer = tf.train.AdamOptimizer()
    global_step = tf.Variable(0, name='global_step', trainable=False)
    train_op = optimizer.minimize(loss, global_step=global_step)
    return train_op


def predict(output):
    return tf.round(output)

def accuracy(output, labels):
    return tf.reduce_mean(tf.to_float(tf.equal(predict(output),labels)))
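
The round trip through logits in `loss` can look odd, so here is a small plain-Python sketch of the arithmetic. It checks that converting a probability to a logit and applying the numerically stable formula documented for `tf.nn.sigmoid_cross_entropy_with_logits` gives the same value as the textbook binary cross-entropy, and it reproduces the rounded accuracy metric. The helper names and sample values are assumptions made for illustration.

```python
import math


def logit(p, epsilon=1e-7):
    """Invert the sigmoid, clipping p away from 0 and 1 first."""
    p = min(max(p, epsilon), 1.0 - epsilon)
    return math.log(p / (1.0 - p))


def xentropy_from_logit(z, label):
    """Numerically stable form: max(z, 0) - z * label + log(1 + exp(-|z|))."""
    return max(z, 0.0) - z * label + math.log1p(math.exp(-abs(z)))


def xentropy_from_prob(p, label):
    """Textbook form: -(y * log(p) + (1 - y) * log(1 - p))."""
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def accuracy(probs, labels):
    """Fraction of predictions that match the labels at a 0.5 threshold."""
    hits = sum(1 for p, y in zip(probs, labels) if (1.0 if p > 0.5 else 0.0) == y)
    return hits / len(labels)


probs = [0.9, 0.2, 0.6, 0.4]   # Hypothetical model outputs.
labels = [1.0, 0.0, 0.0, 1.0]  # Hypothetical labels.

for p, y in zip(probs, labels):
    # Both forms agree up to floating-point error.
    assert abs(xentropy_from_logit(logit(p), y) - xentropy_from_prob(p, y)) < 1e-9

print(accuracy(probs, labels))  # 0.5
```

Working in logit space avoids evaluating log(p) directly for outputs very close to 0 or 1, which is the reason for the clipping step.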

Training

This entire code block represents a single line in Keras...

model.fit(train_data, train_labels, epochs=10, batch_size=32, callbacks=[tensor_board])

So, what's going on here?

  1. Define an input producer to batch samples and shuffle examples between epochs.
  2. Create a FileWriter to write TensorBoard logs.
  3. Iterate over each batch.
    • Print accuracy and loss every 100 steps.
    • Write the loss summary to the log every 100 steps.
  4. Save parameters when done.
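
Step 1 can be pictured with a short plain-Python sketch. It assumes the queue-based pipeline behaves like a continuous stream of sample indices that is reshuffled each epoch, grouped into fixed-size batches, and cut off when the epochs run out, with any final partial batch dropped. The `batches` generator and the toy sizes are illustrative assumptions, not TensorFlow APIs.

```python
import random


def batches(num_samples, num_epochs, batch_size, rng):
    """Yield fixed-size batches of sample indices, reshuffled every epoch."""
    pending = []
    for _ in range(num_epochs):
        order = list(range(num_samples))
        rng.shuffle(order)  # New sample order for each epoch.
        pending.extend(order)
        # Emit full batches; leftovers carry over into the next epoch.
        while len(pending) >= batch_size:
            yield pending[:batch_size]
            pending = pending[batch_size:]
    # Any indices still pending here form a partial batch and are dropped.


rng = random.Random(7)  # Arbitrary seed for repeatability.
steps = list(batches(num_samples=10, num_epochs=2, batch_size=4, rng=rng))
print(len(steps))  # 5
```

Under these assumptions the number of training steps is the total number of samples seen across all epochs divided by the batch size, rounded down, which is why the loop ends on an out-of-range signal rather than a fixed step count.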

In [4]:
import time

log_dir = './logs/tensor_srooms'

num_epochs=10
batch_size=64

with tf.Graph().as_default():
    with tf.name_scope('input'):
        features_initializer = tf.placeholder(dtype=tf.float32, shape=train_data.shape)
        labels_initializer = tf.placeholder(dtype=tf.float32, shape=train_labels.shape)
        input_features = tf.Variable(features_initializer, trainable=False, collections=[])
        input_labels = tf.Variable(labels_initializer, trainable=False, collections=[])

        # Shuffle the training data between epochs and train in batches
        feature, label = tf.train.slice_input_producer([input_features, input_labels], num_epochs=num_epochs)
        features, labels = tf.train.batch([feature, label], batch_size=batch_size)

    # Define layers dimensions
    output = inference(features, 25, 20, 1)
    
    loss_op = loss(output, labels)
    train_op = training(loss_op)

    # Define the metrics op; accuracy() already rounds the output via predict()
    acc_op = accuracy(output, labels)

    # Initialize all variables op
    init_op = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())

    summary_op = tf.summary.merge_all()
    
    # Saver for the weights 
    saver = tf.train.Saver()
    print('create saver')
    
    # Start Session
    sess = tf.Session()
    sess.run(init_op)
    print('session started')
    
    # Load up the data.
    sess.run(input_features.initializer, feed_dict={features_initializer: train_data})
    sess.run(input_labels.initializer, feed_dict={labels_initializer: train_labels})
    print('loaded data')
    
    # Write the summary for tensorboard
    summary_writer = tf.summary.FileWriter(log_dir, sess.graph)
    
    # coordinate reading threads
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    
    try:
        step = 0
        while not coord.should_stop():
            start_time = time.time()

            # Run one step of the model.
            _, loss_value, acc_value = sess.run([train_op, loss_op, acc_op])

            duration = time.time() - start_time

            # Write the summaries and print an overview fairly often.
            if step % 100 == 0:
                # Print status to stdout.
                print('Step %d: loss = %.2f, acc = %.3f (%.3f sec)' % (step, loss_value, acc_value, duration))
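                # Update the events file. Note that this separate run dequeues a
                # fresh batch, so the logged loss comes from a different batch
                # than the one printed above.
                summary_str = sess.run(summary_op)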
                summary_writer.add_summary(summary_str, step)

            step += 1
    except tf.errors.OutOfRangeError:
        print('Saving')
        saver.save(sess, log_dir, global_step=step)
        print('Done training for %d epochs, %d steps.' % (num_epochs, step))
    finally:
        # When done, ask the threads to stop.
        coord.request_stop()

    # Wait for threads to finish.
    coord.join(threads)
    sess.close()


create saver
session started
loaded data
Step 0: loss = 0.88, acc = 0.531 (0.033 sec)
Step 100: loss = 0.37, acc = 0.875 (0.018 sec)
Step 200: loss = 0.26, acc = 0.922 (0.018 sec)
Step 300: loss = 0.26, acc = 0.938 (0.019 sec)
Step 400: loss = 0.18, acc = 0.922 (0.021 sec)
Step 500: loss = 0.08, acc = 1.000 (0.011 sec)
Step 600: loss = 0.11, acc = 0.969 (0.018 sec)
Step 700: loss = 0.04, acc = 1.000 (0.018 sec)
Step 800: loss = 0.07, acc = 1.000 (0.018 sec)
Step 900: loss = 0.04, acc = 1.000 (0.019 sec)
Step 1000: loss = 0.04, acc = 0.984 (0.018 sec)
Saving
Done training for 10 epochs, 1004 steps.
