Feline Neural Network

Author(s): kozyr@google.com, bfoo@google.com

Reviewer(s):

Let's train a basic convolutional neural network to recognize cats.

Setup

Download all of your image sets to the VM. Then set aside a couple thousand training images for debugging.

mkdir -p ~/data/training_images
gsutil -m cp gs://$BUCKET/catimages/training_images/*.png ~/data/training_images/
mkdir -p ~/data/validation_images
gsutil -m cp gs://$BUCKET/catimages/validation_images/*.png ~/data/validation_images/
mkdir -p ~/data/test_images
gsutil -m cp gs://$BUCKET/catimages/test_images/*.png ~/data/test_images/
mkdir -p ~/data/debugging_images
mv ~/data/training_images/000*.png ~/data/debugging_images/
mv ~/data/training_images/001*.png ~/data/debugging_images/
echo "done!"

If you've already trained the model below once, SSH into your VM and run the following: rm -r ~/data/output_cnn_big so that you can start over.



In [0]:

    
# Enter your username:
YOUR_GMAIL_ACCOUNT = '******' # Whatever is before @gmail.com in your email address



In [0]:

    
# Libraries for this section:
import os
import datetime
import numpy as np
import pandas as pd
import cv2
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import tensorflow as tf
from tensorflow.contrib.learn import RunConfig, Experiment
from tensorflow.contrib.learn.python.learn import learn_runner



In [0]:

    
# Directory settings:
TRAIN_DIR = os.path.join('/home', YOUR_GMAIL_ACCOUNT, 'data/training_images/')  # Directory where the training dataset lives.
DEBUG_DIR = os.path.join('/home', YOUR_GMAIL_ACCOUNT, 'data/debugging_images/')  # Directory where the debugging dataset lives.
VALID_DIR = os.path.join('/home', YOUR_GMAIL_ACCOUNT, 'data/validation_images/')  # Directory where the validation dataset lives.
TEST_DIR = os.path.join('/home', YOUR_GMAIL_ACCOUNT, 'data/test_images/')  # Directory where the test dataset lives.
OUTPUT_DIR = os.path.join('/home', YOUR_GMAIL_ACCOUNT, 'data/output_cnn_big/')  # Directory where we store our logging and models.

# TensorFlow setup:
NUM_CLASSES = 2  # This code can be generalized beyond 2 classes (binary classification).
QUEUE_CAP = 5000  # Number of images the TensorFlow queue can store during training.
# For debugging, QUEUE_CAP is ignored in favor of using all images available.
TRAIN_BATCH_SIZE = 500  # Number of images processed every training iteration.
DEBUG_BATCH_SIZE = 100  # Number of images processed every debugging iteration.
TRAIN_STEPS = 3000 # Number of batches to use for training.
DEBUG_STEPS = 2 # Number of batches to use for debugging.
# Example: If dataset is 5 batches ABCDE, train_steps = 2 uses AB, train_steps = 7 uses ABCDEAB).

# Monitoring setup:
TRAINING_LOG_PERIOD_SECS = 60  # How often we want to log training metrics (from training hook in our model_fn).
CHECKPOINT_PERIOD_SECS = 60  # How often we want to save a checkpoint.

# Hyperparameters we'll tune in the tutorial:
DROPOUT = 0.6  # Regularization parameter for neural networks - must be between 0 and 1.

# Additional hyperparameters:
LEARNING_RATE = 0.001  # Rate at which weights update.
CNN_KERNEL_SIZE = 3  # Receptive field will be square window with this many pixels per side.
CNN_STRIDES = 2  # Distance between consecutive receptive fields.
CNN_FILTERS = 16  # Number of filters (new receptive fields to train, i.e. new channels) in first convolutional layer.
FC_HIDDEN_UNITS = 512  # Number of hidden units in the fully connected layer of the network.

Let's visualize what we're working with and get the pixel count for our images. They should be square for this to work, but luckily we padded them with black pixels where needed previously.



In [0]:

    
def show_inputs(dir, filelist=None, img_rows=1, img_cols=3, figsize=(20, 10)):
  """Display the first few images.
  
  Args:
    dir: directory where the files are stored
    filelist: list of filenames to pull from, if left as default, all files will be used
    img_rows: number of rows of images to display
    img_cols: number of columns of images to display
    figsize: sizing for inline plots
    
  Returns:
    pixel_dims: pixel dimensions (height and width) of the image
  """
  if filelist is None:
    filelist = os.listdir(dir)  # Grab all the files in the directory
  filelist = np.array(filelist)
  plt.close('all')
  fig = plt.figure(figsize=figsize)
  print('File names:')

  for i in range(img_rows * img_cols):
    print(str(filelist[i]))
    a=fig.add_subplot(img_rows, img_cols,i + 1)
    img = mpimg.imread(os.path.join(dir, str(filelist[i])))
    plt.imshow(img)
  plt.show()
  return np.shape(img)



In [0]:

    
pixel_dim = show_inputs(TRAIN_DIR)
print('Images have ' + str(pixel_dim[0]) + 'x' + str(pixel_dim[1]) + ' pixels.')
pixels = pixel_dim[0] * pixel_dim[1]

Step 5 - Get tooling for training convolutional neural networks

Here is where we enable training convolutional neural networks on data inputs like ours. We'll build it using a TensorFlow estimator. TensorFlow (TF) is designed for scale, which means it doesn't pull all our data into memory all at once, but instead it's all about lazy execution. We'll write functions which it will run when it's efficient to do so. TF will pull in batches of our image data and run the functions we wrote.

In order to make this work, we need to write code for the following:

Input function: generate_input_fn()
Neural network architecture: cnn()
Model function: generate_model_fn()
Estimator: tf.estimator.Estimator()
Experiment: generate_experiment_fn()
Prediction generator: cat_finder()

Input function

The input function tells TensorFlow what format of feature and label data to expect. We'll set ours up to pull in all images in a directory we point it at. It expects images with filenames in the following format: number_number_label.extension, so if your file naming scheme is different, please edit the input function.



In [0]:

    
# Input function:
def generate_input_fn(dir, batch_size, queue_capacity):
  """Return _input_fn for use with TF Experiment.
  
  Will be called in the Experiment section below (see _experiment_fn).
  
  Args:
    dir: directory we're taking our files from, code is written to collect all files in this dir.
    batch_size: number of rows ingested in each training iteration.
    queue_capacity: number of images the TF queue can store.
    
  Returns:
    _input_fn: a function that returns a batch of images and labels.
  """

  file_pattern = os.path.join(dir, '*')  # We're pulling in all files in the directory.

  def _input_fn():
    """A function that returns a batch of images and labels.

    Args:
      None

    Returns:
      image_batch: 4-d tensor collection of images.
      label_batch: 1-d tensor of corresponding labels.
    """

    height, width, channels = [pixel_dim[0], pixel_dim[1], 3]  # [height, width, 3] because there are 3 channels per image.
    filenames_tensor = tf.train.match_filenames_once(file_pattern)  # Collect the filenames
    # Queue that periodically reads in images from disk:
    # When ready to run iteration, TF will take batch_size number of images out of filename_queue.
    filename_queue = tf.train.string_input_producer(
      filenames_tensor,
      shuffle=False)  # Do not shuffle order of the images ingested.
    # Convert filenames from queue into contents (png images pulled into memory):
    reader = tf.WholeFileReader()
    filename, contents = reader.read(filename_queue)
    # Decodes contents pulled in into 3-d tensor per image:
    image = tf.image.decode_png(contents, channels=channels)
    # If dimensions mismatch, pad with zeros (black pixels) or crop to make it fit:
    image = tf.image.resize_image_with_crop_or_pad(image, height, width)
    # Parse out label from filename:
    label = tf.string_to_number(tf.string_split([tf.string_split([filename], '_').values[-1]], '.').values[0])
    # All your filenames should be in this format number_number_label.extension where label is 0 or 1.
    # Execute above in a batch of batch_size to create a 4-d tensor of collection of images:
    image_batch, label_batch = tf.train.batch(
      [image, label],
      batch_size,
      num_threads=1,  # We'll decline the multithreading option so that everything stays in filename order.
      capacity=queue_capacity)
    # Normalization for better training:
    # Change scale from pixel uint8 values between 0 and 255 into normalized float32 values between 0 and 1:
    image_batch = tf.to_float(image_batch) / 255
    # Rescale from (0,1) to (-1,1) so that the "center" of the image range is 0:
    image_batch = (image_batch * 2) - 1
    return image_batch, label_batch
  return _input_fn

Neural network architecture

This is where we define the architecture of the neural network we're using, such are the number of hidden layers and units.



In [0]:

    
# CNN architecture:
def cnn(features, dropout, reuse, is_training):
  """Defines the architecture of the neural network.
  
  Will be called within generate_model_fn() below.
  
  Args: 
    features: feature data as 4-d tensor (of batch_size) pulled in when_input_fn() is executed.
    dropout: regularization parameter in last layer (between 0 and 1, exclusive).
    reuse: a scoping safeguard. First time training: set to False, after that, set to True.
    is_training: if True then fits model and uses dropout, if False then doesn't consider the dropout
    
  Returns:
    2-d tensor: each images [logit(1-p), logit(p)] where p=Pr(1),
                i.e. probability that class is 1 (cat in our case).
                Note: logit(p) = logodds(p) = log(p / (1-p))
  """

  # Next, we define a scope for reusing our variables, choosing our network architecture and naming our layers.
  with tf.variable_scope('cnn', reuse=reuse):
    layer_1 = tf.layers.conv2d(  # 2-d convolutional layer; size of output image is (pixels/stride) a side with channels = filters.
      inputs=features,  # previous layer (inputs) is features argument to the main function
      kernel_size=CNN_KERNEL_SIZE,  # 3x3(x3 because we have 3 channels) receptive field (only square ones allowed)
      strides=CNN_STRIDES,  # distance between consecutive receptive fields
      filters=CNN_FILTERS,  # number of receptive fields to train; think of this as a CNN_FILTERS-channel image which is input to next layer)
      padding='SAME',  # SAME uses zero padding if not all CNN_KERNEL_SIZE x CNN_KERNEL_SIZE positions are filled, VALID will ignore missing
      activation=tf.nn.relu)  # activation function is ReLU which is f(x) = max(x, 0) 
    
    # For simplicity, this neural network doubles the number of receptive fields (filters) with each layer.
    # By using more filters, we are able to preserve the spatial dimensions better by storing more information.
    #
    # To determine how much information is preserved by each layer, consider that with each layer,
    # the output width and height is decimated by the `strides` value.
    # When strides=2 for example, the input width W and height H is reduced by 2x, resulting in
    # an "image" (formally, an activation field) for each filter output with dimensions W/2 x H/2.
    # By doubling the number of filters compared to the input number of filters, the total output
    # dimension becomes W/2 x H/2 x CNN_FILTERS*2, essentially compressing the input of the layer
    # (W x H x CNN_FILTERS) to half as many total "pixels" (hidden units) at the output.
    #
    # On the other hand, increasing the number of filters will also increase the training time proportionally,
    # as there are that more weights and biases to train and convolutions to perform.
    #
    # As an exercise, you can play around with different numbers of filters, strides, and kernel_sizes.
    # To avoid very long training time, make sure to keep kernel sizes small (under 5),
    # strides at least 2 but no larger than kernel sizes (or you will skip pixels),
    # and bound the number of filters at each level (no more than 512).
    #
    # When modifying these values, it is VERY important to keep track of the size of your layer outputs,
    # i.e. the number of hidden units, since the final layer will need to be flattened into a 1D vector with size
    # equal to the total number of hidden units. For this reason, using strides that are divisible by the width
    # and height of the input may be the easiest way to avoid miscalculations from rounding.
    layer_2 = tf.layers.conv2d(
      inputs=layer_1,
      kernel_size=CNN_KERNEL_SIZE,
      strides=CNN_STRIDES,
      filters=CNN_FILTERS * (2 ** 1),  # Double the number of filters from previous layer
      padding='SAME',
      activation=tf.nn.relu)
    
    layer_3 = tf.layers.conv2d(
      inputs=layer_2,
      kernel_size=CNN_KERNEL_SIZE,
      strides=CNN_STRIDES,
      filters=CNN_FILTERS * (2 ** 2),  # Double the number of filters from previous layer
      padding='SAME',
      activation=tf.nn.relu)
    
    layer_4 = tf.layers.conv2d(
      inputs=layer_3,
      kernel_size=CNN_KERNEL_SIZE,
      strides=CNN_STRIDES,
      filters=CNN_FILTERS * (2 ** 3),  # Double the number of filters from previous layer
      padding='SAME',
      activation=tf.nn.relu)
    
    layer_5 = tf.layers.conv2d(
      inputs=layer_4,
      kernel_size=CNN_KERNEL_SIZE,
      strides=CNN_STRIDES,
      filters=CNN_FILTERS * (2 ** 4),  # Double the number of filters from previous layer
      padding='SAME',
      activation=tf.nn.relu)
    
    layer_5_flat = tf.reshape(  # Flattening to 2-d tensor (1-d per image row for feedforward fully-connected layer)
      layer_5, 
      shape=[-1, # Reshape final layer to 1-d tensor per image.
             CNN_FILTERS * (2 ** 4) *  # Number of filters (depth), times...
             pixels  / (CNN_STRIDES ** 5) / (CNN_STRIDES ** 5)])  # Number of hidden units per filter (input pixels / width decimation / height decimation)
    
    dense_layer= tf.layers.dense(  # fully connected layer
      inputs=layer_5_flat,
      units=FC_HIDDEN_UNITS,  # number of hidden units
      activation=tf.nn.relu)
    
    dropout_layer = tf.layers.dropout(  # Dropout layer randomly keeps only dropout*100% of the dense layer's hidden units in training and autonormalizes during prediction.
      inputs=dense_layer, 
      rate=dropout,
      training=is_training)  

    return tf.layers.dense(inputs=dropout_layer, units=NUM_CLASSES)  # 2-d tensor: [logit(1-p), logit(p)] for each image in batch.

Model function

The model function tells TensorFlow how to call the model we designed above and what to do when we're in training vs evaluation vs prediction mode. This is where we define the loss function, the optimizer, and the performance metric (which we picked in Step 1).



In [0]:

    
# Model function:
def generate_model_fn(dropout):
  """Return a function that determines how TF estimator operates.

  The estimator has 3 modes of operation:
  * train (fitting and updating the model)
  * eval (collecting and returning validation metrics)
  * predict (using the model to label unlabeled images)

  The returned function _cnn_model_fn below determines what to do depending
  on the mode of operation, and returns specs telling the estimator what to
  execute for that mode.

  Args:
    dropout: regularization parameter in last layer (between 0 and 1, exclusive)

  Returns:
    _cnn_model_fn: a function that returns specs for use with TF estimator
  """

  def _cnn_model_fn(features, labels, mode):
    """A function that determines specs for the TF estimator based on mode of operation.
    
    Args: 
      features: actual data (which goes into scope within estimator function) as 4-d tensor (of batch_size),
                pulled in via tf executing _input_fn(), which is the output to generate_input_fn() and is in memory
      labels: 1-d tensor of 0s and 1s
      mode: TF object indicating whether we're in train, eval, or predict mode.
      
    Returns:
           estim_specs: collections of metrics and tensors that are required for training (e.g. prediction values, loss value, train_op tells model weights how to update)
    """

    # Use the cnn() to compute logits:
    logits_train = cnn(features, dropout, reuse=False, is_training=True)
    logits_eval = cnn(features, dropout, reuse=True, is_training=False)
    # We'll be evaluating these later.

    # Transform logits into predictions:
    pred_classes = tf.argmax(logits_eval, axis=1)  # Returns 0 or 1, whichever has larger logit.
    pred_prob = tf.nn.softmax(logits=logits_eval)[:, 1]  # Applies softmax function to return 2-d probability vector.
    # Note: we're not outputting pred_prob in this tutorial, that line just shows you
    # how to get it if you want it. Softmax[i] = exp(logit[i]) / sum(exp((logit[:]))

    # If we're in prediction mode, early return predicted class (0 or 1):
    if mode == tf.estimator.ModeKeys.PREDICT:
      return tf.estimator.EstimatorSpec(mode, predictions=pred_classes)

    # If we're not in prediction mode, define loss function and optimizer.

    # Loss function:
    # This is what the algorithm minimizes to learn the weights.
    # tf.reduce_mean() just takes the mean over a batch, giving back a scalar.
    # Inside tf.reduce_mean() we'll select any valid binary loss function we want to use.
    loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
      logits=logits_train, labels=tf.cast(labels, dtype=tf.int32)))

    # Optimizer:
    # This is the scheme the algorithm uses to update the weights.
    # AdamOptimizer is adaptive moving average, feel free to replace with one you prefer.
    optimizer = tf.train.AdamOptimizer(learning_rate=LEARNING_RATE)

    # The minimize method below doesn't minimize anything, it just takes a step.
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())

    # Performance metric:
    # Should be whatever we chose as we defined in Step 1.  This is what you said you care about!
    # This output is for reporting only, it is not optimized directly.
    acc = tf.metrics.accuracy(labels=labels, predictions=pred_classes)

    # Hooks - pick what to log and show:
    # Hooks are designed for monitoring; every time TF writes a summary, it'll append these.
    logging_hook = tf.train.LoggingTensorHook({
      'x-entropy loss': loss,
      'training accuracy': acc[0],
    }, every_n_secs=TRAINING_LOG_PERIOD_SECS)

    # Stitch everything together into the estimator specs, which we'll output here so it can
    # later be passed to tf.estimator.Estimator()
    estim_specs = tf.estimator.EstimatorSpec(
      mode=mode,
      predictions=pred_classes,
      loss=loss,
      train_op=train_op,
      training_hooks=[logging_hook],
      eval_metric_ops={
        'accuracy':  acc,  # This line is Step 7!
      }
    )

    # TF estim_specs defines a huge dict that stores different metrics and operations for useby TF Estimator.
    # This gives you the interaction between your architecture in cnn() and the weights, etc. in the current iteration which
    # will be used as input in the next iteration.
    return estim_specs

  return _cnn_model_fn

TF Estimator

This is where it all comes together: TF Estimator takes in as input everything we've created thus far and when executed it will output everything that is necessary for training (fits a model), evaluation (outputs metrics), or prediction (outputs predictions).



In [0]:

    
# TF Estimator:
# WARNING: Don't run this block of code more than once without first changing OUTPUT_DIR.
estimator = tf.estimator.Estimator(
  model_fn=generate_model_fn(DROPOUT),  # Call our generate_model_fn to create model function
  model_dir=OUTPUT_DIR,  # Where to look for data and also to paste output.
  config=RunConfig(
    save_checkpoints_secs=CHECKPOINT_PERIOD_SECS,
    keep_checkpoint_max=20,
    save_summary_steps=100,
    log_step_count_steps=100)
)

TF Experiment

A TF Experiment defines how to run your TF estimator during training and debugging only. TF Experiments are not necessary for prediction once training is complete.

TERMINOLOGY WARNING: The word "experiment" here is not used the way it is used by typical scientists and statisticians.



In [0]:

    
# TF Experiment:
def experiment_fn(output_dir):
    """Create _experiment_fn which returns a TF experiment

    To be used with learn_runner, which we imported from tf.

    Args: 
      output_dir: which is where we write our models to.
    Returns: 
      a TF Experiment
    """

    return Experiment(
      estimator=estimator,  # What is the estimator?
      train_input_fn=generate_input_fn(TRAIN_DIR, TRAIN_BATCH_SIZE, QUEUE_CAP),  # Generate input function designed above.
      eval_input_fn=generate_input_fn(DEBUG_DIR, DEBUG_BATCH_SIZE, QUEUE_CAP),
      train_steps=TRAIN_STEPS,  # Number of batches to use for training.
      eval_steps=DEBUG_STEPS, # Number of batches to use for eval.
      min_eval_frequency=1, # Run eval once every min_eval_frequency number of checkpoints.
      local_eval_frequency=1
    )

Step 6 - Train a model!

Let's run our lovely creation on our training data. In order to train, we need learn_runner(), which we imported from TensorFlow above. For prediction, we will only need estimator.predict().



In [0]:

    
# Enable TF verbose output:
tf.logging.set_verbosity(tf.logging.INFO)
start_time = datetime.datetime.now()
print('It\'s {:%H:%M} in London'.format(start_time) + ' --- Let\'s get started!')
# Let the learning commence!  Run the TF Experiment here.
learn_runner.run(experiment_fn, OUTPUT_DIR)
# Output lines using the word "Validation" are giving our metric on the non-training dataset (from DEBUG_DIR).
end_time = datetime.datetime.now()
print('\nIt was {:%H:%M} in London when we started.'.format(start_time))
print('\nWe\'re finished and it\'s {:%H:%M} in London'.format(end_time))
print('\nCongratulations!  Training is complete!')



In [0]:

    
print('\nIt was {:%H:%M} in London when we started.'.format(start_time))
print('\nWe\'re finished and it\'s {:%H:%M} in London'.format(end_time))
print('\nCongratulations!  Training is complete!')



In [0]:

    
# Observed labels from filenames:
def get_labels(dir):
  """Get labels from filenames.
  
  Filenames must be in the following format: number_number_label.png
  
  Args:
    dir: directory containing image files
    
  Returns:
    labels: 1-d np.array of binary labels
  """
  filelist = os.listdir(dir)  # Use all the files in the directory
  labels = np.array([])
  for f in filelist:
    split_filename = f.split('_')
    label = int(split_filename[-1].split('.')[0])
    labels = np.append(labels, label)
  return labels



In [0]:

    
# Cat_finder function for getting predictions:
def cat_finder(dir, model_version):
  """Get labels from model.

  Args:
    dir: directory containing image files

  Returns:
    predictions: 1-d np array of binary labels
  """

  num_predictions = len(os.listdir(dir))
  predictions = []  # Initialize array.

  # Estimator.predict() returns a generator g. Call next(g) to retrieve the next value.
  prediction_gen = estimator.predict(
    input_fn=generate_input_fn(dir=dir,
                               batch_size=TRAIN_STEPS,
                               queue_capacity=QUEUE_CAP
                              ),
    checkpoint_path=model_version
  )

  # Use generator to ensure ordering is preserved and predictions match order of validation_labels:
  i = 1
  for pred in range(0, num_predictions):
    predictions.append(next(prediction_gen)) #Append the next value of the generator to the prediction array
    i += 1
    if i % 1000 == 0:
      print('{:d} predictions completed (out of {:d})...'.format(i, len(os.listdir(dir))))
  print('{:d} predictions completed (out of {:d})...'.format(len(os.listdir(dir)), len(os.listdir(dir))))
  return np.array(predictions)

Get training accuracy



In [0]:

    
def get_accuracy(truth, predictions, threshold=0.5, roundoff = 2):
  """Compares labels with model predictions and returns accuracy.
  
  Args:
    truth: can be bool (False, True), int (0, 1), or float (0, 1)
    predictions: number between 0 and 1, inclusive
    threshold: we convert the predictions to 1s if they're above this value
    roundoff: report accuracy to how many decimal places?
    
  Returns:
    accuracy: number correct divided by total predictions
  """
  truth = np.array(truth) == (1|True)
  predicted = np.array(predictions) >= threshold
  matches = sum(predicted == truth)
  accuracy = float(matches) / len(truth)
  return round(accuracy, roundoff)

Get predictions and performance metrics

Create functions for outputting observed labels, predicted labels, and accuracy. Filenames must be in the following format: number_number_label.extension



In [0]:

    
files = os.listdir(TRAIN_DIR)
model_version = OUTPUT_DIR + 'model.ckpt-' + str(TRAIN_STEPS)
observed = get_labels(TRAIN_DIR)
predicted = cat_finder(TRAIN_DIR, model_version)



In [0]:

    
print('Training accuracy is ' + str(get_accuracy(observed, predicted)))

Step 7 - Debugging and Tuning

Debugging

It's worth taking a look to see if there's something special about the images we misclassified.



In [0]:

    
files = os.listdir(DEBUG_DIR)
predicted = cat_finder(DEBUG_DIR, model_version)
observed = get_labels(DEBUG_DIR)



In [0]:

    
print('Debugging accuracy is ' + str(get_accuracy(observed, predicted)))



In [0]:

    
df = pd.DataFrame({'files': files, 'predicted': predicted, 'observed': observed})
hit = df.files[df.observed == df.predicted]
miss = df.files[df.observed != df.predicted]



In [0]:

    
# Show successful classifications:
show_inputs(DEBUG_DIR, hit, 3)



In [0]:

    
# Show unsuccessful classifications:
show_inputs(DEBUG_DIR, miss, 3)

Step 8 - Validation

Apply cat_finder() to the validation dataset. Since this is validation, we'll only look at the final performance metric (accuracy) and nothing else.



In [0]:

    
files = os.listdir(VALID_DIR)
predicted = cat_finder(VALID_DIR, model_version)
observed = get_labels(VALID_DIR)
print('\nValidation accuracy is ' + str(get_accuracy(observed, predicted)))

Step 9 - Statistical Testing

Apply cat_finder() to the test dataset ONE TIME ONLY. Since this is testing, we'll only look at the final performance metric (accuracy) and the results of the statistical hypothesis test.



In [0]:

    
# Hypothesis test we'll use:
from statsmodels.stats.proportion import proportions_ztest

# Testing setup:
SIGNIFICANCE_LEVEL = 0.05
TARGET_ACCURACY = 0.80



In [0]:

    
files = os.listdir(TEST_DIR)
predicted = cat_finder(TEST_DIR, model_version)
observed = get_labels(TEST_DIR)
print('\nTest accuracy is ' + str(get_accuracy(observed, predicted, roundoff=4)))



In [0]:

    
# Using standard notation for a one-sided test of one population proportion:
n = len(predicted)
x = round(get_accuracy(observed, predicted, roundoff=4) * n)
p_value = proportions_ztest(count=x, nobs=n, value=TARGET_ACCURACY, alternative='larger')[1]
if p_value < SIGNIFICANCE_LEVEL:
    print('Congratulations! Your model is good enough to build. It passes testing. Awesome!')
else:
    print('Too bad.  Better luck next project.  To try again, you need a pristine test dataset.')