Transfer Learning using Inception V3 Model and Convolutional Autoencoder on STL-10 Dataset

Introduction

Deep learning is becoming one of the most sought-after skills in industry. Many companies are building AI into their products, and new markets are emerging in self-driving vehicles, image-based product recognition and recommendation, face recognition for Snapchat filters, and many more. All of these applications share one major requirement: data, and lots of it. More precisely, the models need labeled data for training. Obtaining such a dataset runs into two obstacles:

  1. Labeled data is hard to obtain because only a limited amount is available
  2. Labeled data is expensive to acquire because humans (for example, via Amazon Mechanical Turk) must label it manually

Even if a labeled dataset were obtained, training a sophisticated model to do the job would take a lot of time, money, and resources. The workarounds are simple.

  1. To overcome the data scarcity: Unlabeled images of almost any class are easy to obtain from the internet and are abundant. Once an unlabeled dataset of similar, correlated classes is obtained, the next step is to build and train a model that only learns features from the images (it does not classify them). After the weights of this model are learned, the limited labeled dataset can be fed to it to classify the images.

  2. To overcome the limitation of resources and time: Pre-trained models such as Google's Inception model and the VGG16 model are trained extensively on high-power computers for weeks on the ImageNet dataset and can predict 1000 classes for a given image. Most of the predictions made by these models are accurate due to the depth of the model. These models can be modified slightly (discussed below) to classify images for custom purposes with far higher accuracy, with as little computation as possible, and in a short amount of time. This method also makes up for a limited labeled dataset, because the feature-extraction weights do not have to be learned from scratch.

The methods listed above are called "Transfer Learning", and they help mitigate the two obstacles described earlier. Hence the motivation to take up this challenge. This notebook deals with transfer learning on Stanford's STL-10 dataset using Google's Inception model.

The dataset can be found here: https://cs.stanford.edu/~acoates/stl10/

The source codes can be found here: https://github.com/skalkur/Transfer_learning_Startup.ML

Required libraries: TensorFlow (v1.2.1) and NumPy, plus the pre-trained Inception V3 model

Inception V3 Model and how to perform Transfer Learning on it

The graph of the Inception model is shown in the figure below [1].

The Inception V3 model is a deep convolutional network trained on the ImageNet dataset that can classify an image into 1000 classes. It contains many convolution layers, average pooling, max pooling, and dropout, followed by a fully connected layer and a Softmax layer for classification. The layer of interest for transfer learning is the final pooling layer just before the Dense and Softmax layers. Up to this layer, named 'pool_3', the model performs only feature extraction, which is the computationally expensive part to train because it has to be tuned to the input images.

The final two layers (Dense and Softmax) are replaced in this project so that the model predicts the 10 classes of STL-10 rather than ImageNet's 1000 classes. This slight modification yields a much better accuracy than a regular CNN trained from scratch could achieve with so few training examples.

The Steps which are followed to achieve the Final results are as follows:

Step 1: Download the Inception V3 model and extract the Model Graph


In [ ]:
from tensorflow.python.platform import gfile
import tensorflow as tf
import numpy as np

model='../inception/classify_image_graph_def.pb'

def create_graph():
    
    '''
    Function to extract GraphDef of Inception model.
    Returns: Extracted GraphDef
    
    '''    
    with tf.Session() as sess:
        with gfile.FastGFile(model,'rb') as f:
            graph_def=tf.GraphDef()
            graph_def.ParseFromString(f.read())
            _=tf.import_graph_def(graph_def,name='')
            
    return sess.graph

This loads Inception's graph into TensorFlow's default graph.

Step 2: Bottleneck the Training and Testing Images

To perform transfer learning, we need to perform bottlenecking. Bottlenecking is the process in which every image is fed to the Inception model and the output is taken from an intermediate bottleneck layer in the graph rather than from the output layer. In this case, we run every image up to the pool_3 layer, as that is the last layer performing feature extraction. In simpler terms, we "freeze" the model up to this point. The code cell below shows a method that takes in a batch of images and outputs the bottlenecked version of them.


In [ ]:
def batch_pool3_features(sess,X_input):
    
    '''
    Function to extract features for a given batch of images by
    passing it through Inception model until pool_3 layer to get bottlenecks
    
    Args: Current Session, Batch of Images of size:batch_sizex96x96x3
    Returns: Array of 2048 features extracted for every image by Inception
    '''
    n_train=X_input.shape[0]
    pool3=sess.graph.get_tensor_by_name('pool_3:0')
    x_pool3=[]
    for i in range(n_train):
        print ("Iteration: "+str(i))
        features=sess.run(pool3,{'DecodeJpeg:0':X_input[i,:]})
        x_pool3.append(np.squeeze(features))
    return np.array(x_pool3)

This is done for all the images (Train and Test) and they are saved as serialized Numpy arrays.


In [ ]:
def bottleneck_data(sess):
    
    '''
    Function to load STL data and process them to bottleneck them
    
    Args: TensorFlow session
    '''
    X_train, Y_train, X_test, Y_test=load_stl_data(one_hot=True)
    bottleneck_pool3(sess,X_train, './X_train.npy')
    bottleneck_pool3(sess,X_test, './X_test.npy')
    np.save('./Y_train.npy',Y_train)
    np.save('./Y_test.npy',Y_test)
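
The cell above calls `load_stl_data`, which is not shown in this notebook. As a rough sketch only, and assuming the binary release described on the dataset page (uint8 values, three 96x96 channels per image in column-major order, labels stored as 1 to 10), the decoding and one-hot steps might look like the following. The helper names `decode_stl_images`, `read_stl_images`, and `to_one_hot` are hypothetical and are not taken from the original source code.

```python
import numpy as np

def decode_stl_images(raw):
    """Turn a flat uint8 buffer into images of shape [N, 96, 96, 3]."""
    # Assumed layout: per image, 3 channels of 96x96 values, column-major.
    images = raw.reshape(-1, 3, 96, 96)
    # Move channels last and swap the two spatial axes to get row-major images.
    return np.transpose(images, (0, 3, 2, 1))

def read_stl_images(path):
    """Read an STL-10 binary image file (hypothetical helper)."""
    return decode_stl_images(np.fromfile(path, dtype=np.uint8))

def to_one_hot(labels, n_classes=10):
    """Convert labels in 1..n_classes to one-hot rows."""
    labels = np.asarray(labels, dtype=np.int64)
    one_hot = np.zeros((labels.shape[0], n_classes), dtype=np.float32)
    one_hot[np.arange(labels.shape[0]), labels - 1] = 1.0
    return one_hot

# Tiny synthetic check: two fake images and three labels.
fake_raw = np.zeros(2 * 3 * 96 * 96, dtype=np.uint8)
fake_images = decode_stl_images(fake_raw)   # shape (2, 96, 96, 3)
fake_labels = to_one_hot([1, 10, 3])        # shape (3, 10)
```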

This step can take considerable time when the dataset is large (which partially defeats the purpose), so it is best to save the results to disk and make it a one-time process. Each example shrinks from $None\times96\times96\times3$ to $None\times2048$, a substantial reduction in data size: what was a 3-channel $96\times96$ image is now represented by a 2048-dimensional feature vector extracted by Inception. This paves the way for a simple neural network to handle the classification task.
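
The one-time nature of this step can be made explicit with a small cache check. The sketch below is an illustration under assumptions, not the notebook's own helper: `cached_features` is a hypothetical name, and `fake_extractor` is a stand-in for the real Inception pass so that the example runs without TensorFlow.

```python
import os
import tempfile
import numpy as np

def cached_features(cache_path, images, extract_fn):
    """Return features for `images`, running `extract_fn` only on a cache miss.

    `cache_path` should end in '.npy', since np.save appends that suffix otherwise.
    """
    if os.path.exists(cache_path):
        return np.load(cache_path)
    features = extract_fn(images)
    np.save(cache_path, features)
    return features

# Stand-in extractor for illustration only: averages each image down to 3 values.
calls = []
def fake_extractor(images):
    calls.append(len(images))
    return images.reshape(len(images), -1, 3).mean(axis=1)

demo_images = np.ones((4, 96, 96, 3), dtype=np.float32)
demo_path = os.path.join(tempfile.mkdtemp(), 'X_demo.npy')
first = cached_features(demo_path, demo_images, fake_extractor)   # computes and saves
second = cached_features(demo_path, demo_images, fake_extractor)  # loads from disk
```

With the real extractor, the second call would skip the expensive forward passes entirely.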

Step 3: Create a Final Training Layer for Classification and Evaluation method

With Inception having done its job of feature extraction and bottlenecking, it is now time for us to take over and perform the classification task among the 10 classes of the STL-10 dataset. A fully connected model of structure 2048-1024-512-10 is constructed. To reduce overfitting given the low-dimensional inputs and the small amount of training data, a Dropout layer with a retention probability of 0.75 is added after every layer. The intermediate layers use a Tanh activation function and the final output layer uses a Softmax activation. The loss function used is Cross Entropy, defined by: $$Loss = -\frac{1}{n_{examples}} \sum\limits_x \left(y_{\mathrm{ground\ truth}} \ln y_{\mathrm{pred}} + (1-y_{\mathrm{ground\ truth}}) \ln (1-y_{\mathrm{pred}})\right)$$

The input dimensions are batch_size$\times$2048 and the output is a one-hot encoded representation of the 10 classes, so the output layer has 10 neurons. The optimizer used is the regular Gradient Descent Optimizer.
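
For reference, the code cell below calls `tf.nn.softmax_cross_entropy_with_logits`, which computes the categorical form of cross entropy, $-\sum_k y_k \ln p_k$ per example, with $p$ the Softmax of the logits. A minimal NumPy sketch of that computation follows; the function names are illustrative and not part of the notebook.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_cross_entropy(logits, one_hot_labels):
    """Mean over examples of -sum_k y_k * ln(p_k)."""
    probs = softmax(logits)
    per_example = -np.sum(one_hot_labels * np.log(probs + 1e-12), axis=1)
    return per_example.mean()

# An untrained classifier that outputs equal logits for all 10 classes
# assigns probability 0.1 to the true class, so the loss is ln(10).
demo_logits = np.zeros((4, 10))
demo_labels = np.eye(10)[[0, 3, 5, 9]]
demo_loss = categorical_cross_entropy(demo_logits, demo_labels)
```

The value ln(10), roughly 2.30, is a useful sanity check for the cost printed at the very start of training on a 10-class problem.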


In [ ]:
BOTTLENECK_TENSOR_NAME='pool_3'
BOTTLENECK_TENSOR_SIZE=2048

In [ ]:
def add_final_training_layer(class_count, final_tensor_name,\
                             ground_truth_tensor_name, learning_rate=1e-3):
    
    '''
    Function to define the FC, Softmax classifier model to Classify the serialized
    images. Has Gradient Descent Optimizer.
    Includes Dropout layers and a 2048-1024-512-10 network
    
    Args: No. of classes, final tensor name of the FC network, 
    Ground Truth Tensor name, Learning rate for Optimizer
    
    Returns: Train Op and Cost of the model
    '''
    layers=[1024, 512, 10]
    keep_prob=0.75
    bottleneck_input=tf.placeholder(tf.float32,\
                                    shape=[None, BOTTLENECK_TENSOR_SIZE], name='BottleneckInput')
    currentInput=bottleneck_input
    n_input=BOTTLENECK_TENSOR_SIZE
    for layer, output_size in enumerate(layers):
        with tf.variable_scope('fc/layer{}'.format(layer)):
            W=tf.get_variable(name='W', shape=[n_input, output_size], \
                              initializer=tf.random_normal_initializer(mean=0.0,stddev=0.01))
            b=tf.get_variable(name='b',shape=[output_size],\
                              initializer=tf.constant_initializer([0]))
            h=tf.matmul(currentInput,W)+b
            n_input=output_size
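            # Hidden layers use Tanh; the last layer exposes a Softmax output
            # (named final_tensor_name) that the evaluation step reads
            if output_size!=layers[2]:
                h=tf.nn.tanh(h,name='h')
            else:
                final_tensor=tf.nn.softmax(h, name=final_tensor_name)
            # Dropout follows every layer, including the final logits; keep_prob
            # is a constant, so dropout stays active whenever the graph is run
            h=tf.nn.dropout(h,keep_prob)
            currentInput=h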

    Y=tf.placeholder(tf.float32, shape=[None,class_count],\
                     name=ground_truth_tensor_name)
    cross_entropy=tf.nn.softmax_cross_entropy_with_logits(logits=h, labels=Y)
    cost=tf.reduce_mean(cross_entropy)
    train_step=tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
    
    return train_step, cost

For the evaluation and monitoring of the model's performance throughout training and the final test accuracy, the following method is employed.


In [ ]:
def evaluation_step(graph, final_tensor_name, ground_truth_tensor_name):
    
    '''
    Function to evaluate the performance of the model by calculating the 
    accuracy of prediction
    
    Args: Final Tensor and Ground Truth Tensor Name, TensorFlow Graph
    
    Return: Evaluation Tensor
    '''
    result_tensor=graph.get_tensor_by_name(ensure_port(final_tensor_name))
    Y_tensor=graph.get_tensor_by_name(ensure_port(ground_truth_tensor_name))
    correct_pred=tf.equal(tf.argmax(result_tensor,1),tf.argmax(Y_tensor,1))
    
    eval_step=tf.reduce_mean(tf.cast(correct_pred,'float'))
    return eval_step
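
The cell above relies on an `ensure_port` helper that is not shown in this notebook. A plausible sketch, assuming its job is to append the `:0` output index that `get_tensor_by_name` expects, is given below together with a NumPy version of the same accuracy computation. Both functions are illustrations rather than the original implementation.

```python
import numpy as np

def ensure_port(tensor_name):
    """Append ':0' when the name carries no output index (assumed behavior)."""
    if ':' not in tensor_name:
        return tensor_name + ':0'
    return tensor_name

def accuracy(predicted_probs, one_hot_labels):
    """Fraction of rows whose arg-max prediction matches the arg-max label."""
    predicted = np.argmax(predicted_probs, axis=1)
    actual = np.argmax(one_hot_labels, axis=1)
    return float(np.mean(predicted == actual))

demo_probs = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.8, 0.1],
                       [0.3, 0.3, 0.4],
                       [0.6, 0.3, 0.1]])
demo_truth = np.eye(3)[[0, 1, 2, 1]]          # the last example is misclassified
demo_accuracy = accuracy(demo_probs, demo_truth)
```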

Step 4: Training the Fully Connected model

The final step is training the model that will classify the images. Training runs for 3500 epochs with a batch size of 250. Although 3500 epochs sounds like a lot, training completed in minutes on a GPU. The training accuracy and cost are printed every epoch.


In [ ]:
def trainer(sess, X_input, Y_input, X_test, Y_test):
    
    '''
    Function to train the FC model with a Softmax activation for output layer
    
    Args: TensorFlow Session, Bottlenecked Images for training and testing
    and corresponding labels
    '''
    ground_truth_tensor_name='ground_truth'
    
    # Define Batch size
    mini_batch_size=250
    n_train=X_input.shape[0]

    graph=create_graph()
    
    # Get the train op and loss function
    train_step,cross_entropy=add_final_training_layer\
    (n_classes, final_tensor_name, ground_truth_tensor_name, learning_rate)
    # Initialize all variables
    sess.run(tf.global_variables_initializer())
    # Get evaluation tensor
    eval_step=evaluation_step(graph, \
    'fc/layer2/'+final_tensor_name, ground_truth_tensor_name)

    # Get tensors for Input and Output    
    bottleneck_input=graph.get_tensor_by_name(ensure_port('BottleneckInput'))
    Y=graph.get_tensor_by_name(ensure_port(ground_truth_tensor_name))
    
    # Define number of epochs
    epochs=3500
    
    # Perform training for number of epochs defined
    for epoch in range(epochs):
        
        # Shuffle the examples
        shuffle=np.random.permutation(n_train)
        
        shuffle_X=X_input[shuffle,:]
        shuffle_Y=Y_input[shuffle]
        
        # Perform batch training
        for Xi, Yi in iterate_batches(shuffle_X, shuffle_Y, mini_batch_size):
            sess.run(train_step, feed_dict={bottleneck_input:Xi, Y:Yi})
        
        # Print out model's performance after every epoch
        train_accuracy, train_cross_entropy=\
        sess.run([eval_step,cross_entropy], \
                 feed_dict={bottleneck_input:X_input, Y:Y_input})
        print ("Epoch %d: Train accuracy:%0.2f, Cross Entropy:%0.2f"\
               %(epoch,train_accuracy*100,train_cross_entropy))
                
    # Get the test accuracy after training is complete        
    test_accuracy=sess.run(eval_step, \
                           feed_dict={bottleneck_input:X_test, Y:Y_test})
    print('Final Test Accuracy:%0.2f' %(test_accuracy*100))
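
The trainer above uses a two-array `iterate_batches(X, Y, batch_size)` that is not defined in this notebook (the version shown later for the autoencoder takes images only). A minimal sketch of what such a generator might look like, assuming the inputs are already shuffled as in the trainer:

```python
import numpy as np

def iterate_batches(X, Y, batch_size):
    """Yield consecutive (X, Y) slices of at most `batch_size` rows."""
    n = X.shape[0]
    for start in range(0, n, batch_size):
        end = min(start + batch_size, n)
        yield X[start:end], Y[start:end]

# 5000 training examples with a batch size of 250 (feature width is irrelevant here).
X_demo = np.zeros((5000, 8), dtype=np.float32)
Y_demo = np.zeros((5000, 10), dtype=np.float32)
batches_per_epoch = sum(1 for _ in iterate_batches(X_demo, Y_demo, 250))
total_updates = batches_per_epoch * 3500
```

With these settings each epoch consists of 20 gradient updates, so 3500 epochs amount to 70000 updates of a small fully connected network, which helps explain why training finishes quickly.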

Finally, run the trainer method after loading the saved bottleneck data.


In [ ]:
n_classes=10

X_train, Y_train, X_test, Y_test= load_bottleneck_data()

final_tensor_name='final_result'
learning_rate=0.001

# Create TensorFlow session and train model 
sess=tf.InteractiveSession()
trainer(sess, X_train, Y_train, X_test, Y_test)
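
`load_bottleneck_data` is likewise not shown here. Assuming it simply reads back the four `.npy` files written in Step 2, a sketch could look like this; the `data_dir` parameter is an addition for illustration, and the demo writes small placeholder arrays to a temporary directory.

```python
import os
import tempfile
import numpy as np

def load_bottleneck_data(data_dir='.'):
    """Load X_train, Y_train, X_test, Y_test from .npy files in `data_dir`."""
    names = ('X_train', 'Y_train', 'X_test', 'Y_test')
    return tuple(np.load(os.path.join(data_dir, name + '.npy')) for name in names)

# Demo with placeholder arrays standing in for the real bottlenecks.
demo_dir = tempfile.mkdtemp()
for name, shape in [('X_train', (6, 2048)), ('Y_train', (6, 10)),
                    ('X_test', (4, 2048)), ('Y_test', (4, 10))]:
    np.save(os.path.join(demo_dir, name + '.npy'), np.zeros(shape, dtype=np.float32))
X_tr, Y_tr, X_te, Y_te = load_bottleneck_data(demo_dir)
```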

On 5000 training samples, the model achieved a very high training accuracy of 98.68% with a loss of 0.03, and the test accuracy on 8000 samples was 86.83%. Compared with the state-of-the-art accuracy reported for semi-supervised work on the STL-10 dataset, 74.3%, the pre-trained network obtains a much better accuracy with very little computation and in far less time.

Semi-Supervised Transfer Learning with CAE and STL-10 dataset

Introduction

Although the Inception model performed splendidly on the limited dataset available, it would be a poor choice for an uncorrelated dataset, say, speech. Transfer learning with a pre-trained network works best when the new data is correlated with the data the network was trained on. Let us take a minute to understand why STL-10 exists: to emphasize semi-supervised training. The dataset has 3 parts:

  1. Train X and Train Y (5000 examples)
  2. Test X and Test Y (8000 examples)
  3. Unlabeled X (100000 examples)

This is close to a real-world scenario. Getting unlabeled data from the internet is not hard. However, obtaining that much labeled data is hard, and finding unlabeled data that exactly matches the labeled classes might be hard too. We need to make the best of what we have. Hence, STL-10.

Briefly, the dataset is intended to be used as follows (from a transfer learning point of view):

  • Perform feature extraction on the unlabeled dataset and learn weights. The unlabeled dataset does not contain the same images or the same classes of images as the labeled data (be it train or test), but it contains similar data.
  • Perform supervised learning on this model with the labeled dataset.

Convolutional Autoencoder (CAE)

To perform feature extraction from scratch, the model chosen is the CAE. Convolution extracts information from neighboring pixels in an image rather than considering a single pixel at a time, which gives better spatial information at higher levels. A CAE is a network with two parts, as shown below [2]: the convolutional encoder and the deconvolutional decoder.

The main goal of a CAE is image compression and lossy reconstruction of the image.

In this project, since we only care about feature extraction, we only require the encoder layers of the model, which give us the compressed representation of the image. To accommodate training the CAE on a huge dataset, the model was trained on Google Cloud Platform for 300 epochs. The cell below contains the script used for training. The model is saved every 10 epochs.

Requires: Google Cloud Platform account, (Money to train, if not on free trial), Google Cloud SDK.


In [ ]:
'''
Python script which is the trainer task to train the CAE on Google Cloud 
Platform.

Requires Google Cloud Platform account, Training data and scripts are to be 
placed inside Cloud Storage Bucket
'''

import numpy as np
import tensorflow as tf
from tensorflow.python.lib.io import file_io
from datetime import datetime
import logging
import argparse, os
from StringIO import StringIO


tf.reset_default_graph()

# Batch size to be inputted
batch_size=500
# Filter window size for every layer
filter_size=[4,4,4,4]


def iterate_batches(x_in, batch_size):
    '''
    Function to randomly shuffle and yield batches for training
    
    Args: Unlabeled images and batch size
    Returns: Batch of images, shuffled
    
    '''
    new_perm=np.random.permutation(range(len(x_in)))
    epoch_images=x_in[new_perm, ...]
   
    current_batch_id=0
    while current_batch_id < len(x_in):
        end=min(current_batch_id+batch_size,len(x_in))
        batch_images={'images': epoch_images[current_batch_id:end]}
        current_batch_id+=batch_size
        yield batch_images['images']
    
                                 
def train_model(train_file='../Unlabeled_X.npy', job_dir='./tmp/autoencoder', \
                output_dir='../output/', learning_rate=0.001, n_epochs=300, **args):
    
    '''
    Function to train the CAE by taking in batches of images. Requires
    arguments to be passed while initiating the job on GCP. Saves the model in 
    the Bucket every 10 epochs
    
    Args: Location of Training data (Cloud Storage Bucket), job-directory to 
    output logs of the job, learning rate and number of iterations for training
    
    
    '''
    logs_path=job_dir+'/logs/'+datetime.now().isoformat()
    output_file=os.path.join(output_dir,'saved-autoencoder-model')
    logging.info('_____________________')
    logging.info('Using Train File located at {}'.format(train_file))
    logging.info('Using Logs_path located at {}'.format(logs_path))
    logging.info('_____________________')
    file_string=StringIO(file_io.read_file_to_string(train_file))
    with tf.Graph().as_default():
        sess=tf.InteractiveSession()
        X_input=np.load(file_string)
        idx=range(len(X_input))
        
        # Shuffle Data
        rand_idxs=np.random.permutation(idx)
        X_input=X_input[rand_idxs,...]


        logging.info('Unlabeled Dataset loaded')
        
        features=X_input.shape[1]

        # Number of filters for every layer
        n_filters=[64,64,64,64]
    
        # Create placeholder for image tensor
        X=tf.placeholder(tf.float32, shape=[None, features], name='X')
        X_image_tensor=tf.reshape(X, [-1, 96, 96, 3])
    
        currentInput=X_image_tensor
        n_input=currentInput.get_shape().as_list()[3]
        Ws=[]
        shapes=[]
        
        # Build a 4-layer convolutional encoder model by appending weights
        # dimensions for decoder
        for layer, output_size in enumerate(n_filters):
            with tf.variable_scope("encoder/layer_{}".format(layer)):
                shapes.append(currentInput.get_shape().as_list())
                W=tf.get_variable(name='W', shape=[filter_size[layer],\
                                                   filter_size[layer],\
                                                    n_input, output_size],\
                                                    initializer=\
                                                    tf.random_normal_initializer(mean=0.0,stddev=0.01))
                b=tf.get_variable(name='b', shape=[output_size], initializer=\
                                  tf.constant_initializer([0]))
                h=(tf.add(tf.nn.conv2d(currentInput, W, strides=[1,2,2,1],\
                               padding='SAME'),b))
                h=tf.nn.relu(h,name='h')
                currentInput=h
                Ws.append(W)
                n_input=output_size
        
        # Reverse weights matrix and shape matrix for decoder
        Ws.reverse()
        shapes.reverse()
        n_filters.reverse()
        n_filters=n_filters[1:]+[3]
        
        # Decoder for reconstruction of images
        for layer, output_size in enumerate(shapes):
            with tf.variable_scope('decoder/layer_{}'.format(layer)):
                W=Ws[layer]
                b = tf.Variable(tf.zeros([W.get_shape().as_list()[2]]))
                output_shape=tf.stack([tf.shape(X)[0], \
                                       output_size[1],output_size[2],output_size[3]])
                h=(tf.add(tf.nn.conv2d_transpose(currentInput, W, output_shape=output_shape, \
                                         strides=[1,2,2,1],padding='SAME'),b))
                h=tf.nn.relu(h,name='h')
                currentInput=h
                
        # Final Placeholder        
        Y=currentInput
        Y=tf.reshape(Y,[-1,96*96*3])
        
        cost=tf.reduce_mean(tf.reduce_mean(tf.squared_difference(X,Y),1))
        optimizer=tf.train.AdamOptimizer(float(learning_rate)).minimize(cost)
        
        # Initiate Saver Instance
        saver=tf.train.Saver()
        
        # Initialize variables
        sess.run(tf.global_variables_initializer())
        
        # Start training
        for i in range(int(n_epochs)):
            for batch_img in iterate_batches(X_input, batch_size=batch_size):
                sess.run(optimizer,feed_dict={X:batch_img})
            # Every 10 epochs, report performance and save model graph and weights
            if i%10==0:    
                logging.info('Epoch:{0}, Cost={1}'.format(i, \
                             sess.run(cost, feed_dict={X: batch_img})))
                saver.save(sess, output_file, global_step=0)
                logging.info('Model Saved')

                
                
if __name__=='__main__':
    parser=argparse.ArgumentParser()
    parser.add_argument('--train-file', help='GCS or local paths to train data',\
                        required=True)
    parser.add_argument('--job-dir', help='GCS location to write \
    checkpoints and export models', required=True)
    parser.add_argument('--output_dir', help='GCS location \
    to write model', required=True)
    parser.add_argument('--learning-rate', help='Learning Rate', required=True)
    parser.add_argument('--n-epochs', help='Number of epochs', required=True)
    
    args=parser.parse_args()
    arguments=args.__dict__
    
    train_model(**arguments)
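
To make the shape bookkeeping in the script easier to follow: with `padding='SAME'` and a stride of 2, each convolution produces an output of size ceil(input / 2), and the decoder reuses the recorded encoder input shapes in reverse as its `output_shape` targets. The small sketch below reproduces that arithmetic in plain Python; the function names are illustrative only.

```python
import math

def same_conv_output(size, stride=2):
    """Spatial output size of a strided convolution with SAME padding."""
    return int(math.ceil(size / float(stride)))

def encoder_input_shapes(height=96, width=96, channels=3,
                         n_filters=(64, 64, 64, 64)):
    """Return the input shape seen by each encoder layer and the final code shape."""
    shapes = []
    h, w, c = height, width, channels
    for filters in n_filters:
        shapes.append((h, w, c))
        h, w, c = same_conv_output(h), same_conv_output(w), filters
    return shapes, (h, w, c)

shapes, code_shape = encoder_input_shapes()
# The decoder walks the recorded shapes backwards, ending at the image size.
decoder_targets = list(reversed(shapes))
code_size = code_shape[0] * code_shape[1] * code_shape[2]
```

Here `shapes` is `[(96, 96, 3), (48, 48, 64), (24, 24, 64), (12, 12, 64)]`, the encoder output is 6x6x64, and the decoder targets run from 12x12x64 back up to 96x96x3.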

The script was run on the standard-GPU tier (a single Tesla K80 GPU) of GCP's Cloud ML Engine API. There are 4 convolutional layers, each with a filter window size of 4$\times$4. Below are montages of the outputs of every encoder layer for one image, obtained after training.

The above figures show the montage of filter outputs after the ReLU functions of every encoder layer. Notice how the heat map is red for that particular filter's activation features.

And the original image fed to the network is as shown below:

Transfer Learning using a Fully Connected Model for supervised learning

Now that we have a trained model with a fairly decent performance given the resources, the second step is the same as Inception but with slight modifications.


In [ ]:
def create_graph(sess):
    
    '''
    Function to extract Graph and model from the trained CAE.
    
    '''
    
    saver=tf.train.import_meta_graph(model_meta)
    saver.restore(sess, model)

This loads our model's graph and weights into TensorFlow's default graph. Unlike Inception's pool_3, our bottleneck layer is the last encoder layer, 'encoder/layer_3/h', which is the ReLU output.


In [ ]:
def extract_features(sess, X_input):
    
    '''
    Function to extract features for a given batch of images by passing
    them through the CAE up to the ReLU output of encoder layer 3
    
    Args: Current Session, Images
    Returns: Encoder feature maps (2304 values per image) extracted by the CAE
    '''

    encoder_relu=sess.graph.get_tensor_by_name('encoder/layer_3/h:0')
    features=sess.run(encoder_relu, feed_dict={'X:0':X_input})
    return features
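
Two shape details are worth spelling out. The CAE's `X` placeholder was built from flattened images, so inputs are expected as N$\times$27648 rows, and the encoder output is a 4-D tensor of 6$\times$6$\times$64 feature maps that must be flattened to N$\times$2304 before it can feed a fully connected network. A NumPy sketch of both reshapes, with hypothetical helper names:

```python
import numpy as np

def flatten_images(images):
    """Reshape [N, 96, 96, 3] images into the [N, 27648] rows the CAE input expects."""
    return images.reshape(images.shape[0], 96 * 96 * 3)

def flatten_features(feature_maps):
    """Reshape [N, 6, 6, 64] encoder outputs into [N, 2304] bottleneck vectors."""
    return feature_maps.reshape(feature_maps.shape[0], -1)

demo_batch = np.zeros((5, 96, 96, 3), dtype=np.float32)
demo_rows = flatten_images(demo_batch)        # shape (5, 27648)
demo_maps = np.zeros((5, 6, 6, 64), dtype=np.float32)
demo_flat = flatten_features(demo_maps)       # shape (5, 2304)
```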

The rest remains the same: saving the bottlenecks, adding the final layers, defining the evaluation step, and finally training the fully connected model with the training examples. Below is the terminal output from training the fully connected model. For clarity, only every 50th epoch is displayed.


In [ ]:
C:\Users\Sharath\Documents\Machine Learning\Startup.ML\gcloud\transfer learning>python fc_train.py
2017-08-17 10:53:35.855224: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:940] Found device 0 with properties:
name: GeForce GT 740M
major: 3 minor: 5 memoryClockRate (GHz) 1.0325
pciBusID 0000:0a:00.0
Total memory: 2.00GiB
Free memory: 1.67GiB
2017-08-17 10:53:35.855834: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:961] DMA: 0
2017-08-17 10:53:35.860241: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:971] 0:   Y
2017-08-17 10:53:35.862274: I c:\tf_jenkins\home\workspace\release-win\m\windows-gpu\py\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 740M, pci bus id: 0000:0a:00.0)
Epoch 0: Train accuracy:13.04, Cross Entropy:2.30

Epoch 50: Train accuracy:35.78, Cross Entropy:1.81
Epoch 100: Train accuracy:41.82, Cross Entropy:1.66
Epoch 150: Train accuracy:59.64, Cross Entropy:1.35
Epoch 200: Train accuracy:68.78, Cross Entropy:1.14
Epoch 250: Train accuracy:63.56, Cross Entropy:1.26
Epoch 300: Train accuracy:82.28, Cross Entropy:0.74
Epoch 350: Train accuracy:77.56, Cross Entropy:0.88
Epoch 400: Train accuracy:91.80, Cross Entropy:0.49
Epoch 450: Train accuracy:85.94, Cross Entropy:0.62
Epoch 500: Train accuracy:92.42, Cross Entropy:0.46
Epoch 550: Train accuracy:97.96, Cross Entropy:0.33

Final Test Accuracy for CAE Transfer Learning:43.63

The test accuracy obtained was 43.63%, which is quite poor compared with the Inception version of the project. This is due to the simplicity of the CAE as a feature extractor and to the high training cost that results from poor reconstruction. But it opens the door to creating our own model, pre-training it on whatever related unlabeled dataset we can obtain, and then applying supervised learning to the few training examples we might possess. Also, the state-of-the-art accuracy (SWWAE [4]) for semi-supervised work on STL-10, as per the data in [3], is 74.33%, and the last entry on the list [5] has an accuracy of 58.28%. Comparatively, the model has a decent performance. The performance can be improved further with a deeper model and more filters per layer in the CAE. The graph below compares the accuracy of the model built on the pre-trained Inception model, the SWWAE, Pooling-Invariant Image Feature Learning, and the CAE.

Future Work

The Inception version of the model can be made to work better by bottlenecking at earlier layers rather than at the final 'pool_3' layer. This would mean more work on our end to fine-tune the model. Another possibility in the case of STL-10 is data augmentation, to increase the number of training examples and reduce the chance of overfitting. The key interest would be to improve the accuracy of the CAE. Increasing the complexity of the model by adding deeper encoder/decoder layers, more convolution filters per layer, and a larger window size for every filter would yield a better feature extraction model. The caveat, however, is the need for a more powerful machine and a longer training time. Since the CAE was trained on the cloud, training might require a more powerful compute tier, making the process expensive.
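
As one simple illustration of the data augmentation idea, horizontal flips double the number of labeled examples without changing their classes. The sketch below is only an assumption about how this might be done (the notebook does not implement augmentation), and `augment_with_flips` is a hypothetical name. Augmented images would need to be bottlenecked again before training the classifier.

```python
import numpy as np

def augment_with_flips(images, labels):
    """Append a left-right mirrored copy of every image, keeping its label."""
    flipped = images[:, :, ::-1, :]   # reverse the width axis of [N, H, W, C]
    all_images = np.concatenate([images, flipped], axis=0)
    all_labels = np.concatenate([labels, labels], axis=0)
    return all_images, all_labels

demo_x = np.random.rand(4, 96, 96, 3).astype(np.float32)
demo_y = np.eye(10, dtype=np.float32)[[0, 1, 2, 3]]
aug_x, aug_y = augment_with_flips(demo_x, demo_y)   # 8 images, 8 labels
```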

Conclusion

In this work, the STL-10 dataset was used to perform transfer learning with a pre-trained network, the Inception V3 model. Later, we built our own feature extraction model, a CAE trained on STL-10's unlabeled dataset. The output of the last encoder layer was extracted to perform supervised learning on the labeled dataset by bottlenecking the images into None$\times$2304, a simpler representation of the dataset. A simple fully connected network trained on these bottlenecks yielded a test accuracy of 43.63%. One interesting thing to note is that the choice of model depends on the user's needs. In the case of Inception, we were lucky to have images as examples, since Inception was trained on images. Any other kind of data would have yielded poor performance with Inception. However, since we had a large unlabeled dataset of similar images, we were able to train a CAE, an unsupervised convolutional model that produces a lossy reconstruction of its input image, and perform transfer learning on it to obtain a fairly decent accuracy. This approach gives users the freedom to design their own pre-trained network no matter what the data is. The only 3 requirements are a large unlabeled dataset, a compute-heavy engine, and time. It is up to the user to use pre-trained networks like Inception or VGGNet or to build their own, given their circumstances.

References

  1. Google Inception Graph. [Image Courtesy: www.research.googleblog.com]
  2. Convolutional Autoencoder Structure. [Image Courtesy: www.researchgate.net]
  3. Classification Dataset Results: Discover the current state of the art in objects classification. URL: http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html#53544c2d3130
  4. Junbo Zhao, Michael Mathieu, Ross Goroshin, Yann LeCun, "Stacked What-Where Auto-encoders" [arXiv:1506.02351 [stat.ML]]
  5. Yangqing Jia, et al., "Pooling-Invariant Image Feature Learning"