Getting started with Caffe

This class was created by Allison Gray and Jon Barker.

The following timer counts down to a five-minute warning before the lab instance shuts down. You should get a pop-up at the five-minute warning reminding you to save your work! If you are about to run out of time, please see the Post-Lab section for saving this lab to view offline later.

Before we begin, let's verify WebSockets are working on your system. To do this, execute the cell block below by giving it focus (clicking on it with your mouse), and hitting Ctrl-Enter, or pressing the play button in the toolbar above. If all goes well, you should see some output returned below the grey cell. If not, please consult the Self-paced Lab Troubleshooting FAQ to debug the issue.



In [1]:

    
print("The answer should be three: " + str(1+2))

The answer should be three: 3

Let's execute the cell below to display information about the GPUs running on the server.



In [2]:

    
!nvidia-smi

Fri Aug 28 15:12:17 2015       
+------------------------------------------------------+                       
| NVIDIA-SMI 346.46     Driver Version: 346.46         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K520           On   | 0000:00:03.0     Off |                  N/A |
| N/A   50C    P8    17W / 125W |     10MiB /  4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Introduction

Caffe is a deep learning framework developed by the Berkeley Vision and Learning Center (BVLC) and community contributors. Caffe is released under the BSD 2-Clause license. Caffe emphasizes easy application of deep learning. All neural networks and optimization parameters are defined by configuration files without any hard-coding, and Caffe offers a command line interface as well as scripting interfaces in Python and MATLAB. Caffe is fast due to its C++ and CUDA foundation, and the code is extensible, fostering active development. There is a large open-source community contributing many significant changes and state-of-the-art features back into Caffe. Neural networks trained using Caffe are also saved in a well-defined binary format that makes them easy to share. In fact, there is a Model Zoo where you can download cutting-edge pre-trained neural networks.

The objectives of this class are to learn how to complete the following tasks in Caffe:

Build and train a convolutional neural network (CNN) for classifying images.
Evaluate the classification performance of a trained CNN under different training parameter configurations.
Modify the network configuration to improve classification performance.
Visualize the features that a trained network has learned.
Classify new test images using a trained network.

This is an introductory class and is part of NVIDIA's five-class Introduction to Deep Learning course. It is assumed that you have completed the previous modules "Introduction to Deep Learning" and "Getting Started with DIGITS interactive training system for image classification" before starting this class.

Training and Classifying with Caffe

You are provided with a subset of the ImageNet dataset. The images are from two different categories, cats and dogs. Below are sample images from both categories. The cat category includes domestic cats as well as big cats like lions and tigers. The dog category comprises domestic dogs including pugs, basenjis and Great Pyrenees. There are approximately 13,000 images in total.

In this class we will be creating and training a deep neural network in Caffe that can accurately classify images from these two categories, i.e. can label an image of a dog as "dog" and an image of a cat as "cat".

Task 1 - Creating a Database and Mean Image

Before we train a neural network using Caffe we will move the training and validation images into a database. The database allows Caffe to efficiently iterate over the image data during training. The training and validation datasets are independent subsets of the original image dataset. We will train the network using the training dataset and then test the network's performance using the validation dataset. That way we can check that the network performs well on images that it has never been trained on.

To minimize the training time during this class we have resized all of the images to 32x32 pixels for you. The image files can be located anywhere on the filesystem. Caffe knows which images belong to the training and validation sets, and which class each image belongs to, by referring to the text files train.txt and val.txt. Each row of these files lists the relative filename of an image, tab-separated from an integer label (starting from 0) representing the class the image belongs to. For example, train.txt contains the following rows:

cat/cat_0_32.jpg 0
cat/cat_1000_32.jpg 0
dog/dog_9990_32.jpg 1
dog/dog_9991_32.jpg 1
...
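As a quick sanity check on a listing in this format, the sketch below parses rows into (filename, label) pairs and counts the images per class. It is only an illustration: `parse_listing` is a hypothetical helper, not part of Caffe, and the sample rows are the ones shown above rather than the full train.txt.

```python
from collections import Counter


def parse_listing(lines):
    """Parse 'relative/path.jpg <label>' rows into (filename, label) pairs."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last run of whitespace so tabs and spaces both work.
        filename, label = line.rsplit(None, 1)
        entries.append((filename, int(label)))
    return entries


# Sample rows only; in the lab these would be read from train.txt.
sample_rows = [
    "cat/cat_0_32.jpg 0",
    "cat/cat_1000_32.jpg 0",
    "dog/dog_9990_32.jpg\t1",
    "dog/dog_9991_32.jpg 1",
]

entries = parse_listing(sample_rows)
class_counts = Counter(label for _, label in entries)
print(class_counts)
```

A heavily unbalanced count here would be worth knowing about before training, since it changes what a "good" accuracy looks like.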

We also create a mean image from the training data. This is the image obtained by taking the mean value of each pixel across all of the training dataset images. We do this so that we can subtract that mean image from each training and validation image before it is fed into the neural network. This is an important pre-processing step for achieving fast and effective training. It has the effect of removing the average brightness (intensity) of each point in the image so that the network learns about image content rather than illumination conditions.
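To make the idea concrete, here is a minimal NumPy sketch of computing a mean image and subtracting it. It is not how Caffe's own tools are implemented; the random array is a stand-in for real training images, and the variable names are chosen for illustration only.

```python
import numpy as np

rng = np.random.RandomState(0)

# Stand-in for a training set: 8 random images in channels x height x width layout.
images = rng.randint(0, 256, size=(8, 3, 32, 32)).astype(np.float32)

# The mean image has one value per channel and pixel position,
# averaged over all images in the set.
mean_image = images.mean(axis=0)

# Subtracting it centers every pixel position around zero.
centered = images - mean_image

print(mean_image.shape)
```

After subtraction, the average of each pixel position across the set is (up to rounding) zero, which is the property the pre-processing step relies on.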

We complete each of these tasks using command line tools that come with Caffe. A number of useful utilities for data pre-processing, network training and network deployment can be found in the Caffe installation folder in $CAFFE_ROOT/build/tools.

Execute the cell below to create the training and validation databases and the mean image of the training data. (Note: you can ignore the "Failed to initialize libdc1394" warning messages in the output.)

You will know the lab is processing when you see a solid circle in the top-right of the window. When it is idle, the circle is empty. If you ever feel like a cell has run for too long, you can stop it with the stop button in the toolbar. For troubleshooting, please see the Self-paced Lab Troubleshooting FAQ.



In [3]:

    
%%bash

rm -rf train_lmdb val_lmdb

#Setup environment variables
TOOLS=/home/ubuntu/caffe/build/tools
TRAIN_DATA_ROOT=/home/ubuntu/data/dog_cat/dog_cat_32/train/
VAL_DATA_ROOT=/home/ubuntu/data/dog_cat/dog_cat_32/val/

#Create the training database
$TOOLS/convert_imageset \
--shuffle \
$TRAIN_DATA_ROOT \
$TRAIN_DATA_ROOT/train.txt \
train_lmdb

#Create the validation database
$TOOLS/convert_imageset \
--shuffle \
$VAL_DATA_ROOT \
$VAL_DATA_ROOT/val.txt \
val_lmdb

#Compute the mean image of the training database and save it as mean.binaryproto
$TOOLS/compute_image_mean train_lmdb mean.binaryproto

E0828 15:13:31.193488  2814 convert_imageset.cpp:143] Processed 1000 files.
E0828 15:13:32.258235  2814 convert_imageset.cpp:143] Processed 2000 files.
E0828 15:13:33.201022  2814 convert_imageset.cpp:143] Processed 3000 files.
E0828 15:13:34.090423  2814 convert_imageset.cpp:143] Processed 4000 files.
E0828 15:13:35.128525  2814 convert_imageset.cpp:143] Processed 5000 files.
E0828 15:13:36.202548  2814 convert_imageset.cpp:143] Processed 6000 files.
E0828 15:13:37.077045  2814 convert_imageset.cpp:143] Processed 7000 files.
E0828 15:13:38.085916  2814 convert_imageset.cpp:143] Processed 8000 files.
E0828 15:13:39.035594  2814 convert_imageset.cpp:143] Processed 9000 files.
E0828 15:13:39.914156  2814 convert_imageset.cpp:143] Processed 10000 files.
E0828 15:13:40.806207  2814 convert_imageset.cpp:143] Processed 11000 files.
E0828 15:13:41.776578  2814 convert_imageset.cpp:143] Processed 12000 files.
E0828 15:13:42.638947  2814 convert_imageset.cpp:149] Processed 12904 files.
E0828 15:13:44.051775  2825 convert_imageset.cpp:143] Processed 1000 files.
E0828 15:13:44.164707  2825 convert_imageset.cpp:149] Processed 1138 files.

Run the cell below to see what the mean image looks like. Strangely, it looks a little like a mouse...



In [4]:

    
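import caffe
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
%matplotlib inline

# Read the serialized mean image written by compute_image_mean
blob = caffe.proto.caffe_pb2.BlobProto()
with open('/home/ubuntu/notebook/mean.binaryproto', 'rb') as f:
    blob.ParseFromString(f.read())

# Convert to a NumPy array, drop the batch axis and average over the color channels
arr = np.array(caffe.io.blobproto_to_array(blob))[0, :, :, :].mean(0)

# Display the result as a grayscale image
plt.imshow(arr, cmap=cm.Greys_r)
plt.show()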

Task 2 - Training your network

We are now going to configure our network and start training. For this exercise we will borrow our network architecture from the cifar10 design that is provided with Caffe. You can find the original network in $CAFFE_ROOT/examples/cifar10. In the first class in this series "Introduction to Deep Learning on GPUs" we used this network to classify the CIFAR-10 images themselves. The only modification we will make initially is to change the final output softmax layer, which does the actual classification, to only have two classes for our dogs and cats problem rather than the ten classes needed for the CIFAR data.

Below we see an image of the cifar10 network architecture. Hopefully you will be familiar with this type of network diagram from your introduction to DIGITS in the last class. As we can see, the cifar10 network is a convolutional neural network (CNN) with three convolutional layers (each with pooling and ReLU activation, and the first two with normalization) followed by a single fully-connected layer performing the final classification.

We see that the network input data has dimensions 100x3x32x32. This means that we have batches of 100 training images, each with three channels (i.e. the images are in color) and 32x32 pixels.
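The four dimensions can be pictured as a NumPy array in batch x channels x height x width order. The sketch below is illustrative only (the zero-filled array stands in for real image data); the total element count it computes is the same number Caffe reports for this blob in its training log.

```python
import numpy as np

# One batch: 100 images, 3 color channels, 32x32 pixels, stored as 32-bit floats.
batch = np.zeros((100, 3, 32, 32), dtype=np.float32)

num_images, channels, height, width = batch.shape

print(batch.size)    # 307200 values in one batch
print(batch.nbytes)  # 1228800 bytes at 4 bytes per value

# Indexing the first axis selects a single image of shape (3, 32, 32).
first_image = batch[0]
```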

The first convolutional layer applies 32 5x5 filters. Because the filters are 5x5, we must pad the edges of the input images with zeros (two pixels on each side) to ensure that the output of the layer has the same size as the input.
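The effect of padding can be checked with the usual output-size formula for a convolution. The helper below is a hypothetical illustration, not a Caffe function; the kernel size, padding and stride values are the ones listed for conv1 in the network definition.

```python
def conv_output_size(input_size, kernel_size, pad=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (input_size + 2 * pad - kernel_size) // stride + 1


# conv1 in this network: 5x5 filters, pad 2, stride 1, on 32x32 inputs.
with_padding = conv_output_size(32, kernel_size=5, pad=2, stride=1)

# Without padding, the same filter would shrink the image.
without_padding = conv_output_size(32, kernel_size=5, pad=0, stride=1)

print(with_padding)     # 32
print(without_padding)  # 28
```

With a 5x5 filter, two pixels of padding on each side are exactly what is needed to keep a 32x32 output.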

After the convolutional layer, max pooling is applied to reduce the size of the output images by half. A rectified linear unit (ReLU) activation function is then applied, followed by local response normalization (LRN).
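For intuition, here is a small NumPy sketch of max pooling followed by ReLU on a single 2-D feature map. It is a simplified illustration rather than Caffe's implementation: `max_pool2d` and `relu` are hypothetical helpers, the 3x3 window with stride 2 mirrors the pool1 settings, and LRN is left out.

```python
import math

import numpy as np


def max_pool2d(x, kernel=3, stride=2):
    """Max-pool a 2-D array; windows that overhang the border are clipped."""
    h, w = x.shape
    out_h = int(math.ceil((h - kernel) / float(stride))) + 1
    out_w = int(math.ceil((w - kernel) / float(stride))) + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            r0, c0 = i * stride, j * stride
            # Slicing past the edge simply yields a smaller window.
            out[i, j] = x[r0:r0 + kernel, c0:c0 + kernel].max()
    return out


def relu(x):
    """Rectified linear unit: negative values become zero."""
    return np.maximum(x, 0)


# A tiny 4x4 feature map with values from -8 to 7.
feature_map = np.arange(16, dtype=np.float32).reshape(4, 4) - 8.0
pooled = max_pool2d(feature_map)
activated = relu(pooled)
print(activated)
```

Applied to a 32x32 feature map, the same pooling settings produce a 16x16 output, which is the halving described above.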

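Layers 2 and 3 are convolutional layers that repeat this pattern using average pooling instead of max pooling, halving the output size each time (from 16x16 to 8x8 to 4x4). In these layers the ReLU is applied before pooling, and the third layer omits the normalization step.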
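The shrinking feature-map sizes can be traced with a few lines of arithmetic. This sketch assumes that pooling rounds the output size up, which reproduces the shapes Caffe prints in its log for this network; the helper names are hypothetical, and the kernel, pad and stride values are taken from the network definition.

```python
import math


def conv_out(size, kernel, pad, stride=1):
    """Output size of a convolution along one dimension."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride):
    """Output size of pooling along one dimension, rounding up (assumed)."""
    return int(math.ceil((size - kernel) / float(stride))) + 1


size = 32
sizes = [size]
for _ in range(3):
    size = conv_out(size, kernel=5, pad=2)     # padded 5x5 conv keeps the size
    size = pool_out(size, kernel=3, stride=2)  # 3x3 pooling with stride 2 halves it
    sizes.append(size)

print(sizes)  # [32, 16, 8, 4]

# The last convolutional layer has 64 filters, so the fully connected
# layer sees 64 * 4 * 4 values per image.
fc_inputs = 64 * size * size
print(fc_inputs)  # 1024
```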

The final layer is a fully connected layer with two neurons. Its output, together with the image labels, feeds into a softmax layer, which performs the classification and computes the classification loss. The two neurons in the output layer correspond to our two classes, dogs and cats.
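As a rough illustration of what the softmax layer computes, the NumPy sketch below turns two-neuron scores into class probabilities and a cross-entropy loss. It is a simplified stand-in, not Caffe's code; the scores and labels are made-up values (label 0 for cat, 1 for dog, following train.txt).

```python
import numpy as np


def softmax(scores):
    """Convert raw scores (one row per image) into probabilities."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def softmax_loss(scores, labels):
    """Mean negative log-probability assigned to the correct class."""
    probs = softmax(scores)
    correct = probs[np.arange(len(labels)), labels]
    return -np.log(correct).mean()


# Made-up outputs of the two final neurons for three images.
scores = np.array([[2.0, -1.0], [0.0, 0.0], [-0.5, 1.5]])
labels = np.array([0, 1, 1])

probs = softmax(scores)
predictions = probs.argmax(axis=1)
loss = softmax_loss(scores, labels)
print(predictions, loss)
```

A network that gives both classes equal scores has a loss of ln 2 (about 0.693) on a two-class problem, which is a useful reference point when reading the loss values printed during training.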

Caffe encodes deep neural network architectures like this in text files called prototxt files. You can see the corresponding cifar10 prototxt file, called train_val.prototxt, below. As you will see, the file is human-readable and it is easy to map the sections of the prototxt file to the network layers in the diagram above. You can get more information about the types of layers that can be defined in a network prototxt file in the Caffe documentation.


Now that we have our datasets and a network configuration, we are nearly ready to train the network. The final thing that Caffe needs before we can train is a specification of the learning algorithm parameters. This specification is also made in a prototxt file but with a simpler structure. If you open the solver.prototxt file in the text editor above, you'll see it clearly specifying the learning parameters. You can get more information about the range of parameters that can be set in a solver prototxt file in the Caffe documentation.
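To see how a few of these parameters interact, the sketch below evaluates a "step" learning rate schedule and estimates how many passes over the training data a run makes. It assumes the values Caffe prints when it initializes the solver in this lab (base_lr 0.01, gamma 0.1, stepsize 500, max_iter 750, batch size 100) and the 12904 training images reported by convert_imageset; the formula reflects the usual definition of a step policy, and the function name is hypothetical.

```python
def step_learning_rate(iteration, base_lr=0.01, gamma=0.1, stepsize=500):
    """Learning rate under a step policy: multiply by gamma every stepsize iterations."""
    return base_lr * gamma ** (iteration // stepsize)


# Learning rate at the start, just before the step, and after it.
for iteration in (0, 499, 500, 749):
    print(iteration, step_learning_rate(iteration))

# Approximate number of passes over the training set (epochs).
max_iter = 750
batch_size = 100
num_train_images = 12904
epochs = max_iter * batch_size / float(num_train_images)
print(epochs)
```

With these settings the learning rate drops by a factor of ten after iteration 500, and the whole run covers a little under six epochs.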

We are now ready to train the network. Again, this is carried out using a command line tool that comes with Caffe; this time it is the caffe binary itself with the train option. Execute the cell below to begin training. It should take just under a minute to train. Be sure to scroll down through the complete Caffe output.



In [5]:

    
%%bash
#Set the location of the caffe tools folder
TOOLS=/home/ubuntu/caffe/build/tools
#Train the network
$TOOLS/caffe train -gpu 0 -solver src/solver.prototxt

I0828 15:17:18.277583  2844 caffe.cpp:99] Use GPU with device ID 0
I0828 15:17:18.417570  2844 caffe.cpp:107] Starting Optimization
I0828 15:17:18.417754  2844 solver.cpp:32] Initializing solver from parameters: 
test_iter: 100
test_interval: 250
base_lr: 0.01
display: 20
max_iter: 750
lr_policy: "step"
gamma: 0.1
momentum: 0.9
weight_decay: 0.0005
stepsize: 500
snapshot: 250
snapshot_prefix: "checkpoints/snapshot"
solver_mode: GPU
net: "src/train_val.prototxt"
solver_type: SGD
I0828 15:17:18.417837  2844 solver.cpp:70] Creating training net from net file: src/train_val.prototxt
I0828 15:17:18.419575  2844 net.cpp:253] The NetState phase (0) differed from the phase (1) specified by a rule in layer data
I0828 15:17:18.419605  2844 net.cpp:253] The NetState phase (0) differed from the phase (1) specified by a rule in layer accuracy
I0828 15:17:18.419729  2844 net.cpp:42] Initializing net from parameters: 
state {
  phase: TRAIN
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 32
    mean_file: "mean.binaryproto"
  }
  data_param {
    source: "train_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.0001
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "pool1"
  top: "pool1"
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 3
    alpha: 5e-05
    beta: 0.75
    norm_region: WITHIN_CHANNEL
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "norm1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "pool2"
  top: "norm2"
  lrn_param {
    local_size: 3
    alpha: 5e-05
    beta: 0.75
    norm_region: WITHIN_CHANNEL
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "norm2"
  top: "conv3"
  convolution_param {
    num_output: 64
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3"
  top: "pool3"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool3"
  top: "ip1"
  param {
    lr_mult: 1
    decay_mult: 250
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip1"
  bottom: "label"
  top: "loss"
}
I0828 15:17:18.419838  2844 layer_factory.hpp:74] Creating layer data
I0828 15:17:18.422015  2844 net.cpp:76] Creating Layer data
I0828 15:17:18.422039  2844 net.cpp:334] data -> data
I0828 15:17:18.422073  2844 net.cpp:334] data -> label
I0828 15:17:18.422094  2844 net.cpp:105] Setting up data
I0828 15:17:18.422185  2844 db.cpp:34] Opened lmdb train_lmdb
I0828 15:17:18.422245  2844 data_layer.cpp:67] output data size: 100,3,32,32
I0828 15:17:18.422262  2844 data_transformer.cpp:22] Loading mean file from: mean.binaryproto
I0828 15:17:18.423231  2844 net.cpp:112] Top shape: 100 3 32 32 (307200)
I0828 15:17:18.423266  2844 net.cpp:112] Top shape: 100 1 1 1 (100)
I0828 15:17:18.423275  2844 layer_factory.hpp:74] Creating layer conv1
I0828 15:17:18.423323  2844 net.cpp:76] Creating Layer conv1
I0828 15:17:18.423337  2844 net.cpp:372] conv1 <- data
I0828 15:17:18.423353  2844 net.cpp:334] conv1 -> conv1
I0828 15:17:18.423365  2844 net.cpp:105] Setting up conv1
I0828 15:17:23.204232  2844 net.cpp:112] Top shape: 100 32 32 32 (3276800)
I0828 15:17:23.204298  2844 layer_factory.hpp:74] Creating layer pool1
I0828 15:17:23.204326  2844 net.cpp:76] Creating Layer pool1
I0828 15:17:23.204339  2844 net.cpp:372] pool1 <- conv1
I0828 15:17:23.204349  2844 net.cpp:334] pool1 -> pool1
I0828 15:17:23.204360  2844 net.cpp:105] Setting up pool1
I0828 15:17:23.205667  2844 net.cpp:112] Top shape: 100 32 16 16 (819200)
I0828 15:17:23.205685  2844 layer_factory.hpp:74] Creating layer relu1
I0828 15:17:23.205695  2844 net.cpp:76] Creating Layer relu1
I0828 15:17:23.205700  2844 net.cpp:372] relu1 <- pool1
I0828 15:17:23.205706  2844 net.cpp:323] relu1 -> pool1 (in-place)
I0828 15:17:23.205714  2844 net.cpp:105] Setting up relu1
I0828 15:17:23.205762  2844 net.cpp:112] Top shape: 100 32 16 16 (819200)
I0828 15:17:23.205775  2844 layer_factory.hpp:74] Creating layer norm1
I0828 15:17:23.205787  2844 net.cpp:76] Creating Layer norm1
I0828 15:17:23.205796  2844 net.cpp:372] norm1 <- pool1
I0828 15:17:23.205803  2844 net.cpp:334] norm1 -> norm1
I0828 15:17:23.205816  2844 net.cpp:105] Setting up norm1
I0828 15:17:23.207016  2844 net.cpp:112] Top shape: 100 32 16 16 (819200)
I0828 15:17:23.207031  2844 layer_factory.hpp:74] Creating layer conv2
I0828 15:17:23.207043  2844 net.cpp:76] Creating Layer conv2
I0828 15:17:23.207053  2844 net.cpp:372] conv2 <- norm1
I0828 15:17:23.207061  2844 net.cpp:334] conv2 -> conv2
I0828 15:17:23.207070  2844 net.cpp:105] Setting up conv2
I0828 15:17:23.208168  2844 net.cpp:112] Top shape: 100 32 16 16 (819200)
I0828 15:17:23.208194  2844 layer_factory.hpp:74] Creating layer relu2
I0828 15:17:23.208206  2844 net.cpp:76] Creating Layer relu2
I0828 15:17:23.208211  2844 net.cpp:372] relu2 <- conv2
I0828 15:17:23.208217  2844 net.cpp:323] relu2 -> conv2 (in-place)
I0828 15:17:23.208223  2844 net.cpp:105] Setting up relu2
I0828 15:17:23.208278  2844 net.cpp:112] Top shape: 100 32 16 16 (819200)
I0828 15:17:23.208291  2844 layer_factory.hpp:74] Creating layer pool2
I0828 15:17:23.208298  2844 net.cpp:76] Creating Layer pool2
I0828 15:17:23.208304  2844 net.cpp:372] pool2 <- conv2
I0828 15:17:23.208312  2844 net.cpp:334] pool2 -> pool2
I0828 15:17:23.208324  2844 net.cpp:105] Setting up pool2
I0828 15:17:23.208454  2844 net.cpp:112] Top shape: 100 32 8 8 (204800)
I0828 15:17:23.208472  2844 layer_factory.hpp:74] Creating layer norm2
I0828 15:17:23.208482  2844 net.cpp:76] Creating Layer norm2
I0828 15:17:23.208492  2844 net.cpp:372] norm2 <- pool2
I0828 15:17:23.208498  2844 net.cpp:334] norm2 -> norm2
I0828 15:17:23.208508  2844 net.cpp:105] Setting up norm2
I0828 15:17:23.208523  2844 net.cpp:112] Top shape: 100 32 8 8 (204800)
I0828 15:17:23.208534  2844 layer_factory.hpp:74] Creating layer conv3
I0828 15:17:23.208542  2844 net.cpp:76] Creating Layer conv3
I0828 15:17:23.208549  2844 net.cpp:372] conv3 <- norm2
I0828 15:17:23.208559  2844 net.cpp:334] conv3 -> conv3
I0828 15:17:23.208571  2844 net.cpp:105] Setting up conv3
I0828 15:17:23.210523  2844 net.cpp:112] Top shape: 100 64 8 8 (409600)
I0828 15:17:23.210547  2844 layer_factory.hpp:74] Creating layer relu3
I0828 15:17:23.210556  2844 net.cpp:76] Creating Layer relu3
I0828 15:17:23.210561  2844 net.cpp:372] relu3 <- conv3
I0828 15:17:23.210567  2844 net.cpp:323] relu3 -> conv3 (in-place)
I0828 15:17:23.210579  2844 net.cpp:105] Setting up relu3
I0828 15:17:23.210631  2844 net.cpp:112] Top shape: 100 64 8 8 (409600)
I0828 15:17:23.210644  2844 layer_factory.hpp:74] Creating layer pool3
I0828 15:17:23.210651  2844 net.cpp:76] Creating Layer pool3
I0828 15:17:23.210656  2844 net.cpp:372] pool3 <- conv3
I0828 15:17:23.210693  2844 net.cpp:334] pool3 -> pool3
I0828 15:17:23.210703  2844 net.cpp:105] Setting up pool3
I0828 15:17:23.210754  2844 net.cpp:112] Top shape: 100 64 4 4 (102400)
I0828 15:17:23.210767  2844 layer_factory.hpp:74] Creating layer ip1
I0828 15:17:23.210778  2844 net.cpp:76] Creating Layer ip1
I0828 15:17:23.210788  2844 net.cpp:372] ip1 <- pool3
I0828 15:17:23.210798  2844 net.cpp:334] ip1 -> ip1
I0828 15:17:23.210813  2844 net.cpp:105] Setting up ip1
I0828 15:17:23.210896  2844 net.cpp:112] Top shape: 100 2 1 1 (200)
I0828 15:17:23.210911  2844 layer_factory.hpp:74] Creating layer loss
I0828 15:17:23.210922  2844 net.cpp:76] Creating Layer loss
I0828 15:17:23.210928  2844 net.cpp:372] loss <- ip1
I0828 15:17:23.210933  2844 net.cpp:372] loss <- label
I0828 15:17:23.210942  2844 net.cpp:334] loss -> loss
I0828 15:17:23.210952  2844 net.cpp:105] Setting up loss
I0828 15:17:23.210961  2844 layer_factory.hpp:74] Creating layer loss
I0828 15:17:23.211030  2844 net.cpp:112] Top shape: 1 1 1 1 (1)
I0828 15:17:23.211042  2844 net.cpp:118]     with loss weight 1
I0828 15:17:23.211071  2844 net.cpp:163] loss needs backward computation.
I0828 15:17:23.211076  2844 net.cpp:163] ip1 needs backward computation.
I0828 15:17:23.211081  2844 net.cpp:163] pool3 needs backward computation.
I0828 15:17:23.211086  2844 net.cpp:163] relu3 needs backward computation.
I0828 15:17:23.211089  2844 net.cpp:163] conv3 needs backward computation.
I0828 15:17:23.211092  2844 net.cpp:163] norm2 needs backward computation.
I0828 15:17:23.211097  2844 net.cpp:163] pool2 needs backward computation.
I0828 15:17:23.211102  2844 net.cpp:163] relu2 needs backward computation.
I0828 15:17:23.211107  2844 net.cpp:163] conv2 needs backward computation.
I0828 15:17:23.211110  2844 net.cpp:163] norm1 needs backward computation.
I0828 15:17:23.211114  2844 net.cpp:163] relu1 needs backward computation.
I0828 15:17:23.211119  2844 net.cpp:163] pool1 needs backward computation.
I0828 15:17:23.211122  2844 net.cpp:163] conv1 needs backward computation.
I0828 15:17:23.211127  2844 net.cpp:165] data does not need backward computation.
I0828 15:17:23.211132  2844 net.cpp:201] This network produces output loss
I0828 15:17:23.211144  2844 net.cpp:446] Collecting Learning Rate and Weight Decay.
I0828 15:17:23.211158  2844 net.cpp:213] Network initialization done.
I0828 15:17:23.211161  2844 net.cpp:214] Memory required for data: 36046004
I0828 15:17:23.211580  2844 solver.cpp:154] Creating test net (#0) specified by net file: src/train_val.prototxt
I0828 15:17:23.211640  2844 net.cpp:253] The NetState phase (1) differed from the phase (0) specified by a rule in layer data
I0828 15:17:23.211782  2844 net.cpp:42] Initializing net from parameters: 
state {
  phase: TEST
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    crop_size: 32
    mean_file: "mean.binaryproto"
  }
  data_param {
    source: "val_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.0001
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "pool1"
  top: "pool1"
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 3
    alpha: 5e-05
    beta: 0.75
    norm_region: WITHIN_CHANNEL
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "norm1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "pool2"
  top: "norm2"
  lrn_param {
    local_size: 3
    alpha: 5e-05
    beta: 0.75
    norm_region: WITHIN_CHANNEL
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "norm2"
  top: "conv3"
  convolution_param {
    num_output: 64
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3"
  top: "pool3"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool3"
  top: "ip1"
  param {
    lr_mult: 1
    decay_mult: 250
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip1"
  bottom: "label"
  top: "loss"
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip1"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
I0828 15:17:23.211894  2844 layer_factory.hpp:74] Creating layer data
I0828 15:17:23.211912  2844 net.cpp:76] Creating Layer data
I0828 15:17:23.211918  2844 net.cpp:334] data -> data
I0828 15:17:23.211930  2844 net.cpp:334] data -> label
I0828 15:17:23.211942  2844 net.cpp:105] Setting up data
I0828 15:17:23.211989  2844 db.cpp:34] Opened lmdb val_lmdb
I0828 15:17:23.212025  2844 data_layer.cpp:67] output data size: 100,3,32,32
I0828 15:17:23.212038  2844 data_transformer.cpp:22] Loading mean file from: mean.binaryproto
I0828 15:17:23.212601  2844 net.cpp:112] Top shape: 100 3 32 32 (307200)
I0828 15:17:23.212615  2844 net.cpp:112] Top shape: 100 1 1 1 (100)
I0828 15:17:23.212620  2844 layer_factory.hpp:74] Creating layer label_data_1_split
I0828 15:17:23.212630  2844 net.cpp:76] Creating Layer label_data_1_split
I0828 15:17:23.212636  2844 net.cpp:372] label_data_1_split <- label
I0828 15:17:23.212645  2844 net.cpp:334] label_data_1_split -> label_data_1_split_0
I0828 15:17:23.212652  2844 net.cpp:334] label_data_1_split -> label_data_1_split_1
I0828 15:17:23.212661  2844 net.cpp:105] Setting up label_data_1_split
I0828 15:17:23.212666  2844 net.cpp:112] Top shape: 100 1 1 1 (100)
I0828 15:17:23.212671  2844 net.cpp:112] Top shape: 100 1 1 1 (100)
I0828 15:17:23.212674  2844 layer_factory.hpp:74] Creating layer conv1
I0828 15:17:23.212683  2844 net.cpp:76] Creating Layer conv1
I0828 15:17:23.212693  2844 net.cpp:372] conv1 <- data
I0828 15:17:23.212702  2844 net.cpp:334] conv1 -> conv1
I0828 15:17:23.212709  2844 net.cpp:105] Setting up conv1
I0828 15:17:23.213052  2844 net.cpp:112] Top shape: 100 32 32 32 (3276800)
I0828 15:17:23.213075  2844 layer_factory.hpp:74] Creating layer pool1
I0828 15:17:23.213085  2844 net.cpp:76] Creating Layer pool1
I0828 15:17:23.213093  2844 net.cpp:372] pool1 <- conv1
I0828 15:17:23.213099  2844 net.cpp:334] pool1 -> pool1
I0828 15:17:23.213104  2844 net.cpp:105] Setting up pool1
I0828 15:17:23.213239  2844 net.cpp:112] Top shape: 100 32 16 16 (819200)
I0828 15:17:23.213255  2844 layer_factory.hpp:74] Creating layer relu1
I0828 15:17:23.213263  2844 net.cpp:76] Creating Layer relu1
I0828 15:17:23.213268  2844 net.cpp:372] relu1 <- pool1
I0828 15:17:23.213279  2844 net.cpp:323] relu1 -> pool1 (in-place)
I0828 15:17:23.213291  2844 net.cpp:105] Setting up relu1
I0828 15:17:23.213342  2844 net.cpp:112] Top shape: 100 32 16 16 (819200)
I0828 15:17:23.213354  2844 layer_factory.hpp:74] Creating layer norm1
I0828 15:17:23.213361  2844 net.cpp:76] Creating Layer norm1
I0828 15:17:23.213366  2844 net.cpp:372] norm1 <- pool1
I0828 15:17:23.213371  2844 net.cpp:334] norm1 -> norm1
I0828 15:17:23.213395  2844 net.cpp:105] Setting up norm1
I0828 15:17:23.213433  2844 net.cpp:112] Top shape: 100 32 16 16 (819200)
I0828 15:17:23.213445  2844 layer_factory.hpp:74] Creating layer conv2
I0828 15:17:23.213454  2844 net.cpp:76] Creating Layer conv2
I0828 15:17:23.213457  2844 net.cpp:372] conv2 <- norm1
I0828 15:17:23.213466  2844 net.cpp:334] conv2 -> conv2
I0828 15:17:23.213475  2844 net.cpp:105] Setting up conv2
I0828 15:17:23.214570  2844 net.cpp:112] Top shape: 100 32 16 16 (819200)
I0828 15:17:23.214593  2844 layer_factory.hpp:74] Creating layer relu2
I0828 15:17:23.214601  2844 net.cpp:76] Creating Layer relu2
I0828 15:17:23.214611  2844 net.cpp:372] relu2 <- conv2
I0828 15:17:23.214617  2844 net.cpp:323] relu2 -> conv2 (in-place)
I0828 15:17:23.214623  2844 net.cpp:105] Setting up relu2
I0828 15:17:23.214679  2844 net.cpp:112] Top shape: 100 32 16 16 (819200)
I0828 15:17:23.214691  2844 layer_factory.hpp:74] Creating layer pool2
I0828 15:17:23.214701  2844 net.cpp:76] Creating Layer pool2
I0828 15:17:23.214710  2844 net.cpp:372] pool2 <- conv2
I0828 15:17:23.214716  2844 net.cpp:334] pool2 -> pool2
I0828 15:17:23.214728  2844 net.cpp:105] Setting up pool2
I0828 15:17:23.214783  2844 net.cpp:112] Top shape: 100 32 8 8 (204800)
I0828 15:17:23.214795  2844 layer_factory.hpp:74] Creating layer norm2
I0828 15:17:23.214803  2844 net.cpp:76] Creating Layer norm2
I0828 15:17:23.214813  2844 net.cpp:372] norm2 <- pool2
I0828 15:17:23.214819  2844 net.cpp:334] norm2 -> norm2
I0828 15:17:23.214825  2844 net.cpp:105] Setting up norm2
I0828 15:17:23.214844  2844 net.cpp:112] Top shape: 100 32 8 8 (204800)
I0828 15:17:23.214854  2844 layer_factory.hpp:74] Creating layer conv3
I0828 15:17:23.214864  2844 net.cpp:76] Creating Layer conv3
I0828 15:17:23.214872  2844 net.cpp:372] conv3 <- norm2
I0828 15:17:23.214879  2844 net.cpp:334] conv3 -> conv3
I0828 15:17:23.214885  2844 net.cpp:105] Setting up conv3
I0828 15:17:23.216828  2844 net.cpp:112] Top shape: 100 64 8 8 (409600)
I0828 15:17:23.216850  2844 layer_factory.hpp:74] Creating layer relu3
I0828 15:17:23.216858  2844 net.cpp:76] Creating Layer relu3
I0828 15:17:23.216864  2844 net.cpp:372] relu3 <- conv3
I0828 15:17:23.216872  2844 net.cpp:323] relu3 -> conv3 (in-place)
I0828 15:17:23.216883  2844 net.cpp:105] Setting up relu3
I0828 15:17:23.217011  2844 net.cpp:112] Top shape: 100 64 8 8 (409600)
I0828 15:17:23.217027  2844 layer_factory.hpp:74] Creating layer pool3
I0828 15:17:23.217034  2844 net.cpp:76] Creating Layer pool3
I0828 15:17:23.217041  2844 net.cpp:372] pool3 <- conv3
I0828 15:17:23.217049  2844 net.cpp:334] pool3 -> pool3
I0828 15:17:23.217061  2844 net.cpp:105] Setting up pool3
I0828 15:17:23.217113  2844 net.cpp:112] Top shape: 100 64 4 4 (102400)
I0828 15:17:23.217125  2844 layer_factory.hpp:74] Creating layer ip1
I0828 15:17:23.217133  2844 net.cpp:76] Creating Layer ip1
I0828 15:17:23.217139  2844 net.cpp:372] ip1 <- pool3
I0828 15:17:23.217144  2844 net.cpp:334] ip1 -> ip1
I0828 15:17:23.217156  2844 net.cpp:105] Setting up ip1
I0828 15:17:23.217234  2844 net.cpp:112] Top shape: 100 2 1 1 (200)
I0828 15:17:23.217248  2844 layer_factory.hpp:74] Creating layer ip1_ip1_0_split
I0828 15:17:23.217257  2844 net.cpp:76] Creating Layer ip1_ip1_0_split
I0828 15:17:23.217265  2844 net.cpp:372] ip1_ip1_0_split <- ip1
I0828 15:17:23.217272  2844 net.cpp:334] ip1_ip1_0_split -> ip1_ip1_0_split_0
I0828 15:17:23.217279  2844 net.cpp:334] ip1_ip1_0_split -> ip1_ip1_0_split_1
I0828 15:17:23.217290  2844 net.cpp:105] Setting up ip1_ip1_0_split
I0828 15:17:23.217298  2844 net.cpp:112] Top shape: 100 2 1 1 (200)
I0828 15:17:23.217301  2844 net.cpp:112] Top shape: 100 2 1 1 (200)
I0828 15:17:23.217306  2844 layer_factory.hpp:74] Creating layer loss
I0828 15:17:23.217313  2844 net.cpp:76] Creating Layer loss
I0828 15:17:23.217317  2844 net.cpp:372] loss <- ip1_ip1_0_split_0
I0828 15:17:23.217322  2844 net.cpp:372] loss <- label_data_1_split_0
I0828 15:17:23.217329  2844 net.cpp:334] loss -> loss
I0828 15:17:23.217334  2844 net.cpp:105] Setting up loss
I0828 15:17:23.217358  2844 layer_factory.hpp:74] Creating layer loss
I0828 15:17:23.217447  2844 net.cpp:112] Top shape: 1 1 1 1 (1)
I0828 15:17:23.217463  2844 net.cpp:118]     with loss weight 1
I0828 15:17:23.217473  2844 layer_factory.hpp:74] Creating layer accuracy
I0828 15:17:23.217491  2844 net.cpp:76] Creating Layer accuracy
I0828 15:17:23.217502  2844 net.cpp:372] accuracy <- ip1_ip1_0_split_1
I0828 15:17:23.217509  2844 net.cpp:372] accuracy <- label_data_1_split_1
I0828 15:17:23.217516  2844 net.cpp:334] accuracy -> accuracy
I0828 15:17:23.217525  2844 net.cpp:105] Setting up accuracy
I0828 15:17:23.217537  2844 net.cpp:112] Top shape: 1 1 1 1 (1)
I0828 15:17:23.217543  2844 net.cpp:165] accuracy does not need backward computation.
I0828 15:17:23.217546  2844 net.cpp:163] loss needs backward computation.
I0828 15:17:23.217550  2844 net.cpp:163] ip1_ip1_0_split needs backward computation.
I0828 15:17:23.217555  2844 net.cpp:163] ip1 needs backward computation.
I0828 15:17:23.217561  2844 net.cpp:163] pool3 needs backward computation.
I0828 15:17:23.217563  2844 net.cpp:163] relu3 needs backward computation.
I0828 15:17:23.217568  2844 net.cpp:163] conv3 needs backward computation.
I0828 15:17:23.217572  2844 net.cpp:163] norm2 needs backward computation.
I0828 15:17:23.217576  2844 net.cpp:163] pool2 needs backward computation.
I0828 15:17:23.217581  2844 net.cpp:163] relu2 needs backward computation.
I0828 15:17:23.217584  2844 net.cpp:163] conv2 needs backward computation.
I0828 15:17:23.217587  2844 net.cpp:163] norm1 needs backward computation.
I0828 15:17:23.217593  2844 net.cpp:163] relu1 needs backward computation.
I0828 15:17:23.217597  2844 net.cpp:163] pool1 needs backward computation.
I0828 15:17:23.217600  2844 net.cpp:163] conv1 needs backward computation.
I0828 15:17:23.217605  2844 net.cpp:165] label_data_1_split does not need backward computation.
I0828 15:17:23.217609  2844 net.cpp:165] data does not need backward computation.
I0828 15:17:23.217614  2844 net.cpp:201] This network produces output accuracy
I0828 15:17:23.217618  2844 net.cpp:201] This network produces output loss
I0828 15:17:23.217630  2844 net.cpp:446] Collecting Learning Rate and Weight Decay.
I0828 15:17:23.217643  2844 net.cpp:213] Network initialization done.
I0828 15:17:23.217648  2844 net.cpp:214] Memory required for data: 36048408
I0828 15:17:23.217692  2844 solver.cpp:42] Solver scaffolding done.
I0828 15:17:23.217718  2844 solver.cpp:222] Solving 
I0828 15:17:23.217730  2844 solver.cpp:223] Learning Rate Policy: step
I0828 15:17:23.217741  2844 solver.cpp:266] Iteration 0, Testing net (#0)
I0828 15:17:24.037974  2844 solver.cpp:315]     Test net output #0: accuracy = 0.5052
I0828 15:17:24.038013  2844 solver.cpp:315]     Test net output #1: loss = 0.693151 (* 1 = 0.693151 loss)
I0828 15:17:24.049453  2844 solver.cpp:189] Iteration 0, loss = 0.693152
I0828 15:17:24.049481  2844 solver.cpp:204]     Train net output #0: loss = 0.693152 (* 1 = 0.693152 loss)
I0828 15:17:24.049501  2844 solver.cpp:470] Iteration 0, lr = 0.01
I0828 15:17:24.502413  2844 solver.cpp:189] Iteration 20, loss = 0.681894
I0828 15:17:24.502451  2844 solver.cpp:204]     Train net output #0: loss = 0.681894 (* 1 = 0.681894 loss)
I0828 15:17:24.502463  2844 solver.cpp:470] Iteration 20, lr = 0.01
I0828 15:17:24.956553  2844 solver.cpp:189] Iteration 40, loss = 0.687463
I0828 15:17:24.956593  2844 solver.cpp:204]     Train net output #0: loss = 0.687463 (* 1 = 0.687463 loss)
I0828 15:17:24.956600  2844 solver.cpp:470] Iteration 40, lr = 0.01
I0828 15:17:25.411460  2844 solver.cpp:189] Iteration 60, loss = 0.651783
I0828 15:17:25.411504  2844 solver.cpp:204]     Train net output #0: loss = 0.651783 (* 1 = 0.651783 loss)
I0828 15:17:25.411512  2844 solver.cpp:470] Iteration 60, lr = 0.01
I0828 15:17:25.865824  2844 solver.cpp:189] Iteration 80, loss = 0.675132
I0828 15:17:25.865874  2844 solver.cpp:204]     Train net output #0: loss = 0.675132 (* 1 = 0.675132 loss)
I0828 15:17:25.865882  2844 solver.cpp:470] Iteration 80, lr = 0.01
I0828 15:17:26.320744  2844 solver.cpp:189] Iteration 100, loss = 0.616854
I0828 15:17:26.320791  2844 solver.cpp:204]     Train net output #0: loss = 0.616854 (* 1 = 0.616854 loss)
I0828 15:17:26.320801  2844 solver.cpp:470] Iteration 100, lr = 0.01
I0828 15:17:26.775269  2844 solver.cpp:189] Iteration 120, loss = 0.647655
I0828 15:17:26.775307  2844 solver.cpp:204]     Train net output #0: loss = 0.647655 (* 1 = 0.647655 loss)
I0828 15:17:26.775315  2844 solver.cpp:470] Iteration 120, lr = 0.01
I0828 15:17:27.229985  2844 solver.cpp:189] Iteration 140, loss = 0.623226
I0828 15:17:27.230020  2844 solver.cpp:204]     Train net output #0: loss = 0.623226 (* 1 = 0.623226 loss)
I0828 15:17:27.230028  2844 solver.cpp:470] Iteration 140, lr = 0.01
I0828 15:17:27.684448  2844 solver.cpp:189] Iteration 160, loss = 0.688067
I0828 15:17:27.684484  2844 solver.cpp:204]     Train net output #0: loss = 0.688067 (* 1 = 0.688067 loss)
I0828 15:17:27.684492  2844 solver.cpp:470] Iteration 160, lr = 0.01
I0828 15:17:28.139037  2844 solver.cpp:189] Iteration 180, loss = 0.659497
I0828 15:17:28.139083  2844 solver.cpp:204]     Train net output #0: loss = 0.659497 (* 1 = 0.659497 loss)
I0828 15:17:28.139092  2844 solver.cpp:470] Iteration 180, lr = 0.01
I0828 15:17:28.593624  2844 solver.cpp:189] Iteration 200, loss = 0.651895
I0828 15:17:28.593677  2844 solver.cpp:204]     Train net output #0: loss = 0.651895 (* 1 = 0.651895 loss)
I0828 15:17:28.593685  2844 solver.cpp:470] Iteration 200, lr = 0.01
I0828 15:17:29.048219  2844 solver.cpp:189] Iteration 220, loss = 0.701343
I0828 15:17:29.048256  2844 solver.cpp:204]     Train net output #0: loss = 0.701343 (* 1 = 0.701343 loss)
I0828 15:17:29.048265  2844 solver.cpp:470] Iteration 220, lr = 0.01
I0828 15:17:29.502557  2844 solver.cpp:189] Iteration 240, loss = 0.661681
I0828 15:17:29.502596  2844 solver.cpp:204]     Train net output #0: loss = 0.661681 (* 1 = 0.661681 loss)
I0828 15:17:29.502605  2844 solver.cpp:470] Iteration 240, lr = 0.01
I0828 15:17:29.721758  2844 solver.cpp:334] Snapshotting to checkpoints/snapshot_iter_250.caffemodel
I0828 15:17:29.723247  2844 solver.cpp:342] Snapshotting solver state to checkpoints/snapshot_iter_250.solverstate
I0828 15:17:29.723891  2844 solver.cpp:266] Iteration 250, Testing net (#0)
I0828 15:17:30.541066  2844 solver.cpp:315]     Test net output #0: accuracy = 0.6383
I0828 15:17:30.541111  2844 solver.cpp:315]     Test net output #1: loss = 0.639875 (* 1 = 0.639875 loss)
I0828 15:17:30.776960  2844 solver.cpp:189] Iteration 260, loss = 0.592327
I0828 15:17:30.776993  2844 solver.cpp:204]     Train net output #0: loss = 0.592327 (* 1 = 0.592327 loss)
I0828 15:17:30.777001  2844 solver.cpp:470] Iteration 260, lr = 0.01
I0828 15:17:31.232069  2844 solver.cpp:189] Iteration 280, loss = 0.656723
I0828 15:17:31.232105  2844 solver.cpp:204]     Train net output #0: loss = 0.656723 (* 1 = 0.656723 loss)
I0828 15:17:31.232113  2844 solver.cpp:470] Iteration 280, lr = 0.01
I0828 15:17:31.686733  2844 solver.cpp:189] Iteration 300, loss = 0.583883
I0828 15:17:31.686769  2844 solver.cpp:204]     Train net output #0: loss = 0.583883 (* 1 = 0.583883 loss)
I0828 15:17:31.686777  2844 solver.cpp:470] Iteration 300, lr = 0.01
I0828 15:17:32.141825  2844 solver.cpp:189] Iteration 320, loss = 0.659658
I0828 15:17:32.141861  2844 solver.cpp:204]     Train net output #0: loss = 0.659658 (* 1 = 0.659658 loss)
I0828 15:17:32.141870  2844 solver.cpp:470] Iteration 320, lr = 0.01
I0828 15:17:32.596743  2844 solver.cpp:189] Iteration 340, loss = 0.629679
I0828 15:17:32.596779  2844 solver.cpp:204]     Train net output #0: loss = 0.629679 (* 1 = 0.629679 loss)
I0828 15:17:32.596788  2844 solver.cpp:470] Iteration 340, lr = 0.01
I0828 15:17:33.051514  2844 solver.cpp:189] Iteration 360, loss = 0.675324
I0828 15:17:33.051547  2844 solver.cpp:204]     Train net output #0: loss = 0.675324 (* 1 = 0.675324 loss)
I0828 15:17:33.051554  2844 solver.cpp:470] Iteration 360, lr = 0.01
I0828 15:17:33.506405  2844 solver.cpp:189] Iteration 380, loss = 0.644526
I0828 15:17:33.506439  2844 solver.cpp:204]     Train net output #0: loss = 0.644526 (* 1 = 0.644526 loss)
I0828 15:17:33.506486  2844 solver.cpp:470] Iteration 380, lr = 0.01
I0828 15:17:33.961424  2844 solver.cpp:189] Iteration 400, loss = 0.640598
I0828 15:17:33.961457  2844 solver.cpp:204]     Train net output #0: loss = 0.640598 (* 1 = 0.640598 loss)
I0828 15:17:33.961465  2844 solver.cpp:470] Iteration 400, lr = 0.01
I0828 15:17:34.416865  2844 solver.cpp:189] Iteration 420, loss = 0.593572
I0828 15:17:34.416900  2844 solver.cpp:204]     Train net output #0: loss = 0.593572 (* 1 = 0.593572 loss)
I0828 15:17:34.416909  2844 solver.cpp:470] Iteration 420, lr = 0.01
I0828 15:17:34.871320  2844 solver.cpp:189] Iteration 440, loss = 0.649619
I0828 15:17:34.871354  2844 solver.cpp:204]     Train net output #0: loss = 0.649619 (* 1 = 0.649619 loss)
I0828 15:17:34.871363  2844 solver.cpp:470] Iteration 440, lr = 0.01
I0828 15:17:35.326445  2844 solver.cpp:189] Iteration 460, loss = 0.667463
I0828 15:17:35.326479  2844 solver.cpp:204]     Train net output #0: loss = 0.667463 (* 1 = 0.667463 loss)
I0828 15:17:35.326488  2844 solver.cpp:470] Iteration 460, lr = 0.01
I0828 15:17:35.781114  2844 solver.cpp:189] Iteration 480, loss = 0.72425
I0828 15:17:35.781147  2844 solver.cpp:204]     Train net output #0: loss = 0.72425 (* 1 = 0.72425 loss)
I0828 15:17:35.781155  2844 solver.cpp:470] Iteration 480, lr = 0.01
I0828 15:17:36.227710  2844 solver.cpp:334] Snapshotting to checkpoints/snapshot_iter_500.caffemodel
I0828 15:17:36.228917  2844 solver.cpp:342] Snapshotting solver state to checkpoints/snapshot_iter_500.solverstate
I0828 15:17:36.229590  2844 solver.cpp:266] Iteration 500, Testing net (#0)
I0828 15:17:37.046573  2844 solver.cpp:315]     Test net output #0: accuracy = 0.6462
I0828 15:17:37.046612  2844 solver.cpp:315]     Test net output #1: loss = 0.631418 (* 1 = 0.631418 loss)
I0828 15:17:37.055331  2844 solver.cpp:189] Iteration 500, loss = 0.644819
I0828 15:17:37.055359  2844 solver.cpp:204]     Train net output #0: loss = 0.644819 (* 1 = 0.644819 loss)
I0828 15:17:37.055369  2844 solver.cpp:470] Iteration 500, lr = 0.001
I0828 15:17:37.510678  2844 solver.cpp:189] Iteration 520, loss = 0.641667
I0828 15:17:37.510713  2844 solver.cpp:204]     Train net output #0: loss = 0.641667 (* 1 = 0.641667 loss)
I0828 15:17:37.510721  2844 solver.cpp:470] Iteration 520, lr = 0.001
I0828 15:17:37.966207  2844 solver.cpp:189] Iteration 540, loss = 0.596534
I0828 15:17:37.966238  2844 solver.cpp:204]     Train net output #0: loss = 0.596534 (* 1 = 0.596534 loss)
I0828 15:17:37.966246  2844 solver.cpp:470] Iteration 540, lr = 0.001
I0828 15:17:38.421916  2844 solver.cpp:189] Iteration 560, loss = 0.582544
I0828 15:17:38.421958  2844 solver.cpp:204]     Train net output #0: loss = 0.582544 (* 1 = 0.582544 loss)
I0828 15:17:38.421967  2844 solver.cpp:470] Iteration 560, lr = 0.001
I0828 15:17:38.877056  2844 solver.cpp:189] Iteration 580, loss = 0.637338
I0828 15:17:38.877086  2844 solver.cpp:204]     Train net output #0: loss = 0.637338 (* 1 = 0.637338 loss)
I0828 15:17:38.877094  2844 solver.cpp:470] Iteration 580, lr = 0.001
I0828 15:17:39.331491  2844 solver.cpp:189] Iteration 600, loss = 0.668723
I0828 15:17:39.331527  2844 solver.cpp:204]     Train net output #0: loss = 0.668723 (* 1 = 0.668723 loss)
I0828 15:17:39.331535  2844 solver.cpp:470] Iteration 600, lr = 0.001
I0828 15:17:39.786267  2844 solver.cpp:189] Iteration 620, loss = 0.636438
I0828 15:17:39.786300  2844 solver.cpp:204]     Train net output #0: loss = 0.636438 (* 1 = 0.636438 loss)
I0828 15:17:39.786309  2844 solver.cpp:470] Iteration 620, lr = 0.001
I0828 15:17:40.241165  2844 solver.cpp:189] Iteration 640, loss = 0.591464
I0828 15:17:40.241197  2844 solver.cpp:204]     Train net output #0: loss = 0.591464 (* 1 = 0.591464 loss)
I0828 15:17:40.241206  2844 solver.cpp:470] Iteration 640, lr = 0.001
I0828 15:17:40.696089  2844 solver.cpp:189] Iteration 660, loss = 0.556208
I0828 15:17:40.696123  2844 solver.cpp:204]     Train net output #0: loss = 0.556208 (* 1 = 0.556208 loss)
I0828 15:17:40.696172  2844 solver.cpp:470] Iteration 660, lr = 0.001
I0828 15:17:41.150617  2844 solver.cpp:189] Iteration 680, loss = 0.55537
I0828 15:17:41.150670  2844 solver.cpp:204]     Train net output #0: loss = 0.55537 (* 1 = 0.55537 loss)
I0828 15:17:41.150678  2844 solver.cpp:470] Iteration 680, lr = 0.001
I0828 15:17:41.604890  2844 solver.cpp:189] Iteration 700, loss = 0.631278
I0828 15:17:41.604941  2844 solver.cpp:204]     Train net output #0: loss = 0.631278 (* 1 = 0.631278 loss)
I0828 15:17:41.604950  2844 solver.cpp:470] Iteration 700, lr = 0.001
I0828 15:17:42.059494  2844 solver.cpp:189] Iteration 720, loss = 0.590273
I0828 15:17:42.059543  2844 solver.cpp:204]     Train net output #0: loss = 0.590273 (* 1 = 0.590273 loss)
I0828 15:17:42.059552  2844 solver.cpp:470] Iteration 720, lr = 0.001
I0828 15:17:42.513933  2844 solver.cpp:189] Iteration 740, loss = 0.587233
I0828 15:17:42.513983  2844 solver.cpp:204]     Train net output #0: loss = 0.587233 (* 1 = 0.587233 loss)
I0828 15:17:42.513991  2844 solver.cpp:470] Iteration 740, lr = 0.001
I0828 15:17:42.732890  2844 solver.cpp:334] Snapshotting to checkpoints/snapshot_iter_750.caffemodel
I0828 15:17:42.735106  2844 solver.cpp:342] Snapshotting solver state to checkpoints/snapshot_iter_750.solverstate
I0828 15:17:42.736713  2844 solver.cpp:266] Iteration 750, Testing net (#0)
I0828 15:17:43.554018  2844 solver.cpp:315]     Test net output #0: accuracy = 0.6925
I0828 15:17:43.554065  2844 solver.cpp:315]     Test net output #1: loss = 0.589741 (* 1 = 0.589741 loss)
I0828 15:17:43.554075  2844 solver.cpp:253] Optimization Done.
I0828 15:17:43.554078  2844 caffe.cpp:121] Optimization Done.

Q #1:

After 250, 500, and 750 iterations, what accuracy does the log report (see the "Test net output" lines)?

A: See Answer #1 below

It turns out that we can achieve much higher accuracy and a lower loss against both the training and validation datasets for this task, which means our network is underfitting the data. In other words, our ability to approximate the function that maps raw images to dog and cat labels is constrained by the number of trainable parameters in our network.
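To put the loss values in the log into perspective, it helps to know what a network that has learned nothing would score. For a softmax classifier over two classes that assigns probability 0.5 to each, the cross-entropy loss is ln 2. The short calculation below is a sketch; the final test loss is copied from the log above.

```python
import math

num_classes = 2  # dog and cat

# Cross-entropy loss when every class gets the same probability.
chance_loss = -math.log(1.0 / num_classes)

# Test loss reported at iteration 750 in the log above.
final_test_loss = 0.589741

print("loss at chance:        %.6f" % chance_loss)
print("loss after 750 iters:  %.6f" % final_test_loss)
print("improvement over chance: %.6f" % (chance_loss - final_test_loss))
```

The chance-level value is about 0.693147, which is very close to the 0.693151 reported at iteration 0. After 750 iterations the loss has moved only a little below that starting point, which is consistent with the underfitting described above.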

Task 3 - Modifying your Network

Many network configurations that you may have heard of, such as AlexNet, GoogLeNet and VGG, are significantly larger than the three-layer architecture used above and have proven to be very accurate at classifying the ImageNet images. You are going to increase the complexity of this network to improve the accuracy.

There are many knobs that one can turn in choosing a neural network architecture. For example, you can add layers, increase the number of learned weights, change the learning rate or introduce a more complex policy to modify the learning rate as training progresses. We will experiment with some of these modifications to see the effect on classification accuracy.

First, modify the network architecture in the train_val2.prototxt file below to increase the number of outputs in the convolutional layers to 64 for layer 1, 256 for layer 2 and 256 for layer 3.

Also modify the learning parameters in the solver2.prototxt file to decrease the learning rate to 0.005, while keeping everything else the same.
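The effect of changing the base learning rate can be worked out ahead of time. The sketch below assumes that solver2.prototxt keeps the "step" policy with gamma 0.1 and stepsize 500 seen in the solver logs in this lab, and that the step policy multiplies the base rate by gamma once per completed stepsize. The function name step_lr is ours.

```python
def step_lr(base_lr, gamma, stepsize, iteration):
    """Learning rate under a step policy: base_lr * gamma^floor(iteration / stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)


GAMMA = 0.1      # assumed unchanged from the logged solver settings
STEPSIZE = 500   # assumed unchanged from the logged solver settings

for base_lr in (0.01, 0.005):
    print("base_lr = %g" % base_lr)
    for iteration in (0, 480, 500, 740):
        lr = step_lr(base_lr, GAMMA, STEPSIZE, iteration)
        print("  iteration %3d: lr = %g" % (iteration, lr))
```

With base_lr 0.01 this reproduces the drop from 0.01 to 0.001 at iteration 500 that appears in the first training log. With base_lr 0.005 the same schedule applies at half the rate throughout.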

If you have any difficulty making the right modifications, you can find the answers in train_val2.answer.prototxt and solver2.answer.prototxt.


Once you have saved your changes, we can retrain the network by executing the cell below. This time it will take about 90 seconds to train due to the increased network size.



In [ ]:

    
%%bash
TOOLS=/home/ubuntu/caffe/build/tools
#Train your modified network configuration
$TOOLS/caffe train -gpu 0 -solver src/solver2.prototxt

You have now trained a network architecture with a much larger number of trainable parameters.
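To get a feel for how much larger the widened network is, you can count the learned weights and biases by hand. The sketch below assumes the layer settings visible in the logs (3 input channels, 5x5 convolution kernels, a 4x4 spatial size after pool3, 2 output classes) and that only the convolution and inner product layers carry learned parameters. The helper names are ours.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus biases of one convolution layer."""
    return out_channels * in_channels * kernel_size * kernel_size + out_channels


def inner_product_params(num_inputs, num_outputs):
    """Weights plus biases of one fully-connected (inner product) layer."""
    return num_inputs * num_outputs + num_outputs


def count_params(conv_outputs, in_channels=3, kernel_size=5,
                 final_spatial=4, num_classes=2):
    """Total learned parameters for a stack of conv layers followed by ip1."""
    total = 0
    channels = in_channels
    for out_channels in conv_outputs:
        total += conv_params(channels, out_channels, kernel_size)
        channels = out_channels
    # ip1 sees every channel at every remaining spatial position.
    flattened = channels * final_spatial * final_spatial
    total += inner_product_params(flattened, num_classes)
    return total


original = count_params([32, 32, 64])    # conv outputs seen in the logs
widened = count_params([64, 256, 256])   # conv outputs requested in Task 3

print("original network: %d parameters" % original)
print("widened network:  %d parameters" % widened)
print("ratio: %.1fx" % (float(widened) / original))
```

Under these assumptions the original network has 81,378 parameters and the widened one has 2,061,570, roughly 25 times as many. Almost all of the increase comes from the second and third convolutional layers.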

Q #2:

Did you notice any performance improvements? If so, what is your accuracy now?

A: See Answer #2 below

Good work, we are underfitting less. Let's make one more modification to the network before training a final time. In the last modification we increased the number of neurons in our convolutional layers. Another way to increase the number of trainable parameters in our network is to make it deeper by adding more layers.

This time you should add two new fully-connected layers with 100 outputs each - call them ip2 and ip3. These layers should come after the third pooling layer, pool3, but before the existing fully-connected layer ip1. In Caffe, fully-connected layers are implemented using the inner product layer construct. After each new fully-connected (inner product) layer you also need a ReLU activation layer and a dropout layer. The dropout layer helps prevent the network from overfitting, i.e. getting really good at classifying the training data while being unable to classify the validation data.
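To make the idea of dropout concrete, here is a minimal plain-Python sketch of one common formulation, often called inverted dropout: during training each activation is zeroed with probability dropout_ratio and the survivors are scaled up so the expected value is unchanged, while at test time the activations pass through untouched. This illustrates the general technique only; the lab does not spell out the internals of Caffe's Dropout layer, and the function names here are ours.

```python
import random


def dropout_train(activations, dropout_ratio, rng):
    """Zero each activation with probability dropout_ratio, rescale the rest."""
    keep_prob = 1.0 - dropout_ratio
    scale = 1.0 / keep_prob
    return [a * scale if rng.random() < keep_prob else 0.0
            for a in activations]


def dropout_test(activations):
    """At test time nothing is dropped."""
    return list(activations)


rng = random.Random(0)          # fixed seed so the run is repeatable
activations = [1.0] * 100       # stand-in for the 100 outputs of ip2
dropped = dropout_train(activations, 0.5, rng)

kept = sum(1 for v in dropped if v != 0.0)
print("kept %d of %d activations" % (kept, len(dropped)))
print("mean before: %.2f" % (sum(activations) / len(activations)))
print("mean after:  %.2f" % (sum(dropped) / len(dropped)))
```

Because a different random subset of neurons is silenced on every training pass, the network cannot rely on any single neuron, which is what discourages it from memorizing the training set.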

Here is what ip2 should look like when inserted after pool3. You'll need to modify the layer names to create ip3 after ip2.

REMEMBER: Don't forget to change the input of your existing fully-connected layer ip1 to be the output of the new layer ip3!

layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "pool3"
  top: "ip2"
  inner_product_param {
    num_output: 100
    weight_filler {
      type: "gaussian"
      std: 0.1
    }
    bias_filler {
      type: "constant"
      value: 0.9
    }
  }
}
layer {
  name: "reluip2"
  type: "ReLU"
  bottom: "ip2"
  top: "ip2"
}
layer {
  name: "dropip2"
  type: "Dropout"
  bottom: "ip2"
  top: "ip2"
  dropout_param {
    dropout_ratio: 0.5
  }
}
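A frequent mistake at this step is to add ip2 and ip3 but leave ip1 still reading from pool3. The sketch below shows one way to sanity-check the wiring on paper before training: each layer is written as (name, bottoms, tops), and two small checks look for inputs that do not exist yet and for outputs that nothing reads. The names reluip3 and dropip3 are hypothetical, chosen to follow the ip2 pattern; the helper functions are ours and do not read prototxt files.

```python
# Intended wiring of the tail of the network after the modification.
layers = [
    ("pool3",   ["conv3"], ["pool3"]),
    ("ip2",     ["pool3"], ["ip2"]),
    ("reluip2", ["ip2"],   ["ip2"]),
    ("dropip2", ["ip2"],   ["ip2"]),
    ("ip3",     ["ip2"],   ["ip3"]),
    ("reluip3", ["ip3"],   ["ip3"]),   # hypothetical name
    ("dropip3", ["ip3"],   ["ip3"]),   # hypothetical name
    ("ip1",     ["ip3"],   ["ip1"]),
]


def find_dangling_bottoms(layers, available):
    """Return (layer, bottom) pairs whose bottom blob has not been produced yet."""
    produced = set(available)
    problems = []
    for name, bottoms, tops in layers:
        for bottom in bottoms:
            if bottom not in produced:
                problems.append((name, bottom))
        produced.update(tops)
    return problems


def dead_end_blobs(layers, outputs):
    """Return blobs that are produced but never read by a later, non-in-place layer."""
    produced = set()
    consumed = set()
    for name, bottoms, tops in layers:
        produced.update(tops)
        for bottom in bottoms:
            if bottom not in tops:  # in-place layers (ReLU, Dropout) do not count
                consumed.add(bottom)
    return sorted(produced - consumed - set(outputs))


# The same list with the common mistake: ip1 still reads from pool3.
miswired = layers[:-1] + [("ip1", ["pool3"], ["ip1"])]

print("dangling bottoms:", find_dangling_bottoms(layers, ["conv3"]))
print("dead ends (correct): ", dead_end_blobs(layers, ["ip1"]))
print("dead ends (miswired):", dead_end_blobs(miswired, ["ip1"]))
```

In the miswired version, ip3 shows up as a dead end: the two new layers are computed but their result never reaches ip1, so they cannot affect the prediction.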

Modify the train_val3.prototxt file below by adding the new layers, then execute the cell that follows to train one more time. Again, this model will take slightly longer to train due to the increased size.

Again, if you have any problems modifying the file, you can look at the answer in train_val3.answer.prototxt.




In [6]:

    
%%bash
TOOLS=/home/ubuntu/caffe/build/tools
#Train your modified network configuration
$TOOLS/caffe train -gpu 0 -solver src/solver3.prototxt









    



I0828 15:19:13.167608  4007 caffe.cpp:99] Use GPU with device ID 0
I0828 15:19:13.291929  4007 caffe.cpp:107] Starting Optimization
I0828 15:19:13.292081  4007 solver.cpp:32] Initializing solver from parameters: 
test_iter: 100
test_interval: 250
base_lr: 0.01
display: 20
max_iter: 750
lr_policy: "step"
gamma: 0.1
momentum: 0.9
stepsize: 500
snapshot: 750
snapshot_prefix: "checkpoints/snapshot"
solver_mode: GPU
net: "src/train_val3.prototxt"
solver_type: SGD
I0828 15:19:13.292137  4007 solver.cpp:70] Creating training net from net file: src/train_val3.prototxt
I0828 15:19:13.292619  4007 net.cpp:253] The NetState phase (0) differed from the phase (1) specified by a rule in layer data
I0828 15:19:13.292655  4007 net.cpp:253] The NetState phase (0) differed from the phase (1) specified by a rule in layer accuracy
I0828 15:19:13.292790  4007 net.cpp:42] Initializing net from parameters: 
state {
  phase: TRAIN
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 32
    mean_file: "mean.binaryproto"
  }
  data_param {
    source: "train_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.0001
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "pool1"
  top: "pool1"
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 3
    alpha: 5e-05
    beta: 0.75
    norm_region: WITHIN_CHANNEL
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "norm1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "pool2"
  top: "norm2"
  lrn_param {
    local_size: 3
    alpha: 5e-05
    beta: 0.75
    norm_region: WITHIN_CHANNEL
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "norm2"
  top: "conv3"
  convolution_param {
    num_output: 64
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3"
  top: "pool3"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool3"
  top: "ip1"
  param {
    lr_mult: 1
    decay_mult: 250
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip1"
  bottom: "label"
  top: "loss"
}
I0828 15:19:13.292922  4007 layer_factory.hpp:74] Creating layer data
I0828 15:19:13.292963  4007 net.cpp:76] Creating Layer data
I0828 15:19:13.292991  4007 net.cpp:334] data -> data
I0828 15:19:13.293045  4007 net.cpp:334] data -> label
I0828 15:19:13.293074  4007 net.cpp:105] Setting up data
I0828 15:19:13.293184  4007 db.cpp:34] Opened lmdb train_lmdb
I0828 15:19:13.293253  4007 data_layer.cpp:67] output data size: 100,3,32,32
I0828 15:19:13.293278  4007 data_transformer.cpp:22] Loading mean file from: mean.binaryproto
I0828 15:19:13.294265  4007 net.cpp:112] Top shape: 100 3 32 32 (307200)
I0828 15:19:13.294303  4007 net.cpp:112] Top shape: 100 1 1 1 (100)
I0828 15:19:13.294317  4007 layer_factory.hpp:74] Creating layer conv1
I0828 15:19:13.294340  4007 net.cpp:76] Creating Layer conv1
I0828 15:19:13.294358  4007 net.cpp:372] conv1 <- data
I0828 15:19:13.294383  4007 net.cpp:334] conv1 -> conv1
I0828 15:19:13.294409  4007 net.cpp:105] Setting up conv1
I0828 15:19:13.347303  4007 net.cpp:112] Top shape: 100 32 32 32 (3276800)
I0828 15:19:13.347380  4007 layer_factory.hpp:74] Creating layer pool1
I0828 15:19:13.347411  4007 net.cpp:76] Creating Layer pool1
I0828 15:19:13.347429  4007 net.cpp:372] pool1 <- conv1
I0828 15:19:13.347445  4007 net.cpp:334] pool1 -> pool1
I0828 15:19:13.347466  4007 net.cpp:105] Setting up pool1
I0828 15:19:13.347636  4007 net.cpp:112] Top shape: 100 32 16 16 (819200)
I0828 15:19:13.347654  4007 layer_factory.hpp:74] Creating layer relu1
I0828 15:19:13.347668  4007 net.cpp:76] Creating Layer relu1
I0828 15:19:13.347676  4007 net.cpp:372] relu1 <- pool1
I0828 15:19:13.347688  4007 net.cpp:323] relu1 -> pool1 (in-place)
I0828 15:19:13.347702  4007 net.cpp:105] Setting up relu1
I0828 15:19:13.347764  4007 net.cpp:112] Top shape: 100 32 16 16 (819200)
I0828 15:19:13.347779  4007 layer_factory.hpp:74] Creating layer norm1
I0828 15:19:13.347797  4007 net.cpp:76] Creating Layer norm1
I0828 15:19:13.347815  4007 net.cpp:372] norm1 <- pool1
I0828 15:19:13.347827  4007 net.cpp:334] norm1 -> norm1
I0828 15:19:13.347852  4007 net.cpp:105] Setting up norm1
I0828 15:19:13.347905  4007 net.cpp:112] Top shape: 100 32 16 16 (819200)
I0828 15:19:13.347920  4007 layer_factory.hpp:74] Creating layer conv2
I0828 15:19:13.347937  4007 net.cpp:76] Creating Layer conv2
I0828 15:19:13.347954  4007 net.cpp:372] conv2 <- norm1
I0828 15:19:13.347967  4007 net.cpp:334] conv2 -> conv2
I0828 15:19:13.347985  4007 net.cpp:105] Setting up conv2
I0828 15:19:13.349048  4007 net.cpp:112] Top shape: 100 32 16 16 (819200)
I0828 15:19:13.349076  4007 layer_factory.hpp:74] Creating layer relu2
I0828 15:19:13.349089  4007 net.cpp:76] Creating Layer relu2
I0828 15:19:13.349100  4007 net.cpp:372] relu2 <- conv2
I0828 15:19:13.349112  4007 net.cpp:323] relu2 -> conv2 (in-place)
I0828 15:19:13.349125  4007 net.cpp:105] Setting up relu2
I0828 15:19:13.349179  4007 net.cpp:112] Top shape: 100 32 16 16 (819200)
I0828 15:19:13.349194  4007 layer_factory.hpp:74] Creating layer pool2
I0828 15:19:13.349205  4007 net.cpp:76] Creating Layer pool2
I0828 15:19:13.349215  4007 net.cpp:372] pool2 <- conv2
I0828 15:19:13.349225  4007 net.cpp:334] pool2 -> pool2
I0828 15:19:13.349246  4007 net.cpp:105] Setting up pool2
I0828 15:19:13.349381  4007 net.cpp:112] Top shape: 100 32 8 8 (204800)
I0828 15:19:13.349400  4007 layer_factory.hpp:74] Creating layer norm2
I0828 15:19:13.349429  4007 net.cpp:76] Creating Layer norm2
I0828 15:19:13.349447  4007 net.cpp:372] norm2 <- pool2
I0828 15:19:13.349459  4007 net.cpp:334] norm2 -> norm2
I0828 15:19:13.349475  4007 net.cpp:105] Setting up norm2
I0828 15:19:13.349499  4007 net.cpp:112] Top shape: 100 32 8 8 (204800)
I0828 15:19:13.349515  4007 layer_factory.hpp:74] Creating layer conv3
I0828 15:19:13.349527  4007 net.cpp:76] Creating Layer conv3
I0828 15:19:13.349539  4007 net.cpp:372] conv3 <- norm2
I0828 15:19:13.349552  4007 net.cpp:334] conv3 -> conv3
I0828 15:19:13.349573  4007 net.cpp:105] Setting up conv3
I0828 15:19:13.351497  4007 net.cpp:112] Top shape: 100 64 8 8 (409600)
I0828 15:19:13.351527  4007 layer_factory.hpp:74] Creating layer relu3
I0828 15:19:13.351541  4007 net.cpp:76] Creating Layer relu3
I0828 15:19:13.351552  4007 net.cpp:372] relu3 <- conv3
I0828 15:19:13.351564  4007 net.cpp:323] relu3 -> conv3 (in-place)
I0828 15:19:13.351583  4007 net.cpp:105] Setting up relu3
I0828 15:19:13.351646  4007 net.cpp:112] Top shape: 100 64 8 8 (409600)
I0828 15:19:13.351663  4007 layer_factory.hpp:74] Creating layer pool3
I0828 15:19:13.351676  4007 net.cpp:76] Creating Layer pool3
I0828 15:19:13.351690  4007 net.cpp:372] pool3 <- conv3
I0828 15:19:13.351702  4007 net.cpp:334] pool3 -> pool3
I0828 15:19:13.351748  4007 net.cpp:105] Setting up pool3
I0828 15:19:13.351805  4007 net.cpp:112] Top shape: 100 64 4 4 (102400)
I0828 15:19:13.351820  4007 layer_factory.hpp:74] Creating layer ip1
I0828 15:19:13.351840  4007 net.cpp:76] Creating Layer ip1
I0828 15:19:13.351857  4007 net.cpp:372] ip1 <- pool3
I0828 15:19:13.351871  4007 net.cpp:334] ip1 -> ip1
I0828 15:19:13.351891  4007 net.cpp:105] Setting up ip1
I0828 15:19:13.351980  4007 net.cpp:112] Top shape: 100 2 1 1 (200)
I0828 15:19:13.352000  4007 layer_factory.hpp:74] Creating layer loss
I0828 15:19:13.352016  4007 net.cpp:76] Creating Layer loss
I0828 15:19:13.352031  4007 net.cpp:372] loss <- ip1
I0828 15:19:13.352041  4007 net.cpp:372] loss <- label
I0828 15:19:13.352057  4007 net.cpp:334] loss -> loss
I0828 15:19:13.352073  4007 net.cpp:105] Setting up loss
I0828 15:19:13.352095  4007 layer_factory.hpp:74] Creating layer loss
I0828 15:19:13.352180  4007 net.cpp:112] Top shape: 1 1 1 1 (1)
I0828 15:19:13.352195  4007 net.cpp:118]     with loss weight 1
I0828 15:19:13.352234  4007 net.cpp:163] loss needs backward computation.
I0828 15:19:13.352244  4007 net.cpp:163] ip1 needs backward computation.
I0828 15:19:13.352254  4007 net.cpp:163] pool3 needs backward computation.
I0828 15:19:13.352262  4007 net.cpp:163] relu3 needs backward computation.
I0828 15:19:13.352270  4007 net.cpp:163] conv3 needs backward computation.
I0828 15:19:13.352278  4007 net.cpp:163] norm2 needs backward computation.
I0828 15:19:13.352288  4007 net.cpp:163] pool2 needs backward computation.
I0828 15:19:13.352295  4007 net.cpp:163] relu2 needs backward computation.
I0828 15:19:13.352304  4007 net.cpp:163] conv2 needs backward computation.
I0828 15:19:13.352311  4007 net.cpp:163] norm1 needs backward computation.
I0828 15:19:13.352320  4007 net.cpp:163] relu1 needs backward computation.
I0828 15:19:13.352327  4007 net.cpp:163] pool1 needs backward computation.
I0828 15:19:13.352336  4007 net.cpp:163] conv1 needs backward computation.
I0828 15:19:13.352344  4007 net.cpp:165] data does not need backward computation.
I0828 15:19:13.352354  4007 net.cpp:201] This network produces output loss
I0828 15:19:13.352375  4007 net.cpp:446] Collecting Learning Rate and Weight Decay.
I0828 15:19:13.352393  4007 net.cpp:213] Network initialization done.
I0828 15:19:13.352401  4007 net.cpp:214] Memory required for data: 36046004
I0828 15:19:13.352821  4007 solver.cpp:154] Creating test net (#0) specified by net file: src/train_val3.prototxt
I0828 15:19:13.352880  4007 net.cpp:253] The NetState phase (1) differed from the phase (0) specified by a rule in layer data
I0828 15:19:13.353034  4007 net.cpp:42] Initializing net from parameters: 
state {
  phase: TEST
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    crop_size: 32
    mean_file: "mean.binaryproto"
  }
  data_param {
    source: "val_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.0001
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "pool1"
  top: "pool1"
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 3
    alpha: 5e-05
    beta: 0.75
    norm_region: WITHIN_CHANNEL
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "norm1"
  top: "conv2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "pool2"
  top: "norm2"
  lrn_param {
    local_size: 3
    alpha: 5e-05
    beta: 0.75
    norm_region: WITHIN_CHANNEL
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "norm2"
  top: "conv3"
  convolution_param {
    num_output: 64
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "pool3"
  type: "Pooling"
  bottom: "conv3"
  top: "pool3"
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "pool3"
  top: "ip1"
  param {
    lr_mult: 1
    decay_mult: 250
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip1"
  bottom: "label"
  top: "loss"
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip1"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
I0828 15:19:13.353165  4007 layer_factory.hpp:74] Creating layer data
I0828 15:19:13.353188  4007 net.cpp:76] Creating Layer data
I0828 15:19:13.353199  4007 net.cpp:334] data -> data
I0828 15:19:13.353217  4007 net.cpp:334] data -> label
I0828 15:19:13.353235  4007 net.cpp:105] Setting up data
I0828 15:19:13.353296  4007 db.cpp:34] Opened lmdb val_lmdb
I0828 15:19:13.353338  4007 data_layer.cpp:67] output data size: 100,3,32,32
I0828 15:19:13.353358  4007 data_transformer.cpp:22] Loading mean file from: mean.binaryproto
I0828 15:19:13.353912  4007 net.cpp:112] Top shape: 100 3 32 32 (307200)
I0828 15:19:13.353929  4007 net.cpp:112] Top shape: 100 1 1 1 (100)
I0828 15:19:13.353938  4007 layer_factory.hpp:74] Creating layer label_data_1_split
I0828 15:19:13.353955  4007 net.cpp:76] Creating Layer label_data_1_split
I0828 15:19:13.353971  4007 net.cpp:372] label_data_1_split <- label
I0828 15:19:13.353986  4007 net.cpp:334] label_data_1_split -> label_data_1_split_0
I0828 15:19:13.354003  4007 net.cpp:334] label_data_1_split -> label_data_1_split_1
I0828 15:19:13.354017  4007 net.cpp:105] Setting up label_data_1_split
I0828 15:19:13.354032  4007 net.cpp:112] Top shape: 100 1 1 1 (100)
I0828 15:19:13.354039  4007 net.cpp:112] Top shape: 100 1 1 1 (100)
I0828 15:19:13.354056  4007 layer_factory.hpp:74] Creating layer conv1
I0828 15:19:13.354070  4007 net.cpp:76] Creating Layer conv1
I0828 15:19:13.354086  4007 net.cpp:372] conv1 <- data
I0828 15:19:13.354104  4007 net.cpp:334] conv1 -> conv1
I0828 15:19:13.354125  4007 net.cpp:105] Setting up conv1
I0828 15:19:13.354488  4007 net.cpp:112] Top shape: 100 32 32 32 (3276800)
I0828 15:19:13.354516  4007 layer_factory.hpp:74] Creating layer pool1
I0828 15:19:13.354534  4007 net.cpp:76] Creating Layer pool1
I0828 15:19:13.354550  4007 net.cpp:372] pool1 <- conv1
I0828 15:19:13.354562  4007 net.cpp:334] pool1 -> pool1
I0828 15:19:13.354578  4007 net.cpp:105] Setting up pool1
I0828 15:19:13.354735  4007 net.cpp:112] Top shape: 100 32 16 16 (819200)
I0828 15:19:13.354753  4007 layer_factory.hpp:74] Creating layer relu1
I0828 15:19:13.354769  4007 net.cpp:76] Creating Layer relu1
I0828 15:19:13.354781  4007 net.cpp:372] relu1 <- pool1
I0828 15:19:13.354792  4007 net.cpp:323] relu1 -> pool1 (in-place)
I0828 15:19:13.354810  4007 net.cpp:105] Setting up relu1
I0828 15:19:13.354876  4007 net.cpp:112] Top shape: 100 32 16 16 (819200)
I0828 15:19:13.354890  4007 layer_factory.hpp:74] Creating layer norm1
I0828 15:19:13.354905  4007 net.cpp:76] Creating Layer norm1
I0828 15:19:13.354917  4007 net.cpp:372] norm1 <- pool1
I0828 15:19:13.354928  4007 net.cpp:334] norm1 -> norm1
I0828 15:19:13.354962  4007 net.cpp:105] Setting up norm1
I0828 15:19:13.354991  4007 net.cpp:112] Top shape: 100 32 16 16 (819200)
I0828 15:19:13.355006  4007 layer_factory.hpp:74] Creating layer conv2
I0828 15:19:13.355018  4007 net.cpp:76] Creating Layer conv2
I0828 15:19:13.355029  4007 net.cpp:372] conv2 <- norm1
I0828 15:19:13.355046  4007 net.cpp:334] conv2 -> conv2
I0828 15:19:13.355068  4007 net.cpp:105] Setting up conv2
I0828 15:19:13.356174  4007 net.cpp:112] Top shape: 100 32 16 16 (819200)
I0828 15:19:13.356205  4007 layer_factory.hpp:74] Creating layer relu2
I0828 15:19:13.356219  4007 net.cpp:76] Creating Layer relu2
I0828 15:19:13.356230  4007 net.cpp:372] relu2 <- conv2
I0828 15:19:13.356241  4007 net.cpp:323] relu2 -> conv2 (in-place)
I0828 15:19:13.356259  4007 net.cpp:105] Setting up relu2
I0828 15:19:13.356325  4007 net.cpp:112] Top shape: 100 32 16 16 (819200)
I0828 15:19:13.356341  4007 layer_factory.hpp:74] Creating layer pool2
I0828 15:19:13.356358  4007 net.cpp:76] Creating Layer pool2
I0828 15:19:13.356369  4007 net.cpp:372] pool2 <- conv2
I0828 15:19:13.356380  4007 net.cpp:334] pool2 -> pool2
I0828 15:19:13.356400  4007 net.cpp:105] Setting up pool2
I0828 15:19:13.356462  4007 net.cpp:112] Top shape: 100 32 8 8 (204800)
I0828 15:19:13.356477  4007 layer_factory.hpp:74] Creating layer norm2
I0828 15:19:13.356487  4007 net.cpp:76] Creating Layer norm2
I0828 15:19:13.356495  4007 net.cpp:372] norm2 <- pool2
I0828 15:19:13.356510  4007 net.cpp:334] norm2 -> norm2
I0828 15:19:13.356523  4007 net.cpp:105] Setting up norm2
I0828 15:19:13.356554  4007 net.cpp:112] Top shape: 100 32 8 8 (204800)
I0828 15:19:13.356571  4007 layer_factory.hpp:74] Creating layer conv3
I0828 15:19:13.356586  4007 net.cpp:76] Creating Layer conv3
I0828 15:19:13.356597  4007 net.cpp:372] conv3 <- norm2
I0828 15:19:13.356609  4007 net.cpp:334] conv3 -> conv3
I0828 15:19:13.356628  4007 net.cpp:105] Setting up conv3
I0828 15:19:13.358608  4007 net.cpp:112] Top shape: 100 64 8 8 (409600)
I0828 15:19:13.358635  4007 layer_factory.hpp:74] Creating layer relu3
I0828 15:19:13.358647  4007 net.cpp:76] Creating Layer relu3
I0828 15:19:13.358659  4007 net.cpp:372] relu3 <- conv3
I0828 15:19:13.358675  4007 net.cpp:323] relu3 -> conv3 (in-place)
I0828 15:19:13.358693  4007 net.cpp:105] Setting up relu3
I0828 15:19:13.358834  4007 net.cpp:112] Top shape: 100 64 8 8 (409600)
I0828 15:19:13.358852  4007 layer_factory.hpp:74] Creating layer pool3
I0828 15:19:13.358870  4007 net.cpp:76] Creating Layer pool3
I0828 15:19:13.358882  4007 net.cpp:372] pool3 <- conv3
I0828 15:19:13.358894  4007 net.cpp:334] pool3 -> pool3
I0828 15:19:13.358913  4007 net.cpp:105] Setting up pool3
I0828 15:19:13.358978  4007 net.cpp:112] Top shape: 100 64 4 4 (102400)
I0828 15:19:13.358993  4007 layer_factory.hpp:74] Creating layer ip1
I0828 15:19:13.359004  4007 net.cpp:76] Creating Layer ip1
I0828 15:19:13.359012  4007 net.cpp:372] ip1 <- pool3
I0828 15:19:13.359030  4007 net.cpp:334] ip1 -> ip1
I0828 15:19:13.359048  4007 net.cpp:105] Setting up ip1
I0828 15:19:13.359148  4007 net.cpp:112] Top shape: 100 2 1 1 (200)
I0828 15:19:13.359169  4007 layer_factory.hpp:74] Creating layer ip1_ip1_0_split
I0828 15:19:13.359180  4007 net.cpp:76] Creating Layer ip1_ip1_0_split
I0828 15:19:13.359195  4007 net.cpp:372] ip1_ip1_0_split <- ip1
I0828 15:19:13.359207  4007 net.cpp:334] ip1_ip1_0_split -> ip1_ip1_0_split_0
I0828 15:19:13.359222  4007 net.cpp:334] ip1_ip1_0_split -> ip1_ip1_0_split_1
I0828 15:19:13.359236  4007 net.cpp:105] Setting up ip1_ip1_0_split
I0828 15:19:13.359246  4007 net.cpp:112] Top shape: 100 2 1 1 (200)
I0828 15:19:13.359259  4007 net.cpp:112] Top shape: 100 2 1 1 (200)
I0828 15:19:13.359267  4007 layer_factory.hpp:74] Creating layer loss
I0828 15:19:13.359288  4007 net.cpp:76] Creating Layer loss
I0828 15:19:13.359302  4007 net.cpp:372] loss <- ip1_ip1_0_split_0
I0828 15:19:13.359311  4007 net.cpp:372] loss <- label_data_1_split_0
I0828 15:19:13.359329  4007 net.cpp:334] loss -> loss
I0828 15:19:13.359343  4007 net.cpp:105] Setting up loss
I0828 15:19:13.359374  4007 layer_factory.hpp:74] Creating layer loss
I0828 15:19:13.359443  4007 net.cpp:112] Top shape: 1 1 1 1 (1)
I0828 15:19:13.359458  4007 net.cpp:118]     with loss weight 1
I0828 15:19:13.359472  4007 layer_factory.hpp:74] Creating layer accuracy
I0828 15:19:13.359490  4007 net.cpp:76] Creating Layer accuracy
I0828 15:19:13.359505  4007 net.cpp:372] accuracy <- ip1_ip1_0_split_1
I0828 15:19:13.359515  4007 net.cpp:372] accuracy <- label_data_1_split_1
I0828 15:19:13.359530  4007 net.cpp:334] accuracy -> accuracy
I0828 15:19:13.359544  4007 net.cpp:105] Setting up accuracy
I0828 15:19:13.359565  4007 net.cpp:112] Top shape: 1 1 1 1 (1)
I0828 15:19:13.359575  4007 net.cpp:165] accuracy does not need backward computation.
I0828 15:19:13.359585  4007 net.cpp:163] loss needs backward computation.
I0828 15:19:13.359592  4007 net.cpp:163] ip1_ip1_0_split needs backward computation.
I0828 15:19:13.359602  4007 net.cpp:163] ip1 needs backward computation.
I0828 15:19:13.359611  4007 net.cpp:163] pool3 needs backward computation.
I0828 15:19:13.359619  4007 net.cpp:163] relu3 needs backward computation.
I0828 15:19:13.359627  4007 net.cpp:163] conv3 needs backward computation.
I0828 15:19:13.359637  4007 net.cpp:163] norm2 needs backward computation.
I0828 15:19:13.359643  4007 net.cpp:163] pool2 needs backward computation.
I0828 15:19:13.359658  4007 net.cpp:163] relu2 needs backward computation.
I0828 15:19:13.359665  4007 net.cpp:163] conv2 needs backward computation.
I0828 15:19:13.359673  4007 net.cpp:163] norm1 needs backward computation.
I0828 15:19:13.359683  4007 net.cpp:163] relu1 needs backward computation.
I0828 15:19:13.359690  4007 net.cpp:163] pool1 needs backward computation.
I0828 15:19:13.359699  4007 net.cpp:163] conv1 needs backward computation.
I0828 15:19:13.359707  4007 net.cpp:165] label_data_1_split does not need backward computation.
I0828 15:19:13.359716  4007 net.cpp:165] data does not need backward computation.
I0828 15:19:13.359724  4007 net.cpp:201] This network produces output accuracy
I0828 15:19:13.359732  4007 net.cpp:201] This network produces output loss
I0828 15:19:13.359757  4007 net.cpp:446] Collecting Learning Rate and Weight Decay.
I0828 15:19:13.359774  4007 net.cpp:213] Network initialization done.
I0828 15:19:13.359781  4007 net.cpp:214] Memory required for data: 36048408
I0828 15:19:13.359834  4007 solver.cpp:42] Solver scaffolding done.
I0828 15:19:13.359870  4007 solver.cpp:222] Solving 
I0828 15:19:13.359885  4007 solver.cpp:223] Learning Rate Policy: step
I0828 15:19:13.359900  4007 solver.cpp:266] Iteration 0, Testing net (#0)
I0828 15:19:14.176769  4007 solver.cpp:315]     Test net output #0: accuracy = 0.5053
I0828 15:19:14.176810  4007 solver.cpp:315]     Test net output #1: loss = 0.693154 (* 1 = 0.693154 loss)
I0828 15:19:14.188190  4007 solver.cpp:189] Iteration 0, loss = 0.693131
I0828 15:19:14.188225  4007 solver.cpp:204]     Train net output #0: loss = 0.693131 (* 1 = 0.693131 loss)
I0828 15:19:14.188253  4007 solver.cpp:470] Iteration 0, lr = 0.01
I0828 15:19:14.640431  4007 solver.cpp:189] Iteration 20, loss = 0.672216
I0828 15:19:14.640485  4007 solver.cpp:204]     Train net output #0: loss = 0.672216 (* 1 = 0.672216 loss)
I0828 15:19:14.640499  4007 solver.cpp:470] Iteration 20, lr = 0.01
I0828 15:19:15.093814  4007 solver.cpp:189] Iteration 40, loss = 0.681186
I0828 15:19:15.093853  4007 solver.cpp:204]     Train net output #0: loss = 0.681186 (* 1 = 0.681186 loss)
I0828 15:19:15.093868  4007 solver.cpp:470] Iteration 40, lr = 0.01
I0828 15:19:15.547363  4007 solver.cpp:189] Iteration 60, loss = 0.640645
I0828 15:19:15.547401  4007 solver.cpp:204]     Train net output #0: loss = 0.640645 (* 1 = 0.640645 loss)
I0828 15:19:15.547415  4007 solver.cpp:470] Iteration 60, lr = 0.01
I0828 15:19:16.000722  4007 solver.cpp:189] Iteration 80, loss = 0.679133
I0828 15:19:16.000762  4007 solver.cpp:204]     Train net output #0: loss = 0.679133 (* 1 = 0.679133 loss)
I0828 15:19:16.000777  4007 solver.cpp:470] Iteration 80, lr = 0.01
I0828 15:19:16.454226  4007 solver.cpp:189] Iteration 100, loss = 0.6818
I0828 15:19:16.454296  4007 solver.cpp:204]     Train net output #0: loss = 0.6818 (* 1 = 0.6818 loss)
I0828 15:19:16.454311  4007 solver.cpp:470] Iteration 100, lr = 0.01
I0828 15:19:16.907637  4007 solver.cpp:189] Iteration 120, loss = 0.630845
I0828 15:19:16.907675  4007 solver.cpp:204]     Train net output #0: loss = 0.630845 (* 1 = 0.630845 loss)
I0828 15:19:16.907690  4007 solver.cpp:470] Iteration 120, lr = 0.01
I0828 15:19:17.360930  4007 solver.cpp:189] Iteration 140, loss = 0.62592
I0828 15:19:17.360991  4007 solver.cpp:204]     Train net output #0: loss = 0.62592 (* 1 = 0.62592 loss)
I0828 15:19:17.361004  4007 solver.cpp:470] Iteration 140, lr = 0.01
I0828 15:19:17.814648  4007 solver.cpp:189] Iteration 160, loss = 0.678799
I0828 15:19:17.814707  4007 solver.cpp:204]     Train net output #0: loss = 0.678799 (* 1 = 0.678799 loss)
I0828 15:19:17.814723  4007 solver.cpp:470] Iteration 160, lr = 0.01
I0828 15:19:18.268667  4007 solver.cpp:189] Iteration 180, loss = 0.636126
I0828 15:19:18.268714  4007 solver.cpp:204]     Train net output #0: loss = 0.636126 (* 1 = 0.636126 loss)
I0828 15:19:18.268728  4007 solver.cpp:470] Iteration 180, lr = 0.01
I0828 15:19:18.721984  4007 solver.cpp:189] Iteration 200, loss = 0.706443
I0828 15:19:18.722028  4007 solver.cpp:204]     Train net output #0: loss = 0.706443 (* 1 = 0.706443 loss)
I0828 15:19:18.722041  4007 solver.cpp:470] Iteration 200, lr = 0.01
I0828 15:19:19.175544  4007 solver.cpp:189] Iteration 220, loss = 0.677156
I0828 15:19:19.175583  4007 solver.cpp:204]     Train net output #0: loss = 0.677156 (* 1 = 0.677156 loss)
I0828 15:19:19.175598  4007 solver.cpp:470] Iteration 220, lr = 0.01
I0828 15:19:19.628813  4007 solver.cpp:189] Iteration 240, loss = 0.644259
I0828 15:19:19.628854  4007 solver.cpp:204]     Train net output #0: loss = 0.644259 (* 1 = 0.644259 loss)
I0828 15:19:19.628867  4007 solver.cpp:470] Iteration 240, lr = 0.01
I0828 15:19:19.833181  4007 solver.cpp:266] Iteration 250, Testing net (#0)
I0828 15:19:20.663952  4007 solver.cpp:315]     Test net output #0: accuracy = 0.6406
I0828 15:19:20.663996  4007 solver.cpp:315]     Test net output #1: loss = 0.638854 (* 1 = 0.638854 loss)
I0828 15:19:20.899487  4007 solver.cpp:189] Iteration 260, loss = 0.61933
I0828 15:19:20.899524  4007 solver.cpp:204]     Train net output #0: loss = 0.61933 (* 1 = 0.61933 loss)
I0828 15:19:20.899538  4007 solver.cpp:470] Iteration 260, lr = 0.01
I0828 15:19:21.353091  4007 solver.cpp:189] Iteration 280, loss = 0.649318
I0828 15:19:21.353134  4007 solver.cpp:204]     Train net output #0: loss = 0.649318 (* 1 = 0.649318 loss)
I0828 15:19:21.353148  4007 solver.cpp:470] Iteration 280, lr = 0.01
I0828 15:19:21.806833  4007 solver.cpp:189] Iteration 300, loss = 0.666252
I0828 15:19:21.806869  4007 solver.cpp:204]     Train net output #0: loss = 0.666252 (* 1 = 0.666252 loss)
I0828 15:19:21.806884  4007 solver.cpp:470] Iteration 300, lr = 0.01
I0828 15:19:22.260231  4007 solver.cpp:189] Iteration 320, loss = 0.641104
I0828 15:19:22.260270  4007 solver.cpp:204]     Train net output #0: loss = 0.641104 (* 1 = 0.641104 loss)
I0828 15:19:22.260284  4007 solver.cpp:470] Iteration 320, lr = 0.01
I0828 15:19:22.713723  4007 solver.cpp:189] Iteration 340, loss = 0.574239
I0828 15:19:22.713765  4007 solver.cpp:204]     Train net output #0: loss = 0.574239 (* 1 = 0.574239 loss)
I0828 15:19:22.713779  4007 solver.cpp:470] Iteration 340, lr = 0.01
I0828 15:19:23.169340  4007 solver.cpp:189] Iteration 360, loss = 0.656269
I0828 15:19:23.169380  4007 solver.cpp:204]     Train net output #0: loss = 0.656269 (* 1 = 0.656269 loss)
I0828 15:19:23.169394  4007 solver.cpp:470] Iteration 360, lr = 0.01
I0828 15:19:23.622737  4007 solver.cpp:189] Iteration 380, loss = 0.611117
I0828 15:19:23.622777  4007 solver.cpp:204]     Train net output #0: loss = 0.611117 (* 1 = 0.611117 loss)
I0828 15:19:23.622792  4007 solver.cpp:470] Iteration 380, lr = 0.01
I0828 15:19:24.076351  4007 solver.cpp:189] Iteration 400, loss = 0.67366
I0828 15:19:24.076392  4007 solver.cpp:204]     Train net output #0: loss = 0.67366 (* 1 = 0.67366 loss)
I0828 15:19:24.076453  4007 solver.cpp:470] Iteration 400, lr = 0.01
I0828 15:19:24.530102  4007 solver.cpp:189] Iteration 420, loss = 0.551614
I0828 15:19:24.530143  4007 solver.cpp:204]     Train net output #0: loss = 0.551614 (* 1 = 0.551614 loss)
I0828 15:19:24.530156  4007 solver.cpp:470] Iteration 420, lr = 0.01
I0828 15:19:24.983788  4007 solver.cpp:189] Iteration 440, loss = 0.65293
I0828 15:19:24.983824  4007 solver.cpp:204]     Train net output #0: loss = 0.65293 (* 1 = 0.65293 loss)
I0828 15:19:24.983839  4007 solver.cpp:470] Iteration 440, lr = 0.01
I0828 15:19:25.437607  4007 solver.cpp:189] Iteration 460, loss = 0.635089
I0828 15:19:25.437649  4007 solver.cpp:204]     Train net output #0: loss = 0.635089 (* 1 = 0.635089 loss)
I0828 15:19:25.437662  4007 solver.cpp:470] Iteration 460, lr = 0.01
I0828 15:19:25.891010  4007 solver.cpp:189] Iteration 480, loss = 0.625452
I0828 15:19:25.891047  4007 solver.cpp:204]     Train net output #0: loss = 0.625452 (* 1 = 0.625452 loss)
I0828 15:19:25.891062  4007 solver.cpp:470] Iteration 480, lr = 0.01
I0828 15:19:26.322219  4007 solver.cpp:266] Iteration 500, Testing net (#0)
I0828 15:19:27.156536  4007 solver.cpp:315]     Test net output #0: accuracy = 0.6753
I0828 15:19:27.156599  4007 solver.cpp:315]     Test net output #1: loss = 0.610346 (* 1 = 0.610346 loss)
I0828 15:19:27.165695  4007 solver.cpp:189] Iteration 500, loss = 0.593774
I0828 15:19:27.165726  4007 solver.cpp:204]     Train net output #0: loss = 0.593774 (* 1 = 0.593774 loss)
I0828 15:19:27.165743  4007 solver.cpp:470] Iteration 500, lr = 0.001
I0828 15:19:27.620335  4007 solver.cpp:189] Iteration 520, loss = 0.58093
I0828 15:19:27.620405  4007 solver.cpp:204]     Train net output #0: loss = 0.58093 (* 1 = 0.58093 loss)
I0828 15:19:27.620426  4007 solver.cpp:470] Iteration 520, lr = 0.001
I0828 15:19:28.074036  4007 solver.cpp:189] Iteration 540, loss = 0.568358
I0828 15:19:28.074097  4007 solver.cpp:204]     Train net output #0: loss = 0.568358 (* 1 = 0.568358 loss)
I0828 15:19:28.074111  4007 solver.cpp:470] Iteration 540, lr = 0.001
I0828 15:19:28.529259  4007 solver.cpp:189] Iteration 560, loss = 0.56597
I0828 15:19:28.529323  4007 solver.cpp:204]     Train net output #0: loss = 0.56597 (* 1 = 0.56597 loss)
I0828 15:19:28.529340  4007 solver.cpp:470] Iteration 560, lr = 0.001
I0828 15:19:28.983772  4007 solver.cpp:189] Iteration 580, loss = 0.604554
I0828 15:19:28.983836  4007 solver.cpp:204]     Train net output #0: loss = 0.604554 (* 1 = 0.604554 loss)
I0828 15:19:28.983855  4007 solver.cpp:470] Iteration 580, lr = 0.001
I0828 15:19:29.437466  4007 solver.cpp:189] Iteration 600, loss = 0.680077
I0828 15:19:29.437535  4007 solver.cpp:204]     Train net output #0: loss = 0.680077 (* 1 = 0.680077 loss)
I0828 15:19:29.437552  4007 solver.cpp:470] Iteration 600, lr = 0.001
I0828 15:19:29.892958  4007 solver.cpp:189] Iteration 620, loss = 0.687161
I0828 15:19:29.893019  4007 solver.cpp:204]     Train net output #0: loss = 0.687161 (* 1 = 0.687161 loss)
I0828 15:19:29.893034  4007 solver.cpp:470] Iteration 620, lr = 0.001
I0828 15:19:30.347157  4007 solver.cpp:189] Iteration 640, loss = 0.530204
I0828 15:19:30.347220  4007 solver.cpp:204]     Train net output #0: loss = 0.530204 (* 1 = 0.530204 loss)
I0828 15:19:30.347235  4007 solver.cpp:470] Iteration 640, lr = 0.001
I0828 15:19:30.802021  4007 solver.cpp:189] Iteration 660, loss = 0.538698
I0828 15:19:30.802083  4007 solver.cpp:204]     Train net output #0: loss = 0.538698 (* 1 = 0.538698 loss)
I0828 15:19:30.802098  4007 solver.cpp:470] Iteration 660, lr = 0.001
I0828 15:19:31.257261  4007 solver.cpp:189] Iteration 680, loss = 0.588855
I0828 15:19:31.257323  4007 solver.cpp:204]     Train net output #0: loss = 0.588855 (* 1 = 0.588855 loss)
I0828 15:19:31.257344  4007 solver.cpp:470] Iteration 680, lr = 0.001
I0828 15:19:31.712052  4007 solver.cpp:189] Iteration 700, loss = 0.582341
I0828 15:19:31.712115  4007 solver.cpp:204]     Train net output #0: loss = 0.582341 (* 1 = 0.582341 loss)
I0828 15:19:31.712177  4007 solver.cpp:470] Iteration 700, lr = 0.001
I0828 15:19:32.164929  4007 solver.cpp:189] Iteration 720, loss = 0.589264
I0828 15:19:32.164993  4007 solver.cpp:204]     Train net output #0: loss = 0.589264 (* 1 = 0.589264 loss)
I0828 15:19:32.165006  4007 solver.cpp:470] Iteration 720, lr = 0.001
I0828 15:19:32.619052  4007 solver.cpp:189] Iteration 740, loss = 0.587964
I0828 15:19:32.619112  4007 solver.cpp:204]     Train net output #0: loss = 0.587964 (* 1 = 0.587964 loss)
I0828 15:19:32.619127  4007 solver.cpp:470] Iteration 740, lr = 0.001
I0828 15:19:32.837895  4007 solver.cpp:334] Snapshotting to checkpoints/snapshot_iter_750.caffemodel
I0828 15:19:32.839469  4007 solver.cpp:342] Snapshotting solver state to checkpoints/snapshot_iter_750.solverstate
I0828 15:19:32.840242  4007 solver.cpp:266] Iteration 750, Testing net (#0)
I0828 15:19:33.657124  4007 solver.cpp:315]     Test net output #0: accuracy = 0.7072
I0828 15:19:33.657169  4007 solver.cpp:315]     Test net output #1: loss = 0.580492 (* 1 = 0.580492 loss)
I0828 15:19:33.657181  4007 solver.cpp:253] Optimization Done.
I0828 15:19:33.657187  4007 caffe.cpp:121] Optimization Done.

Q #3:

What is the classification accuracy now?

A: See Answer #3 below
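
If you would rather not scan the log by eye, the test accuracy lines can be pulled out with a short script. The following is only a sketch: the sample lines are copied from the log output above, the helper name parse_test_accuracies is made up for this illustration, and in the lab you would read the lines from wherever you saved the log.

```python
import re

# Sample lines copied from the training log above. The helper name below is
# hypothetical; in the lab you would read lines from your own saved log.
LOG_LINES = """\
I0828 15:19:13.359900  4007 solver.cpp:266] Iteration 0, Testing net (#0)
I0828 15:19:14.176769  4007 solver.cpp:315]     Test net output #0: accuracy = 0.5053
I0828 15:19:19.833181  4007 solver.cpp:266] Iteration 250, Testing net (#0)
I0828 15:19:20.663952  4007 solver.cpp:315]     Test net output #0: accuracy = 0.6406
I0828 15:19:26.322219  4007 solver.cpp:266] Iteration 500, Testing net (#0)
I0828 15:19:27.156536  4007 solver.cpp:315]     Test net output #0: accuracy = 0.6753
I0828 15:19:32.840242  4007 solver.cpp:266] Iteration 750, Testing net (#0)
I0828 15:19:33.657124  4007 solver.cpp:315]     Test net output #0: accuracy = 0.7072
""".splitlines()

ITER_RE = re.compile(r"Iteration (\d+), Testing net")
ACC_RE = re.compile(r"Test net output #\d+: accuracy = ([0-9.]+)")

def parse_test_accuracies(lines):
    """Return a list of (iteration, accuracy) pairs found in Caffe log lines."""
    results = []
    current_iter = None
    for line in lines:
        match = ITER_RE.search(line)
        if match:
            # Remember which iteration the following test outputs belong to
            current_iter = int(match.group(1))
            continue
        match = ACC_RE.search(line)
        if match and current_iter is not None:
            results.append((current_iter, float(match.group(1))))
    return results

accuracies = parse_test_accuracies(LOG_LINES)
for iteration, acc in accuracies:
    print("iteration %4d: test accuracy %.4f" % (iteration, acc))
```

The last pair in the list is the accuracy reported at the end of training.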

We have now seen two ways to modify a model in Caffe to improve classification accuracy: adding more layers or increasing the number of neurons in existing layers. There are many more ways to improve classification accuracy for you to explore, and we will cover some of them in subsequent classes.

Q #4:

What else might you do to improve classification accuracy?

A: See Answer #4 below

Task 4 - Classification

We will now learn how to deploy our final trained network to perform classification of new images. For all of the training above we used the Caffe command line interface tools. For classification we are going to use Caffe's Python interface. We will first import the Python libraries we require and create some variables specifying the locations of important files.



In [7]:

    
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
%matplotlib inline

plt.rcParams['figure.figsize'] = (6.0, 4.0)

# Make sure that caffe is on the python path:
#caffe_root = '../'  # this file is expected to be in {caffe_root}/examples
import sys
#sys.path.insert(0, caffe_root + 'python')

import caffe

# Set the right path to your model definition file, pretrained model weights,
# and the image you would like to classify.
MODEL_FILE = '/home/ubuntu/notebook/src/deploy3.prototxt'
PRETRAINED = '/home/ubuntu/notebook/checkpoints/pretrained.caffemodel'
IMAGE_FILE1 = '/home/ubuntu/data/dog_cat/dog_cat_32/test/cat_236.jpg'
IMAGE_FILE2 = '/home/ubuntu/data/dog_cat/dog_cat_32/test/dog_4987.jpg'
LABELS_FILE = '/home/ubuntu/data/dog_cat/dog_cat_32/labels.txt'
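# Read one class name per line, dropping the trailing newline characters
with open(LABELS_FILE, 'r') as f:
    labels = [line.strip() for line in f]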

Loading a network is easy. The caffe.Classifier class takes care of everything. Note the arguments for configuring input preprocessing: mean subtraction is switched on by giving a mean array, input channel swapping maps RGB into the BGR order the model expects, and raw scaling multiplies the feature scale from the input range [0,1] to the model's range [0,255].



In [8]:

    
# First we must import the mean.binaryproto mean image into a numpy array
blob = caffe.proto.caffe_pb2.BlobProto()
data = open( 'mean.binaryproto' , 'rb' ).read()
blob.ParseFromString(data)
arr = np.array( caffe.io.blobproto_to_array(blob) )
out = arr[0]



In [9]:

    
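# Load our pretrained model twice: net will hold the activations for the cat
# image and net2 those for the dog image, so we can compare them later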
net = caffe.Classifier(MODEL_FILE, PRETRAINED,
                       mean=out,
                       channel_swap=(2,1,0),
                       raw_scale=255,
                       image_dims=(32, 32))
net2 = caffe.Classifier(MODEL_FILE, PRETRAINED,
                       mean=out,
                       channel_swap=(2,1,0),
                       raw_scale=255,
                       image_dims=(32, 32))
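
To make the preprocessing arguments concrete, here is a minimal NumPy sketch of what they do to a single image. It does not call Caffe. It assumes the steps are applied in the order transpose, channel swap, raw scale, mean subtraction. The function name preprocess, the test image, and the constant mean are stand-ins invented for illustration.

```python
import numpy as np

def preprocess(image, mean_image, channel_swap=(2, 1, 0), raw_scale=255.0):
    """Mimic the preprocessing of one H x W x C float image with values in [0, 1].

    Assumed order of operations: transpose, channel swap, raw scale, mean subtraction.
    """
    chw = image.astype(np.float32).transpose(2, 0, 1)  # H x W x C -> C x H x W
    chw = chw[list(channel_swap), :, :]                # RGB -> BGR
    chw = chw * raw_scale                              # [0, 1] -> [0, 255]
    return chw - mean_image                            # subtract the mean image

# Hypothetical stand-ins: a pure-red 32x32 image and a constant mean of 100
image = np.zeros((32, 32, 3), dtype=np.float32)
image[:, :, 0] = 1.0
mean_image = np.full((3, 32, 32), 100.0, dtype=np.float32)

net_input = preprocess(image, mean_image)
print(net_input.shape)     # (3, 32, 32)
print(net_input[:, 0, 0])  # B, G, R values of the top-left pixel
```

After the channel swap the red value ends up in the last channel, so the top-left pixel becomes (-100, -100, 155) in B, G, R order.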

Let's take a look at our example images with Caffe's image loading helper. We are going to classify 2 different images, one from each category.



In [10]:

    
# Load two test images
input_image1 = caffe.io.load_image(IMAGE_FILE1)
input_image2 = caffe.io.load_image(IMAGE_FILE2)
# Display the test images
plt.subplot(1,4,1).imshow(input_image1),plt.title('Cat')
plt.subplot(1,4,2).imshow(input_image2),plt.title('Dog')









    Out[10]:





(<matplotlib.image.AxesImage at 0x7f80ceec4fd0>,
 <matplotlib.text.Text at 0x7f80cef1d710>)

Time to classify. By default, predict makes 10 predictions, cropping the center and corners of the image as well as their mirrored versions, and averages over them. This approach typically improves classification accuracy because it is more robust to object translation within the image.



In [ ]:

    
prediction1 = net.predict([input_image1]) 
prediction2 = net2.predict([input_image2])
width=0.1
plt.bar(np.arange(2),prediction1[0],width,color='blue',label='Cat')
plt.bar(np.arange(2)+width,prediction2[0],width,color='green',label='Dog')
plt.xticks(np.arange(2)+width,labels)
plt.ylabel('Class Probability')
plt.legend()

You can see which class the neural network believes each image belongs to. In the cases above, the highest probability is given to the correct class for both test images.
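
The 10-crop averaging described above can be sketched in plain NumPy. This is an illustration of the idea, not Caffe's own code: ten_crops, average_prediction and fake_predict are hypothetical names, the 40x40 random image is made up, and fake_predict is a toy stand-in for a real network.

```python
import numpy as np

def ten_crops(image, crop_h, crop_w):
    """Return 10 crops of an H x W x C image: 4 corners, the center, and mirrors."""
    h, w = image.shape[:2]
    offsets = [(top, left) for top in (0, h - crop_h) for left in (0, w - crop_w)]
    offsets.append(((h - crop_h) // 2, (w - crop_w) // 2))  # center crop
    crops = [image[t:t + crop_h, l:l + crop_w] for t, l in offsets]
    crops += [c[:, ::-1] for c in crops]  # horizontally mirrored copies
    return np.stack(crops)

def average_prediction(crops, predict_one):
    """Average the class probabilities that predict_one returns for each crop."""
    return np.mean([predict_one(c) for c in crops], axis=0)

# Hypothetical stand-ins: a random image and a toy two-class "model"
rng = np.random.RandomState(0)
image = rng.rand(40, 40, 3)

def fake_predict(crop):
    # Toy rule, not a real network: use the mean brightness as the class-0 probability
    p = float(crop.mean())
    return np.array([p, 1.0 - p])

crops = ten_crops(image, 32, 32)
probs = average_prediction(crops, fake_predict)
print(crops.shape)  # (10, 32, 32, 3)
print(probs)
```

Because every crop sees the object at a slightly different position, the averaged probabilities depend less on exactly where the object sits in the frame.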

Task 5 - Filter Visualization

This portion of the lesson follows the filter visualization example provided with Caffe and the DeCAF visualizations originally developed by Yangqing Jia.

In this task you are going to visualize the network's response to the two images classified in Task 4. In a convolutional layer of a deep neural network, the weights that connect the layer inputs to the outputs form a four-dimensional tensor. You can think of this tensor as a collection of small two-dimensional arrays with multiple channels (you could also think of each of these as a three-dimensional array). It is these arrays that are convolved with the input to the layer to produce the layer's output activations. In our final network above the first convolutional layer had 64 three-channel 5x5 weight arrays. These small arrays are often referred to as (convolutional) filters. When we convolve these filters with the layer input we obtain what are often referred to as feature maps. In this task we will treat these network activations as images, which is often useful for understanding what the network has actually learned during training.



In [ ]:

    
#View a list of the network layer outputs and their dimensions
[(k, v.data.shape) for k, v in net.blobs.items()]
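
The spatial sizes of the blobs follow from each layer's kernel size, padding and stride. The sketch below reproduces the sizes printed in the training log for src/train_val3.prototxt (32, 16, 16, 8, 8, 4). The network you load from the deploy file may use different parameters, so treat the layer list as an assumption. The two helper functions are written for this illustration and are not part of Caffe's Python interface.

```python
import math

def conv_output_size(size, kernel, pad=0, stride=1):
    """Spatial output size of a convolution layer (rounds down)."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_output_size(size, kernel, pad=0, stride=1):
    """Spatial output size of a pooling layer (rounds up, as Caffe's pooling does)."""
    return int(math.ceil((size + 2 * pad - kernel) / float(stride))) + 1

# (name, function, kernel, pad, stride) taken from the layer definitions in the log
layers = [
    ("conv1", conv_output_size, 5, 2, 1),
    ("pool1", pool_output_size, 3, 0, 2),
    ("conv2", conv_output_size, 5, 2, 1),
    ("pool2", pool_output_size, 3, 0, 2),
    ("conv3", conv_output_size, 5, 2, 1),
    ("pool3", pool_output_size, 3, 0, 2),
]

size = 32  # the input images are 32x32
trace = []
for name, fn, kernel, pad, stride in layers:
    size = fn(size, kernel, pad, stride)
    trace.append((name, size))
    print("%s: %d x %d" % (name, size, size))
```

Padding of 2 with a 5x5 kernel keeps the convolution outputs the same size as their inputs, so only the pooling layers shrink the feature maps.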

First you are going to visualize the filters of the first layer.



In [ ]:

    
# take an array of shape (n, height, width) or (n, height, width, channels)
# and visualize each (height, width) thing in a grid of size approx. sqrt(n) by sqrt(n)
def vis_square(data, padsize=1, padval=0):
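    # normalize to [0, 1] on a copy; in-place operations here would overwrite
    # the network's own weights and activations
    data = (data - data.min()) / (data.max() - data.min())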
    
    # force the number of filters to be square
    n = int(np.ceil(np.sqrt(data.shape[0])))
    padding = ((0, n ** 2 - data.shape[0]), (0, padsize), (0, padsize)) + ((0, 0),) * (data.ndim - 3)
    data = np.pad(data, padding, mode='constant', constant_values=(padval, padval))
    
    # tile the filters into an image
    data = data.reshape((n, n) + data.shape[1:]).transpose((0, 2, 1, 3) + tuple(range(4, data.ndim + 1)))
    data = data.reshape((n * data.shape[1], n * data.shape[3]) + data.shape[4:])
    
    plt.imshow(data)

# net.params maps each layer name to a list of [weights, biases]

###########################################################################################################################
#TODO: All of the weights of the first layer are plotted below. Modify the filters parameter so that you can view some of 
#the weights more closely. Try looking at the first 10 and 20 filters.  
##########################################################################################################################
plt.rcParams['figure.figsize'] = (25.0, 20.0)
filters = net.params['conv1'][0].data
vis_square(filters.transpose(0, 2, 3, 1))
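
To see what the padding and reshaping steps do to the array shapes, here is a sketch that runs the same tiling logic on random data, with no Caffe or plotting required. The function tile_grid is a hypothetical variant that returns the tiled array instead of displaying it. The weight array is random and its shape (32, 3, 5, 5) is only an assumed example.

```python
import numpy as np

def tile_grid(data, padsize=1, padval=0.0):
    """Tile (n, h, w) or (n, h, w, c) arrays into one roughly square image."""
    # Normalize to [0, 1] on a copy so the caller's array is left unchanged
    data = (data - data.min()) / (data.max() - data.min())
    # Pad the number of tiles up to the next square number n * n, and add a
    # border of padsize pixels on the bottom and right of every tile
    n = int(np.ceil(np.sqrt(data.shape[0])))
    padding = (((0, n ** 2 - data.shape[0]), (0, padsize), (0, padsize))
               + ((0, 0),) * (data.ndim - 3))
    data = np.pad(data, padding, mode='constant', constant_values=padval)
    # Arrange the tiles as an n x n grid, then merge rows and columns
    extra_axes = tuple(range(4, data.ndim + 1))
    data = data.reshape((n, n) + data.shape[1:]).transpose((0, 2, 1, 3) + extra_axes)
    return data.reshape((n * data.shape[1], n * data.shape[3]) + data.shape[4:])

# Hypothetical weights with shape (num_filters, channels, height, width)
rng = np.random.RandomState(0)
weights = rng.randn(32, 3, 5, 5)

# Select the first 10 filters and move the channel axis last for display
first_ten = weights[:10].transpose(0, 2, 3, 1)
grid = tile_grid(first_ten)
print(grid.shape)  # (24, 24, 3)
```

Ten filters are padded up to a 4x4 grid of 6x6 tiles, which gives the 24x24 image. Slicing the first axis before tiling is one way to look at a subset of the filters more closely.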

Now you are going to view the feature maps of the two input images after they have been processed by the first convolutional layer. Feel free to modify the feat variables so that you can take a closer look at some of the feature maps. Notice the visual similarities and differences between the feature maps of these two images.



In [ ]:

    
feat = net.blobs['conv1'].data[0,:64]
plt.subplot(1,2,1),plt.title('Cat')
vis_square(feat, padval=1)
net.blobs['conv1'].data.shape
feat2 = net2.blobs['conv1'].data[0, :64]
plt.subplot(1,2,2),plt.title('Dog')
vis_square(feat2, padval=1)

Do you see many differences between the network's responses for the two different input images?

Now view the feature maps from the 2nd convolutional layer.



In [ ]:

    
feat = net.blobs['conv2'].data[0]
plt.subplot(1,2,1),plt.title('Cat')
vis_square(feat, padval=1)
feat2 = net2.blobs['conv2'].data[0]
plt.subplot(1,2,2),plt.title('Dog')
vis_square(feat2, padval=1)

Now view the feature map of the last convolutional layer and then the pooled version.



In [ ]:

    
feat = net.blobs['conv3'].data[0]
plt.subplot(1,2,1),plt.title('Cat')
vis_square(feat, padval=0.5)
feat2 = net2.blobs['conv3'].data[0]
plt.subplot(1,2,2),plt.title('Dog')
vis_square(feat2, padval=0.5)



In [ ]:

    
feat = net.blobs['pool3'].data[0,:100]
plt.subplot(1,2,1),plt.title('Cat')
vis_square(feat, padval=1)
feat2 = net2.blobs['pool3'].data[0,:100]
plt.subplot(1,2,2),plt.title('Dog')
vis_square(feat2, padval=1)

Now view the neuron activations for the fully-connected layer ip2. You will notice that the neurons being activated by the two input images are very different. This is good as it means the network is effectively differentiating the two images at the higher layers in the network.



In [ ]:

    
feat = net.blobs['ip2'].data[0]
plt.plot(feat.flat,label='Cat')
plt.legend()
plt.show()
##########################################################################################################################
# Plot ip2 for the dog image and compare the differences
feat2 = net2.blobs['ip2'].data[0]
plt.plot(feat2.flat, label='Dog')
plt.legend()

Task 6 - Classifying Many Images

A text file listing 20 images stored on this host machine is provided. Executing the cell below classifies all of these images with the network you trained above and calculates the mean accuracy.



In [ ]:

    
# Each line of test.txt holds an image file name followed by its true class index
VAL_DIR = '/home/ubuntu/data/dog_cat/dog_cat_32/val/'
with open(VAL_DIR + 'test.txt', 'r') as TEST_FILE:
    TEST_IMAGES = TEST_FILE.readlines()
PredictScore = np.zeros((len(TEST_IMAGES), 1))
for i in range(len(TEST_IMAGES)):
    IMAGE_FILE = VAL_DIR + TEST_IMAGES[i].split()[0]
    CATEGORY = int(TEST_IMAGES[i].split()[1])
    input_test = caffe.io.load_image(IMAGE_FILE)
    prediction = net2.predict([input_test])
    # The predicted class is the index of the highest class probability
    predicted_class = prediction[0].argmax()
    if predicted_class == CATEGORY:
        print 'CORRECT -- predicted class for ', str(IMAGE_FILE[62:]), ':', predicted_class, 'true class:', CATEGORY
    else:
        print 'WRONG -- predicted class for ', str(IMAGE_FILE[62:]), ':', predicted_class, 'true class:', CATEGORY
    PredictScore[i] = int(predicted_class == CATEGORY)
Accuracy = np.sum(PredictScore) / len(PredictScore)
print 'Prediction accuracy with this image set is', Accuracy
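
A single accuracy number hides which class the mistakes come from. Once the predicted and true class indices are collected in arrays, the accuracy and a small confusion matrix can be computed as sketched below. The label arrays are invented for illustration, and which index means cat or dog depends on labels.txt.

```python
import numpy as np

# Hypothetical class indices for eight images; these are not lab results
true_labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
predicted = np.array([0, 1, 0, 0, 1, 1, 0, 1])

# Overall accuracy: the fraction of images whose prediction matches the label
accuracy = np.mean(predicted == true_labels)

# Confusion matrix: rows are true classes, columns are predicted classes
confusion = np.zeros((2, 2), dtype=int)
for t, p in zip(true_labels, predicted):
    confusion[t, p] += 1

# Per-class accuracy: correct predictions divided by the images in each class
per_class = confusion.diagonal() / confusion.sum(axis=1).astype(float)

print("accuracy: %.2f" % accuracy)
print(confusion)
print(per_class)
```

With these made-up labels each class has one mistake out of four images, so both the overall and the per-class accuracy come out at 0.75.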

Post-Lab Summary

If you would like to download this lab for later viewing, it is recommended that you go to your browser's File menu (not the Jupyter notebook File menu) and save the complete web page. This will ensure the images are copied down as well.

More information

For more information on using Caffe, visit http://caffe.berkeleyvision.org/. A description of the framework, instructions for using it, and plenty of examples similar to this lesson are posted there.

To learn more about these other topics, please visit:

GPU accelerated machine learning: http://www.nvidia.com/object/machine-learning.html
Theano: http://deeplearning.net/software/theano/
Torch: http://torch.ch/
DIGITS: https://developer.nvidia.com/digits
cuDNN: https://developer.nvidia.com/cudnn

Deep Learning Lab Series

Make sure to check out the rest of the classes in this Deep Learning lab series. You can find them here.

Lab Answers

Answer #1

About 60% after 250 iterations, 65% after 500, and 69% after 750

Return to question

Answer #2

You should see an improvement to over 70%

Return to question

Answer #3

Over 73%

Return to question

Answer #4

Increase network size, change activation functions, modify training algorithm, data augmentation

Return to question