Lab 5.3 - CNN for cats and dogs

Now that we have imported our custom image data, formatted them as proper feature and target numpy arrays, and split them between individual training and test data sets, we can use Keras to create another Convolutional Neural Network (CNN) and train it to classify images of cats and dogs (the holy grail of Arificial Intelligence!)

First, let's use the pickle library to bring in the data sets we generated in the previous part of the lab:


In [ ]:
import pickle

pickle_file = '-catsdogs.pickle'

with open(pickle_file, 'rb') as f:
    save = pickle.load(f)
    X_train = save['X_train']
    y_train = save['y_train']
    X_test = save['X_test']
    y_test = save['y_test']
    del save  # hint to help gc free up memory
    print('Training set', X_train.shape, y_train.shape)
    print('Test set', X_test.shape, y_test.shape)

Now that the data is imported, go through and implement the CNN from scratch based on the one developed in Lab 5.1.

Experiment with different hyper-parameters as well as different architectures for your network. If you're not getting the results you want try a deeper network by adding more convolutional or fully connected layers. Remember that with CNN's, all convolutional layers should go in the beginning, and the fully connected layers should go at the end. You can also try to make the network 'wider' by adding more depth to each convolutional layer or more neurons to the fully connected layers. If you are noticing problems with over-fitting you can experiment with larger dropout rates or other regularlization strategies.

You can also experiment with filters of a larger size in the convolutional layers. Larger filters will capture more information in the image at the expense of longer training times. For more information about the tradeoffs between depth and width in a CNN, you can read this paper:

https://arxiv.org/pdf/1409.1556.pdf

Known as the 'VGG paper', this research is currently one of the state-of-the-art benchmarks for image recognition using CNN's. The authors' hypothesis for the paper was that depth in a CNN (total number of layers) is much more important than the size of the filters or the depth within each convolutional layer. Thus they used very small filter sizes (only 3x3) but focused on making the networks as deep as possible. If you are still getting poor results and want to develop a deeper network, a good place to start would be to try to implement one of the networks from the 'VGG paper'. The deepest ones will probably take too long to train without having a dedicated graphics card, but you should be able to train one of the medium ones (for example network 'B') using just the virtual machine developed in the first lab.

Just like when we initially loaded the data, with large networks you again run the risk of overloading your RAM memory, which will either throw an error during model compilation or training, or cause your Python kernel to crash. If you run into these issues, try reducing the complexity of your network (either using less layers, or reducing the depth of each layer) or using a smaller mini-batch size. If you are using the virtual machine, your RAM will be quite limited so you will not be able to train very deep or complex networks. This is ok for the demonstration purposes of the class, but for your own work you may want to use a native installation of Python and the related libraries so that you can use the full potential of your computer.

Ofcourse classifying dogs and cats is a much more difficult problem than digit classification, so you should not expect to reach the same level of performance we did before. With an average sized network training over night on the virtual machine, you should be able to get at least 80% accuracy on the test dataset. Once you get a result you like, submit your work on this file as a pull request back to the main project.


In [ ]:
## implement your CNN starting here.