In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
%matplotlib inline
%pylab inline


Populating the interactive namespace from numpy and matplotlib

In [3]:
import matplotlib.pylab as plt
import numpy as np

In [4]:
from distutils.version import StrictVersion

In [5]:
import sklearn
print(sklearn.__version__)

assert StrictVersion(sklearn.__version__ ) >= StrictVersion('0.18.1')


0.18.1

In [6]:
import tensorflow as tf
tf.logging.set_verbosity(tf.logging.ERROR)
print(tf.__version__)

assert StrictVersion(tf.__version__) >= StrictVersion('1.1.0')


1.2.1

In [7]:
import keras
print(keras.__version__)

assert StrictVersion(keras.__version__) >= StrictVersion('2.0.0')


Using TensorFlow backend.
2.0.6

This script goes along the blog post "Building powerful image classification models using very little data" from blog.keras.io. It uses data that can be downloaded at: https://www.kaggle.com/c/dogs-vs-cats/data In our setup, we:

  • created a data/ folder
  • created train/ and validation/ subfolders inside data/
  • created cats/ and dogs/ subfolders inside train/ and validation/
  • put the cat pictures index 0-999 in data/train/cats
  • put the cat pictures index 1000-1400 in data/validation/cats
  • put the dogs pictures index 12500-13499 in data/train/dogs
  • put the dog pictures index 13500-13900 in data/validation/dogs So that we have 1000 training examples for each class, and 400 validation examples for each class. In summary, this is our directory structure:
    data/
      train/
          dogs/
              dog001.jpg
              dog002.jpg
              ...
          cats/
              cat001.jpg
              cat002.jpg
              ...
      validation/
          dogs/
              dog001.jpg
              dog002.jpg
              ...
          cats/
              cat001.jpg
              cat002.jpg
              ...

In [8]:
!ls -lh data


total 8.0K
drwxrwxr-x 4 ubuntu ubuntu 4.0K Aug 31 18:52 train
drwxrwxr-x 4 ubuntu ubuntu 4.0K Aug 31 18:52 validation

In [9]:
import numpy as np
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dropout, Flatten, Dense
from keras import applications

# dimensions of our images.
img_width, img_height = 150, 150

train_data_dir = 'data/train'
validation_data_dir = 'data/validation'
nb_train_samples = 2000
nb_validation_samples = 800
epochs = 50
batch_size = 16

In [10]:
# build the VGG16 network
model = applications.VGG16(include_top=False, weights='imagenet')


Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
57409536/58889256 [============================>.] - ETA: 0s

In [17]:
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, None, None, 3)     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, None, None, 64)    1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, None, None, 64)    36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, None, None, 64)    0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, None, None, 128)   73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, None, None, 128)   147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, None, None, 128)   0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, None, None, 256)   295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, None, None, 256)   0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, None, None, 512)   1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, None, None, 512)   0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, None, None, 512)   0         
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________

First we use all convolutional layers of VGG16 to create its bottleneck features on our data


In [11]:
# just for rescaling
datagen = ImageDataGenerator(rescale=1. / 255)

In [12]:
train_data_generator = datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode=None,
    shuffle=False)


Found 2000 images belonging to 2 classes.

In [13]:
bottleneck_features_train = model.predict_generator(
    train_data_generator, nb_train_samples // batch_size)

In [15]:
# 2000 images, 512 bottleneck features, 4*4 in size
bottleneck_features_train.shape


Out[15]:
(2000, 4, 4, 512)

In [18]:
np.save(open('bottleneck_features_train.npy', 'wb'),
        bottleneck_features_train)

In [19]:
# same for validation
validation_data_generator = datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode=None,
    shuffle=False)
bottleneck_features_validation = model.predict_generator(
    validation_data_generator, nb_validation_samples // batch_size)
np.save(open('bottleneck_features_validation.npy', 'wb'),
        bottleneck_features_validation)


Found 802 images belonging to 2 classes.

In [ ]: