Retrain a CNN, part 2.1: creating bottleneck features

https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html



In [1]:

    
import warnings
warnings.filterwarnings('ignore')



In [2]:

    
%matplotlib inline
%pylab inline









    



Populating the interactive namespace from numpy and matplotlib



In [3]:

    
import matplotlib.pyplot as plt
import numpy as np



In [4]:

    
from distutils.version import StrictVersion



In [5]:

    
import sklearn
print(sklearn.__version__)

assert StrictVersion(sklearn.__version__ ) >= StrictVersion('0.18.1')



In [6]:

    
import tensorflow as tf
tf.logging.set_verbosity(tf.logging.ERROR)
print(tf.__version__)

assert StrictVersion(tf.__version__) >= StrictVersion('1.1.0')



In [7]:

    
import keras
print(keras.__version__)

assert StrictVersion(keras.__version__) >= StrictVersion('2.0.0')









    



Using TensorFlow backend.






    



2.0.6

This notebook accompanies the blog post "Building powerful image classification models using very little data" from blog.keras.io. It uses data that can be downloaded at: https://www.kaggle.com/c/dogs-vs-cats/data

In our setup, we:

created a data/ folder
created train/ and validation/ subfolders inside data/
created cats/ and dogs/ subfolders inside train/ and validation/
put the cat pictures index 0-999 in data/train/cats
put the cat pictures index 1000-1400 in data/validation/cats
put the dog pictures index 12500-13499 in data/train/dogs
put the dog pictures index 13500-13900 in data/validation/dogs

This gives us 1000 training examples for each class and roughly 400 validation examples for each class. The validation index ranges are inclusive, so each validation folder actually holds 401 pictures, which is why 802 validation images are found further down. In summary, this is our directory structure:

data/
  train/
      dogs/
          dog001.jpg
          dog002.jpg
          ...
      cats/
          cat001.jpg
          cat002.jpg
          ...
  validation/
      dogs/
          dog001.jpg
          dog002.jpg
          ...
      cats/
          cat001.jpg
          cat002.jpg
          ...
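
Before running the rest of the notebook, it can help to check that the folders really contain the expected number of pictures. The following is a minimal sketch that only assumes the data/split/class layout shown above; the helper name `count_images` and the list of accepted extensions are illustrative choices, not part of the blog post.

```python
import os

# Assumption: pictures are stored with one of these extensions.
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')


def count_images(root):
    """Return {split: {class_name: image_count}} for a root/split/class layout."""
    counts = {}
    for split in sorted(os.listdir(root)):
        split_dir = os.path.join(root, split)
        if not os.path.isdir(split_dir):
            continue
        counts[split] = {}
        for class_name in sorted(os.listdir(split_dir)):
            class_dir = os.path.join(split_dir, class_name)
            if not os.path.isdir(class_dir):
                continue
            # Count only files that look like images; other files are ignored.
            counts[split][class_name] = sum(
                1 for name in os.listdir(class_dir)
                if name.lower().endswith(IMAGE_EXTENSIONS))
    return counts

# Example (assumes the data/ folder described above exists):
# print(count_images('data'))
```

Comparing the result against the intended counts per class is a quick way to catch off-by-one mistakes in the index ranges used when copying files.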



In [8]:

    
!ls -lh data









    



total 8.0K
drwxrwxr-x 4 ubuntu ubuntu 4.0K Aug 31 18:52 train
drwxrwxr-x 4 ubuntu ubuntu 4.0K Aug 31 18:52 validation



In [9]:

    
import numpy as np
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dropout, Flatten, Dense
from keras import applications

# dimensions of our images.
img_width, img_height = 150, 150

train_data_dir = 'data/train'
validation_data_dir = 'data/validation'
nb_train_samples = 2000
nb_validation_samples = 800
epochs = 50
batch_size = 16



In [10]:

    
# build the VGG16 network
model = applications.VGG16(include_top=False, weights='imagenet')









    



Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
57409536/58889256 [============================>.] - ETA: 0s



In [17]:

    
model.summary()









    



_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, None, None, 3)     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, None, None, 64)    1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, None, None, 64)    36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, None, None, 64)    0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, None, None, 128)   73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, None, None, 128)   147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, None, None, 128)   0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, None, None, 256)   295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, None, None, 256)   0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, None, None, 512)   1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, None, None, 512)   0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, None, None, 512)   0         
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________
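
The parameter counts in the summary can be reproduced by hand: a 3x3 convolution has 3 * 3 * input_channels weights plus one bias per filter, and pooling layers have no parameters. The sketch below recomputes the numbers from the channel sizes listed in the summary; the helper names are illustrative only.

```python
def conv2d_params(in_channels, out_channels, kernel=3):
    # kernel * kernel * in_channels weights per filter, plus one bias per filter
    return (kernel * kernel * in_channels + 1) * out_channels


# Output channels of the 13 convolutional layers, in the order of the summary.
VGG16_CONV_CHANNELS = [64, 64, 128, 128, 256, 256, 256,
                       512, 512, 512, 512, 512, 512]


def vgg16_conv_param_counts(input_channels=3):
    counts = []
    previous = input_channels
    for channels in VGG16_CONV_CHANNELS:
        counts.append(conv2d_params(previous, channels))
        previous = channels
    return counts


counts = vgg16_conv_param_counts()
print(counts[0], counts[1], sum(counts))  # 1792 36928 14714688
```

The total matches the 14,714,688 parameters reported for the network without its top layers.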

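First, we run our images through all convolutional layers of VGG16 (the network without its fully connected top) and keep the output of the last pooling layer as the bottleneck features for our data.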



In [11]:

    
# just for rescaling
datagen = ImageDataGenerator(rescale=1. / 255)



In [12]:

    
train_data_generator = datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode=None,
    shuffle=False)









    



Found 2000 images belonging to 2 classes.



In [13]:

    
bottleneck_features_train = model.predict_generator(
    train_data_generator, nb_train_samples // batch_size)



In [15]:

    
# 2000 images, each described by a 4x4 feature map with 512 channels
bottleneck_features_train.shape









    Out[15]:





(2000, 4, 4, 512)
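
The 4x4 spatial size follows from the 150x150 input: the convolutions in VGG16 keep the spatial size, while each of the five 2x2 max-pooling layers halves it and rounds down. A small sketch of that arithmetic, with an illustrative helper name and under exactly those assumptions:

```python
def vgg16_bottleneck_shape(height, width, n_pools=5, channels=512):
    # Each 2x2 pooling step with stride 2 halves the size, rounding down.
    for _ in range(n_pools):
        height //= 2
        width //= 2
    return (height, width, channels)


print(vgg16_bottleneck_shape(150, 150))  # (4, 4, 512)
```

The size shrinks as 150, 75, 37, 18, 9, 4, so each image ends up as 4 * 4 * 512 = 8192 numbers.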



In [18]:

    
# passing the file name lets NumPy open and close the file itself
np.save('bottleneck_features_train.npy',
        bottleneck_features_train)



In [19]:

    
# same for validation
validation_data_generator = datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode=None,
    shuffle=False)
# note: 800 // 16 = 50 steps cover 800 images; the generator finds 802,
# so the last 2 images are not processed
bottleneck_features_validation = model.predict_generator(
    validation_data_generator, nb_validation_samples // batch_size)
np.save('bottleneck_features_validation.npy',
        bottleneck_features_validation)









    



Found 802 images belonging to 2 classes.
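
The generator reports 802 images, while nb_validation_samples // batch_size requests only 50 batches of 16. The sketch below works out what that means for the saved features and for any label array built later. It assumes 401 images per class (from the inclusive index ranges in the setup) and that, with shuffle=False, the generator yields the class folders in alphabetical order (cats, then dogs); the helper names are illustrative.

```python
def samples_covered(n_samples, batch_size, steps):
    # A generator pass of `steps` batches sees at most this many samples.
    return min(n_samples, batch_size * steps)


def steps_to_cover_all(n_samples, batch_size):
    # Ceiling division: the last batch may be smaller than batch_size.
    return -(-n_samples // batch_size)


def labels_in_generator_order(class_counts, n_kept):
    # Class 0 first, then class 1, ..., truncated to the samples actually kept.
    labels = []
    for class_index, count in enumerate(class_counts):
        labels.extend([class_index] * count)
    return labels[:n_kept]


kept = samples_covered(802, 16, 800 // 16)
labels = labels_in_generator_order([401, 401], kept)
print(kept, labels.count(0), labels.count(1))  # 800 401 399
print(steps_to_cover_all(802, 16))  # 51
```

Under these assumptions the saved validation features hold 401 cats followed by 399 dogs, so a label array that assumes an even 400/400 split would mislabel one image. Either building the labels from the real counts or using exactly 400 pictures per class avoids this.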


