Project Workflow

  1. Set up data folders in the correct pattern
  2. Randomly split the images into training, validation, and test sets
  3. Image preprocessing and transformation
    • a. Images from each class will be transformed
      • i. Flipped on horizontal axis
      • ii. Flipped on vertical axis
      • iii. Rotated
      • iv. Horizontal shifts
      • v. Vertical shifts
      • vi. Shear augmentation
    • b. Images will be downsampled to 250x250 pixel images
    • c. Pixel value normalization
  4. Design initial convolutional neural network architecture
  5. Train and validate
    • a. Repeat and modify the CNN architecture, attempting to get above 90% accuracy
  6. Test
  7. Report final results

In [3]:
import tensorflow as tf
import matplotlib.pyplot as plt
import os
import numpy as np
from IPython.display import display, Image
from scipy.ndimage import imread
import shutil
import sys
import random
import time
import pickle
from keras.preprocessing.image import ImageDataGenerator
from keras.preprocessing.image import array_to_img, img_to_array, load_img
from keras.models import Sequential, Model
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras import backend as K
from keras.callbacks import ModelCheckpoint  
from keras import applications
from keras.utils.np_utils import to_categorical
from keras.models import model_from_json
from glob import glob
from sklearn.metrics import confusion_matrix
import itertools
#if K.image_data_format() == 'channels_first':
#    input_shape = (3, img_width, img_height)
#else:
#    input_shape = (img_width, img_height, 3)


Using TensorFlow backend.

1. Set up data folders


In [39]:
# Create a data folder
os.mkdir('data')
os.mkdir('data/train')
os.mkdir('data/valid')
os.mkdir('data/test')

In [40]:
# Remove windows thumbnail files
for folder in os.listdir('FIDS30'):
    for img in os.listdir('FIDS30/'+folder):
        if 'Thumbs.db' == img:
            os.remove('FIDS30/'+folder+'/'+img)

2. Split into training/validation/test sets


In [41]:
# loop through all the image files
filenames = []
for folder in os.listdir('FIDS30'):
    os.mkdir('data/train/'+folder)
    os.mkdir('data/valid/'+folder)
    os.mkdir('data/test/'+folder)
    files = os.listdir('FIDS30/'+folder)
    # shuffle data prior to splitting into train/validation/test sets
    random.shuffle(files)
    # 60% used as training data
    train_ix = int(len(files)*0.6)
    # 20% used as validation set
    valid_ix = int(len(files)*0.8)
    # assign the file paths to training, validation, or test data folders
    train = files[:train_ix]
    valid = files[train_ix:valid_ix]
    test = files[valid_ix:]
    for i in train:
        shutil.copy('FIDS30/'+folder+'/'+i,'data/train/'+folder)
    for j in valid:
        shutil.copy('FIDS30/'+folder+'/'+j,'data/valid/'+folder)
    for k in test:
        shutil.copy('FIDS30/'+folder+'/'+k,'data/test/'+folder)
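
As a quick sanity check (not part of the original workflow), the copied files in each split directory can be counted to confirm the roughly 60/20/20 split. This minimal sketch only reuses the os module imported above.

In [ ]:
# Optional sanity check: count the copied files in each split directory
for split in ['train', 'valid', 'test']:
    n = sum(len(os.listdir('data/' + split + '/' + c))
            for c in os.listdir('data/' + split))
    print(split, n)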

3. Image preprocessing and transformation


In [2]:
batch_size = 16

# this is the augmentation configuration we will use for training
train_datagen = ImageDataGenerator(
            rotation_range=40,
            width_shift_range=0.2,
            height_shift_range=0.2,
            rescale=1./255,
            shear_range=0.2,
            zoom_range=0.2,
            horizontal_flip=True,
            vertical_flip=True,
            cval=255,
            fill_mode='constant')

# this is the augmentation configuration we will use for testing:
# only rescaling
test_datagen = ImageDataGenerator(rescale=1./255)

# this is a generator that will read pictures found in 
# subfolders of 'data/train', and will indefinitely generate
# batches of augmented image data
train_generator = train_datagen.flow_from_directory(
                    'data/train', # this is the target directory
                    target_size=(250,250), # all images will be resized to 250x250
                    batch_size=batch_size,
                    class_mode='categorical') # need categorical since not using binary target

# this is a similar generator, for validation data
validation_generator = test_datagen.flow_from_directory(
                        'data/valid',
                        target_size=(250,250),
                        batch_size=batch_size,
                        class_mode='categorical')


Found 2009 images belonging to 34 classes.
Found 675 images belonging to 34 classes.

The cells below create examples of the image manipulation we will perform on the dataset, so we can see how the augmentation affects the images.


In [3]:
datagen_ex = ImageDataGenerator(
            rotation_range=40,
            width_shift_range=0.2,
            height_shift_range=0.2,
            shear_range=0.2,
            zoom_range=0.2,
            horizontal_flip=True,
            vertical_flip=True,
            cval=255,
            fill_mode='constant')

img = load_img('data/train/acerolas/5.jpg') # this is a PIL image
x = img_to_array(img) # this is a NumPy array with shape (height, width, 3)
x = x.reshape((1,) + x.shape) # add a batch dimension: (1, height, width, 3)

# the .flow() command below generates batches of randomly transformed images
# and saves the results to the `preview/` directory (which must already exist)
i = 0
for batch in datagen_ex.flow(x, batch_size=1,
                          save_to_dir='preview', save_format='jpeg'):
    i += 1
    if i > 20:
        break  # otherwise the generator would loop indefinitely
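
To actually look at a few of the saved previews, a short matplotlib snippet like the one below can be used. This is a minimal sketch: it assumes the preview/ directory now contains the jpegs written by the loop above, and it reuses the glob and load_img imports from the first cell.

In [ ]:
# Display a handful of the augmented previews written by the generator above
preview_files = sorted(glob('preview/*.jpeg'))[:6]
plt.figure(figsize=(12, 8))
for idx, fname in enumerate(preview_files):
    plt.subplot(2, 3, idx + 1)
    plt.imshow(load_img(fname))
    plt.axis('off')
plt.show()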

I chose fill_mode='constant' because using 'nearest' introduced artifacts that would not occur in real life (smearing effects). With the constant fill mode, a white background (cval=255) fills the space left blank after the augmentation, which prevents fake/artificial features from appearing in the data.
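
To illustrate the difference between the two fill modes, the sketch below applies the same random rotation twice, once with 'nearest' and once with 'constant'. It assumes this Keras version exposes ImageDataGenerator.random_transform with a seed argument (so both generators apply the same geometric transform); the sample path is the same one used above.

In [ ]:
# Compare fill modes: identical rotation, different handling of the empty corners
gen_nearest = ImageDataGenerator(rotation_range=40, fill_mode='nearest')
gen_constant = ImageDataGenerator(rotation_range=40, cval=255, fill_mode='constant')

sample = img_to_array(load_img('data/train/acerolas/5.jpg'))
plt.figure(figsize=(8, 4))
plt.subplot(1, 2, 1)
plt.imshow(gen_nearest.random_transform(sample, seed=42).astype('uint8'))
plt.title("fill_mode='nearest' (smearing)")
plt.axis('off')
plt.subplot(1, 2, 2)
plt.imshow(gen_constant.random_transform(sample, seed=42).astype('uint8'))
plt.title("fill_mode='constant' (white fill)")
plt.axis('off')
plt.show()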

Next, we can create a model architecture and start training.

4. Design initial convolutional neural network architecture


In [3]:
# dimensions of our images.
img_width, img_height = 250, 250

if K.image_data_format() == 'channels_first':
    input_shape = (3, img_width, img_height)
else:
    input_shape = (img_width, img_height, 3)

In [4]:
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

# the model so far outputs 3D feature maps (height, width, features)

model.add(Flatten())  # this converts our 3D feature maps to 1D feature vectors
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(34, activation='softmax'))

model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 248, 248, 32)      896       
_________________________________________________________________
activation_1 (Activation)    (None, 248, 248, 32)      0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 124, 124, 32)      0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 122, 122, 32)      9248      
_________________________________________________________________
activation_2 (Activation)    (None, 122, 122, 32)      0         
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 61, 61, 32)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 59, 59, 64)        18496     
_________________________________________________________________
activation_3 (Activation)    (None, 59, 59, 64)        0         
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 29, 29, 64)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 53824)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 64)                3444800   
_________________________________________________________________
activation_4 (Activation)    (None, 64)                0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 34)                2210      
=================================================================
Total params: 3,475,650
Trainable params: 3,475,650
Non-trainable params: 0
_________________________________________________________________

In [5]:
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

In [7]:
from keras.models import model_from_json
# serialize model to JSON
model_json = model.to_json()
with open("scratch_model.json", "w") as json_file:
    json_file.write(model_json)

In [5]:
# create a checkpointer to save model weights
checkpointer = ModelCheckpoint(filepath='saved_models/weights.best.from_scratch.03.hdf5', 
                               verbose=1, save_best_only=True)

# now fit the model using the fit_generator
history = model.fit_generator(
        train_generator,
        steps_per_epoch=2009 // batch_size,
        epochs=50,
        validation_data=validation_generator,
        validation_steps=675 // batch_size,
        callbacks=[checkpointer])
model.save_weights('first_try.h5') # always save your weights after training or during training


Epoch 1/50
124/125 [============================>.] - ETA: 1s - loss: 3.3944 - acc: 0.1190Epoch 00000: val_loss improved from inf to 3.05805, saving model to saved_models/weights.best.from_scratch.03.hdf5
125/125 [==============================] - 187s - loss: 3.3922 - acc: 0.1195 - val_loss: 3.0581 - val_acc: 0.2068
Epoch 2/50
124/125 [============================>.] - ETA: 1s - loss: 2.9751 - acc: 0.1723Epoch 00001: val_loss improved from 3.05805 to 2.53124, saving model to saved_models/weights.best.from_scratch.03.hdf5
125/125 [==============================] - 185s - loss: 2.9741 - acc: 0.1719 - val_loss: 2.5312 - val_acc: 0.2807
Epoch 3/50
124/125 [============================>.] - ETA: 1s - loss: 2.6385 - acc: 0.2147Epoch 00002: val_loss improved from 2.53124 to 2.21379, saving model to saved_models/weights.best.from_scratch.03.hdf5
125/125 [==============================] - 184s - loss: 2.6326 - acc: 0.2165 - val_loss: 2.2138 - val_acc: 0.2822
Epoch 4/50
124/125 [============================>.] - ETA: 1s - loss: 2.4947 - acc: 0.2307Epoch 00003: val_loss did not improve
125/125 [==============================] - 184s - loss: 2.5017 - acc: 0.2304 - val_loss: 2.2245 - val_acc: 0.2822
Epoch 5/50
124/125 [============================>.] - ETA: 1s - loss: 2.3332 - acc: 0.2680Epoch 00004: val_loss improved from 2.21379 to 2.01031, saving model to saved_models/weights.best.from_scratch.03.hdf5
125/125 [==============================] - 183s - loss: 2.3337 - acc: 0.2684 - val_loss: 2.0103 - val_acc: 0.3627
Epoch 6/50
124/125 [============================>.] - ETA: 1s - loss: 2.2778 - acc: 0.3070Epoch 00005: val_loss did not improve
125/125 [==============================] - 184s - loss: 2.2811 - acc: 0.3061 - val_loss: 2.0362 - val_acc: 0.3080
Epoch 7/50
124/125 [============================>.] - ETA: 1s - loss: 2.1737 - acc: 0.3248Epoch 00006: val_loss improved from 2.01031 to 1.89691, saving model to saved_models/weights.best.from_scratch.03.hdf5
125/125 [==============================] - 184s - loss: 2.1746 - acc: 0.3247 - val_loss: 1.8969 - val_acc: 0.3763
Epoch 8/50
124/125 [============================>.] - ETA: 1s - loss: 2.1489 - acc: 0.3229Epoch 00007: val_loss did not improve
125/125 [==============================] - 183s - loss: 2.1513 - acc: 0.3228 - val_loss: 2.0056 - val_acc: 0.3187
Epoch 9/50
124/125 [============================>.] - ETA: 1s - loss: 2.0954 - acc: 0.3437Epoch 00008: val_loss improved from 1.89691 to 1.79831, saving model to saved_models/weights.best.from_scratch.03.hdf5
125/125 [==============================] - 183s - loss: 2.0910 - acc: 0.3449 - val_loss: 1.7983 - val_acc: 0.4613
Epoch 10/50
124/125 [============================>.] - ETA: 1s - loss: 2.0516 - acc: 0.3682Epoch 00009: val_loss did not improve
125/125 [==============================] - 183s - loss: 2.0502 - acc: 0.3693 - val_loss: 2.4322 - val_acc: 0.3505
Epoch 11/50
124/125 [============================>.] - ETA: 1s - loss: 2.0389 - acc: 0.3787Epoch 00010: val_loss improved from 1.79831 to 1.72541, saving model to saved_models/weights.best.from_scratch.03.hdf5
125/125 [==============================] - 183s - loss: 2.0395 - acc: 0.3787 - val_loss: 1.7254 - val_acc: 0.4871
Epoch 12/50
124/125 [============================>.] - ETA: 1s - loss: 2.0290 - acc: 0.3931Epoch 00011: val_loss did not improve
125/125 [==============================] - 184s - loss: 2.0313 - acc: 0.3919 - val_loss: 1.7966 - val_acc: 0.4492
Epoch 13/50
124/125 [============================>.] - ETA: 1s - loss: 1.9162 - acc: 0.4097Epoch 00012: val_loss improved from 1.72541 to 1.42334, saving model to saved_models/weights.best.from_scratch.03.hdf5
125/125 [==============================] - 183s - loss: 1.9118 - acc: 0.4104 - val_loss: 1.4233 - val_acc: 0.5463
Epoch 14/50
124/125 [============================>.] - ETA: 1s - loss: 1.9730 - acc: 0.3991Epoch 00013: val_loss did not improve
125/125 [==============================] - 183s - loss: 1.9701 - acc: 0.3994 - val_loss: 1.5710 - val_acc: 0.5175
Epoch 15/50
124/125 [============================>.] - ETA: 1s - loss: 1.9236 - acc: 0.4276Epoch 00014: val_loss did not improve
125/125 [==============================] - 184s - loss: 1.9233 - acc: 0.4282 - val_loss: 1.4631 - val_acc: 0.5508
Epoch 16/50
124/125 [============================>.] - ETA: 1s - loss: 1.9538 - acc: 0.4229Epoch 00015: val_loss did not improve
125/125 [==============================] - 184s - loss: 1.9491 - acc: 0.4246 - val_loss: 1.4293 - val_acc: 0.5615
Epoch 17/50
124/125 [============================>.] - ETA: 1s - loss: 1.8979 - acc: 0.4185Epoch 00016: val_loss did not improve
125/125 [==============================] - 178s - loss: 1.9042 - acc: 0.4177 - val_loss: 1.4544 - val_acc: 0.5751
Epoch 18/50
124/125 [============================>.] - ETA: 1s - loss: 1.8377 - acc: 0.4605Epoch 00017: val_loss did not improve
125/125 [==============================] - 176s - loss: 1.8354 - acc: 0.4603 - val_loss: 1.6205 - val_acc: 0.4674
Epoch 19/50
124/125 [============================>.] - ETA: 1s - loss: 1.9292 - acc: 0.4245Epoch 00018: val_loss did not improve
125/125 [==============================] - 183s - loss: 1.9291 - acc: 0.4236 - val_loss: 1.4413 - val_acc: 0.5539
Epoch 20/50
124/125 [============================>.] - ETA: 1s - loss: 1.7758 - acc: 0.4658Epoch 00019: val_loss improved from 1.42334 to 1.37263, saving model to saved_models/weights.best.from_scratch.03.hdf5
125/125 [==============================] - 186s - loss: 1.7749 - acc: 0.4651 - val_loss: 1.3726 - val_acc: 0.5781
Epoch 21/50
124/125 [============================>.] - ETA: 1s - loss: 1.9161 - acc: 0.4457Epoch 00020: val_loss did not improve
125/125 [==============================] - 184s - loss: 1.9141 - acc: 0.4447 - val_loss: 1.8131 - val_acc: 0.5599
Epoch 22/50
124/125 [============================>.] - ETA: 1s - loss: 1.8515 - acc: 0.4602Epoch 00021: val_loss did not improve
125/125 [==============================] - 184s - loss: 1.8585 - acc: 0.4581 - val_loss: 1.9426 - val_acc: 0.4294
Epoch 23/50
124/125 [============================>.] - ETA: 1s - loss: 1.9162 - acc: 0.4428Epoch 00022: val_loss did not improve
125/125 [==============================] - 184s - loss: 1.9204 - acc: 0.4422 - val_loss: 1.6746 - val_acc: 0.5296
Epoch 24/50
124/125 [============================>.] - ETA: 1s - loss: 1.9112 - acc: 0.4447Epoch 00023: val_loss improved from 1.37263 to 1.33787, saving model to saved_models/weights.best.from_scratch.03.hdf5
125/125 [==============================] - 185s - loss: 1.9067 - acc: 0.4452 - val_loss: 1.3379 - val_acc: 0.5918
Epoch 25/50
124/125 [============================>.] - ETA: 1s - loss: 1.8545 - acc: 0.4682Epoch 00024: val_loss did not improve
125/125 [==============================] - 184s - loss: 1.8575 - acc: 0.4679 - val_loss: 1.4878 - val_acc: 0.5964
Epoch 26/50
124/125 [============================>.] - ETA: 1s - loss: 1.9486 - acc: 0.4421Epoch 00025: val_loss did not improve
125/125 [==============================] - 184s - loss: 1.9437 - acc: 0.4431 - val_loss: 1.9442 - val_acc: 0.4674
Epoch 27/50
124/125 [============================>.] - ETA: 1s - loss: 1.9894 - acc: 0.4441Epoch 00026: val_loss did not improve
125/125 [==============================] - 184s - loss: 1.9939 - acc: 0.4441 - val_loss: 1.6198 - val_acc: 0.5129
Epoch 28/50
124/125 [============================>.] - ETA: 1s - loss: 1.9973 - acc: 0.4350Epoch 00027: val_loss did not improve
125/125 [==============================] - 186s - loss: 1.9964 - acc: 0.4346 - val_loss: 1.6634 - val_acc: 0.4628
Epoch 29/50
124/125 [============================>.] - ETA: 1s - loss: 1.9245 - acc: 0.4557Epoch 00028: val_loss did not improve
125/125 [==============================] - 183s - loss: 1.9295 - acc: 0.4541 - val_loss: 2.8823 - val_acc: 0.3703
Epoch 30/50
124/125 [============================>.] - ETA: 1s - loss: 2.0743 - acc: 0.4274Epoch 00029: val_loss did not improve
125/125 [==============================] - 183s - loss: 2.0688 - acc: 0.4284 - val_loss: 1.8253 - val_acc: 0.5797
Epoch 31/50
124/125 [============================>.] - ETA: 1s - loss: 2.1011 - acc: 0.4310Epoch 00030: val_loss did not improve
125/125 [==============================] - 183s - loss: 2.1079 - acc: 0.4316 - val_loss: 2.6486 - val_acc: 0.3126
Epoch 32/50
124/125 [============================>.] - ETA: 1s - loss: 2.1181 - acc: 0.4328Epoch 00031: val_loss did not improve
125/125 [==============================] - 182s - loss: 2.1103 - acc: 0.4338 - val_loss: 1.5224 - val_acc: 0.5220
Epoch 33/50
124/125 [============================>.] - ETA: 1s - loss: 2.0362 - acc: 0.4310Epoch 00032: val_loss did not improve
125/125 [==============================] - 181s - loss: 2.0384 - acc: 0.4301 - val_loss: 1.7743 - val_acc: 0.5008
Epoch 34/50
124/125 [============================>.] - ETA: 1s - loss: 2.1294 - acc: 0.4212Epoch 00033: val_loss did not improve
125/125 [==============================] - 182s - loss: 2.1285 - acc: 0.4198 - val_loss: 2.3976 - val_acc: 0.5114
Epoch 35/50
124/125 [============================>.] - ETA: 1s - loss: 2.2049 - acc: 0.4213Epoch 00034: val_loss did not improve
125/125 [==============================] - 181s - loss: 2.2034 - acc: 0.4204 - val_loss: 2.6232 - val_acc: 0.4219
Epoch 36/50
124/125 [============================>.] - ETA: 1s - loss: 2.1967 - acc: 0.3976Epoch 00035: val_loss did not improve
125/125 [==============================] - 182s - loss: 2.1976 - acc: 0.3969 - val_loss: 1.5941 - val_acc: 0.4962
Epoch 37/50
124/125 [============================>.] - ETA: 1s - loss: 2.2565 - acc: 0.3879Epoch 00036: val_loss did not improve
125/125 [==============================] - 181s - loss: 2.2556 - acc: 0.3868 - val_loss: 1.6787 - val_acc: 0.4886
Epoch 38/50
124/125 [============================>.] - ETA: 1s - loss: 2.1262 - acc: 0.4221Epoch 00037: val_loss did not improve
125/125 [==============================] - 182s - loss: 2.1288 - acc: 0.4217 - val_loss: 1.6278 - val_acc: 0.5933
Epoch 39/50
124/125 [============================>.] - ETA: 1s - loss: 2.3332 - acc: 0.3861Epoch 00038: val_loss did not improve
125/125 [==============================] - 181s - loss: 2.3323 - acc: 0.3861 - val_loss: 2.3374 - val_acc: 0.4901
Epoch 40/50
124/125 [============================>.] - ETA: 1s - loss: 2.3259 - acc: 0.3828Epoch 00039: val_loss did not improve
125/125 [==============================] - 182s - loss: 2.3252 - acc: 0.3833 - val_loss: 1.8090 - val_acc: 0.4886
Epoch 41/50
124/125 [============================>.] - ETA: 1s - loss: 2.2341 - acc: 0.3971Epoch 00040: val_loss did not improve
125/125 [==============================] - 182s - loss: 2.2423 - acc: 0.3954 - val_loss: 1.7881 - val_acc: 0.4977
Epoch 42/50
124/125 [============================>.] - ETA: 1s - loss: 2.4882 - acc: 0.3413Epoch 00041: val_loss did not improve
125/125 [==============================] - 182s - loss: 2.4934 - acc: 0.3411 - val_loss: 2.7081 - val_acc: 0.2124
Epoch 43/50
124/125 [============================>.] - ETA: 1s - loss: 2.2246 - acc: 0.3841Epoch 00042: val_loss did not improve
125/125 [==============================] - 183s - loss: 2.2189 - acc: 0.3861 - val_loss: 1.6722 - val_acc: 0.5964
Epoch 44/50
124/125 [============================>.] - ETA: 1s - loss: 2.4394 - acc: 0.3785Epoch 00043: val_loss did not improve
125/125 [==============================] - 181s - loss: 2.4354 - acc: 0.3799 - val_loss: 1.8023 - val_acc: 0.5175
Epoch 45/50
124/125 [============================>.] - ETA: 1s - loss: 2.3372 - acc: 0.4091Epoch 00044: val_loss did not improve
125/125 [==============================] - 179s - loss: 2.3374 - acc: 0.4088 - val_loss: 1.9568 - val_acc: 0.4234
Epoch 46/50
124/125 [============================>.] - ETA: 1s - loss: 2.2906 - acc: 0.4169Epoch 00045: val_loss did not improve
125/125 [==============================] - 175s - loss: 2.2929 - acc: 0.4166 - val_loss: 1.7367 - val_acc: 0.5599
Epoch 47/50
124/125 [============================>.] - ETA: 1s - loss: 2.3266 - acc: 0.4165Epoch 00046: val_loss did not improve
125/125 [==============================] - 175s - loss: 2.3234 - acc: 0.4162 - val_loss: 2.1314 - val_acc: 0.5235
Epoch 48/50
124/125 [============================>.] - ETA: 1s - loss: 2.5197 - acc: 0.3836Epoch 00047: val_loss did not improve
125/125 [==============================] - 175s - loss: 2.5184 - acc: 0.3816 - val_loss: 2.5386 - val_acc: 0.3263
Epoch 49/50
124/125 [============================>.] - ETA: 1s - loss: 2.5827 - acc: 0.3501Epoch 00048: val_loss did not improve
125/125 [==============================] - 175s - loss: 2.5781 - acc: 0.3498 - val_loss: 1.9273 - val_acc: 0.4401
Epoch 50/50
124/125 [============================>.] - ETA: 1s - loss: 2.5169 - acc: 0.3587Epoch 00049: val_loss did not improve
125/125 [==============================] - 175s - loss: 2.5142 - acc: 0.3588 - val_loss: 1.8432 - val_acc: 0.4568

In [6]:
# summarize history for accuracy
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
# val_loss: 1.1858 - val_acc: 0.641 trial 1
# val_loss: 1.2730 - val_acc: 0.6464 trial 2
# val_loss: 1.3379 - val_acc: 0.5918 trial 3


The accuracy plateaus after about 15-20 epochs, and the loss plateaus after about 12-15 epochs.

Next, let's try using class weights, because the class sizes are unbalanced.
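
As a cross-check (not part of the pipeline below), scikit-learn offers a 'balanced' class-weight heuristic that normalizes by mean class frequency rather than by the largest class, so its values will differ somewhat from the manual ratios computed in the next cell. A minimal sketch, assuming scikit-learn's compute_class_weight is available:

In [ ]:
# Alternative class weights via scikit-learn's 'balanced' heuristic (comparison only)
from sklearn.utils.class_weight import compute_class_weight

classes = sorted(os.listdir('data/train'))
y = np.concatenate([[i] * len(os.listdir('data/train/' + c))
                    for i, c in enumerate(classes)])
balanced = compute_class_weight('balanced', classes=np.arange(len(classes)), y=y)
print(dict(zip(range(len(classes)), np.round(balanced, 2))))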


In [3]:
# create a dict to hold the class weights
class_weight= dict()
# loop through the folders containing the images, use the folder name as the key
for folder in os.listdir('data/train'):
    class_weight[folder] = len(os.listdir('data/train/'+folder))
# normalize the class weight relative to the class with the most examples
max_class = max(class_weight.values())
for key in class_weight.keys():
    class_weight[key] = max_class/class_weight[key]
# encode the dictionary keys as numeric labels; flow_from_directory assigns
# class indices in sorted (alphanumeric) folder-name order, so sorting the keys
# keeps the weights aligned with the generator's labels
class_weights = dict()
for i in range(0,34): # classes are 0-indexed
    class_weights[i] = class_weight[sorted(class_weight.keys())[i]]
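
Since these numeric keys have to line up with the integer labels Keras assigned to the class subfolders, a quick spot-check against the generator's mapping is worth doing. This assumes train_generator from step 3 is still in scope; its class_indices attribute maps folder name to integer label.

In [ ]:
# Spot-check that the weight keys match the generator's class indices
for name, idx in sorted(train_generator.class_indices.items(), key=lambda kv: kv[1])[:5]:
    print(idx, name, round(class_weights[idx], 2))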

Use the same model architecture, this time passing the class_weight parameter when fitting.


In [4]:
# dimensions of our images.
img_width, img_height = 250, 250

if K.image_data_format() == 'channels_first':
    input_shape = (3, img_width, img_height)
else:
    input_shape = (img_width, img_height, 3)
    
model2 = Sequential()
model2.add(Conv2D(32, (3, 3), input_shape=input_shape))
model2.add(Activation('relu'))
model2.add(MaxPooling2D(pool_size=(2, 2)))

model2.add(Conv2D(32, (3, 3)))
model2.add(Activation('relu'))
model2.add(MaxPooling2D(pool_size=(2, 2)))

model2.add(Conv2D(64, (3, 3)))
model2.add(Activation('relu'))
model2.add(MaxPooling2D(pool_size=(2, 2)))

# the model so far outputs 3D feature maps (height, width, features)

model2.add(Flatten())  # this converts our 3D feature maps to 1D feature vectors
model2.add(Dense(64))
model2.add(Activation('relu'))
model2.add(Dropout(0.5))
model2.add(Dense(34, activation='softmax'))

print(model2.summary())

model2.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 248, 248, 32)      896       
_________________________________________________________________
activation_1 (Activation)    (None, 248, 248, 32)      0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 124, 124, 32)      0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 122, 122, 32)      9248      
_________________________________________________________________
activation_2 (Activation)    (None, 122, 122, 32)      0         
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 61, 61, 32)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 59, 59, 64)        18496     
_________________________________________________________________
activation_3 (Activation)    (None, 59, 59, 64)        0         
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 29, 29, 64)        0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 53824)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 64)                3444800   
_________________________________________________________________
activation_4 (Activation)    (None, 64)                0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 34)                2210      
=================================================================
Total params: 3,475,650
Trainable params: 3,475,650
Non-trainable params: 0
_________________________________________________________________
None

In [5]:
# now run the model with class weights
# create a checkpointer to save model weights
checkpointer = ModelCheckpoint(filepath='saved_models/class-weights-weights-improvement03-{epoch:02d}-{val_acc:.2f}.hdf5', 
                               verbose=1, save_best_only=True)

# now fit the model using the fit_generator
history = model2.fit_generator(
        train_generator,
        steps_per_epoch=2009 // batch_size,
        epochs=50,
        validation_data=validation_generator,
        validation_steps=675 // batch_size,
        class_weight=class_weights,
        callbacks=[checkpointer])
model2.save_weights('saved_models/first_try_with_class_weights.h5') # always save your weights after training or during training


Epoch 1/50
124/125 [============================>.] - ETA: 1s - loss: 14.7868 - acc: 0.0534Epoch 00000: val_loss improved from inf to 3.51659, saving model to saved_models/class-weights-weights-improvement03-00-0.06.hdf5
125/125 [==============================] - 187s - loss: 14.8306 - acc: 0.0545 - val_loss: 3.5166 - val_acc: 0.0551
Epoch 2/50
124/125 [============================>.] - ETA: 1s - loss: 14.4252 - acc: 0.0640Epoch 00001: val_loss improved from 3.51659 to 3.48066, saving model to saved_models/class-weights-weights-improvement03-01-0.11.hdf5
125/125 [==============================] - 190s - loss: 14.3860 - acc: 0.0635 - val_loss: 3.4807 - val_acc: 0.1123
Epoch 3/50
124/125 [============================>.] - ETA: 1s - loss: 14.1737 - acc: 0.0822Epoch 00002: val_loss improved from 3.48066 to 3.18984, saving model to saved_models/class-weights-weights-improvement03-02-0.16.hdf5
125/125 [==============================] - 189s - loss: 14.1953 - acc: 0.0825 - val_loss: 3.1898 - val_acc: 0.1639
Epoch 4/50
124/125 [============================>.] - ETA: 1s - loss: 13.5155 - acc: 0.1200Epoch 00003: val_loss improved from 3.18984 to 2.43310, saving model to saved_models/class-weights-weights-improvement03-03-0.25.hdf5
125/125 [==============================] - 186s - loss: 13.4874 - acc: 0.1205 - val_loss: 2.4331 - val_acc: 0.2519
Epoch 5/50
124/125 [============================>.] - ETA: 1s - loss: 12.6706 - acc: 0.1728Epoch 00004: val_loss improved from 2.43310 to 2.20226, saving model to saved_models/class-weights-weights-improvement03-04-0.27.hdf5
125/125 [==============================] - 187s - loss: 12.6110 - acc: 0.1739 - val_loss: 2.2023 - val_acc: 0.2656
Epoch 6/50
124/125 [============================>.] - ETA: 1s - loss: 11.7230 - acc: 0.2195Epoch 00005: val_loss improved from 2.20226 to 2.06043, saving model to saved_models/class-weights-weights-improvement03-05-0.31.hdf5
125/125 [==============================] - 189s - loss: 11.6882 - acc: 0.2193 - val_loss: 2.0604 - val_acc: 0.3080
Epoch 7/50
124/125 [============================>.] - ETA: 1s - loss: 11.5561 - acc: 0.2489Epoch 00006: val_loss improved from 2.06043 to 1.92529, saving model to saved_models/class-weights-weights-improvement03-06-0.40.hdf5
125/125 [==============================] - 191s - loss: 11.5331 - acc: 0.2489 - val_loss: 1.9253 - val_acc: 0.4006
Epoch 8/50
124/125 [============================>.] - ETA: 1s - loss: 11.0555 - acc: 0.2702Epoch 00007: val_loss improved from 1.92529 to 1.80133, saving model to saved_models/class-weights-weights-improvement03-07-0.44.hdf5
125/125 [==============================] - 189s - loss: 11.0812 - acc: 0.2706 - val_loss: 1.8013 - val_acc: 0.4416
Epoch 9/50
124/125 [============================>.] - ETA: 1s - loss: 10.7255 - acc: 0.2706Epoch 00008: val_loss improved from 1.80133 to 1.76488, saving model to saved_models/class-weights-weights-improvement03-08-0.51.hdf5
125/125 [==============================] - 192s - loss: 10.6994 - acc: 0.2714 - val_loss: 1.7649 - val_acc: 0.5144
Epoch 10/50
124/125 [============================>.] - ETA: 1s - loss: 10.7556 - acc: 0.2996Epoch 00009: val_loss improved from 1.76488 to 1.73544, saving model to saved_models/class-weights-weights-improvement03-09-0.50.hdf5
125/125 [==============================] - 186s - loss: 10.7455 - acc: 0.3002 - val_loss: 1.7354 - val_acc: 0.5023
Epoch 11/50
124/125 [============================>.] - ETA: 1s - loss: 10.0779 - acc: 0.3222Epoch 00010: val_loss improved from 1.73544 to 1.62173, saving model to saved_models/class-weights-weights-improvement03-10-0.51.hdf5
125/125 [==============================] - 188s - loss: 10.0969 - acc: 0.3232 - val_loss: 1.6217 - val_acc: 0.5144
Epoch 12/50
124/125 [============================>.] - ETA: 1s - loss: 10.1025 - acc: 0.3312Epoch 00011: val_loss did not improve
125/125 [==============================] - 188s - loss: 10.0705 - acc: 0.3316 - val_loss: 1.6324 - val_acc: 0.4917
Epoch 13/50
124/125 [============================>.] - ETA: 1s - loss: 9.9904 - acc: 0.3439Epoch 00012: val_loss did not improve
125/125 [==============================] - 192s - loss: 9.9627 - acc: 0.3437 - val_loss: 2.0647 - val_acc: 0.4052
Epoch 14/50
124/125 [============================>.] - ETA: 1s - loss: 9.6244 - acc: 0.3628Epoch 00013: val_loss did not improve
125/125 [==============================] - 191s - loss: 9.6326 - acc: 0.3609 - val_loss: 1.7310 - val_acc: 0.5341
Epoch 15/50
124/125 [============================>.] - ETA: 1s - loss: 10.3706 - acc: 0.3208Epoch 00014: val_loss did not improve
125/125 [==============================] - 184s - loss: 10.3901 - acc: 0.3193 - val_loss: 1.8361 - val_acc: 0.4249
Epoch 16/50
124/125 [============================>.] - ETA: 1s - loss: 9.7837 - acc: 0.3681Epoch 00015: val_loss improved from 1.62173 to 1.52150, saving model to saved_models/class-weights-weights-improvement03-15-0.55.hdf5
125/125 [==============================] - 181s - loss: 9.7258 - acc: 0.3697 - val_loss: 1.5215 - val_acc: 0.5524
Epoch 17/50
124/125 [============================>.] - ETA: 1s - loss: 9.9356 - acc: 0.3505Epoch 00016: val_loss improved from 1.52150 to 1.40755, saving model to saved_models/class-weights-weights-improvement03-16-0.59.hdf5
125/125 [==============================] - 179s - loss: 9.9252 - acc: 0.3507 - val_loss: 1.4076 - val_acc: 0.5918
Epoch 18/50
124/125 [============================>.] - ETA: 1s - loss: 9.6656 - acc: 0.3637Epoch 00017: val_loss did not improve
125/125 [==============================] - 179s - loss: 9.6755 - acc: 0.3623 - val_loss: 1.4265 - val_acc: 0.5675
Epoch 19/50
124/125 [============================>.] - ETA: 1s - loss: 9.4707 - acc: 0.3677Epoch 00018: val_loss did not improve
125/125 [==============================] - 181s - loss: 9.4648 - acc: 0.3678 - val_loss: 2.0056 - val_acc: 0.4598
Epoch 20/50
124/125 [============================>.] - ETA: 1s - loss: 9.5205 - acc: 0.3782Epoch 00019: val_loss did not improve
125/125 [==============================] - 187s - loss: 9.5544 - acc: 0.3757 - val_loss: 1.8103 - val_acc: 0.3945
Epoch 21/50
124/125 [============================>.] - ETA: 1s - loss: 10.1375 - acc: 0.3292Epoch 00020: val_loss did not improve
125/125 [==============================] - 186s - loss: 10.0761 - acc: 0.3301 - val_loss: 1.7180 - val_acc: 0.4689
Epoch 22/50
124/125 [============================>.] - ETA: 1s - loss: 9.6921 - acc: 0.3730Epoch 00021: val_loss did not improve
125/125 [==============================] - 188s - loss: 9.6851 - acc: 0.3720 - val_loss: 1.6230 - val_acc: 0.4901
Epoch 23/50
124/125 [============================>.] - ETA: 1s - loss: 9.6959 - acc: 0.3774Epoch 00022: val_loss did not improve
125/125 [==============================] - 188s - loss: 9.7063 - acc: 0.3758 - val_loss: 1.7149 - val_acc: 0.4917
Epoch 24/50
124/125 [============================>.] - ETA: 1s - loss: 10.2483 - acc: 0.3706Epoch 00023: val_loss did not improve
125/125 [==============================] - 186s - loss: 10.2273 - acc: 0.3717 - val_loss: 1.5335 - val_acc: 0.5114
Epoch 25/50
124/125 [============================>.] - ETA: 1s - loss: 9.6119 - acc: 0.3888Epoch 00024: val_loss did not improve
125/125 [==============================] - 187s - loss: 9.6164 - acc: 0.3877 - val_loss: 1.9053 - val_acc: 0.4006
Epoch 26/50
124/125 [============================>.] - ETA: 1s - loss: 10.3497 - acc: 0.3406Epoch 00025: val_loss improved from 1.40755 to 1.38660, saving model to saved_models/class-weights-weights-improvement03-25-0.57.hdf5
125/125 [==============================] - 185s - loss: 10.3337 - acc: 0.3403 - val_loss: 1.3866 - val_acc: 0.5660
Epoch 27/50
124/125 [============================>.] - ETA: 1s - loss: 9.7524 - acc: 0.3704Epoch 00026: val_loss did not improve
125/125 [==============================] - 187s - loss: 9.7128 - acc: 0.3699 - val_loss: 1.9779 - val_acc: 0.4719
Epoch 28/50
124/125 [============================>.] - ETA: 1s - loss: 9.6091 - acc: 0.3889Epoch 00027: val_loss did not improve
125/125 [==============================] - 185s - loss: 9.5627 - acc: 0.3893 - val_loss: 1.7643 - val_acc: 0.4613
Epoch 29/50
124/125 [============================>.] - ETA: 1s - loss: 10.1851 - acc: 0.3661Epoch 00028: val_loss did not improve
125/125 [==============================] - 186s - loss: 10.1919 - acc: 0.3642 - val_loss: 1.9864 - val_acc: 0.4219
Epoch 30/50
124/125 [============================>.] - ETA: 1s - loss: 9.4468 - acc: 0.4105Epoch 00029: val_loss did not improve
125/125 [==============================] - 185s - loss: 9.4698 - acc: 0.4112 - val_loss: 1.7290 - val_acc: 0.4992
Epoch 31/50
124/125 [============================>.] - ETA: 1s - loss: 10.0013 - acc: 0.3463Epoch 00030: val_loss did not improve
125/125 [==============================] - 186s - loss: 9.9940 - acc: 0.3470 - val_loss: 1.8478 - val_acc: 0.5068
Epoch 32/50
124/125 [============================>.] - ETA: 1s - loss: 10.5628 - acc: 0.3464Epoch 00031: val_loss did not improve
125/125 [==============================] - 186s - loss: 10.6989 - acc: 0.3467 - val_loss: 3.1682 - val_acc: 0.1077
Epoch 33/50
124/125 [============================>.] - ETA: 1s - loss: 10.6048 - acc: 0.3245Epoch 00032: val_loss did not improve
125/125 [==============================] - 186s - loss: 10.5877 - acc: 0.3239 - val_loss: 1.6774 - val_acc: 0.4932
Epoch 34/50
124/125 [============================>.] - ETA: 1s - loss: 9.9486 - acc: 0.3628Epoch 00033: val_loss did not improve
125/125 [==============================] - 181s - loss: 9.9407 - acc: 0.3629 - val_loss: 1.5646 - val_acc: 0.5129
Epoch 35/50
124/125 [============================>.] - ETA: 1s - loss: 10.1897 - acc: 0.3691Epoch 00034: val_loss did not improve
125/125 [==============================] - 182s - loss: 10.1712 - acc: 0.3697 - val_loss: 1.7753 - val_acc: 0.4568
Epoch 36/50
124/125 [============================>.] - ETA: 1s - loss: 10.3226 - acc: 0.3424Epoch 00035: val_loss did not improve
125/125 [==============================] - 186s - loss: 10.3152 - acc: 0.3432 - val_loss: 1.6861 - val_acc: 0.4765
Epoch 37/50
124/125 [============================>.] - ETA: 1s - loss: 10.6168 - acc: 0.3374Epoch 00036: val_loss did not improve
125/125 [==============================] - 185s - loss: 10.6090 - acc: 0.3382 - val_loss: 1.7808 - val_acc: 0.4977
Epoch 38/50
124/125 [============================>.] - ETA: 1s - loss: 10.8975 - acc: 0.3105Epoch 00037: val_loss did not improve
125/125 [==============================] - 185s - loss: 10.8542 - acc: 0.3125 - val_loss: 1.9112 - val_acc: 0.5083
Epoch 39/50
124/125 [============================>.] - ETA: 1s - loss: 10.6510 - acc: 0.3058Epoch 00038: val_loss did not improve
125/125 [==============================] - 183s - loss: 10.6407 - acc: 0.3064 - val_loss: 1.9483 - val_acc: 0.4249
Epoch 40/50
124/125 [============================>.] - ETA: 1s - loss: 11.5104 - acc: 0.3052Epoch 00039: val_loss did not improve
125/125 [==============================] - 184s - loss: 11.4919 - acc: 0.3048 - val_loss: 1.5977 - val_acc: 0.4643
Epoch 41/50
124/125 [============================>.] - ETA: 1s - loss: 11.0165 - acc: 0.3616Epoch 00040: val_loss did not improve
125/125 [==============================] - 184s - loss: 10.9899 - acc: 0.3632 - val_loss: 1.7359 - val_acc: 0.5175
Epoch 42/50
124/125 [============================>.] - ETA: 1s - loss: 12.9626 - acc: 0.2087Epoch 00041: val_loss did not improve
125/125 [==============================] - 189s - loss: 12.9850 - acc: 0.2075 - val_loss: 2.3779 - val_acc: 0.2959
Epoch 43/50
124/125 [============================>.] - ETA: 1s - loss: 11.4886 - acc: 0.2955Epoch 00042: val_loss did not improve
125/125 [==============================] - 188s - loss: 11.4956 - acc: 0.2952 - val_loss: 1.6023 - val_acc: 0.5144
Epoch 44/50
124/125 [============================>.] - ETA: 1s - loss: 11.2283 - acc: 0.3319Epoch 00043: val_loss did not improve
125/125 [==============================] - 185s - loss: 11.2056 - acc: 0.3318 - val_loss: 1.6636 - val_acc: 0.4871
Epoch 45/50
124/125 [============================>.] - ETA: 1s - loss: 12.0851 - acc: 0.2876Epoch 00044: val_loss did not improve
125/125 [==============================] - 185s - loss: 12.1774 - acc: 0.2883 - val_loss: 1.8917 - val_acc: 0.4279
Epoch 46/50
124/125 [============================>.] - ETA: 1s - loss: 11.2191 - acc: 0.2374Epoch 00045: val_loss did not improve
125/125 [==============================] - 185s - loss: 11.1925 - acc: 0.2360 - val_loss: 2.6810 - val_acc: 0.2777
Epoch 47/50
124/125 [============================>.] - ETA: 1s - loss: 12.6029 - acc: 0.2564Epoch 00046: val_loss did not improve
125/125 [==============================] - 186s - loss: 12.5969 - acc: 0.2569 - val_loss: 2.1433 - val_acc: 0.3551
Epoch 48/50
124/125 [============================>.] - ETA: 1s - loss: 11.9271 - acc: 0.2457Epoch 00047: val_loss did not improve
125/125 [==============================] - 185s - loss: 11.9414 - acc: 0.2463 - val_loss: 2.2415 - val_acc: 0.3187
Epoch 49/50
124/125 [============================>.] - ETA: 1s - loss: 12.4184 - acc: 0.2703Epoch 00048: val_loss did not improve
125/125 [==============================] - 185s - loss: 12.3833 - acc: 0.2692 - val_loss: 2.0105 - val_acc: 0.3945
Epoch 50/50
124/125 [============================>.] - ETA: 1s - loss: 13.0497 - acc: 0.2614Epoch 00049: val_loss did not improve
125/125 [==============================] - 180s - loss: 13.0489 - acc: 0.2638 - val_loss: 2.4525 - val_acc: 0.4112

In [7]:
# summarize history for accuracy
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
# val_loss: 1.4879 - val_acc: 0.5144 trial 2
# val_loss: 1.3866 - val_acc: 0.5660 trial 3


Using class weights actually brought the accuracy down compared to the model without them. Next, I will use transfer learning with the VGG16 CNN architecture. First, I need dictionaries holding the number of images in each class; these counts will be used to rebuild the labels for the bottleneck features produced by the VGG16 convolutional base, which serves as a fixed feature extractor.


In [6]:
# create a dict to hold the number of examples of each class in the train and valid data
nb_train_class= dict()
# loop through the folders containing the images, use the folder name as the key
for folder in os.listdir('data/train'):
    nb_train_class[folder] = len(os.listdir('data/train/'+folder))
nb_valid_class = dict()
for folder in os.listdir('data/valid'):
    nb_valid_class[folder] = len(os.listdir('data/valid/'+folder))
nb_test_class = dict()
for folder in os.listdir('data/test'):
    nb_test_class[folder] = len(os.listdir('data/test/'+folder))
print(sum(nb_train_class.values()))
print(sum(nb_valid_class.values()))
print(sum(nb_test_class.values()))


2009
675
689
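
Before extracting the bottleneck features, a small aside on the shapes to expect: with 250x250 inputs, VGG16's five pooling stages shrink the spatial dimensions to 7x7, so the saved feature arrays below come out as (n_images, 7, 7, 512). A quick way to confirm this (it downloads the ImageNet weights on first use):

In [ ]:
# Inspect the output shape of the VGG16 convolutional base for 250x250 inputs
vgg_base = applications.VGG16(include_top=False, weights='imagenet',
                              input_shape=(250, 250, 3))
print(vgg_base.output_shape)  # expect (None, 7, 7, 512)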

In [4]:
# use transfer learning
# from keras import applications

# dimensions of our images
img_width, img_height = 250, 250

top_model_weights_path = 'bottleneck_fc_model.h5'
train_data_dir = 'data/train'
validation_data_dir = 'data/valid'
nb_train_samples = 2009
nb_validation_samples = 675
epochs = 50
batch_size = 16

def save_bottleneck_features():
    datagen = ImageDataGenerator(rescale=1. / 255)
    
    # build the VGG16 network
    model = applications.VGG16(include_top=False, weights='imagenet')
    
    generator = datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode=None, # this means the generator will only yield data, no class labels
        shuffle=False) # our data will be in order
    
    # the predict_generator method returns the output of a model, given
    # a generator that yields batches of numpy data
    # note: running an integer number of batches (2009 // 16 = 125) means only
    # 2000 of the 2009 training images get bottleneck features; the last
    # partial batch is dropped
    bottleneck_features_train = model.predict_generator(
        generator, nb_train_samples // batch_size)
    print(bottleneck_features_train.shape, 'train features')
    # save the output as a numpy array
    np.save(open('bottleneck_features_train.npy', 'wb'),
           bottleneck_features_train)
    
    generator = datagen.flow_from_directory(
        validation_data_dir,
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode=None,
        shuffle=False)
    
    # 675 // 16 = 42 batches, so only 672 of the 675 validation images get features
    bottleneck_features_validation = model.predict_generator(
        generator, nb_validation_samples // batch_size)
    print(bottleneck_features_validation.shape, 'valid features')
    np.save(open('bottleneck_features_validation.npy', 'wb'),
           bottleneck_features_validation)



save_bottleneck_features()


Found 2009 images belonging to 34 classes.
(2000, 7, 7, 512) train features
Found 675 images belonging to 34 classes.
(672, 7, 7, 512) valid features

In [5]:
def train_top_model(nb_train_class, nb_valid_class):
    train_data = np.load(open('bottleneck_features_train.npy','rb'))
    print(train_data.shape, 'train data')
    # the features were saved in class-folder order (unshuffled), so the labels
    # can be rebuilt from the per-class counts
    train_labels = np.array(
        [0] * nb_train_class['acerolas'] + [1] * nb_train_class['apples'] +
        [2] * nb_train_class['apricots'] + [3] * nb_train_class['avocados'] +
        [4] * nb_train_class['bananas'] + [5] * nb_train_class['blackberries'] +
        [6] * nb_train_class['blueberries'] + [7] * nb_train_class['cantaloupes'] +
        [8] * nb_train_class['cherries'] + [9] * nb_train_class['coconuts'] + 
        [10] * nb_train_class['figs'] + [11] * nb_train_class['grapefruits'] +
        [12] * nb_train_class['grapes'] + [13] * nb_train_class['guava'] +
        [14] * nb_train_class['honneydew_melon'] + [15] * nb_train_class['kiwifruit'] +
        [16] * nb_train_class['lemons'] + [17] * nb_train_class['limes'] + 
        [18] * nb_train_class['mangos'] + [19] * nb_train_class['nectarine'] +
        [20] * nb_train_class['olives'] + [21] * nb_train_class['onion'] +
        [22] * nb_train_class['orange'] + [23] * nb_train_class['passionfruit'] +
        [24] * nb_train_class['peaches'] + [25] * nb_train_class['pears'] +
        [26] * nb_train_class['pineapples'] + [27] * nb_train_class['plums'] +
        [28] * nb_train_class['pomegranates'] + [29] * nb_train_class['potato'] +
        [30] * nb_train_class['raspberries'] + [31] * nb_train_class['strawberries'] +
        [32] * nb_train_class['tomatoes'] + [33] * (nb_train_class['watermelon']-9)) # -9: the truncated last batch dropped 9 training images
    train_labels = to_categorical(train_labels, num_classes=34)
    print(train_labels.shape, 'train labels')

    validation_data = np.load(open('bottleneck_features_validation.npy','rb'))
    print(validation_data.shape, 'valid data')
    validation_labels = np.array(
        [0] * nb_valid_class['acerolas'] + [1] * nb_valid_class['apples'] +
        [2] * nb_valid_class['apricots'] + [3] * nb_valid_class['avocados'] +
        [4] * nb_valid_class['bananas'] + [5] * nb_valid_class['blackberries'] +
        [6] * nb_valid_class['blueberries'] + [7] * nb_valid_class['cantaloupes'] +
        [8] * nb_valid_class['cherries'] + [9] * nb_valid_class['coconuts'] + 
        [10] * nb_valid_class['figs'] + [11] * nb_valid_class['grapefruits'] +
        [12] * nb_valid_class['grapes'] + [13] * nb_valid_class['guava'] +
        [14] * nb_valid_class['honneydew_melon'] + [15] * nb_valid_class['kiwifruit'] +
        [16] * nb_valid_class['lemons'] + [17] * nb_valid_class['limes'] + 
        [18] * nb_valid_class['mangos'] + [19] * nb_valid_class['nectarine'] +
        [20] * nb_valid_class['olives'] + [21] * nb_valid_class['onion'] +
        [22] * nb_valid_class['orange'] + [23] * nb_valid_class['passionfruit'] +
        [24] * nb_valid_class['peaches'] + [25] * nb_valid_class['pears'] +
        [26] * nb_valid_class['pineapples'] + [27] * nb_valid_class['plums'] +
        [28] * nb_valid_class['pomegranates'] + [29] * nb_valid_class['potato'] +
        [30] * nb_valid_class['raspberries'] + [31] * nb_valid_class['strawberries'] +
        [32] * nb_valid_class['tomatoes'] + [33] * (nb_valid_class['watermelon']-3)) # -3: the truncated last batch dropped 3 validation images
    validation_labels = to_categorical(validation_labels, num_classes=34)
    np.save(open('validation_labels.npy', 'wb'),
           validation_labels)
    print(validation_labels.shape, 'valid labels')

    model3 = Sequential()
    model3.add(Flatten(input_shape=train_data.shape[1:]))
    model3.add(Dense(256, activation='relu'))
    model3.add(Dropout(0.5))
    model3.add(Dense(34, activation='softmax'))

    model3.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
    
    # create a checkpointer to save model weights
    checkpointer = ModelCheckpoint(filepath='saved_models/tflearning-weights-improvement03-{epoch:02d}-{val_acc:.2f}.hdf5', 
                               verbose=1, save_best_only=True)

    history = model3.fit(train_data, train_labels,
              epochs=epochs,
              batch_size=batch_size,
              validation_data=(validation_data, validation_labels),
                callbacks=[checkpointer])
    model3.save_weights('bottleneck_fc_model.h5')
    return history

history3 = train_top_model(nb_train_class, nb_valid_class)


(2000, 7, 7, 512) train data
(2000, 34) train labels
(672, 7, 7, 512) valid data
(672, 34) valid labels
Train on 2000 samples, validate on 672 samples
Epoch 1/50
1984/2000 [============================>.] - ETA: 0s - loss: 3.6198 - acc: 0.2616Epoch 00000: val_loss improved from inf to 1.82638, saving model to saved_models/tflearning-weights-improvement03-00-0.50.hdf5
2000/2000 [==============================] - 12s - loss: 3.6011 - acc: 0.2645 - val_loss: 1.8264 - val_acc: 0.5015
Epoch 2/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.9222 - acc: 0.4677Epoch 00001: val_loss improved from 1.82638 to 1.50295, saving model to saved_models/tflearning-weights-improvement03-01-0.57.hdf5
2000/2000 [==============================] - 13s - loss: 1.9213 - acc: 0.4675 - val_loss: 1.5030 - val_acc: 0.5729
Epoch 3/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.5847 - acc: 0.5479Epoch 00002: val_loss improved from 1.50295 to 1.35749, saving model to saved_models/tflearning-weights-improvement03-02-0.64.hdf5
2000/2000 [==============================] - 13s - loss: 1.5772 - acc: 0.5505 - val_loss: 1.3575 - val_acc: 0.6384
Epoch 4/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.3804 - acc: 0.6174Epoch 00003: val_loss improved from 1.35749 to 1.02760, saving model to saved_models/tflearning-weights-improvement03-03-0.73.hdf5
2000/2000 [==============================] - 15s - loss: 1.3875 - acc: 0.6160 - val_loss: 1.0276 - val_acc: 0.7321
Epoch 5/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.2291 - acc: 0.6426Epoch 00004: val_loss improved from 1.02760 to 1.00650, saving model to saved_models/tflearning-weights-improvement03-04-0.73.hdf5
2000/2000 [==============================] - 13s - loss: 1.2339 - acc: 0.6410 - val_loss: 1.0065 - val_acc: 0.7336
Epoch 6/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.1152 - acc: 0.6880Epoch 00005: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 1.1124 - acc: 0.6885 - val_loss: 1.0199 - val_acc: 0.7292
Epoch 7/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.9972 - acc: 0.7102Epoch 00006: val_loss improved from 1.00650 to 0.89735, saving model to saved_models/tflearning-weights-improvement03-06-0.78.hdf5
2000/2000 [==============================] - 14s - loss: 0.9968 - acc: 0.7105 - val_loss: 0.8974 - val_acc: 0.7753
Epoch 8/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.9172 - acc: 0.7233Epoch 00007: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 0.9175 - acc: 0.7230 - val_loss: 0.9306 - val_acc: 0.7589
Epoch 9/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.7969 - acc: 0.7656Epoch 00008: val_loss improved from 0.89735 to 0.88410, saving model to saved_models/tflearning-weights-improvement03-08-0.79.hdf5
2000/2000 [==============================] - 14s - loss: 0.7957 - acc: 0.7650 - val_loss: 0.8841 - val_acc: 0.7887
Epoch 10/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.7323 - acc: 0.7863Epoch 00009: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 0.7383 - acc: 0.7850 - val_loss: 0.9100 - val_acc: 0.7842
Epoch 11/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.7019 - acc: 0.8004Epoch 00010: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 0.7004 - acc: 0.8005 - val_loss: 0.9204 - val_acc: 0.8095
Epoch 12/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.7018 - acc: 0.7974Epoch 00011: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 0.7030 - acc: 0.7970 - val_loss: 0.9717 - val_acc: 0.7812
Epoch 13/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.5925 - acc: 0.8291Epoch 00012: val_loss improved from 0.88410 to 0.79935, saving model to saved_models/tflearning-weights-improvement03-12-0.81.hdf5
2000/2000 [==============================] - 14s - loss: 0.5933 - acc: 0.8285 - val_loss: 0.7994 - val_acc: 0.8051
Epoch 14/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.5841 - acc: 0.8241Epoch 00013: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 0.5858 - acc: 0.8235 - val_loss: 0.9830 - val_acc: 0.8065
Epoch 15/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.5358 - acc: 0.8407Epoch 00014: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 0.5354 - acc: 0.8405 - val_loss: 0.9101 - val_acc: 0.8155
Epoch 16/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.5079 - acc: 0.8458Epoch 00015: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 0.5097 - acc: 0.8440 - val_loss: 1.0222 - val_acc: 0.8006
Epoch 17/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.4779 - acc: 0.8609Epoch 00016: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 0.4756 - acc: 0.8610 - val_loss: 1.0295 - val_acc: 0.8021
Epoch 18/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.4860 - acc: 0.8624Epoch 00017: val_loss did not improve
2000/2000 [==============================] - 14s - loss: 0.4867 - acc: 0.8620 - val_loss: 1.0959 - val_acc: 0.8006
Epoch 19/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.4649 - acc: 0.8589Epoch 00018: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 0.4651 - acc: 0.8590 - val_loss: 1.0462 - val_acc: 0.8036
Epoch 20/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.4152 - acc: 0.8866Epoch 00019: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 0.4155 - acc: 0.8865 - val_loss: 1.0209 - val_acc: 0.8185
Epoch 21/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.4401 - acc: 0.8760Epoch 00020: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 0.4403 - acc: 0.8760 - val_loss: 0.9508 - val_acc: 0.8244
Epoch 22/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.4299 - acc: 0.8800Epoch 00021: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 0.4294 - acc: 0.8795 - val_loss: 1.0776 - val_acc: 0.8095
Epoch 23/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.3918 - acc: 0.8896Epoch 00022: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 0.3904 - acc: 0.8900 - val_loss: 0.9290 - val_acc: 0.8438
Epoch 24/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.4081 - acc: 0.8856Epoch 00023: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 0.4085 - acc: 0.8855 - val_loss: 1.0729 - val_acc: 0.8110
Epoch 25/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.3479 - acc: 0.9073Epoch 00024: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 0.3470 - acc: 0.9075 - val_loss: 1.0850 - val_acc: 0.8095
Epoch 26/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.3766 - acc: 0.8962Epoch 00025: val_loss did not improve
2000/2000 [==============================] - 14s - loss: 0.3741 - acc: 0.8965 - val_loss: 1.0420 - val_acc: 0.8408
Epoch 27/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.3330 - acc: 0.9052Epoch 00026: val_loss did not improve
2000/2000 [==============================] - 14s - loss: 0.3350 - acc: 0.9055 - val_loss: 1.1449 - val_acc: 0.8244
Epoch 28/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.3331 - acc: 0.9017Epoch 00027: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 0.3351 - acc: 0.9015 - val_loss: 1.1867 - val_acc: 0.8036
Epoch 29/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.3340 - acc: 0.9042Epoch 00028: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 0.3400 - acc: 0.9030 - val_loss: 1.2602 - val_acc: 0.8125
Epoch 30/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.3395 - acc: 0.9093Epoch 00029: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 0.3430 - acc: 0.9095 - val_loss: 1.0805 - val_acc: 0.8214
Epoch 31/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.2850 - acc: 0.9254Epoch 00030: val_loss did not improve
2000/2000 [==============================] - 11s - loss: 0.2855 - acc: 0.9245 - val_loss: 1.0585 - val_acc: 0.8423
Epoch 32/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.3160 - acc: 0.9168Epoch 00031: val_loss did not improve
2000/2000 [==============================] - 12s - loss: 0.3180 - acc: 0.9170 - val_loss: 1.2151 - val_acc: 0.8080
Epoch 33/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.2712 - acc: 0.9269Epoch 00032: val_loss did not improve
2000/2000 [==============================] - 12s - loss: 0.2747 - acc: 0.9260 - val_loss: 1.0895 - val_acc: 0.8408
Epoch 34/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.2761 - acc: 0.9249Epoch 00033: val_loss did not improve
2000/2000 [==============================] - 12s - loss: 0.2740 - acc: 0.9255 - val_loss: 1.1267 - val_acc: 0.8318
Epoch 35/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.2716 - acc: 0.9234Epoch 00034: val_loss did not improve
2000/2000 [==============================] - 12s - loss: 0.2754 - acc: 0.9225 - val_loss: 1.2177 - val_acc: 0.8214
Epoch 36/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.3019 - acc: 0.9229Epoch 00035: val_loss did not improve
2000/2000 [==============================] - 12s - loss: 0.2997 - acc: 0.9235 - val_loss: 1.0827 - val_acc: 0.8393
Epoch 37/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.2637 - acc: 0.9330Epoch 00036: val_loss did not improve
2000/2000 [==============================] - 11s - loss: 0.2633 - acc: 0.9330 - val_loss: 1.3164 - val_acc: 0.8199
Epoch 38/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.2685 - acc: 0.9309Epoch 00037: val_loss did not improve
2000/2000 [==============================] - 11s - loss: 0.2683 - acc: 0.9305 - val_loss: 1.1259 - val_acc: 0.8289
Epoch 39/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.2483 - acc: 0.9360Epoch 00038: val_loss did not improve
2000/2000 [==============================] - 12s - loss: 0.2470 - acc: 0.9360 - val_loss: 1.2785 - val_acc: 0.8333
Epoch 40/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.2733 - acc: 0.9234Epoch 00039: val_loss did not improve
2000/2000 [==============================] - 12s - loss: 0.2731 - acc: 0.9235 - val_loss: 1.0412 - val_acc: 0.8393
Epoch 41/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.2730 - acc: 0.9269Epoch 00040: val_loss did not improve
2000/2000 [==============================] - 11s - loss: 0.2734 - acc: 0.9270 - val_loss: 1.4658 - val_acc: 0.8110
Epoch 42/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.2860 - acc: 0.9249Epoch 00041: val_loss did not improve
2000/2000 [==============================] - 12s - loss: 0.2862 - acc: 0.9250 - val_loss: 1.2439 - val_acc: 0.8423
Epoch 43/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.2507 - acc: 0.9415Epoch 00042: val_loss did not improve
2000/2000 [==============================] - 12s - loss: 0.2508 - acc: 0.9410 - val_loss: 1.4217 - val_acc: 0.8259
Epoch 44/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.2354 - acc: 0.9415Epoch 00043: val_loss did not improve
2000/2000 [==============================] - 12s - loss: 0.2366 - acc: 0.9405 - val_loss: 1.3687 - val_acc: 0.8304
Epoch 45/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.2753 - acc: 0.9350Epoch 00044: val_loss did not improve
2000/2000 [==============================] - 11s - loss: 0.2736 - acc: 0.9350 - val_loss: 1.3624 - val_acc: 0.8244
Epoch 46/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.2516 - acc: 0.9400Epoch 00045: val_loss did not improve
2000/2000 [==============================] - 12s - loss: 0.2503 - acc: 0.9400 - val_loss: 1.3071 - val_acc: 0.8423
Epoch 47/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.2763 - acc: 0.9335Epoch 00046: val_loss did not improve
2000/2000 [==============================] - 12s - loss: 0.2742 - acc: 0.9340 - val_loss: 1.3636 - val_acc: 0.8348
Epoch 48/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.2753 - acc: 0.9365Epoch 00047: val_loss did not improve
2000/2000 [==============================] - 12s - loss: 0.2752 - acc: 0.9365 - val_loss: 1.1799 - val_acc: 0.8467
Epoch 49/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.2442 - acc: 0.9425Epoch 00048: val_loss did not improve
2000/2000 [==============================] - 11s - loss: 0.2426 - acc: 0.9430 - val_loss: 1.3091 - val_acc: 0.8423
Epoch 50/50
1984/2000 [============================>.] - ETA: 0s - loss: 0.2317 - acc: 0.9390Epoch 00049: val_loss did not improve
2000/2000 [==============================] - 12s - loss: 0.2338 - acc: 0.9385 - val_loss: 1.2564 - val_acc: 0.8512

In [6]:
def acc_loss_curve(model_history):
    # summarize history for accuracy
    plt.plot(model_history.history['acc'])
    plt.plot(model_history.history['val_acc'])
    plt.title('model accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['train', 'validation'], loc='upper left')
    plt.show()
    # summarize history for loss
    plt.plot(model_history.history['loss'])
    plt.plot(model_history.history['val_loss'])
    plt.title('model loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train', 'validation'], loc='upper left')
    plt.show()
acc_loss_curve(history3)
# val_loss: 0.7571 - val_acc: 0.8244
# val_loss: 0.7994 - val_acc: 0.8051 trial 3



Using the bottleneck features from the VGG16 network produced validation accuracies in the low 80% range. Validation loss stops improving after roughly 12 epochs while training accuracy keeps climbing, which points to overfitting. Next we use the same strategy, but with class weights applied during training to account for the uneven number of images per class.
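
The class_weights dictionary passed to fit in the next cell is computed earlier in the notebook. As a rough sketch of one common way to derive it, assuming only the nb_train_class dictionary of per-class training counts (the same dictionary used in the labeling code below), each class could be weighted inversely to its frequency:

import numpy as np

# Sketch (assumption): weight each class inversely to its frequency in the training set.
# nb_train_class maps class name -> number of training images for that class.
class_names = sorted(nb_train_class.keys())
counts = np.array([nb_train_class[name] for name in class_names], dtype=float)
# weight_i = total_samples / (n_classes * count_i); rarer classes get larger weights
class_weights = {i: counts.sum() / (len(counts) * counts[i]) for i in range(len(counts))}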


In [12]:
# dimensions of our images
img_width, img_height = 250, 250

train_data_dir = 'data/train'
validation_data_dir = 'data/valid'
nb_train_samples = 2009
nb_validation_samples = 675
epochs = 50
batch_size = 16
top_model_weights_path = 'bottleneck_fc_model_cw.h5'
def train_top_model(nb_train_class, nb_valid_class):
    train_data = np.load(open('bottleneck_features_train.npy','rb'))
    print(train_data.shape, 'train data')
    # the features were saved in order, so recreating the labels is easy
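    # NOTE: the saved bottleneck features hold 2000 rows (2009 training images, but
    # presumably only 2009 // batch_size = 125 full batches were run through VGG16),
    # so the last class count is trimmed by 9 to keep labels and features the same
    # length; the validation labels below are trimmed by 3 for the same reason (675 -> 672).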
    train_labels = np.array(
        [0] * nb_train_class['acerolas'] + [1] * nb_train_class['apples'] +
        [2] * nb_train_class['apricots'] + [3] * nb_train_class['avocados'] +
        [4] * nb_train_class['bananas'] + [5] * nb_train_class['blackberries'] +
        [6] * nb_train_class['blueberries'] + [7] * nb_train_class['cantaloupes'] +
        [8] * nb_train_class['cherries'] + [9] * nb_train_class['coconuts'] + 
        [10] * nb_train_class['figs'] + [11] * nb_train_class['grapefruits'] +
        [12] * nb_train_class['grapes'] + [13] * nb_train_class['guava'] +
        [14] * nb_train_class['honneydew_melon'] + [15] * nb_train_class['kiwifruit'] +
        [16] * nb_train_class['lemons'] + [17] * nb_train_class['limes'] + 
        [18] * nb_train_class['mangos'] + [19] * nb_train_class['nectarine'] +
        [20] * nb_train_class['olives'] + [21] * nb_train_class['onion'] +
        [22] * nb_train_class['orange'] + [23] * nb_train_class['passionfruit'] +
        [24] * nb_train_class['peaches'] + [25] * nb_train_class['pears'] +
        [26] * nb_train_class['pineapples'] + [27] * nb_train_class['plums'] +
        [28] * nb_train_class['pomegranates'] + [29] * nb_train_class['potato'] +
        [30] * nb_train_class['raspberries'] + [31] * nb_train_class['strawberries'] +
        [32] * nb_train_class['tomatoes'] + [33] * (nb_train_class['watermelon']-9))
    train_labels = to_categorical(train_labels, num_classes=34)
    print(train_labels.shape, 'train labels')

    validation_data = np.load(open('bottleneck_features_validation.npy','rb'))
    print(validation_data.shape, 'valid data')
    validation_labels = np.array(
        [0] * nb_valid_class['acerolas'] + [1] * nb_valid_class['apples'] +
        [2] * nb_valid_class['apricots'] + [3] * nb_valid_class['avocados'] +
        [4] * nb_valid_class['bananas'] + [5] * nb_valid_class['blackberries'] +
        [6] * nb_valid_class['blueberries'] + [7] * nb_valid_class['cantaloupes'] +
        [8] * nb_valid_class['cherries'] + [9] * nb_valid_class['coconuts'] + 
        [10] * nb_valid_class['figs'] + [11] * nb_valid_class['grapefruits'] +
        [12] * nb_valid_class['grapes'] + [13] * nb_valid_class['guava'] +
        [14] * nb_valid_class['honneydew_melon'] + [15] * nb_valid_class['kiwifruit'] +
        [16] * nb_valid_class['lemons'] + [17] * nb_valid_class['limes'] + 
        [18] * nb_valid_class['mangos'] + [19] * nb_valid_class['nectarine'] +
        [20] * nb_valid_class['olives'] + [21] * nb_valid_class['onion'] +
        [22] * nb_valid_class['orange'] + [23] * nb_valid_class['passionfruit'] +
        [24] * nb_valid_class['peaches'] + [25] * nb_valid_class['pears'] +
        [26] * nb_valid_class['pineapples'] + [27] * nb_valid_class['plums'] +
        [28] * nb_valid_class['pomegranates'] + [29] * nb_valid_class['potato'] +
        [30] * nb_valid_class['raspberries'] + [31] * nb_valid_class['strawberries'] +
        [32] * nb_valid_class['tomatoes'] + [33] * (nb_valid_class['watermelon']-3))
    validation_labels = to_categorical(validation_labels, num_classes=34)
    print(validation_labels.shape, 'valid labels')

    model4 = Sequential()
    model4.add(Flatten(input_shape=train_data.shape[1:]))
    model4.add(Dense(256, activation='relu'))
    model4.add(Dropout(0.5))
    model4.add(Dense(34, activation='softmax'))

    model4.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
    
    # create a checkpointer to save model weights
    checkpointer = ModelCheckpoint(filepath='saved_models/tflearningwclassweights02-weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5', 
                               verbose=1, save_best_only=True)

    history = model4.fit(train_data, train_labels,
                         epochs=epochs,
                         batch_size=batch_size,
                         class_weight=class_weights,
                         validation_data=(validation_data, validation_labels),
                         callbacks=[checkpointer])
    model4.save_weights('saved_models/bottleneck_fc_model_cw.h5')
    # serialize the model architecture to JSON so it can be reloaded later for testing
    model_json = model4.to_json()
    with open("bestmodel.json", "w") as json_file:
        json_file.write(model_json)
    return history

history4 = train_top_model(nb_train_class, nb_valid_class)


(2000, 7, 7, 512) train data
(2000, 34) train labels
(672, 7, 7, 512) valid data
(672, 34) valid labels
Train on 2000 samples, validate on 672 samples
Epoch 1/50
1984/2000 [============================>.] - ETA: 0s - loss: 24.0663 - acc: 0.1331Epoch 00000: val_loss improved from inf to 2.66315, saving model to saved_models/tflearningwclassweights02-weights-improvement-00-0.28.hdf5
2000/2000 [==============================] - 12s - loss: 24.0687 - acc: 0.1340 - val_loss: 2.6632 - val_acc: 0.2753
Epoch 2/50
1984/2000 [============================>.] - ETA: 0s - loss: 12.5255 - acc: 0.2762Epoch 00001: val_loss improved from 2.66315 to 2.05778, saving model to saved_models/tflearningwclassweights02-weights-improvement-01-0.41.hdf5
2000/2000 [==============================] - 13s - loss: 12.4750 - acc: 0.2755 - val_loss: 2.0578 - val_acc: 0.4137
Epoch 3/50
1984/2000 [============================>.] - ETA: 0s - loss: 10.5569 - acc: 0.3871Epoch 00002: val_loss improved from 2.05778 to 1.48075, saving model to saved_models/tflearningwclassweights02-weights-improvement-02-0.58.hdf5
2000/2000 [==============================] - 13s - loss: 10.5643 - acc: 0.3870 - val_loss: 1.4807 - val_acc: 0.5818
Epoch 4/50
1984/2000 [============================>.] - ETA: 0s - loss: 9.9122 - acc: 0.4677   - ETA: 1s - loss: 9.97Epoch 00003: val_loss improved from 1.48075 to 1.21017, saving model to saved_models/tflearningwclassweights02-weights-improvement-03-0.69.hdf5
2000/2000 [==============================] - 13s - loss: 9.9022 - acc: 0.4670 - val_loss: 1.2102 - val_acc: 0.6920
Epoch 5/50
1984/2000 [============================>.] - ETA: 0s - loss: 8.2415 - acc: 0.5363Epoch 00004: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 8.2471 - acc: 0.5355 - val_loss: 1.2248 - val_acc: 0.6696
Epoch 6/50
1984/2000 [============================>.] - ETA: 0s - loss: 7.2327 - acc: 0.5817Epoch 00005: val_loss improved from 1.21017 to 1.14780, saving model to saved_models/tflearningwclassweights02-weights-improvement-05-0.71.hdf5
2000/2000 [==============================] - 13s - loss: 7.2037 - acc: 0.5805 - val_loss: 1.1478 - val_acc: 0.7098
Epoch 7/50
1984/2000 [============================>.] - ETA: 0s - loss: 6.3357 - acc: 0.6321Epoch 00006: val_loss improved from 1.14780 to 0.98015, saving model to saved_models/tflearningwclassweights02-weights-improvement-06-0.73.hdf5
2000/2000 [==============================] - 13s - loss: 6.3093 - acc: 0.6330 - val_loss: 0.9802 - val_acc: 0.7336
Epoch 8/50
1984/2000 [============================>.] - ETA: 0s - loss: 6.2871 - acc: 0.6633- ETA: 1s - loss: 6.Epoch 00007: val_loss improved from 0.98015 to 0.90285, saving model to saved_models/tflearningwclassweights02-weights-improvement-07-0.79.hdf5
2000/2000 [==============================] - 13s - loss: 6.2842 - acc: 0.6635 - val_loss: 0.9029 - val_acc: 0.7932
Epoch 9/50
1984/2000 [============================>.] - ETA: 0s - loss: 6.0611 - acc: 0.6981Epoch 00008: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 6.0720 - acc: 0.6970 - val_loss: 1.0632 - val_acc: 0.7262
Epoch 10/50
1984/2000 [============================>.] - ETA: 0s - loss: 4.1675 - acc: 0.7384Epoch 00009: val_loss did not improve
2000/2000 [==============================] - 12s - loss: 4.1490 - acc: 0.7380 - val_loss: 1.2502 - val_acc: 0.7545
Epoch 11/50
1984/2000 [============================>.] - ETA: 0s - loss: 4.2318 - acc: 0.7409- ETA: 2s - lEpoch 00010: val_loss improved from 0.90285 to 0.84996, saving model to saved_models/tflearningwclassweights02-weights-improvement-10-0.82.hdf5
2000/2000 [==============================] - 13s - loss: 4.2302 - acc: 0.7405 - val_loss: 0.8500 - val_acc: 0.8155
Epoch 12/50
1984/2000 [============================>.] - ETA: 0s - loss: 3.9362 - acc: 0.7692Epoch 00011: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 3.9222 - acc: 0.7695 - val_loss: 0.8877 - val_acc: 0.8140
Epoch 13/50
1984/2000 [============================>.] - ETA: 0s - loss: 3.2073 - acc: 0.7984Epoch 00012: val_loss did not improve
2000/2000 [==============================] - 12s - loss: 3.1882 - acc: 0.7980 - val_loss: 0.9205 - val_acc: 0.8051
Epoch 14/50
1984/2000 [============================>.] - ETA: 0s - loss: 3.9372 - acc: 0.8009Epoch 00013: val_loss did not improve
2000/2000 [==============================] - 12s - loss: 3.9238 - acc: 0.8005 - val_loss: 0.8702 - val_acc: 0.8408
Epoch 15/50
1984/2000 [============================>.] - ETA: 0s - loss: 3.3099 - acc: 0.8014Epoch 00014: val_loss did not improve
2000/2000 [==============================] - 12s - loss: 3.2876 - acc: 0.8015 - val_loss: 0.8749 - val_acc: 0.8185
Epoch 16/50
1984/2000 [============================>.] - ETA: 0s - loss: 3.1155 - acc: 0.8211Epoch 00015: val_loss did not improve
2000/2000 [==============================] - 12s - loss: 3.0986 - acc: 0.8215 - val_loss: 0.8559 - val_acc: 0.8348
Epoch 17/50
1984/2000 [============================>.] - ETA: 0s - loss: 2.4837 - acc: 0.8538Epoch 00016: val_loss improved from 0.84996 to 0.81764, saving model to saved_models/tflearningwclassweights02-weights-improvement-16-0.84.hdf5
2000/2000 [==============================] - 12s - loss: 2.4653 - acc: 0.8550 - val_loss: 0.8176 - val_acc: 0.8408
Epoch 18/50
1984/2000 [============================>.] - ETA: 0s - loss: 2.5567 - acc: 0.8579- ETA: 1s - loss: 2Epoch 00017: val_loss did not improve
2000/2000 [==============================] - 12s - loss: 2.5575 - acc: 0.8565 - val_loss: 0.9989 - val_acc: 0.8185
Epoch 19/50
1984/2000 [============================>.] - ETA: 0s - loss: 2.1360 - acc: 0.8700Epoch 00018: val_loss did not improve
2000/2000 [==============================] - 12s - loss: 2.1191 - acc: 0.8710 - val_loss: 1.0428 - val_acc: 0.8006
Epoch 20/50
1984/2000 [============================>.] - ETA: 0s - loss: 2.4805 - acc: 0.8664Epoch 00019: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 2.4635 - acc: 0.8665 - val_loss: 0.9822 - val_acc: 0.8274
Epoch 21/50
1984/2000 [============================>.] - ETA: 0s - loss: 2.5270 - acc: 0.8649Epoch 00020: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 2.5079 - acc: 0.8655 - val_loss: 0.8735 - val_acc: 0.8423
Epoch 22/50
1984/2000 [============================>.] - ETA: 0s - loss: 2.1904 - acc: 0.8780Epoch 00021: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 2.1829 - acc: 0.8775 - val_loss: 0.9639 - val_acc: 0.8304
Epoch 23/50
1984/2000 [============================>.] - ETA: 0s - loss: 2.0366 - acc: 0.8851Epoch 00022: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 2.0281 - acc: 0.8845 - val_loss: 1.1279 - val_acc: 0.8155
Epoch 24/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.6714 - acc: 0.9002Epoch 00023: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 1.6636 - acc: 0.9000 - val_loss: 1.0372 - val_acc: 0.8423
Epoch 25/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.9579 - acc: 0.9037Epoch 00024: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 1.9779 - acc: 0.9035 - val_loss: 1.2234 - val_acc: 0.8318
Epoch 26/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.8251 - acc: 0.9052Epoch 00025: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 1.8591 - acc: 0.9045 - val_loss: 1.0270 - val_acc: 0.8363
Epoch 27/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.8978 - acc: 0.8992Epoch 00026: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 1.8839 - acc: 0.8995 - val_loss: 1.1507 - val_acc: 0.8304
Epoch 28/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.9760 - acc: 0.9017Epoch 00027: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 1.9864 - acc: 0.9010 - val_loss: 1.2330 - val_acc: 0.8304
Epoch 29/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.3739 - acc: 0.9143Epoch 00028: val_loss did not improve
2000/2000 [==============================] - 13s - loss: 1.3958 - acc: 0.9130 - val_loss: 1.1185 - val_acc: 0.8393
Epoch 30/50
1984/2000 [============================>.] - ETA: 0s - loss: 2.2989 - acc: 0.9118Epoch 00029: val_loss did not improve
2000/2000 [==============================] - 11s - loss: 2.2872 - acc: 0.9105 - val_loss: 0.9549 - val_acc: 0.8542
Epoch 31/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.2958 - acc: 0.9133Epoch 00030: val_loss did not improve
2000/2000 [==============================] - 11s - loss: 1.2856 - acc: 0.9140 - val_loss: 1.0522 - val_acc: 0.8452
Epoch 32/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.3391 - acc: 0.9224Epoch 00031: val_loss did not improve
2000/2000 [==============================] - 11s - loss: 1.3297 - acc: 0.9230 - val_loss: 1.1326 - val_acc: 0.8438
Epoch 33/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.7282 - acc: 0.9214Epoch 00032: val_loss did not improve
2000/2000 [==============================] - 11s - loss: 1.7174 - acc: 0.9215 - val_loss: 1.1902 - val_acc: 0.8304
Epoch 34/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.4716 - acc: 0.9229Epoch 00033: val_loss did not improve
2000/2000 [==============================] - 11s - loss: 1.5543 - acc: 0.9215 - val_loss: 1.0645 - val_acc: 0.8423
Epoch 35/50
1984/2000 [============================>.] - ETA: 0s - loss: 2.5221 - acc: 0.9108Epoch 00034: val_loss did not improve
2000/2000 [==============================] - 11s - loss: 2.5319 - acc: 0.9105 - val_loss: 1.3706 - val_acc: 0.8274
Epoch 36/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.3323 - acc: 0.9259Epoch 00035: val_loss did not improve
2000/2000 [==============================] - 11s - loss: 1.3320 - acc: 0.9255 - val_loss: 1.0959 - val_acc: 0.8348
Epoch 37/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.3184 - acc: 0.9244Epoch 00036: val_loss did not improve
2000/2000 [==============================] - 11s - loss: 1.3122 - acc: 0.9240 - val_loss: 1.2259 - val_acc: 0.8482
Epoch 38/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.2975 - acc: 0.9239Epoch 00037: val_loss did not improve
2000/2000 [==============================] - 11s - loss: 1.3029 - acc: 0.9235 - val_loss: 1.1980 - val_acc: 0.8423
Epoch 39/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.4691 - acc: 0.9309Epoch 00038: val_loss did not improve
2000/2000 [==============================] - 11s - loss: 1.4909 - acc: 0.9280 - val_loss: 1.3864 - val_acc: 0.8452
Epoch 40/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.3769 - acc: 0.9380Epoch 00039: val_loss did not improve
2000/2000 [==============================] - 11s - loss: 1.3860 - acc: 0.9380 - val_loss: 1.2994 - val_acc: 0.8512
Epoch 41/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.1870 - acc: 0.9365Epoch 00040: val_loss did not improve
2000/2000 [==============================] - 11s - loss: 1.2024 - acc: 0.9365 - val_loss: 1.3496 - val_acc: 0.8423
Epoch 42/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.6259 - acc: 0.9365Epoch 00041: val_loss did not improve
2000/2000 [==============================] - 11s - loss: 1.6144 - acc: 0.9370 - val_loss: 1.1959 - val_acc: 0.8452
Epoch 43/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.4271 - acc: 0.9360Epoch 00042: val_loss did not improve
2000/2000 [==============================] - 11s - loss: 1.4184 - acc: 0.9360 - val_loss: 1.3108 - val_acc: 0.8452
Epoch 44/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.7293 - acc: 0.9340Epoch 00043: val_loss did not improve
2000/2000 [==============================] - 11s - loss: 1.8156 - acc: 0.9335 - val_loss: 1.2590 - val_acc: 0.8333
Epoch 45/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.1115 - acc: 0.9304Epoch 00044: val_loss did not improve
2000/2000 [==============================] - 11s - loss: 1.1048 - acc: 0.9305 - val_loss: 1.2378 - val_acc: 0.8557
Epoch 46/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.4584 - acc: 0.9430Epoch 00045: val_loss did not improve
2000/2000 [==============================] - 11s - loss: 1.4972 - acc: 0.9430 - val_loss: 1.3471 - val_acc: 0.8408
Epoch 47/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.3966 - acc: 0.9435Epoch 00046: val_loss did not improve
2000/2000 [==============================] - 11s - loss: 1.3909 - acc: 0.9435 - val_loss: 1.3861 - val_acc: 0.8274
Epoch 48/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.2653 - acc: 0.9425Epoch 00047: val_loss did not improve
2000/2000 [==============================] - 11s - loss: 1.2604 - acc: 0.9420 - val_loss: 1.2127 - val_acc: 0.8571
Epoch 49/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.4523 - acc: 0.9415Epoch 00048: val_loss did not improve
2000/2000 [==============================] - 11s - loss: 1.4407 - acc: 0.9420 - val_loss: 1.3299 - val_acc: 0.8438
Epoch 50/50
1984/2000 [============================>.] - ETA: 0s - loss: 1.2409 - acc: 0.9395Epoch 00049: val_loss did not improve
2000/2000 [==============================] - 11s - loss: 1.2364 - acc: 0.9395 - val_loss: 1.4139 - val_acc: 0.8363

In [7]:
acc_loss_curve(history4)
# val_acc 0.7915
#val_loss: 0.8618 - val_acc: 0.8348
#val_loss: 0.8868 - val_acc: 0.7917


After running three trials of each CNN architecture, I selected the weights with the lowest validation loss, which came from the transfer-learning model trained with class weights. Below, I load a model with those weights and apply it to the test data, which was not used at any point during training.


In [2]:
json_file = open('bestmodel.json', 'r')
loaded_model_json = json_file.read()
json_file.close()
loaded_model = model_from_json(loaded_model_json)
# load weights into new model
loaded_model.load_weights("saved_models/tflearningwclassweights02-weights-improvement-16-0.84.hdf5")
print("Loaded model from disk")
 
# evaluate loaded model on test data
loaded_model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])


Loaded model from disk

Next, modify the save_bottleneck_features function from earlier so it extracts bottleneck features for the test data.


In [4]:
def produce_bottleneck_features():
    datagen = ImageDataGenerator(rescale=1. / 255)
    batch_size = 16
    # build the VGG16 network
    model = applications.VGG16(include_top=False, weights='imagenet')
    
    generator = datagen.flow_from_directory(
        'data/test',
        target_size=(250, 250),
        batch_size=16,
        class_mode=None, # this means the generator will only yield data, no class labels
        shuffle=False) # our data will be in order
    
    # the predict_generator method returns the output of a model, given
    # a generator that yields batches of numpy data
    bottleneck_features_test = model.predict_generator(
        generator, 689 // batch_size)
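    # 689 // batch_size = 43 full batches (688 images); the final partial batch is
    # dropped, which is why the test label array below is one image shorter than
    # the 689 images found on disk.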
    print(bottleneck_features_test.shape, 'test features')
    # save the output as a numpy array
    np.save(open('bottleneck_features_test.npy', 'wb'),
           bottleneck_features_test)
    return bottleneck_features_test

Create an array holding all the labels of the test data. The watermelon count is reduced by one so that the 688 labels line up with the 688 rows of bottleneck features produced by produce_bottleneck_features.


In [7]:
test_labels = np.array(
        [0] * nb_test_class['acerolas'] + [1] * nb_test_class['apples'] +
        [2] * nb_test_class['apricots'] + [3] * nb_test_class['avocados'] +
        [4] * nb_test_class['bananas'] + [5] * nb_test_class['blackberries'] +
        [6] * nb_test_class['blueberries'] + [7] * nb_test_class['cantaloupes'] +
        [8] * nb_test_class['cherries'] + [9] * nb_test_class['coconuts'] + 
        [10] * nb_test_class['figs'] + [11] * nb_test_class['grapefruits'] +
        [12] * nb_test_class['grapes'] + [13] * nb_test_class['guava'] +
        [14] * nb_test_class['honneydew_melon'] + [15] * nb_test_class['kiwifruit'] +
        [16] * nb_test_class['lemons'] + [17] * nb_test_class['limes'] + 
        [18] * nb_test_class['mangos'] + [19] * nb_test_class['nectarine'] +
        [20] * nb_test_class['olives'] + [21] * nb_test_class['onion'] +
        [22] * nb_test_class['orange'] + [23] * nb_test_class['passionfruit'] +
        [24] * nb_test_class['peaches'] + [25] * nb_test_class['pears'] +
        [26] * nb_test_class['pineapples'] + [27] * nb_test_class['plums'] +
        [28] * nb_test_class['pomegranates'] + [29] * nb_test_class['potato'] +
        [30] * nb_test_class['raspberries'] + [31] * nb_test_class['strawberries'] +
        [32] * nb_test_class['tomatoes'] + [33] * (nb_test_class['watermelon']-1))
test_labels = to_categorical(test_labels, num_classes=34)
np.save(open('test_labels.npy', 'wb'),
           test_labels)
print(test_labels.shape, 'test labels')


(688, 34) test labels

In [8]:
# create a numpy array of the bottleneck features that can be fed to the top layer model.
X = produce_bottleneck_features()


Found 689 images belonging to 34 classes.
(688, 7, 7, 512) test features

Now evaluate the loaded model on the test data.


In [172]:
results = loaded_model.evaluate(X, test_labels)
print("\n%s: %.2f%%" % (loaded_model.metrics_names[1], results[1]*100))


640/688 [==========================>...] - ETA: 0s
acc: 80.67%

In [15]:
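# fruit_names: strip the 'data/train/' prefix (11 characters) and the trailing path separator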
fruit_names = [item[11:-1] for item in sorted(glob("data/train/*/"))]
print(fruit_names)


['acerolas', 'apples', 'apricots', 'avocados', 'bananas', 'blackberries', 'blueberries', 'cantaloupes', 'cherries', 'coconuts', 'figs', 'grapefruits', 'grapes', 'guava', 'honneydew_melon', 'kiwifruit', 'lemons', 'limes', 'mangos', 'nectarine', 'olives', 'onion', 'orange', 'passionfruit', 'peaches', 'pears', 'pineapples', 'plums', 'pomegranates', 'potato', 'raspberries', 'strawberries', 'tomatoes', 'watermelon']

Next, we create a confusion matrix to examine how accurately each individual class in the data set is predicted.


In [17]:
# confusion matrix

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    print(cm)

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)

    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')

# Compute confusion matrix
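# NOTE: the arguments here are (predictions, true labels), the reverse of sklearn's
# confusion_matrix(y_true, y_pred) convention, so rows correspond to predicted classes
# and columns to true classes (the plot's axis labels end up transposed). The
# per-class percentages computed further below sum down each column, so they are
# recall per true class.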
cnf_matrix = confusion_matrix([np.argmax(x) for x in loaded_model.predict(X)], [np.argmax(x) for x in test_labels])
np.set_printoptions(precision=2)

# Plot non-normalized confusion matrix
plt.figure(figsize=(20,20))
plot_confusion_matrix(cnf_matrix, classes=fruit_names,
                      title='Confusion matrix, without normalization')
plt.savefig('cm.png')

# Plot normalized confusion matrix
plt.figure(figsize=(20,20))
plot_confusion_matrix(cnf_matrix, classes=fruit_names, normalize=True,
                      title='Normalized confusion matrix')
plt.savefig('cmnorm.png')
plt.show()

#test_labels
#[np.argmax(x) for x in loaded_model.predict(X)]
np.sum(confusion_matrix([np.argmax(x) for x in loaded_model.predict(X)], [np.argmax(x) for x in test_labels]))


Confusion matrix, without normalization
[[ 0  0  0 ...,  0  0  0]
 [ 0 73  0 ...,  0  0  0]
 [ 0  0  2 ...,  0  0  0]
 ..., 
 [ 0  0  0 ...,  5  0  0]
 [ 0  1  0 ...,  1  7  0]
 [ 0  0  0 ...,  0  0 41]]
c:\users\john maxi\anaconda3\envs\tensorflowenv\lib\site-packages\ipykernel_launcher.py:12: RuntimeWarning: invalid value encountered in true_divide
  if sys.path[0] == '':
Normalized confusion matrix
[[  nan   nan   nan ...,   nan   nan   nan]
 [ 0.    0.92  0.   ...,  0.    0.    0.  ]
 [ 0.    0.    0.22 ...,  0.    0.    0.  ]
 ..., 
 [ 0.    0.    0.   ...,  1.    0.    0.  ]
 [ 0.    0.06  0.   ...,  0.06  0.44  0.  ]
 [ 0.    0.    0.   ...,  0.    0.    0.84]]
Out[17]:
688

In [158]:
# pull out the per-class accuracy (recall) for each class, then the class names
for i in range(cnf_matrix.shape[0]):
    print('{:.2f}'.format(cnf_matrix[i,i]/np.sum(cnf_matrix[:,i])*100))
for i in range(cnf_matrix.shape[0]):
    print('{}'.format(fruit_names[i]))


0.00
90.12
33.33
16.67
87.50
87.50
71.43
14.29
0.00
66.67
66.67
85.71
37.50
28.57
100.00
97.62
0.00
62.96
14.29
98.00
0.00
93.33
85.71
60.00
85.42
89.74
71.43
89.83
83.33
84.42
62.50
50.00
70.00
93.18
acerolas
apples
apricots
avocados
bananas
blackberries
blueberries
cantaloupes
cherries
coconuts
figs
grapefruits
grapes
guava
honneydew_melon
kiwifruit
lemons
limes
mangos
nectarine
olives
onion
orange
passionfruit
peaches
pears
pineapples
plums
pomegranates
potato
raspberries
strawberries
tomatoes
watermelon
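
For readability, the percentages and class names printed above could be paired in a single loop; a minimal sketch using the same cnf_matrix and fruit_names:

# Sketch: print each class name next to its recall in one pass
for i, name in enumerate(fruit_names):
    recall = cnf_matrix[i, i] / np.sum(cnf_matrix[:, i]) * 100
    print('{:<18s} {:6.2f}%'.format(name, recall))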

In [214]:
def path_to_tensor(img_path):
    # loads RGB image as PIL.Image.Image type
    img = load_img(img_path, target_size=(250, 250))
    # convert PIL.Image.Image type to 3D tensor with shape (250, 250, 3)
    x = img_to_array(img)
    # normalize the image
    x /= 255
    # convert 3D tensor to 4D tensor with shape (1, 250, 250, 3) and return 4D tensor
    return np.expand_dims(x, axis=0)

In [208]:
def top_produce_predictor(img_path):
    """Depends on the path to tensor function, having a loaded model, and the fruit
    names array being created"""
    from keras.applications.vgg16 import VGG16
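    # NOTE: this builds a fresh VGG16 feature extractor on every call; for repeated
    # predictions it would be cheaper to construct the model once outside the function.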
    # convert image file to a 4D tensor with shape (1,250,250,3)
    # tensor has normed pixel values
    normed_array = path_to_tensor(img_path)
    # extract the bottleneck features
    extracted_features = VGG16(weights='imagenet', include_top=False).predict(normed_array)
    # make predictions on the features
    preds = loaded_model.predict(extracted_features)
    return fruit_names[np.argmax(preds)]

In [215]:
top_produce_predictor('data/test/honneydew_melon/honneydew_melon_013.jpg')


Out[215]:
'honneydew_melon'

In [211]:
top_produce_predictor('data/test/nectarine/nectarine_078.jpg')


Out[211]:
'nectarine'

In [16]:
# model 1
#first_try.h5
#weights.best.from_scratch.02.hdf5
#weights.best.from_scratch.03.hdf5
# model 2
#class-weights-weights-improvement-26-0.54.hdf5
#class-weights-weights-improvement02-14-0.51.hdf5
#class-weights-weights-improvement03-25-0.57.hdf5
# model 3
#tflearning-weights-improvement-10-0.81.hdf5
#tflearning-weights-improvement02-12-0.82.hdf5
#tflearning-weights-improvement03-12-0.81.hdf5
# model 4
#tflearningwclassweights-weights-improvement-09-0.79.hdf5
#tflearningwclassweights02-weights-improvement-18-0.83.hdf5
#tflearningwclassweights03-weights-improvement-16-0.84.hdf5
def load_a_model(model, weights):
    json_file = open(model, 'r')
    loaded_model_json = json_file.read()
    json_file.close()
    loaded_model = model_from_json(loaded_model_json)
    # load weights into new model
    loaded_model.load_weights(weights)
    print("Loaded model from disk")
    return loaded_model

weights_list = ['first_try.h5']
for w in weights_list:
    # load the model
    curr_model = load_a_model('scratch_model.json', w)
    # compile the model
    curr_model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
    # evaluate the model on the validation data
    # steps is the number of batches (not samples) needed to cover every validation image
    steps = int(np.ceil(len(validation_generator.classes) / validation_generator.batch_size))
    preds = curr_model.predict_generator(validation_generator, steps=steps)
    print(preds.shape)
    # convert each row of class probabilities into a predicted class index
    pred_labels = np.argmax(preds, axis=1)
    print(pred_labels)
    #score = f1_score(validation_generator.classes, pred_labels, average='weighted')


Loaded model from disk

In [10]:
len(validation_generator.classes)


Out[10]:
675