Fish classification

This notebook performs the fish classification step. Each fish is assigned to one of four classes: tuna (TUNA), LAG, DOL and SHARK. The detector saves a cropped image of each fish; here we take that crop and classify it with a CNN.

The original Kaggle competition has six fish classes: ALB, BET, YFT, DOL, LAG and SHARK. We started by trying to classify all of them, but three are very similar: ALB, BET and YFT are all tuna species, while the other fish come from different families. Distinguishing those three species was difficult and the results were poor, so here they are grouped into a single TUNA class. We will briefly compare both approaches in the presentation, but here we only upload the classifier with four classes.


In [1]:
from PIL import Image
import tensorflow as tf
import numpy as np
import scipy.misc
import os
import cv2
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import log_loss
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers.convolutional import Convolution2D
from keras.layers.convolutional import MaxPooling2D
from keras.layers.core import Activation
from keras.layers.core import Flatten
from keras.layers.core import Dense
from keras.layers.core import Dropout
from keras import backend as K
import matplotlib.pyplot as plt


Using TensorFlow backend.

In [2]:
#Define some values and constants
fish_classes = ['TUNA','DOL','SHARK','LAG']
fish_classes_test = fish_classes
number_classes = len(fish_classes)
main_path_train = '../train_cut_oversample'
main_path_test = '../test'
channels = 3
ROWS_RESIZE = 100
COLS_RESIZE = 100

Next we read the images that the fish detection step has stored on disk.

We also preprocess the images slightly so that they all have the same size (100x100). Because the aspect ratio matters, we do not simply stretch each image; instead we use the function resize(image). It scales the image so that its longest side is 100 pixels while keeping the aspect ratio, so the short side ends up with 100 pixels or fewer. The result is pasted onto the middle of a white 100x100 canvas, which leaves white bands on two sides of the image. This is not optimal, but it is still better than distorting the aspect ratio. We also tried other background colors, but white gave the best results.
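
The geometry of this step can be sketched in plain Python, without any image library. The helper name letterbox_geometry is hypothetical and only illustrates the arithmetic: it returns the resized shape and the offset at which the resized image would be pasted so that it sits centered on the square canvas. The rounding here is an assumption and may differ by one pixel from what scipy.misc.imresize produces.

```python
def letterbox_geometry(rows, cols, target=100):
    """Return the resized (rows, cols) and the (left, top) paste offset.

    Hypothetical helper for illustration only; it does not touch pixel data.
    """
    # Scale factor that maps the longest side onto the target size.
    scale = target / float(max(rows, cols))
    new_rows = max(1, int(round(rows * scale)))
    new_cols = max(1, int(round(cols * scale)))
    # Center the resized image on the square canvas.
    left = (target - new_cols) // 2
    top = (target - new_rows) // 2
    return (new_rows, new_cols), (left, top)


# A 200x400 (rows x cols) crop: the long side becomes 100, the short side 50,
# and the image is pasted 25 pixels below the top edge.
print(letterbox_geometry(200, 400))  # ((50, 100), (0, 25))
```

A wide crop therefore gets white bands above and below it, and a tall crop gets them on the left and right.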


In [13]:
# Get data and preprocess it

def resize(image):
    # Scale the longest side to ROWS_RESIZE, keeping the aspect ratio
    rows = image.shape[0]
    cols = image.shape[1]
    dominant = max(rows,cols)
    ratio = ROWS_RESIZE/float(dominant)
    im_res = scipy.misc.imresize(image,ratio)
    rows = im_res.shape[0]
    cols = im_res.shape[1]
    im_res = Image.fromarray(im_res)
    # White square canvas; PIL sizes and offsets are (width, height)
    layer = Image.new('RGB',(COLS_RESIZE,ROWS_RESIZE),(255,255,255))
    if rows > cols:
        # Tall image: center it horizontally
        layer.paste(im_res,(COLS_RESIZE//2-cols//2,0))
    elif cols > rows:
        # Wide image: center it vertically
        layer.paste(im_res,(0,ROWS_RESIZE//2-rows//2))
    else:
        layer.paste(im_res,(0,0))
    return np.array(layer)


X_train = []
y_labels = []
for classes in fish_classes:
    path_class = os.path.join(main_path_train,classes)
    y_class = np.tile(classes,len(os.listdir(path_class)))
    y_labels.extend(y_class)
    for image in os.listdir(path_class):
        path = os.path.join(path_class,image)
        im = scipy.misc.imread(path)
        im = resize(im)
        X_train.append(np.array(im))
     
X_train = np.array(X_train)

# Convert labels into one hot vectors
y_labels = LabelEncoder().fit_transform(y_labels)
y_train = np_utils.to_categorical(y_labels)


X_test = []
y_test = []
for classes in fish_classes_test:
    path_class = os.path.join(main_path_test,classes)
    y_class = np.tile(classes,len(os.listdir(path_class)))
    y_test.extend(y_class)
    for image in os.listdir(path_class):
        path = os.path.join(path_class,image)
        im = scipy.misc.imread(path)
        im = resize(im)
        X_test.append(np.array(im))
     
X_test = np.array(X_test)

# Convert labels into one hot vectors
y_test = LabelEncoder().fit_transform(y_test)
y_test = np_utils.to_categorical(y_test)



X_train = np.reshape(X_train,(X_train.shape[0],ROWS_RESIZE,COLS_RESIZE,channels))
X_test = np.reshape(X_test,(X_test.shape[0],ROWS_RESIZE,COLS_RESIZE,channels))
print('X_train shape: ',X_train.shape)
print('y_train shape: ',y_train.shape)
print('X_test shape: ',X_test.shape)
print('y_test shape: ',y_test.shape)


('X_train shape: ', (23581, 100, 100, 3))
('y_train shape: ', (23581, 4))
('X_test shape: ', (400, 100, 100, 3))
('y_test shape: ', (400, 4))

The data is now organized as follows:

- The training set contains 23581 images of size 100x100x3 (RGB).

- There are 4 possible classes: LAG, SHARK, DOL and TUNA.

- The test set contains 400 images of the same size, 100 per class.
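
One detail worth keeping in mind when reading the predictions: LabelEncoder assigns integer codes in sorted order of the label strings, so the columns of y_train and y_test follow the alphabetical order DOL, LAG, SHARK, TUNA rather than the order of fish_classes. The pure-Python sketch below mimics that behaviour without scikit-learn or Keras; the helper name one_hot_sorted is hypothetical.

```python
def one_hot_sorted(labels):
    """Mimic LabelEncoder + to_categorical: sorted class order, one-hot rows."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    rows = [[1.0 if index[label] == j else 0.0 for j in range(len(classes))]
            for label in labels]
    return classes, rows


fish_classes = ['TUNA', 'DOL', 'SHARK', 'LAG']
classes, rows = one_hot_sorted(fish_classes)
print(classes)   # ['DOL', 'LAG', 'SHARK', 'TUNA']
print(rows[0])   # TUNA maps to the last column: [0.0, 0.0, 0.0, 1.0]
```

Because the training and test labels are encoded the same way, the column order is consistent between the two sets.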

We are now ready to build and train the classifier. The CNN has 7 convolutional layers, 4 pooling layers and three fully connected layers at the end. Dropout is applied after the first two fully connected layers to reduce overfitting. The loss function is the multi-class log loss (categorical cross-entropy), because it is the metric Kaggle uses in the competition. The optimizer is stochastic gradient descent (SGD), and the network is trained for a single epoch.
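
As a sanity check on this architecture, the feature-map sizes and the number of trainable parameters can be worked out by hand. The sketch below is plain Python and independent of Keras; the helper names conv_params and dense_params are hypothetical. It assumes 'same' padding for the convolutions (the spatial size is unchanged) and 2x2 max pooling that halves the side and rounds down.

```python
def conv_params(kernel, in_ch, out_ch):
    # kernel x kernel weights per input/output channel pair, plus one bias per filter
    return kernel * kernel * in_ch * out_ch + out_ch


def dense_params(n_in, n_out):
    # One weight per input/output pair, plus one bias per output unit
    return n_in * n_out + n_out


# (kernel size, number of filters) for the seven convolutions; 'P' marks a pooling layer
layers = [(20, 6), 'P', (10, 12), (10, 12), 'P',
          (5, 24), (5, 24), 'P', (5, 24), (5, 24), 'P']

side, channels, total = 100, 3, 0
for layer in layers:
    if layer == 'P':
        side //= 2                      # 100 -> 50 -> 25 -> 12 -> 6
    else:
        kernel, filters = layer
        total += conv_params(kernel, channels, filters)
        channels = filters

flat = side * side * channels           # 6 * 6 * 24 = 864 inputs to the first dense layer
for n_out in (4092, 1024, 4):
    total += dense_params(flat, n_out)
    flat = n_out

print("side %d, channels %d, parameters %d" % (side, channels, total))
# side 6, channels 24, parameters 7814238
```

Almost all of the parameters sit in the two large fully connected layers, which is where dropout is applied.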


In [4]:
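def center_normalize(x):
    # Standardize the input to zero mean and unit standard deviation
    # (the statistics are taken over the whole tensor, not per image)
    return (x-K.mean(x))/K.std(x)

# Convolutional net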

model = Sequential()

model.add(Activation(activation=center_normalize,input_shape=(ROWS_RESIZE,COLS_RESIZE,channels)))

model.add(Convolution2D(6,20,20,border_mode='same',activation='relu',dim_ordering='tf'))
model.add(MaxPooling2D(pool_size=(2,2),dim_ordering='tf'))

model.add(Convolution2D(12,10,10,border_mode='same',activation='relu',dim_ordering='tf'))
model.add(Convolution2D(12,10,10,border_mode='same',activation='relu',dim_ordering='tf'))
model.add(MaxPooling2D(pool_size=(2,2),dim_ordering='tf'))

model.add(Convolution2D(24,5,5,border_mode='same',activation='relu',dim_ordering='tf'))
model.add(Convolution2D(24,5,5,border_mode='same',activation='relu',dim_ordering='tf'))
model.add(MaxPooling2D(pool_size=(2,2),dim_ordering='tf'))

model.add(Convolution2D(24,5,5,border_mode='same',activation='relu',dim_ordering='tf'))
model.add(Convolution2D(24,5,5,border_mode='same',activation='relu',dim_ordering='tf'))
model.add(MaxPooling2D(pool_size=(2,2),dim_ordering='tf'))

model.add(Flatten())
model.add(Dense(4092,activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1024,activation='relu'))
model.add(Dropout(0.5))

model.add(Dense(number_classes))
model.add(Activation('softmax'))

print(model.summary())

model.compile(optimizer='sgd',loss='categorical_crossentropy',metrics=['accuracy'])
model.fit(X_train,y_train,nb_epoch=1,verbose=1)


____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
activation_1 (Activation)        (None, 100, 100, 3)   0           activation_input_1[0][0]         
____________________________________________________________________________________________________
convolution2d_1 (Convolution2D)  (None, 100, 100, 6)   7206        activation_1[0][0]               
____________________________________________________________________________________________________
maxpooling2d_1 (MaxPooling2D)    (None, 50, 50, 6)     0           convolution2d_1[0][0]            
____________________________________________________________________________________________________
convolution2d_2 (Convolution2D)  (None, 50, 50, 12)    7212        maxpooling2d_1[0][0]             
____________________________________________________________________________________________________
convolution2d_3 (Convolution2D)  (None, 50, 50, 12)    14412       convolution2d_2[0][0]            
____________________________________________________________________________________________________
maxpooling2d_2 (MaxPooling2D)    (None, 25, 25, 12)    0           convolution2d_3[0][0]            
____________________________________________________________________________________________________
convolution2d_4 (Convolution2D)  (None, 25, 25, 24)    7224        maxpooling2d_2[0][0]             
____________________________________________________________________________________________________
convolution2d_5 (Convolution2D)  (None, 25, 25, 24)    14424       convolution2d_4[0][0]            
____________________________________________________________________________________________________
maxpooling2d_3 (MaxPooling2D)    (None, 12, 12, 24)    0           convolution2d_5[0][0]            
____________________________________________________________________________________________________
convolution2d_6 (Convolution2D)  (None, 12, 12, 24)    14424       maxpooling2d_3[0][0]             
____________________________________________________________________________________________________
convolution2d_7 (Convolution2D)  (None, 12, 12, 24)    14424       convolution2d_6[0][0]            
____________________________________________________________________________________________________
maxpooling2d_4 (MaxPooling2D)    (None, 6, 6, 24)      0           convolution2d_7[0][0]            
____________________________________________________________________________________________________
flatten_1 (Flatten)              (None, 864)           0           maxpooling2d_4[0][0]             
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 4092)          3539580     flatten_1[0][0]                  
____________________________________________________________________________________________________
dropout_1 (Dropout)              (None, 4092)          0           dense_1[0][0]                    
____________________________________________________________________________________________________
dense_2 (Dense)                  (None, 1024)          4191232     dropout_1[0][0]                  
____________________________________________________________________________________________________
dropout_2 (Dropout)              (None, 1024)          0           dense_2[0][0]                    
____________________________________________________________________________________________________
dense_3 (Dense)                  (None, 4)             4100        dropout_2[0][0]                  
____________________________________________________________________________________________________
activation_2 (Activation)        (None, 4)             0           dense_3[0][0]                    
====================================================================================================
Total params: 7,814,238
Trainable params: 7,814,238
Non-trainable params: 0
____________________________________________________________________________________________________
None
Epoch 1/1
23581/23581 [==============================] - 3622s - loss: 1.0135 - acc: 0.5487     
Out[4]:
<keras.callbacks.History at 0x7f94e75d5dd0>

Because there are many images, training takes about one hour for a single epoch (3622 s in the run above). Once it is done, we can pass the test set to the classifier and measure its accuracy.
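
For reference, the two numbers reported by the evaluation (loss and accuracy) can be reproduced from one-hot labels and softmax outputs with a few lines of plain Python. This is a minimal sketch, not the Keras implementation: the function names, the clipping constant eps and the toy data are assumptions made for illustration.

```python
import math


def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for truth, probs in zip(y_true, y_prob):
        for t, p in zip(truth, probs):
            p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
            total -= t * math.log(p)
    return total / len(y_true)


def accuracy(y_true, y_prob):
    """Fraction of samples whose most probable class is the true class."""
    hits = 0
    for truth, probs in zip(y_true, y_prob):
        if truth.index(max(truth)) == probs.index(max(probs)):
            hits += 1
    return hits / float(len(y_true))


# Toy data invented for illustration: two samples, four classes
y_true = [[1, 0, 0, 0], [0, 0, 0, 1]]
y_prob = [[0.7, 0.1, 0.1, 0.1], [0.4, 0.3, 0.2, 0.1]]

print(multiclass_log_loss(y_true, y_prob))
print(accuracy(y_true, y_prob))  # 0.5
```

The log loss penalizes confident wrong predictions heavily, whereas the accuracy only looks at which class received the highest probability.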


In [15]:
(loss,accuracy) = model.evaluate(X_test,y_test,verbose=1)
print('accuracy',accuracy)


400/400 [==============================] - 22s     
('accuracy', 0.69750000000000001)