Fish detection

In this notebook we address the problem of detecting and cropping the fish in the data images. This is a computer vision problem with no easy solution. We considered different approaches, the three most relevant being the following:

  • Passing the whole image to a CNN: As shown in the data exploration section, the images contain many different elements, the fish being just a small part of the image. Given the small amount of available training data, this strategy gave very low performance, close to random classification, so we discarded it.
  • Template matching: Most of the classes contain images in which the fish appear in similar positions, so with several templates a number of fish can be successfully detected and cropped. However, this approach presents two important problems. First, the test set does not necessarily contain images in which the fish match any template, rendering template matching useless. Second, to detect an acceptable number of fish a large number of templates would be needed per class, and comparing every image against all the templates of all classes takes an extremely long time.
  • Sliding window: The main idea is to sweep every image with sliding windows of different sizes and compute, for each frame, the probability that it contains a fish. The frame with the highest probability is then selected and stored.

After trying all these possibilities we concluded that the sliding window offered the best trade-off between performance and computation time, so we took this option. Implementing this system requires a classifier, which means that we need training data. However, the training data provided by the Kaggle competition is not adequate because it consists of whole images, not frames or cropped fish. For this reason we needed to modify these images to obtain cropped fish.

To do this we first cropped some images manually (20 of each class) and trained the sliding-window classifier with two classes, "fish" and "no fish". Then we ran it on several of the original images, manually selected the frames in which fish were well detected, and fed them to the "fish" class; the wrongly detected frames were assigned to the "no fish" class. We repeated this process iteratively many times, the performance improving a little with every pass. Nevertheless, this process was very time consuming, so to speed it up we used template matching, which was very effective for the classes "LAG", "SHARK" and "DOL". After a long process we obtained about 2500 images of cropped fish and 8500 "no fish" frames. Some of the pieces of code used in this process, such as the template matching, are not included in this or any other notebook, since they were only a means to obtain the fish detector presented here.
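
For reference, the template matching step looked roughly like the following minimal sketch (the exact script is not included; the file paths, template and the 0.6 threshold below are illustrative, not the values we used):

#Illustrative template-matching sketch; paths and threshold are examples
import cv2

img = cv2.imread("../train/LAG/img_00091.jpg")            #hypothetical training image
template = cv2.imread("templates/LAG_template.jpg")       #hypothetical template crop
res = cv2.matchTemplate(img, template, cv2.TM_CCOEFF_NORMED)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)
if max_val > 0.6:                                         #accept only confident matches
    x, y = max_loc
    h, w = template.shape[:2]
    crop = img[y:y+h, x:x+w]                              #candidate cropped fish
    cv2.imwrite("cropped/LAG/img_00091_crop.jpg", crop)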


In [251]:
import os
import glob
import time
from SimpleCV import *
import scipy
import numpy as np
import tensorflow as tf
import collections
import matplotlib.pyplot as plt
import cv2
import imutils
from skimage.transform import pyramid_gaussian
import argparse
from scipy import ndimage
from scipy.ndimage import sum as ndi_sum
from subprocess import check_output
from sklearn.svm import SVC
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import log_loss
%matplotlib inline

After importing the libraries we declare the functions needed for the fish detection. As introduced above, we need a sliding window and a classifier to determine the probability of a frame containing a fish.

For the sliding window we sweep the image with a square, capturing (not storing) the frames. Once the image is completely swept, it is resized to a smaller size and swept again, so in effect the image is swept with squares of different sizes. For this we use the functions pyramid and sliding_window.

For the classifier we extract the HOG features, which characterize the image, and feed them to an SVM. The SVM has the advantage of being very fast, which makes it the best option given the very large number of images to classify and the many frames extracted per image.

Given the poor results obtained when using a single SVM to distinguish between "fish" and "no fish", we adopt the following classification strategy. We build seven binary SVMs: one per fish class (ALB, BET, DOL, LAG, SHARK, YFT) against "no fish", plus one for all fish classes combined against "no fish". For each image we select the frame that gives the highest probability for each class of fish, producing six cropped images per original image. Then, using the combined "fish" vs "no fish" SVM, we select out of the six frames the one that is most similar to a fish. The code doing this will be seen later.


In [252]:
################################## Functions definition ###################################
#These functions are inspired by http://www.pyimagesearch.com/

def pyramid(image, scale=1.5, minSize=(30, 30)):
    # yield the original image
    yield image
 
    # keep looping over the pyramid
    while True:
        # compute the new dimensions of the image and resize it
        w = int(image.shape[1] / scale)
        image = imutils.resize(image, width=w)
 
        # if the resized image does not meet the supplied minimum
        # size, then stop constructing the pyramid
        if image.shape[0] < minSize[1] or image.shape[1] < minSize[0]:
            break
 
        # yield the next image in the pyramid
        yield image




def sliding_window(image, stepSize, windowSize):
    # slide a window across the image
    for y in xrange(0, image.shape[0], stepSize):
        for x in xrange(0, image.shape[1], stepSize):
            # yield the current window
            yield (x, y, image[y:y + windowSize[1], x:x + windowSize[0]])


# HOG feature extraction (adapted from SimpleCV's findHOGFeatures)
def findHOGFeatures(self, n_divs=3, n_bins=6):
    """
    **SUMMARY**
    Get HOG(Histogram of Oriented Gradients) features from the image.


    **PARAMETERS**
    * *n_divs* - the number of divisions(cells).
    * *n_bins* - the number of orientation bins.

    **RETURNS**
    Returns the HOG vector in a numpy array

    """
    n_HOG = n_divs * n_divs * n_bins  # Size of HOG vector

    HOG = np.zeros((n_HOG, 1))  # Initialize output HOG vector

    # Apply sobel on image to find x and y orientations of the image
    Icv = self.getNumpyCv2()
    Ix = cv2.Sobel(Icv, ddepth=cv.CV_32F, dx=1, dy=0, ksize=3)
    Iy = cv2.Sobel(Icv, ddepth=cv.CV_32F, dx=0, dy=1, ksize=3)

    Ix = Ix.transpose(1, 0, 2)
    Iy = Iy.transpose(1, 0, 2)
    cellx = self.width / n_divs  # width of each cell(division)
    celly = self.height / n_divs  # height of each cell(division)

    # Area of image
    img_area = self.height * self.width

    #Range of each bin
    BIN_RANGE = (2 * np.pi) / n_bins

    angles = np.arctan2(Iy, Ix)
    magnit = ((Ix ** 2) + (Iy ** 2)) ** 0.5

    height, width = self.height, self.width
    bins = (angles[...,0] % (2 * np.pi) / BIN_RANGE).astype(int)
    x, y = np.mgrid[:width, :height]
    x = x * n_divs // width
    y = y * n_divs // height
    labels = (x * n_divs + y) * n_bins + bins
    index = np.arange(n_HOG)
    HOG = ndi_sum(magnit[..., 0], labels, index)

    return HOG / (height*width)
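
As a quick sanity check on these functions (a small illustrative snippet, not part of the original pipeline), we can sweep a blank synthetic image and count how many full-size frames each pyramid level produces. Note also that with n_divs=3 and n_bins=6, findHOGFeatures returns vectors of length 3*3*6 = 54, which matches the array shapes printed below.

#Illustrative check: count the frames produced per pyramid level on a dummy image
test_img = np.zeros((300, 400, 3), dtype=np.uint8)      #synthetic 400x300 BGR image
(winW, winH) = (100, 100)
for level, resized in enumerate(pyramid(test_img, scale=1.5)):
    n_frames = sum(1 for (x, y, window) in sliding_window(resized, 64, (winW, winH))
                   if window.shape[0] == winH and window.shape[1] == winW)
    print "level %d: %dx%d -> %d frames" % (level, resized.shape[1], resized.shape[0], n_frames)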

The last definitions to be made are the constant parameters used throughout this notebook. Due to the class imbalance in the data, we oversample the minority classes (BET, DOL, LAG and SHARK).
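
The oversampled training set in '../train_cut_oversample' was prepared beforehand. The following is a minimal sketch of how such a folder can be built, assuming the original crops live in a hypothetical '../train_cut' folder and using illustrative duplication factors:

#Illustrative oversampling sketch; source path and factors are examples
import shutil
dup_factor = {'BET': 2, 'DOL': 2, 'LAG': 2, 'SHARK': 2}   #extra copies per minority class
for cls in dup_factor:
    for f in glob.glob(os.path.join('../train_cut', cls, '*.jpg')):
        base = os.path.basename(f)[:-4]
        for k in range(dup_factor[cls]):
            shutil.copy(f, os.path.join('../train_cut_oversample', cls, base + '_dup%d.jpg' % k))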


In [253]:
#Define some values and constants
fish_classes = ['ALB','BET','LAG','DOL','SHARK','YFT','NoF']
fish_classes_test = ['Fish','NoFish']
number_classes = len(fish_classes)
main_path_train = '../train_cut_oversample'
main_path_test = '../test'
extension = "*.jpg"

Next, we generate the HOG arrays for all seven classifiers.


In [255]:
############################## Get HOG of fish and No-fish cases ###################################

#One array per classifier
HOG = []
HOG_n = []
HOG_ALB = []
HOG_BET = []
HOG_DOL = []
HOG_LAG = []
HOG_SHARK = []
HOG_YFT = []


#Construct arrays
for classes in fish_classes:
    #Access the files
    path_class = os.path.join(main_path_train,classes)
    directory = os.path.join(path_class, extension)
    files = glob.glob(directory)
    for file in files:
        new_img = cv2.imread(file)
        H = findHOGFeatures(Image(new_img))
        if classes != 'NoF':
            HOG.append(H)
            if classes == 'ALB':
                HOG_ALB.append(H)
            elif classes == 'BET':
                HOG_BET.append(H)
            elif classes == 'DOL':
                HOG_DOL.append(H)
            elif classes == 'LAG':
                HOG_LAG.append(H)
            elif classes == 'SHARK':
                HOG_SHARK.append(H)
            elif classes == 'YFT':
                HOG_YFT.append(H)
        else:
            HOG_n.append(H)
            
HOG = np.array(HOG)
HOG_ALB = np.array(HOG_ALB)
HOG_BET = np.array(HOG_BET)
HOG_DOL = np.array(HOG_DOL)
HOG_LAG = np.array(HOG_LAG)
HOG_SHARK = np.array(HOG_SHARK)
HOG_YFT = np.array(HOG_YFT)
HOG_n = np.array(HOG_n)

#Print shapes of the arrays
print HOG.shape
print HOG_ALB.shape
print HOG_BET.shape
print HOG_DOL.shape
print HOG_LAG.shape
print HOG_SHARK.shape
print HOG_YFT.shape
print HOG_n.shape


(4640, 54)
(1316, 54)
(693, 54)
(660, 54)
(609, 54)
(585, 54)
(777, 54)
(8460, 54)

In [256]:
############################## Build and train the classifiers ###################################

#Helper: train a binary SVM with classes "fish" (label 1) and "no fish" (label 0)
def train_fish_svm(HOG_fish, HOG_nofish):
    X = np.concatenate((HOG_fish, HOG_nofish), axis=0)
    y = np.concatenate((np.ones(HOG_fish.shape[0]),
                        np.zeros(HOG_nofish.shape[0])), axis=0)
    clf = SVC(probability=True)
    clf.fit(X, y)
    return clf

#SVM with all fish classes against No Fish
clf_all = train_fish_svm(HOG, HOG_n)

#One SVM per fish class against No Fish
clf_ALB = train_fish_svm(HOG_ALB, HOG_n)
clf_BET = train_fish_svm(HOG_BET, HOG_n)
clf_DOL = train_fish_svm(HOG_DOL, HOG_n)
clf_LAG = train_fish_svm(HOG_LAG, HOG_n)
clf_SHARK = train_fish_svm(HOG_SHARK, HOG_n)
clf_YFT = train_fish_svm(HOG_YFT, HOG_n)



As already explained, the output of the following code is six frames per image, stored in a folder called "buffer". The fact that the test data is organized in classes influences neither the detection nor the classification; it just helps us check the results.


In [271]:
###################################### Apply 6 classifiers (buffer) ##################################


(winW, winH) = (100, 100)

#Apply the classifiers on the test set
for classes in fish_classes:
    path_class = os.path.join(main_path_test,classes)
    
    directory = os.path.join(path_class, extension)
    files = glob.glob(directory)

    for file in files:
        image = cv2.imread(file)
        prob_ALB = 0
        prob_BET = 0
        prob_DOL = 0
        prob_LAG = 0
        prob_SHARK = 0
        prob_YFT = 0
        
        # loop over the image pyramid
        for resized in pyramid(image, scale=1.5):
            # loop over the sliding window for each layer of the pyramid
            for (x, y, window) in sliding_window(resized, stepSize=64, windowSize=(winW, winH)):
                # if the window does not meet our desired window size, ignore it
                if window.shape[0] != winH or window.shape[1] != winW:
                    continue
                H = findHOGFeatures(Image(window))
                
                #Predict probability for each class
                p_ALB = clf_ALB.predict_proba([H])
                p_BET = clf_BET.predict_proba([H])
                p_DOL = clf_DOL.predict_proba([H])
                p_LAG = clf_LAG.predict_proba([H])
                p_SHARK = clf_SHARK.predict_proba([H])
                p_YFT = clf_YFT.predict_proba([H])
                 
                #Store frame with the highest probability per class
                if prob_ALB < p_ALB[0,1]:
                    prob_ALB = p_ALB[0,1]
                    wind_ALB = window
                if prob_BET< p_BET[0,1]:
                    prob_BET = p_BET[0,1]
                    wind_BET = window
                if prob_DOL<p_DOL[0,1]:
                    prob_DOL = p_DOL[0,1]
                    wind_DOL = window
                if prob_LAG<p_LAG[0,1]:
                    prob_LAG = p_LAG[0,1]
                    wind_LAG = window
                if prob_SHARK<p_SHARK[0,1]:
                    prob_SHARK = p_SHARK[0,1]
                    wind_SHARK = window
                if prob_YFT<p_YFT[0,1]:
                    prob_YFT = p_YFT[0,1]
                    wind_YFT = window
                                           
        #Write the six selected frames to the buffer folder
        f = str(os.path.basename(file))
        for j, wind in enumerate([wind_ALB, wind_BET, wind_DOL, wind_LAG, wind_SHARK, wind_YFT]):
            cv2.imwrite("buffer/"+str(classes)+"/"+f[:-4]+"_"+str(j)+"0.jpg", wind)

Finally, we apply the "fish" vs "no fish" SVM to select the frame that is most similar to a fish and store it in a folder called "fish_detected".


In [272]:
###################################### Apply 1 classifier (fish_detected) ##################################

path = "buffer/"
extension2 = "*_00.jpg"

for classes in fish_classes:
    #Access folders
    path_class = os.path.join(path,classes)    
    directory = os.path.join(path_class, extension2)
    files = glob.glob(directory)
    for file in files:
        prob_fish = 0
        img_save = None
        nam = ""
        f = str(os.path.basename(file))
        #Access the six frames generated for this image
        ext = f[:-6]+"*.jpg"
        direct = os.path.join(path_class, ext)
        for name in glob.glob(direct):
            #Open image
            img = cv2.imread(name)
            if img is not None and img.shape == (100,100,3):        #Check that the frame generated by the sliding window has the right size
                #Predict probabilities
                H = findHOGFeatures(Image(img))
                aux = clf_all.predict_proba([H])
                #Keep the frame with the highest fish probability
                if prob_fish < aux[0,1]:
                    prob_fish = aux[0,1]
                    img_save = img
                    nam = name
        #Save the selected frame, if any valid frame was found
        if img_save is not None:
            cv2.imwrite("fish_detected/"+str(classes)+"/"+str(os.path.basename(nam)), img_save)

This detector has been applied to a test set. To determine whether each fish was correctly detected, we manually checked all the images stored in "fish_detected". The results are shown in the following table.

Class            ALB      BET      DOL      LAG      SHARK    YFT
Test samples     115      95       76       51       81       107
Detected         17       13       28       18       28       32
Detection rate   14.78%   13.68%   36.84%   35.29%   34.57%   29.91%

TOTAL SAMPLES: 525      TOTAL DETECTED: 136      TOTAL DETECTION RATE: 25.90%
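
The per-class rates follow directly from these counts; a quick verification snippet:

#Recompute the detection rates from the counts in the table above
test_counts   = {'ALB': 115, 'BET': 95, 'DOL': 76, 'LAG': 51, 'SHARK': 81, 'YFT': 107}
detect_counts = {'ALB': 17,  'BET': 13, 'DOL': 28, 'LAG': 18, 'SHARK': 28, 'YFT': 32}
for c in ['ALB', 'BET', 'DOL', 'LAG', 'SHARK', 'YFT']:
    print "%-6s %.2f%%" % (c, 100.0 * detect_counts[c] / test_counts[c])
print "TOTAL  %.2f%%" % (100.0 * sum(detect_counts.values()) / sum(test_counts.values()))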

Given the complexity of the problem and the difficult nature of the images, we consider a detection accuracy of 25.9% a very good result.

