Oversampling

We have seen that some species have very few sample images: LAG has only 67, DOL 117 and SHARK 176. Because we train a deep neural network, we need many training images to avoid overfitting, so good oversampling is crucial. Here we work with the cropped images of the fish.

Instead of simply copying the images, we modify each copy slightly so that the copies differ and the classifier overfits less. We use two kinds of modification: rotation and flipping. Others, such as stretching or changing the brightness, could also be used. For rotation, we rotate the image by 0, 90, 180 or 270 degrees. We avoid other angles because they would either lose information (keeping the same size cuts off the corners) or add more white pixels by making the image bigger, as in the image resizing explained in another notebook. For flipping, we flip the image either horizontally or vertically. In each iteration, one operation is chosen at random and applied to the image.
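As a small illustration of why only right-angle rotations and flips are used, the sketch below applies them to a tiny array and checks that no pixel is lost or added. It uses `np.rot90`, `np.flipud` and `np.fliplr` as stand-ins for the `imutils.rotate_bound` and `cv2.flip` calls in this notebook (an assumption made so that the example needs only NumPy), and the 3x4 array is a made-up placeholder for a cropped image.

```python
import numpy as np

# Tiny non-square "image" with unique pixel values (hypothetical stand-in
# for a cropped fish image).
image = np.arange(12).reshape(3, 4)

variants = {
    'rot0': image,
    'rot90': np.rot90(image, 1),
    'rot180': np.rot90(image, 2),
    'rot270': np.rot90(image, 3),
    'flip_vertical': np.flipud(image),
    'flip_horizontal': np.fliplr(image),
}

for name, variant in variants.items():
    # Each variant holds exactly the same pixel values: nothing is cut off
    # and no padding pixels are added.
    assert sorted(variant.ravel()) == sorted(image.ravel())
    print(name, variant.shape)

# Count how many of the variants are actually different from each other.
distinct = {(v.shape, v.tobytes()) for v in variants.values()}
print(len(distinct))
```

The count printed at the end shows how many different versions of one image these two operations can produce, which is worth keeping in mind when a class needs many copies per image.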

There are two ways of setting the number of copies. The first is to write the number manually for each class. The second aims at roughly the same number of images in every class: we pass that target number, and the number of copies of each image is computed automatically from the number of images in the class.
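The second way described above can be sketched as follows. The helper name `compute_copies_per_image` is hypothetical and does not appear in this notebook. The counts are the ones quoted in the text, so the cropped folders may hold slightly different numbers, and 600 is the target value assigned to `number_final` in the code.

```python
def compute_copies_per_image(number_original, number_final):
    """Return how many copies of each image bring a class close to number_final."""
    if number_original <= 0:
        raise ValueError('class has no images')
    # Keep at least one copy so that the original image is always saved.
    return max(1, int(round(number_final / float(number_original))))


# Image counts quoted in the text (assumed here; the cropped folders may differ).
counts = {'LAG': 67, 'DOL': 117, 'SHARK': 176}
for name in sorted(counts):
    print(name, compute_copies_per_image(counts[name], 600))
```

Because the number of copies is rounded to an integer, the classes end up only approximately balanced.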

This notebook was very handy for creating all the training data for the classifier.


In [ ]:
import numpy as np
import scipy
import os
from SimpleCV import *
import cv2
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
from skimage import color
from PIL import Image
import imutils

In [6]:
def choose_operation():
    operations = ['rotation','flipping']
    operation = np.random.randint(0,len(operations))
    return operations[operation]

def rotation(image):
    angle = np.random.choice((0,90,180,270))
    image = imutils.rotate_bound(image,angle)
    return image

def flipping(image):
    flip = np.random.randint(0,2)
    if flip == 0:
        image = cv2.flip(image,0)  # flip vertically (around the x-axis)
    else:
        image = cv2.flip(image,1)  # flip horizontally (around the y-axis)
    return image


def determine_copies(classes):
    # manually chosen number of copies (the original included) for each class
    copies_by_class = {'ALB': 2, 'BET': 10, 'DOL': 50,
                       'LAG': 54, 'SHARK': 26, 'YFT': 3}
    if classes not in copies_by_class:
        raise ValueError('Unknown class: ' + classes)
    return copies_by_class[classes]
    
    

fish_classes = ['ALB','LAG','DOL','SHARK','YFT','BET']
number_classes = len(fish_classes)
number_final = 600
main_load_path = '../train_cut'
main_save_path = '../train_cut_oversample'

for classes in fish_classes:
    path_class = os.path.join(main_load_path,classes)
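    # ALB, BET and YFT are saved together in a single TUNA folder
    if classes=='ALB' or classes=='BET' or classes=='YFT':
        save_path = os.path.join(main_save_path,'TUNA')
    else:
        save_path = os.path.join(main_save_path,classes)

    if not os.path.exists(save_path):
        os.makedirs(save_path)
    number_original = float(len(os.listdir(path_class)))
    # option 1: number of copies set manually for each class
    copies_per_image = determine_copies(classes)
    # option 2: uncomment to reach about number_final images per class instead
    #copies_per_image = int(round(number_final/number_original))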
    for image in os.listdir(path_class):
        path = os.path.join(path_class,image)
        im_initial = cv2.imread(path)  # cv2 loads images in BGR channel order
        for iteration in range(0,copies_per_image):
            # random prefix so that copies of different images do not overwrite each other
            name = str(np.random.randn()) + '_'+str(iteration)+ '.jpg'
            if iteration == 0:
                # the first copy is the unmodified original
                im = im_initial
            else:
                operation = choose_operation()
                if operation == 'rotation':
                    im = rotation(im_initial)
                if operation == 'flipping':
                    im = flipping(im_initial)
            # convert BGR to RGB so that PIL saves the correct colours
            im = cv2.cvtColor(im,cv2.COLOR_BGR2RGB)
            im = Image.fromarray(im)
            im.save(os.path.join(save_path,name))
    print(classes+' finished')


ALB finished
LAG finished
DOL finished
SHARK finished
YFT finished
BET finished