Deep learning using fastai library

(https://github.com/fastai/courses)

This is beginner-level code (it can be learnt in two classes of the deep learning course taught by Jeremy Howard), and we can get decent results just by tuning the learning rate and training until the model starts to overfit.

I have used a resnet18 model pretrained on ImageNet data for this.

Steps to use fastai library --

git clone https://github.com/fastai/fastai
cd fastai
conda create -n fastai python=3.6 anaconda
conda env update
source activate fastai

This kernel is specifically for beginners who want to experiment with building a CNN using fastai (on top of PyTorch). By using this kernel, you can expect to get a good score and also learn fastai. Fastai has made building deep neural networks very easy.



In [55]:

    
# Put these at the top of every notebook, to get automatic reloading and inline plotting
%reload_ext autoreload
%autoreload 2
%matplotlib inline



In [56]:

    
# This is required to make fastai library work on ec2-user 
# fastai library is not yet available using pip install
# pull it from github using below link
# https://github.com/fastai/courses

import sys
sys.path.append("/home/ec2-user/data/fastai/")



In [57]:

    
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# fastai.imports contains all the main external libs we'll use
from fastai.imports import *
from fastai.transforms import *
from fastai.conv_learner import *
from fastai.model import *
from fastai.dataset import *
from fastai.sgdr import *
from fastai.plots import *



In [58]:

    
#! ls /home/ec2-user/data/data/processed



In [59]:

    
path = "/home/ec2-user/data/data/processed/"



In [60]:

    
train = pd.read_json(f'{path}train.json')



In [61]:

    
test = pd.read_json(f'{path}test.json')



In [62]:

    
train[:2]









    Out[62]:







  
    
      
   band_1                                             band_2                                             id        inc_angle  is_iceberg
0  [-27.878360999999998, -27.15416, -28.668615, -...  [-27.154118, -29.537888, -31.0306, -32.190483,...  dfd5f913  43.9239    0
1  [-12.242375, -14.920304999999999, -14.920363, ...  [-31.506321, -27.984554, -26.645678, -23.76760...  e25388fd  38.1562    0



In [63]:

    
len(train.iloc[4][1])









    Out[63]:





5625



In [64]:

    
train.inc_angle = train.inc_angle.apply(lambda x: np.nan if x == 'na' else x)
test.inc_angle = test.inc_angle.apply(lambda x: np.nan if x == 'na' else x)



In [65]:

    
img1 = train.loc[0,['band_1','band_2']]



In [66]:

    
img1









    Out[66]:





band_1    [-27.878360999999998, -27.15416, -28.668615, -...
band_2    [-27.154118, -29.537888, -31.0306, -32.190483,...
Name: 0, dtype: object



In [67]:

    
img1 = np.stack([img1['band_1'], img1['band_2']], -1).reshape(75,75,2)
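
A minimal sketch of what the stack-and-reshape step does, using a toy 3x3 image instead of 75x75 (the toy values are made up for illustration). Each band arrives as a flat list, so `np.stack(..., -1)` pairs the two bands per pixel and `reshape` restores the grid.

```python
import numpy as np

side = 3  # toy size; the real images are 75x75 (5625 values per band)
band_1 = list(range(side * side))            # flat list, like one band_1 entry
band_2 = [v + 100 for v in band_1]           # second flat band

stacked = np.stack([band_1, band_2], -1)     # shape (9, 2): one row per pixel
img = stacked.reshape(side, side, 2)         # shape (3, 3, 2): rows, cols, bands

# Pixel (row, col) of a band comes from flat position row * side + col.
row, col = 1, 2
print(img.shape)
print(img[row, col, 0] == band_1[row * side + col])
```

The same indexing rule applies at full size: with 75 columns, flat position `row * 75 + col` ends up at `img[row, col]`.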

The picture below is not an iceberg (is_iceberg is 0 for this row)



In [68]:

    
plt.imshow(img1[:,:,1])









    Out[68]:





<matplotlib.image.AxesImage at 0x7f3b7cedcb38>

Get an RGB version of each image using a color composite function

Thanks to MadScientist for the color composite. Here is the kernel -- https://www.kaggle.com/keremt/getting-color-composites



In [69]:

    
def color_composite(data):
    rgb_arrays = []
    for i, row in data.iterrows():
        band_1 = np.array(row['band_1']).reshape(75, 75)
        band_2 = np.array(row['band_2']).reshape(75, 75)
        band_3 = band_1 / band_2

        r = (band_1 + abs(band_1.min())) / np.max((band_1 + abs(band_1.min())))
        g = (band_2 + abs(band_2.min())) / np.max((band_2 + abs(band_2.min())))
        b = (band_3 + abs(band_3.min())) / np.max((band_3 + abs(band_3.min())))
        
#         r = ((band_1 - np.mean(band_1)) / (np.max(band_1) - np.min(band_1))) 
#         g = ((band_2 - np.mean(band_2)) / (np.max(band_2) - np.min(band_2))) 
#         b = ((band_3 - np.mean(band_3)) / (np.max(band_3) - np.min(band_3)))

        rgb = np.dstack((r, g, b))
        rgb_arrays.append(rgb)
    return np.array(rgb_arrays)
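
A small sketch of the per-band scaling used for each channel above, pulled out into a helper on a made-up 2x2 band. The helper name `shift_scale` and the toy values are assumptions for illustration only.

```python
import numpy as np

def shift_scale(band):
    """Add |min| to the band, then divide by the new maximum.

    Mirrors the r/g/b lines above. When band.min() <= 0 (as for the
    band_1 and band_2 values shown earlier) the result spans 0 to 1.
    """
    shifted = band + abs(band.min())
    return shifted / np.max(shifted)

toy_band = np.array([[-30.0, -20.0], [-10.0, -25.0]])  # made-up values
scaled = shift_scale(toy_band)
print(scaled)
```

For a band whose minimum is positive (the ratio band_3 may be one, since it divides two negative bands), the same formula still peaks at 1 but its smallest value stays above 0.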



In [70]:

    
# RGB composites for the training data
rgb_train = color_composite(train)
rgb_train.shape









    Out[70]:





(1604, 75, 75, 3)



In [71]:

    
# RGB composites for the test data
rgb_test = color_composite(test)
rgb_test.shape









    Out[71]:





(8424, 75, 75, 3)

Exploring images before training CNN model



In [72]:

    
# look at random ships
print('Looking at random ships')
ships = np.random.choice(np.where(train.is_iceberg ==0)[0], 9)
fig = plt.figure(1,figsize=(12,12))
for i in range(9):
    ax = fig.add_subplot(3,3,i+1)
    arr = rgb_train[ships[i], :, :]
    ax.imshow(arr)
    
plt.show()









    



Looking at random ships



In [73]:

    
# look at random icebergs
print('Looking at random icebergs')
ice = np.random.choice(np.where(train.is_iceberg ==1)[0], 9)
fig = plt.figure(200,figsize=(12,12))
for i in range(9):
    ax = fig.add_subplot(3,3,i+1)
    arr = rgb_train[ice[i], :, :]
    ax.imshow(arr)
    
plt.show()









    



Looking at random icebergs

Observations from the images --

• Ships have a trace of bright pixels around them, which the CNN can pick up as a feature
• Ship shapes are more consistent
• Iceberg shapes vary more than ship shapes

Saving images in directories (train, valid, test)



In [75]:

    
# making directories for training resnet (as it needs files to be in the right dirs)

os.makedirs(f'{path}/composites', exist_ok=True)
os.makedirs(f'{path}/composites/train', exist_ok=True)
os.makedirs(f'{path}/composites/valid', exist_ok=True)
os.makedirs(f'{path}/composites/test', exist_ok=True)

dir_list = [f'{path}/composites/train', f'{path}/composites/valid']

for i in dir_list:
    # exist_ok=True so the cell can be re-run without raising FileExistsError
    os.makedirs(f'{i}/ship', exist_ok=True)
    os.makedirs(f'{i}/iceberg', exist_ok=True)

The reason for converting these images to .png is that the pretrained ConvLearner that I am going to call takes image files as input
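
The folder layout this relies on (one sub-folder per class under train and valid, plus a flat test folder) can be tried out in a throwaway directory first. This is only a sketch: `make_layout` is a hypothetical helper, and the temporary folder stands in for the composites directory.

```python
import os
import tempfile

def make_layout(root, classes=('ship', 'iceberg')):
    """Create train/<class>, valid/<class> and test folders under root."""
    for split in ('train', 'valid'):
        for cls in classes:
            os.makedirs(os.path.join(root, split, cls), exist_ok=True)
    os.makedirs(os.path.join(root, 'test'), exist_ok=True)
    return sorted(os.listdir(root))

demo_root = tempfile.mkdtemp()   # throwaway folder for the demo
layout = make_layout(demo_root)
print(layout)
```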



In [ ]:

    
# split
train_y, valid_y = train_test_split(train.is_iceberg, test_size=0.10)

# row indices of each class within the train and validation splits
train_iceberg_index = train_y[train_y==1].index
train_ship_index = train_y[train_y==0].index
valid_iceberg_index = valid_y[valid_y==1].index
valid_ship_index = valid_y[valid_y==0].index


#save train images
for idx in train_iceberg_index:
    img = rgb_train[idx]
    plt.imsave(f'{path}/composites/train/iceberg/' + str(idx) + '.png',  img)

for idx in train_ship_index:
    img = rgb_train[idx]
    plt.imsave(f'{path}/composites/train/ship/' + str(idx) + '.png',  img)    
    
#save valid images
for idx in valid_iceberg_index:
    img = rgb_train[idx]
    plt.imsave(f'{path}/composites/valid/iceberg/' + str(idx) + '.png',  img)

for idx in valid_ship_index:
    img = rgb_train[idx]
    plt.imsave(f'{path}/composites/valid/ship/' + str(idx) + '.png',  img)

#save test images
for idx in range(len(test)):
    img = rgb_test[idx]
    plt.imsave(f'{path}/composites/test/' + str(idx) + '.png',  img)

Let's check the directory where the files are saved. OK, we have train, test and valid directories



In [ ]:

    
! ls /home/ec2-user/data/data/processed/composites/

On later runs I can start from here, since the images are already saved

Let's train the first resnet model using fastai



In [76]:

    
path2 = '/home/ec2-user/data/data/processed/composites/'

Let's look at a ship now (from png) to make sure the images are saved in the directory



In [77]:

    
files = !ls {path2}valid/ship | head

img = plt.imread(f'{path2}valid/ship/{files[0]}')
plt.imshow(img)









    Out[77]:





<matplotlib.image.AxesImage at 0x7f39c42e9f28>



In [78]:

    
! ls {path2}









    



models	test  tmp  train  valid



In [ ]:

    
# only run this if you face an error
#! rm -f -R {path2}tmp

Finding the learning rate using the LR finder

The learning rate is one of the most sensitive hyperparameters in deep learning. Finding a good learning rate is the most important step towards a good model (one that neither overfits nor underfits)
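
The idea behind a learning rate finder can be sketched as follows: train for a few mini-batches while raising the learning rate geometrically, and record the loss at each step. The snippet only illustrates the schedule, with made-up bounds and a hypothetical helper name; it is not fastai's implementation of `lr_find`.

```python
def lr_range(start_lr, end_lr, n_iters):
    """Learning rates that grow geometrically from start_lr to end_lr."""
    ratio = end_lr / start_lr
    return [start_lr * ratio ** (i / (n_iters - 1)) for i in range(n_iters)]

# Assumed bounds and iteration count, chosen only for illustration.
lrs_demo = lr_range(1e-5, 10.0, 46)
print(lrs_demo[0], lrs_demo[-1])
```

Plotting the recorded loss against these rates on a log axis gives the kind of curve that `learn.sched.plot()` shows.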



In [79]:

    
def get_data(sz, bs):
    tfms = tfms_from_model(arch, sz, aug_tfms=transforms_top_down, max_zoom=1.00)
    data = ImageClassifierData.from_paths(path2, test_name = 'test', bs = bs,
                                          tfms = tfms)
    return data



In [80]:

    
arch=resnet18 
sz = 75  # because our image size is 75*75
bs = 32 # because default batch size of 64 was not giving good converging loss



In [81]:

    
data = get_data(sz, bs)



In [82]:

    
data = data.resize(int(sz*1.3), 'tmp')



In [83]:

    
learn = ConvLearner.pretrained(arch, data, precompute=False)



In [84]:

    
lrf = learn.lr_find()









    





 
 










    



 89%|████████▉ | 41/46 [00:04<00:00,  9.53it/s, loss=3.54]



In [85]:

    
learn.sched.plot_lr()



In [86]:

    
learn.sched.plot()

Now, this plot is important for deciding on a good learning rate. We will not pick the learning rate at the lowest loss, which might sound confusing. The catch is that we are going to use differential annealing, where different layer groups take different learning rates, and the rate we choose here is going to be the maximum rate. So we choose a rate just before the loss bottoms out, where the loss is still falling.

The minimum is at 10^-1, so I choose 10^-2 as the learning rate
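
The rule of thumb used here (take the rate at the lowest loss and go one order of magnitude lower) can be written down as a tiny helper. The readings below are made up to resemble a finder curve; they are not the values from this run, and `suggest_lr` is a hypothetical name.

```python
def suggest_lr(lrs, losses, factor=10.0):
    """Return the rate at the lowest recorded loss divided by `factor`."""
    best = min(range(len(losses)), key=losses.__getitem__)
    return lrs[best] / factor

# Made-up (learning rate, loss) readings for illustration.
demo_lrs = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]
demo_losses = [0.70, 0.62, 0.48, 0.41, 3.5]
picked = suggest_lr(demo_lrs, demo_losses)
print(picked)
```

Reading the plot by eye is still worthwhile, since a noisy curve can put the single lowest point in a misleading place.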

Stochastic gradient descent with restarts



In [87]:

    
lr = 0.01

cycles = 3 (7 epochs in total: 1 + 2 + 4)
cycle length = 1
cycle multiple = 2
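
A short sketch of how these settings translate into epochs, which is why the fit below prints 7 rows. The helper names are hypothetical, and the cosine shape is the schedule usually associated with SGDR; fastai's exact per-batch schedule may differ in detail.

```python
import math

def epochs_per_cycle(n_cycles, cycle_len, cycle_mult):
    """Length in epochs of each restart cycle."""
    return [cycle_len * cycle_mult ** i for i in range(n_cycles)]

def cosine_lr(max_lr, progress):
    """Cosine-annealed rate for progress in [0, 1] within one cycle."""
    return max_lr * (1 + math.cos(math.pi * progress)) / 2

lengths = epochs_per_cycle(3, 1, 2)
print(lengths, sum(lengths))                        # cycle lengths and total epochs
print(cosine_lr(0.01, 0.0), cosine_lr(0.01, 1.0))   # rate at the start and end of a cycle
```

At each restart the rate jumps back to its maximum, and each cycle is twice as long as the one before it.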



In [88]:

    
learn.fit(lr, 3, cycle_len=1, cycle_mult=2) # first fit (precompute was False): 3 cycles = 7 epochs









    





 
 










    



[ 0.       0.39146  0.384    0.80729]                      
[ 1.       0.49517  0.36757  0.82812]                      
[ 2.       0.47401  0.36057  0.82292]                      
[ 3.       0.46928  0.38627  0.84896]                      
[ 4.       0.47741  0.33864  0.84896]                      
[ 5.       0.45769  0.34478  0.82812]                      
[ 6.       0.44052  0.34427  0.83333]



In [89]:

    
# stochastic descent with restart
learn.sched.plot_lr()



In [90]:

    
learn.sched.plot_loss()

The loss decreases as the number of iterations increases



In [ ]:

    
#learn.save('/home/ec2-user/data/data/processed/75_lastlayer_res34_bs28')



In [ ]:

    
#learn.load('/home/ec2-user/data/data/processed/75_lastlayer')

Fine tuning and differential annealing

The images we have do not show the clear, sharp features that the top layers of our pretrained network may have learnt. So let's unfreeze our layers so that they are trained as well



In [91]:

    
lr = 0.01



In [92]:

    
learn.unfreeze()



In [93]:

    
lrs=np.array([lr/9,lr/3,lr]) # use lr/100 and lr/10 respectively if the images had been larger in size
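
A small sketch of how the three rates line up with layer groups. The group names are hypothetical labels, and the assumption is that the array is ordered from the earliest layers to the final layers, so the earliest layers change the least.

```python
lr = 0.01
group_names = ['early layers', 'middle layers', 'final layers']  # hypothetical labels
group_lrs = [lr / 9, lr / 3, lr]

for name, rate in zip(group_names, group_lrs):
    print(f'{name}: {rate:.5f}')
```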



In [94]:

    
learn.fit(lrs, 3, cycle_len=1, cycle_mult=2)









    





 
 










    



[ 0.       0.29903  0.40509  0.8125 ]                      
[ 1.       0.41049  0.50269  0.68229]                      
[ 2.       0.4371   0.3768   0.81771]                      
[ 3.       0.43599  0.35945  0.83333]                      
[ 4.       0.42319  0.26291  0.90104]                      
[ 5.       0.3668   0.27012  0.90104]                      
[ 6.       0.32801  0.26606  0.90625]



In [95]:

    
learn.sched.plot_loss()



In [96]:

    
! ls /home/ec2-user/data/









    



data	   fastai	    submissions  test.json.7z
DL models  pred_feather_rf  test	 train.json.7z



In [ ]:

    
#learn.save('/home/ec2-user/data/data/processed/75_all_res18_bs24_4') # save learn

Train again

Now let's train again using the same learning rate until we start overfitting.

I ran the code below 2-3 times to reach the optimum validation score.



In [ ]:

    
learn.fit(lr, 3, cycle_len=1, cycle_mult=2) # train again (precompute was False)



In [ ]:

    
learn.sched.plot_loss()



In [ ]:

    
#learn.save('/home/ec2-user/data/data/processed/75_all_res18_bs32_2') # save learn



In [ ]:

    
#learn.load('/home/ec2-user/data/data/processed/75_all_res18_bs32_2')



In [104]:

    
# to check validation accuracy

log_preds,y = learn.TTA()
accuracy(log_preds,y)

TTA (Test Time Augmentation) makes predictions not just on the images in your validation set, but also on a number of randomly augmented versions of them
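
The averaging idea behind TTA can be sketched without fastai. Everything below is an assumption made for illustration: `toy_predict` is a made-up stand-in for a trained model, and fixed flips replace the random augmentations that the library applies.

```python
import numpy as np

def toy_predict(img):
    """Hypothetical stand-in for a trained model: returns [p(iceberg), p(ship)]."""
    half = img.shape[1] // 2
    score = img[:, :half].mean() - img[:, half:].mean()
    p_ship = 1.0 / (1.0 + np.exp(-score))
    return np.array([1.0 - p_ship, p_ship])

def tta_predict(img, predict, augmentations):
    """Average predictions over the original image and its augmented copies."""
    preds = [predict(img)] + [predict(aug(img)) for aug in augmentations]
    return np.mean(preds, axis=0)

rng = np.random.default_rng(0)
image = rng.random((4, 4))                      # made-up 4x4 "image"
avg = tta_predict(image, toy_predict, [np.fliplr, np.flipud])
print(avg)
```

Averaging over several views tends to smooth out predictions that depend on the exact orientation of the object.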

Predictions



In [98]:

    
# from here we know that 'icebergs' is label 0 and 'ships' is label 1.
data.classes









    Out[98]:





['iceberg', 'ship']



In [99]:

    
# this gives predictions for the test set (is_test=True). Predictions are in log scale
log_preds = learn.TTA(is_test=True) # If need TTA
#log_preds = learn.predict(is_test=True) # if don't need TTA
log_preds[0].shape









    Out[99]:





(8424, 2)



In [100]:

    
probs_submit1 = np.exp(log_preds[0][:,0])  # column 0 is 'iceberg', so this is pr(iceberg)
probs_submit1[:10]









    Out[100]:





array([ 0.92662,  0.20084,  0.04377,  0.03495,  0.00351,  0.00174,  0.00412,  0.98356,  0.03753,  0.88934], dtype=float32)
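
A minimal sketch of the conversion above for a single image: exponentiating log-probabilities gives probabilities that sum to 1, and the column to keep is the position of 'iceberg' in the class list. The two log-probabilities are made up.

```python
import math

classes = ['iceberg', 'ship']                   # same order as data.classes above
log_pred_row = [math.log(0.9), math.log(0.1)]   # made-up log-probabilities for one image

probs_row = [math.exp(v) for v in log_pred_row]
p_iceberg = probs_row[classes.index('iceberg')]
print(probs_row, p_iceberg)
```

Looking the column up by class name avoids silently submitting pr(ship) if the class order ever changes.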



In [101]:

    
# getting file names from the test set
id_raw = data.test_dl.dataset.fnames
id_raw[1]


# using regex to take the numeric index from each file name
id_pro = np.array([int(re.findall(r'\d+', fname)[0]) for fname in id_raw])
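
A standalone sketch of the index extraction, assuming file names of the form `test/<index>.png` (the exact pattern returned by the data loader is an assumption here, and `index_from_name` is a hypothetical helper).

```python
import re

def index_from_name(fname):
    """Return the first run of digits in a file name as an int."""
    return int(re.findall(r'\d+', fname)[0])

demo_names = ['test/3734.png', 'test/1584.png']   # assumed name pattern
print([index_from_name(n) for n in demo_names])
```

Note that the first run of digits is taken, so a directory name containing digits would be picked up instead of the file's own index.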



In [102]:

    
id_pro_list = []
for i in id_pro:
    id_pro_list.append(int(i))
id_pro_list[:4]









    Out[102]:





[3734, 1584, 6952, 2286]



In [103]:

    
# joining id and probability
d = {'index': id_pro_list, 'is_iceberg': probs_submit1}
submit1_df = pd.DataFrame(data=d)



In [ ]:

    
id_ = test['id']
id_pd = pd.DataFrame({'id': id_} )



In [ ]:

    
submit1_df_sorted = submit1_df.sort_values('index')
submit1_df_sorted2 = pd.concat([id_, submit1_df_sorted.set_index('index')], axis = 1)
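
The reason for sorting by index is that the data loader does not return files in the same order as the rows of the test frame. A plain-Python sketch of the same alignment, with made-up ids and predictions:

```python
# Made-up ids and predictions, listed in the order the data loader returned them.
test_ids = ['aaa', 'bbb', 'ccc']          # position i holds the id of test image i
loader_index = [2, 0, 1]                  # image index parsed from each file name
loader_probs = [0.7, 0.1, 0.4]            # prediction for each of those files

by_index = dict(zip(loader_index, loader_probs))
submission = [(test_ids[i], by_index[i]) for i in range(len(test_ids))]
print(submission)
```

Each id ends up next to the prediction made for its own image, which is what the concat on the shared index achieves above.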



In [ ]:

    
submit1_df_sorted2.dtypes



In [ ]:

    
submit1_df.dtypes



In [ ]:

    
#submit1_df.id = submit1_df.id.astype(str)



In [ ]:

    
submit1_df_sorted2.to_csv('/home/ec2-user/data/submissions/submit6.csv', index = False)



In [ ]:

    
! head -5 /home/ec2-user/data/submissions/submit6.csv



In [ ]:

    
submit_check = pd.read_csv("/home/ec2-user/data/submissions/submit6.csv")



In [ ]:

    
#submit_check.dtypes



In [ ]:

    
submit_check[:10]

Use scp to copy the submission file to the local machine, then upload it to Kaggle

Post analysis

Looking at correctly and incorrectly classified images

Analyzing results: looking at pictures

As well as looking at the overall metrics, it's also a good idea to look at examples of each of:

• A few correct labels at random
• A few incorrect labels at random
• The most correct labels of each class (i.e. those with the highest probability that are correct)
• The most incorrect labels of each class (i.e. those with the highest probability that are incorrect)
• The most uncertain labels (i.e. those with probability closest to 0.5)
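
The last item in the list can be sketched as a small helper that ranks predictions by their distance from 0.5. The probabilities are made up, and `most_uncertain` is a hypothetical name that does not appear elsewhere in this notebook.

```python
def most_uncertain(probs, k=4):
    """Indices of the k probabilities closest to 0.5."""
    order = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return order[:k]

demo_probs = [0.98, 0.52, 0.03, 0.45, 0.60]   # made-up pr(ship) values
uncertain = most_uncertain(demo_probs, k=2)
print(uncertain)
```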



In [158]:

    
log_pred1 = learn.TTA() # If need TTA

preds = np.argmax(log_pred1[0], axis=1)  # from log probabilities to 0 or 1
probs = np.exp(log_pred1[0][:,1])        # pr(ship)

#probs = submit_check['is_iceberg']
#submit_check['C'] = np.where(submit_check['is_iceberg'] >= 0.5,1, 0)
#preds = np.array(submit_check['C'])



In [159]:

    
def rand_by_mask(mask): return np.random.choice(np.where(mask)[0], 4, replace=False)
def rand_by_correct(is_correct): return rand_by_mask((preds == data.val_y)==is_correct)



In [160]:

    
def plot_val_with_title(idxs, title):
    imgs = np.stack([data.val_ds[x][0] for x in idxs])
    title_probs = [probs[x] for x in idxs]
    print(title)
    return plots(data.val_ds.denorm(imgs), rows=1, titles=title_probs)



In [161]:

    
def plots(ims, figsize=(12,6), rows=1, titles=None):
    f = plt.figure(figsize=figsize)
    for i in range(len(ims)):
        sp = f.add_subplot(rows, len(ims)//rows, i+1)
        sp.axis('Off')
        if titles is not None: sp.set_title(titles[i], fontsize=16)
        plt.imshow(ims[i])



In [162]:

    
def load_img_id(ds, idx): return np.array(PIL.Image.open(path2+ds.fnames[idx]))

def plot_val_with_title(idxs, title):
    imgs = [load_img_id(data.val_ds,x) for x in idxs]
    title_probs = [probs[x] for x in idxs]
    print(title)
    return plots(imgs, rows=1, titles=title_probs, figsize=(16,8))



In [163]:

    
# 1. A few correct labels at random
plot_val_with_title(rand_by_correct(True), "Correctly classified")









    



Correctly classified



In [164]:

    
# 2. A few incorrect labels at random
plot_val_with_title(rand_by_correct(False), "Incorrectly classified")









    



Incorrectly classified



In [112]:

    
def most_by_mask(mask, mult):
    idxs = np.where(mask)[0]
    return idxs[np.argsort(mult * probs[idxs])[:4]]

def most_by_correct(y, is_correct): 
    mult = -1 if (y==1)==is_correct else 1
    # parentheses matter: & binds more tightly than ==
    return most_by_mask(((preds == data.val_y)==is_correct) & (data.val_y == y), mult)
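
When a mask combines a comparison with `&`, Python evaluates `&` before `==`, so each comparison needs its own parentheses. A minimal sketch with plain booleans standing in for single array elements:

```python
is_correct = True
correct = False      # stands in for one element of (preds == data.val_y)
is_class = False     # stands in for one element of (data.val_y == y)

# Parsed as: correct == (is_correct & is_class)
without_parens = correct == is_correct & is_class
# Intended meaning: the prediction matches is_correct AND the label is class y
with_parens = (correct == is_correct) & is_class
print(without_parens, with_parens)
```

Here the unparenthesized form selects an image that is neither correctly classified nor of the requested class.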



In [165]:

    
plot_val_with_title(most_by_correct(0, True), "Most correct icebergs")









    



Most correct icebergs



In [166]:

    
plot_val_with_title(most_by_correct(1, True), "Most correct ships")









    



Most correct ships


