Deep learning using fastai library

(https://github.com/fastai/courses)

This is beginner-level code (the material can be learnt in two classes of the deep learning course taught by Jeremy Howard), and it gets decent results by tuning only the learning rate and training until the model starts to overfit.

I have used a pretrained resnet18 model (trained on ImageNet data) for this.

Steps to set up the fastai library:

  1. git clone https://github.com/fastai/fastai
  2. cd fastai
  3. conda create -n fastai python=3.6 anaconda
  4. conda env update
  5. source activate fastai

This kernel is aimed at beginners who want to experiment with building a CNN using fastai (built on top of PyTorch). By working through it, you can expect to get a good score and also learn fastai, which makes building deep neural networks much easier.


In [55]:
# Put these at the top of every notebook, to get automatic reloading and inline plotting
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [56]:
# This is required to make fastai library work on ec2-user 
# fastai library is not yet available using pip install
# pull it from github using below link
# https://github.com/fastai/courses

import sys
sys.path.append("/home/ec2-user/data/fastai/")

In [57]:
# This file contains all the main external libs we'll use
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

from fastai.imports import *
from fastai.transforms import *
from fastai.conv_learner import *
from fastai.model import *
from fastai.dataset import *
from fastai.sgdr import *
from fastai.plots import *

In [58]:
#! ls /home/ec2-user/data/data/processed

In [59]:
path = "/home/ec2-user/data/data/processed/"

In [60]:
train = pd.read_json(f'{path}train.json')

In [61]:
test = pd.read_json(f'{path}test.json')

In [62]:
train[:2]


Out[62]:
band_1 band_2 id inc_angle is_iceberg
0 [-27.878360999999998, -27.15416, -28.668615, -... [-27.154118, -29.537888, -31.0306, -32.190483,... dfd5f913 43.9239 0
1 [-12.242375, -14.920304999999999, -14.920363, ... [-31.506321, -27.984554, -26.645678, -23.76760... e25388fd 38.1562 0

In [63]:
len(train.iloc[4][1])


Out[63]:
5625
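
The length 5625 equals 75 × 75, so each band is a flattened 75×75 image. A minimal sketch of the reshape used in the following cells, with synthetic values in place of the real bands (the arrays band_1 and band_2 here are made up for illustration):

```python
import numpy as np

# Synthetic stand-ins for one row's flattened bands (5625 values each).
band_1 = np.arange(5625, dtype=np.float32)
band_2 = -np.arange(5625, dtype=np.float32)

# Stack the two bands on a trailing channel axis, then restore the 75x75 grid.
img = np.stack([band_1, band_2], axis=-1).reshape(75, 75, 2)

print(img.shape)  # (75, 75, 2)
```

Channel 0 holds band_1 and channel 1 holds band_2, each read row by row from the flat list.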

In [64]:
train.inc_angle = train.inc_angle.apply(lambda x: np.nan if x == 'na' else x)
test.inc_angle = test.inc_angle.apply(lambda x: np.nan if x == 'na' else x)
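
An alternative sketch of the same clean-up, assuming inc_angle holds a mix of numbers and the string 'na' as above: pd.to_numeric with errors='coerce' turns anything non-numeric into NaN and yields a float column. The small frame demo is invented for illustration.

```python
import numpy as np
import pandas as pd

# Invented sample: numeric angles mixed with the placeholder string 'na'.
demo = pd.DataFrame({'inc_angle': [43.9239, 'na', 38.1562]})

# Non-numeric entries become NaN; the column dtype becomes float64.
demo['inc_angle'] = pd.to_numeric(demo['inc_angle'], errors='coerce')

print(demo['inc_angle'].isna().sum())  # number of missing angles
```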

In [65]:
img1 = train.loc[0,['band_1','band_2']]

In [66]:
img1


Out[66]:
band_1    [-27.878360999999998, -27.15416, -28.668615, -...
band_2    [-27.154118, -29.537888, -31.0306, -32.190483,...
Name: 0, dtype: object

In [67]:
img1 = np.stack([img1['band_1'], img1['band_2']], -1).reshape(75,75,2)

The image below is not an iceberg (is_iceberg is 0 for this row).


In [68]:
plt.imshow(img1[:,:,1])


Out[68]:
<matplotlib.image.AxesImage at 0x7f3b7cedcb38>

Build an RGB image using a color composite function

Thanks to MadScientist for the color composite. Here is the kernel: https://www.kaggle.com/keremt/getting-color-composites


In [69]:
def color_composite(data):
    rgb_arrays = []
    for i, row in data.iterrows():
        band_1 = np.array(row['band_1']).reshape(75, 75)
        band_2 = np.array(row['band_2']).reshape(75, 75)
        band_3 = band_1 / band_2

        r = (band_1 + abs(band_1.min())) / np.max((band_1 + abs(band_1.min())))
        g = (band_2 + abs(band_2.min())) / np.max((band_2 + abs(band_2.min())))
        b = (band_3 + abs(band_3.min())) / np.max((band_3 + abs(band_3.min())))
        
#         r = ((band_1 - np.mean(band_1)) / (np.max(band_1) - np.min(band_1))) 
#         g = ((band_2 - np.mean(band_2)) / (np.max(band_2) - np.min(band_2))) 
#         b = ((band_3 - np.mean(band_3)) / (np.max(band_3) - np.min(band_3)))

        rgb = np.dstack((r, g, b))
        rgb_arrays.append(rgb)
    return np.array(rgb_arrays)
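
Each channel above is shifted by the absolute value of its minimum and divided by the new maximum. When the minimum is negative, this equals ordinary min-max scaling to [0, 1]; if a channel's minimum is positive, the same formula still caps the values at 1 but no longer starts at 0. A small check on synthetic data (the helper name shift_scale is hypothetical):

```python
import numpy as np

def shift_scale(band):
    """Scale a band the way color_composite does: shift by |min|, divide by max."""
    shifted = band + abs(band.min())
    return shifted / np.max(shifted)

# Synthetic band with negative values only.
rng = np.random.default_rng(0)
band = rng.uniform(-35.0, -5.0, size=(75, 75))

scaled = shift_scale(band)
min_max = (band - band.min()) / (band.max() - band.min())

print(np.allclose(scaled, min_max))  # True when band.min() is negative
```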

In [70]:
# RGB composites for the training set
rgb_train = color_composite(train)
rgb_train.shape


Out[70]:
(1604, 75, 75, 3)

In [71]:
# RGB composites for the test set
rgb_test = color_composite(test)
rgb_test.shape


Out[71]:
(8424, 75, 75, 3)

Exploring images before training CNN model


In [72]:
# look at random ships
print('Looking at random ships')
ships = np.random.choice(np.where(train.is_iceberg ==0)[0], 9)
fig = plt.figure(1,figsize=(12,12))
for i in range(9):
    ax = fig.add_subplot(3,3,i+1)
    arr = rgb_train[ships[i], :, :]
    ax.imshow(arr)
    
plt.show()


Looking at random ships

In [73]:
# look at random icebergs
print('Looking at random icebergs')
ice = np.random.choice(np.where(train.is_iceberg ==1)[0], 9)
fig = plt.figure(200,figsize=(12,12))
for i in range(9):
    ax = fig.add_subplot(3,3,i+1)
    arr = rgb_train[ice[i], :, :]
    ax.imshow(arr)
    
plt.show()


Looking at random icebergs

Observations from the images:

• Ships have a trace of bright light around them, which the CNN can use as a feature
• Ships are more consistent in shape
• Icebergs vary in shape more than ships

Saving images in directories (train, valid, test)


In [75]:
# make the directories for training resnet (it needs the files to be in the right dirs)
# exist_ok=True lets this cell be re-run without raising FileExistsError

os.makedirs(f'{path}/composites', exist_ok=True)
os.makedirs(f'{path}/composites/train', exist_ok=True)
os.makedirs(f'{path}/composites/valid', exist_ok=True)
os.makedirs(f'{path}/composites/test', exist_ok=True)

dir_list = [f'{path}/composites/train', f'{path}/composites/valid']

for i in dir_list:
    os.makedirs(f'{i}/ship', exist_ok=True)
    os.makedirs(f'{i}/iceberg', exist_ok=True)

The images are converted to .png because the pretrained ConvLearner that I am going to use takes image files as input.


In [ ]:
# split the labels 90/10 into training and validation sets
train_y, valid_y = train_test_split(train.is_iceberg, test_size=0.10)

# row indices for each class in each split
train_iceberg_index = train_y[train_y==1].index
train_ship_index = train_y[train_y==0].index
valid_iceberg_index = valid_y[valid_y==1].index
valid_ship_index = valid_y[valid_y==0].index


#save train images
for idx in train_iceberg_index:
    img = rgb_train[idx]
    plt.imsave(f'{path}/composites/train/iceberg/' + str(idx) + '.png',  img)

for idx in train_ship_index:
    img = rgb_train[idx]
    plt.imsave(f'{path}/composites/train/ship/' + str(idx) + '.png',  img)    
    
#save valid images
for idx in valid_iceberg_index:
    img = rgb_train[idx]
    plt.imsave(f'{path}/composites/valid/iceberg/' + str(idx) + '.png',  img)

for idx in valid_ship_index:
    img = rgb_train[idx]
    plt.imsave(f'{path}/composites/valid/ship/' + str(idx) + '.png',  img)

#save test images
for idx in range(len(test)):
    img = rgb_test[idx]
    plt.imsave(f'{path}/composites/test/' + str(idx) + '.png',  img)

Let's check the directory where the files were saved. OK, we have train, test and valid directories.


In [ ]:
! ls /home/ec2-user/data/data/processed/composites/

On later runs, I can start from here.

Let's train the first resnet model using fastai.


In [76]:
path2 = '/home/ec2-user/data/data/processed/composites/'

Let's look at a ship now (from the saved png files) to make sure the images were written to the directory.


In [77]:
files = !ls {path2}valid/ship | head

img = plt.imread(f'{path2}valid/ship/{files[0]}')
plt.imshow(img)


Out[77]:
<matplotlib.image.AxesImage at 0x7f39c42e9f28>

In [78]:
! ls {path2}


models	test  tmp  train  valid

In [ ]:
# only run this if you hit an error
#! rm -f -R {path2}tmp

Finding the learning rate using the LR finder

The learning rate is one of the most sensitive hyperparameters in deep learning. Finding a good learning rate is the most important step towards a good model (one that neither overfits nor underfits).


In [79]:
def get_data(sz, bs):
    tfms = tfms_from_model(arch, sz, aug_tfms=transforms_top_down, max_zoom=1.00)
    data = ImageClassifierData.from_paths(path2, test_name = 'test', bs = bs,
                                          tfms = tfms)
    return data

In [80]:
arch=resnet18 
sz = 75  # because our image size is 75*75
bs = 32 # because default batch size of 64 was not giving good converging loss

In [81]:
data = get_data(sz, bs)

In [82]:
data = data.resize(int(sz*1.3), 'tmp')




In [83]:
learn = ConvLearner.pretrained(arch, data, precompute=False)

In [84]:
lrf = learn.lr_find()


 89%|████████▉ | 41/46 [00:04<00:00,  9.53it/s, loss=3.54] 
                                                          

In [85]:
learn.sched.plot_lr()



In [86]:
learn.sched.plot()


This plot is important for deciding on a good learning rate. We will not pick the learning rate at the lowest loss, which might sound confusing. The catch is that the learning rate is annealed during training, and the layers will later take different learning rates, so the rate we choose here is the maximum rate. We therefore choose a rate just before the curve bottoms out, where the loss is still falling.

The minimum is at 10^-1, so I choose 10^-2 as the learning rate.
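
A rough sketch of that selection rule on a synthetic loss curve (not the curve from this run): the learning rate grows exponentially, and the pick is one order of magnitude below the rate with the lowest recorded loss. The function pick_lr and the loss values are assumptions for illustration, not part of fastai.

```python
import numpy as np

def pick_lr(lrs, losses, factor=10.0):
    """Return the learning rate `factor` times smaller than the one at minimum loss."""
    best = lrs[int(np.argmin(losses))]
    return best / factor

# Learning rates swept exponentially from 1e-5 to 10, as an LR range test does.
lrs = np.logspace(-5, 1, num=61)

# Synthetic loss: falls until lr = 1e-1, then blows up.
log_lr = np.log10(lrs)
losses = np.where(log_lr <= -1,
                  1.0 - 0.1 * (log_lr + 5),
                  0.6 + 2.0 * (log_lr + 1) ** 2)

print(pick_lr(lrs, losses))  # roughly 1e-2 for this synthetic curve
```

In practice the curve is noisy, so reading the plot by eye, as done here, is often more reliable than a fixed rule.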

Stochastic gradient descent with restarts


In [87]:
lr = 0.01

number of cycles = 3 (with the settings below this gives 1 + 2 + 4 = 7 epochs)
cycle length = 1
cycle multiple = 2


In [88]:
learn.fit(lr, 3, cycle_len=1, cycle_mult=2) # first fit, with precompute=False


[ 0.       0.39146  0.384    0.80729]                      
[ 1.       0.49517  0.36757  0.82812]                      
[ 2.       0.47401  0.36057  0.82292]                      
[ 3.       0.46928  0.38627  0.84896]                      
[ 4.       0.47741  0.33864  0.84896]                      
[ 5.       0.45769  0.34478  0.82812]                      
[ 6.       0.44052  0.34427  0.83333]                      
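
The seven rows follow from the cycle settings: with cycle_len=1 and cycle_mult=2, three cycles last 1, 2 and 4 epochs. A minimal sketch of the epoch count and of a cosine-annealed rate that restarts each cycle; the helper names are hypothetical, and the cosine formula is the standard SGDR schedule rather than code taken from fastai:

```python
import math

def cycle_lengths(n_cycles, cycle_len=1, cycle_mult=2):
    """Epochs per cycle: cycle_len, cycle_len*mult, cycle_len*mult^2, ..."""
    return [cycle_len * cycle_mult ** i for i in range(n_cycles)]

def sgdr_lr(lr_max, progress):
    """Cosine annealing within one cycle; progress runs from 0 (restart) to 1 (end)."""
    return 0.5 * lr_max * (1 + math.cos(math.pi * progress))

lengths = cycle_lengths(3)
print(lengths, sum(lengths))  # [1, 2, 4] 7

# Rate at the start, middle and end of a cycle with lr_max = 0.01.
print([round(sgdr_lr(0.01, p), 4) for p in (0.0, 0.5, 1.0)])
```

Each restart jumps the rate back to its maximum, which is the sawtooth shape shown by the learning rate plot.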


In [89]:
# stochastic gradient descent with restarts: learning rate schedule
learn.sched.plot_lr()



In [90]:
learn.sched.plot_loss()


The loss decreases as the number of iterations increases.


In [ ]:
#learn.save('/home/ec2-user/data/data/processed/75_lastlayer_res34_bs28')

In [ ]:
#learn.load('/home/ec2-user/data/data/processed/75_lastlayer')

Fine-tuning and differential learning rates

The images we have lack the clear, sharp features that the top layers of our pretrained network might have learned. So let's unfreeze all the layers and train them as well.


In [91]:
lr = 0.01

In [92]:
learn.unfreeze()

In [93]:
lrs=np.array([lr/9,lr/3,lr]) # for larger images, use lr/100 and lr/10 instead
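
The three values in lrs apply to three layer groups, with the smallest rate for the earliest layers and the full rate for the last group. A sketch of how such an array can be built for a given ratio between neighbouring groups; group_lrs is a hypothetical helper, and the group names are only labels for illustration:

```python
import numpy as np

def group_lrs(lr, ratio, n_groups=3):
    """Per-group learning rates: the last group gets `lr`, earlier ones shrink by `ratio`."""
    return np.array([lr / ratio ** (n_groups - 1 - i) for i in range(n_groups)])

lr = 0.01
small_images = group_lrs(lr, ratio=3)    # lr/9, lr/3, lr (used here)
large_images = group_lrs(lr, ratio=10)   # lr/100, lr/10, lr

names = ['early layers', 'middle layers', 'last layers']
for name, a, b in zip(names, small_images, large_images):
    print(f'{name:14s} {a:.5f} {b:.5f}')
```

A gentler ratio lets the early layers change more, which suits images that look little like the ImageNet photos the network was pretrained on.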

In [94]:
learn.fit(lrs, 3, cycle_len=1, cycle_mult=2)


[ 0.       0.29903  0.40509  0.8125 ]                      
[ 1.       0.41049  0.50269  0.68229]                      
[ 2.       0.4371   0.3768   0.81771]                      
[ 3.       0.43599  0.35945  0.83333]                      
[ 4.       0.42319  0.26291  0.90104]                      
[ 5.       0.3668   0.27012  0.90104]                      
[ 6.       0.32801  0.26606  0.90625]                      


In [95]:
learn.sched.plot_loss()



In [96]:
! ls /home/ec2-user/data/


data	   fastai	    submissions  test.json.7z
DL models  pred_feather_rf  test	 train.json.7z

In [ ]:
#learn.save('/home/ec2-user/data/data/processed/75_all_res18_bs24_4') # save learn

Train again

Now let's train again with the same learning rate until we start overfitting.

I ran the code below 2-3 times to reach the best validation score.


In [ ]:
learn.fit(lr, 3, cycle_len=1, cycle_mult=2) # further training; precompute was false

In [ ]:
learn.sched.plot_loss()

In [ ]:
#learn.save('/home/ec2-user/data/data/processed/75_all_res18_bs32_2') # save learn

In [ ]:
#learn.load('/home/ec2-user/data/data/processed/75_all_res18_bs32_2')

In [104]:
# to check validation accuracy

log_preds,y = learn.TTA()
accuracy(log_preds,y)

TTA (Test Time Augmentation) makes predictions not just on the images in your validation set, but also on a number of randomly augmented versions of them, and averages the results.
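
A toy sketch of the idea, not fastai's implementation: a stand-in model scores an image, and the final prediction averages the scores for the original and a few flipped copies. The model function and the choice of flips are assumptions made for illustration.

```python
import numpy as np

def model(img):
    """Stand-in classifier: returns a pseudo-probability from the image's left half."""
    return float(img[:, : img.shape[1] // 2].mean())

def tta_predict(img, augmentations):
    """Average the model output over the original image and each augmented copy."""
    versions = [img] + [aug(img) for aug in augmentations]
    return float(np.mean([model(v) for v in versions]))

rng = np.random.default_rng(0)
img = rng.uniform(0.0, 1.0, size=(75, 75))

augs = [np.fliplr, np.flipud, lambda x: np.rot90(x, 2)]
print(model(img), tta_predict(img, augs))
```

Averaging over several views tends to smooth out predictions that depend on the orientation of a single image.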

Predictions


In [98]:
# from here we know that 'iceberg' is label 0 and 'ship' is label 1.
data.classes


Out[98]:
['iceberg', 'ship']

In [99]:
# this gives predictions for the test set. Predictions are in log scale
log_preds = learn.TTA(is_test=True) # with TTA
#log_preds = learn.predict(is_test=True) # without TTA
log_preds[0].shape


Out[99]:
(8424, 2)

In [100]:
probs_submit1 = np.exp(log_preds[0][:,0]) # column 0 is 'iceberg', so this is pr(iceberg)
probs_submit1[:10]


Out[100]:
array([ 0.92662,  0.20084,  0.04377,  0.03495,  0.00351,  0.00174,  0.00412,  0.98356,  0.03753,  0.88934], dtype=float32)
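
The predictions are log-probabilities, so np.exp recovers probabilities and each row sums to 1. A small check with made-up scores; the log_softmax helper is written out for illustration and is an assumption about how the outputs are produced:

```python
import numpy as np

def log_softmax(scores):
    """Row-wise log-softmax, computed stably by subtracting the row maximum."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

# Made-up raw scores for three images and two classes: ['iceberg', 'ship'].
scores = np.array([[2.0, -0.5], [-1.0, 0.4], [0.0, 0.0]])
log_preds = log_softmax(scores)

probs = np.exp(log_preds)
p_iceberg = probs[:, 0]       # column 0 is 'iceberg'

print(probs.sum(axis=1))      # each row sums to 1
print(p_iceberg)
```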

In [101]:
# getting file names from the test set
id_raw = data.test_dl.dataset.fnames
id_raw[1]


# use a regex to take the numeric index from each file name
id_pro = np.array([int(re.findall(r'\d+', os.path.basename(f))[0]) for f in id_raw])

In [102]:
id_pro_list = []
for i in id_pro:
    id_pro_list.append(int(i))
id_pro_list[:4]


Out[102]:
[3734, 1584, 6952, 2286]

In [103]:
# joining id and probability
d = {'index': id_pro_list, 'is_iceberg': probs_submit1}
submit1_df = pd.DataFrame(data=d)

In [ ]:
id_ = test['id']
id_pd = pd.DataFrame({'id': id_} )

In [ ]:
submit1_df_sorted = submit1_df.sort_values('index')
submit1_df_sorted2 = pd.concat([id_, submit1_df_sorted.set_index('index')], axis = 1)
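
The test files are not listed in their original row order, so the predictions have to be matched back to the rows by their numeric index before the Kaggle ids are attached. A minimal sketch with invented ids and probabilities:

```python
import pandas as pd

# Invented stand-ins: ids in original row order, predictions in file-listing order.
ids = pd.Series(['aaa', 'bbb', 'ccc', 'ddd'], name='id')
preds = pd.DataFrame({'index': [2, 0, 3, 1],
                      'is_iceberg': [0.9, 0.1, 0.4, 0.7]})

# Index the predictions by row number; concat along columns aligns on that index.
aligned = pd.concat([ids, preds.set_index('index').sort_index()], axis=1)

print(aligned)
```

Because the concat aligns on the index, each id ends up next to the prediction for its own image rather than the one in the same list position.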

In [ ]:
submit1_df_sorted2.dtypes

In [ ]:
submit1_df.dtypes

In [ ]:
#submit1_df.id = submit1_df.id.astype(str)

In [ ]:
submit1_df_sorted2.to_csv('/home/ec2-user/data/submissions/submit6.csv', index = False)

In [ ]:
! head -5 /home/ec2-user/data/submissions/submit6.csv

In [ ]:
submit_check = pd.read_csv("/home/ec2-user/data/submissions/submit6.csv")

In [ ]:
#submit_check.dtypes

In [ ]:
submit_check[:10]

Copy the submission file to the local machine with scp, then upload it to Kaggle.

Post analysis

Looking at correctly and incorrectly classified images

Analyzing results: looking at pictures

As well as looking at the overall metrics, it's also a good idea to look at examples of each of:

  • A few correct labels at random
  • A few incorrect labels at random
  • The most correct labels of each class (ie those with highest probability that are correct)
  • The most incorrect labels of each class (ie those with highest probability that are incorrect)
  • The most uncertain labels (ie those with probability closest to 0.5).

In [158]:
log_pred1 = learn.TTA() # If need TTA

preds = np.argmax(log_pred1[0], axis=1)  # from log probabilities to 0 or 1
probs = np.exp(log_pred1[0][:,1])        # pr(ship)

#probs = submit_check['is_iceberg']
#submit_check['C'] = np.where(submit_check['is_iceberg'] >= 0.5,1, 0)
#preds = np.array(submit_check['C'])

In [159]:
def rand_by_mask(mask): return np.random.choice(np.where(mask)[0], 4, replace=False)
def rand_by_correct(is_correct): return rand_by_mask((preds == data.val_y)==is_correct)

In [160]:
def plot_val_with_title(idxs, title):
    imgs = np.stack([data.val_ds[x][0] for x in idxs])
    title_probs = [probs[x] for x in idxs]
    print(title)
    return plots(data.val_ds.denorm(imgs), rows=1, titles=title_probs)

In [161]:
def plots(ims, figsize=(12,6), rows=1, titles=None):
    f = plt.figure(figsize=figsize)
    for i in range(len(ims)):
        sp = f.add_subplot(rows, len(ims)//rows, i+1)
        sp.axis('Off')
        if titles is not None: sp.set_title(titles[i], fontsize=16)
        plt.imshow(ims[i])

In [162]:
def load_img_id(ds, idx): return np.array(PIL.Image.open(path2+ds.fnames[idx]))

def plot_val_with_title(idxs, title):
    imgs = [load_img_id(data.val_ds,x) for x in idxs]
    title_probs = [probs[x] for x in idxs]
    print(title)
    return plots(imgs, rows=1, titles=title_probs, figsize=(16,8))

In [163]:
# 1. A few correct labels at random
plot_val_with_title(rand_by_correct(True), "Correctly classified")


Correctly classified

In [164]:
# 2. A few incorrect labels at random
plot_val_with_title(rand_by_correct(False), "Incorrectly classified")


Incorrectly classified

In [112]:
def most_by_mask(mask, mult):
    idxs = np.where(mask)[0]
    return idxs[np.argsort(mult * probs[idxs])[:4]]

def most_by_correct(y, is_correct): 
    mult = -1 if (y==1)==is_correct else 1
    # parenthesise the comparison: & binds more tightly than ==
    return most_by_mask(((preds == data.val_y)==is_correct) & (data.val_y == y), mult)

In [165]:
plot_val_with_title(most_by_correct(0, True), "Most correct icebergs")


Most correct icebergs

In [166]:
plot_val_with_title(most_by_correct(1, True), "Most correct ships")


Most correct ships
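
The last item on the checklist given earlier, the most uncertain predictions, is not covered by the cells in this notebook. A sketch under the assumption that probs holds pr(ship) for the validation set, shown here with made-up values (most_uncertain is a hypothetical helper):

```python
import numpy as np

def most_uncertain(probs, n=4):
    """Indices of the n predictions whose probability is closest to 0.5."""
    return np.argsort(np.abs(probs - 0.5))[:n]

# Made-up probabilities standing in for the validation-set pr(ship).
probs = np.array([0.98, 0.52, 0.03, 0.45, 0.61, 0.50, 0.89])

idxs = most_uncertain(probs)
print(idxs, probs[idxs])
```

With the notebook's own variables, the result could be passed to plot_val_with_title in the same way as the other index selections.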
