FAI1 Practical Deep Learning I | 11 May 2017 | Wayne Nixalo

In this notebook I'll be building a simple linear model in Keras using Sequential()

Tutorial on Linear Model for MNIST: linky

Keras.io doc on .fit_generator & Sequential

Some Notes:

It looks like I'll need to use Pandas to work with data in .csv files (MNIST from Kaggle comes that way). For datasets that come in as folders of .jpegs, I'll use the get_batches & get_data approach shown in class ... but then, if that's what a deep NN needs, wouldn't I have to do the same for this as well? Will see.
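For reference, the folder-based route from class looks roughly like this. It's a sketch of utils.py's get_data written from memory, so treat the exact signature and defaults as assumptions; it leans on the get_batches helper and the numpy import defined in the cells below.

# Rough sketch (from memory) of utils.py's get_data: load a whole directory
# of images into one NumPy array by draining a non-shuffled batch generator.
def get_data(path, target_size=(224,224)):
    batches = get_batches(path, shuffle=False, batch_size=1,
                          class_mode=None, target_size=target_size)
    return np.concatenate([batches.next() for i in range(batches.nb_sample)])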


In [1]:
# Import relevant libraries
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD, RMSprop
from keras.preprocessing import image
import numpy as np
import os


Using Theano backend.

In [2]:
# Data functions ~ mostly from utils.py or vgg16.py
def get_batches(dirname, gen=image.ImageDataGenerator(), shuffle=True, batch_size=4, class_mode='categorical',
                 target_size=(224,224)):
    return gen.flow_from_directory(dirname, target_size=target_size,
            class_mode=class_mode, shuffle=shuffle, batch_size=batch_size)

# from keras.utils.np_utils import to_categorical
# def onehot(x): return to_categorical(x, num_classes=10)

from keras.utils.np_utils import to_categorical
def onehot(x): return to_categorical(x)

# from sklearn.preprocessing import OneHotEncoder
# def onehot(x): return np.array(OneHotEncoder().fit_transform(x.reshape(-1,1)).todense())
    
# import bcolz
# def save_data(fname, array): c=bcolz.carray(array, rootdir=fname, mode='w'); c.flush()
# def load_data(fname): return bcolz.open(fname)[:]

In [3]:
# Some setup
path = 'L2HW_data/'
if not os.path.exists(path): os.mkdir(path)

In [4]:
# Getting Data

# val_batches = get_batches(path+'valid/', shuffle=False, batch_size=1)
# trn_batches = get_batches(path+'train/', shuffle=False, batch_size=1)

# converting classes to OneHot for Keras
# val_classes = val_batches.classes
# trn_classes = trn_batches.classes
# val_labels = onehot(val_classes)
# trn_labels = onehot(trn_classes)

# See: https://www.kaggle.com/fchollet/simple-deep-mlp-with-keras/code/code
# for help loading
# I haven't learned how to batch-load .csv files; I'll blow that bridge 
#     when I get to it.
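# (Aside, a sketch for later: pandas can stream a big .csv in chunks via
#  the chunksize argument, e.g.
#      for chunk in pd.read_csv(path + 'train.csv', chunksize=4096): ...
#  where each chunk is a DataFrame of up to 4096 rows. Not used here.)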

# read data
import pandas as pd
trn_data = pd.read_csv(path + 'train.csv')
trn_labels = trn_data.ix[:,0].values.astype('int32')
trn_input = (trn_data.ix[:,1:].values).astype('float32')
test_input = (pd.read_csv(path + 'test.csv').values).astype('float32')

# one-hot encode labels
trn_labels = onehot(trn_labels)

input_dim = trn_input.shape[1]
nb_classes = trn_labels.shape[1]

In [5]:
# To show how we'd know what the input dimensions should be without researching MNIST:
print(trn_input.shape)


(42000, 784)

That above is 42,000 images by 784 pixels each: the usual 28x28 (= 784) pixel MNIST images, flattened. We can do the same to take a look at the output, which, not surprisingly, is one of the 10 possible digits.


In [6]:
print(trn_labels.shape)


(42000, 10)
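As a quick visual sanity check on the pixel side, we could reshape a row back into a 28x28 grid and plot it. A minimal sketch, assuming matplotlib is available (not run in this notebook):

import matplotlib.pyplot as plt

# reshape the first flattened row (784 values) back into a 28x28 image
img = trn_input[0].reshape(28, 28)
plt.imshow(img, cmap='gray')                       # raw 0-255 pixel values
plt.title('label: %d' % np.argmax(trn_labels[0]))  # labels are one-hot by now
plt.show()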

In [21]:
# I/O Dimensions: determined by data/categories/network
# Output_Cols = 10
# input_dim  = 784 # for 1st layer only: rest do auto-shape-inference

# Hyperparameters
LR = 0.1
optz = SGD(lr=LR)
# optz = RMSprop(lr=LR)
lossFn = 'mse'
# lossFn = 'categorical_crossentropy'
metric=['accuracy']
# metrics=None

LM = Sequential( [Dense(nb_classes, input_shape=(input_dim,))] )
# LM.compile(optimizer = optz, loss = lossFn, metrics = metric)
LM.compile(optimizer=SGD(lr=0.1), loss='categorical_crossentropy', metrics=['accuracy'])
# lm.compile(optimizer=RMSprop(lr=0.1), loss='categorical_crossentropy', metrics=['accuracy'])

In [26]:
# Train the model on the data
LM.fit(trn_input, trn_labels, nb_epoch=5, batch_size = 4, verbose=1)


Epoch 1/5
42000/42000 [==============================] - 4s - loss: 11.4692 - acc: 0.1761
Epoch 2/5
42000/42000 [==============================] - 4s - loss: 11.4692 - acc: 0.1761
Epoch 3/5
42000/42000 [==============================] - 4s - loss: 11.4692 - acc: 0.1761
Epoch 4/5
42000/42000 [==============================] - 4s - loss: 11.4692 - acc: 0.1761
Epoch 5/5
42000/42000 [==============================] - 4s - loss: 11.4692 - acc: 0.1761
Out[26]:
<keras.callbacks.History at 0x10c410390>

As I'd expect, a single perceptron layer doing a linear mapping from input to output performs.. poorly. But this is one of the first times I'm hand-coding this from scratch, and it's good to be in a place where I can start experimenting without spending the bulk of mental effort on just getting the machine to work.

After accidentally running more epochs without re-initializing the model, it seems to plateau at 0.1761 accuracy.

It'll be interesting to see how RMSprop compares to SGD, and mean-squared error vs categorical cross-entropy. More interesting is adding more layers, tweaking their activations, and trying different learning rates: constant or scheduled. Even more interesting will be stacking hidden layers and nonlinearities on top and turning this into a proper neural network.
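As a placeholder for the first of those comparisons, the RMSprop / cross-entropy variant is just a freshly built model with a different compile call. A minimal sketch; the LM_rms name and lr=0.01 are my own choices, and the fit line is left commented so it doesn't run here:

# Sketch: same single-layer model, but compiled with RMSprop and
# categorical cross-entropy. Rebuilding the model re-initializes the weights.
LM_rms = Sequential([Dense(nb_classes, input_shape=(input_dim,))])
LM_rms.compile(optimizer=RMSprop(lr=0.01),
               loss='categorical_crossentropy', metrics=['accuracy'])
# LM_rms.fit(trn_input, trn_labels, nb_epoch=5, batch_size=4, verbose=1)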

Below I'm getting the data separated into training and validation sets.


In [33]:
# # Turns out this cell was unnecessary
# import pandas as pd
# test_data = pd.read_csv(path + 'test.csv')
# # wait there are no labels that's the point..
# # test_labels = test_data.ix[:,0].values.astype('int32')
# test_input  = (test_data.ix[:,1:].values).astype('float32')

In [35]:
# print(test_data.shape)
# print(test_labels.shape)


(28000, 784)
(28000,)

Unfortunately I don't yet know how to separate a .csv file into a validation set. Wait, that sounds easy: take a random permutation, or, for the lazy, just take the first X inputs and labels from the training set and call that validation. Oh. Okay. Maybe I should do that.
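The random-permutation version would look something like this. A minimal sketch, not run here, since the cells below just slice off the first 2,000 rows instead:

# Sketch of a shuffled split: permute the row indices, then carve off
# the first 2000 rows for validation (not what's actually used below).
idx = np.random.permutation(len(trn_input))
val_input,    val_labels    = trn_input[idx[:2000]], trn_labels[idx[:2000]]
newtrn_input, newtrn_labels = trn_input[idx[2000:]], trn_labels[idx[2000:]]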


In [9]:
# # Wondering if a crash I had earlier was because One-Hotting did something to the labels..
# trn_data = pd.read_csv(path + 'train.csv')
# trn_labels = trn_data.ix[:,0].values.astype('int32')
# trn_input = (trn_data.ix[:,1:].values).astype('float32')


# trn_data has 42,000 elements. I'll take 2,000 for validation.
# val_data = trn_data[:2000]
# trn_2_data = trn_data[2000:]

# val_input  = val_data.ix[:,0].values.astype('int32')
# val_labels = (val_data.ix[:,1:].values).astype('float32')

# trn_2_input = trn_2_data.ix[:,0].values.astype('int32')
# trn_2_labels = (trn_2_data.ix[:,1:].values).astype('float32')
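# # (Looking at it now: input & labels are swapped above; column 0 is the
# #  label and columns 1: are the pixels. That's probably what went wrong.)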


# trying to do this in a way that doesn't kill the kernel
print(trn_data.shape)
print(trn_input.shape)
print(trn_labels.shape)


(42000, 785)
(42000, 784)
(42000, 10)

Okay, so.. training data is the number of images X (number of pixels + label). The label adds just 1 to the vector's length because it's a single digit {0..9}, giving 42k X 785..

The training input vector has the label column removed, so it's 42k images X 784 pixels..

The training labels vector is one-hot encoded into a length-10 vector, for each of the 42k images..

So... nothing is saying I can't just cut input & labels and separate those into new training and validation sets. Not sure why I was getting crashes the other way, but we'll see if this works (it should). Ooo, maybe I.. okay, maybe I made a mistake with leaving the labels on or something, before.


In [10]:
# Old
# print(val_data.shape, val_input.shape, val_labels.shape)
# print(trn_2_input.shape, trn_2_input.shape, trn_2_labels.shape)


((2000, 785), (2000,), (2000, 784))
((40000,), (40000,), (40000, 784))

In [11]:
val_input  = trn_input[:2000]
val_labels = trn_labels[:2000]

newtrn_input = trn_input[2000:]
newtrn_labels = trn_labels[2000:]

print(val_input.shape, val_labels.shape)
print(newtrn_input.shape, newtrn_labels.shape)


((2000, 784), (2000, 10))
((40000, 784), (40000, 10))

Okay, so we got those separated.. let's do the same thing as before: single Linear Model Perceptron Layer, but with a validation set to check against.


In [12]:
# The stuff above's One-Hotted.. so don't do it again..

# # I forgot to one-hot encode the labels after loading from disk
# val_labels = onehot(val_labels)
# trn_2_labels = onehot(trn_2_labels)
# print(val_labels.shape)
# print(trn_2_labels.shape)

In [17]:
LM = Sequential([Dense(nb_classes, input_dim=input_dim)])
LM.compile(optimizer='sgd', loss='mse', metrics=['accuracy'])
# LM = Sequential([Dense(nb_classes, activation='sigmoid', input_dim=input_dim)])

LM.fit(newtrn_input, newtrn_labels, nb_epoch=5, batch_size=4,
       validation_data=(val_input, val_labels))


Train on 40000 samples, validate on 2000 samples
Epoch 1/5
40000/40000 [==============================] - 4s - loss: nan - acc: 0.0983 - val_loss: nan - val_acc: 0.0980
Epoch 2/5
40000/40000 [==============================] - 4s - loss: nan - acc: 0.0984 - val_loss: nan - val_acc: 0.0980
Epoch 3/5
40000/40000 [==============================] - 4s - loss: nan - acc: 0.0984 - val_loss: nan - val_acc: 0.0980
Epoch 4/5
40000/40000 [==============================] - 4s - loss: nan - acc: 0.0984 - val_loss: nan - val_acc: 0.0980
Epoch 5/5
40000/40000 [==============================] - 5s - loss: nan - acc: 0.0984 - val_loss: nan - val_acc: 0.0980
Out[17]:
<keras.callbacks.History at 0x11b16a390>

In [29]:
LM2 = Sequential([Dense(nb_classes, activation='sigmoid', input_dim=input_dim)])
LM2.compile(optimizer='sgd', loss='mse', metrics=['accuracy'])
LM2.fit(trn_input, trn_labels, nb_epoch=5, batch_size=4, verbose=1)


Epoch 1/5
42000/42000 [==============================] - 4s - loss: 0.0983 - acc: 0.1568
Epoch 2/5
42000/42000 [==============================] - 5s - loss: 0.0933 - acc: 0.1843
Epoch 3/5
42000/42000 [==============================] - 4s - loss: 0.0930 - acc: 0.1920
Epoch 4/5
42000/42000 [==============================] - 4s - loss: 0.0928 - acc: 0.1877
Epoch 5/5
42000/42000 [==============================] - 5s - loss: 0.0925 - acc: 0.1981
Out[29]:
<keras.callbacks.History at 0x118ad3410>

Yeah, I'm not seeing a real difference between using and not using an activation function on just a single linear layer.

A Multilayer Perceptron (MLP) can be built by simply adding on layers. The big difference from the single-layer model above is the hidden layers and nonlinear activations in between; the training loop itself is the same, since Keras's .fit() already does a forward pass, backpropagation, and a weight update for every batch.


In [22]:
# this notebook will be a bit of a mess; the machine isn't the only one learning
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout

In [23]:
??Activation

In [24]:
# just here so I have them in front of me
input_dim = input_dim # 784
nb_classes = nb_classes # 10

MLP = Sequential()
MLP.add(Dense(112, input_dim=input_dim)) # I'll just set internal output to 4x28=112
MLP.add(Activation('sigmoid'))
# I don't know what to set the dropout rate to, but 0.5 is what I see on keras.io
MLP.add(Dropout(0.5))

# and now to add 3 more layers set to sigmoid
for layer in xrange(3):
    MLP.add(Dense(112))
    MLP.add(Activation('sigmoid'))
    MLP.add(Dropout(0.5))

# and our final layer
MLP.add(Dense(nb_classes, activation='softmax'))

# this will be a beautiful disaster
MLP.compile(loss='mse', optimizer='sgd', metrics=['accuracy'])

MLP.fit(newtrn_input, newtrn_labels, nb_epoch=5, batch_size=4,
        validation_data = (val_input, val_labels))


Train on 40000 samples, validate on 2000 samples
Epoch 1/5
40000/40000 [==============================] - 14s - loss: 0.0947 - acc: 0.1021 - val_loss: 0.0900 - val_acc: 0.1085
Epoch 2/5
40000/40000 [==============================] - 14s - loss: 0.0933 - acc: 0.1021 - val_loss: 0.0900 - val_acc: 0.1100
Epoch 3/5
40000/40000 [==============================] - 13s - loss: 0.0928 - acc: 0.1005 - val_loss: 0.0900 - val_acc: 0.1155
Epoch 4/5
40000/40000 [==============================] - 14s - loss: 0.0924 - acc: 0.1002 - val_loss: 0.0900 - val_acc: 0.1655
Epoch 5/5
40000/40000 [==============================] - 14s - loss: 0.0919 - acc: 0.1040 - val_loss: 0.0899 - val_acc: 0.1085
Out[24]:
<keras.callbacks.History at 0x11be078d0>

In [ ]: