This is a series of posts about deep learning: not how to classify Fashion MNIST, but how to use the science and its tools. I will discuss frameworks, architecture design, problem solving, and a bunch of flash notes for the things we forget about, alas, we are not machines.
Deep learning frameworks are quite interesting because they are a hardcore feat of engineering: they have to provide cross-platform software, ultra-fast computation, numerical correctness and, most of all, a Python interface.
They come in two varieties, dynamic and static. I like to separate them using another metric, UX: there are those that can be used, and those that can't. Usage = (Time to Solve) - (Time to Fight Tool).
Setting aside this emacs-vs-vim comedy, since it's all about user preference (but really, TensorFlow?), let's try to decipher how a framework is built, how it can be used, and how to go from architecture to code.
P.S.: I didn't mention Keras because Keras is actually the knife, compared to TensorFlow the rusty chainsaw or PyTorch the scalpel.
Deep Learning is trying to approximate an unknown function using a set of examples .
Deep learning builds models that approximate a function by learning representations of it and trying to generalize from them. Learning representations is what happens when the weights of layers or neurons are optimized. The linear operation $Y = W*X + b$ can only learn linear relations, whereas introducing a new component, the activation function, adds non-linearities to the learning process, e.g. $Y = Z(W*X + b) = \max(0, W*X + b)$.
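To make that concrete, here is a minimal NumPy sketch of a single layer; the shapes and random values are made up purely for illustration.

import numpy as np

# a toy batch of 4 inputs with 3 features each
X = np.random.randn(4, 3)
W = np.random.randn(3, 2)   # weights: 3 input features -> 2 output units
b = np.zeros(2)             # bias

linear = X @ W + b                  # Y = W*X + b : can only capture linear relations
activated = np.maximum(0, linear)   # Y = max(0, W*X + b) : ReLU adds the non-linearity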
Representations, a.k.a. features, are the characteristics that describe your input data; features are essentially random variables, and engineering features means constructing meaningful characteristics for your inputs. Good features are essential for successful and easier learning. Deep neural networks have the ability to learn good features by training: the successive stacking of layers acts like a representation filter that tries to learn good representations to hand to the output layer, for example a softmax that acts as a classifier.
The hidden layers act like a feature engineering pipeline that does automatically, and maybe better (ConvNets), what used to be a manual, domain-driven task.
Layers such as the convolutional layer are efficient representation learners that pick up small patterns in parts of images. Mathematically, images are tensors, but more importantly images have a visual structure: a flattened image would be hard to understand, whereas a normal image can tell a thousand words. The convolution operation, which essentially scans a tensor and multiplies it by a filter (or kernel), learns a specific representation from each part while keeping the structure of the image intact; "the nose is in the center" becomes a learnable representation (more on this another time).
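As a rough illustration of that idea (the layer sizes below are arbitrary, not a recommendation), a small convolutional stack in Keras could look like this:

from keras import models, layers

cnn = models.Sequential()
# each Conv2D filter scans the image and learns a local pattern
# while keeping the 2D structure of the input intact
cnn.add(layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)))
cnn.add(layers.MaxPooling2D((2, 2)))
cnn.add(layers.Conv2D(64, (3, 3), activation="relu"))
cnn.add(layers.MaxPooling2D((2, 2)))
# only at the very end do we flatten and hand the learned features to a classifier
cnn.add(layers.Flatten())
cnn.add(layers.Dense(10, activation="softmax"))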
N.B :
Keras is a modular wrapper around TensorFlow; it's the actual reason TensorFlow is used by so many people*. Keras lets you build models brick by brick in the literal sense (Sequential) or by telekinesis (Functional API <3).
Keras provides a highly friendly API to turn any architecture you have in mind into code, and to train and test it in the same place.
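For instance, here is a sketch of a small MNIST-style classifier written with the Functional API (the layer sizes are arbitrary):

from keras import models, layers

inputs = layers.Input(shape=(28*28,))
x = layers.Dense(256, activation="relu")(inputs)
x = layers.Dense(128, activation="relu")(x)
outputs = layers.Dense(10, activation="softmax")(x)

functional_model = models.Model(inputs=inputs, outputs=outputs)
functional_model.compile(optimizer="adam",
                         loss="categorical_crossentropy",
                         metrics=["accuracy"])

The Functional API pays off once you need multiple inputs or outputs, or layers that branch and merge, which Sequential cannot express.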
Deep learning requires some tools. First you have to design the network architecture: whether you are using fully-connected layers or a series of Conv -> MaxPool blocks, you need to have in mind a way to approach the problem. Generally, as a rule of thumb, we have these heuristics:
N.B :
Next you need to preprocess your data; as you may know, NNs love small values around (0, 1), so you will often have to standardize your dataset.
You'll also need to understand loss functions, because different problems need different loss functions and the choice may affect your convergence (see the sketch after this list).
And a GPU, or even better a TPU.
I love Google
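Coming back to loss functions, here is a rough sketch of how the choice follows the problem; the architectures and sizes below are made up for illustration: mean squared error for regression, categorical cross-entropy for multi-class classification.

from keras import models, layers

# regression: a single linear output unit and mean squared error
regressor = models.Sequential([
    layers.Dense(64, activation="relu", input_shape=(10,)),
    layers.Dense(1)])
regressor.compile(optimizer="adam", loss="mse")

# multi-class classification: softmax output and categorical cross-entropy
classifier = models.Sequential([
    layers.Dense(64, activation="relu", input_shape=(10,)),
    layers.Dense(3, activation="softmax")])
classifier.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])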
At this point you are ready to train your neural network and watch it reach 99% accuracy on MNIST .
In [0]:
# KERAS SEQUENTIAL EXAMPLE FOR CLASSIFICATION
from keras import models,layers,datasets
In [2]:
(x_train,y_train),(x_test,y_test) = datasets.mnist.load_data()
In [0]:
import numpy as np
In [0]:
from keras import utils
In [0]:
y_test = utils.to_categorical(y_test)
y_train = utils.to_categorical(y_train)
In [26]:
# NORMALIZE THE DATA
def normalizer(x):
    x = x.reshape((x.shape[0], 28*28))
    x = x.astype('float32')
    x -= x.mean()
    x /= x.std()
    return x

x_train = normalizer(x_train)
x_test = normalizer(x_test)
# BUILD AN MLP BY STACKING LAYER OVER LAYER
model = models.Sequential()
model.add(layers.Dense(512,activation="relu",input_shape=(28*28,)))
model.add(layers.Dense(396,activation="relu"))
model.add(layers.Dense(256,activation="relu"))
model.add(layers.Dense(128,activation="elu"))
model.add(layers.Dense(64,activation="elu"))
model.add(layers.Dense(32,activation="elu"))
model.add(layers.Dense(10,activation="softmax"))
# COMPILE THE MODEL
model.compile(optimizer="adam",loss="categorical_crossentropy",metrics=["accuracy"])
# FIT THE MODEL
model.fit(x_train,y_train,epochs=10,batch_size=32,validation_data=(x_test,y_test))
Out[26]:
PyTorch is essentially a library to build and train deep neural nets that also serves as a NumPy-on-GPU library.
PyTorch gives you modules (optim, nn, torchvision) that can be used together to write your model as code, and computations are executed dynamically (no graph compilation as in TensorFlow).
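A tiny sketch of the "NumPy on GPU" and eager-execution side (the .cuda() calls assume a GPU is available):

import torch

a = torch.randn(3, 4)   # a tensor, very much like a NumPy array
b = torch.randn(4, 2)
c = a @ b               # runs eagerly, no graph compilation step

if torch.cuda.is_available():
    # the same computation, moved to the GPU
    c = a.cuda() @ b.cuda()

print(c.shape)          # torch.Size([3, 2])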
Let me show you an example similar to what we just did with Keras
In [31]:
# http://pytorch.org/
from os import path
from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag
platform = '{}{}-{}'.format(get_abbr_impl(), get_impl_ver(), get_abi_tag())
accelerator = 'cu80' if path.exists('/opt/bin/nvidia-smi') else 'cpu'
!pip install -q http://download.pytorch.org/whl/{accelerator}/torch-0.4.0-{platform}-linux_x86_64.whl torchvision
import torch
In [0]:
import torch.nn as nn
import torch.nn.functional as F
In [0]:
class NeuralNet(nn.Module):
    def __init__(self):
        super(NeuralNet, self).__init__()
        # Linear is the affine transformation y = w*x + b
        self.fc1 = nn.Linear(784, 512)
        self.fc2 = nn.Linear(512, 396)
        self.fc3 = nn.Linear(396, 256)
        self.fc4 = nn.Linear(256, 128)
        self.fc5 = nn.Linear(128, 64)
        self.fc6 = nn.Linear(64, 32)
        self.fc7 = nn.Linear(32, 10)

    def forward(self, x):
        # The forward pass is what happens from layer to layer, in other words
        # the flow of inputs through the network.
        # Here we describe the activations applied after each linear layer.
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.relu(self.fc4(x))
        x = F.elu(self.fc5(x))
        x = F.elu(self.fc6(x))
        x = self.fc7(x)
        return x
In [35]:
net = NeuralNet()
print(net)
Now we have built a NeuralNet class with a specific architecture; as you may notice, you could create a NeuralNet factory that generates different models based on its inputs, but let's keep that for later.
Let's define a train function that trains a Neural Network
P.S : PyTorch deduces and executes the backward pass (backpropagation) from the forward operation .
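A tiny sketch of that idea on a single scalar (separate from the model above):

import torch

w = torch.tensor(2.0, requires_grad=True)  # a parameter we want gradients for
x = torch.tensor(3.0)

y = w * x + 1    # forward pass: y = 7
y.backward()     # autograd derives and runs the backward pass from the forward ops

print(w.grad)    # dy/dw = x = 3.0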
In [0]:
from torchvision import datasets, transforms, utils
In [0]:
normalizer = transforms.Compose([transforms.ToTensor(),transforms.Normalize((0.5,), (1.0,))])
# load dataset
train_set = datasets.MNIST(root='./data', train=True, transform=normalizer, download=True)
test_set = datasets.MNIST(root='./data', train=False, transform=normalizer, download=True)
batch_size = 100
train_loader = torch.utils.data.DataLoader(
    dataset=train_set,
    batch_size=batch_size,
    shuffle=True)
test_loader = torch.utils.data.DataLoader(
    dataset=test_set,
    batch_size=batch_size,
    shuffle=False)
Now that our data loaders (PyTorch data generators) are created, we can move on to implementing a train function; this could have been a method of the NeuralNet defined above, or just a standalone function as we do here.
In [0]:
# loss function: categorical cross entropy
criterion = nn.CrossEntropyLoss()
# optimizer
optimizer = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

def train(epochs):
    for epoch in range(epochs):
        # training
        running_loss = 0.0
        for batch_idx, (x, y) in enumerate(train_loader):
            optimizer.zero_grad()
            x = x.view(-1, 28*28)  # flatten the images into 784-dim vectors
            out = net(x)
            loss = criterion(out, y)
            loss.backward()
            optimizer.step()
            # exponential running average of the loss, for logging only
            running_loss = running_loss * 0.9 + loss.item() * 0.1
            if (batch_idx+1) % 100 == 0 or (batch_idx+1) == len(train_loader):
                print('==>>> epoch: {} , loss : {:.4f}'.format(epoch, running_loss))
In [46]:
train(10)
In [0]:
def test():
    net.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():  # no gradients needed at evaluation time
        for data, target in test_loader:
            data = data.view(-1, 28*28)  # view is equivalent to np.reshape
            output = net(data)
            # the network outputs raw logits, so use cross entropy here as well
            test_loss += criterion(output, target).item()
            pred = output.data.max(1)[1]  # get the index of the max logit
            correct += pred.eq(target).sum().item()
    test_loss /= len(test_loader)  # the loss already averages over the batch
    accuracy = 100. * correct / len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset), accuracy))
In [56]:
test()