This is the first part of a demo on the reduced STL-10 dataset taken from Stanford's UFLDL tutorial. The original STL-10 dataset has 10 classes (airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck). This reduced dataset contains the images from 4 of the classes (airplane, car, cat, dog).
In this first part of the demo we are going to train a sparse autoencoder on randomly selected small patches of the original images. There are 2,000 training images of size 64x64x3 (width x height x 3 RGB channels), and from that set 100,000 "patches" of size 8x8x3 have been randomly selected. The aim is to use the learned weights as pre-trained weights for a convolutional layer. See the Stanford tutorial for a thorough description of a sparse autoencoder.
In the second part of the demo these weights will be used to initialise a convolutional neural network.
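The patches come pre-extracted with the reduced dataset, but to make the setup concrete, here is a minimal sketch of how such random patch sampling might look. The sample_patches helper and the h x w x c x n image array are hypothetical, not part of Alice:

# Hypothetical sketch (not part of the demo): sample random patches from
# an h x w x c x n array of images
function sample_patches(images, num_patches, patch_size)
    h, w, c, n = size(images)
    patches = zeros(patch_size * patch_size * c, num_patches)
    for i in 1:num_patches
        img = rand(1:n)                    # pick a random image
        r = rand(1:(h - patch_size + 1))   # random top-left corner
        s = rand(1:(w - patch_size + 1))
        patch = images[r:(r + patch_size - 1), s:(s + patch_size - 1), :, img]
        patches[:, i] = vec(patch)         # flatten to a column vector
    end
    return patches
end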
In [1]:
using Alice
In [2]:
Base.Threads.nthreads()
Out[2]:
In [3]:
patches = load_patches()                # each column is one flattened 8x8x3 patch
num_feats, num_patches = size(patches)  # 192 features x 100,000 patches
Out[3]:
In [4]:
display_rgb_cols(patches[:, 1:450], scale = 3)
Out[4]:
Whitening is a pre-processing step with similar steps to PCA (Principal Components Analysis). But instead of being used to reduce the number of dimensions, it is used to make the inputs less redundant and more suitable for learning. From the Stanford tutorial:
If we are training on images, the raw input is redundant, since adjacent pixel values are highly correlated. The goal of whitening is to make the input less redundant; more formally, our desiderata are that our learning algorithm sees a training input where (i) the features are less correlated with each other, and (ii) the features all have the same variance.
See the Stanford tutorial's page on whitening for a thorough description, which explains the steps below.
In [5]:
# Subtract mean patch
mean_patch = mean(patches, 2)
patches .-= mean_patch
# Apply ZCA whitening
ϵ = 0.1                                    # regularisation term for small eigenvalues
sigma = patches * patches' ./ num_patches  # covariance of the zero-mean patches
U, S, V = svd(sigma)
ZCAWhite = U * diagm(1 ./ sqrt(S .+ ϵ)) * U'
patches = ZCAWhite * patches;
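As an optional sanity check (not in the original notebook): with this construction the covariance of the whitened patches is $U \, \mathrm{diag}(S_i / (S_i + \epsilon)) \, U'$, so high-variance directions end up with roughly unit variance, while directions with eigenvalues well below ϵ are damped towards zero:

# Optional sanity check: covariance of the whitened patches
cov_white = patches * patches' ./ num_patches
extrema(diag(cov_white))   # variances lie in (0, 1], close to 1 for strong directions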
In [6]:
# Set seed to be able to replicate
srand(123)
# Data Box and Input Layer - the patches are both the input and the target
# Note that for a sparse autoencoder we must use full batch training (i.e. batch_size = num_patches)
databox = Data(patches, patches)
batch_size = num_patches
input = InputLayer(databox, batch_size)
# Sparse Encoder
num_hidden = 400 # number of hidden layer neurons
ρ = 0.035 # desired average activation of the hidden units
β = 5.0 # weight of sparsity penalty term
encoder = SparseEncoderLayer(size(input), num_hidden, ρ, β, activation = :logistic)
# Linear Output Layer - reconstructs the (whitened) patches from the hidden activations
output = MultiLinearOutputLayer(databox, size(encoder))
# Model
λ = 3e-3 # weight decay parameter
sparse_auto_encoder = NeuralNet(databox, [input, encoder, output], λ, regularisation = :L2)
Out[6]:
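For reference, the sparsity penalty that ρ and β control is the KL-divergence term described in the Stanford tutorial. If $\hat{\rho}_j$ is the average activation of hidden unit $j$ over the training set, the penalty added to the cost is
$$\beta \sum_{j=1}^{400} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j) \hspace{0.8cm} \text{where} \hspace{0.8cm} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j) = \rho \log\frac{\rho}{\hat{\rho}_j} + (1 - \rho)\log\frac{1 - \rho}{1 - \hat{\rho}_j}$$
Because $\hat{\rho}_j$ is an average over the whole training set, the penalty (and its gradient) needs all the training examples at once, which is why full batch training is used above.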
In [7]:
check_gradients(sparse_auto_encoder)
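check_gradients numerically verifies the analytic gradients, presumably via the standard centred finite-difference check from the tutorial (the exact implementation is Alice's; this is just the idea): for each parameter $\theta_i$,
$$\frac{\partial J}{\partial \theta_i} \approx \frac{J(\theta + \epsilon e_i) - J(\theta - \epsilon e_i)}{2\epsilon}$$
for a small $\epsilon$ (e.g. $10^{-4}$), and the analytic and numerical values should agree to several significant figures.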
Because we are using full batch training we can use the train_nlopt function, which calls the NLopt package, a Julia wrapper for the NLopt library of nonlinear optimisation routines. The default algorithm is :LD_LBFGS (low-storage BFGS), but you can choose any of the supported algorithms via the algorithm keyword argument. E.g. we could call:
train_nlopt(sparse_auto_encoder, maxiter = 400, algorithm = :LD_TNEWTON)
to run the optimisation using the truncated Newton algorithm.
In [8]:
train_nlopt(sparse_auto_encoder, maxiter=400);
In [9]:
display_rgb_weights(encoder.W', scale=3)
Out[9]:
We pre-processed the 8x8 image patches by subtracting the mean patch and then multiplying by the whitening matrix ZCAWhite.
When we apply these learned weights in our final neural network we will need to pre-process the images in the same way, i.e. we need the following for each of the patch inputs into the activation function ($T$ is the whitening matrix and $\bar{x}$ is the mean patch):
$$Z = W\,T(x - \bar{x}) + b$$
But, expanding, we see we can also fold the pre-processing into the weights and bias:
$$Z = WTx - WT\,\bar{x} + b$$
I.e. we can set
$$\tilde{W} = WT \hspace{0.8cm} \text{and} \hspace{0.8cm} \tilde{b} = b - WT\,\bar{x}$$
and use these adjusted weights and bias on the raw images without any further pre-processing.
In [ ]:
W = encoder.W * ZCAWhite
b = encoder.b .- squeeze(W * mean_patch, 2);
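As a quick optional check (raw_patch below is a hypothetical stand-in for an unprocessed patch, and encoder.b is assumed to be a vector), the folded parameters should reproduce the pre-activations of the explicit pipeline:

# Optional check: folded parameters vs. the explicit whitening pipeline
raw_patch = rand(num_feats)        # hypothetical unprocessed patch vector
z_folded = W * raw_patch + b
z_pipeline = encoder.W * (ZCAWhite * (raw_patch - vec(mean_patch))) + encoder.b
maximum(abs.(z_folded - z_pipeline))   # should be near machine precision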
In [9]:
using JLD
save("C:/...mypath.../stl_features.jld", "W", W, "b", b);