In [1]:
nn = require 'nn';
Convolutions
A nice graphical depiction of kernels for convolutions is here. A convolution on an image slides a small matrix of weights (the kernel/filter) over the larger matrix (the image) and, at each position, takes a linear combination of the kernel with the patch of pixels under it. The result is a new matrix (the feature map), usually somewhat smaller than the input.
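To make the operation concrete, here is a minimal hand-rolled sketch (plain Torch tensor operations, not part of nn) that slides a 2x2 kernel over a 4x4 matrix with stride 1 and no padding; the nn modules below do the same thing, only with learned weights and many kernels at once.
In [ ]:
-- hand-rolled 2D convolution sketch: 2x2 kernel, stride 1, no padding
input  = torch.range(1, 16):resize(4, 4)  -- a 4x4 "image"
kernel = torch.Tensor{{1, 0}, {0, -1}}    -- a 2x2 kernel
out = torch.zeros(3, 3)                   -- (4 - 2)/1 + 1 = 3 in each direction
for i = 1, 3 do
    for j = 1, 3 do
        -- linear combination of the kernel with the patch under it
        out[i][j] = torch.cmul(input[{{i, i+1}, {j, j+1}}], kernel):sum()
    end
end
print(out) -- the 3x3 feature map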
In [38]:
module1 = nn.SpatialConvolutionMM(1, 16, 3, 3, 1,1, 1,1);
-- nn.SpatialConvolutionMM(nInputPlane, nOutputPlane, kW, kH, [dW], [dH], [padW], [padH])
Accepts input as either 3D or 4D. If 4D, the 1st dimension indexes the examples (this way you can pass a mini-batch of examples). If 3D, the 1st dimension is the number of channels. So the input is usually of this form:
batchSize X number of channels X height X width
or in 3D:
number of channels X height X width
The kW and kH parameters are the kernel width and height. dW and dH are the strides in each direction: the stride is the number of pixels the kernel moves between consecutive applications. padW and padH specify how much zero padding, if any, to add around the input.
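As a quick shape check (a sketch reusing the module1 defined above, on made-up input sizes), a 3D tensor is treated as a single image and a 4D tensor as a mini-batch:
In [ ]:
-- shape check for 3D (single image) vs 4D (mini-batch) input
single = torch.randn(1, 32, 32)        -- 1 channel, 32x32
batch  = torch.randn(8, 1, 32, 32)     -- mini-batch of 8 such images
print(module1:forward(single):size())  -- 16 x 32 x 32
print(module1:forward(batch):size())   -- 8 x 16 x 32 x 32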
In [3]:
require 'image';
lena = image.lena() -- the quintessential image in the CV community; image.fabio() is also available
In [4]:
--notice the slicing to get one channel
lena[{{1}}]:size()
Out[4]:
In [5]:
p = module1:forward(lena[{{1}}]);
In [6]:
p:size()
Out[6]:
How do you derive dimensions of the output image?
Formula:
owidth = floor((width + 2*padW - kW) / dW + 1)
oheight = floor((height + 2*padH - kH) / dH + 1)
In [7]:
require 'math';
width = 512
height = 512
padW = 1
kW = 3
dW = 1
owidth = math.floor((width+2*padW-kW)/dW + 1)
In [8]:
owidth
Out[8]:
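The same formula can be wrapped in a small helper (a hypothetical conv_out_size, not part of nn) that returns both output dimensions at once:
In [ ]:
-- hypothetical helper applying the output-size formula to both dimensions
function conv_out_size(width, height, kW, kH, dW, dH, padW, padH)
    local owidth  = math.floor((width  + 2*padW - kW) / dW + 1)
    local oheight = math.floor((height + 2*padH - kH) / dH + 1)
    return owidth, oheight
end
print(conv_out_size(512, 512, 3, 3, 1, 1, 1, 1)) -- 512  512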
Image before convolution operation
In [9]:
itorch.image(lena[3]) -- display a single (the 3rd) channel of the image
Out[9]:
16 3x3 kernels applied to the image
In [10]:
module1.weight
Out[10]:
Output of one of the 16 filters (the first feature map)
In [11]:
itorch.image(p[1])
Out[11]:
Aggregation/Pooling
Summarizes the data of the layers above and reduces the spatial resolution (and hence the number of parameters in later layers). The parameters are almost the same as for a convolution, but instead of applying a learned filter/kernel to each kernel-sized region of the input image, pooling summarizes that region, e.g. with the maximum of its values (max pooling) or their L2 norm.
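For example (a small sketch on a made-up 4x4 input), a 2x2 max pooling with its default stride keeps only the largest value of each non-overlapping 2x2 block:
In [ ]:
-- max pooling on a tiny 1-channel 4x4 input: each 2x2 block reduces to its maximum
tiny = torch.Tensor{{1, 2, 5, 6}, {3, 4, 7, 8}, {9, 10, 13, 14}, {11, 12, 15, 16}}:view(1, 4, 4)
pool = nn.SpatialMaxPooling(2, 2)  -- stride defaults to the kernel size
print(pool:forward(tiny))          -- 1x2x2 tensor: {{4, 8}, {12, 16}}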
In [12]:
module2 = nn.SpatialMaxPooling(2,2) --nn.SpatialMaxPooling(kW, kH [, dW, dH, padW, padH])
In [13]:
q = module2:forward(p);
In [14]:
q:size()
Out[14]:
In [15]:
itorch.image(q[1])
Out[15]:
Non-linearity
$$\text{sigmoid}(x) = \frac{1}{1+e^{-x}}$$ $$\text{ReLU}(x) = \max(0,x)$$
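Both are applied element-wise; a quick sketch on a small vector shows the sigmoid squashing values into (0, 1) and the ReLU clipping negatives to 0:
In [ ]:
-- element-wise non-linearities on a small vector
x = torch.Tensor{-2, -0.5, 0, 0.5, 2}
print(nn.Sigmoid():forward(x))  -- values in (0, 1)
print(nn.ReLU():forward(x))     -- negatives become 0: {0, 0, 0, 0.5, 2}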
In [ ]:
module3 = nn.ReLU()
In [17]:
r = module3:forward(q)
In [18]:
itorch.image(r[3])
Out[18]:
Fully Connected Layer
Usually a 2-layer perceptron; this is the actual classifier sitting on top of the convolutional feature extractor.
In [19]:
small_image = torch.randn(16,4,4)
module4 = nn.Sequential()
module4:add(nn.Reshape(16*4*4))
module4:add(nn.Linear(16*4*4, 50))
module4:add(nn.Tanh())
module4:add(nn.Linear(50, 10))
In [20]:
module4:forward(small_image)
Out[20]:
Dropout
Randomly zeroes out activations during training. This prevents the network from depending too much on any particular unit or group of units. Refer here.
In [21]:
a = torch.ones(4)
a[2] = 3
In [22]:
a
Out[22]:
In [27]:
module5 = nn.Dropout(0.5,true) -- Dropout:__init(p,v1,inplace)
In [28]:
module5
Out[28]:
In [29]:
module5:forward(a)
Out[29]:
In [34]:
module5:evaluate()
Out[34]:
In [35]:
module5:forward(a)
Out[35]:
Let's make a CNN
input-size - 3x512x512
convolution(number of output planes = 16, kernel-size = 5x5, stride = 2x2, padding = 2x2)
max-pooling(2x2; with no stride given it defaults to the kernel size)
convolution(number of output planes = 32, kernel-size = 5x5, stride = 2x2)
max-pooling(3x3)
In [63]:
model = nn.Sequential()
model:add(nn.SpatialConvolutionMM(3,16,5,5,2,2,2,2))
model:add(nn.Tanh())
model:add(nn.SpatialMaxPooling(2,2))
model:add(nn.SpatialConvolutionMM(16,32,5,5,2,2))
model:add(nn.Tanh())
model:add(nn.SpatialMaxPooling(3,3))
What is the size of the input to the fully-connected layer?
In [64]:
s = model:forward(lena);
In [66]:
s:size();
In [67]:
model:add(nn.Reshape(32*20*20)) -- 32x20x20, as given by s:size() above (and by the output-size formula)
model:add(nn.Linear(32*20*20, 200))
model:add(nn.Tanh())
model:add(nn.Linear(200,10))
In [77]:
model;
In [69]:
t = model:forward(lena)
In [70]:
t
Out[70]:
In [71]:
u = model:backward(lena, torch.randn(10))
In [72]:
u:size()
Out[72]:
In [74]:
u[{{1},{1,4},{1,4}}]
Out[74]:
How to perform Gradient Descent
In [78]:
params, paramx = model:getParameters(); -- flattened parameters and their gradients
In [85]:
params:size()
Out[85]:
In [90]:
params[1] -- lua is 1-based
Out[90]:
In [86]:
paramx:size()
Out[86]:
In [91]:
paramx[1]
Out[91]:
In [92]:
eta = 0.1
params:add(-eta, paramx) -- one gradient descent step: params <- params - eta * gradients
In [93]:
params[1]
Out[93]:
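Putting the pieces together, here is a minimal gradient-descent loop sketch. The criterion and the target label are assumptions made for illustration: nn.CrossEntropyCriterion (which combines LogSoftMax and ClassNLLCriterion, so the model above needs no explicit softmax layer) and a made-up class label for lena.
In [ ]:
-- minimal SGD loop sketch on a single (hypothetical) labelled example
criterion = nn.CrossEntropyCriterion()
target = 3  -- hypothetical label in 1..10
eta = 0.1
for step = 1, 5 do
    model:zeroGradParameters()                  -- clear gradients accumulated in paramx
    local output = model:forward(lena)          -- forward pass
    local loss = criterion:forward(output, target)
    local dloss = criterion:backward(output, target)
    model:backward(lena, dloss)                 -- accumulate gradients into paramx
    params:add(-eta, paramx)                    -- params <- params - eta * gradients
    print(step, loss)
end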
Examples of CNNs in the wild: