Consider the image recognition problem. We want a neural network with the following properties:

- Local connectivity of hidden units: each hidden unit looks at a local patch of the input instead of all units.
- A reduced number of weights, which follows from this local connectivity.

Convolutional neural networks leverage these ideas.
Feature map $j$: $\displaystyle \mathbf{y}_j = \sigma\left(\sum_i \mathbf{x}_i * \mathbf{w}_{ij} \right)$, where the sum runs over the input channels $\mathbf{x}_i$.
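A minimal sketch of this feature-map computation (the three input channels, the sigmoid for $\sigma$, and SciPy's `convolve2d` with `'valid'` mode are assumptions, not from the notes):

```python
import numpy as np
from scipy.signal import convolve2d

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 8, 8))   # 3 toy input channels x_i
w = rng.normal(size=(3, 3, 3))   # one 3x3 filter w_ij per input channel

# y_j = sigma( sum_i x_i * w_ij ), using 'valid' convolution (an assumption)
y_j = sigmoid(sum(convolve2d(x[i], w[i], mode='valid') for i in range(3)))
print(y_j.shape)                 # (6, 6)
```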
The convolution of $\mathbf{x}$ with weights $\mathbf{w}$ is computed as follows:
$\displaystyle y[i] = \sum_k x[i-k] w[k]$
i.e., flip $\mathbf{x}$ and slide it over $\mathbf{w}$. Equivalently, flip $\mathbf{w}$ and slide it over $\mathbf{x}$:
$\displaystyle y[i] = \sum_k x[k] w[i-k]$
If we convolve a $2\times 2$ input with a $2\times 2$ filter, the full convolution yields a $3\times 3$ output (in general, an $n\times n$ input and an $m\times m$ filter give an $(n+m-1)\times(n+m-1)$ output).
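A minimal NumPy check of this definition (the vectors here are made up; `np.convolve` computes exactly this flip-and-slide sum):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -1.0])

# y[i] = sum_k x[i-k] * w[k]  (full convolution, length len(x)+len(w)-1)
y = np.convolve(x, w)

# same result by flipping w and sliding it over the zero-padded x
xp = np.pad(x, len(w) - 1)                       # zeros on both sides
manual = np.array([np.dot(xp[i:i + len(w)], w[::-1])
                   for i in range(len(xp) - len(w) + 1)])
assert np.allclose(y, manual)
```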
A related concept: the correlation of $\mathbf{x}$ with weights $\mathbf{w}$ is
$\displaystyle y[i] = \sum_k x[i+k] w[k]$
$\displaystyle y[i] = \sum_k x[k] w[k-i]$
With correlation, neither vector is flipped.
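In NumPy terms, `np.correlate` implements this definition, and correlating with a flipped filter recovers the convolution above (same toy vectors as before):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -1.0])

# y[i] = sum_k x[i+k] * w[k]  -- no flipping of either vector
corr = np.correlate(x, w, mode='full')

# correlation with the flipped filter equals convolution
assert np.allclose(corr, np.convolve(x, w[::-1]))
```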
Example in 2-D. To convolve an image with the filter $W = \left[\begin{array}{cc}0&0.5 \\ 0.25 & 1 \end{array}\right]$, first flip it on both axes: $\left[\begin{array}{cc}1 &0.25 \\ 0.5 & 0 \end{array}\right]$
Then we slide the flipped filter over the zero-padded input and, at each position, use simple element-wise multiplications followed by summing over the products. With input
$$X = \left[\begin{array}{ccc}0&20&40\\80&20&0\\0&0&40\end{array}\right]$$
each output entry is the sum of the element-wise products of the flipped filter $F = \left[\begin{array}{cc}1 &0.25 \\ 0.5 & 0 \end{array}\right]$ with the corresponding $2\times 2$ patch of the padded input.

First row:
$$\left[\begin{array}{cc}0&0\\0 & 0 \end{array}\right] \to 0,\quad \left[\begin{array}{cc}0&0\\0 & 20 \end{array}\right] \to 0,\quad \left[\begin{array}{cc}0&0\\20 & 40 \end{array}\right] \to 10,\quad \left[\begin{array}{cc}0&0\\40 & 0 \end{array}\right] \to 20$$

Second row:
$$\left[\begin{array}{cc}0&0\\0 & 80 \end{array}\right] \to 0,\quad \left[\begin{array}{cc}0&20\\80 & 20 \end{array}\right] \to 45,\quad \left[\begin{array}{cc}20&40\\20 & 0 \end{array}\right] \to 40,\quad \left[\begin{array}{cc}40&0\\0 & 0 \end{array}\right] \to 40$$

Third row:
$$\left[\begin{array}{cc}0&80\\0 & 0 \end{array}\right] \to 20,\quad \left[\begin{array}{cc}80&20\\0 & 0 \end{array}\right] \to 85,\quad \left[\begin{array}{cc}20&0\\0 & 40 \end{array}\right] \to 20,\quad \left[\begin{array}{cc}0&0\\40 & 0 \end{array}\right] \to 20$$

Fourth row:
$$\left[\begin{array}{cc}0&0\\0 & 0 \end{array}\right] \to 0,\quad \left[\begin{array}{cc}0&0\\0 & 0 \end{array}\right] \to 0,\quad \left[\begin{array}{cc}0&40\\0 & 0 \end{array}\right] \to 10,\quad \left[\begin{array}{cc}40&0\\0 & 0 \end{array}\right] \to 40$$

Altogether, the $3\times 3$ input convolved with the $2\times 2$ filter gives the $4\times 4$ output
$$X * W = \left[\begin{array}{cccc}0&0&10&20\\0&45&40&40\\20&85&20&20\\0&0&10&40\end{array}\right]$$
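We can verify the worked example with SciPy's `convolve2d`, which zero-pads the input in `'full'` mode:

```python
import numpy as np
from scipy.signal import convolve2d

X = np.array([[ 0, 20, 40],
              [80, 20,  0],
              [ 0,  0, 40]], dtype=float)
W = np.array([[0.0 , 0.5],
              [0.25, 1.0]])

Y = convolve2d(X, W, mode='full')   # 3x3 input, 2x2 filter -> 4x4 output
print(Y)
# [[ 0.  0. 10. 20.]
#  [ 0. 45. 40. 40.]
#  [20. 85. 20. 20.]
#  [ 0.  0. 10. 40.]]
```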
To get from an input with 10 channels to an output with 16 feature maps, we need a weight (filter) tensor with dimensionality $(16, f_x, f_y, 10)$, where $f_x, f_y$ are the filter sizes, for example $5\times 5$. Output feature map $i$ is then
$$y_i = \sum_{k=1}^{10} X_k * W_{i,\cdot,\cdot,k}$$
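A sketch of this multi-channel layer in NumPy (the $28\times 28$ input size and the `'valid'` convolution mode are assumptions):

```python
import numpy as np
from scipy.signal import convolve2d

n_in, n_out, fx, fy = 10, 16, 5, 5
X = np.random.default_rng(0).normal(size=(n_in, 28, 28))         # 10 input channels
W = np.random.default_rng(1).normal(size=(n_out, fx, fy, n_in))  # (16, f_x, f_y, 10)

# y_i = sum_k X_k * W[i, :, :, k]
Y = np.stack([
    sum(convolve2d(X[k], W[i, :, :, k], mode='valid') for k in range(n_in))
    for i in range(n_out)
])                                                               # shape (16, 24, 24)
```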
Let $l$ be our loss function and $\mathbf{y}_j = \sum_i \mathbf{x}_i * \mathbf{w}_{ij}$.

Gradient with respect to the input $\mathbf{x}_i$:
$$\frac{\partial l}{\partial \mathbf{x}_i} = \sum_j \left(\frac{\partial l}{\partial \mathbf{y}_j}\right) \star \mathbf{w}_{ij}$$
where $\star$ denotes correlation.
Gradient with respect to the weights $\mathbf{w}_{ij}$:
$$\frac{\partial l}{\partial \mathbf{w}_{ij}} = \left(\frac{\partial l}{\partial \mathbf{y}_j}\right) \star \mathbf{x}_{i}$$
(again a correlation; there is no sum, since $\mathbf{w}_{ij}$ affects only $\mathbf{y}_j$).
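Both identities are easy to check numerically in 1-D. This sketch assumes a full convolution forward, `'valid'`-mode correlation for the gradients, and a made-up upstream gradient $g$ so that $l = \mathbf{g}\cdot\mathbf{y}$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=6)                      # input x_i
w = rng.normal(size=3)                      # filter w_ij
g = rng.normal(size=len(x) + len(w) - 1)    # upstream gradient dl/dy_j

# forward: y = x * w (full convolution); take l = g . y, so dl/dy = g
dl_dx = np.correlate(g, w, mode='valid')    # (dl/dy) star w  -> shape of x
dl_dw = np.correlate(g, x, mode='valid')    # (dl/dy) star x  -> shape of w

# finite-difference check of dl/dx
eps = 1e-6
for i in range(len(x)):
    e = np.zeros_like(x); e[i] = eps
    num = (g @ np.convolve(x + e, w) - g @ np.convolve(x - e, w)) / (2 * eps)
    assert np.isclose(dl_dx[i], num)
```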
For max pooling over a $2\times 2$ region with inputs $x_1,\dots,x_4$, the gradient flows only through the input that attained the maximum. Let's assume the maximum occurs at $x_2$ for this example, i.e. $y = \max\{x_1,x_2,x_3,x_4\} = x_2$.
Therefore, the gradient of $y$ over the $2\times 2$ window is $\left[\begin{array}{cc}0&1\\0&0\end{array}\right]$
For average pooling, $y = \frac{1}{4}(x_1+x_2+x_3+x_4)$, so the gradient is spread uniformly: $\displaystyle \left[\begin{array}{cc}\frac{1}{4}&\frac{1}{4}\\ \frac{1}{4}&\frac{1}{4}\end{array}\right]$
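A sketch of both pooling gradients for a single $2\times 2$ window (the window values are made up):

```python
import numpy as np

x = np.array([[1.0, 5.0],
              [3.0, 2.0]])        # one 2x2 pooling window; max is x[0, 1]

# max pooling: the gradient of y = max(x) routes entirely to the argmax
grad_max = np.zeros_like(x)
grad_max[np.unravel_index(np.argmax(x), x.shape)] = 1.0
# [[0. 1.]
#  [0. 0.]]

# average pooling: y = mean(x), so the gradient is spread uniformly
grad_avg = np.full_like(x, 1.0 / x.size)
# [[0.25 0.25]
#  [0.25 0.25]]
```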