When we use neural networks to generate images, it usually involves up-sampling from low resolution to high resolution.

There are various methods to conduct up-sample operation:

- Nearest neighbor interpolation
- Bi-linear interpolation
- Bi-cubic interpolation

All these methods involve some interpolation which we need to chose like a manual feature engineering that the network can not change later on.

Instead, we could use the transposed convolution which has learnable parameters [1].

Examples of the transposed convolution usage:

- the generator in DCGAN takes randomly sampled values to produce a full-size image [2].
- the semantic segmentation uses convolutional layers to extract features in the encoder and then restores the original image size in the encoder so that it can classify every pixel in the original image [3].

The transposed convolution is also known as:

- Fractionally-strided convolution
- Deconvolution

But we will only use the word **transposed convolution** in this notebook.

One caution: the transposed convolution is the cause of the checkerboard artifacts in generated images [4]. The paper recommends an up-sampling followed by convolution to reduce such issues. If the main objective is to generate images without such artifacts, it is worth considering one of the interpolation methods.

```
In [1]:
```import numpy as np
import matplotlib.pyplot as plt
import keras
import keras.backend as K
from keras.layers import Conv2D
from keras.models import Sequential
%matplotlib inline

```
```

```
In [2]:
```inputs = np.random.randint(1, 9, size=(4, 4))
inputs

```
Out[2]:
```

The matrix is visualized as below. The higher the intensity the bright the cell color is.

```
In [3]:
```def show_matrix(m, color, cmap, title=None):
rows, cols = len(m), len(m[0])
fig, ax = plt.subplots(figsize=(cols, rows))
ax.set_yticks(list(range(rows)))
ax.set_xticks(list(range(cols)))
ax.xaxis.tick_top()
if title is not None:
ax.set_title('{} {}'.format(title, m.shape), y=-0.5/rows)
plt.imshow(m, cmap=cmap, vmin=0, vmax=1)
for r in range(rows):
for c in range(cols):
text = '{:>3}'.format(int(m[r][c]))
ax.text(c-0.2, r+0.15, text, color=color, fontsize=15)
plt.show()
def show_inputs(m, title='Inputs'):
show_matrix(m, 'b', plt.cm.Vega10, title)
def show_kernel(m, title='Kernel'):
show_matrix(m, 'r', plt.cm.RdBu_r, title)
def show_output(m, title='Output'):
show_matrix(m, 'g', plt.cm.GnBu, title)

```
In [4]:
```show_inputs(inputs)

```
```

```
In [5]:
```show_inputs(np.random.randint(100, 255, size=(4, 4)))

```
```

Apply a convolution operation on these values can produce big values that are hard to nicely display.

Also, we are ignoring the channel dimension usually used in image processing for a simplicity reason.

```
In [6]:
```kernel = np.random.randint(1, 5, size=(3, 3))
kernel

```
Out[6]:
```

```
In [7]:
```show_kernel(kernel)

```
```

With padding = 0 (padding='VALID') and strides = 1, the convolution produces a 2x2 matrix.

$H_m, W_m$: height and width of the input

$H_k, W_k$: height and width of the kernel

$P$: padding

$S$: strides

$H, W$: height and width of the output

$W = \frac{W_m - W_k + 2P}{S} + 1$

$H = \frac{H_m - H_k + 2P}{S} + 1$

With the 4x4 matrix and 3x3 kernel with no zero padding and stride of 1:

$\frac{4 - 3 + 2\cdot 0}{1} + 1 = 2$

So, with no zero padding and strides of 1, the convolution operation can be defined in a function like below:

```
In [8]:
```def convolve(m, k):
m_rows, m_cols = len(m), len(m[0]) # matrix rows, cols
k_rows, k_cols = len(k), len(k[0]) # kernel rows, cols
rows = m_rows - k_rows + 1 # result matrix rows
cols = m_rows - k_rows + 1 # result matrix cols
v = np.zeros((rows, cols), dtype=m.dtype) # result matrix
for r in range(rows):
for c in range(cols):
v[r][c] = np.sum(m[r:r+k_rows, c:c+k_cols] * k) # sum of the element-wise multiplication
return v

The result of the convolution operation is as follows:

```
In [9]:
```output = convolve(inputs, kernel)
output

```
Out[9]:
```

```
In [10]:
```show_output(output)

```
```

One important point of such convolution operation is that it keeps the positional connectivity between the input values and the output values.

For example, `output[0][0]`

is calculated from `inputs[0:3, 0:3]`

. The kernel is used to link between the two.

```
In [11]:
```output[0][0]

```
Out[11]:
```

```
In [12]:
```inputs[0:3, 0:3]

```
Out[12]:
```

```
In [13]:
``````
kernel
```

```
Out[13]:
```

```
In [14]:
```np.sum(inputs[0:3, 0:3] * kernel) # sum of the element-wise multiplication

```
Out[14]:
```

So, 9 values in the input matrix is used to produce 1 value in the output matrix.

Now, suppose we want to go the other direction. We want to associate 1 value in a matrix to 9 values to another matrix while keeping the same positional association.

For example, the value in the left top corner of the input is associated with the 3x3 values in the left top corner of the output.

This is the core idea of the transposed convolution which we can use to up-sample a small image into a larger one while making sure the positional association (connectivity) is maintained.

Let's first define the convolution matrix and then talk about the transposed convolution matrix.

```
In [15]:
```def convolution_matrix(m, k):
m_rows, m_cols = len(m), len(m[0]) # matrix rows, cols
k_rows, k_cols = len(k), len(k[0]) # kernel rows, cols
# output matrix rows and cols
rows = m_rows - k_rows + 1
cols = m_rows - k_rows + 1
# convolution matrix
v = np.zeros((rows*cols, m_rows, m_cols))
for r in range(rows):
for c in range(cols):
i = r * cols + c
v[i][r:r+k_rows, c:c+k_cols] = k
v = v.reshape((rows*cols), -1)
return v, rows, cols

```
In [16]:
```C, rows, cols = convolution_matrix(inputs, kernel)

```
In [17]:
```show_kernel(C, 'Convolution Matrix')

```
```

```
In [18]:
```def column_vector(m):
return m.flatten().reshape(-1, 1)

```
In [19]:
```x = column_vector(inputs)
x

```
Out[19]:
```

```
In [20]:
```show_inputs(x)

```
```

```
In [21]:
```output = C @ x
output

```
Out[21]:
```

```
In [22]:
```show_output(output)

```
```

We reshape it into the desired shape.

```
In [23]:
```output = output.reshape(rows, cols)
output

```
Out[23]:
```

```
In [24]:
```show_output(output)

```
```

This is exactly the same output as before.

```
In [25]:
```show_kernel(C.T, 'Transposed Convolution Matrix')

```
```

Let's make a new input whose shape is 4x1.

```
In [26]:
```x2 = np.random.randint(1, 5, size=(4, 1))
x2

```
Out[26]:
```

```
In [27]:
```show_inputs(x2)

```
```

We matrix-multiply `C.T`

with `x2`

to up-sample `x2`

from 4 (2x2) to 16 (4x4). This operation has the same connectivity as the convolution but in the backward direction.

As you can see, 1 value in the input `x2`

is connected to 9 values in the output matrix via the transposed convolution matrix.

```
In [28]:
```output2 = (C.T @ x2)
output2

```
Out[28]:
```

```
In [29]:
```show_output(output2)

```
```

```
In [30]:
```output2 = output2.reshape(4, 4)
output2

```
Out[30]:
```

```
In [31]:
```show_output(output2)

```
```

As discussed at the beginning of this notebook the weights in the tranposed convolution matrix can be trained as part of a neural network back-propagation process. As such, it eliminates the necessity for fixed up-sampling methods.

Note: we can emulate the transposed convolution using a direct convolution. We first up-sample the input by adding zeros between the original values in a way that the direct convolution produces the same effect as the transposed convolution. However, it is less efficient due to the need to add zeros to up-sample the input before the convolution.

Vincent Dumoulin, Francesco Visin

https://arxiv.org/abs/1603.07285

Alec Radford, Luke Metz, Soumith Chintala

https://arxiv.org/pdf/1511.06434v2.pdf

Jonathan Long, Evan Shelhamer, Trevor Darrell

https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf

Augustus Odena, Vincent Dumoulin, Chris Olah