Useful links:

- original paper http://arxiv.org/abs/1312.6114
- helpful videos explaining the topic

```
In [ ]:
```#The following line fetches two datasets: images, usable for autoencoder training, and attributes.
#The attributes will be required for the final part of the assignment (adding smiles), so please keep them in mind.
from lfw_dataset import fetch_lfw_dataset
data,attrs = fetch_lfw_dataset()

```
In [ ]:
```import numpy as np
X_train = data[:10000].reshape((10000,-1))
print(X_train.shape)
X_val = data[10000:].reshape((-1,X_train.shape[1]))
print(X_val.shape)
image_h = data.shape[1]
image_w = data.shape[2]

For simplicity we want all values of the data to lie in the interval $[0,1]$:

```
In [ ]:
```X_train = np.float32(X_train)
X_train = X_train/255
X_val = np.float32(X_val)
X_val = X_val/255

```
In [ ]:
```%matplotlib inline
import matplotlib.pyplot as plt

def plot_gallery(images, h, w, n_row=3, n_col=6):
    """Helper function to plot a gallery of portraits."""
    plt.figure(figsize=(1.5 * n_col, 1.7 * n_row))
    plt.subplots_adjust(bottom=0, left=.01, right=.99, top=.90, hspace=.35)
    for i in range(n_row * n_col):
        plt.subplot(n_row, n_col, i + 1)
        # images are RGB values in [0, 1], so fix the display range accordingly
        plt.imshow(images[i].reshape((h, w, 3)), vmin=0, vmax=1, interpolation='nearest')
        plt.xticks(())
        plt.yticks(())

```
In [ ]:
```plot_gallery(X_train, image_h, image_w)

```
In [ ]:
```import theano
import theano.tensor as T

```
In [ ]:
```import lasagne
input_X = T.matrix("X")
input_shape = [None,image_h*image_w*3]

```
In [ ]:
```HU_encoder = 2000  # you can play with these values
HU_decoder = 2000
dimZ = 1000  # considering the face reconstruction task, what size of representation seems reasonable?
# define the network
# use ReLU for hidden layers' activations
# GlorotUniform initialization for W
# zero initialization for biases
# it's also convenient to put sigmoid activation on output layer to get nice normalized pics
#l_input =
#l_enc =
#l_z =
#l_dec =
#l_out =

```
In [ ]:
```# create prediction variable
prediction = lasagne.layers.get_output(l_out)
# create loss function
loss = lasagne.objectives.squared_error(prediction, input_X).mean()
# create parameter update expressions
params = lasagne.layers.get_all_params(l_out, trainable=True)
updates = lasagne.updates.adam(loss, params, learning_rate=0.001)
# compile training function that updates parameters and returns training loss
# this will take a while
train_fn = theano.function([input_X], loss, updates=updates)
test_fn = theano.function([input_X], prediction)

```
In [ ]:
```def iterate_minibatches(inputs, batchsize, shuffle=True):
    if shuffle:
        indices = np.arange(len(inputs))
        np.random.shuffle(indices)
    for start_idx in range(0, len(inputs) - batchsize + 1, batchsize):
        if shuffle:
            excerpt = indices[start_idx:start_idx + batchsize]
        else:
            excerpt = slice(start_idx, start_idx + batchsize)
        yield inputs[excerpt]

```
In [ ]:
```# train your autoencoder
# visualize progress in reconstruction and loss decay

```
In [ ]:
```for batch in iterate_minibatches(X_val[:10], 1):
pred = test_fn(batch)
plot_gallery([batch[0],pred], image_h, image_w, n_row=1, n_col=2)

```
In [ ]:
```z_sample = T.matrix()
# Your code goes here:
# generated_x =
gen_fn = theano.function([z_sample], generated_x)

```
In [ ]:
```z = np.random.randn(25, dimZ)*0.5
output = gen_fn(np.asarray(z, dtype=theano.config.floatX))
plot_gallery(output, image_h, image_w, n_row=5, n_col=5)

The Bayesian approach in deep learning considers everything in terms of distributions. Now our encoder generates not just a vector $z$ but a posterior distribution $q(z|x)$. Technically, the first difference is that you need to split the bottleneck layer in two: one dense layer will generate the vector $\mu$, and another will generate the vector $\sigma$. The reparametrization trick is implemented via the **GaussianSampler** layer, which generates a random vector $\epsilon$ and returns $\mu+\sigma\epsilon$.
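Before looking at the layer itself, the trick is easy to state in plain NumPy. This is a minimal sketch (the names are illustrative, not from the GS module): sampling $z \sim \mathcal{N}(\mu, \sigma^2)$ is rewritten as a deterministic function of $\mu$, $\log\sigma$ and external noise $\epsilon \sim \mathcal{N}(0, I)$, so gradients can flow back through $\mu$ and $\log\sigma$:

```python
import numpy as np

def gaussian_sample(mu, logsigma, rng):
    """Reparametrization trick: z = mu + sigma * eps, with eps ~ N(0, I).

    mu, logsigma: arrays of shape (batch, dimZ); rng: np.random.RandomState.
    """
    eps = rng.normal(size=mu.shape)       # the only source of randomness
    return mu + np.exp(logsigma) * eps    # deterministic in mu and logsigma

rng = np.random.RandomState(0)
mu = np.zeros((4, 3))
logsigma = np.full((4, 3), -50.0)  # sigma ~ 0, so samples collapse onto mu
z = gaussian_sample(mu, logsigma, rng)
```

In the Lasagne layer the same expression is built from symbolic Theano tensors: the noise is drawn by the layer, and backpropagation treats it as a constant.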

The code for this layer is taken from the "recipes" folder of the Lasagne GitHub repo:

```
In [ ]:
```import GS
#reload(GS)

```
In [ ]:
```# to compare with conventional AE, keep these hyperparameters
# or change them for the values that you used before
HU_encoder = 2000
HU_decoder = 2000
dimZ = 1000
# define the network
# you can start from https://github.com/Lasagne/Recipes/blob/master/examples/variational_autoencoder/variational_autoencoder.py
# or another example https://github.com/y0ast/Variational-Autoencoder/blob/master/VAE.py
# but remember that this is not your ground truth since the data is not MNIST

And last, but not least: the place in the code where most of the formulas go, the optimization objective. The objective for a VAE has its own name, the variational lower bound, and as for any lower bound our intention is to maximize it. Here it is (for one sample $z$ per input $x$):

$$\mathcal{L} = -D_{KL}(q_{\phi}(z|x)\,||\,p_{\theta}(z)) + \log p_{\theta}(x|z)$$

Your next task is to implement two functions that compute the KL divergence and the second term, the log-likelihood of an output. Here is some necessary math for your convenience:

$$D_{KL} = -\frac{1}{2}\sum_{i=1}^{dimZ}\big(1+\log(\sigma_i^2)-\mu_i^2-\sigma_i^2\big)$$

$$\log p_{\theta}(x|z) = \sum_{i=1}^{dimX}\log p_{\theta}(x_i|z)=\sum_{i=1}^{dimX} \log \Big( \frac{1}{\sigma_i\sqrt{2\pi}}e^{-\frac{(\mu_i-x_i)^2}{2\sigma_i^2}} \Big)=\ldots$$

Don't forget that in the code you are using $\log\sigma$ as the variable. Explain why not $\sigma$?
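To sanity-check your Theano implementation, here is a NumPy version of both terms under the same parametrization (a sketch with names of my choosing; `logsigma` denotes $\log\sigma$, and both functions reduce over the per-sample latent/pixel dimension):

```python
import numpy as np

def kl_divergence_np(mu, logsigma):
    # D_KL(q||p) = -1/2 * sum(1 + log(sigma^2) - mu^2 - sigma^2),
    # where log(sigma^2) = 2*logsigma and sigma^2 = exp(2*logsigma)
    return -0.5 * np.sum(1 + 2 * logsigma - mu ** 2 - np.exp(2 * logsigma), axis=1)

def log_likelihood_np(x, mu, logsigma):
    # sum_i log N(x_i | mu_i, sigma_i^2)
    return np.sum(
        -logsigma - 0.5 * np.log(2 * np.pi)
        - (x - mu) ** 2 / (2 * np.exp(2 * logsigma)),
        axis=1,
    )

# q = N(0, I) gives exactly zero KL against the standard normal prior
mu = np.zeros((2, 5))
logsigma = np.zeros((2, 5))
kl = kl_divergence_np(mu, logsigma)
```

Your symbolic version should return the same values on concrete inputs (up to floating-point precision).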

```
In [ ]:
```def KL_divergence(mu, logsigma):
    return 0

def log_likelihood(x, mu, logsigma):
    return 0

```
In [ ]:
```lasagne.layers.get_all_layers(l_output) # should be ~9 layers total

Now build the loss and training function:

```
In [ ]:
```# create prediction variable
# prediction =
# create loss function
# ...
# loss = KL_divergence(..., ...) - log_likelihood(..., ..., ...)  # the negative lower bound, to be minimized
# create parameter update expressions
# params =
# updates =
# compile training and testing functions
# train_fn =
# test_fn =

And train the model:

```
In [ ]:
```# train your autoencoder
# visualize progress in reconstruction and loss decay

If you managed to tune your autoencoders to converge and learn something about the world, now it's time to have some fun with them. As you may have noticed, there are face attributes in the dataset. We're interested in the "Smiling" column, but feel free to try others as well! Here is the first task:

1) Extract the "Smiling" column as a separate numpy vector and sort this vector.

```
In [ ]:
```attrs[:10]

```
In [ ]:
```#show top-10 and bottom-10 faces in the sorted order
```
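A possible shape for step 1, sketched on toy data (how you pull the column depends on the layout of `attrs`; if it is a pandas DataFrame, something like `attrs["Smiling"].values` would do, but that is an assumption about `fetch_lfw_dataset`):

```python
import numpy as np

# toy stand-in for the "Smiling" attribute scores, one per image
smile_scores = np.array([0.9, -1.2, 0.1, 2.3, -0.7, 1.5])

order = np.argsort(smile_scores)  # ascending: least smiling first
bottom = order[:2]                # indices of the most "non-smiling" faces
top = order[-2:]                  # indices of the most "smiling" faces

# in the notebook you would then call, e.g.:
# plot_gallery(X_train[top], image_h, image_w, n_row=1, n_col=len(top))
```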

2) Take the z-representations of those top images (you can do it only for the positive ones, or for both ends) and average them to find a **"vector representation"** of the attribute.

3) Show how **"feature arithmetic"** works with the representations of both the VAE and the conventional autoencoder. Show how to generate an image with a preconditioned attribute: take some sad faces and make them smile.

4) (If you didn't manage to tune the VAE, just show whether it works for the plain AE.) Discuss the results.
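Steps 2 and 3 amount to simple vector arithmetic in latent space. A minimal sketch on synthetic codes (all names here are illustrative; in the notebook the codes would come from your encoder, and the shifted code would go through `gen_fn` to produce the image):

```python
import numpy as np

rng = np.random.RandomState(42)
dim = 8

# pretend latent codes: dimension 0 secretly encodes "smiling"
z_smiling = rng.randn(20, dim) * 0.1 + np.eye(1, dim) * 2.0  # mean ~ (2, 0, ..., 0)
z_neutral = rng.randn(20, dim) * 0.1                          # mean ~ 0

# attribute vector = difference of the two group means
smile_vec = z_smiling.mean(axis=0) - z_neutral.mean(axis=0)

# make a "sad" code smile by moving it along the attribute direction
z_sad = np.zeros(dim)
z_happy = z_sad + 1.0 * smile_vec  # the 1.0 is a strength knob to tune
```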