In this homework you will implement "A Neural Algorithm of Artistic Style". This is an extension of the "Texture Synthesis Using Convolutional Neural Networks" method.
The core of the method is VGG plus constrained optimization. The constraints are of two types: content and style. Given a content image C and a style image S, we want to generate an image X with the content of C and the style (whatever that really means) of S.
We want to design a loss function for the optimization process. As [1], [2] show, an input image is easily invertible from its outputs at intermediate layers. This motivates the idea of making an intermediate representation $F_X$ of X close to the representation $F_C$ of C.
$$ L_{content} = \| F_X - F_C \| \rightarrow \min_X $$

Note that the representation $F$ preserves spatial information. Idea: let us discard it, so we still know what objects are in the picture but can no longer recover their locations. Style can then be thought of as something independent of content, what is left when the content is taken away. L. Gatys suggests discarding the spatial information by computing correlations between the feature maps $F$. If $F$ has dimensions $C \times W \times H$, the correlation matrix is $C \times C$, and, look, no spatial dimensions. So the style term is responsible for matching these correlation (Gram) matrices.
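To make the two terms concrete, here is a NumPy sketch of the content loss and the Gram-matrix style loss for one layer's features of shape C×H×W. The function names are mine, and I use squared distances with the per-layer normalization from the Gatys et al. paper:

```python
import numpy as np

def content_loss(F_x, F_c):
    """Squared L2 distance between two feature tensors of shape (C, H, W)."""
    return ((F_x - F_c) ** 2).sum()

def gram_matrix(F):
    """Correlation (Gram) matrix of a (C, H, W) feature tensor.

    Flattening the spatial dimensions and taking F_flat @ F_flat.T
    discards all spatial information, leaving a CxC matrix.
    """
    C, H, W = F.shape
    F_flat = F.reshape(C, H * W)
    return F_flat @ F_flat.T

def style_loss(F_x, F_s):
    """Squared distance between Gram matrices, normalized as in Gatys et al."""
    C, H, W = F_x.shape
    G_x, G_s = gram_matrix(F_x), gram_matrix(F_s)
    return ((G_x - G_s) ** 2).sum() / (4.0 * C ** 2 * (H * W) ** 2)
```

Note that `gram_matrix` is invariant to any permutation of the spatial positions: the spatial layout really is gone.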
Finally, we combine the two:

$$ L = \alpha L_{content} + \beta L_{style} \rightarrow \min_X $$

Read the paper and the code for the details on which layers the features $F$ are taken from.
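The whole method is then ordinary optimization of this combined loss over the pixels of X, typically with L-BFGS. A toy, self-contained sketch of that loop follows; the "network" here is the identity map (features are just pixels) and the style statistic is a plain mean, so every name and number is illustrative, not the paper's setup:

```python
import numpy as np
from scipy.optimize import minimize

# Toy stand-in for VGG: "features" are the pixels themselves.
C_img = np.zeros(16)   # content target (a black "image")
S_img = np.ones(16)    # style target
alpha, beta = 1.0, 10.0

def total_loss(x):
    l_content = ((x - C_img) ** 2).sum()      # match content features
    l_style = (x.mean() - S_img.mean()) ** 2  # match a spatially pooled statistic
    return alpha * l_content + beta * l_style

# Start from random noise and optimize the image directly, as in the paper.
x0 = np.random.rand(16)
res = minimize(total_loss, x0, method="L-BFGS-B")
x_generated = res.x
```

The result settles between the two targets, with the alpha/beta ratio controlling the trade-off, which is exactly the knob you will be turning in the homework.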
Actually the idea dates back to the 1990s, when mathematical models of textures were developed [3]. They defined a probabilistic model for texture generation, built on the idea that two images are indeed two samples of the same texture iff their statistics match. The statistics used are histograms of a given texture $I$ filtered with a number of filters: $\{hist(F_i * I), \quad i = 1,\dots, k\}$. Whatever image has the same statistics is regarded as a sample of texture $I$. The main drawback was that Gibbs sampling was employed (which is very slow). [4] suggested exactly the scheme we use now: starting from a random image, iteratively adjust its statistics so that they match the desired ones.
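A tiny illustration of that filter-statistics idea: two hand-picked gradient filters stand in for the carefully designed filter banks of [3]/[4], and the statistics are histograms of the filter responses. The filters and function name here are mine, chosen only to keep the sketch short:

```python
import numpy as np
from scipy.ndimage import convolve

# Two hand-crafted filters standing in for a real filter bank.
filters = [
    np.array([[1.0, -1.0]]),    # horizontal gradient
    np.array([[1.0], [-1.0]]),  # vertical gradient
]

def texture_stats(img, bins=16):
    """FRAME-style statistics: a histogram of the image's response to each filter."""
    return [np.histogram(convolve(img, f), bins=bins, range=(-1, 1), density=True)[0]
            for f in filters]

# Two independent samples of the same i.i.d.-noise "texture": their
# statistics should be close, while a constant image's should not be.
rng = np.random.default_rng(0)
a, b = rng.random((64, 64)), rng.random((64, 64))
```

Comparing `texture_stats(a)` with `texture_stats(b)` gives nearly matching histograms, so under this model `a` and `b` count as samples of one texture.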
Now, what has changed: the filters. [4] used a carefully crafted set of filters; now we use neural-network-based non-linear filters. We still use the idea of matching statistics, but the statistics have improved.
[1] A. Mahendran, A. Vedaldi. Understanding Deep Image Representations by Inverting Them.
[2] A. Dosovitskiy, T. Brox. Inverting Visual Representations with Convolutional Networks.
[3] Zhu et al., 1997. Filters, Random Fields and Maximum Entropy (FRAME): Towards a Unified Theory for Texture Modeling.
[4] Portilla & Simoncelli, 2000. A Parametric Texture Model Based on Joint Statistics of Complex Wavelet Coefficients.
To save you from technical problems, you may use the complete code for the method. Your task is to play around with it.
Common mandatory part:

We give you two options for the second part.

First one (if you are lazy or do not have a GPU, do just this): substitute the Gram matrix with a vector of length C which contains means over feature maps.

Second one (hardcore):
Do everything in this notebook; I need your code as well as the generated images.
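For the first option above, the per-channel mean statistic is a one-liner; a minimal sketch (the helper name is mine):

```python
import numpy as np

def mean_statistic(F):
    """Per-channel means of a (C, H, W) feature tensor.

    Averaging over H and W discards spatial layout just like the Gram
    matrix does, but keeps only C numbers instead of CxC.
    """
    return F.mean(axis=(1, 2))
```

In the style loss you would then match these length-C vectors instead of the CxC Gram matrices.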
HINTS:
In case you do not have a GPU, you need to substitute the line

`from lasagne.layers.dnn import Conv2DDNNLayer as ConvLayer`

with

`from lasagne.layers import Conv2DLayer as ConvLayer`
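If you switch between machines, the two import lines above can also be combined into one try/except, so the same notebook runs on both setups without editing (this is just the standard fallback-import pattern applied to the two lines from the hint; it assumes lasagne is installed):

```python
# Pick the fast cuDNN convolution when it is available, and silently
# fall back to the plain CPU layer otherwise.
try:
    from lasagne.layers.dnn import Conv2DDNNLayer as ConvLayer  # GPU (cuDNN)
except ImportError:
    from lasagne.layers import Conv2DLayer as ConvLayer         # CPU fallback
```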