The goal of this project is to discover aesthetically interesting digital video feedback processes by incorporating learned features into a hand-constructed feedback process.
Consider a video feedback process defined by the mapping from images to images $x_t = \Delta_\phi(x_{t-1})$, where $\Delta$ is a transition function, $\phi$ is a parameterization which may be spatially varying or interactively controlled, and $x_t$ is the image at time step $t$.
Additionally suppose we have a deep autoencoder $\gamma$ for images: $$h^{\ell+1} = \gamma_\ell(h^\ell)$$ $$h^{\ell} \approx \gamma_\ell^{-1}(h^{\ell+1})$$ $$h^0 = x$$
Combining these two concepts, we can define a new feedback process where position in the feature hierarchy acts like another spatial dimension: $$h_t^\ell = \Delta_\phi( h_{t-1}^\ell, \gamma_{\ell-1}(h_{t-1}^{\ell-1}), \gamma_\ell^{-1}(h_{t-1}^{\ell+1}) )$$
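As a purely illustrative sketch, one update step of this process could look like the following Python, where delta stands in for $\Delta_\phi$ and encode[l]/decode[l] are hypothetical per-layer stages implementing $\gamma_\ell$ and $\gamma_\ell^{-1}$:

#one step of the layered feedback process (illustrative only)
#h: list of feature arrays, one per layer of the hierarchy
#encode[l] maps layer l up to l+1; decode[l] maps layer l+1 back down to l
#delta is the hand-constructed transition function
def feedback_step(h, encode, decode, delta):
    new_h = []
    for l in range(len(h)):
        from_below = encode[l-1](h[l-1]) if l > 0 else None
        from_above = decode[l](h[l+1]) if l + 1 < len(h) else None
        new_h.append(delta(h[l], from_below, from_above))
    return new_h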
The goal then is to learn a deep autoencoder which represents abstract image features and admits layer-wise encoding and decoding as above. I propose a convolutional pooling autoencoder based on the convolutional autoencoders of Masci et al. and the upsampling layers of Long et al.
Below I have trained a single-layer pooled convolutional autoencoder on the CIFAR-10 dataset using Caffe. The code is available at my GitHub. I use a filter size of 3x3x3 and 2x2 max pooling. For this experiment, the data dimensionality is preserved in the intermediate representation by using 12 filters (3 input colors x factor of 4 lost to pooling). I trained on the L2 reconstruction error with momentum but no other regularization. Test error decreased consistently from about 100 at random initialization to about 1.3.
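The nets themselves are defined by .prototxt files in the repo. Purely as an illustration of the architecture just described, a roughly equivalent definition using pycaffe's NetSpec might look like the sketch below; the layer names follow the blobs used later in this notebook, but the data-layer details (LMDB source, mean file, scaling) are my assumptions:

import caffe
from caffe import layers as L, params as P

def sketch_autoencoder_0(lmdb_source, batch_size=100):
    n = caffe.NetSpec()
    #mean-subtracted CIFAR-10 images (scaling to [0,1] assumed)
    n.data, n.label = L.Data(source=lmdb_source, backend=P.Data.LMDB,
                             batch_size=batch_size, ntop=2,
                             transform_param=dict(mean_file='mean.binaryproto',
                                                  scale=1./256))
    #encoder: 3x3 convolution over 3 colors into 12 feature maps, then 2x2 max pooling
    n.encode1 = L.Convolution(n.data, kernel_size=3, pad=1, num_output=12)
    n.encode1neuron = L.TanH(n.encode1)
    n.pool1 = L.Pooling(n.encode1neuron, kernel_size=2, stride=2, pool=P.Pooling.MAX)
    #decoder: learned 2x2 upsampling (deconvolution), then 3x3 convolution back to 3 colors
    n.upsample1 = L.Deconvolution(n.pool1, convolution_param=dict(
        num_output=12, kernel_size=2, stride=2))
    n.decode1 = L.Convolution(n.upsample1, kernel_size=3, pad=1, num_output=3)
    n.decode1neuron = L.TanH(n.decode1)
    #L2 reconstruction objective against the (mean-subtracted) input
    n.loss = L.EuclideanLoss(n.decode1neuron, n.data)
    return n.to_proto()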
In [4]:
    
#get caffe and pycaffe set up
import numpy as np
import matplotlib.pyplot as plt
import scipy.ndimage
%matplotlib inline
#assuming feature-feedback repo and caffe root are in the same directory
caffe_root = '../../caffe/'
import sys
sys.path.insert(0, caffe_root+'python')
import caffe
from caffe.proto import caffe_pb2
#I have compiled caffe for CPU only (GPU mode requires an NVIDIA card)
caffe.set_mode_cpu()
    
In [ ]:
    
# L2 reconstruction error for images may not be a fantastic idea in RGB colorspace;
# we may want to preprocess the data by converting to CIELUV or something
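# (a sketch of that preprocessing, assuming scikit-image, which this notebook
# does not actually use:
#   from skimage import color
#   luv = color.rgb2luv(img)    #img: HxWx3 RGB floats in [0,1]
#   rgb = color.luv2rgb(luv)
# note that L lies in [0,100] and u,v can be on the order of +-100, so the
# channels would still need rescaling before an L2 objective)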
    
In [ ]:
    
#run this cell to solve the model defined in the solver_file
solver_file = 'autoencoder-0-solver.prototxt'
solver = caffe.get_solver(solver_file);
solver.solve();
    
In [24]:
    
#load the model trained by the previous cell
#(and saved elsewhere in the repo) and set it up on test data
model_def_file = 'autoencoder-0.prototxt'
model_file = '../bin/cifar-tanh-20epoch-unregularized.caffemodel'
net = caffe.Net(model_def_file, model_file, caffe.TEST)
#run a batch
net.forward()
    
In [9]:
    
#load the cifar mean into numpy array
blob = caffe_pb2.BlobProto()
data = open('../../caffe/examples/cifar10/mean.binaryproto', 'rb').read()
blob.ParseFromString(data)
mean = caffe.io.blobproto_to_array(blob)[0].transpose([1,2,0])/256
    
In [7]:
    
def get_reconstructions(net, mean, n, compare=0):
    inputs = np.hstack([ np.copy(net.blobs['data'].data[i]).transpose([1,2,0])+mean for i in range(n)])
    outputs = np.hstack([ np.copy(net.blobs['decode1neuron'].data[i]).transpose([1,2,0])+mean for i in range(n)])
    #clamp the reconstruction to [0,1]
    #even with tanh activation outputs can be out of bounds once mean is added back
    np.clip(outputs, 0, 1, outputs)
    #compare to cubic resampling through the intermediate spatial resolution
    #this is a good baseline for how well spatial information is stored and 
    #recovered by the convolutional layers
    if compare>0:
        comparisons = np.dsplit(np.copy(inputs), inputs.shape[2])
        comparisons = [scipy.ndimage.zoom(np.squeeze(c), 1./compare, order=3) for c in comparisons]
        comparisons = [scipy.ndimage.zoom(c, compare, order=3) for c in comparisons]
        comparisons = np.dstack(comparisons)
        np.clip(comparisons, 0, 1, comparisons)
        return (inputs, outputs, comparisons)
    return (inputs, outputs)
def vis_reconstructions(rec):
    disp = np.vstack(rec)
    plt.imshow(disp, interpolation='none')
    
In [60]:
    
rec = get_reconstructions(net, mean, 8, compare=2)
vis_reconstructions(rec)
    
    
CIFAR-10 test inputs on top, reconstructions in the middle, cubic interpolation comparison on the bottom. Looks good!
In [26]:
    
def get_filters(net, layer = 'encode1'):
    filters = np.copy(net.params[layer][0].data).transpose([0,2,3,1])
    biases = np.copy(net.params[layer][1].data)
    print biases
    return filters
def vis_filters(filters, rows):
    #normalize preserving 0 = 50% gray
    filters/=2*abs(filters).max()
    filters+=.5
    disp = np.hstack([np.pad(f,[(1,1),(1,1),(0,0)],'constant', constant_values=[.5]) for f in filters])
    disp = np.vstack(np.hsplit(disp,rows))
    return disp
    
In [154]:
    
filters = get_filters(net)
disp = vis_filters(filters, 3)
plt.imshow(disp, interpolation='none')
    
    
    
Looks like the network mostly learned localized primary and secondary color detectors. Weird! These aren't the usual edge filters, but they seem to at least have some plausible structure.
In [5]:
    
def get_responses(net, layer, filts, n):
    reps = np.hstack([ net.blobs[layer].data[i].transpose([1,2,0]) for i in range(n)])
    # normalize preserving 0 = 50% gray
    reps/=2*abs(reps).max()
    reps+=.5
    reps = np.vstack(np.dsplit(reps, filts))
    return reps.squeeze()    
def vis_responses(reps):
    plt.figure(figsize=(10,10))
    plt.imshow(reps, interpolation='none', cmap='coolwarm')
    
In [156]:
    
reps = get_responses(net, 'pool1', 12, 8)
vis_responses(reps)
    
    
    
Pooled activations for each of 12 filters. Red is positive response, blue negative.
In [3]:
    
solver_file = 'autoencoder-1-solver.prototxt'
solver = caffe.get_solver(solver_file)
solver.solve()
    
In [66]:
    
model_def_file = 'autoencoder-1.prototxt'
model_file = '../bin/cifar-tanh-20epoch-squeezing.caffemodel'
net = caffe.Net(model_def_file, model_file, caffe.TEST)
#run a batch
net.forward()
    
In [67]:
    
rec = get_reconstructions(net, mean, 8, compare=2)
vis_reconstructions(rec)
    
    
This time there's some clear loss of detail. The filters are doing something, though; the reconstructions look better than the cubic interpolation baseline.
In [24]:
    
filters = get_filters(net)
disp = vis_filters(filters, 2)
plt.imshow(disp, interpolation='none')
    
    
    
These filters appear to be learning color gradients in a subtractive color space.
In [31]:
    
reps = get_responses(net, 'pool1', 6, 8)
vis_responses(reps)
    
    
In [ ]:
    
solver_file = 'autoencoder-2-solver.prototxt'
solver = caffe.get_solver(solver_file)
solver.solve('autoencoder-2_iter_20000.solverstate')
    
In [3]:
    
model_def_file = 'autoencoder-2.prototxt'
#model_file = '../bin/cifar-tanh-20epoch-squeezing-pool3.caffemodel'
model_file = 'autoencoder-2_iter_20000.caffemodel'
net = caffe.Net(model_def_file, model_file, caffe.TEST)
#run a batch
net.forward()
    
In [10]:
    
rec = get_reconstructions(net, mean, 8, compare=4)
vis_reconstructions(rec)
    
    
This looks even worse than the dimensionality-reducing version. By 40 epochs, training had slowed to a crawl.
In [12]:
    
filters = get_filters(net)
disp = vis_filters(filters, 6)
plt.imshow(disp, interpolation='none')
    
    
    
These filters look like noisy edge detectors. Something prevented the training from finding a good minimum.
In [14]:
    
reps = get_responses(net, 'pool1', 18, 8)
vis_responses(reps)
    
    
In [ ]:
    
solver_file = 'autoencoder-6-solver.prototxt'
solver = caffe.get_solver(solver_file)
solver.solve('autoencoder-6_iter_10000.solverstate')
    
In [10]:
    
model_def_file = 'autoencoder-6.prototxt'
model_file = 'autoencoder-6_iter_20000.caffemodel'
net = caffe.Net(model_def_file, model_file, caffe.TEST)
#run a batch
net.forward()
    
In [11]:
    
rec = get_reconstructions(net, mean, 8, compare=4)
vis_reconstructions(rec)
    
    
In [12]:
    
filters = get_filters(net)
disp = vis_filters(filters, 3)
plt.imshow(disp, interpolation='none')
    
    
    
Interesting: these look like 3x3 filters with a random fringe. Curiously, this learned better than the first architecture above, even though the extra pixels appear to be wasted. Perhaps it got a better random initialization, or the filter noisiness acts as a kind of regularization. It may have learned small filters because the reconstruction filter size was too small. Let's bump that up too:
In [9]:
    
solver_file = 'autoencoder-7-solver.prototxt'
solver = caffe.get_solver(solver_file)
solver.solve('autoencoder-7_iter_10000.solverstate')
    
In [5]:
    
model_def_file = 'autoencoder-7.prototxt'
model_file = 'autoencoder-7_iter_40000.caffemodel'
net = caffe.Net(model_def_file, model_file, caffe.TEST)
#run a batch
net.forward()
    
In [10]:
    
rec = get_reconstructions(net, mean, 8, compare=4)
vis_reconstructions(rec)
    
    
In [27]:
    
filters = get_filters(net)
disp = vis_filters(filters, 3)
plt.imshow(disp, interpolation='none')
    
    
    
The more expressive decoder did reduce error, and visual fidelity is now very close to perfect. It did not change the noisy-fringed character of the learned filters. The center filters mostly come in pairs which appear to be mirrors, rotations, and/or color inverses. Neat!
In [16]:
    
reps = get_responses(net, 'pool1', 12, 8)
vis_responses(reps)
    
    
We could keep going to 7x7 encoders and 8x8 decoders, but at some point I expect larger filters to have trouble with CIFAR since the images are so tiny. With 7x7 filters, about a third of all convolutions are going to include some padding.
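A quick check of that fraction, assuming 32x32 inputs and zero padding that preserves the output size:

#7x7 windows are centered on each of the 32x32 output positions;
#only centers in the interior 26x26 region avoid the padding entirely
interior = (32 - 7 + 1)**2
print 1 - interior / float(32**2)   #about 0.34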
In [37]:
    
def dump_to_img(net, nlayers):
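    #save each layer's encode/decode filters and biases as .npy arrays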
    for l in range(1, nlayers+1):
        encode_name = 'encode'+str(l)
        decode_name = 'decode'+str(l)
        #move source channel to innermost dimension
        filters = np.copy(net.params[encode_name][0].data).transpose([0,2,3,1])
        biases = np.copy(net.params[encode_name][1].data)
        np.save(encode_name+'-filters', filters)
        np.save(encode_name+'-biases', biases)
        filters = np.copy(net.params[decode_name][0].data).transpose([1,2,3,0])
        biases = np.copy(net.params[decode_name][1].data)
        np.save(decode_name+'-filters', filters)
        np.save(decode_name+'-biases', biases)
    
In [38]:
    
dump_to_img(net, 1)
    
In [39]:
    
#inspect the raw decoder weights (source channel moved to innermost dimension)
np.copy(net.params['decode1'][0].data).transpose([1,2,3,0])
    
In [ ]:
    
solver_file = 'autoencoder-8-solver.prototxt'
solver = caffe.get_solver(solver_file)
#initialize the first layer with previously trained weights
#first let's try stacking with the lower weights frozen
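#(I assume the freezing itself is specified in autoencoder-8.prototxt, e.g. via lr_mult: 0 on the copied layers)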
pre_net = caffe.Net('autoencoder-7.prototxt', 'autoencoder-7_iter_40000.caffemodel', caffe.TEST)
for layer in ['encode1', 'decode1']:
    solver.net.params[layer][0].data[:] = pre_net.params[layer][0].data
    solver.net.params[layer][1].data[:] = pre_net.params[layer][1].data
solver.solve()
    
In [7]:
    
model_def_file = 'autoencoder-8.prototxt'
#model_file = '../bin/cifar-tanh-20epoch-2layer.caffemodel'
model_file = 'autoencoder-8_iter_40000.caffemodel'
net = caffe.Net(model_def_file, model_file, caffe.TEST)
#run a batch
net.forward()
    
In [8]:
    
rec = get_reconstructions(net, mean, 8, compare=4)
vis_reconstructions(rec)
    
    
The loss for the new layer went pretty low, but the overall reconstruction error is high. Let's try fine-tuning all the weights with the original L2 reconstruction error:
In [2]:
    
solver_file = 'autoencoder-9-solver.prototxt'
solver = caffe.get_solver(solver_file)
#initialize the first layer with previously trained weights
#this time bring over all the parameters
pre_net = caffe.Net('autoencoder-8.prototxt', 'autoencoder-8_iter_40000.caffemodel', caffe.TEST)
for layer in ['encode1', 'decode1', 'encode2', 'decode2']:
    solver.net.params[layer][0].data[:] = pre_net.params[layer][0].data
    solver.net.params[layer][1].data[:] = pre_net.params[layer][1].data
solver.solve()
    
In [6]:
    
model_def_file = 'autoencoder-9.prototxt'
model_file = '../bin/cifar-tanh-60epoch-2layer-finetuned-dualobjective.caffemodel'
net = caffe.Net(model_def_file, model_file, caffe.TEST)
#run a batch
net.forward()
    
In [8]:
    
rec = get_reconstructions(net, mean, 8, compare=4)
vis_reconstructions(rec)
    
    
Fine-tuning reduced both parts of the loss, but the result still looks much worse than the single-layer model.
In [7]:
    
filters = get_filters(net)
disp = vis_filters(filters, 3)
plt.imshow(disp, interpolation='none')
    
    
    
Fine-tuning the first layer appears to have corrupted the nice filters we had before.
In [9]:
    
#map triples of filters to colors
reps = get_responses(net, 'pool2', 16, 8)
vis_responses(reps)
    
    
In [17]:
    
solver_file = 'autoencoder-4-solver.prototxt'
solver = caffe.get_solver(solver_file)
solver.solve()
    
In [18]:
    
model_def_file = 'autoencoder-4.prototxt'
model_file = '../bin/cifar-tanh-40epoch-3layer.caffemodel'
net = caffe.Net(model_def_file, model_file, caffe.TEST)
#run a batch
net.forward()
    
In [19]:
    
rec = get_reconstructions(net, mean, 8, compare=8)
vis_reconstructions(rec)
    
    
Again, we are recovering a lot of spatial detail. Deeper is still worse; it will be interesting to see whether heavier training improves the situation.
In [20]:
    
#map triples of filters to colors
reps = get_responses(net, 'pool3', 64, 8)
vis_responses(reps)
    
    
In [21]:
    
filters = get_filters(net)
disp = vis_filters(filters, 3)
plt.imshow(disp, interpolation='none')
    
    
    
These first-layer filters are hard to interpret, but do seem to have some internal color coordination and symmetry. Most deep convolutional architectures start with a large number of filters; maybe having just 4x the number of colors is asking each filter to do too many things. Then again, maybe that isn't a problem for anything besides filter visualization.
Using AdaGrad gave similar error, but much more random-looking filters (above).
ReLUs have helped to train very deep networks. For a classifier, it's not a problem to have zero-mean inputs but nonnegative hidden and output layers. For this application, though, we rely on the hidden layers having the same image properties as the input. Can we get rid of the mean subtraction and use a nonnegative image representation with ReLU instead of tanh units?
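In terms of the illustrative NetSpec sketch above, the only change to the net definition would be swapping the nonlinearities (and dropping mean_file from the data layer's transform_param):

#hypothetical edit to the earlier sketch, mirroring the prototxt change
n.encode1neuron = L.ReLU(n.encode1)
n.decode1neuron = L.ReLU(n.decode1)

Let's start back at the one-layer, dimensionality-preserving autoencoder: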
In [13]:
    
solver_file = 'autoencoder-5-solver.prototxt'
solver = caffe.get_solver(solver_file)
solver.solve()
    
In [14]:
    
model_def_file = 'autoencoder-5.prototxt'
model_file = '../bin/cifar-relu-20epoch.caffemodel'
net = caffe.Net(model_def_file, model_file, caffe.TEST)
#run a batch
net.forward()
    
In [15]:
    
rec = get_reconstructions(net, np.zeros(mean.shape), 8, compare=2)
vis_reconstructions(rec)
    
    
The ReLU units work with either a reduced learning rate, or an increased learning rate together with the AdaGrad solver, though still not quite as well as the tanh units.
In [16]:
    
filters = get_filters(net)
disp = vis_filters(filters, 3)
plt.imshow(disp, interpolation='none')
    
    