Wayne H Nixalo - 09 Aug 2017
FADL2 - Lesson 9: Generative Models
-- Rerunning Neural Style Notebook;
NOTE: Keras version: 1.2.2
More Notes:
This notebook started as an attempt to run things faster on a GPU, until I realized the code isn't using the GPU... and that's what PyTorch is for.
Note also: the lesson JNB has been updated since the lecture; this codealong mostly follows the lecture version.
Lesson 9 Notebooks: neural-style neural-sr
PyTorch NB: neural-style-pytorch
In [1]:
%matplotlib inline
import importlib
# import utils2; importlib.reload(utils2) # Py3 reload
import os, sys; sys.path.insert(1, os.path.join('../utils'))
from utils2 import *
from scipy.optimize import fmin_l_bfgs_b
from scipy.misc import imsave
import scipy
from keras import metrics
from vgg16_avg import VGG16_Avg
In [2]:
limit_mem() # TF as of now will give you a bad time if you don't do this (generally)
In [ ]:
path = '../data/nst/'
dpath = '../data/'
In [ ]:
fnames = pickle.load(open(path + 'fnames.pkl','rb'))
In [ ]:
img = Image.open(path + fnames[0]); img
In [ ]:
rn_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32)
preproc = lambda x: (x - rn_mean)[:, :, :, ::-1]
img_arr = preproc(np.expand_dims(np.array(img), 0))
shp = img_arr.shape
deproc = lambda x,s : np.clip(x.reshape(s)[:, :, :, ::-1] + rn_mean, 0, 255)
In [ ]:
model = VGG16_Avg(include_top=False)
layer = model.get_layer('block5_conv1').output
layer_model = Model(model.input, layer)
targ = K.variable(layer_model.predict(img_arr))
In [8]:
class Evaluator(object):
    def __init__(self, f, shp): self.f, self.shp = f, shp

    def loss(self, x):
        loss_, self.grad_values = self.f([x.reshape(self.shp)])
        return loss_.astype(np.float64)

    def grads(self, x): return self.grad_values.flatten().astype(np.float64)
# oo lala float64, quite a bit of precision
loss = metrics.mse(layer, targ)
grads = K.gradients(loss, model.input)
fn = K.function([model.input], [loss]+grads)
evaluator = Evaluator(fn, shp)
def solve_image(eval_obj, niter, x):
    for i in range(niter):
        x, min_val, info = fmin_l_bfgs_b(eval_obj.loss, x.flatten(),
                                         fprime=eval_obj.grads, maxfun=20)
        x = np.clip(x, -127, 127)
        print('Current loss value:', min_val)
        imsave(f'{path}/results/res_at_iteration_{i}.png', deproc(x.copy(), shp)[0])
    return x
In [11]:
rand_img = lambda shape: np.random.uniform(-2.5, 2.5, shape)/100
x = rand_img(shp)
plt.imshow(x[0]);
In [12]:
iterations = 10
x = solve_image(evaluator, iterations, x)
In [13]:
Image.open(path + 'results/res_at_iteration_9.png')
Out[13]:
In [14]:
# Rerunning, calculating the loss from the output of block4_conv1:
layer = model.get_layer('block4_conv1').output
layer_model = Model(model.input, layer)
targ = K.variable(layer_model.predict(img_arr))
loss = metrics.mse(layer, targ)
grads = K.gradients(loss, model.input)
fn = K.function([model.input], [loss]+grads)
evaluator = Evaluator(fn, shp)
x = solve_image(evaluator, iterations, x)
NOTE: the following result isn't quite right. I accidentally used the result of the first content recreation as input instead of a new random image of pixels.
In [15]:
Image.open(path + 'results/res_at_iteration_9.png')
Out[15]:
In [9]:
from IPython.display import HTML
from matplotlib import animation, rc
In [19]:
fig, ax = plt.subplots()
def animate(i): ax.imshow(Image.open(f'{path}results/res_at_iteration_{i}.png'))
anim = animation.FuncAnimation(fig, animate, frames=10, interval=200)
HTML(anim.to_html5_video())
Out[19]:
In [10]:
def plot_arr(arr): plt.imshow(deproc(arr, arr.shape)[0].astype('uint8'))
In [12]:
style = Image.open('../data/nst/starry-night.png')
style = style.resize(np.divide(style.size, 1.0).astype('int32')); style
Out[12]:
In [13]:
style_arr = preproc(np.expand_dims(style, 0)[:,:,:,:3])
shp = style_arr.shape
model = VGG16_Avg(include_top=False, input_shape=shp[1:])
outputs = {λ.name: λ.output for λ in model.layers}
layers = [outputs['block{}_conv1'.format(o)] for o in range(1,3)]
layers_model = Model(model.input, layers)
targs = [K.variable(o) for o in layers_model.predict(style_arr)]
In [11]:
def gram_matrix(x):
    features = K.batch_flatten(K.permute_dimensions(x, (2, 0, 1)))
    return K.dot(features, K.transpose(features)) / x.get_shape().num_elements()

def style_loss(x, targ): return metrics.mse(gram_matrix(x), gram_matrix(targ))
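As a quick aside, here's a numpy sanity check of what gram_matrix computes (my own toy example, not from the lesson): for a single activation map of shape (H, W, C), flatten each channel into a row and take the matrix of channel dot products, normalized by the element count. The Gram matrix throws away all spatial information, which is why it captures style rather than content.
In [ ]:
import numpy as np
H, W, C = 4, 4, 3                               # toy dimensions
act = np.random.rand(H, W, C).astype(np.float32)
feats = act.transpose(2, 0, 1).reshape(C, -1)   # (C, H*W), like batch_flatten(permute_dimensions(x, (2,0,1)))
gram = feats @ feats.T / act.size               # (C, C); gram[i,j]: how strongly channels i & j fire together
print(gram.shape)                               # (3, 3)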
In [33]:
loss = sum(style_loss(λ1[0], λ2[0]) for λ1, λ2 in zip(layers, targs))
grads = K.gradients(loss, model.input)
style_fn = K.function([model.input], [loss]+grads)
evaluator = Evaluator(style_fn, shp)
rand_img = lambda shape: np.random.uniform(-2.5, 2.5, shape) / 1
x = rand_img(shp)
x = scipy.ndimage.filters.gaussian_filter(x, [0,2,2,0])
plt.imshow(x[0]);
In [34]:
iterations = 10
x = rand_img(shp)
x = solve_image(evaluator, iterations, x)
In [35]:
Image.open(path + 'results/res_at_iteration_9.png')
Out[35]:
In [43]:
w, h = style.size
src = img_arr[:,3*h//4:7*h//4,:w]
plot_arr(src)
In [44]:
style_layers = [outputs['block{}_conv2'.format(o)] for o in range(1,6)]
content_name = 'block4_conv2'
content_layer= outputs[content_name]
style_model = Model(model.input, style_layers)
style_targs = [K.variable(o) for o in style_model.predict(style_arr)]
content_model = Model(model.input, content_layer)
content_targ = K.variable(content_model.predict(src))
In [48]:
style_wgts = [0.05, 0.2, 0.2, 0.25, 0.3]
loss = sum(style_loss(λ1[0], λ2[0])*w
           for λ1, λ2, w in zip(style_layers, style_targs, style_wgts))
loss += metrics.mse(content_layer, content_targ) / 10
grads = K.gradients(loss, model.input)
transfer_fn = K.function([model.input], [loss]+grads)
evaluator = Evaluator(transfer_fn, shp)
In [49]:
iterations = 10
x = rand_img(shp)
x = solve_image(evaluator, iterations, x)
In [50]:
Image.open(path + 'results/res_at_iteration_9.png')
Out[50]:
So far we've demonstrated how to achieve successful results in style transfer. However, there's an obvious drawback to our implementation, namely that we're training an image, not a network, so every new image requires us to retrain. That's not feasible for any sort of real-time application. Fortunately we can address this issue by using a Fully-Convolutional Network (FCN), and in particular we'll look at this implementation for Super Resolution. We are following the approach in this paper (Johnson et al., "Perceptual Losses for Real-Time Style Transfer and Super-Resolution").
In [ ]:
# I think I do this part once I've downloaded ImageNet:
# you can turn a bcolz array into numpy array by slicing it w/ everything
# arr_lr = bcolz.open(dpath + 'trn_resized_72_r.bc')[:]
# arr_hr = bcolz.open(dpath + 'trn_resized_288_r.bc')[:]
# NOTE: right, JH's already created a folder of 20k ImageNet imgs, in 288x288 and 72x72 sizes.
In [6]:
# # Learning to work with Bcolz 1/3
# HR01 = Image.open('../data/sr-imgs/Jupiter-Juno-HR.jpg')
# HR01 = np.asarray(HR01)
# # HR01 = preproc(np.expand_dims(HR01, 0)[:,:,:,:3])
# HR02 = Image.open('../data/sr-imgs/riceguy-HR.jpg')
# HR02 = np.asarray(HR02)
# # HR02 = preproc(np.expand_dims(HR02, 0)[:,:,:,:3])
# LR01 = Image.open('../data/sr-imgs/Jupiter-Juno-LR.jpeg')
# LR01 = np.asarray(LR01)
# # LR01 = preproc(np.expand_dims(LR01, 0)[:,:,:,:3])
# LR02 = Image.open('../data/sr-imgs/riceguy-LR.jpeg')
# LR02 = np.asarray(LR02)
# # LR02 = preproc(np.expand_dims(LR02, 0)[:,:,:,:3])
# LR = bcolz.carray(np.asarray([LR01,LR02]), rootdir='../data/sr-imgs/bc/LR.bc', mode='w')
# HR = bcolz.carray(np.asarray([HR01,HR02]), rootdir='../data/sr-imgs/bc/HR.bc', mode='w')
# LR.flush()
# HR.flush()
# del HR01, HR02, LR01, LR02
# # bcolz.carray()
In [58]:
# Learning to work with Bcolz 2/3
# %ls ../data/sr-imgs
os.listdir('../data/sr-imgs')
# bcolz.carray(os.listdir('../data/sr-imgs'))
Out[58]:
In [7]:
# # Learning to work with Bcolz 3/3
# arr_lr = bcolz.open('../data/sr-imgs/bc/LR.bc')[:]
# arr_hr = bcolz.open('../data/sr-imgs/bc/HR.bc')[:]
In [12]:
hr_fnames = ['../data/sr-imgs/Jupiter-Juno-HR.jpg', '../data/sr-imgs/riceguy-HR.jpg']
lr_fnames = ['../data/sr-imgs/Jupiter-Juno-LR.jpeg', '../data/sr-imgs/riceguy-LR.jpeg']
arr_lr = np.array([np.array(Image.open(f)) for f in lr_fnames])
arr_hr = np.array([np.array(Image.open(f)) for f in hr_fnames])
In [7]:
arr_lr = bcolz.open(dpath + 'trn_resized_72.bc')[:]
arr_hr = bcolz.open(dpath + 'trn_resized_288.bc')[:]
In [8]:
pars = {'verbose': 0, 'callbacks': [TQDMNotebookCallback(leave_inner=True)]}
To start we'll define some of the building blocks of our network. In particular, recall the Residual Block (as used in ResNet), which is just a sequence of 2 convolutional layers whose output is added back to the block's input. We also have a de-convolutional layer (also known as a 'Transposed Convolution' or 'Fractionally Strided Convolution'), whose purpose is to learn to 'undo' the convolutional function. It does this by padding the smaller image in such a way that applying filters to it produces a larger image.
ResNet blocks stacked upon one another can learn to gradually home in on whatever they're trying to do -- in this case, gathering the information needed to upscale the image in a smart way. (A quick output-size sketch for the deconvolution follows below.)
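For intuition, here's the rough output-size arithmetic for a transposed convolution (a sketch of the standard formula, not Keras internals; the numbers are my own example):
In [ ]:
# out = (in - 1) * stride - 2 * pad + kernel (+ output padding, if any)
def deconv_out_size(in_size, kernel, stride, pad, out_pad=0):
    return (in_size - 1) * stride - 2 * pad + kernel + out_pad

# e.g. a 72x72 feature map, 3x3 kernel, stride 2, pad 1, output padding 1:
print(deconv_out_size(72, 3, 2, 1, 1))   # 144 -- doubles the spatial size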
In [9]:
def conv_block(x, filters, size, stride=(2,2), mode='same', act=True):
    x = Convolution2D(filters, size, size, subsample=stride, border_mode=mode)(x)
    x = BatchNormalization(mode=2)(x)
    return Activation('relu')(x) if act else x

# A ResNet block takes some input ip, does 2 Conv blocks on that input, then adds the
# result of those convs back to the original input.
# Acc. to a recent paper, we generally don't want activations at the end of ResNet blocks
def res_block(ip, nf=64):
    x = conv_block(ip, nf, 3, (1,1))
    x = conv_block(x, nf, 3, (1,1), act=False)
    return merge([x, ip], mode='sum')

def deconv_block(x, filters, size, shape, stride=(2,2)):
    x = Deconvolution2D(filters, size, size, subsample=stride,
                        border_mode='same', output_shape=(None,)+shape)(x)
    x = BatchNormalization(mode=2)(x)
    return Activation('relu')(x)

def up_block(x, filters, size):
    x = keras.layers.UpSampling2D()(x)
    x = Convolution2D(filters, size, size, border_mode='same')(x)
    x = BatchNormalization(mode=2)(x)
    return Activation('relu')(x)
This model here is using the previously defined blocks to encode a low resolution image and then upsample it to match the same image in high resolution.
In [10]:
inp_shape = arr_lr.shape[1:]
out_shape = arr_hr.shape[1:]
# NOTE: arr_lr.shape[0] is the no. of files; the remaining indices are the image dimensions (72 or 288, etc.)
In [11]:
arr_lr.shape[2]
Out[11]:
In [11]:
# we start off by taking in a batch of low resolution images:
inp = Input(inp_shape)
# and the 1st thing we do is stick them through a convolutional block w/ a stride of 1
x = conv_block(inp, 64, 9, (1,1))
# after the conv block, we have the computation: 4 ResNet Blocks
for i in range(4): x = res_block(x)
x = up_block(x, 64, 3) # <-- JH used to have 2: x=deconv_block(x, 64, 3, (144, 144, 64))
x = up_block(x, 64, 3) # <-/
x = Convolution2D(3, 9, 9, activation='tanh', border_mode='same')(x)
outp = Lambda(lambda x: (x+1)*127.5)(x)
# tanh activation gives you something in [-1, +1]; adding 1 and multiplying by 127.5 gives you
# something in the range [0, 255] <-- the range we want
# NOTE: on reddit the author removed the tanh activation & final deproc Lambda layer,
# and said the model worked just as well.
# we call this whole block the UpSampling Network
# Oh... Keras has an UpSampling2D() already... well that simplifies things..
Having a stride of (1,1) doesn't change the dimensions of the image, but having a receptive field size of 9, with 64 such filters, allows us to effectively increase the receptive field of all subsequent layers. Many modern networks have a single input layer with such a large filter size for this reason.
We're not losing any information, because we're going from 3 channels to 64 9x9 filters, each of which can hold a good amount of information.
After the convolutional block comes the actual computation. Any kind of generative network has some key work it must do; in this case it's figuring out what the objects are so it knows what to draw. In generative models we generally want to do that work at a lower resolution.
2 reasons for doing the work at low resolution: there's far less computation per layer, and each layer gets a larger receptive field relative to the image.
A receptive field is generally how much of the input a conv filter can cover. What if you had a 3x3 filter that took another 3x3 filter's output as its input? Then its receptive field is 5x5, assuming a stride of 1, because that's the full area of the original input it's drawing from. So the receptive field depends on 2 things: the filter sizes of the layers, and their strides.
Those 2 things increase the receptive field. A good thing about doing layer computations with a large receptive field is that it allows you to look at the big picture / context. (A small calculation sketch follows.)
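A minimal sketch of that receptive-field arithmetic (my own illustration of the standard recurrence, stride-1 unless given):
In [ ]:
def receptive_field(kernel_sizes, strides=None):
    strides = strides or [1] * len(kernel_sizes)
    rf, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump   # each layer widens the field by (k-1) * current step
        jump *= s              # stride multiplies the step between field centers
    return rf

print(receptive_field([3, 3]))     # 5  -- the 3x3-on-3x3 example above
print(receptive_field([9, 3, 3]))  # 13 -- one big 9x9 input layer widens everything after it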
The method of training this network is almost exactly the same as training the pixels in our previous implementations. The idea is that we're going to feed two images to VGG16 and compare their convolutional outputs at some layer. These two images are the target image (which in our case is the same as the original, but at higher resolution), and the output of the network we just defined, which we hope will learn to output a high resolution image.
The key then is to train this other network to produce an image that minimizes the loss between the outputs of some convolutional layer in VGG16 (which the paper refers to as "Perceptual Loss"). In doing so, we're able to train a network that can upsample an image and recreate the higher resolution details.
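In symbols (following the Johnson et al. paper), writing $\phi_j$ for the VGG activations at layer $j$ with shape $C_j \times H_j \times W_j$, the feature (perceptual) loss between our output $\hat{y}$ and the high-res target $y$ is:

$$\ell_{feat}^{\phi,j}(\hat{y}, y) = \frac{1}{C_j H_j W_j}\,\left\lVert \phi_j(\hat{y}) - \phi_j(y) \right\rVert_2^2$$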
We take this UpSampling Network and attach it to VGG; VGG will only be used as a loss function to get our Content Loss. Before we can attach our output outp to VGG, we have to run it through our preprocessing (subtract mean, reverse color channels):
In [12]:
vgg_λ = Lambda(preproc) # <-- layer that does preprocessing
outp_λ = vgg_λ(outp) # <-- outp_λ is outp run through vgg_λ(..)
In [13]:
# now we can create our vgg network
vgg_inp = Input(out_shape)
vgg = VGG16(include_top=False, input_tensor=vgg_λ(vgg_inp))
for λ in vgg.layers: λ.trainable=False
# set all layers to untrainable: can never have loss function be trainable.
*The below code failed on my first GPU run because the 4096-unit FC layer in Keras' VGG16 implementation was just too big for my graphics card to handle. Perhaps this was because I had a TF GPU test script running in another JNB without limit_mem() running..?
EDIT: holy shit, yes. ~2840/3017 MiB usage (then crash/error) vs 439/3017 MiB. TF's memory hunger is no joke, and limit_mem() is a necessity.*
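For reference, limit_mem() from the course's utils2.py does something along these lines (a sketch from memory, so treat it as an assumption): it switches TF to allocating GPU memory on demand instead of grabbing nearly all of it at session creation.
In [ ]:
import tensorflow as tf
import keras.backend as K

def limit_mem():
    cfg = tf.ConfigProto()
    cfg.gpu_options.allow_growth = True   # allocate GPU memory as needed
    K.set_session(tf.Session(config=cfg))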
In [14]:
# now we can create our vgg network
vgg_inp = Input(out_shape)
vgg = VGG16(include_top=False, input_tensor=vgg_λ(vgg_inp))
for λ in vgg.layers: λ.trainable=False
# set all layers to untrainable: can never have loss function be trainable.
In [17]:
help(VGG16)
In [15]:
# # now we can create our vgg network
# vgg_inp = Input(out_shape)
# vgg = VGG16_Avg(include_top=False, input_tensor=vgg_λ(vgg_inp))
# for λ in vgg.layers: λ.trainable=False
# # set all layers to untrainable: can never have loss function be trainable.
Which part of the VGG network do we want? We can try a few things; earlier layers are better for content reconstruction, since they preserve more of the original detail.
In [15]:
# using block2_conv2 as our content/perceptual loss
vgg_content = Model(vgg_inp, vgg.get_layer('block2_conv2').output)
# creating 2 vsns of VGG output
vgg1 = vgg_content(vgg_inp) # <-- based on the hi-res input
vgg2 = vgg_content(outp_λ) # <-- based on output of upscaling network
Thanks to Keras' Functional API, any layer (and a model is a layer as far as Keras is concerned) can be treated as a function. So we can take the vgg_content model, treat it as a function, and pass it any tensor we'd like. What that does is create a new model where those two pieces are joined together: vgg2 is now vgg_content on top, and outp_λ on the bottom.
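Here's a toy illustration of that model-as-function idea (all names below are hypothetical, not from the lesson):
In [ ]:
from keras.layers import Input, Dense
from keras.models import Model

a = Input((10,))
feature_model = Model(a, Dense(4)(a))  # some small model

b = Input((10,))
out_b = feature_model(b)               # call the whole model on a new tensor
combined = Model(b, out_b)             # a new model routing b through feature_model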
An important difference in training for super-resolution is the loss function. We use what's known as a perceptual loss function (which is simply the content loss for some layer).
In [16]:
# finally, we just take the root of the mean squared difference between the 2 sets of activations
loss = Lambda(lambda x: K.sqrt(K.mean((x[0]-x[1])**2, (1,2))))([vgg1,vgg2])
m_final = Model([inp, vgg_inp], loss) # final model returns the loss as its output
targ = np.zeros((arr_hr.shape[0], 128))
# in the github vsn: targ = np.zeros((arr_hr.shape[0], 1)) <-- why?
# ah: it's 128 here because block2_conv2 has 128 filters; the vsn w/ 1 must output 1 value per image.
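A quick sanity check of those shapes (the printed values are what I'd expect, not verified output):
In [ ]:
# the loss Lambda averages over axes 1 and 2 (H and W) only, so one value
# per filter survives -- block2_conv2 has 128 of them
print(vgg_content.output_shape)  # something like (None, 144, 144, 128)
print(m_final.output_shape)      # (None, 128) -> hence targ is (n_samples, 128)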
Finally we compile this chain of models and we can pass it the original low resolution image as well as the high resolution to train on. We also define a zero vector as a target parameter, which is a necessary parameter when calling fit on a Keras model.
NOTE: apparently I hadn't installed TensorFlow w/ GPU support enabled, which is why this was taking forever... going to go fix that and rerun this below.
In [21]:
m_final.compile('adam', 'mse')
m_final.fit([arr_lr, arr_hr], targ, 8, 2, **pars)
In [18]:
help(m_final.fit)
(Above cell) Ahh! So **pars passes the dictionary elements of pars in as keyword arguments of the function you put it in. pars consists of the keys verbose and callbacks, so their values are substituted for the default arguments corresponding to those keys -- matched by key name, not positionally.
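A two-line demonstration of that unpacking (hypothetical function, just for illustration):
In [ ]:
def demo(x, verbose=1, callbacks=None):
    print(x, verbose, callbacks)

opts = {'verbose': 0, 'callbacks': ['some_callback']}
demo('hi', **opts)   # identical to demo('hi', verbose=0, callbacks=['some_callback'])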
In [19]:
m_final.compile('adam', 'mse')
m_final.fit([arr_lr, arr_hr], targ, 8, 2, **pars)
Out[19]:
Finally, got it to work. At 2408/3017 MiB it's under my memory limit, and training significantly faster than on the CPU.
But at an average of 4918.77 seconds/epoch, wow.
Also got another resource exhaustion error when trying to do LR annealing below. Hopefully saving the current weights and continuing with an unloaded graphics card will do the trick.
Furthermore, a batch size of 16 would on its own stop my computer: it's at 2536/3017 MiB @ BS=8.
In [23]:
m_final.save_weights(dpath + 'sr-imgs/' + 'm_final_2eps.h5')
In [26]:
%ls ../data/sr-imgs/
We use learning rate annealing to get a better fit.
In [20]:
K.set_value(m_final.optimizer.lr, 1e-4)
m_final.fit([arr_lr, arr_hr], targ, 16, 2, **pars)
Looks like restarting the kernel, loading saved weights, and halving the batchsize to 8 got it to work.
In [17]:
m_final.compile('adam','mse')
In [20]:
m_final.load_weights(dpath + 'sr-imgs/' + 'm_final_2eps.h5')
In [17]:
m_final.load_weights(dpath + 'sr-imgs/' + 'm_final_2eps.h5')
K.set_value(m_final.optimizer.lr, 1e-4)
m_final.fit([arr_lr, arr_hr], targ, 8, 2, **pars)
Out[17]:
We're only interested in the trained part of the model, which does the actual upsampling: the upsampling model.
In [21]:
top_model = Model(inp, outp)
In [ ]:
top_model.save_weights('../data/sr-imgs/' + 'top_final.h5')
# top_model.load_weights('../data/sr-imgs/' + 'top_final.h5')
In [22]:
top_model = Model(inp, outp)
top_model.load_weights('../data/sr-imgs/' + 'top_final.h5')
In [25]:
# Ahaa, I understand now. The model was supposed to be trained on all of ImageNet.
# The mini version (still downloading ImageNet) is only trained on 4 images hah..
# p = top_model.predict(arr_lr[10:11])
p = top_model.predict(arr_lr[0:1])
In [181]:
# original low-res Jupiter
plt.imshow(arr_lr[0].astype('uint8'));
In [106]:
# Ohh here's a cool one:
plt.imshow(p[0].astype('uint8'))
# NOTE: not sure if I forgot a preproc/deproc step, or this is the result of
# not enough training data..
Out[106]:
In [12]:
# trying to get the ultra-violet one again after redoing a bit
# gonna stop after getting the evil black hole of doom
plt.imshow(p[0].astype('uint8'))
Out[12]:
Test here: to see why Jupiter is all indigo and ultraviolet, I'm going to redo the steps for loading data and training/predicting the model, making sure pre/de-processing is done.
In [11]:
def conv_block(x, filters, size, stride=(2,2), mode='same', act=True):
    x = Convolution2D(filters, size, size, subsample=stride, border_mode=mode)(x)
    x = BatchNormalization(mode=2)(x)
    return Activation('relu')(x) if act else x

def res_block(ip, nf=64):
    x = conv_block(ip, nf, 3, (1,1))
    x = conv_block(x, nf, 3, (1,1), act=False)
    return merge([x, ip], mode='sum')

def deconv_block(x, filters, size, shape, stride=(2,2)):
    x = Deconvolution2D(filters, size, size, subsample=stride,
                        border_mode='same', output_shape=(None,)+shape)(x)
    x = BatchNormalization(mode=2)(x)
    return Activation('relu')(x)

def up_block(x, filters, size):
    x = keras.layers.UpSampling2D()(x)
    x = Convolution2D(filters, size, size, border_mode='same')(x)
    x = BatchNormalization(mode=2)(x)
    return Activation('relu')(x)
rn_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32)
preproc = lambda x: (x - rn_mean)[:, :, :, ::-1]
deproc = lambda x,s : np.clip(x.reshape(s)[:, :, :, ::-1] + rn_mean, 0, 255)
pars = {'verbose': 0, 'callbacks': [TQDMNotebookCallback(leave_inner=True)]}
hr_fnames = ['../data/sr-imgs/Jupiter-Juno-HR.jpg', '../data/sr-imgs/riceguy-HR.jpg']
lr_fnames = ['../data/sr-imgs/Jupiter-Juno-LR.jpeg', '../data/sr-imgs/riceguy-LR.jpeg']
arr_lr = np.array([np.array(Image.open(f)) for f in lr_fnames])
arr_hr = np.array([np.array(Image.open(f)) for f in hr_fnames])
inp_shape = arr_lr.shape[1:]
out_shape = arr_hr.shape[1:]
inp = Input(inp_shape)
x = conv_block(inp, 64, 9, (1,1))
for i in range(4): x = res_block(x)
x = up_block(x, 64, 3)
x = up_block(x, 64, 3)
x = Convolution2D(3, 9, 9, activation='tanh', border_mode='same')(x)
outp = Lambda(lambda x: (x+1)*127.5)(x)
vgg_λ = Lambda(preproc)
outp_λ = vgg_λ(outp)
vgg_inp = Input(out_shape)
vgg = VGG16(include_top=False, input_tensor=vgg_λ(vgg_inp))
for λ in vgg.layers: λ.trainable=False
vgg_content = Model(vgg_inp, vgg.get_layer('block2_conv2').output)
vgg1 = vgg_content(vgg_inp)
vgg2 = vgg_content(outp_λ)
loss = Lambda(lambda x: K.sqrt(K.mean((x[0]-x[1])**2, (1,2))))([vgg1,vgg2])
m_final = Model([inp, vgg_inp], loss)
targ = np.zeros((arr_hr.shape[0], 128))
m_final.compile('adam', 'mse')
m_final.fit([arr_lr, arr_hr], targ, 8, 2, **pars)
K.set_value(m_final.optimizer.lr, 1e-4)
m_final.fit([arr_lr, arr_hr], targ, 16, 2, **pars)
top_model = Model(inp, outp)
top_model.save_weights('../data/sr-imgs/' + 'top_final.h5')
p = top_model.predict(arr_lr[0:1])
In [26]:
plt.imshow(arr_lr[0].astype('uint8'))
Out[26]:
In [27]:
plt.imshow(p[0].astype('uint8'))
Out[27]:
In [23]:
p = top_model.predict(arr_lr[0:1])
In [24]:
# Original low res breakfast:
plt.imshow(arr_lr[0].astype('uint8'))
Out[24]:
In [25]:
# Our model's super-resolution breakfast:
plt.imshow(p[0].astype('uint8'))
Out[25]:
I must be doing something wrong here... Will redo the super-resolution part of this lesson.
Did it screw up? Did it just need proper de/post-processing? Am I seeing convolutional activations? I'll answer that in the follow-up NB to this.
27 AUG 2017
In [44]:
deproc = lambda x,s : np.clip(x.reshape(s)[:, :, :, ::-1] + rn_mean, 0, 255)
# plt.imshow(deproc(p[0], p[0].shape).astype('uint8')) # <-- aha, not quite like this
# but more like this:
def plot_arr(arr): plt.imshow(deproc(arr, arr.shape)[0].astype('uint8'))
# plt.imshow(deproc(p[0], p[0].shape)[0].astype('uint8'))
plot_arr(p[0])
In [37]:
deproc = lambda x : np.clip(x[:, :, :, ::-1] + rn_mean, 0, 255)
deproc(p[0])
In [41]:
plt.imshow(p[0].astype('uint8'))
Out[41]:
In [45]:
p[0].shape
Out[45]:
In [ ]:
deproc = lambda x,s : np.clip(x.reshape(s)[:, :, :, ::-1] + rn_mean, 0, 255)
def plot_arr(arr): plt.imshow(deproc(arr, arr.shape)[0].astype('uint8'))
img = Image.open(path + fnames[0]); img
img_arr = preproc(np.expand_dims(np.array(img), 0))
shp = img_arr.shape
src = img_arr[:,:,:]
plot_arr(src)
Redoing the above Super Resolution section after getting BcolzArrayIterator working in neural-sr-attempt3.ipynb.
Update: it looks like a standard iteration through batches isn't going to work... There's an issue with the target. What is it? In the neural-sr.ipynb for the class, target is a 1 x batch_size numpy array of zeros. In neural-style.ipynb, target is a None x 128 numpy vector of zeros. The model architecture is also subtly different in neural-sr.ipynb. In neural-style.ipynb, Keras is expecting a None x 128 numpy vector in the lambda_3 or lambda_4 layer of the model... This is confusing. I think, if I keep the batch size very low, I can load everything into memory and fit the model on it without hitting a ResourceExhaustionError. We'll see. For future SR work, though, I'm going to use the updated methods in the neural-sr.ipynb JNB; I still want to get this JNB done for good.
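One plausible reading of that target-shape difference, sketched below. This is a guess I haven't verified against neural-sr.ipynb; it comes down to which axes the loss Lambda reduces over:
In [ ]:
# reducing over H and W only leaves one value per filter -> target shape (N, 128):
loss_per_filter = Lambda(lambda x: K.sqrt(K.mean((x[0]-x[1])**2, (1,2))))([vgg1, vgg2])

# reducing over H, W, and channels leaves one scalar per image; keeping a
# trailing axis gives (batch_size, 1) -- matching a zeros target of that shape:
loss_scalar = Lambda(lambda x: K.expand_dims(
    K.sqrt(K.mean((x[0]-x[1])**2, (1,2,3)))))([vgg1, vgg2])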
06 SEP 2017
In [3]:
rn_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32)
preproc = lambda x: (x - rn_mean)[:, :, :, ::-1]
deproc = lambda x,s: np.clip(x.reshape(s)[:, :, :, ::-1] + rn_mean, 0, 255)
pars = {'verbose': 0, 'callbacks': [TQDMNotebookCallback(leave_inner=True)]}
In [4]:
# # Using BcolzArrayIterator instead of loading the full arrays into memory:
# # the '_c6' suffix specifies a chunk length of 6 in this NB.
# arr_lr_c6 = bcolz.open('../data/' + 'trn_resized_72_c6.bc')
# arr_hr_c6 = bcolz.open('../data/' + 'trn_resized_288_c6.bc')
In [5]:
def conv_block(x, filters, size, stride=(2,2), mode='same', act=True):
    x = Convolution2D(filters, size, size, subsample=stride, border_mode=mode)(x)
    x = BatchNormalization(mode=2)(x)
    return Activation('relu')(x) if act else x

# A ResNet block takes some input ip, does 2 Conv blocks on that input, then adds the
# result of those convs back to the original input.
# Acc. to a recent paper, we generally don't want activations at the end of ResNet blocks
def res_block(ip, nf=64):
    x = conv_block(ip, nf, 3, (1,1))
    x = conv_block(x, nf, 3, (1,1), act=False)
    return merge([x, ip], mode='sum')

def deconv_block(x, filters, size, shape, stride=(2,2)):
    x = Deconvolution2D(filters, size, size, subsample=stride,
                        border_mode='same', output_shape=(None,)+shape)(x)
    x = BatchNormalization(mode=2)(x)
    return Activation('relu')(x)

def up_block(x, filters, size):
    x = keras.layers.UpSampling2D()(x)
    x = Convolution2D(filters, size, size, border_mode='same')(x)
    return Activation('relu')(x)
In [6]:
inp_shape = arr_lr_c6.shape[1:]
out_shape = arr_hr_c6.shape[1:]
In [7]:
arr_lr_c6.shape[2]
Out[7]:
In [8]:
# we start off by taking in a batch of low resolution images:
inp = Input(inp_shape)
# and the 1st thing we do is stick them through a convolutional block w/ a stride of 1
x = conv_block(inp, 64, 9, (1,1))
# after the conv block, we have the computation: 4 ResNet Blocks
for i in range(4): x = res_block(x)
x = up_block(x, 64, 3) # <-- JH used to have 2: x=deconv_block(x, 64, 3, (144, 144, 64))
x = up_block(x, 64, 3) # <-/
x = Convolution2D(3, 9, 9, activation='tanh', border_mode='same')(x)
outp = Lambda(lambda x: (x+1)*127.5)(x)
# tanh activation gives you something in [-1, +1]; adding 1 and multiplying by 127.5 gives you
# something in the range [0, 255] <-- the range we want
# NOTE: on reddit, the author removed the tanh activation & final deproc Lambda layer,
# & said the model worked just as well.
# we call this whole block the UpSampling Network -- Keras has an UpSampling2D() function
In [9]:
vgg_λ = Lambda(preproc) # <-- layer that does preprocessing
outp_λ = vgg_λ(outp)    # <-- outp_λ is outp run through vgg_λ(..)
# now we can create our vgg network
vgg_inp = Input(out_shape)
vgg = VGG16(include_top=False, input_tensor=vgg_λ(vgg_inp))
for λ in vgg.layers: λ.trainable=False
# set all layers to untrainable: can never have loss function be trainable.
In [10]:
# using block2_conv2 as our content/perceptual loss
vgg_content = Model(vgg_inp, vgg.get_layer('block2_conv2').output)
# creating 2 vsns of VGG output
vgg1 = vgg_content(vgg_inp) # <-- based on the hi-res input
vgg2 = vgg_content(outp_λ)  # <-- based on the output of the upscaling network
In [20]:
# finally, we just take the root of the mean squared difference between the 2 sets of activations
loss = Lambda(lambda x: K.sqrt(K.mean((x[0]-x[1])**2, (1,2))))([vgg1,vgg2])
m_final = Model([inp, vgg_inp], loss) # final model returns loss fn as output
# targ = np.zeros((arr_hr_c6.shape[0], 128)) # <-- '128' here bc layer has 128 filters
In [21]:
from tqdm import tqdm
from bcolz_array_iterator import BcolzArrayIterator

def train(bs, niter=10):
    targ = np.zeros((bs, 1)) # <-- 'targ' was already defined above in this NB
    # \--> wait, but don't I need the target to match the input size?
    #      or is it just different in the lecture vsn of the JNB?
    bc = BcolzArrayIterator(arr_hr_c6, arr_lr_c6, batch_size=bs)
    for i in tqdm(range(niter)): # loop wrapped in a tqdm progress bar
        hr, lr = next(bc)
        # m_final.train_on_batch([lr[:bs], hr[:bs]], targ[bs*i:bs*(i+1)])
        m_final.train_on_batch([lr[:bs], hr[:bs]], targ[:len(hr)])
        # NOTE: I'm doing targ[:len(hr)] to avoid remainder errors at the last batch
In [15]:
# # Test of Concept
# from bcolz_array_iterator import BcolzArrayIterator
# dude = BcolzArrayIterator(arr_hr_c6, arr_lr_c6, batch_size=6)
# hr, lr = next(dude)
# print(len(hr), len(lr))
# print(hr.shape, lr.shape)
In [16]:
# temp = [0,1,2,3,4]
# temp[:10] # <-- Aha.
In [23]:
m_final.compile('adam', 'mse')
# manually-coded training due to iterated batch-loading from the BcolzArrayIterator.
# As of yet, I still don't know why a number of iterations of (len(arr_lr_c6) // batch_size) + 1
# causes a ValueError: the input & target sample arrays end up off by 1 element in the
# final iteration, even though arr_hr_c6 and arr_lr_c6 contain the same number of elements...
# By leaving out that final iteration, am I forgoing training on that last bit? I think so.
train(bs=6, niter=(len(arr_lr_c6)//6 + 1))
In [1]:
len(arr_lr_c6)
In [50]:
temp = np.array([None]*128)
In [51]:
temp.shape
Out[51]:
In [3]:
arr_lr = bcolz.open('../data/trn_resized_72.bc')
In [4]:
arr_lr.chunklen
Out[4]:
In [3]:
rn_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32)
preproc = lambda x: (x - rn_mean)[:, :, :, ::-1]
deproc = lambda x,s: np.clip(x.reshape(s)[:, :, :, ::-1] + rn_mean, 0, 255)
pars = {'verbose': 0, 'callbacks': [TQDMNotebookCallback(leave_inner=True)]}
In [4]:
arr_lr = bcolz.open('../data/trn_resized_72.bc')[:]
arr_hr = bcolz.open('../data/trn_resized_288.bc')[:]
In [5]:
def conv_block(x, filters, size, stride=(2,2), mode='same', act=True):
    x = Convolution2D(filters, size, size, subsample=stride, border_mode=mode)(x)
    x = BatchNormalization(mode=2)(x)
    return Activation('relu')(x) if act else x

def res_block(ip, nf=64):
    x = conv_block(ip, nf, 3, (1,1))
    x = conv_block(x, nf, 3, (1,1), act=False)
    return merge([x, ip], mode='sum')

def deconv_block(x, filters, size, shape, stride=(2,2)):
    x = Deconvolution2D(filters, size, size, subsample=stride,
                        border_mode='same', output_shape=(None,)+shape)(x)
    x = BatchNormalization(mode=2)(x)
    return Activation('relu')(x)

def up_block(x, filters, size):
    x = keras.layers.UpSampling2D()(x)
    x = Convolution2D(filters, size, size, border_mode='same')(x)
    return Activation('relu')(x)
In [6]:
inp_shape = arr_lr.shape[1:]
out_shape = arr_hr.shape[1:]
In [7]:
inp = Input(inp_shape)
x = conv_block(inp, 64, 9, (1,1))
for i in range(4): x = res_block(x)
x = up_block(x, 64, 3) # <-- JH used to have 2: x=deconv_block(x, 64, 3, (144, 144, 64))
x = up_block(x, 64, 3) # <-/
x = Convolution2D(3, 9, 9, activation='tanh', border_mode='same')(x)
outp = Lambda(lambda x: (x+1)*127.5)(x)
In [8]:
vgg_λ = Lambda(preproc) # <-- layer that does preprocessing
outp_λ = vgg_λ(outp)    # <-- outp_λ is outp run through vgg_λ(..)
# now we can create our vgg network
vgg_inp = Input(out_shape)
vgg = VGG16(include_top=False, input_tensor=vgg_λ(vgg_inp))
for λ in vgg.layers: λ.trainable=False
In [9]:
# using block2_conv2 as our content/perceptual loss
vgg_content = Model(vgg_inp, vgg.get_layer('block2_conv2').output)
# creating 2 vsns of VGG output
vgg1 = vgg_content(vgg_inp) # <-- based on the hi-res input
vgg2 = vgg_content(outp_λ)  # <-- based on the output of the upscaling network
In [11]:
loss = Lambda(lambda x: K.sqrt(K.mean((x[0]-x[1])**2, (1,2))))([vgg1,vgg2])
m_final = Model([inp, vgg_inp], loss) # final model returns loss fn as output
targ = np.zeros((arr_hr.shape[0], 128))
In [12]:
m_final.compile('adam', 'mse')
m_final.fit([arr_lr, arr_hr], targ, 6, 2, **pars)
Out[12]:
Looks like the names of some things have been changed since the class lecture: m_final is now m_sr. Unfortunately, I don't know why/how exactly the model architectures differ between this notebook and the neural-sr.ipynb one... and I was unable to use BcolzArrayIterator here like I did there.
In [13]:
# getting a better fit w/ LR annealing
K.set_value(m_final.optimizer.lr, 1e-4)
m_final.fit([arr_lr, arr_hr], targ, 6, 1, **pars)
Out[13]:
In [14]:
# only want the trained part of the model, which does the actual upsampling.
top_model = Model(inp, outp)
In [15]:
p = top_model.predict(arr_lr[10:11])
In [16]:
plt.imshow(arr_lr[10].astype('uint8'))
Out[16]:
In [17]:
plt.imshow(p[0].astype('uint8'))
Out[17]:
Right then, going to have to redo the SR portion of this notebook.
06 SEP 2017