Training good neural networks for vision tasks requires a lot of data. For example, the ImageNet (ILSVRC) dataset used to pretrain these models contains 1000 classes and over a million annotated images. Most people do not have access to an annotated dataset of similar size. As a result, a common approach is to take a network trained on the ImageNet data and then either fine-tune it for the task at hand, or use it as a fixed feature extractor.
You can treat a whole neural network as a single "building block" in your setup. As long as it is differentiable, you do not really care what is inside: you can keep building on top of it and keep training via backpropagation.
A good example of training with a (very) small dataset: https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
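To make the "differentiable building block" idea concrete, here is a minimal NumPy sketch (all names and data are synthetic stand-ins, not part of the notebook above): a frozen "pretrained" block is used as a fixed feature map, and only a linear head on top of it is trained with gradient descent.

```python
import numpy as np

rng = np.random.RandomState(0)

# Frozen "pretrained" block: a fixed random projection followed by a ReLU.
W_frozen = rng.randn(10, 32)

def frozen_block(x):
    return np.maximum(x @ W_frozen, 0.0)

X = rng.randn(200, 10)

# One forward pass through the frozen block; its weights never change.
features = frozen_block(X)
features = (features - features.mean(axis=0)) / features.std(axis=0)

# Synthetic targets that are a linear function of the frozen features.
y = features @ rng.randn(32) + 0.1 * rng.randn(200)

# Train only the linear "head" on top of the frozen block.
w = np.zeros(32)
for _ in range(500):
    err = features @ w - y                  # prediction error of the head
    w -= 0.05 * features.T @ err / len(X)   # gradient step on the head only

print("final mse:", np.mean((features @ w - y) ** 2))
```

With a real network the gradient would also flow through the block if you chose to fine-tune it; here the block is kept frozen, which is exactly the feature-extraction regime used in the rest of this notebook.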
Below we load a pretrained ResNet50 model and use it to extract features for images from the Pascal VOC dataset.
(This code is based on https://github.com/m2dsupsdlclass/lectures-labs/blob/master/labs/03_conv_nets/Pretrained_ConvNets_with_Keras_rendered.ipynb. To run it you will have to install h5py and pillow: conda install h5py pillow.)
In [1]:
%config InlineBackend.figure_format='retina'
%matplotlib inline
# Silence warnings
import warnings
warnings.simplefilter(action="ignore", category=FutureWarning)
warnings.simplefilter(action="ignore", category=UserWarning)
warnings.simplefilter(action="ignore", category=RuntimeWarning)
warnings.simplefilter(action='ignore', category=DeprecationWarning)
import numpy as np
np.random.seed(123)
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (8, 8)
plt.rcParams["font.size"] = 14
In [2]:
from keras.applications.resnet50 import ResNet50
from keras.models import Model
from keras.preprocessing import image
model = ResNet50(include_top=True, weights='imagenet')
Now load an example image and feed it through the model.
In [3]:
# unzip some example images
import os.path as op
from zipfile import ZipFile
if not op.exists("images_resize"):
    print('Extracting image files...')
    zf = ZipFile('images_pascalVOC.zip')
    zf.extractall('.')
In [5]:
# scipy.misc.imread/imresize have been removed from SciPy;
# use pillow (already a dependency of this notebook) instead
from PIL import Image
from keras.applications.imagenet_utils import preprocess_input
from keras.applications.imagenet_utils import decode_predictions

# a few example images to try; the last assignment wins
#path = "images_resize/000007.jpg"
#path = "images_resize/000005.jpg"
#path = "images_resize/000486.jpg"
path = "images_resize/000228.jpg"

img = Image.open(path)
plt.imshow(img)
img = np.array(img.resize((224, 224)), dtype="float32")
# add a dimension for a "batch" of 1 image
img_batch = preprocess_input(img[np.newaxis])
predictions = model.predict(img_batch)
decoded_predictions = decode_predictions(predictions)
for _, name, score in decoded_predictions[0]:
    print(name, score)
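What does `preprocess_input` actually do here? For the ImageNet models it applies, to the best of my understanding, "caffe"-style preprocessing: an RGB-to-BGR channel flip plus subtraction of per-channel ImageNet means. A hedged NumPy sketch (the exact mean values are assumptions taken from the Keras source, and `caffe_preprocess` is a hypothetical name):

```python
import numpy as np

# Assumed ImageNet per-channel means (BGR order) used by Keras' "caffe" mode.
IMAGENET_MEANS_BGR = np.array([103.939, 116.779, 123.68], dtype="float32")

def caffe_preprocess(img_batch):
    """Approximation of keras.applications.imagenet_utils.preprocess_input.

    img_batch: float32 array of shape (n, h, w, 3), RGB values in 0..255.
    """
    bgr = img_batch[..., ::-1]          # RGB -> BGR channel flip
    return bgr - IMAGENET_MEANS_BGR     # subtract per-channel means

batch = np.random.RandomState(0).uniform(0, 255, (1, 224, 224, 3)).astype("float32")
out = caffe_preprocess(batch)
print(out.shape)
```

The important point is that the network expects inputs preprocessed the same way it was trained on, which is why we call the library function rather than feeding raw pixel values.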
In [6]:
# build a new model that maps the input to the activations of the
# penultimate layer: the pooled features just before the classifier
base_input = model.layers[0].input
base_output = model.layers[-2].output
short_model = Model(base_input, base_output)
In [7]:
representation = short_model.predict(img_batch)
print("shape of representation:", representation.shape)
# we now have a 2048 dimensional representation
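Such fixed-size representations can be fed directly to a classical classifier, which is the usual way to exploit a pretrained network on a small dataset. A minimal scikit-learn sketch, using random vectors as stand-ins for real 2048-dimensional ResNet features (all data here is synthetic, not the VOC embeddings):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)

# Stand-in for ResNet features: two classes with slightly shifted means.
n, dim = 400, 2048
labels = rng.randint(0, 2, size=n)
feats = rng.randn(n, dim) + 0.15 * labels[:, np.newaxis]

X_train, X_test, y_train, y_test = train_test_split(
    feats, labels, random_state=0)

# A simple linear model is often enough on top of good pretrained features.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```

With real ResNet features the same pattern applies: extract once with `short_model.predict`, then train a cheap classifier on the resulting matrix.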
In [8]:
import h5py
# Load pre-calculated representations
with h5py.File('img_emb.h5', 'r') as h5f:
    out_tensors = h5f['img_emb'][:]
How good are these representations? Can we look at them? Yes: we can project them to two dimensions with a non-linear dimensionality reduction algorithm called t-SNE.
In [9]:
from sklearn.manifold import TSNE
img_emb_tsne = TSNE(perplexity=30).fit_transform(out_tensors)
plt.scatter(img_emb_tsne[:,0], img_emb_tsne[:,1])
plt.xticks(())
plt.yticks(());
In [10]:
from sklearn.neighbors import KDTree
tree = KDTree(img_emb_tsne)
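Note that this tree is built on the 2D t-SNE coordinates, so "nearest" means nearest in the embedded map; neighbours could also be queried in the original 2048-dimensional feature space instead. A small self-contained sketch of that query pattern (with synthetic points standing in for the image embeddings):

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.RandomState(0)
points = rng.randn(50, 2048)      # stand-in for 2048-d image embeddings

tree_full = KDTree(points)
# k=3 returns the query point itself (distance 0) plus its 2 nearest neighbours
dist, ind = tree_full.query(points[:1], k=3)
print(ind[0])
```

Querying in the high-dimensional space avoids any distortion introduced by t-SNE, at the cost of slower queries; for a few hundred images either choice is cheap.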
In [11]:
import os
paths = ["images_resize/" + path
         for path in sorted(os.listdir("images_resize/"))]
In [28]:
# a few query points to try; the last assignment wins
#distance, query_inds = tree.query([img_emb_tsne[20]], k=3)
#distance, query_inds = tree.query([img_emb_tsne[0]], k=3)
#distance, query_inds = tree.query([img_emb_tsne[40]], k=3)
distance, query_inds = tree.query([img_emb_tsne[1]], k=3)
print(query_inds)
In [29]:
from PIL import Image

# display the query image and its nearest neighbours
for idx in query_inds[0]:
    fig = plt.figure()
    img = Image.open(paths[idx])
    plt.imshow(img)
    plt.show()