In [ ]:
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt

Loading a JPEG file as a numpy array

Let's use scikit-image to load the content of a JPEG file into a numpy array:


In [ ]:
from skimage.io import imread

image = imread('laptop.jpeg')
type(image)

The dimensions of the array are:

  • height
  • width
  • color channels (RGB)

In [ ]:
image.shape

For efficiency reasons, the pixel intensities of each channel are stored as 8-bit unsigned integers taking values in the [0, 255] range:


In [ ]:
image.dtype

In [ ]:
image.min(), image.max()

In [ ]:
plt.imshow(image);

Size of a numpy array

The size in bytes can be computed by multiplying the number of elements by the size in bytes of each element in the array.

The size of one element depends on the data type.

1 byte == 8 bits

A byte in English is an octet in French.


In [ ]:
np.prod(image.shape)

In [ ]:
450 * 800 * 3 * (8 / 8)

Let's check by asking numpy:


In [ ]:
print("image size: {:0.3} MB".format(image.nbytes / 1e6))

Indexing on the last dimension makes it possible to extract the 2D content of a specific color channel, for instance the red channel:


In [ ]:
red_channel = image[:, :, 0]
red_channel

In [ ]:
plt.imshow(image[:, :, 0], cmap=plt.cm.Reds_r);

Exercise

  • Compute a grey-level version of the image with shape (height, width) by averaging the values across color channels using image.mean.

  • Plot the result with plt.imshow using a grey levels colormap.

  • Can the uint8 integer data type represent those average values? Check the data type used by numpy.

  • What is the size in (mega) bytes of this image?

  • What is the expected range of values for the new pixels?


In [ ]:


In [ ]:
# %load solutions/grey_levels.py
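
A minimal sketch of one possible solution (the official one is in solutions/grey_levels.py):

In [ ]:
# averaging across the color channels yields float64 values in [0, 255],
# which the uint8 data type cannot represent exactly
grey_image = image.mean(axis=2)
print(grey_image.dtype, grey_image.shape)
print("size: {:0.3} MB".format(grey_image.nbytes / 1e6))
plt.imshow(grey_image, cmap=plt.cm.Greys_r);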

Resizing images, handling data types and dynamic ranges

When dealing with a heterogeneous collection of images of various sizes, it is often necessary to resize them to a common size. More specifically:

  • for image classification, most networks expect a specific fixed input size;

  • for object detection and instance segmentation, networks have more flexibility but the images should have approximately the same size as the training set images.

Furthermore, large images can be much slower to process than smaller ones (the number of pixels grows quadratically with the side length of the image).


In [ ]:
from skimage.transform import resize

image = imread('laptop.jpeg')
lowres_image = resize(image, (50, 50), mode='reflect', anti_aliasing=True)
lowres_image.shape

In [ ]:
plt.imshow(lowres_image, interpolation='nearest');

The values of the pixels of the low resolution image are computed by combining the values of the pixels in the high resolution image. The result is therefore represented as floating point numbers.


In [ ]:
lowres_image.dtype

By convention, both skimage.transform.resize and plt.imshow assume that floating point values range from 0.0 to 1.0, as opposed to 0 to 255 for 8-bit integers:


In [ ]:
lowres_image.min(), lowres_image.max()

Note that Keras, on the other hand, might expect images encoded with values in the [0.0 - 255.0] range irrespective of the data type of the array. To avoid the implicit conversion to the [0.0 - 1.0] range we use the preserve_range=True option.


In [ ]:
lowres_large_range_image = resize(image, (50, 50), mode='reflect',
                                  anti_aliasing=True, preserve_range=True)

In [ ]:
lowres_large_range_image.shape

In [ ]:
lowres_large_range_image.dtype

In [ ]:
lowres_large_range_image.min(), lowres_large_range_image.max()

Warning: the behavior of plt.imshow depends on both the dtype and the dynamic range when displaying RGB images. In particular it does not work on RGB images with float64 values in the [0.0 - 255.0] range:


In [ ]:
plt.imshow(lowres_large_range_image, interpolation='nearest');

Question

Suggest two possible ways to correctly display an RGB array with floating point values in the [0.0 - 255.0] range:


In [ ]:
# %load solutions/question_imshow_dtype_and_range.py
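
A possible sketch of two fixes (the official answer is in the solution file): rescale the floats back to [0.0, 1.0], or cast back to 8-bit integers:

In [ ]:
# option 1: rescale the floating point values to the [0.0 - 1.0] range
plt.figure()
plt.imshow(lowres_large_range_image / 255, interpolation='nearest')

# option 2: cast back to 8-bit integers in the [0 - 255] range
plt.figure()
plt.imshow(lowres_large_range_image.astype(np.uint8), interpolation='nearest');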

Using a pretrained model

Objectives:

  • Load a pre-trained ResNet50 model from the Keras model zoo
  • Build a headless model and compute representations of images
  • Explore the quality of the representations with t-SNE
  • Retrain the last layer on a cat vs. dog dataset

In [ ]:
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing import image

model = ResNet50(include_top=True, weights='imagenet')

In [ ]:
print(model.summary())

Classification of an image

Exercise

  • Open an image, preprocess it and build a batch of 1 image
  • Use the model to classify this image
  • Decode the predictions using decode_predictions from Keras

Notes:

  • Test your code with "images_resize/000007.jpg"
  • You may need preprocess_input for preprocessing the image.
  • The Keras ResNet50 expects floating point images of size (224, 224) with a dynamic range in [0, 255] before preprocessing. skimage's resize has a preserve_range flag that you might find useful.

In [ ]:
from tensorflow.keras.applications.imagenet_utils import preprocess_input
from tensorflow.keras.applications.imagenet_utils import decode_predictions

path = "laptop.jpeg"

# TODO

In [ ]:
# %load solutions/predict_image.py
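
A possible sketch, assuming the conventions noted above (resize while preserving the [0, 255] range, then apply the ResNet50 preprocessing):

In [ ]:
img = imread(path)
img = resize(img, (224, 224), mode='reflect', preserve_range=True,
             anti_aliasing=True)
# build a batch of a single float32 image and preprocess it
img_batch = preprocess_input(img[np.newaxis].astype('float32'))
predictions = model.predict(img_batch)
print(decode_predictions(predictions, top=5))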

Taking snapshots from the webcam

Let's use the Python API of OpenCV to take pictures.


In [ ]:
import cv2

def camera_grab(camera_id=0, fallback_filename=None):
    camera = cv2.VideoCapture(camera_id)
    try:
        # take 10 consecutive snapshots to let the camera automatically tune
        # itself and hope that the contrast and lighting of the last snapshot
        # is good enough.
        for i in range(10):
            snapshot_ok, image = camera.read()
        if snapshot_ok:
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        else:
            print("WARNING: could not access camera")
            if fallback_filename:
                image = imread(fallback_filename)
            else:
                # avoid silently returning None when the camera read failed
                raise RuntimeError("no camera access and no fallback image")
    finally:
        camera.release()
    return image

In [ ]:
image = camera_grab(camera_id=0, fallback_filename='laptop.jpeg')
plt.imshow(image)
print("dtype: {}, shape: {}, range: {}".format(
    image.dtype, image.shape, (image.min(), image.max())))

Exercise

  • Write a function named classify that takes a snapshot of the webcam and displays it along with the decoded predictions of the model and their confidence levels.

  • If you don't have access to a webcam take a picture with your mobile phone or a photo of your choice from the web, store it as a JPEG file on the disk instead and pass that file to the neural network to make the prediction.

  • Try to classify a photo of your face. Look at the confidence level. Can you explain the results?

  • Try to classify photos of common objects such as a book, a mobile phone, a cup... Try to center the objects and remove clutter to get confidence higher than 0.5.


In [ ]:
def classify():
    # TODO: write me
    pass

    
classify()

In [ ]:
# %load solutions/classify_webcam.py
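
A possible sketch, reusing camera_grab and the preprocessing from the previous exercise (the official version is in solutions/classify_webcam.py):

In [ ]:
def classify():
    # grab a frame (or fall back to the laptop picture) and display it
    frame = camera_grab(camera_id=0, fallback_filename='laptop.jpeg')
    plt.imshow(frame)
    resized = resize(frame, (224, 224), mode='reflect',
                     preserve_range=True, anti_aliasing=True)
    batch = preprocess_input(resized[np.newaxis].astype('float32'))
    for _, name, confidence in decode_predictions(model.predict(batch))[0]:
        print("{}: {:0.3f}".format(name, confidence))

classify()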

Computing the representations of images

First let's make sure that we have access to the subset of image files from the PASCAL VOC dataset we'll be using:


In [ ]:
import os.path as op
from zipfile import ZipFile

if not op.exists("images_resize"):
    print('Extracting image files...')
    zf = ZipFile('images_pascalVOC.zip')
    zf.extractall('.')

Let's build a new model that maps the image input space to the output of the layer before the last layer of the pretrained ResNet50 model. We call this new model the "base model":


In [ ]:
model_input = model.layers[0].input
model_output = model.layers[-2].output
base_model = Model(model_input, model_output)
base_model.output_shape

The base model can transform any image into a flat, high dimensional, semantic feature vector:
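
The next cell reuses the img_batch variable built in the classification exercise above. If you skipped it, here is a minimal sketch to rebuild one (assuming laptop.jpeg is on disk):

In [ ]:
img = resize(imread('laptop.jpeg'), (224, 224), mode='reflect',
             preserve_range=True, anti_aliasing=True)
# preprocess_input expects a batch of float32 images in the [0, 255] range
img_batch = preprocess_input(img[np.newaxis].astype('float32'))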


In [ ]:
representation = base_model.predict(img_batch)
print("Shape of representation:", representation.shape)

Computing the representations of all images can be time consuming. This is usually done in large batches on a GPU for massive performance gains.

For the remaining part, we will use pre-computed representations saved in HDF5 format.

For those interested, they were computed using the process_images.py script.


In [ ]:
import os
paths = ["images_resize/" + path
         for path in sorted(os.listdir("images_resize/"))]

In [ ]:
import h5py

with h5py.File('img_emb.h5', 'r') as h5f:
    out_tensors = h5f['img_emb'][:]
    
out_tensors.shape

In [ ]:
out_tensors.dtype

Exercise

  • What is the proportion of 0 values in this representation?
  • Can you find any negative values?
  • Why are there so many zero values?
  • Are the zeros always located in the same dimensions for different input images?

In [ ]:
# %load solutions/representations.py
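
A possible sketch for the first two questions (the full answer is in solutions/representations.py):

In [ ]:
# proportion of exactly-zero activations across all images
print("zeros: {:0.1%}".format(np.mean(out_tensors == 0.0)))
# the penultimate ResNet50 layer pools ReLU activations, so no negatives
print("min value:", out_tensors.min())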

Let's find a 2D representation of that high dimensional feature space using t-SNE:


In [ ]:
from sklearn.manifold import TSNE

img_emb_tsne = TSNE(perplexity=30).fit_transform(out_tensors)

In [ ]:
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 10))
plt.scatter(img_emb_tsne[:, 0], img_emb_tsne[:, 1]);
plt.xticks(()); plt.yticks(());
plt.show()

Let's add thumbnails of the original images at their t-SNE locations:


In [ ]:
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.offsetbox import OffsetImage, AnnotationBbox
from skimage.io import imread
from skimage.transform import resize

def imscatter(x, y, paths, ax=None, zoom=1, linewidth=0):
    if ax is None:
        ax = plt.gca()
    x, y = np.atleast_1d(x, y)
    artists = []
    for x0, y0, p in zip(x, y, paths):
        try:
            im = imread(p)
        except Exception:
            # skip image files that cannot be read
            print("could not read:", p)
            continue
        im = resize(im, (224, 224), preserve_range=False, mode='reflect')
        im = OffsetImage(im, zoom=zoom)
        ab = AnnotationBbox(im, (x0, y0), xycoords='data',
                            frameon=True, pad=0.1, 
                            bboxprops=dict(edgecolor='red',
                                           linewidth=linewidth))
        artists.append(ax.add_artist(ab))
    ax.update_datalim(np.column_stack([x, y]))
    ax.autoscale()
    return artists

In [ ]:
fig, ax = plt.subplots(figsize=(50, 50))
imscatter(img_emb_tsne[:, 0], img_emb_tsne[:, 1], paths, zoom=0.5, ax=ax)
plt.savefig('tsne.png')

Visual Search: finding similar images


In [ ]:
def display(img):
    plt.figure()
    img = imread(img)
    plt.imshow(img)

In [ ]:
idx = 57

def most_similar(idx, top_n=5):
    dists = np.linalg.norm(out_tensors - out_tensors[idx], axis=1)
    sorted_dists = np.argsort(dists)
    return sorted_dists[:top_n]

sim = most_similar(idx)
[display(paths[s]) for s in sim];

Bonus: Classification from Nearest Neighbors?

Using these representations, it may be possible to build a nearest neighbor classifier. However, the representations were learned on ImageNet, whose images are centered on a single object, whereas the PASCAL VOC images we feed in are more plausible inputs for a real-world system.

The next section explores this possibility by computing the histogram of similarities between one image and the others.


In [ ]:
out_norms = np.linalg.norm(out_tensors, axis=1, keepdims=True)
normed_out_tensors = out_tensors / out_norms

In [ ]:
item_idx = 208
dists_to_item = np.linalg.norm(out_tensors - out_tensors[item_idx],
                               axis=1)
cos_to_item = np.dot(normed_out_tensors, normed_out_tensors[item_idx]) 
plt.hist(cos_to_item, bins=30)
display(paths[item_idx])

Unfortunately there is no clear separation of class boundaries visible in the histogram of similarities alone. We need some supervision to be able to classify images.

With a labeled dataset, even with very few labels per class, one would be able to:

  • build a k-Nearest Neighbor model,
  • build a classification model such as an SVM.

These approximate classifiers are useful in practice. See the cat vs dog home assignment with GPU for another example of this approach.
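
For instance, a minimal k-NN sketch with scikit-learn, assuming a hypothetical labels array with one integer class per image in out_tensors (not provided in this notebook):

In [ ]:
from sklearn.neighbors import KNeighborsClassifier

# hypothetical labels aligned with out_tensors, e.g. from PASCAL VOC annotations
# labels = np.array([...])
knn = KNeighborsClassifier(n_neighbors=5, metric='cosine')
# knn.fit(normed_out_tensors, labels)
# knn.predict(normed_out_tensors[item_idx:item_idx + 1])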


In [ ]:
items = np.where(cos_to_item > 0.5)
print(items)
[display(paths[s]) for s in items[0]];

In [ ]: