Using a feature representation learned for signature images

This notebook contains code to pre-process signature images and to obtain feature vectors using the feature representation learned on the GPDS dataset.


In [2]:
import numpy as np

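# Functions to load and pre-process the images.
# Note: scipy.misc.imread and imsave were removed in SciPy 1.2; on newer versions,
# imageio.imread is the replacement suggested by the SciPy deprecation notice.
from scipy.misc import imread, imsave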
from preprocess.normalize import normalize_image, resize_image, crop_center, preprocess_signature

# Functions to load the CNN model
import signet
from cnn_model import CNNModel

# Functions for plotting:
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['image.cmap'] = 'Greys'

Pre-processing a single image


In [3]:
original = imread('data/some_signature.png')

In [4]:
# Manually normalizing the image following the steps provided in the paper.
# These steps are also implemented in preprocess.normalize.preprocess_signature

normalized = 255 - normalize_image(original, size=(952, 1360))
resized = resize_image(normalized, (170, 242))
cropped = crop_center(resized, (150,220))

In [5]:
# Visualizing the intermediate steps

f, ax = plt.subplots(4,1, figsize=(6,15))
ax[0].imshow(original, cmap='Greys_r')
ax[1].imshow(normalized)
ax[2].imshow(resized)
ax[3].imshow(cropped)

ax[0].set_title('Original')
ax[1].set_title('Background removed/centered')
ax[2].set_title('Resized')
ax[3].set_title('Cropped center of the image')


Out[5]:
<matplotlib.text.Text at 0x7fe197a6fc90>
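
The last step above keeps only the central 150 x 220 region of the resized 170 x 242 image. A minimal NumPy sketch of that idea follows. The helper name `center_crop` is hypothetical and is not the `crop_center` function from `preprocess.normalize`, whose implementation may differ (for example in how it rounds odd margins).

```python
import numpy as np

def center_crop(img, shape):
    """Return the central region of `img` with the given (height, width).

    Hypothetical helper for illustration; assumes `img` is at least as
    large as `shape` in both dimensions.
    """
    height, width = shape
    start_y = (img.shape[0] - height) // 2
    start_x = (img.shape[1] - width) // 2
    return img[start_y:start_y + height, start_x:start_x + width]

# Toy stand-in for a resized signature (170 x 242), cropped to 150 x 220.
toy = np.arange(170 * 242).reshape(170, 242)
toy_cropped = center_crop(toy, (150, 220))
print(toy_cropped.shape)
```

With these sizes, 10 rows are dropped from the top and bottom and 11 columns from the left and right.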

Processing multiple images and obtaining feature vectors


In [6]:
user1_sigs = [imread('data/a%d.png' % i) for i in [1, 2]]
user2_sigs = [imread('data/b%d.png' % i) for i in [1, 2]]

canvas_size = (952, 1360)

processed_user1_sigs = np.array([preprocess_signature(sig, canvas_size) for sig in user1_sigs])
processed_user2_sigs = np.array([preprocess_signature(sig, canvas_size) for sig in user2_sigs])

In [7]:
# Show pre-processed samples from the two users

f, ax = plt.subplots(2,2, figsize=(10,6))
ax[0,0].imshow(processed_user1_sigs[0])
ax[0,1].imshow(processed_user1_sigs[1])

ax[1,0].imshow(processed_user2_sigs[0])
ax[1,1].imshow(processed_user2_sigs[1])


Out[7]:
<matplotlib.image.AxesImage at 0x7fe1978eb090>

Using the CNN to obtain the feature representations


In [8]:
# Path to the learned weights
model_weight_path = 'models/signet.pkl'

In [9]:
# Instantiate the model
model = CNNModel(signet, model_weight_path)

In [10]:
# Obtain the features. Note that you can process multiple images at the same time

user1_features = model.get_feature_vector_multiple(processed_user1_sigs, layer='fc2')
user2_features = model.get_feature_vector_multiple(processed_user2_sigs, layer='fc2')

Inspecting the learned features

The feature vectors have size 2048:


In [11]:
user1_features.shape


Out[11]:
(2, 2048)

In [12]:
print('Euclidean distance between signatures from the same user')
print(np.linalg.norm(user1_features[0] - user1_features[1]))
print(np.linalg.norm(user2_features[0] - user2_features[1]))


Euclidean distance between signatures from the same user
19.447027
20.110455

In [13]:
print('Euclidean distance between signatures from different users')

dists = [np.linalg.norm(u1 - u2) for u1 in user1_features for u2 in user2_features]
print(dists)


Euclidean distance between signatures from different users
[34.48648, 38.47806, 31.770254, 34.43613]
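
The nested list comprehension above is fine for a handful of signatures. With more samples, the same distances can be computed in one step with NumPy broadcasting. The sketch below uses random placeholder features of size 2048 in place of the model outputs, and `pairwise_euclidean` is a hypothetical helper name, not part of this repository.

```python
import numpy as np

def pairwise_euclidean(a, b):
    """Distance matrix D with D[i, j] = ||a[i] - b[j]||.

    `a` has shape (n, d) and `b` has shape (m, d); the result has shape (n, m).
    """
    diff = a[:, np.newaxis, :] - b[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Random placeholders standing in for two users' feature vectors.
rng = np.random.RandomState(0)
feats_a = rng.randn(2, 2048).astype(np.float32)
feats_b = rng.randn(2, 2048).astype(np.float32)

dist_matrix = pairwise_euclidean(feats_a, feats_b)
print(dist_matrix.shape)
print(dist_matrix.ravel())
```

Flattening the matrix row by row gives the same ordering as the list comprehension: every signature of the first user against every signature of the second.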

In [14]:
# Other models:
# model_weight_path = 'models/signetf_lambda0.95.pkl'
# model_weight_path = 'models/signetf_lambda0.999.pkl'

Using SPP models (signatures of different sizes)

For the SPP models, we can use images of any size as input and still obtain a feature vector of a fixed size. Note that in the paper we obtained better results by padding small images to a fixed canvas size and processing larger images at their original size. More information can be found in the paper: https://arxiv.org/abs/1804.00448
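
To see why the output size does not depend on the input size, here is a minimal NumPy sketch of spatial pyramid pooling. It is not the implementation used by the SPP models in this repository: the function name, the pyramid levels (1, 2, 4) and the way bin edges are computed are all assumptions chosen for illustration.

```python
import numpy as np

def spatial_pyramid_max_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (channels, height, width) map over n x n grids, one grid per level.

    The output length is channels * sum(n * n for n in levels), whatever the
    height and width are. Assumes height and width are >= max(levels).
    """
    channels, height, width = feature_map.shape
    pooled = []
    for n in levels:
        # Bin edges that together cover the whole map.
        y_edges = np.linspace(0, height, n + 1).astype(int)
        x_edges = np.linspace(0, width, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                region = feature_map[:, y_edges[i]:y_edges[i + 1],
                                     x_edges[j]:x_edges[j + 1]]
                pooled.append(region.max(axis=(1, 2)))
    return np.concatenate(pooled)

# Two placeholder feature maps with different spatial sizes.
rng = np.random.RandomState(0)
small_map = rng.rand(8, 9, 14)
large_map = rng.rand(8, 17, 23)

small_vec = spatial_pyramid_max_pool(small_map)
large_vec = spatial_pyramid_max_pool(large_map)
print(small_vec.shape, large_vec.shape)
```

Both vectors have 8 * (1 + 4 + 16) = 168 entries, so a fully connected layer placed after such a pooling step always receives an input of the same size.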


In [15]:
from preprocess.normalize import remove_background

# To illustrate that images of any size can be used, let's process the signatures just
# by removing the background and inverting the image

normalized_spp = 255 - remove_background(original)

plt.imshow(normalized_spp)


Out[15]:
<matplotlib.image.AxesImage at 0x7fe11802bb50>

In [16]:
# Note that now we need to use lists instead of numpy arrays, since the images will have different sizes. 
# We will also process each image individually

processed_user1_sigs_spp = [255-remove_background(sig) for sig in user1_sigs]
processed_user2_sigs_spp = [255-remove_background(sig) for sig in user2_sigs]
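
The cell above skips the padding step mentioned in the text (small images padded to a fixed canvas, larger images kept at their original size). One way this could look is sketched below. `pad_to_canvas` is a hypothetical helper, the canvas size is a toy value rather than the one used in the paper, and the zero fill assumes the image has already been inverted so that the background is 0.

```python
import numpy as np

def pad_to_canvas(img, canvas_size):
    """Center `img` on a zero-filled canvas; return it unchanged if it does not fit.

    Hypothetical helper for illustration; assumes a 2-D grayscale image
    whose background value is 0.
    """
    canvas_h, canvas_w = canvas_size
    img_h, img_w = img.shape
    if img_h > canvas_h or img_w > canvas_w:
        return img  # larger images are kept at their original size
    canvas = np.zeros(canvas_size, dtype=img.dtype)
    top = (canvas_h - img_h) // 2
    left = (canvas_w - img_w) // 2
    canvas[top:top + img_h, left:left + img_w] = img
    return canvas

toy_canvas = (100, 150)  # illustrative only
small_img = np.full((40, 60), 7, dtype=np.uint8)
large_img = np.ones((120, 200), dtype=np.uint8)

padded_small = pad_to_canvas(small_img, toy_canvas)
padded_large = pad_to_canvas(large_img, toy_canvas)
print(padded_small.shape, padded_large.shape)
```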

In [17]:
# Show pre-processed samples from the two users

f, ax = plt.subplots(2,2, figsize=(10,6))
ax[0,0].imshow(processed_user1_sigs_spp[0])
ax[0,1].imshow(processed_user1_sigs_spp[1])

ax[1,0].imshow(processed_user2_sigs_spp[0])
ax[1,1].imshow(processed_user2_sigs_spp[1])


Out[17]:
<matplotlib.image.AxesImage at 0x7fe11028af10>

In [18]:
import signet_spp_300dpi
# Instantiate the model
model = CNNModel(signet_spp_300dpi, 'models/signet_spp_300dpi.pkl')

In [19]:
# Obtain the features. Note that we need to process them individually here since they have different sizes

user1_features_spp = [model.get_feature_vector(sig, layer='fc2') for sig in processed_user1_sigs_spp]
user2_features_spp = [model.get_feature_vector(sig, layer='fc2') for sig in processed_user2_sigs_spp]

In [20]:
print('Euclidean distance between signatures from the same user')
print(np.linalg.norm(user1_features_spp[0] - user1_features_spp[1]))
print(np.linalg.norm(user2_features_spp[0] - user2_features_spp[1]))


Euclidean distance between signatures from the same user
22.755056
25.688372

In [21]:
print('Euclidean distance between signatures from different users')

dists = [np.linalg.norm(u1 - u2) for u1 in user1_features_spp for u2 in user2_features_spp]
print(dists)


Euclidean distance between signatures from different users
[33.54788, 36.62496, 29.461115, 33.750706]
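
As a quick way to read the numbers printed in this notebook, the sketch below computes, for each model, the gap between the smallest different-user distance and the largest same-user distance. The values are copied from the outputs above. With only two users and four signatures this is purely illustrative and says nothing about which model performs better in general.

```python
# Distances copied from the printed outputs in this notebook.
results = {
    'signet': {
        'same_user': [19.447027, 20.110455],
        'different_users': [34.48648, 38.47806, 31.770254, 34.43613],
    },
    'signet_spp_300dpi': {
        'same_user': [22.755056, 25.688372],
        'different_users': [33.54788, 36.62496, 29.461115, 33.750706],
    },
}

margins = {}
for name, dists in results.items():
    # Positive margin: every different-user pair is farther apart
    # than every same-user pair in this small sample.
    margins[name] = min(dists['different_users']) - max(dists['same_user'])
    print('%s: margin = %.3f' % (name, margins[name]))
```

For both models the margin is positive on these four signatures, meaning every different-user pair is farther apart than every same-user pair.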