This chapter contains the following:
In [1]:
import numpy
import pqkmeans
import tqdm
import matplotlib.pyplot as plt
%matplotlib inline
In this chapter, we show an example of image clustering. A deep feature (VGG16 fc6 activation) is extracted from each image using Keras, then the features are clustered using PQk-means.
First, let's read images from the CIFAR10 dataset.
In [2]:
from keras.datasets import cifar10
(img_train, _), (img_test, _) = cifar10.load_data()
The first time you run the above cell, it takes several minutes to download the dataset to your local disk (typically ~/.keras/datasets).
The CIFAR10 dataset contains small color images, where each image is a uint8 RGB 32x32 array. The shape of img_train is (50000, 32, 32, 3), and that of img_test is (10000, 32, 32, 3). Let's see some of them.
In [3]:
print("The first image of img_train:\n")
To train a PQ-encoder, we use the first 1000 images from img_train. The clustering will be run on the first 5000 images from img_test.
In [4]:
img_train = img_train[0:1000]
img_test = img_test[0:5000]
Next, let us extract a 4096-dimensional deep feature from each image. For the feature extractor, we employ the activation of the fc6 layer, the first fully connected layer (called fc1 in the Keras implementation), of the ImageNet pre-trained VGG16 model. See the Keras tutorial for more details.
In [5]:
from keras.applications.vgg16 import VGG16
from keras.applications.vgg16 import preprocess_input
from keras.models import Model
from scipy.misc import imresize  # Removed in SciPy 1.3.0; this import needs an older SciPy with Pillow installed
base_model = VGG16(weights='imagenet') # Read the ImageNet pre-trained VGG16 model
model = Model(inputs=base_model.input, outputs=base_model.get_layer('fc1').output) # We use the output from the 'fc1' layer
def extract_feature(model, img):
    # Take an RGB image (numpy array of shape (H, W, 3)) and return a 4096D feature vector.
    # Note that this can be accelerated by batch-processing.
    x = imresize(img, (224, 224))  # Resize to 224x224 since VGG16 takes this size as an input
    x = numpy.float32(x)  # Convert from uint8 to float32
    x = numpy.expand_dims(x, axis=0)  # Convert the shape from (224, 224, 3) to (1, 224, 224, 3)
    x = preprocess_input(x)  # Subtract the average value of ImageNet.
    feature = model.predict(x)[0]  # Extract a feature, then reshape from (1, 4096) to (4096, )
    feature /= numpy.linalg.norm(feature)  # Normalize the feature to unit L2 norm.
    return feature
The first time you run this cell, it also takes several minutes to download the ImageNet pre-trained weights.
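The comment inside extract_feature notes that batch processing can speed up extraction. A minimal sketch of that idea follows. It assumes a hypothetical predict_fn that maps a batch of shape (B, H, W, 3) to features of shape (B, D); in this notebook that role would be played by model.predict after resizing and preprocess_input, both of which are omitted here so that the sketch runs with NumPy alone. The names extract_features_batched and toy_predict are illustrative and not part of Keras or pqkmeans.

```python
import numpy


def extract_features_batched(predict_fn, imgs, batch_size=32):
    """Run predict_fn over imgs in chunks and L2-normalize each row.

    predict_fn is a hypothetical stand-in for model.predict: it takes an
    array of shape (B, H, W, 3) and returns an array of shape (B, D).
    """
    chunks = []
    for start in range(0, len(imgs), batch_size):
        batch = numpy.float32(imgs[start:start + batch_size])
        chunks.append(predict_fn(batch))
    features = numpy.concatenate(chunks, axis=0)
    # Normalize each feature vector to unit L2 norm, as extract_feature does
    norms = numpy.linalg.norm(features, axis=1, keepdims=True)
    return features / norms


def toy_predict(batch):
    # Toy stand-in model (assumption): mean over the spatial axes -> (B, 3)
    return batch.mean(axis=(1, 2)) + 1.0


toy_imgs = numpy.random.RandomState(0).randint(
    0, 256, size=(70, 8, 8, 3), dtype=numpy.uint8)
toy_features = extract_features_batched(toy_predict, toy_imgs, batch_size=32)
print(toy_features.shape)
```

Calling the model once per batch rather than once per image amortizes the per-call overhead; the best batch_size depends on the available GPU memory.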
Let us extract features from images as follows. This takes several minutes on a typical GPU such as a GTX 1080.
In [6]:
features_train = numpy.array([extract_feature(model, img) for img in tqdm.tqdm(img_train)])
features_test = numpy.array([extract_feature(model, img) for img in tqdm.tqdm(img_test)])
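Since extract_feature divides each feature by its L2 norm, every extracted feature has unit length. For unit vectors a and b, squared Euclidean distance and cosine similarity are tied by ||a - b||^2 = 2 - 2 (a . b), so ranking neighbors by Euclidean distance gives the same order as ranking by cosine similarity. The following is a quick NumPy check of that identity on random stand-in vectors (not the real features):

```python
import numpy

rng = numpy.random.RandomState(0)
# Random stand-ins for two 4096D features (assumption: not real VGG16 outputs)
a = rng.randn(4096).astype(numpy.float32)
b = rng.randn(4096).astype(numpy.float32)
a /= numpy.linalg.norm(a)
b /= numpy.linalg.norm(b)

squared_dist = float(numpy.sum((a - b) ** 2))
cosine = float(numpy.dot(a, b))
# The two quantities should agree up to floating-point error
print(abs(squared_dist - (2.0 - 2.0 * cosine)) < 1e-4)
```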
Now we have a set of 4096D features for both the train-dataset and the test-dataset. Note that features_train[0] is an image descriptor for img_train[0].
Let us train a PQ-encoder using the training dataset, and compress the deep features into PQ-codes.
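For intuition, product quantization splits each vector into several sub-vectors and replaces each sub-vector with the index of its nearest codeword in a codebook learned for that subspace. Assuming num_subdim is the number of sub-vectors and Ks the number of codewords per subspace, the setting num_subdim=4, Ks=256 used in this chapter turns a 4096D float32 feature (16,384 bytes) into four uint8 indices (4 bytes). The code below is a simplified NumPy sketch of that idea, not the pqkmeans implementation: the helper names train_codebooks and encode, the plain Lloyd iterations, and the small toy sizes are all assumptions made for illustration.

```python
import numpy


def train_codebooks(X, num_subdim, Ks, n_iter=10, seed=0):
    """Learn one codebook of Ks codewords per subspace with plain k-means."""
    rng = numpy.random.RandomState(seed)
    N, D = X.shape
    assert D % num_subdim == 0, "D must be divisible by num_subdim"
    Ds = D // num_subdim
    codebooks = numpy.empty((num_subdim, Ks, Ds), dtype=numpy.float32)
    for m in range(num_subdim):
        sub = X[:, m * Ds:(m + 1) * Ds]
        # Initialize codewords from Ks distinct training points
        centers = sub[rng.choice(N, Ks, replace=False)].copy()
        for _ in range(n_iter):
            # Assign each sub-vector to its nearest codeword
            dists = ((sub[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            assign = dists.argmin(axis=1)
            # Move each codeword to the mean of its assigned sub-vectors
            for k in range(Ks):
                members = sub[assign == k]
                if len(members) > 0:
                    centers[k] = members.mean(axis=0)
        codebooks[m] = centers
    return codebooks


def encode(X, codebooks):
    """Replace each sub-vector by the index of its nearest codeword."""
    num_subdim, Ks, Ds = codebooks.shape
    assert Ks <= 256, "uint8 codes can only index up to 256 codewords"
    codes = numpy.empty((X.shape[0], num_subdim), dtype=numpy.uint8)
    for m in range(num_subdim):
        sub = X[:, m * Ds:(m + 1) * Ds]
        dists = ((sub[:, None, :] - codebooks[m][None, :, :]) ** 2).sum(axis=2)
        codes[:, m] = dists.argmin(axis=1)
    return codes


# Toy data (assumption): 200 vectors of dimension 16, 4 subspaces, 8 codewords each
X_toy = numpy.random.RandomState(1).randn(200, 16).astype(numpy.float32)
toy_codebooks = train_codebooks(X_toy, num_subdim=4, Ks=8)
toy_codes = encode(X_toy, toy_codebooks)
print(toy_codes.shape, toy_codes.dtype)
```

Each row of toy_codes holds one small integer per subspace, which is why PQ-codes are so much more compact than the original float vectors.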
In [7]:
# Train an encoder
encoder = pqkmeans.encoder.PQEncoder(num_subdim=4, Ks=256)
encoder.fit(features_train)
# Encode the deep features to PQ-codes
pqcodes_test = encoder.transform(features_test)
# Run clustering
K = 10
print("Runtime of clustering:")
%time clustered = pqkmeans.clustering.PQKMeans(encoder=encoder, k=K).fit_predict(pqcodes_test)
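Before looking at the images, it can be useful to check how many items landed in each cluster. A small sketch with numpy.bincount follows. It uses a made-up stand-in for clustered, assumed here to be a sequence of integer cluster ids in the range 0 to K-1; the names K_toy and clustered_toy are hypothetical.

```python
import numpy

K_toy = 4
clustered_toy = [0, 2, 2, 1, 0, 2, 3, 3, 2]  # hypothetical cluster ids
# minlength=K_toy keeps empty clusters in the result with a count of 0
sizes = numpy.bincount(numpy.asarray(clustered_toy), minlength=K_toy)
for k, size in enumerate(sizes):
    print("Cluster id: k={}, size={}".format(k, size))
```

A very uneven distribution of sizes can be a hint to revisit K or the encoder settings.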
Now we can visualize image clusters. As can be seen, each cluster has similar images such as "horses", "cars", etc.
In [8]:
for k in range(K):
    print("Cluster id: k={}".format(k))
    img_ids = [img_id for img_id, cluster_id in enumerate(clustered) if cluster_id == k]
    cols = 10
    img_ids = img_ids[0:cols] if cols < len(img_ids) else img_ids # Let's see the top 10 results
    # Visualize images assigned to this cluster
    imgs = img_test[img_ids]
    plt.figure(figsize=(20, 5))
    for i, img in enumerate(imgs):
        plt.subplot(1, cols, i + 1)