This chapter contains the following: a comparison of our PQk-means to the k-means implementations in the faiss library, on both CPU and GPU.
Requisites: faiss. Install either the CPU version:
conda install faiss-cpu -c pytorch
or, if you have CUDA-capable GPUs, the GPU version:
conda install faiss-gpu -c pytorch
In [1]:
import numpy
import pqkmeans
from sklearn.cluster import KMeans
import faiss
In this chapter, we compare our PQk-means to the k-means implementation in the faiss library. Faiss provides one of the most efficient implementations of nearest neighbor search algorithms for both CPUs and GPUs, as well as an implementation of vanilla k-means, which we compare against. The core of faiss is implemented in C++, and a Python binding is available.
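To make the API concrete, here is a minimal sketch of faiss's vanilla k-means on toy random data, using the modules imported above (the dimension, number of clusters, and iteration count are arbitrary illustrative values, not the settings used in this chapter):
toy = numpy.random.random((10000, 64)).astype(numpy.float32)  # faiss expects float32
kmeans_toy = faiss.Kmeans(d=64, k=50, niter=10)                # 50 centroids, 10 iterations
kmeans_toy.train(toy)                                          # run k-means on the toy data
_, toy_ids = kmeans_toy.index.search(toy, 1)                   # assign each vector to its nearest centroid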
We compare PQk-means to both the CPU and GPU versions of faiss k-means. Our configurations are:
For the comparison, we leverage the SIFT1M dataset.
In [2]:
Xt, X = pqkmeans.evaluation.get_sift1m_dataset()  # Xt: the training data. X: the testing data to be clustered
The data is downloaded by a helper script. This can take several minutes and consumes about 168 MB of disk space.
In [3]:
Xt = Xt.astype(numpy.float32)
X = X.astype(numpy.float32)
D = X.shape[1]
print("Xt.shape:{}\nX.shape:{}".format(Xt.shape, X.shape))
Because faiss takes 32-bit floating-point vectors as input, the data is converted to float32.
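faiss also expects each input matrix to be stored contiguously in row-major (C) order. The astype conversion above already yields contiguous arrays, but if you prepare your own data and are unsure about its layout, a defensive conversion (an extra precaution, not part of the original code) is:
X = numpy.ascontiguousarray(X, dtype=numpy.float32)  # no-op when X is already contiguous float32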
First, let us compare the k-means implementations of faiss and sklearn using 100K vectors from SIFT1M. We will see that faiss is much faster than sklearn while achieving almost the same error.
Note that it is hard to run sklearn's k-means with a large K because it is too slow; that is why this experiment is small-scale.
In [4]:
K_small = 1000
N_small = 100000
# Set up the clustering instances. We stop each algorithm after 10 iterations
kmeans_faiss_cpu_small = faiss.Kmeans(d=D, k=K_small, niter=10)
kmeans_sklearn_small = KMeans(n_clusters=K_small, n_jobs=-1, max_iter=10)
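Note that recent scikit-learn releases have removed the n_jobs argument of KMeans. If your installed version rejects it, the same setup without that argument (a compatibility tweak, not part of the original code) is:
kmeans_sklearn_small = KMeans(n_clusters=K_small, max_iter=10)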
Let's run each algorithm.
In [5]:
%%time
print("faiss-cpu:")
kmeans_faiss_cpu_small.train(X[:N_small])
_, ids_faiss_cpu_small = kmeans_faiss_cpu_small.index.search(X[:N_small], 1)
In [6]:
%%time
print("sklearn:")
ids_sklearn_small = kmeans_sklearn_small.fit_predict(X[:N_small])
In [7]:
_, faiss_cpu_small_error, _ = pqkmeans.evaluation.calc_error(ids_faiss_cpu_small.reshape(-1), X[:N_small], K_small)
_, sklearn_small_error, _ = pqkmeans.evaluation.calc_error(ids_sklearn_small, X[:N_small], K_small)
print("k-means, faiss-cpu, error: {}".format(faiss_cpu_small_error))
print("k-means, sklearn, error: {}".format(sklearn_small_error))
We observe that faiss-CPU is much faster than sklearn while achieving almost the same error. Because of this, sklearn is not included in the next comparison.
In [8]:
# Setup GPUs for faiss-gpu
# In my environment, the first GPU (id=0) is for rendering, and the second (id=1) and the third (id=2) GPUs are GPGPU (GTX1080).
# We activate only the second and the third GPU
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID" # make sure the order is identical to the result of nvidia-smi
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2" # Please change here for your environment
Next, let us compare PQk-means with faiss-CPU and faiss-GPU using the whole dataset (N=10^6, K=10^4). Note that this is a 100x larger setting than Sec 2 (NK=10^10 vs. NK=10^8).
First, as pre-processing for PQk-means, let's train a PQ encoder and encode all of the data. This takes around 10 seconds.
In [9]:
%%time
# Train the encoder
encoder = pqkmeans.encoder.PQEncoder(num_subdim=4, Ks=256)
encoder.fit(Xt)
# Encode the vectors to PQ-codes
X_code = encoder.transform(X)
Note that X_code is 128x more memory efficient than X: each original vector occupies 128 dimensions x 4 bytes (float32) = 512 bytes, whereas its PQ-code occupies only 4 bytes (one uint8 code per sub-space, with num_subdim=4):
In [10]:
print("X.shape: {}, X.dtype: {}, X.nbytes: {} MB".format(X.shape, X.dtype, X.nbytes / 10**6))
print("X_code.shape: {}, X_code.dtype: {}, X_code.nbytes: {} MB".format(X_code.shape, X_code.dtype, X_code.nbytes / 10**6))
Then each algorithm is instantiated as follows.
In [11]:
K = 10000 # Set larger K
# Set up the k-means instances. The number of iterations is set to 20 for all methods
# PQ-kmeans
kmeans_pqkmeans = pqkmeans.clustering.PQKMeans(encoder=encoder, k=K, iteration=20)
# Faiss-cpu
kmeans_faiss_cpu = faiss.Kmeans(d=D, k=K, niter=20)
kmeans_faiss_cpu.cp.max_points_per_centroid = 1000000 # otherwise the kmeans implementation sub-samples the training set
Because some additional configuration is required for the GPU, we wrap the GPU clustering into a single function:
In [12]:
def run_faiss_gpu(X, K, ngpu):
    # This code is based on https://github.com/facebookresearch/faiss/blob/master/benchs/kmeans_mnist.py
    D = X.shape[1]
    clus = faiss.Clustering(D, K)
    # otherwise the kmeans implementation sub-samples the training set
    clus.max_points_per_centroid = 10000000
    clus.niter = 20

    res = [faiss.StandardGpuResources() for i in range(ngpu)]
    flat_config = []
    for i in range(ngpu):
        cfg = faiss.GpuIndexFlatConfig()
        cfg.useFloat16 = False
        cfg.device = i
        flat_config.append(cfg)

    if ngpu == 1:
        index = faiss.GpuIndexFlatL2(res[0], D, flat_config[0])
    else:
        indexes = [faiss.GpuIndexFlatL2(res[i], D, flat_config[i])
                   for i in range(ngpu)]
        index = faiss.IndexProxy()
        for sub_index in indexes:
            index.addIndex(sub_index)

    # Run clustering
    clus.train(X, index)

    # Return the assignment
    _, ids = index.search(X, 1)
    return ids
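Depending on the installed faiss version, faiss.IndexProxy may no longer exist: newer releases renamed this multi-GPU wrapper to IndexReplicas. If the function above raises an AttributeError on that line, a version-tolerant construction (an assumption about newer releases, not part of the original code) is:
try:
    index = faiss.IndexProxy()
except AttributeError:
    index = faiss.IndexReplicas()  # newer name for the same replica wrapper; addIndex() is unchanged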
Run each method and see the computational cost.
In [13]:
%%time
print("PQk-means:")
ids_pqkmeans = kmeans_pqkmeans.fit_predict(X_code)
In [14]:
%%time
print("faiss-cpu:")
kmeans_faiss_cpu.train(X)
_, ids_faiss_cpu = kmeans_faiss_cpu.index.search(X, 1)
In [15]:
%%time
print("faiss with GPU:")
ids_faiss_gpu = run_faiss_gpu(X, K, ngpu=2) # Please adjust ngpu for your environment
In [16]:
_, pqkmeans_error, _ = pqkmeans.evaluation.calc_error(ids_pqkmeans, X, K)
_, faiss_cpu_error, _ = pqkmeans.evaluation.calc_error(ids_faiss_cpu.reshape(-1), X, K)
_, faiss_gpu_error, _ = pqkmeans.evaluation.calc_error(ids_faiss_gpu.reshape(-1), X, K)
print("PQk-means, error: {}".format(pqkmeans_error))
print("k-means, faiss-cpu, error: {}".format(faiss_cpu_error))
print("k-means, faiss-gpu, error: {}".format(faiss_gpu_error))
Our observations are:
Our final suggestions are as follows: