Quantum Genetic K-Means

Theoretical introduction

The implemented algorithm is described in more detail in [1,3].

This algorithm is based on the quantum bit (qubit) model. A qubit can exist in any superposition of the states 0 and 1 until it is observed, at which point the system collapses to one of the two states. However, the probabilities with which it collapses to each state may differ. The superposition property, or linear combination of states, can be expressed as

$$ |\psi\rangle = \alpha|0\rangle + \beta|1\rangle $$

where $|\psi\rangle$ is an arbitrary state vector and $\alpha$, $\beta$ are the probability amplitudes of the basis states $|0\rangle$ and $|1\rangle$, respectively. The basis states correspond to the spin of the modeled particle (in this case a fermion, e.g. an electron). The coefficients are subject to the following normalization:

$$|\alpha|^2 + |\beta|^2 = 1$$

where $|\alpha|^2$, $|\beta|^2$ are the probabilities of observing states $|0\rangle$ and $|1\rangle$, respectively. $\alpha$ and $\beta$ are complex quantities and together represent a qubit:

$$\begin{bmatrix} \alpha \\ \beta \end{bmatrix}$$

Moreover, a qubit string may be represented by: $$ \begin{bmatrix} \left.\begin{matrix} \alpha_1\\ \beta_1 \end{matrix}\right| & \left.\begin{matrix} \alpha_2\\ \beta_2 \end{matrix}\right| & \begin{matrix} \alpha_3\\ \beta_3 \end{matrix} \end{bmatrix} $$

The probability of observing the state $|000\rangle$ is $|\alpha_1|^2 \times |\alpha_2|^2 \times |\alpha_3|^2$.
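
As a quick sanity check, here is a minimal Python sketch (not part of the implementation used below): three qubits initialized with equal amplitudes $\frac{1}{\sqrt{2}}$ give $|000\rangle$ a probability of $(1/2)^3 = 0.125$.

import numpy as np

# three qubits, all with alpha = 1/sqrt(2)
alphas = np.full(3, 1.0 / np.sqrt(2))

# the probability of observing |000> is the product of the individual |alpha_i|^2
p000 = np.prod(np.abs(alphas) ** 2)
print(p000)  # 0.125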

To use this model for computing purposes, black-box objects called oracles are used. Oracles contain strings of qubits and generate their own input by observing the state of those qubits. After collapsing, a qubit behaves like a classical bit. Each string of qubits represents a number, so the number of qubits in each string defines its precision. The number of strings per oracle depends on the number of clusters and the dimensionality of the problem (e.g. for 3 clusters in 2 dimensions, 6 strings are used, since 6 numbers are required). Each oracle represents a candidate solution.

Algorithm

The algorithm has the following steps:

  1. Initialize population of oracles
  2. Collapse oracles
  3. K-Means
  4. Compute cluster fitness
  5. Store
  6. Quantum Rotation Gate
  7. Collapse oracles
  8. Repeat 3-7 until generation (iteration) limit is reached

Initialize population of oracles

The oracles are created in this step and all qubit coefficients are initialized to $\frac{1}{\sqrt{2}}$, so that each qubit collapses to either state with equal probability.

Collapse oracles

Collapsing the oracles means making an observation of each qubit of each qubit string in each oracle. This is done by first choosing one of the coefficients to work with (the choice is irrelevant), e.g. $\alpha$. Then a random value $r$ is drawn uniformly from $[0,1]$. If $|\alpha|^2 \ge r$ the qubit collapses to $|0\rangle$, otherwise to $|1\rangle$.
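
The following is a minimal sketch of this observation step for a single qubit string, assuming the $\alpha$ amplitudes are held in a NumPy array (the actual logic lives in the oracle and qubitLib modules imported further down and may differ in detail):

import numpy as np

def collapse_string(alphas):
    # observe each qubit: collapse to 0 with probability |alpha|^2, otherwise to 1
    r = np.random.rand(len(alphas))
    return np.where(np.abs(alphas) ** 2 >= r, 0, 1)

# a 5-qubit string initialized with alpha = 1/sqrt(2): both outcomes are equally likely
alphas = np.full(5, 1.0 / np.sqrt(2))
print(collapse_string(alphas))  # e.g. [0 1 1 0 1]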

K-Means

In this step we convert the binary representation of the qubit strings to base 10 and use those values as initial centroids for K-Means. For each oracle, classical K-Means is then executed until it stabilizes or reaches its iteration limit. The solution centroids are written back to the oracles in binary representation.
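
A rough sketch of this decoding and clustering step is shown below (strings_to_centroids is a hypothetical helper, standing in for the getIntArrays/setIntArrays round trip performed by the oracle module further down): each collapsed qubit string becomes an integer, the integers are grouped into a (numClusters, dim) array of initial centroids, and scikit-learn's KMeans is run once from that initialization.

import numpy as np
from sklearn.cluster import KMeans

def strings_to_centroids(bits, num_clusters):
    # interpret each row of bits (one collapsed qubit string, MSB first) as an integer
    weights = 2 ** np.arange(bits.shape[1])[::-1]
    ints = bits.dot(weights).astype(float)
    # stack the integers into (num_clusters, dim) initial centroids
    return ints.reshape(num_clusters, -1)

# toy example: 3 clusters in 2 dimensions -> 6 strings of 5 qubits each
bits = np.random.randint(0, 2, size=(6, 5))
init = strings_to_centroids(bits, 3)

data = np.random.rand(100, 2) * 31  # data on the same scale as 5-bit integers
estimator = KMeans(n_clusters=3, init=init, n_init=1).fit(data)
print(estimator.cluster_centers_)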

Compute cluster fitness

Cluster fitness is computed using the Davies-Bouldin index for each oracle. The score of each oracle is stored in the oracle itself.
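
The scores are computed here with a custom DaviesBouldin module (imported further down). For reference, recent scikit-learn releases (0.20 and later) ship an equivalent metric that could serve as a cross-check, as in this small sketch:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score  # available in scikit-learn >= 0.20

data = np.random.rand(200, 2)
labels = KMeans(n_clusters=3, n_init=10).fit_predict(data)

# lower is better; this value plays the role of the oracle's score
print(davies_bouldin_score(data, labels))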

Store

The best scoring oracle is stored.

Quantum Rotation Gate

So far, we have essentially had classical K-Means with an elaborate random number generator for the initial centroids and some complicated data structures. This is the step that fundamentally differs from the classical version. A quantum gate (in this case a rotation gate) is applied to all oracles except the best one. The basic idea is to shift the qubit coefficients of the lower-scoring oracles so that they have a higher probability of collapsing into initial centroids closer to the best solution found so far. In future generations we therefore do not restart from the best centroids themselves (which would not converge to a better solution), but from points close to them, while still preserving diversity (a desired property of the genetic computing paradigm). In short, each oracle keeps searching for solutions better than its previous one while moving in the direction of the best solution found so far.
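
Below is a minimal sketch of such a rotation, assuming real amplitudes and a fixed rotation angle, with the best oracle's collapsed bits acting as the attractor. The fixed angle is purely illustrative; choosing the rotation angle carefully is the delicate part discussed in the references.

import numpy as np

def rotation_gate(alphas, betas, own_bits, best_bits, dtheta=0.05 * np.pi):
    # rotate each (alpha, beta) pair towards the state observed in the best oracle:
    # a positive angle increases |beta| (state 1), a negative one increases |alpha| (state 0)
    new_a, new_b = alphas.copy(), betas.copy()
    for i in range(len(alphas)):
        if own_bits[i] == best_bits[i]:
            continue  # this qubit already agrees with the best oracle
        theta = dtheta if best_bits[i] == 1 else -dtheta
        c, s = np.cos(theta), np.sin(theta)
        new_a[i] = c * alphas[i] - s * betas[i]
        new_b[i] = s * alphas[i] + c * betas[i]
    return new_a, new_b

alphas = np.full(5, 1.0 / np.sqrt(2))
betas = np.full(5, 1.0 / np.sqrt(2))
own_bits = np.array([0, 1, 1, 0, 1])
best_bits = np.array([1, 1, 0, 0, 1])
print(rotation_gate(alphas, betas, own_bits, best_bits))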

In the original formulation of this algorithm, two extra steps existed to further increase diversity: quantum crossover and quantum mutation inversion. Both belong to the genetic algorithms toolbox, but they were not implemented here following the suggestion in [1] that they are unnecessary given a careful choice of the rotation angle.

Import modules


In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d

import numpy as np
import sys
import os
import pickle
from datetime import datetime
from sklearn.cluster import KMeans

This should be rerun every time the code changes.


In [245]:
# add path for custom modules
mypath=os.path.abspath('/home/chiroptera/workspace/thesis/quantum k-means/implementation')
if not mypath in sys.path:
    sys.path.insert(1, mypath)
del mypath

import oracle
import qubitLib
import DaviesBouldin

reload(DaviesBouldin)
reload(oracle)
reload(qubitLib)


Out[245]:
<module 'qubitLib' from 'qubitLib.pyc'>

Generate data

To test the algorithm, mixtures of Gaussian distributions (six components each) are generated.


In [20]:
class gaussMix:
    def __init__(self,dim=None,**kwargs):
        # accept the dimension either positionally, as dim=..., or a previously
        # created package as pack=...
        if dim is None and not kwargs:
            raise Exception('Some arguments must be supplied: dim=[dimension] '
                            'or pack=[dictionary with a previously created package]')
        if dim is not None or 'dim' in kwargs:
            self.dim=dim if dim is not None else kwargs['dim']
            self.idNum=0
            self.gaussList=list()
            self.cardinality=0
        elif 'pack' in kwargs:
            self.unpack(kwargs['pack'])
        else:
            raise Exception('dim=[dimension] or pack=[dictionary with a previously created package]')
        
    def unpack(self,pack):
        self.dim=pack['dim']
        self.gaussList=pack['all']
        self.idNum=len(self.gaussList)
        self.cardinality=pack['cardinality']
        
    def add(self,mean,variance,numPoints):
        if len(mean)!=self.dim:
            raise Exception("wrong dimension")
        
        self.cardinality += numPoints
        
        g={'id':self.idNum,'mean':mean,'var':variance,
           'card':numPoints,'dist':np.random.normal(mean,variance,(numPoints,self.dim))}
        """
        if self.mixture== None:
            self.mixture=g['dist']
        else:
            self.mixture=np.concatenate((self.mixture,g['dist']))
        """
        self.gaussList.append(g)
        self.updateWeights()
        
        self.idNum+=1
        
    def getMixture(self):
        returnMix=[]
        
        for i,g in enumerate(self.gaussList):
            returnMix.append(g['dist'])
            
        returnMix=np.concatenate(tuple(returnMix))
        return returnMix
        
    def updateWeights(self):
        for g in self.gaussList:
            g['weight']=float(g['card'])/self.cardinality

    def sample(self,numPoints):
        returnMix=[]

        if numPoints > self.cardinality:
            return self.getMixture()
       
        for i,g in enumerate(self.gaussList):
            num=int(numPoints*g['weight'])
            returnMix.append(g['dist'][0:num,:])
                
        returnMix=np.concatenate(tuple(returnMix))
        return returnMix
    
    def getWeights(self):
        w=[0]*len(self.gaussList)
        for i,g in enumerate(self.gaussList):
            w[i]=g['weight']
                   
        return w
    
    def package(self):
        pack=dict()

        pack['all']=self.gaussList
        pack['data']=self.getMixture()
        pack['dim']=self.dim
        pack['cardinality']=self.cardinality
        
        return pack
    
    def gaussAsList(self):
        m=list()
        for i,g in enumerate(self.gaussList):
            m.append(g['dist'])
        return m
    
    """
    # better function without redundacy
    def package(self):
        pack=dict()
        
        pack['all']=[None]*len(self.gaussList)
            pack['all']
        for i,g in enumerate(self.gaussList):
            
        pack['all']=self.gaussList
        pack['data']=self.getMixture()
        pack['dim']=self.dim
        pack['cardinality']=self.cardinality
        
        return pack
        
        """

Generating datasets

bi36

Bidimensional, cardinality with order of magnitude 3, 6 gaussians.


In [24]:
name='bi36'
m=gaussMix(dim=2)
m.add((-5,3),[0.25,2],200)
m.add((2,-2),(0.25,0.25),400)
m.add((2,10),[0.75,0.25],300)
m.add((-3,2),[0.3,0.3],300)
m.add((3,7),[0.3,0.3],300)
m.add((-2,7),[0.3,1],300)

for i,g in enumerate(m.gaussAsList()):
    plt.title(name)
    plt.plot(g[:,0],g[:,1],'.')
    
plt.savefig("figure.png")



In [253]:
output = open(name+'.pkl', 'wb')

pickle.dump(m.package(), output)
output.close()

!ls *pkl


bi36.pkl  tri36.pkl

bi46

Bidimensional, 6 gaussians.


In [14]:
name='bi46'
m=gaussMix(2)
m.add((-5,3),[0.25,2],200)
m.add((2,-2),(0.25,0.25),400)
m.add((2,10),[0.75,0.25],300)
m.add((-3,2),[0.3,0.3],300)
m.add((3,7),[0.3,0.3],300)
m.add((-2,7),[0.3,1],300)
s=m.getMixture()
plt.title(name)
plt.plot(s[:,0],s[:,1],'.')


Out[14]:
[<matplotlib.lines.Line2D at 0x7f68a6905fd0>]

In [5]:
cd workspace/QCThesis/QuantumGeneticKMeans/datasets/


/home/chiroptera/workspace/QCThesis/QuantumGeneticKMeans/datasets

In [6]:
output = open(name+'.pkl', 'wb')

pickle.dump(m.package(), output)
output.close()

!ls *pkl


bi36_k.pkl	 bi36_qk.pkl  sex36_k.pkl	sex36_qk.pkl  tri56.pkl
bi36_params.pkl  bi46.pkl     sex36_params.pkl	sex56.pkl
bi36.pkl	 bi56.pkl     sex36.pkl		tri36.pkl

tri36

Tridimensional, cardinality with order of magnitude 3, 6 gaussians.


In [271]:
name='tri36'
m=gaussMix(3)
m.add((-5,3,3),[0.25,2,1],200)
m.add((2,-2,4),(0.25,0.25,0.25),400)
m.add((2,10,5),[0.75,0.25,0.5],300)
m.add((-3,2,3),[0.3,0.3,0.3],300)
m.add((3,7,4),[0.3,0.3,0.3],300)
m.add((-2,7,5),[0.3,1,0.6],300)

fig = plt.figure()
ax = fig.gca(projection='3d')

# plot points in 3D
for i,g in enumerate(m.gaussList):
    s=g['dist']
    ax.plot(s[:,0],s[:,1],s[:,2],'.')
    
plt.title(name)
#ax.view_init(elev=10., azim=45)


Out[271]:
<matplotlib.text.Text at 0x7f10740510d0>

In [272]:
output = open(name+'.pkl', 'wb')

pickle.dump(m.package(), output)
output.close()

!du -sh *pkl


164K	bi36.pkl
16M	bi56.pkl
240K	tri36.pkl
24M	tri56.pkl

tri56

Tridimensional, cardinality with order of magnitude 5, 6 gaussians.


In [269]:
name='tri56'
m=gaussMix(3)
m.add((-5,3,3),[0.25,2,1],20000)
m.add((2,-2,4),(0.25,0.25,0.25),40000)
m.add((2,10,5),[0.75,0.25,0.5],30000)
m.add((-3,2,3),[0.3,0.3,0.3],30000)
m.add((3,7,4),[0.3,0.3,0.3],30000)
m.add((-2,7,5),[0.3,1,0.6],30000)

fig = plt.figure()
ax = fig.gca(projection='3d')

# plot points in 3D
for i,g in enumerate(m.gaussList):
    s=g['dist']
    ax.plot(s[:,0],s[:,1],s[:,2],'.')
    
plt.title(name)
#ax.view_init(elev=10., azim=45)


Out[269]:
<matplotlib.text.Text at 0x7f1074140a90>

In [270]:
output = open(name+'.pkl', 'wb')

pickle.dump(m.package(), output)
output.close()

!du -sh *pkl


164K	bi36.pkl
16M	bi56.pkl
24M	tri36.pkl
24M	tri56.pkl

sex46

Six-dimensional, cardinality with order of magnitude 4, 6 gaussians.


In [7]:
name='sex46'
m=gaussMix(6)
m.add((-5,3,3,3,3,3),[0.25,2,1,0.25,0.25,1],2000)
m.add((2,-2,4,4,4,4),(0.25,0.25,0.25,0.25,0.25,0.25),4000)
m.add((2,10,5,5,5,5),[0.75,0.25,0.5,0.75,0.25,0.5],3000)
m.add((-3,2,3,3,3,3),[0.3,0.3,0.3,0.3,0.3,0.3],3000)
m.add((3,7,4,4,4,4),[0.3,0.3,0.3,0.3,0.3,0.3],3000)
m.add((-2,7,5,5,5,5),[0.3,1,0.6,0.3,1,0.6],3000)
"""
fig = plt.figure()
ax = fig.gca(projection='3d')

# plot points in 3D
for i,g in enumerate(m.gaussList):
    s=g['dist']
    ax.plot(s[:,0],s[:,1],s[:,2],'.')
    
plt.title(name)"""
#ax.view_init(elev=10., azim=45)


Out[7]:
"\nfig = plt.figure()\nax = fig.gca(projection='3d')\n\n# plot points in 3D\nfor i,g in enumerate(m.gaussList):\n    s=g['dist']\n    ax.plot(s[:,0],s[:,1],s[:,2],'.')\n    \nplt.title(name)"

In [8]:
output = open(name+'.pkl', 'wb')

pickle.dump(m.package(), output)
output.close()

!du -sh *pkl


164K	bi36.pkl
1,6M	bi36_k.pkl
4,0K	bi36_params.pkl
1,8M	bi36_qk.pkl
1,6M	bi46.pkl
16M	bi56.pkl
472K	sex36.pkl
4,7M	sex36_k.pkl
4,0K	sex36_params.pkl
4,8M	sex36_qk.pkl
4,6M	sex46.pkl
46M	sex56.pkl
240K	tri36.pkl
24M	tri56.pkl

sex56

Six-dimensional, cardinality with order of magnitude 5, 6 gaussians.


In [276]:
name='sex56'
m=gaussMix(6)
m.add((-5,3,3,3,3,3),[0.25,2,1,0.25,0.25,1],20000)
m.add((2,-2,4,4,4,4),(0.25,0.25,0.25,0.25,0.25,0.25),40000)
m.add((2,10,5,5,5,5),[0.75,0.25,0.5,0.75,0.25,0.5],30000)
m.add((-3,2,3,3,3,3),[0.3,0.3,0.3,0.3,0.3,0.3],30000)
m.add((3,7,4,4,4,4),[0.3,0.3,0.3,0.3,0.3,0.3],30000)
m.add((-2,7,5,5,5,5),[0.3,1,0.6,0.3,1,0.6],30000)
"""
fig = plt.figure()
ax = fig.gca(projection='3d')

# plot points in 3D
for i,g in enumerate(m.gaussList):
    s=g['dist']
    ax.plot(s[:,0],s[:,1],s[:,2],'.')
    
plt.title(name)"""
#ax.view_init(elev=10., azim=45)


Out[276]:
"\nfig = plt.figure()\nax = fig.gca(projection='3d')\n\n# plot points in 3D\nfor i,g in enumerate(m.gaussList):\n    s=g['dist']\n    ax.plot(s[:,0],s[:,1],s[:,2],'.')\n    \nplt.title(name)"

In [277]:
output = open(name+'.pkl', 'wb')

pickle.dump(m.package(), output)
output.close()

!du -sh *pkl


164K	bi36.pkl
16M	bi56.pkl
472K	sex36.pkl
46M	sex56.pkl
240K	tri36.pkl
24M	tri56.pkl

In [22]:
200+400+300+300+300+300


Out[22]:
1800

sex310

Ten-dimensional, cardinality with order of magnitude 3, 6 gaussians.


In [12]:
name='sex310'
m=gaussMix(10)
m.add((-5,3,3,3,3,3,3,3,3,3),0.6,200)
m.add((2,-2,4,4,4,4,4,4,4,4),0.25,400)
m.add((2,10,5,5,5,5,5,5,5,5),0.4,300)
m.add((-3,2,3,3,3,3,3,3,3,3),0.3,300)
m.add((3,7,4,4,4,4,4,4,4,4),0.3,300)
m.add((-2,7,5,5,5,5,5,5,5,5),0.6,300)
"""
fig = plt.figure()
ax = fig.gca(projection='3d')

# plot points in 3D
for i,g in enumerate(m.gaussList):
    s=g['dist']
    ax.plot(s[:,0],s[:,1],s[:,2],'.')
    
plt.title(name)"""
#ax.view_init(elev=10., azim=45)


Out[12]:
"\nfig = plt.figure()\nax = fig.gca(projection='3d')\n\n# plot points in 3D\nfor i,g in enumerate(m.gaussList):\n    s=g['dist']\n    ax.plot(s[:,0],s[:,1],s[:,2],'.')\n    \nplt.title(name)"

In [13]:
output = open(name+'.pkl', 'wb')

pickle.dump(m.package(), output)
output.close()

!du -sh *pkl


164K	bi36.pkl
1,6M	bi36_k.pkl
4,0K	bi36_params.pkl
1,8M	bi36_qk.pkl
1,6M	bi46.pkl
16M	bi56.pkl
784K	sex310.pkl
472K	sex36.pkl
4,7M	sex36_k.pkl
4,0K	sex36_params.pkl
4,8M	sex36_qk.pkl
7,7M	sex410.pkl
4,6M	sex46.pkl
46M	sex56.pkl
240K	tri36.pkl
24M	tri56.pkl

End of dataset generation


In [100]:
# generate gaussians
numPoints=1000
numGaussians=6
pointsPerGaussian=int(numPoints/numGaussians)
gMix=list()
gMix.append(np.random.normal((-5,3),[0.25,2],(pointsPerGaussian,2)))
gMix.append(np.random.normal((2,-2),[0.25,0.25],(pointsPerGaussian,2)))
gMix.append(np.random.normal((2,10),[0.75,0.25],(pointsPerGaussian,2)))
gMix.append(np.random.normal((-3,2),[0.3,0.3],(pointsPerGaussian,2)))
gMix.append(np.random.normal((3,7),[0.3,0.3],(pointsPerGaussian,2)))
gMix.append(np.random.normal((-2,7),[0.3,1],(pointsPerGaussian,2)))

mixture=np.concatenate(tuple(gMix))

# add outliers

# plot data points
for i in range(0,len(gMix)):
    plt.plot(gMix[i][:,0],gMix[i][:,1],'.')



In [77]:
output = open('data.pkl', 'wb')

pickle.dump(mixture, output)
output.close()



Quantum Algorithm


In [7]:
## Initialization  step
numClusters=15
numOracles=5
qubitStringLen=5
qGenerations=100

In [241]:
%load /home/chiroptera/workspace/QCThesis/QuantumGeneticKMeans/QK-Means.py

In [8]:
# needs:
#  - mixture
#  - numOracles
#  - numClusters
#  - qubitStringLen
#  - qGenerations
# returns:
#  - qk_centroids (centroids of oracles from last generation)
#  - qk_assignment (assignment of oracles from last generation)
#  - fitnessEvolution (DB score of every oracle per generation; last column holds the best oracle's index)
#  - qk_timings_cg (elapsed time per generation)
def qk_means(mixture,numOracles,numClusters,qubitStringLen,qGenerations):
    fitnessEvolution = np.zeros((qGenerations,numOracles+1))
    
    qk_timings_cg = list()
    start = datetime.now()
    
    best = 0 #index of best oracle (starts at 0)
    
    oras = list()
    qk_centroids = [0]*numOracles
    qk_estimator = [0]*numOracles
    qk_assignment = [0]*numOracles
    
    for i in range(0,numOracles):
        oras.append(oracle.Oracle())
        oras[i].initialization(numClusters*2,qubitStringLen)
        oras[i].collapse()
    
    qk_timings_cg.append((datetime.now() - start).total_seconds())
    start = datetime.now()
    
    for qGen_ in range(0,qGenerations):
        ## Clustering step
        for i,ora in enumerate(oras):
            if qGen_ != 0 and i == best: # current best shouldn't be modified
                continue 
            qk_centroids[i] = np.vstack(np.hsplit(ora.getIntArrays(),numClusters))
            qk_estimator[i] = KMeans(n_clusters=numClusters,init=qk_centroids[i],n_init=1)
            qk_assignment[i] = qk_estimator[i].fit_predict(mixture)
            qk_centroids[i] = qk_estimator[i].cluster_centers_
            ora.setIntArrays(np.concatenate(qk_centroids[i]))
        
            ## Compute fitness
            score = DaviesBouldin.DaviesBouldin(mixture,qk_centroids[i],qk_assignment[i])
            ora.score = score.eval()
        
        ## Store best from this generation
        for i in range(1,numOracles):
            if oras[i].score < oras[best].score:
                best = i
                
        ## Quantum Rotation Gate 
        for i in range(0,numOracles):
            if i == best:
                continue
            
            oras[i].QuantumGateStep(oras[best])
            
            ## Collapse qubits
            oras[i].collapse()
            
        qk_timings_cg.append((datetime.now() - start).total_seconds())
    
        for i in range(0,numOracles):
            fitnessEvolution[qGen_,i]=oras[i].score
            fitnessEvolution[qGen_,-1]=best
        print '.', # simple "progress bar"
        start = datetime.now()
        
    print 'Done!'
    
    return qk_centroids,qk_assignment,fitnessEvolution,qk_timings_cg

In [9]:
numRounds=5
qk_results=list()
for i in range(numRounds):
    print 'round ',i
    qk_centroids,qk_assignment,fitnessEvolution,qk_timings_cg=qk_means(mixture,numOracles,numClusters,qubitStringLen,qGenerations)
    qk_results.append([qk_centroids,qk_assignment,fitnessEvolution,qk_timings_cg])


round  0
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Done!
round  1
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Done!
round  2
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Done!
round  3
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Done!
round  4
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Done!

Organize data in dictionary.


In [10]:
qk_rounds=dict()
qk_rounds['centroids']=list()
qk_rounds['assignment']=list()
qk_rounds['fitness']=list()
qk_rounds['times']=list()

for i in range(numRounds):
    qk_rounds['centroids'].append(qk_results[i][0])
    qk_rounds['assignment'].append(qk_results[i][1])
    qk_rounds['fitness'].append(qk_results[i][2])
    qk_rounds['times'].append(qk_results[i][3])
    
print 'Done!'


Done!

In [71]:
len(qk_rounds['centroids'][0]),type(qk_rounds['centroids'][0])


Out[71]:
(5, list)

Assign points to clusters.


In [12]:
qk_rounds['assignedData']=list() #assigned data for the best solution in each round

for r in range(numRounds):
    # pick the best oracle of the last generation and group the data by its assignment
    best=int(qk_rounds['fitness'][r][-1,-1])
    qk_assignment=qk_rounds['assignment'][r]
    
    qk_assignedData = [None]*numClusters
    for i,j in enumerate(qk_assignment[best]):
        if qk_assignedData[j] is not None:
            qk_assignedData[j] = np.vstack((qk_assignedData[j],mixture[i]))
        else:
            qk_assignedData[j] = mixture[i]
    
    qk_rounds['assignedData'].append(qk_assignedData)
    
print 'Done!'


Done!

In [67]:
qk_rounds['assignedData'][0]


Out[67]:
[array([[-2.81844067,  2.04388705],
       [-3.47245889,  2.03520442],
       [-2.55025048,  2.22325261],
       [-3.13337199,  1.31342035],
       [-3.21986356,  2.108664  ],
       [-3.17934404,  1.9791999 ],
       [-3.58053078,  1.99544435],
       [-3.56222564,  2.39310111],
       [-2.96841403,  1.94929658],
       [-2.85475116,  1.97013862],
       [-2.51571827,  1.65281783],
       [-2.31088815,  1.59751511],
       [-2.56787069,  2.005329  ],
       [-2.35943764,  2.16084927],
       [-2.91066039,  2.07297591],
       [-2.70256397,  1.92951459],
       [-3.14055209,  1.76980544],
       [-3.22671813,  1.90827485],
       [-3.30116133,  2.27413584],
       [-3.14495889,  1.79469924],
       [-2.53883426,  1.97937212],
       [-2.67509913,  2.31515326],
       [-3.2159128 ,  2.02939127],
       [-2.54010794,  2.09783929],
       [-3.58712181,  1.79704942],
       [-2.73371426,  2.11254948],
       [-2.94369471,  2.12891159],
       [-2.9666015 ,  1.93497853],
       [-2.85813906,  2.17027856],
       [-2.72886013,  2.38663958],
       [-2.71172757,  2.2618372 ],
       [-2.84180167,  2.07595954],
       [-2.90200636,  1.84093061]]),
 array([[ 1.80599869, -1.97854574],
       [ 2.26977418, -1.9981279 ],
       [ 1.83938192, -2.3668606 ],
       [ 1.91903827, -2.45916003],
       [ 2.1139499 , -2.12803706],
       [ 1.77019371, -2.1532512 ],
       [ 1.64384335, -1.63781922],
       [ 1.86913919, -2.00822076],
       [ 2.52624114, -1.81440902],
       [ 2.14613762, -2.15451971],
       [ 2.03076633, -1.83792586],
       [ 1.74205124, -2.40973226],
       [ 2.28912571, -1.96019455],
       [ 2.07079299, -2.23545124],
       [ 2.16062544, -2.59320126],
       [ 2.5371791 , -2.18016623],
       [ 2.12455777, -2.2719215 ],
       [ 2.06378593, -2.13736807],
       [ 1.60914741, -1.81927943],
       [ 2.10982045, -1.94449056],
       [ 1.93247763, -1.47793567],
       [ 1.73366444, -2.15059708],
       [ 1.80305615, -1.67695941],
       [ 2.2378356 , -2.01001569],
       [ 1.9056484 , -2.36253329],
       [ 1.96001684, -2.11702957],
       [ 2.01187404, -1.96750466],
       [ 1.77807163, -2.16814919],
       [ 1.90571608, -2.19283456],
       [ 2.25183411, -1.59905975],
       [ 1.6638006 , -1.85424192],
       [ 2.22738762, -1.98533213],
       [ 2.06436143, -1.77647238]]),
 array([-4.98626636, -1.89088356]),
 array([-4.786391  , -1.28143804]),
 array([-5.35266205,  0.77689589]),
 array([-5.07099833,  0.49279954]),
 array([-4.79500649,  0.178925  ]),
 array([[-2.03939773,  7.14555792],
       [-2.04516803,  7.41250095],
       [-2.00989028,  6.99245851],
       [-2.15000764,  7.27013382],
       [-1.60992514,  4.98810434],
       [-2.13517934,  6.09479421],
       [-1.7133824 ,  5.90660021],
       [-1.83420013,  8.41380829],
       [-1.95836298,  7.02755084],
       [-1.51806331,  7.32512589],
       [-2.22020752,  7.70891907],
       [-2.0464847 ,  7.92708097],
       [-1.86709163,  7.07750016],
       [-1.76884218,  6.45329896],
       [-1.83866966,  7.94374372],
       [-2.08176506,  6.39050685],
       [-1.84344352,  6.26432028],
       [-2.33777777,  7.83362974],
       [-1.65079476,  6.21876307],
       [-1.79689792,  8.55378705],
       [-1.97250394,  7.60289793],
       [-1.62921441,  5.7073938 ],
       [-2.28827001,  4.72750525],
       [-1.51781975,  5.67309127],
       [-1.61151318,  6.54558852],
       [-1.73429593,  6.29415346],
       [-2.27421551,  7.19660075],
       [-2.24084089,  7.6045606 ],
       [-2.29499495,  7.00661758],
       [-1.94657533,  7.09857624],
       [-1.96397555,  8.66004631],
       [-2.33536095,  8.66065374],
       [-2.5636781 ,  7.15256652]]),
 array([-5.10190488,  0.94444428]),
 array([[  1.61173467,   9.95000258],
       [  1.80272473,  10.17155543],
       [  2.75922404,   9.82252372],
       [  2.3139734 ,   9.74584861],
       [  1.59741116,   9.58322703],
       [  1.5942076 ,  10.47434604],
       [  2.18467774,   9.81124373],
       [  2.52380824,  10.02977778],
       [  2.17961369,   9.81859972],
       [  2.08271347,   9.88393859],
       [  2.39295858,  10.21920769],
       [  2.40887823,   9.57553507],
       [  1.12672866,  10.44684941],
       [  1.84158272,  10.33341832],
       [  2.64883964,  10.07750906],
       [  2.28470621,  10.13549784],
       [  0.01627554,  10.2481508 ],
       [  3.12158256,   9.97084907],
       [  1.21922788,   9.97190406],
       [  3.07103531,  10.4704329 ],
       [  3.04343441,  10.1000064 ],
       [  2.98294705,   9.99884389],
       [  0.4947887 ,  10.08214329],
       [  1.01783161,   9.87552869],
       [  3.20400656,  10.47465276],
       [  0.68885269,  10.05567721],
       [  3.45276484,  10.22594419],
       [  0.76107326,   9.83723146],
       [  2.97696155,   9.95116191],
       [  3.15047094,   9.65119827],
       [  0.60688   ,   9.82376979],
       [  1.20836697,  10.00736694],
       [  1.13194967,   9.75365396]]),
 array([[-4.83570808,  2.29719545],
       [-4.90612646,  2.12002173],
       [-4.72860391,  2.04206386],
       [-5.27840819,  1.82161285],
       [-5.08401073,  2.0419549 ],
       [-5.07549237,  2.41112785],
       [-5.34833973,  2.3797585 ],
       [-4.87101441,  2.05138828],
       [-4.77950279,  2.39399703],
       [-5.28731442,  2.25034938]]),
 array([-4.8583785 ,  1.14855069]),
 array([[ 3.09741317,  6.99660849],
       [ 2.9788277 ,  6.68831087],
       [ 2.74910068,  6.58855149],
       [ 3.02702051,  6.89963032],
       [ 3.55020319,  7.15701474],
       [ 2.59218553,  6.96901588],
       [ 2.71455305,  7.01311029],
       [ 3.15809783,  6.3467808 ],
       [ 2.84089602,  6.89733235],
       [ 2.95231798,  6.74725147],
       [ 3.10971471,  7.15688205],
       [ 3.53845282,  7.50570034],
       [ 3.35535574,  6.94440068],
       [ 2.71463423,  7.23509205],
       [ 2.72580882,  7.18817978],
       [ 3.52611653,  7.05729893],
       [ 3.11292329,  6.90333824],
       [ 3.0245853 ,  7.33470118],
       [ 3.23177301,  7.11685977],
       [ 3.07749271,  6.8955104 ],
       [ 3.15926284,  7.40245338],
       [ 3.65853419,  6.95601606],
       [ 2.92550594,  7.15599989],
       [ 3.11705076,  7.06556082],
       [ 2.74783644,  6.95333989],
       [ 2.7034372 ,  7.23173858],
       [ 3.00732531,  6.7380642 ],
       [ 3.29325151,  7.41437819],
       [ 3.49797518,  6.909729  ],
       [ 3.09591001,  6.72645969],
       [ 3.00896909,  6.41479346],
       [ 3.3682527 ,  6.59713443],
       [ 3.4032807 ,  7.28270365]]),
 array([[-4.64433996,  5.27142362],
       [-5.34634873,  4.89349189],
       [-5.18540189,  6.38705495],
       [-5.31460217,  5.32238062],
       [-5.30185408,  7.15571745],
       [-4.54760461,  6.00854048],
       [-4.96428059,  4.85107797]]),
 array([[-4.8842357 ,  3.62849661],
       [-4.59304761,  2.93016854],
       [-4.99435211,  3.60462994],
       [-5.019574  ,  3.85115844],
       [-4.82434246,  2.87429971],
       [-5.06589621,  3.9640957 ],
       [-4.80680415,  3.3164272 ],
       [-4.94313435,  3.83922991],
       [-4.77991258,  3.0603488 ]])]

Computational time

Compute the total time for each round, and the mean, variance, best and worst over all rounds.


In [13]:
qk_rounds['total time'] = list()

for i in range(numRounds):
    qk_times=qk_rounds['times'][i]
    qk_total = np.sum(np.array(qk_times))
    qk_rounds['total time'].append(qk_total)

qk_time_mean = np.mean(np.array(qk_rounds['total time']))
qk_time_var = np.var(np.array(qk_rounds['total time']))
qk_time_best = np.min(np.array(qk_rounds['total time']))
qk_time_worst = np.max(np.array(qk_rounds['total time']))

print 'Total execution times'
print 'mean\t\tvariance\tbest\t\tworst'
print qk_time_mean,'\t',qk_time_var,'\t',qk_time_best,'\t',qk_time_worst


Total execution times
mean		variance	best		worst
9.692219 	0.000982871018 	9.65627 	9.731704

Population DB index mean and variance, best solution evolution

Compute the mean and variance of the population's DB index for each generation, and the evolution of the DB index of the best solution across generations.


In [14]:
qk_rounds['best evolution'] = list()
qk_rounds['pop variance'] = list()
qk_rounds['pop mean'] = list()

for r in range(numRounds):
    fitnessEvolution=qk_rounds['fitness'][r]

    # DB score of the best oracle in each generation
    bestSeq=[]
    for i in range(0,fitnessEvolution.shape[0]):
        bestSeq.append(fitnessEvolution[i,int(fitnessEvolution[i,-1])])
    bestSeq=np.array(bestSeq)
    
    # mean and variance of the population's DB scores in each generation
    genVar=np.zeros(qGenerations)
    genMean=np.zeros(qGenerations)
    for i,ar in enumerate(fitnessEvolution):
        genVar[i]=np.var(ar[:-1])
        genMean[i]=np.mean(ar[:-1])
        
    qk_rounds['best evolution'].append(bestSeq)
    qk_rounds['pop variance'].append(genVar)
    qk_rounds['pop mean'].append(genMean)

Plot best solution DB index evolution


In [15]:
for i in range(numRounds):
    plt.plot(range(qGenerations),qk_rounds['best evolution'][i],label="round "+str(i))
    
plt.legend(loc='best', framealpha=0.3,prop={'size':'small'})


Out[15]:
<matplotlib.legend.Legend at 0x7f107603fad0>

Plot population mean per generation


In [14]:
for i in range(numRounds):
    plt.plot(range(qGenerations),qk_rounds['pop mean'][i],label="round "+str(i))
    
plt.legend(loc='best', framealpha=0.3,prop={'size':'small'})


Out[14]:
<matplotlib.legend.Legend at 0x7f54ea564610>

Plot population variance per generation


In [15]:
for i in range(numRounds):
    plt.plot(range(qGenerations),qk_rounds['pop variance'][i],label="round "+str(i))
    
plt.legend(loc='best', framealpha=0.3,prop={'size':'small'})


Out[15]:
<matplotlib.legend.Legend at 0x7f54ea21d5d0>

Save data in files

Save QK centroids. Each row has the format:

[round index, oracle index, centroid coordinates flattened into one line]


In [16]:
qk_saveCentroids=list()
for i in range(numRounds):
    for j in range(numOracles):
        line=np.concatenate(qk_rounds['centroids'][i][j]).tolist()
        line.insert(0,j)
        line.insert(0,i)
        qk_saveCentroids.append(line)

qk_saveCentroids=np.array(qk_saveCentroids)
qk_saveCentroids.shape


Out[16]:
(25, 32)

K-Means

To make the computational budget of classical K-Means comparable to that of QK-Means, we set the number of initializations to roughly $qGenerations \times numOracles$ (the code below adds a 15% margin through initsPercentage). This is chosen because classical K-Means is executed once for each oracle in each generation.


In [16]:
def k_means(mixture,numClusters,numInits):

	k_timings_cg=list()
	start=datetime.now()

	k_assignment=list()
	k_centroids=list()

	for i in range(numInits):
		estimator = KMeans(n_clusters=numClusters,init='k-means++',n_init=1)
		assignment = estimator.fit_predict(mixture)
		centroids = estimator.cluster_centers_

		k_centroids.append(centroids)
		k_assignment.append(assignment)

		k_timings_cg.append((datetime.now() - start).total_seconds())
		start=datetime.now()

	return k_centroids,k_assignment,k_timings_cg

In [17]:
initsPercentage=1.15
numInits=np.int(qGenerations*numOracles*initsPercentage)

k_results=list()

for i in range(numRounds):
    print 'round ',i
    k_centroids,k_assignment,k_timings=k_means(mixture,numClusters,numInits)
    k_results.append([k_centroids,k_assignment,k_timings])
print 'done!'


round  0
round  1
round  2
round  3
round  4
done!

In [18]:
k_rounds=dict()
k_rounds['centroids']=list()
k_rounds['assignment']=list()
k_rounds['times']=list()

for i in range(numRounds):
    k_rounds['centroids'].append(k_results[i][0])
    k_rounds['assignment'].append(k_results[i][1])
    k_rounds['times'].append(k_results[i][2])
    
print 'Done!'


Done!

In [62]:
k_rounds['assignedData']=list() #assigned data for the best solution in each round

for r in range(numRounds):
    # assign clusters to data (indexing by the best oracle of the QK run in this round)
    best=int(qk_rounds['fitness'][r][-1,-1])
    k_assignment=k_rounds['assignment'][r]
    
    k_assignedData = [None]*numClusters
    for i,j in enumerate(k_assignment[best]):
        if k_assignedData[j] is not None:
            k_assignedData[j] = np.vstack((k_assignedData[j],mixture[i]))
        else:
            k_assignedData[j] = mixture[i]
    
    k_rounds['assignedData'].append(k_assignedData)
    
print 'Done!'


Done!

In [63]:
k_rounds['assignedData'][0]


Out[63]:
[array([[ 1.80599869, -1.97854574],
       [ 2.26977418, -1.9981279 ],
       [ 1.83938192, -2.3668606 ],
       [ 1.91903827, -2.45916003],
       [ 2.1139499 , -2.12803706],
       [ 1.77019371, -2.1532512 ],
       [ 1.64384335, -1.63781922],
       [ 1.86913919, -2.00822076],
       [ 2.52624114, -1.81440902],
       [ 2.14613762, -2.15451971],
       [ 2.03076633, -1.83792586],
       [ 1.74205124, -2.40973226],
       [ 2.28912571, -1.96019455],
       [ 2.07079299, -2.23545124],
       [ 2.16062544, -2.59320126],
       [ 2.5371791 , -2.18016623],
       [ 2.12455777, -2.2719215 ],
       [ 2.06378593, -2.13736807],
       [ 1.60914741, -1.81927943],
       [ 2.10982045, -1.94449056],
       [ 1.93247763, -1.47793567],
       [ 1.73366444, -2.15059708],
       [ 1.80305615, -1.67695941],
       [ 2.2378356 , -2.01001569],
       [ 1.9056484 , -2.36253329],
       [ 1.96001684, -2.11702957],
       [ 2.01187404, -1.96750466],
       [ 1.77807163, -2.16814919],
       [ 1.90571608, -2.19283456],
       [ 2.25183411, -1.59905975],
       [ 1.6638006 , -1.85424192],
       [ 2.22738762, -1.98533213],
       [ 2.06436143, -1.77647238]]),
 array([[-2.13517934,  6.09479421],
       [-1.7133824 ,  5.90660021],
       [-1.76884218,  6.45329896],
       [-2.08176506,  6.39050685],
       [-1.84344352,  6.26432028],
       [-1.65079476,  6.21876307],
       [-1.61151318,  6.54558852],
       [-1.73429593,  6.29415346]]),
 array([[ 3.09741317,  6.99660849],
       [ 2.9788277 ,  6.68831087],
       [ 2.74910068,  6.58855149],
       [ 3.02702051,  6.89963032],
       [ 3.55020319,  7.15701474],
       [ 2.59218553,  6.96901588],
       [ 2.71455305,  7.01311029],
       [ 3.15809783,  6.3467808 ],
       [ 2.84089602,  6.89733235],
       [ 2.95231798,  6.74725147],
       [ 3.10971471,  7.15688205],
       [ 3.53845282,  7.50570034],
       [ 3.35535574,  6.94440068],
       [ 2.71463423,  7.23509205],
       [ 2.72580882,  7.18817978],
       [ 3.52611653,  7.05729893],
       [ 3.11292329,  6.90333824],
       [ 3.0245853 ,  7.33470118],
       [ 3.23177301,  7.11685977],
       [ 3.07749271,  6.8955104 ],
       [ 3.15926284,  7.40245338],
       [ 3.65853419,  6.95601606],
       [ 2.92550594,  7.15599989],
       [ 3.11705076,  7.06556082],
       [ 2.74783644,  6.95333989],
       [ 2.7034372 ,  7.23173858],
       [ 3.00732531,  6.7380642 ],
       [ 3.29325151,  7.41437819],
       [ 3.49797518,  6.909729  ],
       [ 3.09591001,  6.72645969],
       [ 3.00896909,  6.41479346],
       [ 3.3682527 ,  6.59713443],
       [ 3.4032807 ,  7.28270365]]),
 array([[-3.47245889,  2.03520442],
       [-3.13337199,  1.31342035],
       [-3.21986356,  2.108664  ],
       [-3.17934404,  1.9791999 ],
       [-3.58053078,  1.99544435],
       [-3.56222564,  2.39310111],
       [-3.14055209,  1.76980544],
       [-3.22671813,  1.90827485],
       [-3.30116133,  2.27413584],
       [-3.14495889,  1.79469924],
       [-3.2159128 ,  2.02939127],
       [-3.58712181,  1.79704942]]),
 array([[  2.75922404,   9.82252372],
       [  2.3139734 ,   9.74584861],
       [  2.18467774,   9.81124373],
       [  2.52380824,  10.02977778],
       [  2.17961369,   9.81859972],
       [  2.08271347,   9.88393859],
       [  2.39295858,  10.21920769],
       [  2.40887823,   9.57553507],
       [  2.64883964,  10.07750906],
       [  2.28470621,  10.13549784],
       [  3.12158256,   9.97084907],
       [  3.07103531,  10.4704329 ],
       [  3.04343441,  10.1000064 ],
       [  2.98294705,   9.99884389],
       [  3.20400656,  10.47465276],
       [  3.45276484,  10.22594419],
       [  2.97696155,   9.95116191],
       [  3.15047094,   9.65119827]]),
 array([[-4.64433996,  5.27142362],
       [-5.34634873,  4.89349189],
       [-5.18540189,  6.38705495],
       [-5.31460217,  5.32238062],
       [-5.30185408,  7.15571745],
       [-4.54760461,  6.00854048],
       [-4.96428059,  4.85107797]]),
 array([[-4.83570808,  2.29719545],
       [-4.90612646,  2.12002173],
       [-4.72860391,  2.04206386],
       [-5.27840819,  1.82161285],
       [-5.08401073,  2.0419549 ],
       [-5.07549237,  2.41112785],
       [-5.34833973,  2.3797585 ],
       [-4.87101441,  2.05138828],
       [-4.77950279,  2.39399703],
       [-5.28731442,  2.25034938]]),
 array([[-4.98626636, -1.89088356],
       [-4.786391  , -1.28143804]]),
 array([[-4.8842357 ,  3.62849661],
       [-4.59304761,  2.93016854],
       [-4.99435211,  3.60462994],
       [-5.019574  ,  3.85115844],
       [-4.82434246,  2.87429971],
       [-5.06589621,  3.9640957 ],
       [-4.80680415,  3.3164272 ],
       [-4.94313435,  3.83922991],
       [-4.77991258,  3.0603488 ]]),
 array([[  1.61173467,   9.95000258],
       [  1.80272473,  10.17155543],
       [  1.59741116,   9.58322703],
       [  1.5942076 ,  10.47434604],
       [  1.12672866,  10.44684941],
       [  1.84158272,  10.33341832],
       [  0.01627554,  10.2481508 ],
       [  1.21922788,   9.97190406],
       [  0.4947887 ,  10.08214329],
       [  1.01783161,   9.87552869],
       [  0.68885269,  10.05567721],
       [  0.76107326,   9.83723146],
       [  0.60688   ,   9.82376979],
       [  1.20836697,  10.00736694],
       [  1.13194967,   9.75365396]]),
 array([[-1.83420013,  8.41380829],
       [-1.79689792,  8.55378705],
       [-1.96397555,  8.66004631],
       [-2.33536095,  8.66065374]]),
 array([[-2.81844067,  2.04388705],
       [-2.55025048,  2.22325261],
       [-2.96841403,  1.94929658],
       [-2.85475116,  1.97013862],
       [-2.51571827,  1.65281783],
       [-2.31088815,  1.59751511],
       [-2.56787069,  2.005329  ],
       [-2.35943764,  2.16084927],
       [-2.91066039,  2.07297591],
       [-2.70256397,  1.92951459],
       [-2.53883426,  1.97937212],
       [-2.67509913,  2.31515326],
       [-2.54010794,  2.09783929],
       [-2.73371426,  2.11254948],
       [-2.94369471,  2.12891159],
       [-2.9666015 ,  1.93497853],
       [-2.85813906,  2.17027856],
       [-2.72886013,  2.38663958],
       [-2.71172757,  2.2618372 ],
       [-2.84180167,  2.07595954],
       [-2.90200636,  1.84093061]]),
 array([[-2.03939773,  7.14555792],
       [-2.04516803,  7.41250095],
       [-2.00989028,  6.99245851],
       [-2.15000764,  7.27013382],
       [-1.95836298,  7.02755084],
       [-1.51806331,  7.32512589],
       [-2.22020752,  7.70891907],
       [-2.0464847 ,  7.92708097],
       [-1.86709163,  7.07750016],
       [-1.83866966,  7.94374372],
       [-2.33777777,  7.83362974],
       [-1.97250394,  7.60289793],
       [-2.27421551,  7.19660075],
       [-2.24084089,  7.6045606 ],
       [-2.29499495,  7.00661758],
       [-1.94657533,  7.09857624],
       [-2.5636781 ,  7.15256652]]),
 array([[-1.60992514,  4.98810434],
       [-1.62921441,  5.7073938 ],
       [-2.28827001,  4.72750525],
       [-1.51781975,  5.67309127]]),
 array([[-5.07099833,  0.49279954],
       [-5.35266205,  0.77689589],
       [-4.8583785 ,  1.14855069],
       [-4.79500649,  0.178925  ],
       [-5.10190488,  0.94444428]])]

Computational time

Compute the total time for each round, and the mean, variance, best and worst over all rounds.


In [20]:
k_rounds['total time'] = list()

for i in range(numRounds):
    k_times=k_rounds['times'][i]
    k_total = np.sum(np.array(k_times))
    k_rounds['total time'].append(k_total)

k_time_mean = np.mean(np.array(k_rounds['total time']))
k_time_var = np.var(np.array(k_rounds['total time']))

k_rounds['fastest round']=np.argmin(np.array(k_rounds['total time']))
k_rounds['slowest round']=np.argmax(np.array(k_rounds['total time']))

k_time_best = k_rounds['total time'][k_rounds['fastest round']]
k_time_worst = k_rounds['total time'][k_rounds['slowest round']]

print 'Total execution times'
print 'mean\t\tvariance\tbest\t\tworst'
print k_time_mean,'\t',k_time_var,'\t',k_time_best,'\t',k_time_worst


Total execution times
mean		variance	best		worst
3.5615876 	0.00744403803544 	3.502209 	3.731282

In [60]:
# assign data to clusters
k_rounds['assignedData']=[None]*numRounds #assigned data for the best solution in each round

for i in range(numRounds):
    # assign clusters to data, using the best-scoring initialization of this round
    k_assignment=k_rounds['assignment'][i][k_rounds['fitness best index'][i]]
    
    k_assignedData = [None]*numClusters
    for point,j in enumerate(k_assignment):
        if k_assignedData[j] is not None:
            k_assignedData[j] = np.vstack((k_assignedData[j],mixture[point]))
        else:
            k_assignedData[j] = mixture[point]
    
    k_rounds['assignedData'][i]=k_assignedData

In [64]:
k_rounds['assignedData'][0]


Out[64]:
[array([[ 1.80599869, -1.97854574],
       [ 2.26977418, -1.9981279 ],
       [ 1.83938192, -2.3668606 ],
       [ 1.91903827, -2.45916003],
       [ 2.1139499 , -2.12803706],
       [ 1.77019371, -2.1532512 ],
       [ 1.64384335, -1.63781922],
       [ 1.86913919, -2.00822076],
       [ 2.52624114, -1.81440902],
       [ 2.14613762, -2.15451971],
       [ 2.03076633, -1.83792586],
       [ 1.74205124, -2.40973226],
       [ 2.28912571, -1.96019455],
       [ 2.07079299, -2.23545124],
       [ 2.16062544, -2.59320126],
       [ 2.5371791 , -2.18016623],
       [ 2.12455777, -2.2719215 ],
       [ 2.06378593, -2.13736807],
       [ 1.60914741, -1.81927943],
       [ 2.10982045, -1.94449056],
       [ 1.93247763, -1.47793567],
       [ 1.73366444, -2.15059708],
       [ 1.80305615, -1.67695941],
       [ 2.2378356 , -2.01001569],
       [ 1.9056484 , -2.36253329],
       [ 1.96001684, -2.11702957],
       [ 2.01187404, -1.96750466],
       [ 1.77807163, -2.16814919],
       [ 1.90571608, -2.19283456],
       [ 2.25183411, -1.59905975],
       [ 1.6638006 , -1.85424192],
       [ 2.22738762, -1.98533213],
       [ 2.06436143, -1.77647238]]),
 array([[-2.13517934,  6.09479421],
       [-1.7133824 ,  5.90660021],
       [-1.76884218,  6.45329896],
       [-2.08176506,  6.39050685],
       [-1.84344352,  6.26432028],
       [-1.65079476,  6.21876307],
       [-1.61151318,  6.54558852],
       [-1.73429593,  6.29415346]]),
 array([[ 3.09741317,  6.99660849],
       [ 2.9788277 ,  6.68831087],
       [ 2.74910068,  6.58855149],
       [ 3.02702051,  6.89963032],
       [ 3.55020319,  7.15701474],
       [ 2.59218553,  6.96901588],
       [ 2.71455305,  7.01311029],
       [ 3.15809783,  6.3467808 ],
       [ 2.84089602,  6.89733235],
       [ 2.95231798,  6.74725147],
       [ 3.10971471,  7.15688205],
       [ 3.53845282,  7.50570034],
       [ 3.35535574,  6.94440068],
       [ 2.71463423,  7.23509205],
       [ 2.72580882,  7.18817978],
       [ 3.52611653,  7.05729893],
       [ 3.11292329,  6.90333824],
       [ 3.0245853 ,  7.33470118],
       [ 3.23177301,  7.11685977],
       [ 3.07749271,  6.8955104 ],
       [ 3.15926284,  7.40245338],
       [ 3.65853419,  6.95601606],
       [ 2.92550594,  7.15599989],
       [ 3.11705076,  7.06556082],
       [ 2.74783644,  6.95333989],
       [ 2.7034372 ,  7.23173858],
       [ 3.00732531,  6.7380642 ],
       [ 3.29325151,  7.41437819],
       [ 3.49797518,  6.909729  ],
       [ 3.09591001,  6.72645969],
       [ 3.00896909,  6.41479346],
       [ 3.3682527 ,  6.59713443],
       [ 3.4032807 ,  7.28270365]]),
 array([[-3.47245889,  2.03520442],
       [-3.13337199,  1.31342035],
       [-3.21986356,  2.108664  ],
       [-3.17934404,  1.9791999 ],
       [-3.58053078,  1.99544435],
       [-3.56222564,  2.39310111],
       [-3.14055209,  1.76980544],
       [-3.22671813,  1.90827485],
       [-3.30116133,  2.27413584],
       [-3.14495889,  1.79469924],
       [-3.2159128 ,  2.02939127],
       [-3.58712181,  1.79704942]]),
 array([[  2.75922404,   9.82252372],
       [  2.3139734 ,   9.74584861],
       [  2.18467774,   9.81124373],
       [  2.52380824,  10.02977778],
       [  2.17961369,   9.81859972],
       [  2.08271347,   9.88393859],
       [  2.39295858,  10.21920769],
       [  2.40887823,   9.57553507],
       [  2.64883964,  10.07750906],
       [  2.28470621,  10.13549784],
       [  3.12158256,   9.97084907],
       [  3.07103531,  10.4704329 ],
       [  3.04343441,  10.1000064 ],
       [  2.98294705,   9.99884389],
       [  3.20400656,  10.47465276],
       [  3.45276484,  10.22594419],
       [  2.97696155,   9.95116191],
       [  3.15047094,   9.65119827]]),
 array([[-4.64433996,  5.27142362],
       [-5.34634873,  4.89349189],
       [-5.18540189,  6.38705495],
       [-5.31460217,  5.32238062],
       [-5.30185408,  7.15571745],
       [-4.54760461,  6.00854048],
       [-4.96428059,  4.85107797]]),
 array([[-4.83570808,  2.29719545],
       [-4.90612646,  2.12002173],
       [-4.72860391,  2.04206386],
       [-5.27840819,  1.82161285],
       [-5.08401073,  2.0419549 ],
       [-5.07549237,  2.41112785],
       [-5.34833973,  2.3797585 ],
       [-4.87101441,  2.05138828],
       [-4.77950279,  2.39399703],
       [-5.28731442,  2.25034938]]),
 array([[-4.98626636, -1.89088356],
       [-4.786391  , -1.28143804]]),
 array([[-4.8842357 ,  3.62849661],
       [-4.59304761,  2.93016854],
       [-4.99435211,  3.60462994],
       [-5.019574  ,  3.85115844],
       [-4.82434246,  2.87429971],
       [-5.06589621,  3.9640957 ],
       [-4.80680415,  3.3164272 ],
       [-4.94313435,  3.83922991],
       [-4.77991258,  3.0603488 ]]),
 array([[  1.61173467,   9.95000258],
       [  1.80272473,  10.17155543],
       [  1.59741116,   9.58322703],
       [  1.5942076 ,  10.47434604],
       [  1.12672866,  10.44684941],
       [  1.84158272,  10.33341832],
       [  0.01627554,  10.2481508 ],
       [  1.21922788,   9.97190406],
       [  0.4947887 ,  10.08214329],
       [  1.01783161,   9.87552869],
       [  0.68885269,  10.05567721],
       [  0.76107326,   9.83723146],
       [  0.60688   ,   9.82376979],
       [  1.20836697,  10.00736694],
       [  1.13194967,   9.75365396]]),
 array([[-1.83420013,  8.41380829],
       [-1.79689792,  8.55378705],
       [-1.96397555,  8.66004631],
       [-2.33536095,  8.66065374]]),
 array([[-2.81844067,  2.04388705],
       [-2.55025048,  2.22325261],
       [-2.96841403,  1.94929658],
       [-2.85475116,  1.97013862],
       [-2.51571827,  1.65281783],
       [-2.31088815,  1.59751511],
       [-2.56787069,  2.005329  ],
       [-2.35943764,  2.16084927],
       [-2.91066039,  2.07297591],
       [-2.70256397,  1.92951459],
       [-2.53883426,  1.97937212],
       [-2.67509913,  2.31515326],
       [-2.54010794,  2.09783929],
       [-2.73371426,  2.11254948],
       [-2.94369471,  2.12891159],
       [-2.9666015 ,  1.93497853],
       [-2.85813906,  2.17027856],
       [-2.72886013,  2.38663958],
       [-2.71172757,  2.2618372 ],
       [-2.84180167,  2.07595954],
       [-2.90200636,  1.84093061]]),
 array([[-2.03939773,  7.14555792],
       [-2.04516803,  7.41250095],
       [-2.00989028,  6.99245851],
       [-2.15000764,  7.27013382],
       [-1.95836298,  7.02755084],
       [-1.51806331,  7.32512589],
       [-2.22020752,  7.70891907],
       [-2.0464847 ,  7.92708097],
       [-1.86709163,  7.07750016],
       [-1.83866966,  7.94374372],
       [-2.33777777,  7.83362974],
       [-1.97250394,  7.60289793],
       [-2.27421551,  7.19660075],
       [-2.24084089,  7.6045606 ],
       [-2.29499495,  7.00661758],
       [-1.94657533,  7.09857624],
       [-2.5636781 ,  7.15256652]]),
 array([[-1.60992514,  4.98810434],
       [-1.62921441,  5.7073938 ],
       [-2.28827001,  4.72750525],
       [-1.51781975,  5.67309127]]),
 array([[-5.07099833,  0.49279954],
       [-5.35266205,  0.77689589],
       [-4.8583785 ,  1.14855069],
       [-4.79500649,  0.178925  ],
       [-5.10190488,  0.94444428]])]

In [38]:
k_rounds['total time'] = list()


for i in range(numRounds):
    k_times=k_rounds['times'][i]
    k_total = np.sum(np.array(k_times))
    k_rounds['total time'].append(k_total)

k_rounds['time mean'] = np.mean(np.array(k_rounds['total time']))
k_rounds['time variance'] = np.var(np.array(k_rounds['total time']))

k_rounds['time best']=np.min(np.array(k_rounds['total time']))
k_rounds['time worst']=np.max(np.array(k_rounds['total time']))

print 'Total execution times'
print 'mean\t\tvariance\t\tbest\t\tworst'
print k_rounds['time mean'],'\t',k_rounds['time variance'],'\t',k_rounds['time best'],'\t',k_rounds['time worst']


Total execution times
mean		variance		best		worst
3.5615876 	0.00744403803544 	3.502209 	3.731282

In [30]:
k_score=DaviesBouldin.DaviesBouldin(mixture,k_rounds['centroids'][i][j],k_rounds['assignment'][i][j])
print k_score.eval()
print numInits
print type(k_score.eval())
print numInits*64/1024


126.45214826
575
<type 'numpy.float64'>
35

In [52]:
#compute fitnesses
k_rounds['fitness']=[np.inf]*numRounds
k_rounds['fitness best index']=[None]*numRounds

for i in range(numRounds):
    print 'round ',i
    for j in range(numInits):
        k_score=DaviesBouldin.DaviesBouldin(mixture,k_rounds['centroids'][i][j],k_rounds['assignment'][i][j])
        s=k_score.eval()
        if s<k_rounds['fitness'][i]:
            k_rounds['fitness'][i]=s
            k_rounds['fitness best index'][i]=j


round  0
round  1
round  2
round  3
round  4

In [55]:
print k_rounds['fitness']
print k_rounds['fitness best index']


[103.17556102215431, 102.77272612874017, 102.37514054643245, 102.77272612874017, 99.158549595131063]
[5, 560, 340, 289, 162]

In [21]:
k_rounds['fitness']=list()
start=datetime.now()
k_bestScore=np.inf
for i in range(numRounds):
    print 'round ',i
    init_scores=list()
    for j in range(numInits):
        # compute cluster fitness
        k_score=DaviesBouldin.DaviesBouldin(mixture,k_rounds['centroids'][i][j],k_rounds['assignment'][i][j])
        init_scores.append(k_score.eval())
    k_rounds['fitness'].append(np.array(init_scores))
    m=np.min(init_scores)
    if m < k_bestScore:
        k_bestScore=m
        k_rounds['best fit round']=i

print 'done!',(datetime.now() - start).total_seconds()


round  0
round  1
round  2
round  3
round  4
done! 27.921954

In [48]:
k_rounds['fitness best']=list()
k_rounds['fitness worst']=list()
k_rounds['fitness mean']=list()
k_rounds['fitness variance']=list()

for fit in k_rounds['fitness']:
    k_rounds['fitness best'].append(np.min(fit))
    k_rounds['fitness worst'].append(np.max(fit))
    k_rounds['fitness mean'].append(np.mean(fit))
    k_rounds['fitness variance'].append(np.var(fit))

for i in range(len(k_rounds['fitness'])):
    print k_rounds['fitness best'][i],k_rounds['fitness worst'][i],k_rounds['fitness mean'][i],k_rounds['fitness variance'][i]
print np.mean(k_rounds['fitness best']),np.mean(k_rounds['fitness worst']),np.mean(k_rounds['fitness mean']),np.mean(k_rounds['fitness variance'])


103.175561022 175.637446912 131.810391378 199.521647599
102.772726129 178.375147637 131.23130278 190.13815772
102.375140546 184.108922173 130.285954562 214.903359008
102.772726129 176.299144868 130.284179451 196.303710508
99.1585495951 174.142560159 129.727510695 174.485941501
102.050940684 177.71264435 130.667867773 195.070563267

In [25]:
k_rounds.keys()
#choose only best/worst assignment
#choose only best/worst centroid


Out[25]:
['slowest round',
 'best fit round',
 'total time',
 'assignment',
 'fastest round',
 'centroids',
 'times',
 'fitness',
 'assignedData']

In [30]:
# save
output = open('k_ass.pkl', 'wb')

pickle.dump(k_rounds['centroids'], output)
output.close()

Comparison

Accuracy


In [54]:
# print DB scores
print "Davies-Bouldin cluster fitness (less is better)"
qk_best=np.inf
for i in range(numRounds):
    m=np.min(qk_rounds['best evolution'][i])
    if m < qk_best:
        qk_best = m
print "QK-Means cluster fitness (DB score):\t",qk_best
print "K-Means cluster fitness (DB score):\t",np.min(k_rounds['fitness'])


Davies-Bouldin cluster fitness (less is better)
QK-Means cluster fitness (DB score):	65.6735103941
K-Means cluster fitness (DB score):	99.1585495951

Performance


In [182]:
print 'Total execution times'
print '\t\tmean\t\tvariance\t\tbest\t\tworst'
print 'K-Means:\t',k_time_mean,'\t',k_time_var,'\t',k_time_best,'\t',k_time_worst
print 'QK-Means:\t',qk_time_mean,'\t',qk_time_var,'\t\t',qk_time_best,'\t',qk_time_worst


print "\nQK-Means / K-Means ratio:\t",qk_time_mean/k_time_mean


Total execution times
		mean		variance		best		worst
K-Means:	3.6853944 	0.00049971065344 	3.658593 	3.709445
QK-Means:	10.32754 	0.565094852961 		9.694789 	11.742673

QK-Means / K-Means ratio:	2.80228894905

In [ ]:
# display results
fig1=plt.figure()
ax1 = fig1.add_subplot(3,2,1, adjustable='box', aspect=0.5, position=[0.1,0.1, 1, 1])
ax2 = fig1.add_subplot(3,2,2,adjustable='box', aspect=0.5,position=[1.2,0.1, 1, 1])
ax3 = fig1.add_subplot(3,2,3,adjustable='box', aspect=0.5, position=[0.1,-1.2, 1, 1])

ax1.set_title("Gaussian mixture")
for i in range(0,len(gMix)):
    ax1.plot(gMix[i][:,0],gMix[i][:,1],'.')

ax2.set_title("QK-Means")
for i in range(0,numClusters):
    ax2.plot(qk_assignedData[i][:,0],qk_assignedData[i][:,1],'.')
    ax2.plot(qk_centroids[best][i][0],qk_centroids[best][i][1],'ko')

ax3.set_title("K-Means")
for i in range(0,numClusters):
    ax3.plot(k_assignedData[i][:,0],k_assignedData[i][:,1],'.')
    ax3.plot(k_centroids[i][0],k_centroids[i][1],'ko')     
    
fig1.savefig('out.png', dpi=600)

Conclusion

The results in [1] suggest that QK-Means offers a moderate accuracy improvement over K-Means, while the same algorithm in [3] shows no significant improvement. The results above suggest that QK-Means yields a marginal accuracy improvement but carries a large computation-time penalty, taking significantly longer than classical K-Means to perform the same number of K-Means executions (each with a 300-iteration limit and a $1\times10^{-4}$ convergence tolerance). It should be noted, however, that the algorithm in [1,3] uses the two quantum genetic operations not implemented here. Moreover, their application was image segmentation, while here the data comes from a mixture of Gaussians.

In conclusion, QK-Means does not offer a good tradeoff between accuracy improvement and computation time.

References

[1] E. Casper, C. Hung, E. Jung, and M. Yang, “A Quantum-Modeled K-Means Clustering Algorithm for Multi-band Image Segmentation”

[2] W. Liu, H. Chen, Q. Yan, Z. Liu, J. Xu, and Y. Zheng, “A novel quantum-inspired evolutionary algorithm based on variable angle-distance rotation”

[3] E. Casper and C. Hung, “Quantum Modeled Clustering Algorithms for Image Segmentation”