Introduction to Machine Learning with scikit-learn

Lab 4: Clustering (extra material)

In this lab, we are going to implement the $k$-means algorithm using the functionalities from the numpy library. Let's load the usual libraries and generate some datapoints to work with.

Recap from the lecture

Remind that the $k$-means algorithm takes the number of classes $k$ as a parameter and, given a set of $n$ points $X = \left\{ x_1, x_2, \dots, x_n \right\}$, aims at finding a partition $S = \left\{ S_1, S_2, \dots, S_k\right\}$ of $X$ minimizing the within-cluster sum of squares, that is: $$\min_S \sum_{i = 1}^{k} \sum_{x \in S_i} \left\lVert x - \mu_i\right\rVert_2^2$$ where $$\mu_i = \dfrac{1}{|S_i|} \sum_{x \in S_i} x$$ is the centroid (mean) of points in $S_i$ and $|S_i|$ is the cardinality (number of elements) of $S_i$. Recall that $\mu_i$ is not necessarily in $X$.

As we saw during lecture 4, one of the ways to find the optimal partition $S$ is to iteratively apply following two steps:

  • Define the Voronoi diagram generated by the $\mu_i$s: $$S_i^t = \left\{ x_p \mid \left\lVert x - \mu_i^t\right\rVert \leq \left\lVert x - \mu_j^t\right\rVert, 1 \leq j \leq k \right\}$$
  • Update the centroid: $$\mu_i^{t+1} = \text{centroid}(S_i^t) = \dfrac{1}{|S_i^t|} \sum_{x \in S_i^t} x$$

for all $i \in \left\{ 1, \dots, k\right\}$.

Hence, all we need is a function that computes the centroid, and a function that will define the Voronoi diagram given centroids and a set of points.


In [1]:
%matplotlib inline
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

In [2]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

In [3]:
# Initial data
n_samples = 1500
random_state = 8
X, y = make_blobs(n_samples=n_samples, centers=3, random_state=random_state)
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.title("Initial data")


Out[3]:
<matplotlib.text.Text at 0x1107c5450>

Computing the centroids

In this section we're interested in computing the centroid $\mu$ of given a set of points $S$, that is: $$\mu = \text{centroid}(S) = \dfrac{1}{|S|} \sum_{x \in S} x$$ This step is easy if we use the numpy.mean function:


In [4]:
def compute_centroid(X):
    return np.mean(X, axis = 0)

Let us now test the function with the data we've generated above. First, we can select the points corresponding to the first blob this way:


In [5]:
X[y == 0]


Out[5]:
array([[  5.86749807,   8.17715188],
       [  5.61369982,   9.93295527],
       [  7.22508428,  10.44886194],
       [  6.51192277,   9.81342902],
       [  8.80002143,   8.54323521],
       [  7.95311372,   8.36897664],
       [  6.10846066,   8.23343995],
       [  8.33975514,   8.88814957],
       [  8.88361084,   9.01589874],
       [  7.03443241,   9.69736738],
       [  8.24840842,   8.2829127 ],
       [  8.95362407,   9.48903417],
       [  6.43491959,   9.97416094],
       [  7.26697254,   9.87045836],
       [  7.70035004,   8.34107187],
       [  7.26946955,  10.98320369],
       [  6.73982636,  10.11672427],
       [  7.78224005,   8.55083151],
       [  9.19377268,  10.26786712],
       [  7.75931516,   9.12963513],
       [  8.21079341,   9.62231376],
       [  7.27775514,  10.76824835],
       [  7.24831298,   8.09462266],
       [  7.98301059,   9.85753553],
       [  6.91511696,   8.64812384],
       [  6.15895483,   8.70208685],
       [  7.82958178,   8.89779121],
       [  9.68246216,   9.31157492],
       [  8.81875827,  11.0276883 ],
       [  8.85658233,   9.52204043],
       [  8.81439223,  11.14310633],
       [  7.82510107,   8.41865266],
       [  7.56833386,   9.32443309],
       [  6.25773234,   9.49643174],
       [  8.18240421,   8.16999978],
       [  6.07074949,   9.84640299],
       [  8.3865564 ,   8.82577962],
       [  6.02545232,   8.30623923],
       [  8.18904696,  10.59029235],
       [  9.94109903,   9.22395667],
       [  7.65102673,   9.73916943],
       [  7.21257564,   8.61737126],
       [  5.71705294,   8.807722  ],
       [  6.97422822,   9.55896816],
       [  7.72747059,   9.03918815],
       [  7.38017969,   8.53076823],
       [  7.45272055,   9.41241034],
       [  7.14424337,   8.88917421],
       [  8.60199727,   8.32658889],
       [  7.55901182,  10.5408694 ],
       [  5.59011036,   9.91187864],
       [  6.17476741,   9.21297745],
       [  8.22943304,   8.7633443 ],
       [  8.20420692,   9.31490436],
       [  6.48363666,   9.31032935],
       [  7.42382381,   8.67836021],
       [  8.87356881,   8.78077548],
       [  8.81545663,   8.76386046],
       [  6.75445054,   9.74531933],
       [  7.79924692,  10.59576952],
       [  7.93194883,   8.4956148 ],
       [  7.57818277,   9.58629233],
       [  7.62051584,   9.37144814],
       [  9.40172978,   9.55422366],
       [  5.80172048,   8.78872812],
       [  8.29364364,   8.68894154],
       [  5.6972016 ,   9.48859002],
       [  8.28800043,  10.03299053],
       [  7.24812018,   8.01536859],
       [  7.18655077,  10.62332336],
       [  7.68679657,   8.25440697],
       [  6.36025357,   8.95139607],
       [  5.25213781,  10.85569349],
       [  8.01188636,   9.65949146],
       [  7.3952387 ,   9.14472809],
       [  7.11253478,   9.43326814],
       [  7.57929695,   8.76723087],
       [  6.71580045,  11.4067804 ],
       [  7.3549378 ,   8.77233676],
       [  7.50743473,   8.12483885],
       [  6.13735746,   8.28756952],
       [  7.94156644,  10.17017101],
       [  7.78430314,   8.43080108],
       [  7.91450732,   7.63503217],
       [  8.05292335,   9.91808017],
       [  7.55401108,   8.73824671],
       [  7.59708771,  10.81966985],
       [  7.02046668,   7.94250971],
       [  6.93627387,  11.67724789],
       [  6.18359678,   9.28930581],
       [  8.81280422,   9.50861554],
       [  7.32252126,  10.45661816],
       [  7.16251356,   9.74878714],
       [  6.17357365,   8.73756621],
       [  7.27423265,   9.18459991],
       [  8.44413666,   9.11541139],
       [  7.00846618,   9.89991474],
       [  5.27801757,   8.93474119],
       [  7.74404021,   9.38819595],
       [  8.27458019,  10.19191091],
       [  6.88995751,   8.46277023],
       [  8.46110785,  10.0425341 ],
       [  7.29262774,   9.8777463 ],
       [  5.52568734,  10.34955505],
       [  6.27877237,   8.72186382],
       [  8.07925272,  10.58820106],
       [  5.07337492,  10.52482973],
       [  6.89228905,   8.60634293],
       [  8.32245091,   9.67819196],
       [  6.65877461,   8.86341665],
       [  8.81358318,  10.38255189],
       [  7.89765814,   8.21954764],
       [  6.65349775,   8.8489284 ],
       [  7.7540569 ,   8.74810005],
       [  7.82944816,   9.62627158],
       [  7.29586565,  10.81732727],
       [  7.32512157,  10.33290733],
       [  8.61648573,  10.43674965],
       [  8.65256502,   9.60771759],
       [  5.0540846 ,  10.0790099 ],
       [  7.57170458,  10.46617282],
       [  8.73635755,   8.12292025],
       [  6.6571269 ,   7.72756233],
       [  6.87586286,   9.69703111],
       [  5.48065555,   9.80060857],
       [  7.05791529,  11.32998042],
       [  7.46623189,  10.63891852],
       [  8.00236864,  10.1691733 ],
       [  7.28724996,   7.620998  ],
       [  4.78246728,   9.3505726 ],
       [  7.1441051 ,   8.95209685],
       [  9.1056183 ,   8.30095869],
       [  7.54107319,   8.99810576],
       [  8.03127486,   8.37747522],
       [  7.09476406,   8.32545411],
       [  8.01558389,  11.14297995],
       [  6.39692172,   9.5953361 ],
       [  7.7383098 ,   9.56236852],
       [  7.44636985,  11.43674954],
       [  6.53552749,   7.70694141],
       [  9.45620912,  10.19130574],
       [  5.50959122,  10.14965175],
       [  6.65589136,   8.01092583],
       [  6.35446918,  10.86403337],
       [ 10.02616584,   9.90386125],
       [  7.14832905,   8.17626052],
       [  9.03612343,   9.90465695],
       [ 10.52905575,   7.91319456],
       [  7.57135103,   8.96715396],
       [  8.07680763,  11.74579608],
       [  7.19768959,   9.7583634 ],
       [  6.87604738,   8.50538356],
       [  6.02937898,  10.31974057],
       [  5.32780588,   8.17800448],
       [  5.52161775,   7.98446372],
       [  8.06164078,   8.43736968],
       [  6.34102578,   9.70104785],
       [  9.4920814 ,  10.82670873],
       [  5.72609648,  10.12272295],
       [  7.14461054,  11.25463189],
       [  7.15702338,  10.01903004],
       [  8.35688933,   9.04398012],
       [  8.3134123 ,   9.98384599],
       [  7.82832936,   7.76864883],
       [  6.85769503,  10.30105929],
       [  7.10793298,   9.78290023],
       [  8.18514474,   9.51337773],
       [  8.36186598,   9.08958113],
       [  8.76643119,   9.30350321],
       [  7.65401084,   9.60854166],
       [  8.75894993,   8.37509669],
       [  8.22799869,   9.81331772],
       [  8.82217399,  10.76232162],
       [  8.38057385,   9.99772143],
       [  7.43947897,   8.84212625],
       [  7.84985542,   9.96050562],
       [  7.29433984,   9.79486468],
       [  7.49814373,   9.29677019],
       [  6.91712439,   9.03844694],
       [  7.40783871,   6.93633083],
       [  6.29950869,  10.4054317 ],
       [  7.76134085,  10.0125941 ],
       [  6.25817082,   9.79505477],
       [  7.90651941,   9.2800303 ],
       [  7.66640259,   9.68832676],
       [  6.8454895 ,   9.75330571],
       [  7.46270504,   9.59628078],
       [  7.19993299,   9.22336756],
       [  7.94310647,   8.20622208],
       [  9.17061801,  10.37690696],
       [  8.30125946,   9.66394123],
       [  8.62870479,  11.26355434],
       [  9.4035308 ,   8.09592099],
       [  8.48385454,   9.82747089],
       [  8.99334153,   9.7313491 ],
       [  6.87626434,   9.22600969],
       [  8.34361718,   8.7379773 ],
       [  6.13753411,   8.37052986],
       [  6.73725224,  10.25875604],
       [  8.56258198,   8.1218301 ],
       [  8.28827095,  10.71730803],
       [  7.09276152,  12.27174213],
       [  7.55018005,  10.24765744],
       [  6.23007684,   9.80825898],
       [  8.99558412,   9.04204388],
       [  6.7985853 ,  10.14111827],
       [  8.73576354,   9.27030526],
       [  5.97463973,  10.87350623],
       [  7.49985237,   9.55274284],
       [  7.6456284 ,   8.23667253],
       [  6.32477476,   9.31217741],
       [  5.30701022,   8.67438626],
       [  5.66020319,   9.18115053],
       [  6.50518921,   9.02053654],
       [  6.8117005 ,  10.8840413 ],
       [  9.10088858,   9.14807411],
       [  7.34011795,   9.95361126],
       [  6.97504162,   8.74549857],
       [  8.09074215,   9.563037  ],
       [  5.75779231,   9.56724547],
       [  7.67997215,  10.08109486],
       [  8.03935108,   7.92755419],
       [  8.83053242,   8.75564135],
       [  7.55303352,  11.85706105],
       [  5.81482514,  10.58252039],
       [  8.00899752,   8.85821063],
       [  7.86194063,   8.79650856],
       [  6.97646583,   9.5933716 ],
       [  8.23184757,  10.84769319],
       [  8.31793002,   8.12972705],
       [  7.27055806,   8.63310412],
       [  6.34526126,   8.70677779],
       [  7.37044118,  10.127437  ],
       [  6.47033778,   9.60275986],
       [  7.02037888,   9.25644063],
       [  7.08282798,   9.71921331],
       [  5.62803952,   9.77585443],
       [  5.55451795,   9.87089999],
       [  9.12495624,   9.38974078],
       [  6.09382282,   9.38044447],
       [ 10.01367527,  10.52089453],
       [  7.63944706,   9.9927659 ],
       [  5.17209648,  11.78064756],
       [  7.32740001,  11.34359125],
       [  6.43753124,  11.27852227],
       [  8.05333078,  10.85408107],
       [  7.3325525 ,  10.08095836],
       [  6.16527528,   9.52313438],
       [  7.32742154,   9.8367948 ],
       [  8.75972117,   9.12931148],
       [  8.3359495 ,   8.22438263],
       [  7.07978644,   7.81427747],
       [  6.46106837,   8.42025476],
       [  8.80996213,  11.9021701 ],
       [  7.31294296,   9.92166331],
       [  7.59635095,   8.0197955 ],
       [  7.40292703,   9.16217702],
       [  7.51392583,  10.21349043],
       [  7.30268004,  10.59578935],
       [  7.8601977 ,   9.21744175],
       [  8.00405631,  10.53695374],
       [  6.10071356,   9.29007704],
       [  8.63716909,   9.69466317],
       [  8.40464989,   9.34469563],
       [  6.93131701,   7.89424734],
       [  7.08412928,   8.92611116],
       [  6.48258907,   8.88264116],
       [  7.06929763,   9.52602329],
       [  7.33143393,  10.2434159 ],
       [  7.09022949,   8.57919798],
       [  7.4934131 ,  11.00892356],
       [  7.33490453,   9.63942395],
       [  8.38371969,  11.1685078 ],
       [  6.49333587,   8.84803646],
       [  8.80575263,   8.40993671],
       [  8.90432957,   9.13576844],
       [  8.39976462,   9.25682536],
       [  7.75430996,   8.99861425],
       [  9.42077337,   9.17028998],
       [  6.95941729,   8.59299994],
       [  7.72360261,   8.6863845 ],
       [  7.8440213 ,  10.29060403],
       [  7.65750146,  10.54387066],
       [  6.96107809,   8.60128017],
       [  6.34639658,   9.11232333],
       [  9.29972817,  10.57497215],
       [  7.30906108,   9.18330812],
       [  7.73244799,   8.86413953],
       [  8.06216064,   8.71384374],
       [  6.12052279,   9.64064767],
       [  6.09592905,   6.51928917],
       [  8.2634157 ,  10.34723435],
       [  6.70121041,   9.90173077],
       [  8.55712204,   9.64483943],
       [  7.34590229,   7.69737569],
       [  7.97296393,   9.30310758],
       [  9.56333648,   9.51592302],
       [  7.05045059,   8.73881402],
       [  8.91111219,   9.14933265],
       [  7.51463404,  10.14107588],
       [  8.65722566,  10.56735235],
       [  7.29725473,   9.99372025],
       [  7.32191612,   9.61074241],
       [  8.52991794,  11.01738261],
       [  8.81854851,   8.45809304],
       [  5.82259795,   8.88727231],
       [  6.6034662 ,   7.38488228],
       [  7.10217012,   7.53353145],
       [  6.02367285,   9.94584899],
       [  8.45623401,   9.74180439],
       [  8.72578696,  10.34691678],
       [  8.57037457,  10.8436764 ],
       [  6.5342397 ,   9.45532341],
       [  7.53228511,   9.87750292],
       [  7.22009836,   8.11418676],
       [  7.15013321,   9.52893935],
       [  8.32433102,   9.13308248],
       [  8.89464606,  10.29806397],
       [  6.5840208 ,   9.00325664],
       [  7.94623538,   8.02372264],
       [  5.05554768,  10.44791642],
       [  7.98813333,   9.78033562],
       [  7.89167201,   9.35952581],
       [  6.89078889,  10.61298902],
       [  7.09700433,  10.14491792],
       [  8.94302415,   8.40721689],
       [  8.20409105,   8.70191361],
       [  5.28435774,  10.16972385],
       [  7.58423725,  10.70124388],
       [  6.59682064,   8.02006939],
       [  7.34893012,  10.2179733 ],
       [  7.74337594,   8.58358823],
       [  5.49953213,   9.04384494],
       [  7.89064594,  11.46433679],
       [  6.89703841,   7.98081009],
       [  7.67537424,   9.58813598],
       [  6.79947484,   9.16651778],
       [  7.20507821,   9.1870119 ],
       [  6.96776805,  10.13415453],
       [  6.34681473,  10.33075642],
       [  7.97049001,   9.54112947],
       [  7.96568357,   9.63710299],
       [  7.84627926,   9.88690656],
       [  8.56033785,   8.14735787],
       [  7.35080962,   9.83764023],
       [  6.87770146,   9.1578621 ],
       [  7.41897854,   8.58961147],
       [  8.0174604 ,   9.39705584],
       [  6.70727256,  10.79793008],
       [  6.96767867,   8.9622523 ],
       [  6.62934234,   9.65357294],
       [  7.02289183,   8.21370526],
       [  6.55819206,   8.84793239],
       [  8.0059951 ,   9.53388389],
       [  8.13027583,   9.89484387],
       [  8.9367246 ,  10.09600581],
       [  7.24211001,   7.48506871],
       [  5.38997858,   9.04811016],
       [  7.13760133,   9.84345464],
       [  6.38092997,   9.19535531],
       [  6.91820916,   8.70961873],
       [  7.70476265,   8.75745029],
       [  6.19399963,   8.19786954],
       [  7.52411268,   7.75986   ],
       [  7.55850139,   9.74816337],
       [  7.92409607,   8.64463839],
       [  7.5557312 ,   8.7589447 ],
       [  7.02010507,  10.49638253],
       [  9.19242713,   8.68432901],
       [  7.3649051 ,   8.99859657],
       [  7.19333693,  10.48449688],
       [  7.42881352,   9.64787208],
       [  8.21042514,   7.85206962],
       [  7.61227907,   9.4463627 ],
       [  8.71912964,  10.09225846],
       [  7.32514301,   9.82541566],
       [  8.0114192 ,   9.20192994],
       [  9.81508557,   7.28998985],
       [  5.93435176,   9.47611734],
       [  6.79619695,   9.0174468 ],
       [  6.48739197,   8.83865311],
       [  8.83911486,   9.59935243],
       [  8.25581951,   6.19987347],
       [  8.50049461,   9.12147855],
       [  6.66617978,  10.62116776],
       [  7.41546595,   9.72475837],
       [  8.33084899,   8.7268362 ],
       [  9.15668309,   9.59459888],
       [  8.25950027,   9.07881649],
       [  8.24576186,   8.38641674],
       [  6.51745192,   9.70095734],
       [  9.08364634,   8.57778018],
       [  7.73493228,   9.48194358],
       [  7.96481592,   8.03914659],
       [  6.13116568,   9.50155049],
       [  8.2098922 ,  10.4252942 ],
       [  8.47634027,  10.66146866],
       [  6.48465724,   8.82949947],
       [  8.16111464,   9.47672594],
       [  7.76843624,  10.31696893],
       [  8.98344372,   8.3719716 ],
       [  6.20431743,   8.96897748],
       [  7.5421396 ,   9.14622182],
       [  7.40581969,   9.53880365],
       [  5.67480711,   8.36601137],
       [  5.94205586,  10.50768333],
       [  8.45970869,  10.81917118],
       [  6.02826289,   9.36294602],
       [  7.58595983,   9.37886943],
       [  7.22095192,   8.06544414],
       [  8.56481423,   9.44492069],
       [  8.28175665,   9.02381056],
       [  6.88354578,   9.13348831],
       [  8.88715694,   9.69132974],
       [  7.60188708,  10.06902343],
       [  7.81572582,  10.58049713],
       [ 10.27954291,   9.82550566],
       [  7.73281164,  10.49095778],
       [  9.3938958 ,  11.21715689],
       [  7.10538552,   9.28955766],
       [  7.18432532,   9.31921992],
       [  7.43705216,  11.3923226 ],
       [  7.56475962,  11.24762868],
       [  5.68999163,   7.71646993],
       [  7.61342418,   8.78546818],
       [  7.5834752 ,   9.16847211],
       [  6.28516091,  11.28717687],
       [  8.50847946,   7.79973961],
       [  4.33366829,  10.51034676],
       [  8.03440506,   8.38029298],
       [  9.19642422,  11.57536954],
       [  6.41895897,  11.45135012],
       [  5.80712761,   9.95968551],
       [  7.13579057,   7.5892312 ],
       [  7.73004587,   6.86264178],
       [  6.95802459,   9.19924611],
       [  6.54466935,   8.69037242],
       [  9.56099673,  10.66290816],
       [  6.44628711,   8.7372871 ],
       [  9.15221864,   9.44444992],
       [  6.80837727,  10.32244031],
       [  6.1874664 ,   8.72464242],
       [  7.40133319,   9.39362814],
       [  6.79319218,   9.03679524],
       [  6.19731284,   9.43557599],
       [  6.26835927,   9.34765174],
       [  7.40314915,  10.42342437],
       [  9.67486753,   9.33811898],
       [  6.44917353,   8.99119393],
       [  8.67179956,  10.41538835],
       [  8.70688264,  10.68949521],
       [  9.51511734,   9.44096475],
       [  8.08034605,  10.02847377],
       [  8.76144386,   8.82718407],
       [  7.54257819,   7.02403019],
       [  8.43781917,   8.95695657],
       [  6.24679228,   8.48044506],
       [  7.6025272 ,   8.98962387],
       [  9.02255525,  10.06777901],
       [  7.95932316,  10.26579813],
       [  7.46996922,   8.44935323],
       [  6.19006108,  10.03696568],
       [  3.90912839,   8.10009397],
       [  7.40565933,   8.8292448 ],
       [  6.93618811,  10.32169218],
       [  7.51589909,   7.75608776],
       [  6.81586756,   9.96256781],
       [  8.32813617,   9.14002426],
       [  7.6120293 ,   9.02816922],
       [  8.20316159,  12.01375618],
       [  5.87915732,   9.74364732],
       [  6.17401094,   8.41749638],
       [  7.29803414,  10.26424512],
       [  9.95405977,   9.5177229 ],
       [  7.80361128,   9.74561264],
       [  6.58341965,   8.42678679],
       [  7.88357143,  10.93969515],
       [  8.66361434,  10.0421757 ],
       [  5.71557231,   9.7504382 ],
       [  7.81124436,  10.79251539],
       [  8.44653583,   8.1770972 ],
       [  7.39634594,   8.90196559],
       [  8.63564239,   8.67455727],
       [  8.19028019,   9.1921362 ],
       [  7.46074798,   9.7653785 ],
       [  7.63027116,   8.69797933],
       [  6.69074855,  11.95169837],
       [  8.2260031 ,   9.64027027],
       [  8.78218534,   7.69624737],
       [  5.91414651,   8.24443357],
       [  8.32837724,  10.25265363],
       [  6.93438565,   8.12540429],
       [  8.48269874,  11.59341452],
       [  6.69819462,  10.50100294],
       [  5.5987887 ,   7.59170022],
       [  8.86161133,   9.88867707],
       [  8.67727916,   8.46347336],
       [  7.42004913,  11.69702765],
       [  5.84561755,   8.40879925],
       [  7.73674097,  10.82855388]])

Now, we can use the compute_centroid function to compute the 3 centroids:


In [6]:
centroid_0 = compute_centroid(X[y == 0])
centroid_1 = compute_centroid(X[y == 1])
centroid_2 = compute_centroid(X[y == 2])

Each centroids is an array of size 2:


In [7]:
centroid_0


Out[7]:
array([ 7.47871593,  9.43027105])

Let's make sure the compute_centroid function function works properly by visualizing the centroids of the blobs in our dataset.


In [8]:
# Plot the data
plt.scatter(X[:, 0], X[:, 1], c=y)
# Plot the centroids
plt.scatter(centroid_0[0], centroid_0[1], c = 'm', s = 50)
plt.scatter(centroid_1[0], centroid_1[1], c = 'm', s = 50)
plt.scatter(centroid_2[0], centroid_2[1], c = 'm', s = 50)
plt.title("Initial data with centroids")


Out[8]:
<matplotlib.text.Text at 0x11099a350>

Computing the Voronoi partition

This steps consists in computing the Voronoi partition of a set of points given a set of centroids. $$S_i = \left\{ x \mid \left\lVert x - \mu_i^t\right\rVert \leq \left\lVert x - \mu_j^t\right\rVert, 1 \leq j \leq k \right\}$$


In [9]:
def voronoi_partition(X, centroids):
    labels = np.zeros(X.shape[0])
    idx = 0
    for x in X:
        smallest_distance = 500 # 500 is just some big number
        idx_centroid = 0
        for centroid in centroids:
            distance = np.linalg.norm(x-centroid)
            if distance < smallest_distance:
                smallest_distance = distance
                labels[idx] = idx_centroid
            idx_centroid += 1
        idx += 1
    return labels

In [10]:
centroids = [centroid_0, centroid_1, centroid_2]
labels = voronoi_partition(X, centroids)

We can compare the labels returned by the Voronoi partition of the dataset given the actual centroids. We should expect no difference between y and labels and this is indeed the case:


In [11]:
np.any(y - labels)


Out[11]:
False

Now we have the two building blocks of the $k$-means algorithm. As an exercice, combine them to obtain a complete version of the algorithm. As an initialization, you can either take $k$ random points in the feature space, or $k$ points randomly chosen within the dataset.