Machine Intelligence II - Team MensaNord

Sheet 11

  • Nikolai Zaki
  • Alexander Moore
  • Johannes Rieke
  • Georg Hoelger
  • Oliver Atanaszov

In [16]:
from __future__ import division, print_function
import matplotlib.pyplot as plt
%matplotlib inline
import scipy.stats
import numpy as np
from scipy.ndimage import imread
import sys

Exercise 1

  • Load the data into a vector and normalize it such that the values are between 0 and 1.
  • Create two new datasets by adding Gaussian noise with zero mean and standard deviation $\sigma_N \in \{0.05, 0.1\}$.

In [2]:
# import image
img_orig = imread('testimg.jpg').flatten()
print("$img_orig")
print("shape: \t\t", img_orig.shape) # = vector
print("values: \t from ", img_orig.min(), " to ", img_orig.max(), "\n")

# "img" holds 3 vectors
img = np.zeros((3,img_orig.shape[0]))
print("$img")
print("shape: \t\t",img.shape)

std = [0, 0.05, 0.1]
for i in range(img.shape[1]):
    # normalize => img[0]
    img[0][i] = img_orig[i] / 255
    # gaussian noise => img[1] img[2]
    img[1][i] = img[0][i] + np.random.normal(0, std[1])
    img[2][i] = img[0][i] + np.random.normal(0, std[2])
    
print(img[:, 0:4])


$img_orig
shape: 		 (177500,)
values: 	 from  0  to  255 

$img
shape: 		 (3, 177500)
[[ 0.63529412  0.63137255  0.62745098  0.63137255]
 [ 0.67858136  0.63816685  0.69517821  0.6407592 ]
 [ 0.56691858  0.57284359  0.62174947  0.46852629]]
  • Create a figure showing the 3 histograms (original & 2 sets of noise-corrupted data – use enough bins!). In an additional figure, show the three corresponding empirical distribution functions in one plot.

In [3]:
# histograms
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
for i, ax in enumerate(axes.flatten()):
    plt.sca(ax)
    plt.hist(img[i], 100, normed=1, alpha=0.75)
    plt.xlim(-0.1, 1.1)
    plt.ylim(0, 18)
    plt.xlabel("value")
    plt.ylabel("probability")
    plt.title('img[{}]'.format(i))



In [4]:
# divide the probability space into 100 bins
nbins = 100
bins = np.linspace(0, 1, nbins+1)

# holds data equivalent to the histograms shown above (but restricted to the range [0, 1])
elementsPerBin = np.zeros((3,nbins))
for i in range(3):
    ind = np.digitize(img[i], bins)
    elementsPerBin[i] = [len(img[i][ind == j]) for j in range(nbins)]
    
# counts number of elements from bin '0' to bin 'j'
sumUptoBinJ = np.zeros((3, nbins), dtype=int)
for i in range(3):
    for j in range(nbins):
        sumUptoBinJ[i][j] = np.sum(elementsPerBin[i][0:j+1])
    
# plot
plt.figure(figsize=(15, 5))
for i in range(3):
    plt.plot(sumUptoBinJ[i], '.-')
plt.legend(['img[0]', 'img[1]', 'img[2]'])
plt.xlabel('bin')
plt.ylabel('empirical distribution functions');


  • Take a subset of P = 100 observations and estimate the probability density p̂ of intensities with a rectangular kernel (“gliding window”) parametrized by window width h.
  • Plot the estimates p̂ resulting from (e.g.) 10 different samples of size P.

In [5]:
def H(vec, h):
    """
    (rectangular) histogram kernel function
    (h is unused here; it is accepted only so that all kernels share the same call signature)
    """
    vec = np.asarray(vec)
    return np.asarray([1 if abs(x)<.5 else 0 for x in vec])

$\hat{p}(\underline{x}) = \frac{1}{h^n} \, \frac{1}{p} \sum_{\alpha=1}^{p} H\!\left(\frac{\underline{x} - \underline{x}^{(\alpha)}}{h}\right)$


In [6]:
def P_est(x, h, data, kernel = H):
    """
    kernel density estimate p_hat(x) at point x, computed from data with window width h
    """
    n = 1 #= data.shape[1] #number of dimensions (for multidimensional data)
    p = len(data)
    return 1/(h**n)/p*np.sum(kernel((data - x)/h, h))
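
As a cross-check (our own addition, not part of the exercise), the same estimate for 1-dimensional data and the rectangular kernel can be written in vectorized form; P_est_vec is a hypothetical helper and should agree with P_est(x, h, data):

def P_est_vec(x, h, data):
    """Vectorized rectangular-kernel density estimate at x (1-dimensional data)."""
    data = np.asarray(data)
    inside = np.abs((data - x) / h) < 0.5  # points falling inside the window of width h centred at x
    return inside.sum() / (h * len(data))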

In [7]:
# take 10 data sets with 100 observations each (indices 100000 to 101000)
# nomenclature: data_3 has shape (3, 10, 100), i.e. one (10, 100) block per image version
P = 100
offset = int(100000)
data_3 = np.zeros((3, 10,P))
for j in range(3):
    for i in range(10):
        data_3[j][i] = img[j][offset+i*P:offset+(i+1)*P]
print(data_3.shape)


(3, 10, 100)

In [8]:
# calculate probability estimation for (center +- h/2) on the 10 data sets
h = .15
nCenters = 101
Centers = np.linspace(0,1,nCenters)

fig, ax = plt.subplots(2,5,figsize=(15,6))
ax = ax.ravel()
for i in range(10):
    ax[i].plot([P_est(center,h,data_3[0][i]) for center in Centers])


  • Calculate the negative log-likelihood per datapoint of your estimator using 5000 samples from the data not used for the density estimation (i.e. the “test-set”). Get the average of the negative log-likelihood over the 10 samples.

$\mathrm{negLL}(\{\underline{x}^{(\alpha)}\};\underline{w}) = - \sum_{\alpha=1}^{p} \ln \hat{p}(\underline{x}^{(\alpha)};\underline{w})$


In [9]:
testdata = img[0][50000:55000]

# calculate the average negative log-likelihood over the 10 samples
def avg_NegLL(data, h, kernel=H):
    sys.stdout.write(".")
    average = 0
    for i in range(10):
        L_prob = [np.log(P_est(x,h,data[i],kernel)) for x in testdata]
        negLL = -1*np.sum(L_prob)
        average += negLL
    average /= 10
    return average

2) Repeat this procedure (without plotting) for a sequence of kernel widths h to get the mean log-likelihood (averaged over the different samples) resulting for each value of h.

(a) Apply this procedure to all 3 datasets (original and the two noise-corrupted ones) to make a plot showing the obtained likelihoods (y-axis) vs. kernel width h (x-axis) as one line for each dataset.


In [10]:
hs = np.linspace(0.001, 0.999, 20)

def plot_negLL(data_3=data_3, kernel=H):
    fig = plt.figure(figsize=(12,8))
    for j in range(3):
        print("calc data[{}]".format(j))
        LLs = [avg_NegLL(data_3[j],h,kernel=kernel) for h in hs]
        plt.plot(hs,LLs)
        print()
    plt.legend(['img[0]', 'img[1]', 'img[2]'])
    plt.show()

In [18]:
plot_negLL()


calc data[0]
.
/home/georg/anaconda3/lib/python3.6/site-packages/ipykernel/__main__.py:8: RuntimeWarning: divide by zero encountered in log
...................
calc data[1]
....................
calc data[2]
....................

Points that are not plotted have the value $\infty$, because

$\mathrm{negLL} = - \ln\left( \prod_{\alpha} \hat{p}(\underline{x}^{(\alpha)};\underline{w}) \right) = - \sum_{\alpha} \ln \hat{p}(\underline{x}^{(\alpha)};\underline{w})$,

so a single $\hat{p}(\underline{x}^{(\alpha)};\underline{w}) = 0$ (the test set contains 5000 elements) already makes the result $-\ln(0) = \infty$.

This only occurs with the rectangular (histogram) kernel, whose estimate is exactly zero wherever no sample falls into the window.
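
A simple workaround (our own addition, not required by the exercise) is to floor each estimate at a small value eps before taking the log, so that empty windows contribute a large but finite penalty instead of infinity; a minimal sketch based on avg_NegLL:

def avg_NegLL_safe(data, h, kernel=H, eps=1e-12):
    """Like avg_NegLL, but floors each density estimate at eps to avoid -log(0) = inf."""
    average = 0
    for i in range(10):
        L_prob = [np.log(max(P_est(x, h, data[i], kernel), eps)) for x in testdata]
        average += -np.sum(L_prob)
    return average / 10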

(b) Repeat the previous step (LL & plot) for samples of size P = 500.


In [17]:
P = 500
data_3b = np.zeros((3, 10,P))
for j in range(3):
    for i in range(10):
        data_3b[j][i] = img[j][offset+i*P:offset+(i+1)*P]
        
plot_negLL(data_3=data_3b)


calc data[0]
.
/home/georg/anaconda3/lib/python3.6/site-packages/ipykernel/__main__.py:8: RuntimeWarning: divide by zero encountered in log
...................
calc data[1]
....................
calc data[2]
....................

(c) Repeat the previous steps (a & b) for the Gaussian kernel with $\sigma^2 = h$.


In [12]:
def Gaussian(x,h):
    """
    Gaussian kernel function with variance h (i.e. sigma^2 = h)
    """

    return np.exp(-x**2/h/2)/np.sqrt(2*np.pi*h)
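
As a quick sanity check (our own addition), the kernel should integrate to 1 for any window width h, since it is a normal density with variance h:

xs = np.linspace(-10, 10, 10001)
for h_test in [0.05, 0.15, 0.4]:
    area = np.trapz(Gaussian(xs, h_test), xs)  # numerical integral, should be ~1
    print("h = {:.2f}: integral = {:.4f}".format(h_test, area))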

In [13]:
fig, ax = plt.subplots(2,5,figsize=(15,6))
h = .15

ax = ax.ravel()
for i in range(10):
    ax[i].plot([P_est(center,h,data_3[0][i],kernel=Gaussian) for center in Centers])



In [13]:
hs = np.linspace(0.001, 0.4, 20)
plot_negLL(kernel=Gaussian)


calc data[0]
.
/home/georg/anaconda3/lib/python3.6/site-packages/ipykernel/__main__.py:8: RuntimeWarning: divide by zero encountered in log
...................
calc data[1]
....................
calc data[2]
....................

In [22]:
plot_negLL(data_3=data_3b, kernel=Gaussian)


calc data[0]
.
/home/georg/anaconda3/lib/python3.6/site-packages/ipykernel/__main__.py:8: RuntimeWarning: divide by zero encountered in log
...................
calc data[1]
....................
calc data[2]
....................

Exercise 2

1.1 Create dataset


In [17]:
M = 2
w1, w2 = [2,2], [1,1]  # means of the two Gaussians
sigma2 = 0.2  # standard deviation of the second Gaussian
N = 100
P1, P2 = 2/3, 1/3

def create_data(sigma1=0.7):
    X = np.zeros((N, 2))
    which_gaussian = np.zeros(N)

    for n in range(N):
        if np.random.rand() < P1:  # sample from first Gaussian
            X[n] = np.random.multivariate_normal(w1, np.eye(len(w1)) * sigma1**2)
            which_gaussian[n] = 0
        else:  # sample from second Gaussian
            X[n] = np.random.multivariate_normal(w2, np.eye(len(w2)) * sigma2**2)
            which_gaussian[n] = 1
            
    return X, which_gaussian

sigma1 = 0.7
X, which_gaussian = create_data(sigma1)

In [18]:
def plot_data(X, which_gaussian, centers, stds):
    plt.scatter(*X[which_gaussian == 0].T, c='r', label='Cluster 1')
    plt.scatter(*X[which_gaussian == 1].T, c='b', label='Cluster 2')
    plt.plot(centers[0][0], centers[0][1], 'k+', markersize=15, label='Centers')
    plt.plot(centers[1][0], centers[1][1], 'k+', markersize=15)
    plt.gca().add_artist(plt.Circle(centers[0], stds[0], ec='k', fc='none'))
    plt.gca().add_artist(plt.Circle(centers[1], stds[1], ec='k', fc='none'))
    plt.xlabel('x1')
    plt.ylabel('x2')
    plt.legend()
    
plot_data(X, which_gaussian, [w1, w2], [sigma1, sigma2])
plt.title('Ground truth')



1.2 Run Expectation-Maximization algorithm

See slide 18 of the lecture for an outline of the algorithm.
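
In short, writing $P(q \mid \underline{x}^{(\alpha)})$ for the assignment probabilities (posteriors) of data point $\alpha$ to component $q$, the update rules implemented below are

E-step: $P(q \mid \underline{x}^{(\alpha)}) = \frac{P(\underline{x}^{(\alpha)} \mid q)\, P_q}{\sum_{r=1}^{M} P(\underline{x}^{(\alpha)} \mid r)\, P_r}$

M-step: $\underline{w}_q = \frac{\sum_{\alpha} P(q \mid \underline{x}^{(\alpha)})\, \underline{x}^{(\alpha)}}{\sum_{\alpha} P(q \mid \underline{x}^{(\alpha)})}, \quad \sigma_q^2 = \frac{1}{d}\, \frac{\sum_{\alpha} P(q \mid \underline{x}^{(\alpha)})\, \|\underline{x}^{(\alpha)} - \underline{w}_q^{\mathrm{old}}\|^2}{\sum_{\alpha} P(q \mid \underline{x}^{(\alpha)})}, \quad P_q = \frac{1}{N} \sum_{\alpha} P(q \mid \underline{x}^{(\alpha)})$

with $d = 2$ the data dimension (in the code this factor appears as 1/M, which is numerically the same here since M = 2), and the variance update using the centers from the previous iteration. Both steps are repeated until the largest parameter change falls below $\theta = 0.001$.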


In [19]:
from scipy.stats import multivariate_normal

def variance(X):
    """Calculate a single variance value for the vectors in X."""
    mu = X.mean(axis=0)
    return np.mean([np.linalg.norm(x - mu)**2 for x in X])

def run_expectation_maximization(X, w=None, sigma_squared=None, verbose=False):
    # Initialization.
    P_prior = np.ones(2) * 1 / M
    P_likelihood = np.zeros((N, M))
    P_posterior = np.zeros((M, N))
    
    mu = X.mean(axis=0)  # mean of the original data
    var = variance(X)  # variance of the original data

    if w is None:
        w = np.array([mu + np.random.rand(M) - 0.5, mu + np.random.rand(M) - 0.5])

    if sigma_squared is None:
        sigma_squared = np.array([var + np.random.rand() - 0.5,var + np.random.rand() - 0.5])
        #sigma_squared = np.array([var, var])

    if verbose:
        print('Initial centers:', w)
        print('Initial variances:', sigma_squared)
        print()
        print()


    theta = 0.001
    distance = np.inf
    step = 0


    # Optimization loop.
    while distance > theta:
    #for i in range(1):
        step += 1
        if verbose:
            print('Step', step)
            print('-'*50)

        # Store old parameter values to calculate distance later on.
        w_old = w.copy()
        sigma_squared_old = sigma_squared.copy()
        P_prior_old = P_prior.copy()

        if verbose:
            print('Distances of X[0] to proposed centers:', np.linalg.norm(X[0] - w[0]), np.linalg.norm(X[0] - w[1]))

        # E-Step: Calculate likelihood for each data point.
        for (alpha, q), _ in np.ndenumerate(P_likelihood):
            P_likelihood[alpha, q] = multivariate_normal.pdf(X[alpha], w[q], sigma_squared[q])

        if verbose:
            print('Likelihoods of X[0]:', P_likelihood[0])

        # E-Step: Calculate assignment probabilities (posterior) for each data point.
        for (q, alpha), _ in np.ndenumerate(P_posterior):
            P_posterior[q, alpha] = (P_likelihood[alpha, q] * P_prior[q]) / np.sum([P_likelihood[alpha, r] * P_prior[r] for r in range(M)])

        if verbose:
            print('Assignment probabilities of X[0]:', P_posterior[:, 0])
            print()

        distance = 0
        # M-Step: Calculate new parameter values.
        for q in range(M):
            w[q] = np.sum([P_posterior[q, alpha] * X[alpha] for alpha in range(N)], axis=0) / np.sum(P_posterior[q])
            #print(np.sum([P_posterior[q, alpha] * X[alpha] for alpha in range(N)], axis=0))
            #print(np.sum(P_posterior[q]))
            w_distance = np.linalg.norm(w[q] - w_old[q])
            if verbose:
                print('Distance of centers:', w_distance)
            distance = max(distance, w_distance)

            sigma_squared[q] = 1 / M * np.sum([np.linalg.norm(X[alpha] - w_old[q])**2 * P_posterior[q, alpha] for alpha in range(N)]) / np.sum(P_posterior[q])
            sigma_squared_distance = np.abs(sigma_squared[q] - sigma_squared_old[q])
            if verbose:
                print('Distance of variances:', sigma_squared_distance)
            distance = max(distance, sigma_squared_distance)

            P_prior[q] = np.mean(P_posterior[q])
            P_prior_distance = np.abs(P_prior[q] - P_prior_old[q])
            if verbose:
                print('Distance of priors:', P_prior_distance)
            distance = max(distance, P_prior_distance)


        if verbose:
            print('Maximum distance:', distance)

            print()
            print('New centers:', w)
            print('New variances:', sigma_squared)
            print('New priors:', P_prior)

            print('='*50)
            print()

    which_gaussian_EM = P_posterior.argmax(axis=0)
    
    return which_gaussian_EM, w, np.sqrt(sigma_squared), step

which_gaussian_em, cluster_centers_em, cluster_stds_em, num_steps_em = run_expectation_maximization(X, verbose=True)


Initial centers: [[ 1.53518244  1.92693386]
 [ 1.49432597  1.95412241]]
Initial variances: [ 1.20752752  1.40122925]


Step 1
--------------------------------------------------
Distances of X[0] to proposed centers: 1.07545424393 1.11929886145
Likelihoods of X[0]: [ 0.08164578  0.07263752]
Assignment probabilities of X[0]: [ 0.52919388  0.47080612]

Distance of centers: 0.280723619496
Distance of variances: 0.65552846138
Distance of priors: 0.0234334427061
Distance of centers: 0.321569568555
Distance of variances: 0.805770461625
Distance of priors: 0.0234334427061
Maximum distance: 0.805770461625

New centers: [[ 1.68565079  1.68994225]
 [ 1.69703919  1.70449401]]
New variances: [ 0.55199906  0.59545879]
New priors: [ 0.52343344  0.47656656]
==================================================

Step 2
--------------------------------------------------
Distances of X[0] to proposed centers: 0.925878542461 0.912824005713
Likelihoods of X[0]: [ 0.13263358  0.13277191]
Assignment probabilities of X[0]: [ 0.52317341  0.47682659]

Distance of centers: 0.0192030419474
Distance of variances: 0.0473781660305
Distance of priors: 0.00150350251231
Distance of centers: 0.0212764995336
Distance of variances: 0.0431780185607
Distance of priors: 0.00150350251231
Maximum distance: 0.0473781660305

New centers: [[ 1.67119465  1.67730196]
 [ 1.71304903  1.71850737]]
New variances: [ 0.50462089  0.55228077]
New priors: [ 0.52493695  0.47506305]
==================================================

Step 3
--------------------------------------------------
Distances of X[0] to proposed centers: 0.941885882031 0.895414167829
Likelihoods of X[0]: [ 0.13094807  0.13945031]
Assignment probabilities of X[0]: [ 0.50922972  0.49077028]

Distance of centers: 0.034645355738
Distance of variances: 0.0156526663467
Distance of priors: 8.38670160043e-05
Distance of centers: 0.0383057699216
Distance of variances: 0.0146373996722
Distance of priors: 8.38670160042e-05
Maximum distance: 0.0383057699216

New centers: [[ 1.644676    1.65500708]
 [ 1.74236895  1.74315842]]
New variances: [ 0.48896822  0.56691817]
New priors: [ 0.52502081  0.47497919]
==================================================

Step 4
--------------------------------------------------
Distances of X[0] to proposed centers: 0.971410487325 0.86405440215
Likelihoods of X[0]: [ 0.12401567  0.14532242]
Assignment probabilities of X[0]: [ 0.48540913  0.51459087]

Distance of centers: 0.0622563380927
Distance of variances: 0.0326395643517
Distance of priors: 0.00067057867809
Distance of centers: 0.0691865901543
Distance of variances: 0.0258015361643
Distance of priors: 0.000670578678089
Maximum distance: 0.0691865901543

New centers: [[ 1.5969027   1.61508762]
 [ 1.79545575  1.78752706]]
New variances: [ 0.45632866  0.59271971]
New priors: [ 0.52569139  0.47430861]
==================================================

Step 5
--------------------------------------------------
Distances of X[0] to proposed centers: 1.02536198356 0.808804573737
Likelihoods of X[0]: [ 0.11021528  0.15463673]
Assignment probabilities of X[0]: [ 0.44132485  0.55867515]

Distance of centers: 0.0993497657434
Distance of variances: 0.0535732256338
Distance of priors: 0.00372595132951
Distance of centers: 0.113852319876
Distance of variances: 0.0271191988279
Distance of priors: 0.00372595132951
Maximum distance: 0.113852319876

New centers: [[ 1.5200144   1.55217049]
 [ 1.88352912  1.85967574]]
New variances: [ 0.40275543  0.61983891]
New priors: [ 0.52941734  0.47058266]
==================================================

Step 6
--------------------------------------------------
Distances of X[0] to proposed centers: 1.11370823607 0.722350442971
Likelihoods of X[0]: [ 0.08473076  0.16855566]
Assignment probabilities of X[0]: [ 0.361241  0.638759]

Distance of centers: 0.12330273969
Distance of variances: 0.0647816201387
Distance of priors: 0.0105383254449
Distance of centers: 0.155622432838
Distance of variances: 0.00276876124102
Distance of priors: 0.0105383254449
Maximum distance: 0.155622432838

New centers: [[ 1.42335168  1.47562061]
 [ 2.0053096   1.9565667 ]]
New variances: [ 0.33797381  0.61707015]
New priors: [ 0.53995567  0.46004433]
==================================================

Step 7
--------------------------------------------------
Distances of X[0] to proposed centers: 1.22647342252 0.616925347701
Likelihoods of X[0]: [ 0.05087084  0.18947564]
Assignment probabilities of X[0]: [ 0.23961227  0.76038773]

Distance of centers: 0.117359171367
Distance of variances: 0.0622659419178
Distance of priors: 0.0129591484737
Distance of centers: 0.167013470522
Distance of variances: 0.0543293144634
Distance of priors: 0.0129591484737
Maximum distance: 0.167013470522

New centers: [[ 1.33058698  1.4037326 ]
 [ 2.1369012   2.05941196]]
New variances: [ 0.27570787  0.56274083]
New priors: [ 0.55291482  0.44708518]
==================================================

Step 8
--------------------------------------------------
Distances of X[0] to proposed centers: 1.33596557602 0.530603647746
Likelihoods of X[0]: [ 0.02268094  0.22022801]
Assignment probabilities of X[0]: [ 0.11297731  0.88702269]

Distance of centers: 0.0924744944473
Distance of variances: 0.0511594059267
Distance of priors: 0.00368934491626
Distance of centers: 0.124728014526
Distance of variances: 0.0811092626758
Distance of priors: 0.00368934491626
Maximum distance: 0.124728014526

New centers: [[ 1.25693266  1.34781683]
 [ 2.23607009  2.1350599 ]]
New variances: [ 0.22454847  0.48163157]
New priors: [ 0.55660416  0.44339584]
==================================================

Step 9
--------------------------------------------------
Distances of X[0] to proposed centers: 1.42341825659 0.492476368895
Likelihoods of X[0]: [ 0.00778345  0.25689601]
Assignment probabilities of X[0]: [ 0.03664024  0.96335976]

Distance of centers: 0.0664779937534
Distance of variances: 0.0368096312509
Distance of priors: 0.0102343512425
Distance of centers: 0.0517668751858
Distance of variances: 0.0570055158742
Distance of priors: 0.0102343512425
Maximum distance: 0.0664779937534

New centers: [[ 1.20332784  1.30849958]
 [ 2.2785435   2.16465413]]
New variances: [ 0.18773883  0.42462605]
New priors: [ 0.54636981  0.45363019]
==================================================

Step 10
--------------------------------------------------
Distances of X[0] to proposed centers: 1.48698428474 0.483045152717
Likelihoods of X[0]: [ 0.00234845  0.28476773]
Assignment probabilities of X[0]: [ 0.0098352  0.9901648]

Distance of centers: 0.0500792209054
Distance of variances: 0.026396668762
Distance of priors: 0.0203731889162
Distance of centers: 0.00483099785673
Distance of variances: 0.0203735371012
Distance of priors: 0.0203731889162
Maximum distance: 0.0500792209054

New centers: [[ 1.16240658  1.27963128]
 [ 2.27773944  2.15989052]]
New variances: [ 0.16134217  0.40425252]
New priors: [ 0.52599662  0.47400338]
==================================================

Step 11
--------------------------------------------------
Distances of X[0] to proposed centers: 1.53528122455 0.480081979528
Likelihoods of X[0]: [ 0.00066331  0.29604931]
Assignment probabilities of X[0]: [ 0.00248012  0.99751988]

Distance of centers: 0.0438107206922
Distance of variances: 0.0221599650755
Distance of priors: 0.0255540486246
Distance of centers: 0.0289013131314
Distance of variances: 0.00222089911955
Distance of priors: 0.0255540486246
Maximum distance: 0.0438107206922

New centers: [[ 1.12684941  1.254037  ]
 [ 2.25630657  2.14050191]]
New variances: [ 0.1391822   0.40203162]
New priors: [ 0.50044257  0.49955743]
==================================================

Step 12
--------------------------------------------------
Distances of X[0] to proposed centers: 1.57751647562 0.481293448292
Likelihoods of X[0]: [  1.49856023e-04   2.96786123e-01]
Assignment probabilities of X[0]: [  5.05568280e-04   9.99494432e-01]

Distance of centers: 0.0412635766071
Distance of variances: 0.0205624170031
Distance of priors: 0.0269013447725
Distance of centers: 0.0362585917611
Distance of variances: 0.00314020266009
Distance of priors: 0.0269013447725
Maximum distance: 0.0412635766071

New centers: [[ 1.09403184  1.22902321]
 [ 2.22811169  2.11770424]]
New variances: [ 0.11861978  0.40517182]
New priors: [ 0.47354123  0.52645877]
==================================================

Step 13
--------------------------------------------------
Distances of X[0] to proposed centers: 1.61706946095 0.487228253126
Likelihoods of X[0]: [  2.19159834e-05   2.93058072e-01]
Assignment probabilities of X[0]: [  6.72622654e-05   9.99932738e-01]

Distance of centers: 0.0368826852498
Distance of variances: 0.0184880950663
Distance of priors: 0.0253403229755
Distance of centers: 0.0362227430727
Distance of variances: 0.00501397246481
Distance of priors: 0.0253403229755
Maximum distance: 0.0368826852498

New centers: [[ 1.06554237  1.20559922]
 [ 2.1991719   2.09591947]]
New variances: [ 0.10013169  0.41018579]
New priors: [ 0.4482009  0.5517991]
==================================================

Step 14
--------------------------------------------------
Distances of X[0] to proposed centers: 1.65209605115 0.496944791077
Likelihoods of X[0]: [  1.91505002e-06   2.87147530e-01]
Assignment probabilities of X[0]: [  5.41706976e-06   9.99994583e-01]

Distance of centers: 0.0289642968125
Distance of variances: 0.0151109615223
Distance of priors: 0.0219533207456
Distance of centers: 0.0339340503721
Distance of variances: 0.00719732945309
Distance of priors: 0.0219533207456
Maximum distance: 0.0339340503721

New centers: [[ 1.04528239  1.18489986]
 [ 2.17084755  2.07723122]]
New variances: [ 0.08502073  0.41738312]
New priors: [ 0.42624758  0.57375242]
==================================================

Step 15
--------------------------------------------------
Distances of X[0] to proposed centers: 1.67854869354 0.510163738383
Likelihoods of X[0]: [  1.19176868e-07   2.79176612e-01]
Assignment probabilities of X[0]: [  3.17139400e-07   9.99999683e-01]

Distance of centers: 0.0209165415285
Distance of variances: 0.0114332089595
Distance of priors: 0.0183674286834
Distance of centers: 0.031433896499
Distance of variances: 0.00928659756389
Distance of priors: 0.0183674286834
Maximum distance: 0.031433896499

New centers: [[ 1.0343967   1.16703919]
 [ 2.14343134  2.06185456]]
New variances: [ 0.07358752  0.42666972]
New priors: [ 0.40788016  0.59211984]
==================================================

Step 16
--------------------------------------------------
Distances of X[0] to proposed centers: 1.69533083432 0.526039807449
Likelihoods of X[0]: [  7.14136481e-09   2.69709870e-01]
Assignment probabilities of X[0]: [  1.82392649e-08   9.99999982e-01]

Distance of centers: 0.0155300585895
Distance of variances: 0.00806068710331
Distance of priors: 0.0150271189902
Distance of centers: 0.0268461582306
Distance of variances: 0.00836776468178
Distance of priors: 0.0150271189902
Maximum distance: 0.0268461582306

New centers: [[ 1.02837401  1.15272452]
 [ 2.11987928  2.04896982]]
New variances: [ 0.06552683  0.43503748]
New priors: [ 0.39285304  0.60714696]
==================================================

Step 17
--------------------------------------------------
Distances of X[0] to proposed centers: 1.70635182258 0.540910927797
Likelihoods of X[0]: [  5.45284252e-10   2.61367143e-01]
Assignment probabilities of X[0]: [  1.34992068e-09   9.99999999e-01]

Distance of centers: 0.0113521026495
Distance of variances: 0.00509576594463
Distance of priors: 0.0117048115268
Distance of centers: 0.0205084097319
Distance of variances: 0.00554599932695
Distance of priors: 0.0117048115268
Maximum distance: 0.0205084097319

New centers: [[ 1.02324918  1.14259504]
 [ 2.10239118  2.03825716]]
New variances: [ 0.06043106  0.44058348]
New priors: [ 0.38114822  0.61885178]
==================================================

Step 18
--------------------------------------------------
Distances of X[0] to proposed centers: 1.71498820963 0.552059119007
Likelihoods of X[0]: [  7.11192187e-11   2.55612720e-01]
Assignment probabilities of X[0]: [  1.71360916e-10   1.00000000e+00]

Distance of centers: 0.00804376433808
Distance of variances: 0.00315826766644
Distance of priors: 0.00871131810292
Distance of centers: 0.0149355037642
Distance of variances: 0.00356496681351
Distance of priors: 0.00871131810292
Maximum distance: 0.0149355037642

New centers: [[ 1.01871177  1.1359532 ]
 [ 2.09010421  2.02976601]]
New variances: [ 0.0572728   0.44414845]
New priors: [ 0.37243691  0.62756309]
==================================================

Step 19
--------------------------------------------------
Distances of X[0] to proposed centers: 1.72175190244 0.559824904487
Likelihoods of X[0]: [  1.60092066e-11   2.51806329e-01]
Assignment probabilities of X[0]: [  3.77310146e-11   1.00000000e+00]

Distance of centers: 0.00565022741598
Distance of variances: 0.0020573452826
Distance of priors: 0.00632100138488
Distance of centers: 0.0107116204567
Distance of variances: 0.0024201026715
Distance of priors: 0.00632100138488
Maximum distance: 0.0107116204567

New centers: [[ 1.01504994  1.13165016]
 [ 2.08153542  2.02333836]]
New variances: [ 0.05521545  0.44656855]
New priors: [ 0.36611591  0.63388409]
==================================================

Step 20
--------------------------------------------------
Distances of X[0] to proposed centers: 1.72680425454 0.565217842259
Likelihoods of X[0]: [  5.40690844e-12   2.49222226e-01]
Assignment probabilities of X[0]: [  1.25305746e-11   1.00000000e+00]

Distance of centers: 0.00401381314907
Distance of variances: 0.00140105202929
Distance of priors: 0.00455947711593
Distance of centers: 0.00766775605956
Distance of variances: 0.0016870718744
Distance of priors: 0.00455947711593
Maximum distance: 0.00766775605956

New centers: [[ 1.01222621  1.12879758]
 [ 2.07551816  2.01858577]]
New variances: [ 0.0538144   0.44825562]
New priors: [ 0.36155643  0.63844357]
==================================================

Step 21
--------------------------------------------------
Distances of X[0] to proposed centers: 1.73052006682 0.569006300177
Likelihoods of X[0]: [  2.43762411e-12   2.47429945e-01]
Assignment probabilities of X[0]: [  5.57915010e-12   1.00000000e+00]

Distance of centers: 0.00289135878611
Distance of variances: 0.000981814700836
Distance of priors: 0.00329856730383
Distance of centers: 0.00551421092664
Distance of variances: 0.00119126491967
Distance of priors: 0.00329856730383
Maximum distance: 0.00551421092664

New centers: [[ 1.0100884   1.12685086]
 [ 2.07124627  2.01509901]]
New variances: [ 0.05283258  0.44944689]
New priors: [ 0.35825786  0.64174214]
==================================================

Step 22
--------------------------------------------------
Distances of X[0] to proposed centers: 1.73325048532 0.571702172882
Likelihoods of X[0]: [  1.35372255e-12   2.46166854e-01]
Assignment probabilities of X[0]: [  3.06997797e-12   1.00000000e+00]

Distance of centers: 0.00210468797278
Distance of variances: 0.00070072516141
Distance of priors: 0.00239917190976
Distance of centers: 0.00399157975505
Distance of variances: 0.000849768881194
Distance of priors: 0.00239917190976
Maximum distance: 0.00399157975505

New centers: [[ 1.00848296  1.12548987]
 [ 2.06818081  2.01254252]]
New variances: [ 0.05213186  0.45029666]
New priors: [ 0.35585869  0.64414131]
==================================================

Step 23
--------------------------------------------------
Distances of X[0] to proposed centers: 1.73526160616 0.573642123014
Likelihoods of X[0]: [  8.75601193e-13   2.45265101e-01]
Assignment probabilities of X[0]: [  1.97227291e-12   1.00000000e+00]

Distance of centers: 0.00154282920953
Distance of variances: 0.000506222805001
Distance of priors: 0.00175417558998
Distance of centers: 0.00290759055391
Distance of variances: 0.000611698815093
Distance of priors: 0.00175417558998
Maximum distance: 0.00290759055391

New centers: [[ 1.00728211  1.12452123]
 [ 2.06596115  2.01066444]]
New variances: [ 0.05162564  0.45090836]
New priors: [ 0.35410451  0.64589549]
==================================================

Step 24
--------------------------------------------------
Distances of X[0] to proposed centers: 1.73674663655 0.575050504258
Likelihoods of X[0]: [  6.33680593e-13   2.44614599e-01]
Assignment probabilities of X[0]: [  1.42022530e-12   1.00000000e+00]

Distance of centers: 0.00113622670268
Distance of variances: 0.000368777745241
Distance of priors: 0.00128820504523
Distance of centers: 0.00212912248047
Distance of variances: 0.000443740556361
Distance of priors: 0.00128820504523
Maximum distance: 0.00212912248047

New centers: [[ 1.00638574  1.12382299]
 [ 2.06434253  2.00928124]]
New variances: [ 0.05125686  0.4513521 ]
New priors: [ 0.35281631  0.64718369]
==================================================

Step 25
--------------------------------------------------
Distances of X[0] to proposed centers: 1.73784542942 0.57607982557
Likelihoods of X[0]: [  4.98344763e-13   2.44141554e-01]
Assignment probabilities of X[0]: [  1.11277992e-12   1.00000000e+00]

Distance of centers: 0.000839405863479
Distance of variances: 0.000270238215344
Distance of priors: 0.000949261512683
Distance of centers: 0.00156552434747
Distance of variances: 0.000323913147843
Distance of priors: 0.000949261512683
Maximum distance: 0.00156552434747

New centers: [[ 1.00571742  1.1233151 ]
 [ 2.06315586  2.00826012]]
New variances: [ 0.05098662  0.45167601]
New priors: [ 0.35186705  0.64813295]
==================================================

Step 26
--------------------------------------------------
Distances of X[0] to proposed centers: 1.7386597047 0.576835832263
Likelihoods of X[0]: [  4.16827216e-13   2.43795457e-01]
Assignment probabilities of X[0]: [  9.28207239e-13   1.00000000e+00]

Distance of centers: 0.000621465231868
Distance of variances: 0.00019886742465
Distance of priors: 0.000701326468433
Distance of centers: 0.00115474754148
Distance of variances: 0.000237596975625
Distance of priors: 0.000701326468433
Maximum distance: 0.00115474754148

New centers: [[ 1.00521944  1.1229433 ]
 [ 2.06228239  2.00750481]]
New variances: [ 0.05078775  0.45191361]
New priors: [ 0.35116572  0.64883428]
==================================================

Step 27
--------------------------------------------------
Distances of X[0] to proposed centers: 1.73926383997 0.577393107135
Likelihoods of X[0]: [  3.64970655e-13   2.43541091e-01]
Assignment probabilities of X[0]: [  8.11080627e-13   1.00000000e+00]

Distance of centers: 0.000460809572332
Distance of variances: 0.000146794826474
Distance of priors: 0.000519165540053
Distance of centers: 0.000853775923449
Distance of variances: 0.00017493192483
Distance of priors: 0.000519165540053
Maximum distance: 0.000853775923449

New centers: [[ 1.00484852  1.12266987]
 [ 2.06163756  2.00694524]]
New variances: [ 0.05064096  0.45208854]
New priors: [ 0.35064655  0.64935345]
==================================================


In [20]:
plot_data(X, which_gaussian_em, cluster_centers_em, cluster_stds_em)
plt.title('Predicted by Expectation-Maximization')



1.3 Run K-means algorithm

For simplicity, we use the sklearn version of K-means here. The detailed algorithm was already implemented in a previous exercise.
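
For reference, a minimal NumPy sketch of the underlying Lloyd iteration (our own illustration, not the sklearn implementation, which adds smarter initialization and convergence handling):

def k_means_sketch(X, k=2, n_iter=100):
    """Plain Lloyd's algorithm: alternate nearest-center assignment and centroid update."""
    centers = X[np.random.choice(len(X), k, replace=False)]  # k random data points as initial centers
    for _ in range(n_iter):
        # assign every point to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each center to the mean of its assigned points (no handling of empty clusters)
        new_centers = np.array([X[labels == q].mean(axis=0) for q in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers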


In [21]:
from sklearn.cluster import KMeans

def run_k_means(X):
    km = KMeans(2)
    km.fit(X)
    which_gaussian_km = km.predict(X)
    cluster_stds = np.array([np.sqrt(variance(X[which_gaussian_km == 0])), np.sqrt(variance(X[which_gaussian_km == 1]))])
    return which_gaussian_km, km.cluster_centers_, cluster_stds

which_gaussian_km, cluster_centers_km, cluster_stds_km = run_k_means(X)

plot_data(X, which_gaussian_km, cluster_centers_km, cluster_stds_km)
plt.title('Predicted by K-Means')



K-means clusters the data points with a straight (linear) decision boundary between the two cluster centers. This cannot fully capture the structure of the data: for example, some points near the lower-left Gaussian actually belong to the broader upper-right Gaussian but are assigned to the wrong cluster.

1.4 Initialize EM algorithm with cluster parameters from K-Means


In [22]:
_, _, _, num_steps_em_km = run_expectation_maximization(X, cluster_centers_km, cluster_stds_km**2)

print('Took', num_steps_em, 'steps with random initialization')
print('Took', num_steps_em_km, 'steps with initialization from K-means')


Took 27 steps with random initialization
Took 18 steps with initialization from K-means

1.5 Repeat analysis for different $\sigma_1$ values


In [23]:
sigma1s = [0.1, 0.5, 1, 1.5]
fig, axes = plt.subplots(len(sigma1s), 3, figsize=(15, 15), sharex=True, sharey=True)

for i, (sigma1, horizontal_axes) in enumerate(zip(sigma1s, axes)):
    X, which_gaussian = create_data(sigma1)
    
    plt.sca(horizontal_axes[0])
    plot_data(X, which_gaussian, [w1, w2], [sigma1, sigma2])
    if i == 0:
        plt.title('Ground truth')
        
    which_gaussian_em, cluster_centers_em, cluster_stds_em, num_steps_em = run_expectation_maximization(X)
    plt.sca(horizontal_axes[1])
    plot_data(X, which_gaussian_em, cluster_centers_em, cluster_stds_em)
    if i == 0:
        plt.title('Predicted by Expectation-Maximization')
        
    which_gaussian_km, cluster_centers_km, cluster_stds_km = run_k_means(X)
    plt.sca(horizontal_axes[2])
    plot_data(X, which_gaussian_km, cluster_centers_km, cluster_stds_km)
    if i == 0:
        plt.title('Predicted by K-Means')


Each row corresponds to increasing $\sigma_1$ (the values are 0.1, 0.5, 1, 1.5).

K-means and Expectation-Maximization show similar results for small $\sigma_1$, i.e. if the clusters are clearly separated. With increasing $\sigma_1$, the Gaussians overlap more and more, and K-means fails to cluster them correctly.

