Pattern Recognition - ViBOT MsCV

Guillaume Lemaitre - Fabrice Meriaudeau - Johan Massich

Clustering


In [ ]:
%matplotlib inline
%pprint off

# Matplotlib library
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
from matplotlib.ticker import LinearLocator, FormatStrFormatter
import matplotlib.pyplot as plt

# MPLD3 extension
import mpld3

# Numpy library
import numpy as np

# Import the Scipy library for griddata
from scipy.interpolate import griddata

Import the libraries needed to perform the clustering with k-means and fuzzy c-means.


In [ ]:
# Import k-means clustering method from scikit-learn
from sklearn.cluster import KMeans
# Import fuzzy c-means from scikit-fuzzy
import skfuzzy as fuzz

Assume the following generated points:

  • Two classes with respective labels 0 and 1,
  • Class #1, with label 0, follows a multivariate normal distribution with:
$$\mu_1 = \left[ 1, 1 \right]$$$$\Sigma_1 = \left[ \begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix} \right]$$
  • Class #2, with label 1, follows a multivariate normal distribution with:
$$\mu_2 = \left[ -1, -1 \right]$$$$\Sigma_2 = \left[ \begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix} \right]$$

In [ ]:
# Number of points in the dataset
N = 1000

# Define the properties of the Gaussian distributions
mean1, mean2 = np.array([1., 1.]), np.array([-1., -1.])
cov1, cov2 = np.diagflat([1, 1]), np.diagflat([1, 1])

# Draw N // 2 samples per class (the sample size must be an integer)
class_1 = np.random.multivariate_normal(mean1, cov1, N // 2)
class_2 = np.random.multivariate_normal(mean2, cov2, N // 2)

data = np.concatenate((class_1, class_2), axis=0)
# Ground-truth labels: 0 for class #1 and 1 for class #2
gt = np.concatenate((np.zeros(N // 2, dtype=int), np.ones(N // 2, dtype=int)))

fig = plt.figure()
# Plot the samples of each class
plt.plot(class_1[:, 0], class_1[:, 1], 'xb', label='Cluster #1')
plt.plot(class_2[:, 0], class_2[:, 1], 'xr', label='Cluster #2')
plt.legend()
# Show the figure
mpld3.display(fig)

Clustering via k-means

(a) Use the k-means clustering method to find the cluster centers for $k=2$. To do so, you will:

  • Call the constructor KMeans(),
  • Use the method fit_predict() of the object built in order to fit the model and apply the clustering (predict() alone only works on a model which has already been fitted),
  • Get the centers of each cluster,
  • Display these centers.
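The steps listed above could look as follows on a small toy dataset. This is only a sketch under assumptions: `toy_data`, the position of the blobs and `random_state=0` are illustrative choices which are not part of the exercise, and `fit_predict()` is used because `predict()` requires a model which has already been fitted.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data (assumption): two well-separated 2D blobs of 50 points each
rng = np.random.RandomState(0)
toy_data = np.concatenate((rng.normal(5., 0.5, (50, 2)),
                           rng.normal(-5., 0.5, (50, 2))), axis=0)

# Build the estimator: random initialisation and a single run
toy_k_means = KMeans(n_clusters=2, init='random', n_init=1, random_state=0)
# Fit the model and get the cluster index of each sample
toy_labels = toy_k_means.fit_predict(toy_data)
# Centers found by k-means, one row per cluster
toy_centers = toy_k_means.cluster_centers_

print('The centers found by k-means are \n {}'.format(toy_centers))
```

The order of the rows in `cluster_centers_` is arbitrary, so the first row is not guaranteed to correspond to the first blob.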

In [ ]:
# Define the number of clusters k
k = 2

# Define the parameters of k-means
### use init 'random' and only one try
k_means_cluster = KMeans(...)
# Run k-means
### Use the function fit_predict() (predict() requires a model which is already fitted)
...
# Get the centers of k-means
centers_k_means = ...

print('The centers found by k-means are \n {}'.format(centers_k_means))

(b) Plot the cluster centers and the data labelled by the k-means fitting.


In [ ]:
...

(c) Complete the following function to compute the misclassification rate.


In [ ]:
# Compute the misclassification rate
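def compute_error_rate(k_means_labels, gt_labels):
    ### Use the function nonzero()
    # Indexes of the samples whose label differs from the ground truth
    idx_errors = np.nonzero(np.squeeze(k_means_labels != gt_labels))
    # Percentage of misclassified samples
    return float(np.size(idx_errors)) / float(np.size(gt_labels)) * 100.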

(d) What is the misclassification rate for the current fitting? Highlight in a plot the samples which have been misclassified.

Hint: think of swapping the labels if the error rate is really high. Since k-means is unsupervised, the labels are assigned to the clusters arbitrarily.
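The hint can be illustrated on a few hand-written labels. The sketch below assumes exactly two clusters, in which case swapping the labels 0 and 1 turns an error rate e into 100 - e; the function name `label_invariant_error` and the toy arrays are hypothetical.

```python
import numpy as np

def label_invariant_error(pred_labels, gt_labels):
    """Error rate in percent, whatever the naming of the two clusters."""
    pred_labels = np.asarray(pred_labels)
    gt_labels = np.asarray(gt_labels)
    # Percentage of samples whose label differs from the ground truth
    err = 100. * np.count_nonzero(pred_labels != gt_labels) / float(gt_labels.size)
    # With two clusters, swapping 0 and 1 gives an error of 100 - err
    return min(err, 100. - err)

# Toy labels (assumption): the clustering is almost perfect but the names are swapped
toy_gt = np.array([0, 0, 0, 1, 1, 1])
toy_pred = np.array([1, 1, 1, 0, 0, 1])
toy_err = label_invariant_error(toy_pred, toy_gt)
print('The error rate is {:.2f} %'.format(toy_err))
```

With more than two clusters this shortcut no longer holds and the cluster names would have to be matched to the classes explicitly.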


In [ ]:
# Show the misclassification rate
print('The error rate is {} %'.format(...))

# Plot the misclassified samples
# Find the samples
idx_wellclass = ...
idx_misclass = ...

# The cluster labels are arbitrary: swap them if most samples look misclassified
if np.size(idx_misclass) > np.size(idx_wellclass):
    idx_wellclass, idx_misclass = idx_misclass, idx_wellclass
    
# Get the data
data_wellclass = ...
data_misclass = ...

# Make the plot
fig = plt.figure()
# Plot the well-classified and the misclassified samples
legend_tptn = ...
legend_fpfn = ...
plt.legend([legend_tptn[0], legend_fpfn[0]], ["TP & TN", "FP & FN"])
# Show the figure
mpld3.display(fig)

(e) Repeat the k-means fitting 10 times and compute the mean error rate.


In [ ]:
# Define the number of repetitions
rep_t = 10

# Accumulate the error
acc_err = 0.
for rep in range(rep_t):
    # Run k-means fit_predict()
    ...
    # Check the error and accumulate
    acc_err += np.minimum(...)
    print('The error rate is {} %'.format(...))
    
# Average the error
acc_err ...

# Show the mean misclassification rate
print('The mean error rate is {} %'.format(acc_err))

Clustering via fuzzy c-means

(a) Use the fuzzy c-means clustering method to find the cluster centers for $c=2$. Check the following link for an example: https://github.com/scikit-fuzzy/scikit-fuzzy/blob/master/skfuzzy/cluster/tests/test_cmeans.py
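To make explicit what the algorithm computes, the two update rules of fuzzy c-means can be sketched in plain NumPy. This is not the scikit-fuzzy implementation: the function `fuzzy_c_means_sketch`, its parameters and the toy data are hypothetical, and the data are stored here with one row per sample, so the memberships have one row per sample and one column per cluster.

```python
import numpy as np

def fuzzy_c_means_sketch(points, c, m=2., max_iter=100, tol=1e-5, seed=0):
    """Illustrative fuzzy c-means; points has shape (n_samples, n_features)."""
    rng = np.random.RandomState(seed)
    # Random initial memberships; each row sums to 1
    u = rng.rand(points.shape[0], c)
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Centers: mean of the points weighted by the memberships to the power m
        um = u ** m
        centers = um.T.dot(points) / um.sum(axis=0)[:, np.newaxis]
        # Euclidean distance of every point to every center, shape (n_samples, c)
        diff = points[:, np.newaxis, :] - centers[np.newaxis, :, :]
        dist = np.fmax(np.linalg.norm(diff, axis=2), np.finfo(float).eps)
        # Memberships: proportional to dist^(-2 / (m - 1)), normalised per sample
        inv = dist ** (-2. / (m - 1.))
        u_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(u_new - u).max() < tol
        u = u_new
        if converged:
            break
    return centers, u

# Toy data (assumption): two well-separated 2D blobs of 50 points each
rng = np.random.RandomState(0)
toy_points = np.concatenate((rng.normal(3., 0.5, (50, 2)),
                             rng.normal(-3., 0.5, (50, 2))), axis=0)
toy_fcm_centers, toy_fcm_u = fuzzy_c_means_sketch(toy_points, c=2, m=2.)
print('The centers found by fuzzy c-means are \n {}'.format(toy_fcm_centers))
```

The exponent `m` controls how fuzzy the result is: values close to 1 give memberships close to 0 or 1, whereas larger values spread the memberships more evenly across the clusters.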


In [ ]:
# Define the number of clusters
c = 2
# Exponentiation parameter
m = 2.

# Run the fuzzy c-means - need to transpose the data
...

(b) Plot the cluster centers and the membership degree of the data to each one of the two clusters.


In [ ]:
# Plot a representation depending on the membership
### Create a mesh grid using np.mgrid
grid_x, grid_y = np.mgrid[-4.:5.:200j, -4.:5.:200j]
### Use the function griddata() in order to create the surface based on the membership degree
grid_z0 = griddata(...)
grid_z1 = griddata(...)
fig = plt.figure()
plt.imshow(grid_z0.T, extent=(-4,5,-4,5), origin='lower')
plt.title('Degree of membership to class #1')
plt.figure()
plt.imshow(grid_z1.T, extent=(-4,5,-4,5), origin='lower')
plt.title('Degree of membership to class #2')
plt.show()

(c) Plot each data point with the most probable cluster to which it belongs. Also plot the centroids.
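The most probable cluster of a sample is the one with the largest degree of membership. A minimal sketch, assuming that the memberships are stored with one row per cluster and one column per sample (`toy_u` is a hand-written, hypothetical matrix):

```python
import numpy as np

# Hypothetical membership matrix: one row per cluster, one column per sample
toy_u = np.array([[0.9, 0.7, 0.2, 0.4],
                  [0.1, 0.3, 0.8, 0.6]])

# Index of the cluster with the largest membership for each sample
toy_hard_labels = np.argmax(toy_u, axis=0)
print('The most probable clusters are {}'.format(toy_hard_labels))
```

If the memberships are stored with one row per sample instead, the maximum has to be taken along `axis=1`.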


In [ ]:
...

(d) Compute the misclassification error rate.


In [ ]:
...

Retina segmentation using k-means and fuzzy c-means


In [ ]:
# Import scikit-image for input-output manipulation
from skimage import io
from skimage import img_as_float

Assume that the image can be clustered into four classes:

  • One cluster with artefacts at the edges of the image
  • One cluster with the optic nerve and other artefacts
  • One cluster with noise across the image
  • One cluster with the vessels

In [ ]:
# Number of classes
nb_classes = 4

(a) From the data folder, load the retina image retina.jpg. Convert it into float type.


In [ ]:
# Load the image
# Use the function img_as_float()
# Use the function io.imread()
retina_im = ...

# Show the results
fig, ax = plt.subplots()
ax.imshow(retina_im)
ax.set_title('Original image')
ax.axis('off')

plt.show()

(b) Complete the following Python function.

  • Compute a background image by applying a median filter to each colour channel with a square kernel of size 30.
  • Subtract each background channel from the original channel.
  • Normalise each channel using min-max normalisation.
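As a reminder, min-max normalisation maps the smallest value of a channel to 0 and the largest to 1 through $(x - \min) / (\max - \min)$. The sketch below illustrates it on a tiny array; the name `min_max_sketch` and the handling of a constant image are assumptions made for the example.

```python
import numpy as np

def min_max_sketch(im_2d):
    """Rescale a 2D array linearly to the range [0, 1]."""
    im_2d = np.asarray(im_2d, dtype=float)
    im_min, im_max = im_2d.min(), im_2d.max()
    # Guard against a constant image (assumption: map it to zeros)
    if im_max == im_min:
        return np.zeros_like(im_2d)
    return (im_2d - im_min) / (im_max - im_min)

# Toy channel (assumption): values between 2 and 10
toy_channel = np.array([[2., 4.], [6., 10.]])
toy_normalised = min_max_sketch(toy_channel)
print(toy_normalised)
```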

In [ ]:
# Import morpho element
from skimage.morphology import square
# Import the median filtering
from skimage.filters.rank import median

# Function to pre process the images
def PreProcessing(rgb_image):
    output = np.zeros(np.shape(rgb_image))
    
    # Obtain the background image for each channel through median filtering
    background_im_r = ...
    background_im_g = ...
    background_im_b = ...
    
    # Subtract the background from the original channels
    output[:, :, 0] = ...
    output[:, :, 1] = ...
    output[:, :, 2] = ...
    
    # Normalise the image
    output[:, :, 0] = normalise_im(...)
    output[:, :, 1] = normalise_im(...)
    output[:, :, 2] = normalise_im(...)
    
    return output

# Function to apply min-max normalisation
def normalise_im(im_2d):
    return ...

(c) Apply the pre-processing to the retina image and plot the resulting image.


In [ ]:
...

(d) Extract the characteristic features from the pre-processed image.
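One possible reading, assumed here, is that each pixel is a sample described by its three colour values. The sketch below shows the reshaping on a tiny hypothetical image, and how per-pixel labels can be brought back to the image shape for display; `toy_im` and `toy_pixel_labels` are made up for the example.

```python
import numpy as np

# Toy image (assumption): 2 rows, 3 columns and 3 colour channels
toy_im = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)

# One row per pixel, one column per colour channel
toy_features = np.reshape(toy_im, (-1, 3))

# Hypothetical cluster index of each pixel, as a clustering method would return
toy_pixel_labels = np.array([0, 1, 1, 0, 2, 3])
# Back to the image layout in order to display the segmentation
toy_label_im = np.reshape(toy_pixel_labels, toy_im.shape[:2])
print(toy_features.shape, toy_label_im.shape)
```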


In [ ]:
# Extraction of the data
### You can use np.reshape()
data = ...

(e) Run k-means with 10 iterations, using k-means++ to initialise the cluster centers.


In [ ]:
...

(f) Plot each cluster to observe the segmentation.


In [ ]:
...

(g) Run fuzzy c-means.


In [ ]:
...

(h) Plot the degree of membership for each cluster to depict the segmentation.


In [ ]:
...