BONUS

In this question, we'll look at clustering, an unsupervised machine learning technique for unlabeled data.

A

In this part, you'll compare two different clustering algorithms on two different datasets.

First, you'll use K-Means to cluster the data. See the scikit-learn documentation for sklearn.cluster.KMeans. Some relevant points include:

  • Set the number of clusters with the n_clusters argument.
  • Fit the model to the data with the fit() method.
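As a rough illustration of the two points above, here is a minimal sketch of the K-Means workflow. The array X_toy is a made-up example (it is not X1 or X2), and the n_init and random_state values are assumptions chosen only to make the run repeatable.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data: two tight groups of 2-D points (not X1 or X2).
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])

# n_clusters sets how many clusters to look for.
# n_init and random_state are assumptions made here for reproducibility.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)

# fit() learns the cluster centers from the data.
kmeans.fit(X_toy)

# predict() assigns each point to the nearest learned center.
labels = kmeans.predict(X_toy)
centers = kmeans.cluster_centers_
```

The integer values in labels are arbitrary cluster ids, so only the grouping matters, not which group is called 0 or 1.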

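Second, you'll use Spectral Clustering to cluster the data. See the scikit-learn documentation for sklearn.cluster.SpectralClustering. It takes n_clusters and has a fit() method just like K-Means, but it does not implement predict(); get the cluster assignments from fit_predict(), or from the labels_ attribute after calling fit().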
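One difference worth knowing before you start: scikit-learn's SpectralClustering accepts n_clusters and has fit(), but it has no predict() method. The sketch below shows one way to get labels from it. The make_moons data and the affinity, n_neighbors, and random_state settings are assumptions for illustration, not settings required by the assignment.

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons

# Hypothetical demo data: two interleaved half-moons (not X1 or X2).
X_moons, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

# affinity, n_neighbors and random_state are illustrative assumptions.
spectral = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                              n_neighbors=10, random_state=0)

# fit_predict() fits the model and returns one label per point in one call.
labels = spectral.fit_predict(X_moons)

# The same labels are stored on the fitted estimator.
same_labels = spectral.labels_
```

Because the labels come from the fit itself, there is no separate step for assigning clusters to points the model has not seen.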

Third, you'll test both methods on two datasets: X1 and X2.

  • Both datasets have 100 data points and are 2-dimensional.
  • X1 has 3 clusters. X2 has only 2.
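These sizes come from the default arguments of the two generator functions. If you want to confirm them, a quick check along the following lines should work. The names X1_check and X2_check and the random_state value are assumptions used here so that the real X1 and X2 are left untouched; the point values will differ from the assignment's data, but the shapes and cluster counts will not.

```python
import numpy as np
import sklearn.datasets as datasets

# Hypothetical check arrays, generated with a fixed random_state
# so the global NumPy seed used by the assignment is not disturbed.
X1_check, y1_check = datasets.make_blobs(shuffle=False, random_state=0)
X2_check, y2_check = datasets.make_circles(shuffle=False, random_state=0)

# Shape is (number of points, number of dimensions).
print(X1_check.shape, X2_check.shape)

# The second return value holds the true group of each point;
# counting its distinct values gives the number of clusters.
n_groups_1 = len(np.unique(y1_check))
n_groups_2 = len(np.unique(y2_check))
print(n_groups_1, n_groups_2)
```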

Fourth, you'll predict which cluster each point belongs to. For K-Means, call predict() on the same dataset you passed to fit(). Spectral Clustering has no predict() method, so call fit_predict() on the dataset instead (or read labels_ after fit()). Either way, you get a vector y with one cluster label per point. Plot the 2D data and color the points using y to see how the points are clustered.
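For the plotting step, one possible approach is a scatter plot whose c argument is the label vector. The helper plot_clusters and the demo arrays below are hypothetical names made up for this sketch; the labels here are assigned by a simple threshold rather than by a clustering algorithm, purely to show the coloring.

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_clusters(X, y, title):
    """Scatter 2-D points, coloring each one by its integer label."""
    fig, ax = plt.subplots()
    # Column 0 goes on the x-axis, column 1 on the y-axis; c=y picks the color.
    ax.scatter(X[:, 0], X[:, 1], c=y, cmap="viridis")
    ax.set_title(title)
    ax.set_xlabel("feature 0")
    ax.set_ylabel("feature 1")
    return ax

# Hypothetical demo data with stand-in labels (not real cluster predictions).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 2))
y_demo = (X_demo[:, 0] > 0).astype(int)

ax = plot_clusters(X_demo, y_demo, "Demo: points colored by label")
```

Calling the same helper once per algorithm and dataset gives four plots that can be compared side by side.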


In [ ]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import sklearn.cluster as cluster
import sklearn.datasets as datasets

np.random.seed(342585)
X1, _ = datasets.make_blobs(shuffle=False)    # DATASET 1: there are 3 clusters
X2, _ = datasets.make_circles(shuffle=False)  # DATASET 2: there are 2 clusters

### BEGIN SOLUTION

### END SOLUTION

B

Mention at the beginning of your answer whether or not you successfully completed Part A above, then continue:

If you did Part A: Discuss your results. How do the two algorithms differ, if at all? Can you speculate as to the relative complexities of the two algorithms? Do you have any ideas for when you might use one over the other?

If you did NOT do Part A: Read the K-means and spectral clustering sections of the scikit-learn user guide. What are the strengths and weaknesses of the two algorithms? Under what circumstances would you advise using one over the other?