This notebook demonstrates how to use scikit-learn to reduce the dimensionality of a feature set, and how to visualize the resulting 3-dimensional sets with data points colored according to their labels.


In [2]:
# You want to be able to rotate scatterplots in 3D, so don't show them inline 
%matplotlib tk

# 'pip install bunch' if you do not have 'bunch' package
import bunch

# Our utility code resides in module dim_reduce.py, which we import here:
import dim_reduce

Now let us apply a PCA dimensionality reduction method to the "iris" dataset (which is 4D).


In [3]:
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris

data = load_iris()  # a dictionary-like object ('Bunch') holding the training data for the built-in 'iris' dataset
y = data.target  # labels
X = data.data  # features (4 features per sample)
pca = PCA(n_components=3)  # reduce the feature set to 3 dimensions
reduced_X = pca.fit_transform(X)  # reduced 3D feature set

visData = bunch.Bunch()
visData.target = data.target
visData.data = reduced_X
dim_reduce.vis3D(visData, title="3D PCA", dotsize=30)

A 3D scatterplot should open in a separate window. By clicking and dragging to rotate it, you can observe that 2D is enough to separate the classes in this case! Indeed, let's transform directly to 2D:
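To quantify why two components suffice, you can inspect PCA's explained variance ratios (a minimal sketch; keeping all 4 components here is just to show the full spectrum):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris

X = load_iris().data
pca = PCA(n_components=4).fit(X)  # keep all components to see the full variance spectrum
cumvar = np.cumsum(pca.explained_variance_ratio_)
print(cumvar)  # cumulative variance explained by the first k components
```

For iris, the first two principal components already capture well over 95% of the total variance, which is why the 2D projection separates the classes so cleanly.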


In [5]:
pca = PCA(n_components=2)  
visData.data = pca.fit_transform(X)
dim_reduce.vis2D(visData, title="2D PCA", dotsize=30)

Similarly for t-SNE:


In [7]:
from sklearn.manifold import TSNE
tsne = TSNE(n_components=3, random_state=0, learning_rate=100)
visData.data = tsne.fit_transform(X)
dim_reduce.vis3D(visData, title="3D t-SNE", dotsize=30)
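As with PCA, t-SNE can also project directly to 2D. Since the `dim_reduce` module is not shown here, this sketch plots the 2D embedding with plain matplotlib instead of `dim_reduce.vis2D` (the scatter styling is an assumption, chosen to mirror the `dotsize=30` used above):

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from sklearn.datasets import load_iris

data = load_iris()
emb = TSNE(n_components=2, random_state=0, learning_rate=100).fit_transform(data.data)

plt.scatter(emb[:, 0], emb[:, 1], c=data.target, s=30)  # color points by class label
plt.title("2D t-SNE")
plt.show()
```

With the `%matplotlib tk` backend set at the top of the notebook, this also opens in a separate window. Note that t-SNE is stochastic, so fixing `random_state` is what makes the embedding reproducible between runs.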
