This notebook demonstrates how to use scikit-learn to reduce the dimensionality of a feature set and how to visualize the resulting 3-dimensional sets with data points colored according to their labels.
In [2]:
# We want to rotate the 3D scatterplots, so don't render them inline
%matplotlib tk
# Run 'pip install bunch' if the 'bunch' package is not installed
import bunch
# Our utility code resides in module dim_reduce.py, which we import here:
import dim_reduce
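A `Bunch` is a dictionary whose keys are also accessible as attributes, much like a JavaScript object. scikit-learn ships an equivalent class, `sklearn.utils.Bunch`, so a minimal sketch of the idea (using the scikit-learn version rather than the third-party `bunch` package) looks like this:

```python
from sklearn.utils import Bunch  # same idea as the third-party 'bunch' package

b = Bunch()
b.data = [[1.0, 2.0], [3.0, 4.0]]
# attribute access and key access refer to the same entry
assert b.data is b["data"]
```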
Now let us apply a PCA dimensionality reduction method to the "iris" dataset (which is 4D).
In [3]:
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris
data = load_iris() # a 'Bunch' (dict with attribute access), holding the built-in 'iris' dataset
y = data.target # labels
X = data.data # features (4 features for each sample)
pca = PCA(n_components=3) # reduce feature set to 3 dimensions
reduced_X = pca.fit_transform(X) # reduced 3D feature set
visData = bunch.Bunch()
visData.target = data.target
visData.data = reduced_X
dim_reduce.vis3D(visData, title="3D PCA", dotsize=30)
A 3D scatterplot should open in a separate window. By clicking and dragging to rotate it, you can observe that 2D is enough to separate the classes in this case! Indeed, let's transform directly to 2D:
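The claim that two components suffice can also be checked numerically: `PCA` exposes `explained_variance_ratio_`, the fraction of total variance captured by each component.

```python
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris

X = load_iris().data
pca = PCA(n_components=3).fit(X)
# fraction of the total variance captured by each principal component
print(pca.explained_variance_ratio_)
# the first two components already account for the bulk of the variance
print(pca.explained_variance_ratio_[:2].sum())
```

For iris, the first two components together explain well over 95% of the variance, which is why the 2D projection separates the classes almost as well as the 3D one.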
In [5]:
pca = PCA(n_components=2)
visData.data = pca.fit_transform(X)
dim_reduce.vis2D(visData, title="2D PCA", dotsize=30)
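The module `dim_reduce.py` is not shown in this notebook. A minimal helper along the lines of `vis2D` might look like the sketch below; the function name, signature, and behavior are assumptions, and the `Agg` backend with `savefig` stands in for the interactive `tk` window used above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for this sketch; the notebook uses 'tk'
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris

def vis2D_sketch(data, title="", dotsize=30):
    # scatter the 2D features, coloring each point by its integer class label
    plt.figure()
    plt.scatter(data.data[:, 0], data.data[:, 1], c=data.target, s=dotsize)
    plt.title(title)
    plt.savefig(title.replace(" ", "_") + ".png")

iris = load_iris()
iris.data = PCA(n_components=2).fit_transform(iris.data)
vis2D_sketch(iris, title="2D PCA", dotsize=30)
```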
Similarly for t-SNE:
In [7]:
from sklearn.manifold import TSNE
tsne = TSNE(n_components=3, random_state=0, learning_rate=100)
visData.data = tsne.fit_transform(X)
dim_reduce.vis3D(visData, title="3D t-SNE", dotsize=30)
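Unlike PCA, t-SNE is stochastic and sensitive to its hyperparameters (notably `perplexity` and `learning_rate`), so different settings can produce quite different layouts. A quick 2D run for comparison, using settings chosen only for illustration:

```python
from sklearn.manifold import TSNE
from sklearn.datasets import load_iris

X = load_iris().data
# perplexity and learning_rate here are illustrative, not tuned values
tsne = TSNE(n_components=2, perplexity=30, learning_rate=100.0, random_state=0)
emb = tsne.fit_transform(X)
print(emb.shape)  # one 2D point per sample
```

Fixing `random_state` makes a given run reproducible, but embeddings from different parameter settings are still not directly comparable, since t-SNE preserves local neighborhoods rather than global distances.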