Examples of unsupervised learning activities include:
In [1]:
    
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
    
In [2]:
    
from sklearn.datasets import load_digits
digits = load_digits()
digits.keys()
    
In [3]:
    
digits.images.shape  # (1797, 8, 8): 1797 images of 8x8 pixels
    
In [4]:
    
print(digits.images[0])
    
    
In [5]:
    
plt.matshow(digits.images[23], cmap=plt.cm.Greys)
    
    
In [6]:
    
digits.data.shape  # (1797, 64): each image flattened to 64 pixel values
    
In [7]:
    
digits.target.shape  # (1797,): one digit label per image
    
In [8]:
    
digits.target[23]  # the digit label of the image plotted above
    
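In SciKit-Learn, `data` contains the design matrix $X$, a NumPy array of shape $(N, P)$ with one row per sample and one column per feature, and `target` contains the response variables $y$, a NumPy array of shape $(N,)$. For the digits, $N = 1797$ images and $P = 64$ pixel values per image.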
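For an image dataset like this one, the design matrix is simply the image stack flattened: each 8x8 image becomes one row of $P = 64$ pixel values. A minimal check of that relationship, using only `load_digits` and NumPy:

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

# Design matrix X: one row per image, one column per pixel
X = digits.data
# Response vector y: the digit (0-9) drawn in each image
y = digits.target

N, P = X.shape
print(N, P)  # 1797 64

# Flattening each 8x8 image row by row reproduces the design matrix
X_from_images = digits.images.reshape(N, -1)
print(np.array_equal(X_from_images, X))  # True
```

Going the other way, `X[i].reshape(8, 8)` recovers the image that `plt.matshow` displays for sample `i`.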
In [9]:
    
print(digits.DESCR)
    
    
Splitting the data into training and test sets:
In [10]:
    
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target)
    
In [11]:
    
X_train.shape, y_train.shape  # ((1347, 64), (1347,)) with the default 25% test split
    
In [12]:
    
X_test.shape, y_test.shape  # ((450, 64), (450,))
    
In [13]:
    
?train_test_split
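`train_test_split` shuffles the samples and, by default, holds out 25% of them for testing. The sketch below sets the parameters that are usually worth making explicit: `test_size` for the held-out fraction, `random_state` so the split is reproducible, and `stratify` so each digit class appears in the same proportion in both subsets. The values 0.2 and 0 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()

X_train, X_test, y_train, y_test = train_test_split(
    digits.data,
    digits.target,
    test_size=0.2,           # hold out 20% of the samples
    random_state=0,          # fixed seed: same split on every run
    stratify=digits.target,  # keep class proportions equal in both subsets
)

print(X_train.shape, X_test.shape)  # (1437, 64) (360, 64)

# Fraction of each digit class in the two subsets; stratification keeps them close
train_frac = np.bincount(y_train) / len(y_train)
test_frac = np.bincount(y_test) / len(y_test)
print(np.round(train_frac - test_frac, 3))
```

Stratifying matters most when some classes are rare; for the roughly balanced digits it mainly guards against an unlucky split.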
    
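SciKit-Learn provides several small "toy" datasets for tutorial purposes, including the five below, all loadable in the same way with a `load_<name>()` function. (R) marks a regression problem and (C) a classification problem: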
| Name | Description |
|---|---|
| boston | Boston house prices, with 13 associated measurements (R); removed in scikit-learn 1.2 |
| iris | Fisher's iris classifications (based on 4 characteristics) (C) |
| diabetes | Diabetes progression, with 10 baseline measurements (R) |
| digits | Hand-written digits, 8x8 images with classifications (C) |
| linnerud | Linnerud: 3 exercise and 3 physiological variables (R) |
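Because every toy dataset comes back as the same kind of object, with `data` and `target` arrays, a single loop can inspect them all. A short sketch; `load_boston` is left out because it no longer exists in scikit-learn 1.2 and later:

```python
from sklearn.datasets import load_diabetes, load_digits, load_iris, load_linnerud

loaders = {
    "iris": load_iris,
    "diabetes": load_diabetes,
    "digits": load_digits,
    "linnerud": load_linnerud,
}

shapes = {}
for name, loader in loaders.items():
    dataset = loader()
    # Every loader returns a Bunch with the same `data` and `target` fields
    shapes[name] = (dataset.data.shape, dataset.target.shape)
    print(f"{name:>9}: X {dataset.data.shape}, y {dataset.target.shape}")
```

Note that `linnerud` has a 2-D target of shape (20, 3): it is a multi-output regression problem, predicting the three physiological variables from the three exercise variables.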
In [14]:
    
# load_boston was removed in scikit-learn 1.2; this cell needs an older version
from sklearn.datasets import load_boston
boston = load_boston()
print(boston.DESCR)
    
    
In [15]:
    
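# Visualizing the Boston house price data as a corner plot
# (requires the third-party `corner` package)
import corner
X = boston.data
y = boston.target
# Append the target as an extra column so it is plotted against every feature
samples = np.concatenate((X, np.atleast_2d(y).T), axis=1)
labels = np.append(boston.feature_names, 'MEDV')
corner.corner(samples, labels=labels);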
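A corner plot shows every pairwise scatter; a compact numeric companion is the Pearson correlation between each feature and the target. The sketch below uses the diabetes dataset, since `load_boston` is unavailable in scikit-learn 1.2 and later, but the same code applies to any $(N, P)$ design matrix `X` and target `y`.

```python
import numpy as np
from sklearn.datasets import load_diabetes

diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target

# Stack the target as an extra column (as for the corner plot) and
# compute the (P+1) x (P+1) correlation matrix between columns
corr = np.corrcoef(np.column_stack((X, y)), rowvar=False)

# Last column, minus its final entry: correlation of each feature with y
feature_target_corr = corr[:-1, -1]

# Rank features by the strength of their linear relationship with y
order = np.argsort(-np.abs(feature_target_corr))
for i in order:
    print(f"{diabetes.feature_names[i]:>4}: {feature_target_corr[i]:+.2f}")
```

Pearson correlation only captures linear relationships, so it complements the scatter panels rather than replacing them.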
    
    
    
Talk to your neighbor for a few minutes about what you have just learned about machine learning. In this course, have we been working on regression or classification problems? Have our models been supervised or unsupervised? How are our example astronomical datasets similar to the toy datasets in SciKit-Learn, and how are they different?