Examples of unsupervised learning activities include:

* Clustering objects into groups, without pre-existing labels
* Dimensionality reduction: finding lower-dimensional representations of the data
* Density estimation
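As a minimal sketch (not part of the original notebook), here's what an unsupervised clustering run might look like on the digits data we load below, using `sklearn.cluster.KMeans`; the choice of 10 clusters is our assumption, based on there being 10 digit classes:

```python
# Unsupervised learning sketch: cluster the digits without using any labels.
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

digits = load_digits()
kmeans = KMeans(n_clusters=10)   # 10 clusters: an assumption (one per digit class)
kmeans.fit(digits.data)          # note: only X is passed in, never y
print(kmeans.cluster_centers_.shape)  # (10, 64) -- one 64-pixel center per cluster
```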
In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
In [2]:
from sklearn.datasets import load_digits
digits = load_digits()
digits.keys()
Out[2]:
In [3]:
digits.images.shape
Out[3]:
In [4]:
print(digits.images[0])
In [5]:
plt.matshow(digits.images[23], cmap=plt.cm.Greys)
Out[5]:
In [6]:
digits.data.shape
Out[6]:
In [7]:
digits.target.shape
Out[7]:
In [8]:
digits.target[23]
Out[8]:
In SciKit-Learn dataset objects:

* `data` contains the design matrix $X$, and is a numpy array of shape $(N, P)$
* `target` contains the response variables $y$, and is a numpy array of shape $(N,)$
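The `data` matrix is just the 8x8 `images` array with each image flattened into a 64-element row; a quick check (a sketch, not from the original notebook):

```python
# Each row of the design matrix is a flattened 8x8 image:
import numpy as np
print(digits.data.shape)                                      # (1797, 64)
print(np.allclose(digits.data[0], digits.images[0].ravel()))  # True
```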
In [9]:
print(digits.DESCR)
Splitting the data:
In [10]:
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target)
In [11]:
X_train.shape,y_train.shape
Out[11]:
In [12]:
X_test.shape,y_test.shape
Out[12]:
In [13]:
?train_test_split
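Beyond the defaults used above, `train_test_split` takes keyword arguments controlling the split; a minimal sketch of the common ones:

```python
# test_size sets the held-out fraction; random_state seeds the shuffle
# so that the split is reproducible from run to run.
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=42)
print(X_train.shape, X_test.shape)  # roughly a 75/25 split of the 1797 samples
```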
SciKit-Learn provides 5 "toy" datasets for tutorial purposes, all loadable in the same way (R marks regression datasets, C classification):
| Name | Description |
|---|---|
| `boston` | Boston house prices, with 13 associated measurements (R) |
| `iris` | Fisher's iris classifications (based on 4 characteristics) (C) |
| `diabetes` | Diabetes (x vs y) (R) |
| `digits` | Hand-written digits, 8x8 images with classifications (C) |
| `linnerud` | Linnerud: 3 exercise and 3 physiological measurements (R) |
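For example, the iris data loads in exactly the same way (a quick sketch; the shapes are the standard 150 samples by 4 features):

```python
from sklearn.datasets import load_iris
iris = load_iris()
print(iris.data.shape, iris.target.shape)  # (150, 4) (150,)
```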
In [14]:
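# Note: load_boston was removed in scikit-learn 1.2 (over ethical concerns
# with the dataset); this cell requires an older scikit-learn release to run.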
from sklearn.datasets import load_boston
boston = load_boston()
print(boston.DESCR)
In [15]:
# Visualizing the Boston house price data as a corner (pairwise scatter) plot:
import corner
X = boston.data
y = boston.target
# Append the target as an extra column so it appears alongside the features
Xy = np.concatenate((X, np.atleast_2d(y).T), axis=1)
labels = np.append(boston.feature_names, 'MEDV')
corner.corner(Xy, labels=labels);
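If the corner package is unavailable, a single pairwise panel can be drawn directly with matplotlib; a minimal sketch using the 'RM' feature (average rooms per dwelling), which correlates visibly with MEDV:

```python
# One panel of the corner plot by hand: average rooms (RM) vs. median value (MEDV).
rm = X[:, list(boston.feature_names).index('RM')]
plt.scatter(rm, y, s=5)
plt.xlabel('RM (average rooms per dwelling)')
plt.ylabel('MEDV (median home value, in $1000s)')
```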
Talk to your neighbor for a few minutes about the things you have just heard about machine learning. In this course, have we been talking about regression or classification problems? Have our models been supervised or unsupervised? How are our example astronomical datasets similar to the toy datasets in SciKit-Learn? And how are they different?