What is machine learning?

Supervised learning

Data Representations

Dataset Split



In [ ]:

    
%matplotlib nbagg
import matplotlib.pyplot as plt
import numpy as np



In [ ]:

    
from sklearn.datasets import load_digits
digits = load_digits()
digits.keys()



In [ ]:

    
digits.images.shape



In [ ]:

    
print(digits.images[0])



In [ ]:

    
plt.matshow(digits.images[0], cmap=plt.cm.Greys)



In [ ]:

    
digits.data.shape



In [ ]:

    
digits.target.shape



In [ ]:

    
digits.target

scikit-learn expects data as a NumPy array (or SciPy sparse matrix) of shape (n_samples, n_features): one row per sample and one column per feature.
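
To make the (n_samples, n_features) layout concrete, the following minimal sketch checks that digits.data is the stack of 8x8 images with each image flattened into a row of 64 pixel values. The variable names n_samples and flattened are illustrative and are not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

# Number of samples is the first axis of the image stack.
n_samples = digits.images.shape[0]

# Flatten each 8x8 image into a single row of 64 pixel features.
flattened = digits.images.reshape(n_samples, -1)

print(digits.images.shape)  # (n_samples, 8, 8)
print(flattened.shape)      # (n_samples, 64)

# The flattened images should match the ready-made data matrix.
print(np.array_equal(flattened, digits.data))
```

Each row of the data matrix is therefore one digit image, and digits.target holds one label per row.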

Splitting the data:



In [ ]:

    
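# train_test_split lives in sklearn.model_selection (sklearn.cross_validation was removed in scikit-learn 0.20)
from sklearn.model_selection import train_test_split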
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target)
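
As a quick check on what the split produces, this sketch prints the shapes of the resulting arrays. By default, train_test_split shuffles the data and holds out about a quarter of the samples for testing. The random_state=0 argument is an assumption added here only to make the shuffle reproducible. It does not appear in the original cell.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()

# random_state=0 is an illustrative choice so the split is repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)

# Features keep their 64 columns; only the number of rows changes.
print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)
```

The rows of X_train and y_train stay aligned after shuffling, as do those of X_test and y_test, so each sample keeps its label.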

Exercises

Load the iris dataset from the sklearn.datasets module using the load_iris function. The function returns a dictionary-like object with the same kind of attributes as digits, such as data, target and DESCR.

What is the number of classes, features and data points in this dataset? Use a scatterplot to visualize the dataset.

You can look at the DESCR attribute to learn more about the dataset.



In [ ]:

    
# %load solutions/load_iris.py


