Machine Learning 101: General Concepts

Machine learning (ML) involves building programs with tunable parameters that are adjusted automatically, so that the program improves its behaviour by adapting to previously seen data.
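The sketch below is a minimal illustration of "tunable parameters adjusted from data". It fits scikit-learn's `LinearRegression` to a handful of made-up points lying on the line $y = 2x + 1$. The toy data and the choice of a linear model are assumptions for illustration only. Here the slope and intercept are the tunable parameters, and `fit` adjusts them automatically to match the data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: four samples with one feature each, lying on y = 2x + 1
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X[:, 0] + 1.0

model = LinearRegression()
model.fit(X, y)  # learns coef_ (the slope) and intercept_ from the data

slope = model.coef_[0]
intercept = model.intercept_
print(f"learned slope={slope:.2f}, intercept={intercept:.2f}")
# prints: learned slope=2.00, intercept=1.00
```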

The ML algorithms in the scikit-learn library take as input a two-dimensional NumPy array $X$ with shape `(n_samples, n_features)`, with one row per sample and one column per feature.
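The sketch below shows this layout using two made-up flowers; the numbers are illustrative and not taken from a real data set. Each row is a sample and each column is a feature. A single feature stored as a 1-D array has to be reshaped into one column before it matches the `(n_samples, n_features)` convention.

```python
import numpy as np

# Two hypothetical samples, each with 4 features -> shape (n_samples, n_features)
X = np.array([
    [5.0, 3.4, 1.5, 0.2],
    [6.3, 2.9, 5.6, 1.8],
])
n_samples, n_features = X.shape
print(n_samples, n_features)  # 2 4

# One feature measured on 3 samples, stored as a 1-D array of shape (3,)
single_feature = np.array([1.4, 4.7, 6.0])
# Reshape into a single column: shape (3, 1), i.e. 3 samples and 1 feature
X_single = single_feature.reshape(-1, 1)
print(X_single.shape)  # (3, 1)
```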

### Iris example

We consider the Iris data set. Each sample in this data set is one flower, described by 4 features (all measured in cm):

1. sepal length
2. sepal width
3. petal length
4. petal width

From these features we want to predict which of three species the flower belongs to:

- Iris Setosa
- Iris Versicolour
- Iris Virginica

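First, let's load the data:

```python
from sklearn.datasets import load_iris
# Load the Iris data set as a Bunch object (a dict-like container)
iris = load_iris()
# iris.data is a 2-D array: one row per flower, one column per feature
n_samples, n_features = iris.data.shape
```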
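As a quick sanity check, the sketch below confirms the dimensions of the Iris data: 150 flowers with 4 features each. It also uses the `return_X_y=True` option of `load_iris`, which returns the feature array and the label array directly instead of the dataset object. This is an alternative to the attribute access used in this notebook, not a requirement.

```python
from sklearn.datasets import load_iris

iris = load_iris()
n_samples, n_features = iris.data.shape
print(n_samples, n_features)  # 150 4

# Alternative: get the feature array X and the label array y directly
X, y = load_iris(return_X_y=True)
print(X.shape, y.shape)  # (150, 4) (150,)
```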

The features of each flower are stored row-wise in the `data` attribute of the dataset, so `iris.data[0]` holds the four measurements of the first flower:

```python
iris.data[0]
```

```
array([ 5.1,  3.5,  1.4,  0.2])
```
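Each row is one flower and each column is one feature, so you can select a whole feature by slicing a column. The sketch below pairs the first sample with `iris.feature_names`, which lists the column names in the same order as the feature list above. It then extracts the petal-length column, which is at index 2.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Pair each feature name with the corresponding value of the first flower
first_flower = dict(zip(iris.feature_names, iris.data[0]))
for name, value in first_flower.items():
    print(f"{name}: {value}")

# Column 2 holds the petal length of every sample
petal_length = iris.data[:, 2]
print(petal_length.shape)  # (150,)
```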

The class of each sample is stored as an integer label in the `target` attribute, whilst the corresponding class names are stored in the `target_names` attribute.

```python
iris.target
```

```
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
```
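The printed labels are grouped by class, with 50 samples each. You can count the class sizes with NumPy's `bincount` instead of reading them off the printed array. `bincount` counts the occurrences of each non-negative integer label:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# counts[k] is the number of samples whose label equals k
counts = np.bincount(iris.target)
print(counts)  # [50 50 50]
```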



```python
iris.target_names
```

```
array(['setosa', 'versicolor', 'virginica'],
      dtype='<U10')
```
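`target_names` is indexed by class label, so NumPy fancy indexing turns the integer labels into species names in one step. A minimal sketch:

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Use each integer label as an index into target_names
species = iris.target_names[iris.target]
print(species[0], species[50], species[100])  # setosa versicolor virginica
```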



### Supervised and Unsupervised Learning