Machine Learning 101: General Concepts


Machine learning (ML) involves building programs with tunable parameters that are adjusted automatically, so that the program's behaviour improves as it adapts to previously seen data.

The ML algorithms in the scikit-learn library take a NumPy array $x$ as input, with shape (n_samples, n_features): one row per sample and one column per feature.
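To make the (n_samples, n_features) convention concrete, here is a minimal sketch using a small made-up array. The values and the variable names (X, first_sample, first_feature) are assumptions chosen for illustration only.

```python
import numpy as np

# Toy data (made up for illustration): 3 samples, 2 features each.
X = np.array([
    [5.1, 3.5],
    [4.9, 3.0],
    [6.2, 3.4],
])

n_samples, n_features = X.shape
print(n_samples, n_features)  # 3 2

# A single sample is one row; a single feature is one column.
first_sample = X[0]       # shape (2,)
first_feature = X[:, 0]   # shape (3,)
```

Indexing with X[i] selects one sample, while X[:, j] selects one feature across all samples.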

Iris example

We consider the iris data set. Each sample in this data set has 4 features:

    1. sepal length
    2. sepal width
    3. petal length
    4. petal width

From these features we want to classify each sample into one of three species:

  • Iris Setosa

  • Iris Versicolor

  • Iris Virginica

First let's load the data:


In [2]:
from sklearn.datasets import load_iris
iris = load_iris()
n_samples, n_features = iris.data.shape

The features of each flower are stored row-wise in the data attribute of the dataset:


In [3]:
iris.data[0]


Out[3]:
array([ 5.1,  3.5,  1.4,  0.2])
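As a quick check, the sketch below prints the shape of the data array and pairs the first row with the feature_names attribute of the loaded dataset, so each number can be read against its feature. The name first_flower is just an illustrative choice.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# 150 flowers (rows), 4 measurements each (columns).
print(iris.data.shape)  # (150, 4)

# Pair each feature name with the matching value from the first row.
first_flower = dict(zip(iris.feature_names, iris.data[0]))
for name, value in first_flower.items():
    print(f"{name}: {value}")
```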

The class of each sample is stored in the target attribute as an integer label, while the corresponding class names are stored in the target_names attribute:


In [6]:
iris.target


Out[6]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [7]:
iris.target_names


Out[7]:
array(['setosa', 'versicolor', 'virginica'], 
      dtype='<U10')
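The integer labels in target index directly into target_names, so the two arrays can be combined to recover a species name for every sample. The following is a small sketch of that lookup; the names species and counts are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Fancy indexing: look up the species name for every integer label.
species = iris.target_names[iris.target]
print(species[0], species[50], species[100])  # setosa versicolor virginica

# Count how many samples fall in each class.
counts = np.bincount(iris.target)
print(dict(zip(iris.target_names, counts)))
```

The counts show that the three classes are balanced, with 50 samples each.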

Supervised and Unsupervised Learning