Iris Data Set - Multi Class Classification ML Problem

  • The data is the Iris flower data set
  • Example of a classification algorithm (supervised learning)
  • Author: Rishu Shrivastava
  • Last updated: Dec 23, 2017

In [1]:
# Import the necessary ML Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import neighbors, datasets
from sklearn.model_selection import train_test_split

In [2]:
# Read the Iris data set from the pre-built scikit-learn library
iris = datasets.load_iris()  # variable name assumed
iris.keys()

dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names'])

In [3]:
# Printing the feature names
iris.feature_names

['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

In [4]:
# Displaying the first 5 rows
iris.data[:5]

array([[ 5.1,  3.5,  1.4,  0.2],
       [ 4.9,  3. ,  1.4,  0.2],
       [ 4.7,  3.2,  1.3,  0.2],
       [ 4.6,  3.1,  1.5,  0.2],
       [ 5. ,  3.6,  1.4,  0.2]])

In [5]:
# Assigning the features and results

X = iris.data[:]    # reading all of the data features (4 columns)
y = iris.target[:]  # reading the target column (class labels 0, 1, 2)
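For a quick check of what X and y hold, the arrays can be viewed as a labelled table. This is a minimal sketch, assuming the data comes from `datasets.load_iris()`; the names `iris_data`, `iris_df` and `species` are illustrative and not part of the original notebook.

```python
import pandas as pd
from sklearn import datasets

iris_data = datasets.load_iris()

# Build a labelled table: one column per feature plus the class name
iris_df = pd.DataFrame(iris_data.data, columns=iris_data.feature_names)
iris_df['species'] = iris_data.target_names[iris_data.target]

print(iris_df.shape)                      # (rows, columns)
print(iris_df['species'].value_counts())  # number of samples per class
```

The data set is balanced, so each of the three classes should appear the same number of times.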



In [6]:
# Plotting graphs with matplotlib to show the relationships between features of the Iris data set

# Relationship between Sepal length and Sepal width for the 3 classes of flowers

plt.figure(1, figsize=(8, 6))

plt.scatter(X[:,0], X[:,1], c=y, s=60, edgecolor='k')
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.title('Sepal length (cm) vs Sepal Width (cm)')

In [7]:
# Relationship between Petal length and Petal width for the 3 classes of flowers
plt.figure(1, figsize=(8, 6))
plt.scatter(X[:,2], X[:,3], c=y, s=60, edgecolor='k')
plt.xlabel('Petal length')
plt.ylabel('Petal width')
plt.title('Petal length (cm) vs Petal Width (cm)')


In the petal length vs petal width plot the points are less scattered and the three classes are more clearly separated than in the sepal length vs sepal width plot.
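One way to put a number on this visual impression is a per-feature one-way ANOVA F-statistic, which compares the spread between class means with the spread inside each class. The sketch below is an addition, not part of the original notebook; it uses scikit-learn's `f_classif` and reloads the data so that it runs on its own.

```python
from sklearn import datasets
from sklearn.feature_selection import f_classif

iris_data = datasets.load_iris()

# One-way ANOVA F-statistic per feature: a larger value means the class
# means are further apart relative to the spread within each class
f_scores, p_values = f_classif(iris_data.data, iris_data.target)

for name, score in zip(iris_data.feature_names, f_scores):
    print('{:<20s} F = {:.1f}'.format(name, score))
```

On this data set the two petal features score higher than the two sepal features, which agrees with the plots.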

In [8]:
# Splitting the Iris dataset into Train and Test data set

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

print('The length of Training data set',len(X_train))
print('The length of Test data set',len(X_test))

The length of Training data set 120
The length of Test data set 30
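The split above is random, so the rows that land in the test set change on every run. A minimal sketch of a reproducible, class-balanced split follows; the seed value 42 and the variable names are arbitrary assumptions, and `stratify` asks `train_test_split` to keep the class proportions equal in both parts.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris_data = datasets.load_iris()

X_tr, X_te, y_tr, y_te = train_test_split(
    iris_data.data, iris_data.target,
    test_size=0.2,
    random_state=42,            # arbitrary seed, fixes the shuffle
    stratify=iris_data.target,  # keep class proportions in both parts
)

print('Train / test sizes:', len(X_tr), len(X_te))
print('Test samples per class:', np.bincount(y_te))
```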

Training classification algorithms on the Iris data set.

1. K Nearest Neighbor Classifier

In [21]:
# Training the train data set using KNN Classifier

from sklearn.neighbors import KNeighborsClassifier
n_neighbors = 10  # value taken from the fitted model's repr below
knn_clf = neighbors.KNeighborsClassifier(n_neighbors, weights='distance')
knn_clf.fit(X_train, y_train)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=10, p=2,
           weights='distance')

In [22]:
# Calculating the score

print('Algorithm Score (KNN): {:.2f}'.format(knn_clf.score(X_test,y_test) * 100))

Algorithm Score (KNN): 100.00

In [25]:
# Plotting a 2D image based on 2 features of the Iris dataset - Sepal Length and Sepal Width

from matplotlib.colors import ListedColormap

cmap_light = ListedColormap(['#e74c3c', '#f1c40f','#bdc3c7'])
cmap_bold = ListedColormap(['#ecf0f1', '#2c3e50','#2ecc71'])

h=.05 # step size in the mesh

# Fitting only the Sepal Length and width data set to the KNN Classifier for plotting
knn_clf2 = neighbors.KNeighborsClassifier(n_neighbors, weights='distance')
knn_clf2.fit(X_train[:, :2], y_train)

# calculate min, max and limits for creating the boundaries
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
# predict class using data and kNN classifier
Z = knn_clf2.predict(np.c_[xx.ravel(), yy.ravel()])
# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.figure(1, figsize=(10, 8))
plt.pcolormesh(xx, yy, Z, cmap=cmap_light)
# Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_bold)
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.title("3-Class classification (k = %i)" % (n_neighbors))

In [26]:
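# Making a sample prediction based on manual data entry
# (the sample values are an assumption: they are the first row of the data set shown in cell 4)
manual_dataentry = knn_clf.predict([[5.1, 3.5, 1.4, 0.2]])[0]

print('Sample Prediction :')
if manual_dataentry == 0:
    print('Iris Setosa')
elif manual_dataentry == 1:
    print('Iris Versicolour')
else:
    print('Iris Virginica')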

Sample Prediction :
Iris Setosa
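The class index can also be turned into a name without an if/elif chain by indexing `target_names`. The sketch below is an illustration under stated assumptions: the measurements in `sample` are a hypothetical manual entry, and the classifier is fitted on the full data set only to keep the example short.

```python
from sklearn import datasets
from sklearn.neighbors import KNeighborsClassifier

iris_data = datasets.load_iris()

clf = KNeighborsClassifier(n_neighbors=10, weights='distance')
clf.fit(iris_data.data, iris_data.target)

# Hypothetical manual entry: sepal length, sepal width, petal length, petal width (cm)
sample = [[5.1, 3.5, 1.4, 0.2]]

predicted_class = clf.predict(sample)[0]
print('Sample Prediction :', iris_data.target_names[predicted_class])

# Weighted share of the neighbours' votes received by each class
print('Class probabilities:', clf.predict_proba(sample)[0])
```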

In [27]:
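# Printing all of the test data set predictions
len_test_data = len(X_test)
for i in range(0, len_test_data):
    test_predict = knn_clf.predict(X_test[i].reshape(1, -1))
    if test_predict == 0:
        species = 'Setosa'
    elif test_predict == 1:
        species = 'Versicolour'
    else:
        species = 'Virginica'
    print(X_test[i].reshape(1, -1), test_predict, species)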

[[ 6.1  3.   4.6  1.4]] [1] Versicolour
[[ 6.4  2.9  4.3  1.3]] [1] Versicolour
[[ 6.5  2.8  4.6  1.5]] [1] Versicolour
[[ 6.   2.9  4.5  1.5]] [1] Versicolour
[[ 5.4  3.4  1.5  0.4]] [0] Setosa
[[ 6.4  3.2  5.3  2.3]] [2] Virginica
[[ 6.1  2.8  4.   1.3]] [1] Versicolour
[[ 5.8  2.8  5.1  2.4]] [2] Virginica
[[ 6.1  2.6  5.6  1.4]] [2] Virginica
[[ 6.6  3.   4.4  1.4]] [1] Versicolour
[[ 6.4  2.8  5.6  2.2]] [2] Virginica
[[ 5.5  2.6  4.4  1.2]] [1] Versicolour
[[ 4.4  3.   1.3  0.2]] [0] Setosa
[[ 5.4  3.4  1.7  0.2]] [0] Setosa
[[ 4.5  2.3  1.3  0.3]] [0] Setosa
[[ 7.1  3.   5.9  2.1]] [2] Virginica
[[ 5.6  2.7  4.2  1.3]] [1] Versicolour
[[ 6.8  3.   5.5  2.1]] [2] Virginica
[[ 5.2  3.5  1.5  0.2]] [0] Setosa
[[ 7.7  3.8  6.7  2.2]] [2] Virginica
[[ 6.4  2.8  5.6  2.1]] [2] Virginica
[[ 6.   2.2  5.   1.5]] [2] Virginica
[[ 5.5  4.2  1.4  0.2]] [0] Setosa
[[ 4.6  3.6  1.   0.2]] [0] Setosa
[[ 5.5  2.5  4.   1.3]] [1] Versicolour
[[ 6.5  3.   5.5  1.8]] [2] Virginica
[[ 7.7  2.6  6.9  2.3]] [2] Virginica
[[ 6.4  2.7  5.3  1.9]] [2] Virginica
[[ 4.8  3.   1.4  0.3]] [0] Setosa
[[ 5.1  2.5  3.   1.1]] [1] Versicolour
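A long list of individual predictions is hard to scan for mistakes, so a confusion matrix can summarise the same information. This is a sketch that is not part of the original notebook: it reloads the data, uses an arbitrary `random_state=0` for the split, and predicts the whole test set in a single call.

```python
from sklearn import datasets
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_data = datasets.load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    iris_data.data, iris_data.target, test_size=0.2, random_state=0)

clf = KNeighborsClassifier(n_neighbors=10, weights='distance')
clf.fit(X_tr, y_tr)

# predict() accepts the whole test matrix, no per-row loop is needed
y_pred = clf.predict(X_te)

# Rows are true classes, columns are predicted classes (0, 1, 2)
cm = confusion_matrix(y_te, y_pred, labels=[0, 1, 2])
print(cm)
```

Off-diagonal entries count the misclassified samples; a perfect score corresponds to a matrix with non-zero values only on the diagonal.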

2. Logistic Regression

In [28]:
# Training the data set using Logistic regression

from sklearn.linear_model import LogisticRegression
logistic_reg = LogisticRegression()
logistic_reg.fit(X_train, y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

In [29]:
print('Algorithm Score (Logistic Regression): {:.2f}'.format(logistic_reg.score(X_test, y_test) * 100))

Algorithm Score (Logistic Regression): 100.00

Algorithm Score Chart

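  1. K Nearest Neighbor Classifier : 100.00 %
  2. Logistic Regression : 96.67 %

Note: train_test_split is called without a fixed random_state, so these scores vary from run to run. The Logistic Regression cell above printed 100.00 for its run, while this chart records 96.67 %.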
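A score from a single 30-sample test set depends heavily on which rows happen to fall into it. The sketch below compares the two models with 5-fold cross-validation instead. It is an addition rather than the author's method; `max_iter=1000` is an assumption made so that the solver converges, and the other settings follow the cells above.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris_data = datasets.load_iris()

models = {
    'K Nearest Neighbor Classifier': KNeighborsClassifier(n_neighbors=10, weights='distance'),
    'Logistic Regression': LogisticRegression(max_iter=1000),
}

cv_scores = {}
for name, model in models.items():
    # Accuracy on each of the 5 held-out folds
    scores = cross_val_score(model, iris_data.data, iris_data.target, cv=5)
    cv_scores[name] = scores.mean() * 100
    print('{} : {:.2f} % (std {:.2f})'.format(name, cv_scores[name], scores.std() * 100))
```

Averaging over folds gives a steadier basis for comparing the two algorithms than one random split.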

