Fisher's Iris Data Set is a well-known data set that has become a common test case in machine learning. Each row in the data set consists of four numeric values: petal length, petal width, sepal length, and sepal width. The row also records the type of iris flower (one of three: Iris setosa, Iris versicolor, or Iris virginica).
According to Lichman [1],
"One class is linearly separable from the other 2; the latter are NOT linearly separable from each other".
The measurements cluster by type and can be analysed to distinguish between the types of iris flower, or to predict the type from its measurements (petal length, petal width, sepal length, and sepal width) [2]; a minimal classification sketch follows the data-loading cell below.
References:
[1] Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
[2] True, J. (2015). Watson Analytics use case: the Iris data set. IBM Watson Analytics blog [https://www.ibm.com/communities/analytics/watson-analytics-blog/watson-analytics-use-case-the-iris-data-set/].
In [1]:
import numpy as np
# Load the data from the CSV file, reading each column into a separate variable for ease of use.
sepal_length, sepal_width, petal_length, petal_width = np.genfromtxt('../data/IRIS.csv', delimiter=',', usecols=(0,1,2,3), unpack=True, dtype=float)
iris_class = np.genfromtxt('../data/IRIS.csv', delimiter=',', usecols=(4), unpack=True, dtype=str)
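As a quick illustration of the prediction claim in the introduction, a nearest-neighbour classifier can be trained on these measurements. This is a minimal sketch, not part of the original analysis: it assumes scikit-learn is installed, and the 30% test split and k=3 are arbitrary choices.
In [ ]:
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
# Stack the four measurement columns into one feature matrix.
features = np.column_stack((sepal_length, sepal_width, petal_length, petal_width))
# Hold out 30% of the rows to estimate accuracy on unseen flowers.
X_train, X_test, y_train, y_test = train_test_split(features, iris_class, test_size=0.3, random_state=0)
# Classify each test flower by the majority class of its 3 nearest neighbours.
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print(knn.score(X_test, y_test))  # fraction of test flowers classified correctly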
In [2]:
import matplotlib.pyplot as plt
# Plot Sepal Length on the x-axis and Sepal Width on the y-axis; complete with labels.
# Scale graph to a bigger size
plt.rcParams['figure.figsize'] = (14.0, 6.0)
# Set title
plt.title('Iris Data Set: Sepal Measurements', fontsize=16)
# plot scatter graph
plt.scatter(sepal_length, sepal_width)
# Add labels
plt.xlabel('Sepal Length', fontsize=14)
plt.ylabel('Sepal Width', fontsize=14)
# Output Graph
plt.show()
In [3]:
# https://matplotlib.org/users/legend_guide.html
import matplotlib.patches as mp
# https://stackoverflow.com/questions/27318906/python-scatter-plot-with-colors-corresponding-to-strings
colours = {'Iris-setosa': 'r', 'Iris-versicolor': 'g', 'Iris-virginica': 'b'}
# Colour each point according to its class; the legend is built manually below.
plt.scatter(sepal_length, sepal_width, c=[colours[i] for i in iris_class])
# Add title
plt.title('Iris Setosa, Versicolor, and Virginica: Sepal Measurements', fontsize=16)
# Add labels
plt.xlabel('Sepal Length', fontsize=14)
plt.ylabel('Sepal Width', fontsize=14)
# https://matplotlib.org/api/patches_api.html
plt.legend(handles = [mp.Patch(color=colour, label=label) for label, colour in [('Iris Setosa', 'r'), ('Iris Versicolor', 'g'), ('Iris Virginica', 'b')]])
plt.show()
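An alternative approach, sketched below: plotting each class with its own scatter call lets plt.legend() collect the labels automatically, with no manual patch construction (same data and colours as above).
In [ ]:
# One scatter call per class, so plt.legend() can build itself.
for species, colour in colours.items():
    mask = iris_class == species
    plt.scatter(sepal_length[mask], sepal_width[mask], c=colour, label=species)
plt.title('Iris Setosa, Versicolor, and Virginica: Sepal Measurements', fontsize=16)
plt.xlabel('Sepal Length', fontsize=14)
plt.ylabel('Sepal Width', fontsize=14)
plt.legend()
plt.show()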
In [4]:
import seaborn as sns
import pandas as pd
# Prepare data with pandas DataFrame for seaborn usage.
df = pd.DataFrame(dict(zip(['Sepal Length', 'Sepal Width','Petal Length', 'Petal Width', 'Iris Class'], [sepal_length, sepal_width, petal_length, petal_width, iris_class])))
df
Out[4]:
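A quick numeric summary is a useful sanity check on the loaded columns; this small addition is not part of the original notebook.
In [ ]:
# Count, mean, standard deviation, and quartiles for the four measurement columns.
df.describe()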
In [5]:
# Adapted from: https://seaborn.pydata.org/examples/scatterplot_matrix.html
%matplotlib inline
sns.pairplot(df, hue="Iris Class")
Out[5]:
In [6]:
# Reset size after seaborn
plt.rcParams['figure.figsize'] = (14.0, 6.0)
# https://github.com/emerging-technologies/emerging-technologies.github.io/blob/master/notebooks/simple-linear-regression.ipynb
# Calculate the best values for m and c.
m, c = np.polyfit(petal_length, petal_width, 1)
# Plot petal measurements for the full data set
plt.scatter(petal_length, petal_width, marker='o', label='Data Set')
# Plot best fit line
plt.plot(petal_length, m * petal_length + c, 'forestgreen', label='Best fit line')
# Add title
plt.title('Iris Data Set: Petal Measurements', fontsize=16)
# Add labels
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')
plt.legend()
# Print graph
plt.show()
In [7]:
# Calculate the R-squared value for the full data set: the square of the
# correlation coefficient between petal length and petal width.
np.corrcoef(petal_length, petal_width)[0][1]**2
Out[7]:
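For simple linear regression, the squared correlation coefficient equals the coefficient of determination, so the value above can be cross-checked from the residuals of the fitted line. A sketch, reusing the m and c computed earlier:
In [ ]:
# R-squared as 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum((petal_width - (m * petal_length + c))**2)
ss_tot = np.sum((petal_width - np.mean(petal_width))**2)
1 - ss_res / ss_tot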
In [8]:
# https://stackoverflow.com/questions/27947487/is-zip-the-most-efficient-way-to-combine-arrays-with-respect-to-memory-in-nump
# Combine the columns into a single array. Note that mixing the float columns
# with the string class column casts every value to a string.
iris_data = np.column_stack((sepal_length, sepal_width, petal_length, petal_width, iris_class))
# https://docs.scipy.org/doc/numpy/reference/generated/numpy.in1d.html
# Filter Data with 'Iris-setosa' & transpose after
filter_setosa = (iris_data[np.in1d(iris_data[:,4],'Iris-setosa')]).transpose()
# https://stackoverflow.com/questions/3877491/deleting-rows-in-numpy-array
# https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.chararray.astype.html
# Prepare data - delete the row of class labels and convert the remaining strings back to floats
setosa_data = (np.delete(filter_setosa, (4), axis=0)).astype(float)
setosa_data
Out[8]:
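Since column_stack casts everything to strings, the round-trip through strings can be avoided entirely by applying a boolean mask to the original float arrays. A sketch of that alternative, producing the same array under a hypothetical name:
In [ ]:
# Filter the float columns directly; no string conversion needed.
setosa_mask = iris_class == 'Iris-setosa'
setosa_data_alt = np.array([sepal_length[setosa_mask], sepal_width[setosa_mask],
                            petal_length[setosa_mask], petal_width[setosa_mask]])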
In [15]:
# https://github.com/emerging-technologies/emerging-technologies.github.io/blob/master/notebooks/simple-linear-regression.ipynb
# Calculate the best values for m and c.
m, c = np.polyfit(setosa_data[2], setosa_data[3], 1)
# Plot Setosa measurements
plt.scatter(setosa_data[2], setosa_data[3], marker='o', label='Iris Setosa')
# Plot best fit line
plt.plot(setosa_data[2], m * setosa_data[2] + c, 'forestgreen', label='Best fit line')
# Add title
plt.title('Iris Setosa: Petal Measurements', fontsize=16)
# Add labels
plt.xlabel('Petal Length', fontsize=14)
plt.ylabel('Petal Width', fontsize=14)
plt.legend()
# Print graph
plt.show()
In [10]:
# Calculate the R-squared value for the Setosa data using numpy.
np.corrcoef(setosa_data[2], setosa_data[3])[0][1]**2
Out[10]:
Gradient descent is an iterative approximation technique: we start with a guess for the values we wish to estimate and repeatedly adjust that guess in the direction that reduces the cost.
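Here the cost being minimized is the sum of squared residuals of the line, and the updates below follow its partial derivatives:

$$C(m, c) = \sum_i (y_i - m x_i - c)^2$$

$$\frac{\partial C}{\partial m} = -2 \sum_i x_i (y_i - m x_i - c), \qquad \frac{\partial C}{\partial c} = -2 \sum_i (y_i - m x_i - c)$$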
In [11]:
# Calculate the partial derivative of cost with respect to m while treating c as a constant.
def gradient_descent_m(x, y, m, c):
    return -2.0 * np.sum(x * (y - m * x - c))

# Calculate the partial derivative of cost with respect to c while treating m as a constant.
def gradient_descent_c(x, y, m, c):
    return -2.0 * np.sum(y - m * x - c)
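As a sanity check, each analytic partial derivative can be compared against a central finite difference of the cost. A small sketch; the cost helper and the step size h = 1e-6 are illustrative choices, not part of the original notebook:
In [ ]:
# Central finite-difference check of the analytic gradient at (m, c) = (1, 1).
def cost(x, y, m, c):
    return np.sum((y - m * x - c)**2)

h = 1e-6
x, y = setosa_data[2], setosa_data[3]
numeric_m = (cost(x, y, 1.0 + h, 1.0) - cost(x, y, 1.0 - h, 1.0)) / (2 * h)
print(numeric_m, gradient_descent_m(x, y, 1.0, 1.0))  # the two values should closely agree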
In [12]:
eta = 0.0001
g_m, g_c = 1.0, 1.0
change = True
# Iterate the partial derivative updates until the outcomes no longer change
while change:
    g_m_new = g_m - eta * gradient_descent_m(setosa_data[2], setosa_data[3], g_m, g_c)
    g_c_new = g_c - eta * gradient_descent_c(setosa_data[2], setosa_data[3], g_m, g_c)
    if g_m == g_m_new and g_c == g_c_new:
        change = False
    else:
        g_m, g_c = g_m_new, g_c_new
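The exact-equality stopping test above works because the updates eventually shrink below floating-point resolution; if convergence ever stalled, an explicit tolerance and iteration cap are a common safeguard. A variant sketch (the 1e-12 tolerance and the cap are arbitrary choices, and the hypothetical names g_m2 and g_c2 avoid clobbering the results above):
In [ ]:
# Variant: stop when both updates fall below a tolerance, or after a fixed number of steps.
g_m2, g_c2 = 1.0, 1.0
for _ in range(1000000):
    g_m_new = g_m2 - eta * gradient_descent_m(setosa_data[2], setosa_data[3], g_m2, g_c2)
    g_c_new = g_c2 - eta * gradient_descent_c(setosa_data[2], setosa_data[3], g_m2, g_c2)
    if abs(g_m_new - g_m2) < 1e-12 and abs(g_c_new - g_c2) < 1e-12:
        break
    g_m2, g_c2 = g_m_new, g_c_new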
To the human eye it is difficult to see any difference between the least-squares best fit line and the line approximated by gradient descent.
In [16]:
# Plot Setosa measurements
plt.scatter(setosa_data[2], setosa_data[3], marker='o', label='Iris Setosa')
# Plot best fit line according to Gradient Descent
plt.plot(setosa_data[2], g_m * setosa_data[2] + g_c, 'forestgreen', label='Best fit line: Gradient Descent')
# Add title
plt.title('Iris Setosa: Petal Measurements', fontsize=16)
# Add labels
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')
plt.legend()
# Print graph
plt.show()
However, the results of the two techniques do in fact differ: the values of m and c agree only to roughly eleven decimal places. Gradient descent therefore produced an adequate approximation of the least-squares solution, although not an exact match.
In [17]:
print("BEST LINE: m: %20.16f c: %20.16f" % (m, c))
print()
print("GRADIENT DESCENT: m: %20.16f c: %20.16f" % (g_m, g_c))