Imports


In [1]:
import (
    "fmt"
    "github.com/sjwhitworth/golearn/base"
    "github.com/sjwhitworth/golearn/evaluation"
    "github.com/sjwhitworth/golearn/knn"
)

Problem/Data Description

<img style="float: left;" src="iris.jpg">

Information about this dataset comes from the UCI Machine Learning Repository.

Data Set Information:

This is perhaps the best known database to be found in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day. (See Duda & Hart, for example.) The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.

Predicted attribute: class of iris plant.

Attribute Information:

  1. sepal length in cm
  2. sepal width in cm
  3. petal length in cm
  4. petal width in cm
  5. class:
     - Iris Setosa
     - Iris Versicolour
     - Iris Virginica
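
To make the attribute list concrete, here is a minimal sketch of how one row could be represented and parsed with only the Go standard library. The `IrisRecord` type and `parseIris` function are hypothetical names introduced for illustration; they are not part of golearn, and the sketch assumes a header-less CSV with four numeric columns followed by the class label.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// IrisRecord is a hypothetical struct mirroring the five attributes listed above.
type IrisRecord struct {
	SepalLength float64 // cm
	SepalWidth  float64 // cm
	PetalLength float64 // cm
	PetalWidth  float64 // cm
	Class       string  // e.g. "Iris-setosa"
}

// parseIris reads header-less CSV text (assumed layout: four numeric
// columns, then the class label) into a slice of records.
func parseIris(text string) ([]IrisRecord, error) {
	r := csv.NewReader(strings.NewReader(text))
	r.FieldsPerRecord = 5 // reject rows with a different column count
	rows, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	records := make([]IrisRecord, 0, len(rows))
	for i, row := range rows {
		var vals [4]float64
		for j := 0; j < 4; j++ {
			v, err := strconv.ParseFloat(strings.TrimSpace(row[j]), 64)
			if err != nil {
				return nil, fmt.Errorf("row %d, column %d: %w", i+1, j+1, err)
			}
			vals[j] = v
		}
		records = append(records, IrisRecord{
			SepalLength: vals[0],
			SepalWidth:  vals[1],
			PetalLength: vals[2],
			PetalWidth:  vals[3],
			Class:       strings.TrimSpace(row[4]),
		})
	}
	return records, nil
}

func main() {
	// Two sample rows in the assumed header-less layout.
	data := "5.1,3.5,1.4,0.2,Iris-setosa\n7.0,3.2,4.7,1.4,Iris-versicolor\n"
	records, err := parseIris(data)
	if err != nil {
		panic(err)
	}
	for _, rec := range records {
		fmt.Printf("%+v\n", rec)
	}
}
```

golearn handles this parsing itself, so the struct is only a mental model of what each instance holds.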

Building a Pattern Recognition Model

This example model comes from the golearn documentation.
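
For intuition about what a k-nearest-neighbours classifier does, here is a minimal standalone sketch using only the standard library. It is not golearn's implementation: the `sample` type, the `euclidean` and `predict` functions, and the tie-breaking rule (the label that first reaches the winning vote count among the nearest neighbours) are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// sample is a hypothetical labelled point: a feature vector plus its class.
type sample struct {
	features []float64
	label    string
}

// euclidean returns the straight-line distance between two equal-length vectors.
func euclidean(a, b []float64) float64 {
	sum := 0.0
	for i := range a {
		d := a[i] - b[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

// predict returns the most common label among the k training samples
// closest to query. On a tie, the label that reached the winning count
// first (walking outward from the nearest neighbour) is returned.
func predict(train []sample, query []float64, k int) string {
	// Sort a copy so the caller's slice is left untouched.
	sorted := make([]sample, len(train))
	copy(sorted, train)
	sort.SliceStable(sorted, func(i, j int) bool {
		return euclidean(sorted[i].features, query) < euclidean(sorted[j].features, query)
	})
	if k > len(sorted) {
		k = len(sorted)
	}
	votes := map[string]int{}
	best, bestCount := "", 0
	for _, s := range sorted[:k] {
		votes[s.label]++
		if votes[s.label] > bestCount {
			best, bestCount = s.label, votes[s.label]
		}
	}
	return best
}

func main() {
	// A handful of sample measurements: sepal length, sepal width,
	// petal length, petal width (all in cm).
	train := []sample{
		{[]float64{5.1, 3.5, 1.4, 0.2}, "Iris-setosa"},
		{[]float64{4.9, 3.0, 1.4, 0.2}, "Iris-setosa"},
		{[]float64{7.0, 3.2, 4.7, 1.4}, "Iris-versicolor"},
		{[]float64{6.4, 3.2, 4.5, 1.5}, "Iris-versicolor"},
	}
	fmt.Println(predict(train, []float64{5.0, 3.4, 1.5, 0.2}, 2)) // prints "Iris-setosa"
}
```

Sorting every training point for each query is the simplest approach and is fine for 150 rows; larger datasets usually call for a spatial index instead.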


In [2]:
// Load the dataset. The second argument says whether the CSV has a header row;
// false means every line is treated as data.
// Think of instances as a data frame structure in R or pandas.
// You can also create instances from scratch.
rawData, err := base.ParseCSVToInstances("iris.csv", false)
if err != nil {
    panic(err)
}

// Initialise a new KNN classifier using Euclidean distance and k = 2 neighbours
cls := knn.NewKnnClassifier("euclidean", 2)

// Randomly split the data into training and test sets (proportion argument 0.50)
trainData, testData := base.InstancesTrainTestSplit(rawData, 0.50)
cls.Fit(trainData)

// For each test row, find the nearest training rows by Euclidean distance
// and return the most popular label among them
predictions := cls.Predict(testData)

// Calculate precision/recall metrics, and summarize results
confusionMat, err := evaluation.GetConfusionMatrix(testData, predictions)
if err != nil {
    panic(err)
}
fmt.Println(evaluation.GetSummary(confusionMat))


Out[2]:
Optimisations are switched off
KNN: 1.14 % done
KNN: 2.27 % done
...
KNN: 98.86 % done
Reference Class   True Positives   False Positives   True Negatives   Precision   Recall   F1 Score
---------------   --------------   ---------------   --------------   ---------   ------   --------
Iris-setosa       30               0                 58               1.0000      1.0000   1.0000
Iris-virginica    28               1                 58               0.9655      0.9655   0.9655
Iris-versicolor   28               1                 58               0.9655      0.9655   0.9655
Overall accuracy: 0.9773