In Segmentation: KNN, we classify pixels as crop or non-crop with a KNN classifier. One parameter of the KNN classifier is the number of neighbors (the K in KNN). To choose its value, we perform cross-validation and pick the k that gives the highest accuracy score. This notebook demonstrates that cross-validation, using the training data X (values) and y (classifications) generated in Segmentation: KNN. The selected k is then fed back into Segmentation: KNN to create the KNN classifier that predicts the crop/non-crop designation of each pixel.
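Feeding the selected k back amounts to constructing the classifier with that `n_neighbors` value, fitting it on labeled pixels, and predicting labels for new pixels. The sketch below assumes each pixel is a row of a feature matrix and that labels are 1 for crop and 0 for non-crop; the `make_classification` data is a synthetic stand-in, not the training data from Segmentation: KNN.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for pixel features X and crop (1) / non-crop (0) labels y.
# The real X, y come from Segmentation: KNN; this data is only for illustration.
X, y = make_classification(n_samples=500, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)

# k selected by cross-validation (3 in this notebook)
best_k = 3
knn = KNeighborsClassifier(n_neighbors=best_k)
knn.fit(X, y)

# Predict crop/non-crop for a few pixels (here simply the first five rows)
new_pixels = X[:5]
predicted = knn.predict(new_pixels)
print(predicted)
```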
In this notebook, we find that increasing the number of neighbors from 3 to 9 increases accuracy only marginally, while it also increases run time. Therefore, we will use the smallest number of neighbors: 3.
In [1]:
from __future__ import print_function
import os
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.neighbors import KNeighborsClassifier as KNN
First, we load the data that was saved in Segmentation: KNN.
In [2]:
# Load data
def load_cross_val_data(datafile):
    npzfile = np.load(datafile)
    X = npzfile['X']
    y = npzfile['y']
    return X, y
datafile = os.path.join('data', 'knn_cross_val', 'xy_file.npz')
X, y = load_cross_val_data(datafile)
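`np.load` on an `.npz` file returns a dictionary-like object keyed by the names the arrays were saved under. A minimal round trip, assuming the earlier notebook wrote the file with `np.savez` using the keywords `X` and `y` (its actual save call is not shown here), might look like this. The toy arrays are hypothetical, and the file goes to a temporary directory rather than `data/knn_cross_val`.

```python
import os
import tempfile
import numpy as np

def load_cross_val_data(datafile):
    """Return the arrays stored under the keys 'X' and 'y' in an .npz file."""
    # The context manager closes the underlying file once the arrays are read.
    with np.load(datafile) as npzfile:
        return npzfile['X'], npzfile['y']

# Hypothetical toy arrays standing in for pixel values and labels.
X_saved = np.random.default_rng(0).random((6, 3))
y_saved = np.array([0, 1, 1, 0, 1, 0])

with tempfile.TemporaryDirectory() as tmpdir:
    datafile = os.path.join(tmpdir, 'xy_file.npz')
    # Keyword arguments become the keys inside the archive.
    np.savez(datafile, X=X_saved, y=y_saved)
    X, y = load_cross_val_data(datafile)

print(X.shape, y.shape)  # (6, 3) (6,)
```

Using `with` differs from the cell above only in that the `.npz` file handle is released immediately instead of when the object is garbage-collected.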
Next, we perform a grid search over the number of neighbors (the odd values 3, 5, 7, and 9) with 3-fold cross-validation, looking for the value with the highest mean accuracy.
In [ ]:
# Candidate values for k: 3, 5, 7, 9
tuned_parameters = {'n_neighbors': list(range(3, 11, 2))}
# 3-fold cross-validation; the default score for a classifier is mean accuracy
clf = GridSearchCV(KNN(n_neighbors=3),
                   tuned_parameters,
                   cv=3,
                   verbose=10)
clf.fit(X, y)
print("Best parameters set found on development set:\n")
print(clf.best_params_)
print("Grid scores on development set:\n")
means = clf.cv_results_['mean_test_score']
stds = clf.cv_results_['std_test_score']
res_params = clf.cv_results_['params']
# Report mean accuracy +/- two standard deviations across the folds
for mean, std, params in zip(means, stds, res_params):
    print("%0.3f (+/-%0.03f) for %r"
          % (mean, std * 2, params))
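For a classifier and an integer `cv`, `GridSearchCV` splits the data with stratified k-fold and, by default, scores with the estimator's `score` method, which for `KNeighborsClassifier` is mean accuracy. The grid search in the cell above should therefore be roughly equivalent to scoring each candidate k with `cross_val_score`, as in this sketch. The synthetic data is a placeholder for the notebook's X and y.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Placeholder data; the notebook uses the X, y loaded from xy_file.npz.
X, y = make_classification(n_samples=600, n_features=4, n_informative=3,
                           n_redundant=0, random_state=1)

# Same splitting strategy GridSearchCV uses for a classifier with cv=3.
cv = StratifiedKFold(n_splits=3)

mean_scores = {}
for k in range(3, 11, 2):  # candidate k values: 3, 5, 7, 9
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=cv)
    mean_scores[k] = scores.mean()
    print("k=%d: %0.3f (+/-%0.03f)" % (k, scores.mean(), scores.std() * 2))

# On ties, max() keeps the first (smallest) k, as GridSearchCV does.
best_k = max(mean_scores, key=mean_scores.get)
print("best k:", best_k)
```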
It turns out that increasing the number of neighbors from 3 to 9 increases accuracy only marginally, while it also increases run time. Therefore, we will use the smallest number of neighbors: 3.
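The run-time side of this trade-off can be inspected from the same grid search: `cv_results_` also records `mean_fit_time` and `mean_score_time` (in seconds) for each candidate. For KNN, fitting mainly stores the data or builds a neighbor index, so most of the cost shows up at scoring time, when `predict` searches for neighbors. The sketch below runs on synthetic stand-in data, so its timings say nothing about the real pixel data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the pixel data (assumption, not the notebook's data).
X, y = make_classification(n_samples=2000, n_features=4, n_informative=3,
                           n_redundant=0, random_state=2)

clf = GridSearchCV(KNeighborsClassifier(),
                   {'n_neighbors': list(range(3, 11, 2))},
                   cv=3)
clf.fit(X, y)

res = clf.cv_results_
for params, acc, fit_t, score_t in zip(res['params'],
                                       res['mean_test_score'],
                                       res['mean_fit_time'],
                                       res['mean_score_time']):
    # score time includes the neighbor search performed by predict()
    print("k=%d  accuracy=%0.3f  fit=%0.4fs  score=%0.4fs"
          % (params['n_neighbors'], acc, fit_t, score_t))
```

If the accuracies differ only marginally across k while scoring time grows, the smallest k is the cheaper choice, which is the reasoning used here.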